RE: Updating Solr index from XML files
Otis, what is the difference or advantage of using Solr.pm? http://search.cpan.org/~garafola/Solr-0.03/lib/Solr.pm
Thanks, Francis

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, July 07, 2009 10:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Updating Solr index from XML files

If Perl is your choice: http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Francis Yakin fya...@liquid.com
To: solr-user@lucene.apache.org
Sent: Wednesday, July 8, 2009 1:16:04 AM
Subject: Updating Solr index from XML files

I have the following curl commands to update and commit to Solr (I have 10 XML files, just for testing):

curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @commit.txt -H 'Content-type:text/plain; charset=utf-8'

It works so far. But I will have 3 xml files. What's the most efficient way to do this? I can script it with a for loop in a regular shell script or Perl.

I am also looking into Solr.pm from this: http://wiki.apache.org/solr/IntegratingSolr

BTW: We are using WebLogic to deploy solr.war, and by default Solr under WebLogic uses port 7001, not 8983.

Thanks, Francis
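A for-loop version of the batch above (a minimal sketch; the host, port, and filename pattern are copied from the commands in this thread, and the script only prints the commands so they can be reviewed before piping its output to sh):

```shell
# Sketch: emit one update command per artist file, then a final commit.
# Printing instead of executing keeps this reviewable; pipe to sh to run.
SOLR_URL="http://solr00:7001/solr/update"
for f in xml_Artist-*.txt; do
  echo "curl $SOLR_URL --data-binary @$f -H 'Content-type:text/plain; charset=utf-8'"
done
echo "curl $SOLR_URL --data-binary @commit.txt -H 'Content-type:text/plain; charset=utf-8'"
```

Saved as, say, post_all.sh (name hypothetical), `sh post_all.sh | sh` would run the whole batch.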
Re: Updating Solr index from XML files
If Perl is your choice: http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm

Hm. Very interesting; I had not seen this!

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Francis Yakin fya...@liquid.com
To: solr-user@lucene.apache.org
Sent: Wednesday, July 8, 2009 1:16:04 AM
Subject: Updating Solr index from XML files

I have the following curl commands to update and commit to Solr (I have 10 XML files, just for testing): [...]

It works so far. But I will have 3 xml files. What's the most efficient way to do this? I can script it with a for loop in a regular shell script or Perl.

Assuming Solr 1.4 or a nightly build.
I would use DIH for this:

- If all the files to be added/updated are in a directory, then the FileListEntityProcessor could be used to find and index the files. It walks the disk from a given starting point.
- If you have another file listing the files to be indexed, then I would use LineEntityProcessor to process that list.

One or other of the above would locate the files to be indexed and would pass each filename to XPathEntityProcessor with useSolrAddSchema set to true. See http://wiki.apache.org/solr/DataImportHandler

I am also looking into Solr.pm from this: http://wiki.apache.org/solr/IntegratingSolr BTW: We are using WebLogic to deploy solr.war, and by default Solr under WebLogic uses port 7001, not 8983. Thanks Francis

--
=======================================================
Fergus McMenemie               Email: fer...@twig.me.uk
Techmore Ltd                   Phone: (UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
=======================================================
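A hypothetical data-config.xml along the lines Fergus describes (the directory path and entity names are made up; the attribute names come from the DataImportHandler wiki):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- walk the disk from baseDir, picking up every .xml file -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/xml/files" fileName=".*\.xml"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- each file is already in Solr <add> format -->
      <entity name="file" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}"
              useSolrAddSchema="true" stream="true"/>
    </entity>
  </document>
</dataConfig>
```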
Re: Preparing the ground for a real multilang index
On 08.07.2009 00:50 Jan Høydahl wrote:

itself and do not need to know the query language. You may then want to do a copyField from all your text_lang -> text for convenient one-field-to-rule-them-all search.

Would that really help? As I understand it, copyField takes the raw, not-yet-analyzed field value. I cannot yet see the advantage of this text field over the current situation with no text_lang fields at all. The copied-to text field has to be language-agnostic with no stemming at all, so it would miss many hits. Or is there a way to combine many differently stemmed variants into one field, to be able to search against all of them at once? That would be great indeed!

-Michael
Re: Can't limit return fields in custom request handler
I'll look at SolrPluginUtils.setReturnFields. I'm running the same query:

http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id

I get a non-empty result when the filter parameter is null, but when I pass the inStores filter parameter to getDocListAndSet I get an empty result.

SolrParams solrParams = req.getParams();
Query q = QueryParsing.parseQuery(solrParams.get("q"), req.getSchema());
Query filter = new TermQuery(new Term("inStores", "true"));
DocListAndSet results = req.getSearcher().getDocListAndSet(q, (Query) filter, (Sort) null, solrParams.getInt("start"), solrParams.getInt("limit"));

Thanks.

On Tue, Jul 7, 2009 at 11:45 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: But I have a problem like this; when i call
: http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id,itemTitle
: i'm getting all fields instead of only id and itemTitle.

Your custom handler is responsible for checking the "fl" param and setting what you want the response fields to be on the response object. SolrPluginUtils.setReturnFields can be used if you want this to be done in the normal way.

: Also i'm getting no result when i give a non-null filter parameter in
: getDocListAndSet(...).
...
: DocListAndSet results = req.getSearcher().getDocListAndSet(q,
: (Query)null, (Sort)null, solrParams.getInt("start"),
: solrParams.getInt("limit"));

...that should work. What does your query look like? What are you passing for the start and limit params (is it possible you are getting results, but limit=0 so there aren't any results on the current page of pagination)? What does the debug output look like?

-Hoss

--
Osman İZBAT
RE: Browse indexed terms in a field
Thanks! It seems that can do the trick...

Date: Tue, 7 Jul 2009 11:10:15 -0400
Subject: Re: Browse indexed terms in a field
From: bill.w...@gmail.com
To: solr-user@lucene.apache.org

You can use facet.prefix to match the beginning of a given word:
http://wiki.apache.org/solr/SimpleFacetParameters#head-579914ef3a14d775a5ac64d2c17a53f3364e3cf6

Bill

On Tue, Jul 7, 2009 at 11:02 AM, Pierre-Yves LANDRON pland...@hotmail.com wrote:

Hello, here is what I would like to achieve: in an indexed document there's a full-text indexed field; I'd like to browse the terms in this field, i.e. get all the terms that match the beginning of a given word, for example. I can get all the field's facets for this document, but that's a lot of terms to process; is there a way to constrain the returned facets? Thank you for your highlights.

Kind regards, Pierre.
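For example, a request along these lines (the host, field name, and prefix here are illustrative assumptions, not taken from Pierre's schema; the script just prints the command):

```shell
# Build a facet request that returns only indexed terms starting with
# "nok" from field "text". rows=0 suppresses the documents themselves
# and facet.limit caps how many terms come back.
URL="http://localhost:8983/solr/select"
PARAMS="q=*:*&rows=0&facet=true&facet.field=text&facet.limit=20&facet.prefix=nok"
echo "curl '$URL?$PARAMS'"
```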
Change in DocListAndSetNC not messing everything
Hey there, I had to implement something similar to field collapsing, but couldn't use the patch as it decreases performance a lot with an index of about 4 GB. For testing, what I have done is apply some hacks to SolrIndexSearcher's getDocListAndSetNC function. I fill the ids array in my own order, or I just don't add some doc ids (and so change the ids array's size). I have been testing it and the performance is dramatically better than using the patch. Can anyone tell me which is the best way to hack DocListAndSetNC? I mean, I know this change can make me go mad in the future, when I decide to update to the trunk version or to new releases. My hack is probably too specific for my use case, but I could upload the source in case someone can advise me what to do. Thanks in advance,

--
View this message in context: http://www.nabble.com/Change-in-DocListAndSetNC-not-messing-everything-tp24387830p24387830.html
Sent from the Solr - User mailing list archive at Nabble.com.
how to do the distributed search with sort using solr?
In my project, I am trying to do a distributed search sorted by some field using Solr. The test code is as follows:

SolrQuery query = new SolrQuery();
query.set("q", "id:[1 TO *]");
query.setSortField("id", SolrQuery.ORDER.asc);
query.setParam("shards", "localhost:8983/solr,localhost:7574/solr");
QueryResponse response = server.query(query);

I get the following error. It seems that Solr doesn't support the sort function while doing a distributed search. Do you have any suggestions to solve this problem? Thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
    at test.MainClassTest.searchTest(MainClassTest.java:88)
    at test.MainClassTest.main(MainClassTest.java:48)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    ... 3 more
Re: how to do the distributed search with sort using solr?
Sorry, the error is as follows. I have read the Solr wiki carefully and googled it, but I haven't found any related question or solution; can anyone help me? Thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
    at test.MainClassTest.searchTest(MainClassTest.java:88)
    at test.MainClassTest.main(MainClassTest.java:48)
Caused by: org.apache.solr.common.SolrException:
Re: Updating Solr index from XML files
On Tue, 7 Jul 2009 22:16:04 -0700 Francis Yakin fya...@liquid.com wrote:

I have the following curl cmd to update and doing commit to Solr ( I have 10 xml files just for testing) [...]

hello,
DIH supports XML, right? Not sure if it works with n files... but it's worth looking at.
Alternatively, you can write a relatively simple Java app that will pick each file up and post it for you using SolrJ.

b
_
{Beto|Norberto|Numard} Meijome

Mix a little foolishness with your serious plans; it's lovely to be silly at the right moment. Horace

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Adding new Fields ?
Hello. I posted recently in this ML a script to transform any XML files into Solr's XML files. Anyway, I've got a problem when I want to index my file: the indexing script from the demonstration works perfectly, but now the only problem is that I cannot get any search to match this document. I added

<field name="lomgeneralidentifier" type="text" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>

and

<copyField source="lomgeneralidentifierentry" dest="text"/>

in the schema.xml file. Did I forget something?

--
Saeli Mathieu.
All in one index, or multiple indexes?
Hi, I am wondering if it is common to have just one very large index, or multiple smaller indexes specialized for different content types. We currently have multiple smaller indexes, although one of them is much larger than the others. We are considering merging them, to allow the convenience of searching across multiple types at once and getting the results back in one list.

The largest of the current indexes has a couple of types that belong together; it has just one text field, which is usually quite short and is similar to product names (words like "The matter"). Another index I would merge with this one has multiple text fields (also quite short). We of course would still like to be able to get specific types. Is filtering on just one type a big performance hit compared to querying it from its own index? Bear in mind all these indexes run on the same machine (we replicate them all to three machines and do load balancing).

There are a number of considerations. From an application standpoint, when querying across all types we may split the results out into the separate types anyway once we have the list back. If we always do this, is it silly to have them in one index rather than querying multiple indexes at once? Are multiple HTTP requests less significant than the time to post-split the results?

In some ways it is easier to maintain a single index, although it has felt easier to optimize the results for the type of content if they are in separate indexes. My main concern with putting it all in one index is that we'll make it harder to work with. We will definitely want to do filtering on types sometimes, and if we go with a mashed-up index I'd prefer not to maintain separate specialized indexes as well. Any thoughts?

~Tim.
Re: how to do the distributed search with sort using solr?
On Wed, Jul 8, 2009 at 6:45 AM, shb suh...@gmail.com wrote:

Sorry, the error is as follows. I have read the solr wiki carefully and google it, but I haven't founded any related question or solution, any one can help me, thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query [...]
Caused by: org.apache.solr.common.SolrException: java.net.ConnectException: Connection refused
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)

Are you sure both servers are running properly? You can hit them individually?

--
- Mark

http://www.lucidimagination.com
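One way to check each shard individually, as suggested: print a probe request per shard (the shard addresses are copied from the query in this thread; the *:* probe query is an assumption) and run each one by hand:

```shell
# Emit one probe command per shard; run them to confirm each server
# answers on its own before combining them with the shards parameter.
for shard in localhost:8983/solr localhost:7574/solr; do
  echo "curl 'http://$shard/select?q=*:*&rows=0'"
done
```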
Re: Updating Solr index from XML files
On Wed, Jul 8, 2009 at 4:19 PM, Norberto Meijome numard...@gmail.com wrote:

DIH supports XML, right?

yes. It supports multiple files too (use FileListEntityProcessor).

not sure if it works with n files... but it's worth looking at it. Alternatively, you can write a relatively simple Java app that will pick each file up and post it for you using SolrJ. [...]

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Updating Solr index from XML files
On Jul 8, 2009, at 6:49 AM, Norberto Meijome wrote:

alternatively, u can write a relatively simple java app that will pick each file up and post it for you using SolrJ

Note that Solr ships with post.jar, so one could post a bunch of Solr XML files like this:

java -jar post.jar *.xml

Erik
Re: facets and stopwords
hossman wrote:

but are you sure that example would actually cause a problem? i suspect if you index that exact sentence as-is you wouldn't see the facet count for "si" or "que" increase at all. If you do a query for {!raw field=content}que you bypass the query parsers (which are respecting your stopwords file) and see all docs that contain the raw term "que" in the content field. if you look at some of the docs that match, and paste their content field into the analysis tool, i think you'll see that the problem comes from using the whitespace tokenizer, and is masked by using the WDF after the stop filter ... things like "Que?" are getting ignored by the stop filter, but ultimately winding up in your index as "que"

-Hoss

Yes, you are right: "que?", "que,", "que"... I need to change the analyzer. They are not detected by the stopwords filter because I use the whitespace tokenizer; I will use the StandardTokenizer instead. Thanks Hoss

--
View this message in context: http://www.nabble.com/facets-and-stopwords-tp23952823p24390157.html
Sent from the Solr - User mailing list archive at Nabble.com.
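A hedged schema.xml sketch of that change (the field type name is made up; the factory classes are standard Solr ones). StandardTokenizer strips the trailing punctuation, so "Que?" reaches the stop filter as "Que":

```xml
<fieldType name="text_es" class="solr.TextField">
  <analyzer>
    <!-- StandardTokenizer drops trailing "?", ",", etc., unlike the
         whitespace tokenizer -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- now "que", "si", ... are recognized and removed -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```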
Re: Adding new Fields ?
On Jul 8, 2009, at 7:06 AM, Saeli Mathieu wrote:

Hello. I posted recently in this ML a script to transform any XML files into Solr's XML files. [...] I added

<field name="lomgeneralidentifier" type="text" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>

and

<copyField source="lomgeneralidentifierentry" dest="text"/>

in the schema.xml file. Did I forget something?

Your field name is not the same as your copyField source (note the "entry" on the end of the source attribute).

Erik
Re: how to do the distributed search with sort using solr?
java.net.ConnectException: Connection refused
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)

That "Connection refused" error was caused by the servers having been stopped.

Are you sure both servers are running properly? You can hit them individually?

I started both servers, and if I comment out either

query.setParam("shards", "localhost:8983/solr,localhost:7574/solr");

or

query.setSortField("id", SolrQuery.ORDER.asc);

it works correctly. However, if I keep them both in the program, I get the following error:

org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
    at test.MainClassTest.searchTest(MainClassTest.java:88)
    at test.MainClassTest.main(MainClassTest.java:48)
Re: Adding new Fields ?
Yep, I know that; I added more than 60 lines in this file :) It's just an example. Do you have any idea why, when I try to search something, Solr returns 0 results? I'm looking forward to reading your reply.

--
Saeli Mathieu.
Re: Question regarding ExtractingRequestHandler
For metadata, you can add the ext.metadata.prefix parameter and then use a dynamic field that maps that prefix, such as:

ext.metadata.prefix=metadata_

<dynamicField name="metadata_*" type="string" indexed="true" stored="true"/>

Note, some of this is currently under review to be changed. See https://issues.apache.org/jira/browse/SOLR-284

-Grant

On Jul 7, 2009, at 10:49 AM, ahammad wrote:

Hello, I've recently started using this handler to index MS Word and PDF files. When I set ext.extract.only=true, I get back all the metadata that is associated with that file. If I want to index, I need to set ext.extract.only=false. If I want to index all that metadata along with the contents, what inputs do I need to pass to the HTTP request? Do I have to specifically define all the fields in the schema, or can Solr dynamically generate those fields? Thanks.

--
View this message in context: http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Placing a CSV file into SOLR Server
Is there any way to place a CSV file in the Solr server so that the file can be indexed and searched? If so, please let me know the location in which we have to place the file. We are looking for a workaround to avoid the HTTP request to the Solr server, as it is taking too much time.

--
View this message in context: http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding new Fields ?
On Jul 8, 2009, at 8:10 AM, Saeli Mathieu wrote:

Yep, I know that; I added more than 60 lines in this file :) It's just an example. Do you have any idea why, when I try to search something, Solr returns 0 results?

The first place I start with a general question like this is to add debugQuery=true and see what the query expression is parsed to, then go from there to find out if that is the actually intended query (proper fields being used, etc.), and then back into the analysis process and the data that was indexed. analysis.jsp comes in real handy for troubleshooting these things.

Erik
Re: Placing a CSV file into SOLR Server
From: http://wiki.apache.org/solr/UpdateCSV

The following request will cause Solr to directly read the input file:

curl 'http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8'

#NOTE: The full path, or a path relative to the CWD of the running solr server must be used.

So you can put it anywhere local and give Solr the full path to read it directly.

-Yonik
http://www.lucidimagination.com

On Wed, Jul 8, 2009 at 8:34 AM, Anand Kumar Prabhakar anand2...@gmail.com wrote:

Is there any way to place a CSV file in the Solr server so that the file can be indexed and searched? If so, please let me know the location in which we have to place the file. We are looking for a workaround to avoid the HTTP request to the Solr server, as it is taking too much time. [...]
Re: Adding new Fields ?
The search debug output is a bit weird... I'll give you a typical example. I want to find this word: "Cycle". The field in my XML file is this one:

<add>
  <doc>
    ...
    <field name="lomclassificationtaxonPathtaxonentrystring">Cycle 2</field>
  </doc>
</add>

This field is referenced in my schema.xml this way:

<fields>
  <field name="lomclassificationtaxonPathtaxonentrystring" type="text" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>
</fields>

and

<copyField source="lomclassificationtaxonPathtaxonentrystring" dest="text"/>

Here is my search in debug mode with this request:

http://localhost:8983/solr/select?indent=on&version=2.2&q=Cycle&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl.fl=

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="fl">*,score</str>
      <str name="debugQuery">on</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">Cycle</str>
      <str name="hl.fl"/>
      <str name="qt">standard</str>
      <str name="wt">standard</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
  <lst name="debug">
    <str name="rawquerystring">Cycle</str>
    <str name="querystring">Cycle</str>
    <str name="parsedquery">text:cycl</str>
    <str name="parsedquery_toString">text:cycl</str>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">0.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

I don't know what I'm missing :/ because I think I added all the necessary information in schema.xml.

--
Saeli Mathieu.
Re: Preparing the ground for a real multilang index
Can't the copy field use a different analyzer, both for query and indexing? Otherwise you need to craft your own analyzer which reads the language from the field name... there are several classes ready for this.

paul

On 08-Jul-09 at 02:36, Michael Lackhoff wrote:

On 08.07.2009 00:50 Jan Høydahl wrote: [...] Would that really help? As I understand it, copyField takes the raw, not-yet-analyzed field value. I cannot yet see the advantage of this text field over the current situation with no text_lang fields at all. The copied-to text field has to be language-agnostic with no stemming at all, so it would miss many hits. Or is there a way to combine many differently stemmed variants into one field, to be able to search against all of them at once? That would be great indeed!

-Michael
Re: Placing a CSV file into SOLR Server
Thank you for the input Yonik; anyway, we are again sending an HTTP request to the server, and my requirement is to skip the HTTP request to the Solr server. Is there any way to avoid these HTTP requests?

Yonik Seeley-2 wrote:

From: http://wiki.apache.org/solr/UpdateCSV The following request will cause Solr to directly read the input file:

curl 'http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8'

So you can put it anywhere local and give Solr the full path to read it directly. [...]

--
View this message in context: http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24391630.html
Sent from the Solr - User mailing list archive at Nabble.com.
Solr's MLT query call doesn't work
Hi, recently, while implementing the MoreLikeThis search, I've run into a situation where Solr's MLT query calls don't work. More specifically, the following query:

http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

brings back just the doc with id=10 and nothing else. With the GetMethod approach (putting /mlt explicitly into the URL), I got back some results. I've been trying to solve this problem for more than a week with no luck. If anybody has any hint, please help. Below, I put log outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c) GetMethod (/select). Thanks a lot.

Regards, Sergey Goldberg

Here are the logs:

a) Solr (http://localhost:8080/solr/select)

08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1 status=0 QTime=172
INFO MLTSearchRequestProcessor:49 - SolrServer url: http://localhost:8080/solr
INFO MLTSearchRequestProcessor:67 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612

b) GetMethod (http://localhost:8080/solr/mlt)

08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/mlt params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15
INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
  <result name="match" numFound="1" start="0" maxScore="2.098612">
    <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
  </result>
  <result name="response" numFound="4" start="0" maxScore="0.28923997">
    <doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc>
    <doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc>
    <doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc>
    <doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc>
  </result>
  <lst name="interestingTerms">
    <float name="content_mlt:ye">1.0</float>
    <float name="content_mlt:tobin">1.0</float>
    <float name="content_mlt:a">1.0</float>
    <float name="content_mlt:i">1.0</float>
    <float name="content_mlt:his">1.0</float>
  </lst>
</response>

c) GetMethod (http://localhost:8080/solr/select)

08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
INFO MLT2SearchRequestProcessor:80 - <?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">16</int>
    <lst name="params"><str name="fl">title author score</str><str name="mlt.fl">content_mlt</str><str name="q">id:10</str><str name="mlt.maxqt">5</str><str name="mlt.interestingTerms">details</str></lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="2.098612">
    <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:10</str>
    <str name="querystring">id:10</str>
    <str name="parsedquery">id:10</str>
    <str name="parsedquery_toString">id:10</str>
    <lst name="explain"><str name="10">
2.098612 = (MATCH) weight(id:10 in 3), product of:
  0.9994 = queryWeight(id:10), product of:
    2.0986123 = idf(docFreq=1, numDocs=5)
    0.47650534 = queryNorm
  2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
    1.0 = tf(termFreq(id:10)=1)
    2.0986123 = idf(docFreq=1, numDocs=5)
    1.0 = fieldNorm(field=id, doc=3)
    </str></lst>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">16.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">16.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>
Re: Placing a CSV file into SOLR Server
On Wed, Jul 8, 2009 at 9:33 AM, Anand Kumar Prabhakaran anand2...@gmail.com wrote: Thank you for the input Yonik; anyway, we are still sending an HTTP request to the server, and my requirement is to skip the HTTP request to the SOLR server. Is there any way to avoid these HTTP requests? You're sending a tiny HTTP request to the server that tells Solr to directly read the big CSV file from disk... that should satisfy the requirement, which seemed to stem from the desire to avoid network overhead, no? -Yonik http://www.lucidimagination.com
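For reference, a minimal sketch of the approach Yonik describes, assuming the stock CSV handler is mapped at /update/csv and a Solr server on the example host/port (host, port, and file path here are placeholders, not from the thread):

```shell
# The HTTP request itself is tiny; Solr reads the (large) CSV straight
# from its own local disk via stream.file. Adjust host/port/path.
curl "http://localhost:8983/solr/update/csv?stream.file=/data/books.csv&commit=true"
```

The only bytes crossing the network are the query string; the CSV itself never travels over HTTP.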
Re: Solr's MLT query call doesn't work
A couple of things: your mlt.fl value must be part of fl. In this case, content_mlt is not included in fl. Also, I think the fl parameter value needs to be comma-separated. Try fl=title,author,content_mlt,score -Yao SergeyG wrote: Hi, Recently, while implementing the MoreLikeThis search, I've run into the situation when Solr's mlt query calls don't work. More specifically, the following query: http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score brings back just the doc with id=10 and nothing else. While using the GetMethod approach (putting /mlt explicitly into the url), I got back some results. I've been trying to solve this problem for more than a week with no luck. If anybody has any hint, please help. Below, I put log outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c) GetMethod (/select). Thanks a lot. Regards, Sergey Goldberg Here're the logs: a) Solr (http://localhost:8080/solr/select) 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1 status=0 QTime=172 INFO MLTSearchRequestProcessor:49 - SolrServer url: http://localhost:8080/solr INFO MLTSearchRequestProcessor:67 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score INFO MLTSearchRequestProcessor:73 - Number of docs found = 1 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612 b) GetMethod (http://localhost:8080/solr/mlt) 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/mlt params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15 INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><result name="match" numFound="1" start="0" maxScore="2.098612"><doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc></result><result name="response" numFound="4" start="0" maxScore="0.28923997"><doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc><doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc><doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc><doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc></result><lst name="interestingTerms"><float name="content_mlt:ye">1.0</float><float name="content_mlt:tobin">1.0</float><float name="content_mlt:a">1.0</float><float name="content_mlt:i">1.0</float><float name="content_mlt:his">1.0</float></lst></response> c) GetMethod (http://localhost:8080/solr/select) 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16 INFO MLT2SearchRequestProcessor:80 - <?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">16</int><lst name="params"><str name="fl">title author score</str><str name="mlt.fl">content_mlt</str><str name="q">id:10</str><str name="mlt.maxqt">5</str><str name="mlt.interestingTerms">details</str></lst></lst><result name="response" numFound="1" start="0" maxScore="2.098612"><doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc></result><lst name="debug"><str name="rawquerystring">id:10</str><str name="querystring">id:10</str><str name="parsedquery">id:10</str><str name="parsedquery_toString">id:10</str><lst name="explain"><str name="10">2.098612 = (MATCH) weight(id:10 in 3), product of: 0.9994 = queryWeight(id:10), product of: 2.0986123 = idf(docFreq=1, numDocs=5) 0.47650534 = queryNorm 2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of: 1.0 = tf(termFreq(id:10)=1) 2.0986123 = idf(docFreq=1, numDocs=5) 1.0 = fieldNorm(field=id, doc=3)</str></lst><str name="QParser">OldLuceneQParser</str><lst name="timing"><double name="time">16.0</double><lst name="prepare"><double name="time">0.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst><lst name="process"><double name="time">16.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double
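Putting Yao's suggestion together, a hedged version of the query with content_mlt added to a comma-separated fl (host and port are taken from Sergey's logs; whether this alone makes the /select MLT call return results is untested):

```shell
# Same MLT query, but with fl comma-separated and content_mlt included in it.
curl "http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title,author,content_mlt,score"
```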
Re: about defaultSearchField
Thanks for your reply. But it doesn't work. Yang 2009/7/8 Yao Ge yao...@gmail.com Try with fl=* or fl=*,score added to your request string. -Yao Yang Lin-2 wrote: Hi, I have some problems. For my solr program, I want to type only the query string and get results from all fields that include it. But right now I can't get any result without a specified field. For example, a query for tina gets nothing, but Sentence:tina works. I have adjusted the *schema.xml* like this:
<fields>
<field name="CategoryNamePolarity" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="CategoryNameStrenth" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="CategoryNameSubjectivity" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="Sentence" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="allText" type="text" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey required="false">Sentence</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>allText</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<copyfield source="CategoryNamePolarity" dest="allText"/>
<copyfield source="CategoryNameStrenth" dest="allText"/>
<copyfield source="CategoryNameSubjectivity" dest="allText"/>
<copyfield source="Sentence" dest="allText"/>
I think the problem is in defaultSearchField, but I don't know how to fix it. Could anyone help me? Thanks Yang -- View this message in context: http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html Sent from the Solr - User mailing list archive at Nabble.com.
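One detail worth double-checking before touching defaultSearchField: Solr's copy directive is spelled copyField (capital F), and XML element names are case-sensitive. If the lowercase copyfield shown above is really what is in the schema file (and not just an artifact of the email), the directives would not be recognized, nothing would be copied, and allText would stay empty — which would explain exactly these symptoms. A corrected sketch:

```xml
<!-- copyField, capital F -- an unrecognized lowercase <copyfield> would be
     ignored, leaving allText empty even though defaultSearchField points at it -->
<copyField source="CategoryNamePolarity" dest="allText"/>
<copyField source="CategoryNameStrenth" dest="allText"/>
<copyField source="CategoryNameSubjectivity" dest="allText"/>
<copyField source="Sentence" dest="allText"/>
```

As Jay notes later in the thread, any schema change like this also requires a re-index before queries against allText will match.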
Using relevance scores for psuedo-random-probabilistic ordenation
Hi, I've just implemented my PseudoRandomFieldComparator (migrated from PseudoRandomComparatorSource). The problem I see is that I don't have access to the relevance scores in the deprecated PseudoRandomComparatorSource. I'm trying to fill in the scores from my PseudoRandomComponent (in the process() method). I don't know whether to use a PseudoRandomComparator that extends QueryComponent and then repeat the query (or something similar, like reordering my doclist), or to use two different components, QueryComponent and PseudoComponent (extending SearchComponent), and look for a good combination. How can I get the relevance scores in my PseudoRandomFieldComparator? Any ideas? Regards, Raimon Bosch. -- View this message in context: http://www.nabble.com/Using-relevance-scores-for-psuedo-random-probabilistic-ordenation-tp24392432p24392432.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding new Fields ?
I don't really know how to solve my problem :/ On Wed, Jul 8, 2009 at 3:16 PM, Saeli Mathieu saeli.math...@gmail.com wrote: The search debug is a bit weird... I'll give you a typical example. I want to find this word: Cycle. The field in my xml file is this one:
<add>
<doc>
..
<field name="lomclassificationtaxonPathtaxonentrystring">Cycle 2</field>
</doc>
</add>
This field is referred to in my schema.xml this way:
<fields>
<field name="lomclassificationtaxonPathtaxonentrystring" type="text" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
</fields>
and <copyfield source="lomclassificationtaxonPathtaxonentrystring" dest="text"/>
Here is my search in debug mode with this request: http://localhost:8983/solr/select?indent=on&version=2.2&q=Cycle&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl.fl=
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="explainOther"/><str name="fl">*,score</str><str name="debugQuery">on</str><str name="indent">on</str><str name="start">0</str><str name="q">Cycle</str><str name="hl.fl"/><str name="qt">standard</str><str name="wt">standard</str><str name="version">2.2</str><str name="rows">10</str></lst></lst><result name="response" numFound="0" start="0" maxScore="0.0"/><lst name="debug"><str name="rawquerystring">Cycle</str><str name="querystring">Cycle</str><str name="parsedquery">text:cycl</str><str name="parsedquery_toString">text:cycl</str><lst name="explain"/><str name="QParser">OldLuceneQParser</str><lst name="timing"><double name="time">0.0</double><lst name="prepare"><double name="time">0.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst><lst name="process"><double name="time">0.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst></lst></lst></response>
I don't know what I'm missing :/ Because I think I added all the necessary information in schema.xml. -- Saeli Mathieu.
Re: Adding new Fields ?
I think at least you need to review your import process. If nothing is indexed, there's going to be nothing that matches. We need a little more information: a short but concise test sample of what you're trying to index, how you're submitting the HTTP request and the commit request (you did commit, right?), and what messages you're getting when you index and then commit. I didn't look too closely at your last code example, but I would recommend using some XML libraries; if I remember correctly, it didn't. Most folks seem to process xml files for indexing by using the source xml files to create new files just for indexing, with an identifier that is used to link back to the source xml file in the application you design. Jon Gorman
Re: Indexing rich documents from websites using ExtractingRequestHandler
Try putting all the PDF URLs into a file, download with something like 'wget' then index locally. Glen Newton http://zzzoot.blogspot.com/ 2009/7/8 ahammad ahmed.ham...@gmail.com: Hello, I can index rich documents like pdf for instance that are on the filesystem. Can we use ExtractingRequestHandler to index files that are accessible on a website? For example, there is a file that can be reached like so: http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf How would I go about indexing that file? I tried using the following combinations. I will put the errors in brackets: stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The filename, directory name, or volume label syntax is incorrect) stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot find the path specified) stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format of the specified network name is invalid) stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot find the path specified) stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network path was not found) I sort of understand why I get those errors. What are the alternative methods of doing this? I am guessing that the stream.file attribute doesn't support web addresses. Is there another attribute that does? -- View this message in context: http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html Sent from the Solr - User mailing list archive at Nabble.com. -- -
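A hedged sketch of Glen's wget-then-index suggestion, assuming the stock extract handler path (/solr/update/extract) and a local example server; urls.txt, the pdfs/ directory, and the literal.id scheme are all placeholders:

```shell
# urls.txt: one PDF URL per line. Download everything locally first,
# then post each file to the ExtractingRequestHandler.
wget -i urls.txt -P pdfs/
for f in pdfs/*.pdf; do
  curl "http://localhost:8983/solr/update/extract?literal.id=$(basename "$f")&commit=false" \
       -F "myfile=@$f"
done
# One commit at the end, in the same style as the curl commands above.
curl "http://localhost:8983/solr/update" --data-binary "<commit/>" -H "Content-type:text/xml"
```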
Re: Adding new Fields ?
Have you thought about looking at your index with Luke to see if what you expect to be there is actually there? Best Erick On Wed, Jul 8, 2009 at 11:28 AM, Jon Gorman jonathan.gor...@gmail.com wrote: I think at least you need to review your import process. If nothing indexed, there's going to be nothing that matched. We need a little more information. Stuff like a short but concise test sample of what you're trying to index, how you're submitting the http request and the commit request (you did commit, right?), what messages you're getting when you do index and then commit. I didn't look too closely at your last code example, but I would recommend using some XML libraries. Most folks seem to process xml files for indexing by using the source xml files to create new files just for indexing. There's an identifier, which is usually used to link back to the source xml file in the application you design. Jon Gorman
SolrException - Lock obtain timed out, no leftover locks
Hi, I'm running Solr 1.3.0 in multicore mode and feeding it data from which the core name is inferred from a specific field. My service extracts the core name and, if it has not seen it before, issues a create request for that core before attempting to add the document (via SolrJ). I have a pool of MyIndexers that run in parallel, taking documents from a queue and adding them via the add method on the SolrServer instance corresponding to that core (exactly one per core exists). Each core is in a separate data directory. My timeouts are set as such: <writeLockTimeout>15000</writeLockTimeout> <commitLockTimeout>25000</commitLockTimeout> I remove the index directories, start the server, check that no locks exist, and generate ~500 documents spread across 5 cores for the MyIndexers to handle. Each time, I see one or more exceptions with a message like Lock_obtain_timed_out_SimpleFSLockmulticoreNewUser3dataindexlucenebd4994617386d14e2c8c29e23bcca719writelock__orgapachelucenestoreLockObtainFailedException_Lock_obtain_timed_out_... When the indexers have completed, no lock is left over. There is no discernible pattern as far as when the exception occurs (i.e., it does not tend to happen on the first or last or any particular document). Interestingly, this problem does not happen when I have only a single MyIndexer, or if I have a pool of MyIndexers and am running in single core mode. I've looked at the other posts from users getting this exception, but it always seemed to be a different case, such as the server having crashed previously and a lock file being left over. -- View this message in context: http://www.nabble.com/SolrException---Lock-obtain-timed-out%2C-no-leftover-locks-tp24393255p24393255.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: about defaultSearchField
Just to be sure: You mentioned that you adjusted schema.xml - did you re-index after making your changes? -Jay On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin beckl...@gmail.com wrote: Thanks for your reply. But it doesn't work. Yang 2009/7/8 Yao Ge yao...@gmail.com Try with fl=* or fl=*,score added to your request string. -Yao Yang Lin-2 wrote: Hi, I have some problems. For my solr program, I want to type only the query string and get results from all fields that include it. But right now I can't get any result without a specified field. For example, a query for tina gets nothing, but Sentence:tina works. I have adjusted the *schema.xml* like this:
<fields>
<field name="CategoryNamePolarity" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="CategoryNameStrenth" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="CategoryNameSubjectivity" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="Sentence" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="allText" type="text" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey required="false">Sentence</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>allText</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<copyfield source="CategoryNamePolarity" dest="allText"/>
<copyfield source="CategoryNameStrenth" dest="allText"/>
<copyfield source="CategoryNameSubjectivity" dest="allText"/>
<copyfield source="Sentence" dest="allText"/>
I think the problem is in defaultSearchField, but I don't know how to fix it. Could anyone help me? Thanks Yang -- View this message in context: http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding new Fields ?
Here is my result when I'm adding a file to solr: {...@framboise.}: java -jar post.jar FinalParsing.xml [18:37]#25 SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing files to http://localhost:8983/solr/update.. SimplePostTool: POSTing file FinalParsing.xml SimplePostTool: COMMITting Solr index changes.. {...@framboise.}: Here is my typical xml file:
<add>
<doc>
<field name="id">0</field>
<field name="lomgeneralidentifiercatalog">TEXT</field>
<field name="lomgeneralidentifierentry">TEXT</field>
<field name="lomgeneraltitlestring">TEXT</field>
<field name="lomgenerallanguage">TEXT</field>
<field name="lomgeneraldescriptionstring">TEXT</field>
<field name="lomlifeCyclestatussource">TEXT</field>
<field name="lomlifeCyclestatusvalue">TEXT</field>
<field name="lomlifeCyclecontributerolesource">TEXT</field>
<field name="lomlifeCyclecontributerolevalue">TEXT</field>
<field name="lomlifeCyclecontributeentity">TEXT</field>
<field name="lommetaMetadataidentifiercatalog">TEXT</field>
<field name="lommetaMetadataidentifierentry">TEXT</field>
<field name="lommetaMetadatacontributerolesource">TEXT</field>
<field name="lommetaMetadatacontributerolevalue">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributedatedateTime">TEXT</field>
<field name="lommetaMetadatacontributerolesource">TEXT</field>
<field name="lommetaMetadatacontributerolevalue">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributeentity">TEXT</field>
<field name="lommetaMetadatacontributedatedateTime">TEXT</field>
<field name="lommetaMetadatametadataSchema">TEXT</field>
<field name="lommetaMetadatalanguage">TEXT</field>
<field name="lomtechnicallocation">TEXT</field>
<field name="lomeducationalintendedEndUserRolesource">TEXT</field>
<field name="lomeducationalintendedEndUserRolevalue">TEXT</field>
<field name="lomeducationalcontextsource">TEXT</field>
<field name="lomeducationalcontextvalue">TEXT</field>
<field name="lomeducationaltypicalAgeRangestring">TEXT</field>
<field name="lomeducationaltypicalAgeRangestring">TEXT</field>
<field name="lomeducationaldescriptionstring">TEXT</field>
<field name="lomeducationallanguage">TEXT</field>
<field name="lomannotationentity">TEXT</field>
<field name="lomannotationdatedateTime">TEXT</field>
<field name="lomannotationdescriptionstring">TEXT</field>
<field name="lomclassificationpurposesource">TEXT</field>
<field name="lomclassificationpurposevalue">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationpurposesource">TEXT</field>
<field name="lomclassificationpurposevalue">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationpurposesource">TEXT</field>
<field name="lomclassificationpurposevalue">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
<field name="lomclassificationtaxonPathsourcestring">TEXT</field>
<field name="lomclassificationtaxonPathtaxonid">TEXT</field>
<field name="lomclassificationtaxonPathtaxonentrystring">TEXT</field>
</doc>
</add>
here is my schema.xml configuration.
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="nameSort" type="string" indexed="true" stored="false"/>
<field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
<field name="manu" type="text" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text" indexed="true" stored="true"/>
<field name="lomgeneral" type="text" indexed="true"
expand synonyms without tokenizing stream?
I'm pretty new to solr; my apologies if this is a naive question, and my apologies for the verbosity: I'd like to take keywords in my documents and expand them as synonyms; for example, if the document gets annotated with a keyword of 'sf', I'd like that to expand to 'San Francisco'. (San Francisco,San Fran,SF is a line in my synonyms.txt file). But I also want to be able to display facets with counts for these keywords; I'd like them to be suitable for display. So, if I define the keywords field as 'text', I use the following pipeline (from my schema.xml):
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
Faceting on this field, I get return values (when I query specifically for the single document in question): <lst name="Keywords"> <int name="fran">1</int> <int name="francisco">1</int> <int name="san">1</int> <int name="sf">1</int> </lst>
I've also done a copyfield to a 'KeywordsString' field, which is defined as string, i.e. <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> Faceting on *that* field (when querying for just this 1 document, which has a keyword of 'sf') results in: <lst name="KeywordsString"> <int name="sf">1</int> </lst> I guess what I'd like to see is the ability to stamp keywords like 'sf', 'san fran', 'san francisco', and 'mlb' (with a synonyms.txt file entry of mlb => Major League Baseball), and see all the documents that are inscribed with all those synonym variants come back as: <lst name="KeywordsString"> <int name="San Francisco">1</int> <int name="Major League Baseball">1</int> </lst> But I don't know how to define a processing pipeline that expands synonyms without tokenizing them, breaking 'San Francisco' into 'san' and 'francisco' and presenting those as separate facets. Thanks for any help, Don
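One way to approach this (a sketch, not tested against this schema): facet on a separate field whose analyzer starts with KeywordTokenizerFactory, so the whole keyword stays a single token, and run the synonym filter with expand="false" so every variant normalizes to one canonical, display-ready form. The field type name and synonyms file below are made up, and the tokenizerFactory attribute on SynonymFilterFactory is an assumption here — it is meant to keep multi-word entries in the synonyms file as single tokens too:

```xml
<!-- Hypothetical facet-only type: each keyword is one token; synonyms map
     all variants to a single canonical form instead of expanding them. -->
<fieldType name="keywordFacet" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="keyword-synonyms.txt"
            ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

with keyword-synonyms.txt entries written as explicit mappings to the display form, e.g. "sf, san fran, san francisco => San Francisco" and "mlb => Major League Baseball". The search-oriented analysis (stemming, word splitting) would stay on the existing 'text' field; this type exists only to make facet values clean.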
Re: Multiple values for custom fields provided in SOLR query
Suryasnat, I suggest you go to your Solr Admin page and run a few searches from there, using Lucene query syntax (linked from the Lucene site), e.g. fieldID:111 AND fieldID:222 AND fieldID:333 AND foo:product then replace ANDs with ORs where appropriate. That should give you an idea/feel about which query you need. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Suryasnat Das suryaatw...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, July 7, 2009 12:16:30 PM Subject: Re: Multiple values for custom fields provided in SOLR query Hi Otis, Thanks for replying to my query. My query is, if multiple values are provided for a custom field then how can it be represented in a SOLR query. So if my field is fileID and its values are 111, 222 and 333 and my search string is 'product' then how can this be represented in a SOLR query? I want to perform the search on the basis of fileIDs *and* the search string provided. If I provide the query in the format q=fileID:111+fileID:222+fileID:333+product, then how will it actually search? Can you please provide me the correct format of the query? Regards Suryasnat Das On Mon, Jul 6, 2009 at 10:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I actually don't fully understand your question. q=+fileID:111+fileID:222+fileID:333+apple looks like a valid query to me. (not sure what that space encoded as + is, though) Also not sure what you mean by: Basically the requirement is, if fileIDs are provided as search parameter then search should happen on the basis of fileID. Do you mean apple should be ignored if a term (field name:field value) is provided?
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Suryasnat Das To: solr-user@lucene.apache.org Sent: Monday, July 6, 2009 11:31:10 AM Subject: Multiple values for custom fields provided in SOLR query Hi, I have a requirement in which i need to have multiple values in my custom fields while forming the search query to SOLR. For example, fileID is my custom field. I have defined the fileID in schema.xml as <field name="fileID" type="string" indexed="true" stored="true" required="true" multiValued="true"/>. Now fileID can have multiple values like 111,222,333 etc. So will my query be of the form, q=+fileID:111+fileID:222+fileID:333+apple where apple is my search query string. I tried with the above query but it did not work. SOLR gave invalid query error. Basically the requirement is, if fileIDs are provided as search parameter then search should happen on the basis of fileID. Is my approach correct or i need to do something else? Please, if immediate help is provided then that would be great. Regards Suryasnat Das Infosys.
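For the record, the query Suryasnat describes can be written with explicit boolean operators and Lucene's field grouping syntax; a sketch, assuming a default server and that 'product' should hit the default search field:

```shell
# Match documents whose fileID is any of the three values AND that
# contain 'product'. --data-urlencode handles the spaces/parentheses.
curl "http://localhost:8983/solr/select" \
     --data-urlencode "q=fileID:(111 OR 222 OR 333) AND product" \
     --data-urlencode "fl=*,score"
```

Swap OR for AND inside the parentheses if all three ids must match, as in Otis's first example.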
Re: Solr's MLT query call doesn't work
Sergey, What about http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt ? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: SergeyG sgoldb...@mail.ru To: solr-user@lucene.apache.org Sent: Wednesday, July 8, 2009 9:44:20 AM Subject: Solr's MLT query call doesn't work Hi, Recently, while implementing the MoreLikeThis search, I've run into the situation when Solr's mlt query calls don't work. More specifically, the following query: http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score brings back just the doc with id=10 and nothing else. While using the GetMethod approach (putting /mlt explicitly into the url), I got back some results. I've been trying to solve this problem for more than a week with no luck. If anybody has any hint, please help. Below, I put log outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c) GetMethod (/select). Thanks a lot.
Regards, Sergey Goldberg Here're the logs: a) Solr (http://localhost:8080/solr/select) 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1 status=0 QTime=172 INFO MLTSearchRequestProcessor:49 - SolrServer url: http://localhost:8080/solr INFO MLTSearchRequestProcessor:67 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score INFO MLTSearchRequestProcessor:73 - Number of docs found = 1 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612 b) GetMethod (http://localhost:8080/solr/mlt) 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/mlt params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15 INFO MLT2SearchRequestProcessor:76 - <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><result name="match" numFound="1" start="0" maxScore="2.098612"><doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc></result><result name="response" numFound="4" start="0" maxScore="0.28923997"><doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc><doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc><doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc><doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc></result><lst name="interestingTerms"><float name="content_mlt:ye">1.0</float><float name="content_mlt:tobin">1.0</float><float name="content_mlt:a">1.0</float><float name="content_mlt:i">1.0</float><float name="content_mlt:his">1.0</float></lst></response> c) GetMethod (http://localhost:8080/solr/select) 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16 INFO MLT2SearchRequestProcessor:80 - <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">16</int><lst name="params"><str name="fl">title author score</str><str name="mlt.fl">content_mlt</str><str name="q">id:10</str><str name="mlt.maxqt">5</str><str name="mlt.interestingTerms">details</str></lst></lst><result name="response" numFound="1" start="0" maxScore="2.098612"><doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc></result><lst name="debug"><str name="rawquerystring">id:10</str><str name="querystring">id:10</str><str name="parsedquery">id:10</str><str name="parsedquery_toString">id:10</str><lst name="explain"><str name="10">2.098612 = (MATCH) weight(id:10 in 3), product of: 0.9994 = queryWeight(id:10), product of: 2.0986123 = idf(docFreq=1, numDocs=5) 0.47650534 = queryNorm 2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of: 1.0 = tf(termFreq(id:10)=1) 2.0986123 = idf(docFreq=1, numDocs=5) 1.0 = fieldNorm(field=id, doc=3)</str></lst><str name="QParser">OldLuceneQParser</str><lst name="timing"><double name="time">16.0</double><lst name="prepare"><double name="time">0.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst><lst name="process"><double name="time">16.0</double><lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst><lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">16.0</double></lst></lst></lst></lst></response> And here're the relevant entries from solrconfig.xml: explicit id,title,author,score on 1 10 -- View this message in context: http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391843.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing rich documents from websites using ExtractingRequestHandler
I haven't tried this myself, but it sounds like what you're looking for is enabling remote streaming: http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf As the link above shows, you should be able to enable remote streaming like this: <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" /> and then something like this might work: stream.url=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf So you use stream.url instead of stream.file. Hope this helps. -Jay On Wed, Jul 8, 2009 at 7:40 AM, ahammad ahmed.ham...@gmail.com wrote: Hello, I can index rich documents like pdf for instance that are on the filesystem. Can we use ExtractingRequestHandler to index files that are accessible on a website? For example, there is a file that can be reached like so: http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf How would I go about indexing that file? I tried the following combinations; I will put the errors in brackets: stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The filename, directory name, or volume label syntax is incorrect) stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot find the path specified) stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format of the specified network name is invalid) stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot find the path specified) stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network path was not found) I sort of understand why I get those errors. What are the alternative methods of doing this? I am guessing that the stream.file attribute doesn't support web addresses. Is there another attribute that does?
-- View this message in context: http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html Sent from the Solr - User mailing list archive at Nabble.com.
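As a sketch of the stream.url approach described above, the extract request can be built from Python. Note the assumptions: the localhost:8983 host/port, the /update/extract handler path, and the literal.id parameter value are all illustrative and need adjusting to your setup.

```python
from urllib.parse import urlencode

# ExtractingRequestHandler with remote streaming enabled: pass the
# document's web address via stream.url instead of stream.file.
# Host/port, handler path, and literal.id are assumptions.
params = {
    "stream.url": "http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf",
    "literal.id": "testfile1",  # hypothetical unique-key value
    "commit": "true",
}
request_url = "http://localhost:8983/solr/update/extract?" + urlencode(params)
print(request_url)
```

Sending the built URL (e.g. with urllib.request.urlopen) would then submit the remote PDF for extraction, provided enableRemoteStreaming is set as shown in the reply above.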
Re: Solr's MLT query call doesn't work
You definitely need mlt=true if you are not using /solr/mlt. Bill On Wed, Jul 8, 2009 at 2:14 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Sergey, What about http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt ? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: SergeyG sgoldb...@mail.ru To: solr-user@lucene.apache.org Sent: Wednesday, July 8, 2009 9:44:20 AM Subject: Solr's MLT query call doesn't work Hi, Recently, while implementing the MoreLikeThis search, I've run into a situation where Solr's mlt query calls don't work. More specifically, the following query: http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score brings back just the doc with id=10 and nothing else. While using the GetMethod approach (putting /mlt explicitly into the url), I got back some results. I've been trying to solve this problem for more than a week with no luck. If anybody has any hint, please help. Below, I put log outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c) GetMethod (/select). Thanks a lot.
Regards, Sergey Goldberg Here are the logs: a) Solr (http://localhost:8080/solr/select) 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1 status=0 QTime=172 INFO MLTSearchRequestProcessor:49 - SolrServer url: http://localhost:8080/solr INFO MLTSearchRequestProcessor:67 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score INFO MLTSearchRequestProcessor:73 - Number of docs found = 1 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612 b) GetMethod (http://localhost:8080/solr/mlt) 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/mlt params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15 INFO MLT2SearchRequestProcessor:76 - 0 name=QTime0 maxScore=2.0986122.098612S.G. name=titleSG_Book umFound=4 start=0 maxScore=0.28923997 name=score0.28923997O. HenryS.G.Four Million, The0.08667877 name=authorKatherine MosbyThe Season of Lillian Dawes0.07947738 name=authorJerome K. JeromeThree Men in a Boat name=score0.047219563Charles OliverS.G.ABC's of Science name=content_mlt:ye1.0 name=content_mlt:tobin1.0 name=content_mlt:a1.0 name=content_mlt:i1.0 name=content_mlt:his1.0 c) GetMethod (http://localhost:8080/solr/select) 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16 INFO MLT2SearchRequestProcessor:80 - 0 name=QTime16title author scorecontent_mltid:10 name=mlt.maxqt5 name=mlt.interestingTermsdetails numFound=1 start=0 maxScore=2.098612 name=score2.098612S.G. name=titleSG_Book name=rawquerystringid:10id:10 name=parsedqueryid:10id:10 name=explain 2.098612 = (MATCH) weight(id:10 in 3), product of: 0.9994 = queryWeight(id:10), product of: 2.0986123 = idf(docFreq=1, numDocs=5) 0.47650534 = queryNorm 2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of: 1.0 = tf(termFreq(id:10)=1) 2.0986123 = idf(docFreq=1, numDocs=5) 1.0 = fieldNorm(field=id, doc=3) OldLuceneQParser name=timing16.0 name=time0.0 name=org.apache.solr.handler.component.QueryComponent name=time0.0 name=org.apache.solr.handler.component.FacetComponent name=time0.00.0 name=org.apache.solr.handler.component.HighlightComponent name=time0.0 name=org.apache.solr.handler.component.DebugComponent name=time0.0 name=time16.0 name=org.apache.solr.handler.component.QueryComponent name=time0.0 name=org.apache.solr.handler.component.FacetComponent name=time0.0 name=org.apache.solr.handler.component.MoreLikeThisComponent name=time0.0 name=org.apache.solr.handler.component.HighlightComponent name=time0.0 name=org.apache.solr.handler.component.DebugComponent name=time16.0 And here are the relevant entries from solrconfig.xml: explicit id,title,author,score on 1 10 -- View this message in context: http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391843.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: about defaultSearchField
Yes, I have deleted the whole index directory and re-indexed after making changes. Yang 2009/7/8 Jay Hill jayallenh...@gmail.com Just to be sure: You mentioned that you adjusted schema.xml - did you re-index after making your changes? -Jay On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin beckl...@gmail.com wrote: Thanks for your reply. But it doesn't work. Yang 2009/7/8 Yao Ge yao...@gmail.com Try with fl=* or fl=*,score added to your request string. -Yao Yang Lin-2 wrote: Hi, I have some problems. For my solr program, I want to type only the query string and get all field results that include the query string. But now I can't get any result without a specified field. For example, a query for tina gets nothing, but Sentence:tina works. I have adjusted the schema.xml like this: <fields> <field name="CategoryNamePolarity" type="text" indexed="true" stored="true" multiValued="true"/> <field name="CategoryNameStrenth" type="text" indexed="true" stored="true" multiValued="true"/> <field name="CategoryNameSubjectivity" type="text" indexed="true" stored="true" multiValued="true"/> <field name="Sentence" type="text" indexed="true" stored="true" multiValued="true"/> <field name="allText" type="text" indexed="true" stored="true" multiValued="true"/> </fields> <uniqueKey required="false">Sentence</uniqueKey> <!-- field for the QueryParser to use when an explicit fieldname is absent --> <defaultSearchField>allText</defaultSearchField> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" --> <solrQueryParser defaultOperator="OR"/> <copyfield source="CategoryNamePolarity" dest="allText"/> <copyfield source="CategoryNameStrenth" dest="allText"/> <copyfield source="CategoryNameSubjectivity" dest="allText"/> <copyfield source="Sentence" dest="allText"/> I think the problem is in defaultSearchField, but I don't know how to fix it. Could anyone help me? Thanks Yang -- View this message in context: http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html Sent from the Solr - User mailing list archive at Nabble.com.
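The copyField/defaultSearchField behavior under discussion can be illustrated with a small standalone sketch: pure Python, no Solr involved, naive substring matching rather than real analysis. Field names follow the schema above; the document content is invented.

```python
# Standalone sketch of Solr's copyField + defaultSearchField idea.
# Not Solr's implementation -- just the concept.
DEFAULT_SEARCH_FIELD = "allText"
COPY_FIELD_SOURCES = ["CategoryNamePolarity", "CategoryNameStrenth",
                      "CategoryNameSubjectivity", "Sentence"]

def index_doc(doc):
    # copyField: gather every source field into the allText catch-all
    doc = dict(doc)
    doc[DEFAULT_SEARCH_FIELD] = " ".join(doc.get(f, "") for f in COPY_FIELD_SOURCES)
    return doc

def search(docs, query, field=None):
    # An unfielded query ("tina") searches the default field;
    # "Sentence:tina" corresponds to search(docs, "tina", "Sentence").
    field = field or DEFAULT_SEARCH_FIELD
    return [d for d in docs if query.lower() in d.get(field, "").lower()]

docs = [index_doc({"Sentence": "Tina likes Solr"})]
print(len(search(docs, "tina")), len(search(docs, "tina", "Sentence")))
```

The point of the sketch: an unfielded query only hits if the catch-all field was actually populated at index time, which is why re-indexing after adding the copy directives matters.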
Re: Solr's MLT query call doesn't work
Many thanks to everybody who replied to my message. 1. "A couple of things: your mlt.fl value must be part of fl. In this case, content_mlt is not included in fl. I think the fl parameter value needs to be comma-separated. Try fl=title,author,content_mlt,score" Yao, although I don't understand why mlt.fl must be part of fl (at least, I didn't see this mentioned anywhere), I included this field in fl. But this didn't change anything. As to the syntax, both fl=title,author,content_mlt,score and fl=title author content_mlt score produced the same output (which, again, was exactly the same as the one with fl=title author score). 2. "You definitely need mlt=true if you are not using /solr/mlt." Bill, mlt=true was included in the query while making the Solr call from the very beginning. 3. "What about http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt" Otis, I tried that too and got this: INFO MLTSearchRequestProcessor:69 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt ERROR MLTSearchRequestProcessor:88 - Error executing query INFO MLTSearchRequestProcessor:69 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+content_mlt+score&qt=mlt ERROR MLTSearchRequestProcessor:88 - Error executing query Well, I didn't expect this to be such a hurdle. And I'm sure that hundreds of people before me have already done something similar, haven't they? This really looks bizarre. Thank you all. (Otis, when I saw your name I got a feeling that it was just a matter of seconds till these stubborn calls would start doing their job. :) ) Sergey SergeyG wrote: Hi, Recently, while implementing the MoreLikeThis search, I've run into a situation where Solr's mlt query calls don't work.
More specifically, the following query: http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score brings back just the doc with id=10 and nothing else. While using the GetMethod approach (putting /mlt explicitly into the url), I got back some results. I've been trying to solve this problem for more than a week with no luck. If anybody has any hint, please help. Below, I put log outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c) GetMethod (/select). Thanks a lot. Regards, Sergey Goldberg Here are the logs: a) Solr (http://localhost:8080/solr/select) 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1 status=0 QTime=172 INFO MLTSearchRequestProcessor:49 - SolrServer url: http://localhost:8080/solr INFO MLTSearchRequestProcessor:67 - solrQuery q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score INFO MLTSearchRequestProcessor:73 - Number of docs found = 1 INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612 b) GetMethod (http://localhost:8080/solr/mlt) 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/mlt params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} status=0 QTime=15 INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst> <result name="match" numFound="1" start="0" maxScore="2.098612"><doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc></result> <result name="response" numFound="4" start="0" maxScore="0.28923997"><doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc><doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc><doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc><doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc></result> <lst name="interestingTerms"><float name="content_mlt:ye">1.0</float><float name="content_mlt:tobin">1.0</float><float name="content_mlt:a">1.0</float><float name="content_mlt:i">1.0</float><float name="content_mlt:his">1.0</float></lst> </response> c) GetMethod (http://localhost:8080/solr/select) 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16 INFO MLT2SearchRequestProcessor:80 - <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">16</int><lst name="params"><str
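One practical note on the thread above: building the request with a URL library guards against dropped or mis-encoded & separators between parameters, an easy class of error with hand-assembled query strings. A sketch using the parameters from this thread (host and port as in the thread):

```python
from urllib.parse import urlencode

# MLT via the standard /select handler: mlt=true enables the
# MoreLikeThis component alongside the normal query.
params = {
    "q": "id:10",
    "mlt": "true",
    "mlt.fl": "content_mlt",
    "mlt.maxqt": "5",
    "mlt.interestingTerms": "details",
    "fl": "title,author,content_mlt,score",
}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)
```

urlencode inserts the & separators and percent-escapes reserved characters (the colon in id:10, the commas in fl), so the request that reaches Solr matches what was intended.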
Best way to integrate custom functionality
Hello all, I am working on a project that involves searching through free-text fields and would like to add the ability to filter out negative expressions at a very simple level. For example, the field may contain the text "person has no cars". If the user were to search for "cars", I would like to be able to intercept the results and return only those without the word "no" in front of the search term. While this is a very simple example, it's pretty much my end goal. I've been reading up on the various hooks provided within Solr but wanted to get some guidance on the best way to proceed. Thanks! --Andrew
Boosting for most recent documents
Hi, I'm trying to find a way to get the most recent entry for the searched word. For example, if I have documents with a field named "user" and I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of: 1) Sort by some timestamp field - but with millions of documents this becomes a huge memory problem, as we have seen OOM with sorting before. 2) Boost the most recent document - I'm not sure how to do this. Basically, we want the most recent document to score higher than any other; then we can retrieve just 10 records and sort in the application by the timestamp field to get the most recent document matching the keyword. Any suggestion on how this can be done? Thanks, -vivek
Re: Boosting for most recent documents
Sort by the internal Lucene document ID and pick the highest one. That might do the job for you. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar vivex...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Wednesday, July 8, 2009 8:34:16 PM Subject: Boosting for most recent documents Hi, I'm trying to find a way to get the most recent entry for the searched word. For ex., if I have a document with field name user. If I search for user:vivek, I want to get the document that was indexed most recently. Two ways I could think of, 1) Sort by some time stamp field - but with millions of documents this becomes a huge memory problem as we have seen OOM with sorting before 2) Boost the most recent document - I'm not sure how to do this. Basically, we want to have the most recent document score higher than any other and then we can retrieve just 10 records and sort in the application by time stamp field to get the most recent document matching the keyword. Any suggestion on how can this be done? Thanks, -vivek
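Besides sorting, a function-query boost can favor recent documents without a full field sort. A sketch of such a request, with several assumptions: a dismax-style parser that accepts a bf boost function, an indexed timestamp field named timestamp, and a Solr version that supports the ms() function (1.4+); the recip form follows the Solr relevancy FAQ's date-boosting recipe.

```python
from urllib.parse import urlencode

# Boost recent documents instead of sorting: multiply relevance by a
# reciprocal-of-age function. 3.16e-11 is roughly 1/(one year in ms),
# so a year-old document scores about half a brand-new one.
# Field name "timestamp" and the dismax parser are assumptions.
params = {
    "q": "user:vivek",
    "defType": "dismax",
    "bf": "recip(ms(NOW,timestamp),3.16e-11,1,1)",
    "rows": "10",
}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)
```

With this shape, ties in keyword relevance resolve toward newer documents, which matches the "retrieve 10 and pick the newest in the application" plan without sorting millions of docs.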
Re: Best way to integrate custom functionality
How about, for example: +cars -"no cars" -"nothing cars" In other words, the basic query is the original query, and then you loop over all negative words and append exclude-phrase clauses like in the above example. That will find documents that have the word cars in them, but any documents with the phrase "no cars" or "nothing cars" will be excluded. Just make sure your negative words are not stopwords. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrew Nguyen andrew-lists-solr-u...@na-consulting.net To: solr-user@lucene.apache.org Sent: Wednesday, July 8, 2009 7:17:09 PM Subject: Best way to integrate custom functionality Hello all, I am working on a project that involves searching through free-text fields and would like to add the ability to filter out negative expressions at a very simple level. For example, the field may contain the text "person has no cars". If the user were to search for "cars", I would like to be able to intercept the results and return only those without the word "no" in front of the search term. While this is a very simple example, it's pretty much my end goal. I've been reading up on the various hooks provided within Solr but wanted to get some guidance on the best way to proceed. Thanks! --Andrew
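The loop-over-negative-words idea can be sketched as a tiny query builder (the negative-word list is illustrative; extend it for your domain):

```python
# Build a query that requires the term but excludes each
# "<negative word> <term>" phrase, per the suggestion above.
NEGATIVE_WORDS = ["no", "nothing"]  # illustrative list

def build_query(term):
    clauses = [f"+{term}"]
    clauses += [f'-"{neg} {term}"' for neg in NEGATIVE_WORDS]
    return " ".join(clauses)

print(build_query("cars"))  # +cars -"no cars" -"nothing cars"
```

The resulting string goes into the q parameter as-is; as noted above, this only works if the negative words are not removed as stopwords at index time.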
Re: reindexed data on master not replicated to slave
On Wed, Jul 8, 2009 at 10:14 PM, solr jaysolr...@gmail.com wrote: Thanks. The patch looks good, and I now see the new index directory and it is in sync with the one on master. I'll do more testing. It is probably not important, but I am just curious why we switch index directory. I thought it would be easier to just rename index to index.*, and rename the new index directory to index. It is for consistency across OS's . Windows would not let me do a rename. 2009/7/7 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com jay, Thanks. The testcase was not enough. I have given a new patch . I guess that should solve this On Wed, Jul 8, 2009 at 3:48 AM, solr jaysolr...@gmail.com wrote: I guess in this case it doesn't matter whether the two directories tmpIndexDir and indexDir are the same or not. It looks that the index directory is switched to tmpIndexDir and then it is deleted inside finally. On Tue, Jul 7, 2009 at 12:31 PM, solr jay solr...@gmail.com wrote: In fact, I saw the directory was created and then deleted. On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote: Ok, Here is the problem. In the function, the two directories tmpIndexDir and indexDir are the same (in this case only?), and then at the end of the function, the directory tmpIndexDir is deleted, which deletes the new index directory. } finally { delTree(tmpIndexDir); } On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote: I see. So I tried it again. Now index.properties has #index properties #Tue Jul 07 12:13:49 PDT 2009 index=index.20090707121349 but there is no such directory index.20090707121349 under the data directory. Thanks, J On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. 
After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar. -- J -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- J -- - Noble Paul | Principal Engineer| AOL | http://aol.com
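The fix being tested above boils down to a guard: never delete the temporary download directory when it is in fact the live index directory. The SnapPuller itself is Java; here is the idea as a minimal Python sketch (function and variable names are illustrative, not Solr's):

```python
import os
import shutil

def cleanup_tmp_index(tmp_index_dir, index_dir):
    """Delete the temporary download dir, but never the live index dir."""
    # Guard: when both names resolve to the same directory (the case
    # discussed in this thread), deleting would wipe the new index.
    if os.path.realpath(tmp_index_dir) == os.path.realpath(index_dir):
        return
    shutil.rmtree(tmp_index_dir, ignore_errors=True)
```

This also explains why the cleanup belongs behind a comparison rather than unconditionally in a finally block, which is where the original delTree(tmpIndexDir) call sat.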