Tagging and searching on tagged indexes.
Hi, how do we tag Solr indexes and search on those indexes? There is not much information on the wiki; all I could find is this: http://wiki.apache.org/solr/UserTagDesign Has anyone tried it (using the Solr API)? One more question: can we change the schema dynamically at runtime (while the Solr instance is running)? Regards, Raakhi.
Re: Is there any other way to load the index beside using http connection?
Out of my head... but are you not supposed to active the stream-handler in SOLR ? Think it is documented... Cheers //Marcus On Mon, Jul 6, 2009 at 8:55 PM, Francis Yakin fya...@liquid.com wrote: Yes, I uploaded the CSV file that I get it from Database then I ran that cmd and I have the error. Any suggestions? Thanks Francis -Original Message- From: NitinMalik [mailto:malik.ni...@yahoo.com] Sent: Monday, July 06, 2009 11:32 AM To: solr-user@lucene.apache.org Subject: RE: Is there any other way to load the index beside using http connection? Hi Francis, I have experienced that update stream handler (for a xml file in my case) worked only for Solr running on the same machine. I also got same error when I tried to update the documents on a remote Solr instance. Regards Nitin Francis Yakin wrote: Ok, I have a CSV file(called it test.csv) from database. When I tried to upload this file to solr using this cmd, I got stream.contentType=text/plain: No such file or directory error curl http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8 -bash: stream.contentType=text/plain: No such file or directory undefined field cat What did I do wrong? Francis -Original Message- From: Norberto Meijome [mailto:numard...@gmail.com] Sent: Monday, July 06, 2009 11:01 AM To: Francis Yakin Cc: solr-user@lucene.apache.org Subject: Re: Is there any other way to load the index beside using http connection? On Mon, 6 Jul 2009 09:56:03 -0700 Francis Yakin fya...@liquid.com wrote: Norberto, Thanks, I think my questions is: why not generate your SQL output directly into your oracle server as a file What type of file is this? a file in a format that you can then import into SOLR. _ {Beto|Norberto|Numard} Meijome Gravity cannot be blamed for people falling in love. Albert Einstein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned. -- View this message in context: http://www.nabble.com/Is-there-any-other-way-to-load-the-index-beside-using-%22http%22-connection--tp24297934p24360603.html Sent from the Solr - User mailing list archive at Nabble.com. -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/
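For reference, the stream handler Marcus mentions is, if memory serves, the remote streaming switch in solrconfig.xml: requests that use stream.file or stream.url are rejected unless it is turned on. A minimal sketch of the relevant element (the upload limit shown is just the stock example value, not a recommendation):

<requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
</requestDispatcher>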
Re: Tagging and searching on tagged indexes.
On Tue, Jul 7, 2009 at 11:37 AM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, how do we tag Solr indexes and search on those indexes? There is not much information on the wiki; all I could find is this: http://wiki.apache.org/solr/UserTagDesign Has anyone tried it (using the Solr API)? That page was created for brainstorming a possible enhancement. It is not implemented yet. One more question: can we change the schema dynamically at runtime (while the Solr instance is running)? You'd need to reload the core (or restart the server) and re-index all documents for schema changes to take effect. -- Regards, Shalin Shekhar Mangar.
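If you are running multicore, the reload can be done over HTTP through the CoreAdmin handler. A rough sketch, assuming a core named core0 on the default example port (adjust both to your setup); with a single monolithic core a container restart is the usual route:

curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'

After the reload, re-post your documents so they are analyzed against the new schema.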
Can't use wildcard * on alphanumeric values?
Hi, I indexed my data and defined a defaultSearchField named text: (<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>). I copied all my other field values into that field. Now my problem: let's say I have two values indexed, 1. ABCD and 2. ABCD3456. When I do a wildcard search over those two values the following happens: query q=AB* => both values are returned (ABCD and ABCD3456), so the wildcard works; query q=ABCD3* => no results are returned (expected: ABCD3456), so the wildcard does not work. Am I doing something wrong? Is there a way to use wildcards on alphanumeric values? (Off topic: how does Google, for example, deal with a problem like this? Are they hiding the wildcards from the user?) Kind regards, Sebastian
Re: Filtering MoreLikeThis results
Using MoreLikeThisHandler you can use fq to filter your results. As far as I know bq are not allowed. Bill Au wrote: I have been trying to restrict MoreLikeThis results without any luck also. In additional to restricting the results, I am also looking to influence the scores similar to the way boost query (bq) works in the DisMaxRequestHandler. I think Solr's MoreLikeThis depends on Lucene's contrib queries MoreLikeThis, or at least it used to. Has anyone looked into enhancing Solrs' MoreLikeThis to support bq and restricting mlt results? Bill On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote: I could not find any support from http://wiki.apache.org/solr/MoreLikeThison how to restrict MLT results to certain subsets. I passed along a fq parameter and it is ignored. Since we can not incorporate the filters in the query itself which is used to retrieve the target for similarity comparison, it appears there is no way to filter MLT results. BTW. I am using Solr 1.3. Please let me know if there is way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can't use wildcard * on alphanumeric values?
On Tue, Jul 7, 2009 at 2:10 PM, gateway0 reiterwo...@yahoo.de wrote: I indexed my data and defined a defaultSearchField named text: (<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>). Let's say I have two values indexed, 1. ABCD and 2. ABCD3456. When I do a wildcard search over those two values: q=AB* returns both ABCD and ABCD3456, so the wildcard works, but q=ABCD3* returns no results (expected: ABCD3456). Am I doing something wrong? Is there a way to use wildcards on alphanumeric values? I think the problem is that the WordDelimiterFilter applied on the 'text' type splits 'ABCD3456' into 'ABCD' and '3456', etc. Also, prefix queries are not analyzed, so they don't pass through the same filters. I guess one simple solution to your problem is to add preserveOriginal="1" to the WordDelimiterFilterFactory definition inside the 'text' field type. -- Regards, Shalin Shekhar Mangar.
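For illustration, the index-time filter line in the 'text' field type would end up looking roughly like this with the flag added (the other attributes are just the stock example values, not a recommendation):

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>

With preserveOriginal="1" the whole token ABCD3456 is indexed alongside the split parts, so the unanalyzed prefix query ABCD3* has something to match. The change only takes effect after a full re-index.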
spell checker's collate values
Hi all, I'm still trying to tune my spellchecker to get the results I expect. I've created a dictionary and now I want a special behaviour from the spellchecker. When I submit the query 'Fernandox Alonso' I get what I expect: <bool name="correctlySpelled">false</bool> <str name="collation">Fernando Alonso</str> but when I try 'Fernanda Alonso' it returns <lst name="spellcheck"> <lst name="suggestions"> <bool name="correctlySpelled">true</bool> </lst> </lst> OK, Fernanda is a correct name, but I want to boost certain values (Fernando Alonso, Michael Jackson) so that they are returned as suggestions, as Google does. Any help? regards -- Lici
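One parameter that may help here, offered as a guess rather than a tested fix, is spellcheck.onlyMorePopular: it asks the spellchecker to suggest terms that are more frequent in the index even when the query term itself is found in the dictionary. Something along these lines, with the rest of the request left as whatever your handler already uses:

q=Fernanda Alonso&spellcheck=true&spellcheck.collate=true&spellcheck.count=5&spellcheck.onlyMorePopular=true

Whether 'Fernando' then outranks 'Fernanda' depends on their relative frequencies in your dictionary field.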
Re: reindexed data on master not replicated to slave
Jay , I am opening an issue SOLR-1264 https://issues.apache.org/jira/browse/SOLR-1264 I have attached a patch as well . I guess that is the fix. could you please confirm that. On Tue, Jul 7, 2009 at 12:59 AM, solr jaysolr...@gmail.com wrote: It looks that the problem is here or before that in SnapPuller.fetchLatestIndex(): terminateAndWaitFsyncService(); LOG.info(Conf files are not downloaded or are in sync); if (isSnapNeeded) { modifyIndexProps(tmpIndexDir.getName()); } else { successfulInstall = copyIndexFiles(tmpIndexDir, indexDir); } if (successfulInstall) { logReplicationTimeAndConfFiles(modifiedConfFiles); doCommit(); } Debugged into the place, and noticed that isSnapNeeded is true and therefore modifyIndexProps(tmpIndexDir.getName()); executed, but from the function name it looks that installing index actually happens in successfulInstall = copyIndexFiles(tmpIndexDir, indexDir); The function returns false, but the caller (doSnapPull) never checked the return value. Thanks, J On Mon, Jul 6, 2009 at 8:02 AM, solr jay solr...@gmail.com wrote: There is only one index directory: index/ Here is the content of index.properties #index properties #Fri Jul 03 14:17:12 PDT 2009 index=index.20090703021705 Thanks, J 2009/7/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com BTW , how many index dirs are there in the data dir ? what is there in the datadir/index.properties ? On Sat, Jul 4, 2009 at 12:15 AM, solr jaysolr...@gmail.com wrote: I tried it with the latest nightly build and got the same result. Actually that was the symptom and it made me looking at the index directory. The same log messages repeated again and again, never end. 2009/7/2 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com jay , I see updating index properties... twice this should happen rarely. in your case it should have happened only once. because you cleaned up the master only once On Fri, Jul 3, 2009 at 6:09 AM, Otis Gospodneticotis_gospodne...@yahoo.com wrote: Jay, You didn't mention which version of Solr you are using. It looks like some trunk or nightly version. Maybe you can try the latest nightly? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: solr jay solr...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, July 2, 2009 9:14:48 PM Subject: reindexed data on master not replicated to slave Hi, When index data were corrupted on master instance, I wanted to wipe out all the index data and re-index everything. I was hoping the newly created index data would be replicated to slaves, but it wasn't. Here are the steps I performed: 1. stop master 2. delete the directory 'index' 3. start master 4. disable replication on master 5. index all data from scratch 6. enable replication on master It seemed from log file that the slave instances discovered that new index are available and claimed that new index installed, and then trying to update index properties, but looking into the index directory on slaves, you will find that no index data files were updated or added, plus slaves keep trying to get new index. 
Here are some from slave's log file: Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Number of files in latest snapshot in master: 69 Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Total time taken for download : 0 secs Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Conf files are not downloaded or are in sync Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller modifyIndexProps INFO: New index installed. Updating index properties... Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Master's version: 1246488421310, generation: 9 Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave's version: 1246385166228, generation: 56 Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Number of files in latest snapshot in master: 69 Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Total time taken for download : 0 secs Jul 1, 2009 4:00:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Conf files are not downloaded or are in sync Jul 1, 2009 4:00:33 PM
Can't limit return fields in custom request handler
Hi. I'm writing my own faceted request handler, but I have a problem: when I call http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id,itemTitle I'm getting all fields back instead of only id and itemTitle. Also, I get no results when I pass a non-null filter parameter to getDocListAndSet(...). public class MyCustomFacetRequestHandler extends StandardRequestHandler { public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception { try { SolrParams solrParams = req.getParams(); Query q = QueryParsing.parseQuery(solrParams.get("q"), req.getSchema()); DocListAndSet results = req.getSearcher().getDocListAndSet(q, (Query) null, (Sort) null, solrParams.getInt("start"), solrParams.getInt("limit")); ... Regards. -- Osman İZBAT
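If I remember the 1.3 plumbing correctly, fl is not applied automatically inside a custom handler: the response writer only trims fields when the handler registers the requested field list on the response. A hedged sketch of what might be missing at the end of handleRequestBody (treat the exact SolrPluginUtils signature as an assumption and check it against your Solr version):

// tell the response writer which fields were requested via fl
org.apache.solr.util.SolrPluginUtils.setReturnFields(req, rsp);
// then add the results the same way the standard handler does
rsp.add("response", results.docList);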
Re: Is there any other way to load the index beside using http connection?
Look at the error - it's bash (your command line shell) complaining. The '&' terminates one command and puts it in the background. Surrounding the URL with quotes will get you one step closer: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8' -Yonik http://www.lucidimagination.com On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakin fya...@liquid.com wrote: Ok, I have a CSV file (call it test.csv) from the database. When I tried to upload this file to Solr using this command, I got a stream.contentType=text/plain: No such file or directory error: curl http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&stream.contentType=text/plain;charset=utf-8 -bash: stream.contentType=text/plain: No such file or directory undefined field cat What did I do wrong? Francis -Original Message- From: Norberto Meijome [mailto:numard...@gmail.com] Sent: Monday, July 06, 2009 11:01 AM To: Francis Yakin Cc: solr-user@lucene.apache.org Subject: Re: Is there any other way to load the index beside using http connection? On Mon, 6 Jul 2009 09:56:03 -0700 Francis Yakin fya...@liquid.com wrote: Norberto, Thanks, I think my question is: why not generate your SQL output directly into your oracle server as a file What type of file is this? a file in a format that you can then import into SOLR. _ {Beto|Norberto|Numard} Meijome Gravity cannot be blamed for people falling in love. Albert Einstein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Loading Data into Solr without HTTP
On Tue, Jul 7, 2009 at 8:41 AM, Anand Kumar Prabhakaran anand2...@gmail.com wrote: Is there any way so that we can read the data from the CSV file and load it into the Solr database without using /update/csv? That *is* the right way to load a CSV file into Solr. How many records are in the CSV file, and how much heap are you giving the JVM? Try a small CSV file first to make sure that it's being parsed correctly... for example, do a head -1000 bigfile.csv > smallfile.csv Now upload that and inspect the documents by querying Solr to ensure that everything imported as expected. -Yonik http://www.lucidimagination.com
Re: Loading Data into Solr without HTTP
Thank you for the Reply Yonik, I have already tried with smaller CSV files, currently we are trying to load a CSV file of 400 MB but this is taking too much time(more than half an hour). I want to know is there any method to do it much faster, we have overcome the OutOfMemoryException by increasing heap space. Please suggest. Yonik Seeley-2 wrote: On Tue, Jul 7, 2009 at 8:41 AM, Anand Kumar Prabhakaranand2...@gmail.com wrote: Is there any way so that we can read the data from the CSV file and load it into the Solr database without using /update/csv That *is* the right way to load a CSV file into Solr. How many records are in the CSV file, and how much heap are you giving the JVM? Try a small CSV file first to make sure that it's being parsed correctly... for example, do a head -1000 bigfile.csv smallfile.csv Now upload that and inspect the documents by querying Solr to ensure that everything imported as expected. -Yonik http://www.lucidimagination.com -- View this message in context: http://www.nabble.com/Loading-Data-into-Solr-without-HTTP-tp24372564p24373116.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can't use wildcard * on alphanumeric values?
Thank you, that was it. Why is the preserveOriginal=1 option nowhere documented? Shalin Shekhar Mangar wrote: On Tue, Jul 7, 2009 at 2:10 PM, gateway0 reiterwo...@yahoo.de wrote: I indexed my data and defined a defaultsearchfield named text: (field name=text type=text indexed=true stored=false multiValued=true/). Lets say I have 2 values indexed 1.value ABCD 2.value ABCD3456 Now when I do a wildcard search over that two values the following happens: - query:q=AB* = All two values are returned ABCD and ABCD3456 = wildcard is functioning! - query:q=ABCD3* = No results are returned! (expected: ABCD3456) = wildcard does not function! Am I doing something wrong? Is there a way to use wildcards on alphanumeric values? I think the problem is that the WordDelimiterFilter applied on 'text' type, splits 'ABCD3456' into 'ABCD' and '3456' etc. Also, prefix queries are not analyzed so that don't pass through the same filters. I guess one simple solution to your problem is to add preserveOriginal=1 to the WordDelimiterFilterFactory definition inside the 'text' field type. -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/Can%C2%B4t-use-wildcard-%22*%22-on-alphanumeric-values--tp24369209p24373135.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Loading Data into Solr without HTTP
On Tue, Jul 7, 2009 at 9:14 AM, Anand Kumar Prabhakaranand2...@gmail.com wrote: I want to know is there any method to do it much faster, we have overcome the OutOfMemoryException by increasing heap space. Optimize your schema - eliminate all unnecessary copyFields and default values. The current example schema is not good for performance benchmarking. -Yonik http://www.lucidimagination.com
Re: Loading Data into Solr without HTTP
Also make sure you don't have any autocommit rules enabled in solrconfig.xml How many documents are in the 400MB CSV file, and how long does it take to index now? -Yonik http://www.lucidimagination.com On Tue, Jul 7, 2009 at 10:03 AM, Anand Kumar Prabhakaranand2...@gmail.com wrote: Hi Yonik, Currently our Schema has very few fields and we don't have any copy fields also. Please find the below Schema.xml we are using: ?xml version=1.0 encoding=UTF-8 ? schema name=cmps version=1.1 !-- attribute name is the name of this schema and is only used for display purposes. Applications should change this to reflect the nature of the search collection. version=1.1 is Solr's version number for the schema syntax and semantics. It should not normally be changed by applications. 1.0: multiValued attribute did not exist, all fields are multiValued by nature 1.1: multiValued attribute introduced, false by default -- types fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ fieldType name=boolean class=solr.BoolField sortMissingLast=true omitNorms=true/ fieldType name=integer class=solr.IntField omitNorms=true/ fieldType name=long class=solr.LongField omitNorms=true/ fieldType name=float class=solr.FloatField omitNorms=true/ fieldType name=double class=solr.DoubleField omitNorms=true/ fieldType name=sint class=solr.SortableIntField sortMissingLast=true omitNorms=true/ fieldType name=slong class=solr.SortableLongField sortMissingLast=true omitNorms=true/ fieldType name=sfloat class=solr.SortableFloatField sortMissingLast=true omitNorms=true/ fieldType name=sdouble class=solr.SortableDoubleField sortMissingLast=true omitNorms=true/ fieldType name=date class=solr.DateField sortMissingLast=true omitNorms=true/ fieldType name=random class=solr.RandomSortField indexed=true / fieldType name=text_ws class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=textTight class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=false/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter 
class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=textSpell class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=alphaNumericKeyword class=solr.TextField sortMissingLast=true omitNorms=true analyzer tokenizer class=solr.KeywordTokenizerFactory/ /analyzer /fieldType fieldtype name=ignored stored=false indexed=false class=solr.StrField / fieldType name=phNo class=solr.TextField positionIncrementGap=100 sortMissingLast=true omitNorms=true analyzer tokenizer class=solr.KeywordTokenizerFactory/ /analyzer /fieldType fieldType name=textStA
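Regarding the autocommit note at the top of this message: the block to look for sits in the updateHandler section of solrconfig.xml, and for a pure bulk load it is usually commented out so a single commit can be issued at the end. A sketch with the stock example values:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- disable this while bulk loading -->
  <!--
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>1000</maxTime>
  </autoCommit>
  -->
</updateHandler>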
Re: Loading Data into Solr without HTTP
Hi Yonik, Currently our Schema has very few fields and we don't have any copy fields also. Please find the below Schema.xml we are using: ?xml version=1.0 encoding=UTF-8 ? schema name=cmps version=1.1 !-- attribute name is the name of this schema and is only used for display purposes. Applications should change this to reflect the nature of the search collection. version=1.1 is Solr's version number for the schema syntax and semantics. It should not normally be changed by applications. 1.0: multiValued attribute did not exist, all fields are multiValued by nature 1.1: multiValued attribute introduced, false by default -- types fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ fieldType name=boolean class=solr.BoolField sortMissingLast=true omitNorms=true/ fieldType name=integer class=solr.IntField omitNorms=true/ fieldType name=long class=solr.LongField omitNorms=true/ fieldType name=float class=solr.FloatField omitNorms=true/ fieldType name=double class=solr.DoubleField omitNorms=true/ fieldType name=sint class=solr.SortableIntField sortMissingLast=true omitNorms=true/ fieldType name=slong class=solr.SortableLongField sortMissingLast=true omitNorms=true/ fieldType name=sfloat class=solr.SortableFloatField sortMissingLast=true omitNorms=true/ fieldType name=sdouble class=solr.SortableDoubleField sortMissingLast=true omitNorms=true/ fieldType name=date class=solr.DateField sortMissingLast=true omitNorms=true/ fieldType name=random class=solr.RandomSortField indexed=true / fieldType name=text_ws class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=textTight class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=false/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=textSpell class=solr.TextField positionIncrementGap=100 analyzer tokenizer 
class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=alphaNumericKeyword class=solr.TextField sortMissingLast=true omitNorms=true analyzer tokenizer class=solr.KeywordTokenizerFactory/ /analyzer /fieldType fieldtype name=ignored stored=false indexed=false class=solr.StrField / fieldType name=phNo class=solr.TextField positionIncrementGap=100 sortMissingLast=true omitNorms=true analyzer tokenizer class=solr.KeywordTokenizerFactory/ /analyzer /fieldType fieldType name=textStA class=solr.TextField positionIncrementGap=100 sortMissingLast=true omitNorms=true analyzer tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StandardFilterFactory/
Re: Indexing XML
Saeli, Solr expects a certain XML structure when adding documents. You'll need to come up with a mapping, that translates the original structure to one that solr understands. You can then search solr and get those solr documents back. If you want to keep the original XML, you can store it in a field within the solr document. original data - mapping - solr XML document (with a field for the original data) Does that make sense? Can you describe what it is you want to do with results of a search? Matt On Tue, Jul 7, 2009 at 10:25 AM, Saeli Mathieu saeli.math...@gmail.comwrote: Hello. I'm a new user of Solr, I already used Lucene to index files and search. But my programme was too slow, it's why I was looking for another solution, and I thought I found it. I said I thought because I don't know if it's possible to use solar with this kind of XML files. lom xsi:schemaLocation=http://ltsc.ieee.org/xsd/lomv1.0 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd; general identifier catalogSTRING HERE/catalog entry STRING HERE /entry /identifier title string language=fr STRING HERE /string /title languagefr/language description string language=fr STRING HERE /string /description /general lifeCycle status sourceSTRING HERE/source valueSTRING HERE/value /status contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity /contribute /lifeCycle metaMetadata identifier catalogSTRING HERE/catalog entrySTRING HERE/entry /identifier contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity entitySTRING HERE/entity entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute metadataSchemaSTRING HERE/metadataSchema languageSTRING HERE/language /metaMetadata technical locationSTRING HERE /location /technical educational intendedEndUserRole sourceSTRING HERE/source valueSTRING HERE/value /intendedEndUserRole context sourceSTRING HERE/source valueSTRING HERE/value /context typicalAgeRange string language=frSTRING HERE/string /typicalAgeRange description string language=frSTRING HERE/string /description description string language=frSTRING HERE/string /description languageSTRING HERE/language /educational annotation entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /annotation classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE /string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification /lom I don't know how I can use this kind of file with Solr because the XML example are this one. 
add doc field name=idSOLR1000/field field name=nameSolr, the Enterprise Search Server/field field name=manuApache Software Foundation/field field name=catsoftware/field field name=catsearch/field field name=featuresAdvanced Full-Text Search Capabilities using Lucene/field field name=featuresOptimized for High Volume Web Traffic/field field name=featuresStandards Based Open Interfaces - XML and HTTP/field field name=featuresComprehensive HTML Administration Interfaces/field field name=featuresScalability - Efficient Replication to other Solr Search Servers/field field name=featuresFlexible and Adaptable with XML configuration and Schema/field field name=featuresGood unicode support: h#xE9;llo (hello with an accent over the e)/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field /doc /add I understood Solr need this kind of architecture, by Architecture I mean field + name=keywordValue/field or as you can see I can't use this kind of architecture because I'm not allow to change my XML files. I'm looking forward to read you. Mathieu Saeli -- Saeli Mathieu.
Re: Filtering MoreLikeThis results
I think fq only works on the main response, not the mlt matches. I found a couple of releated jira: http://issues.apache.org/jira/browse/SOLR-295 http://issues.apache.org/jira/browse/SOLR-281 If I am reading them correctly, I should be able to use DIsMax and MoreLikeThis together. I will give that a try and report back. Bill On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese marc.sturl...@gmail.comwrote: Using MoreLikeThisHandler you can use fq to filter your results. As far as I know bq are not allowed. Bill Au wrote: I have been trying to restrict MoreLikeThis results without any luck also. In additional to restricting the results, I am also looking to influence the scores similar to the way boost query (bq) works in the DisMaxRequestHandler. I think Solr's MoreLikeThis depends on Lucene's contrib queries MoreLikeThis, or at least it used to. Has anyone looked into enhancing Solrs' MoreLikeThis to support bq and restricting mlt results? Bill On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote: I could not find any support from http://wiki.apache.org/solr/MoreLikeThison how to restrict MLT results to certain subsets. I passed along a fq parameter and it is ignored. Since we can not incorporate the filters in the query itself which is used to retrieve the target for similarity comparison, it appears there is no way to filter MLT results. BTW. I am using Solr 1.3. Please let me know if there is way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing XML
Hello. I'm a new user of Solr, I already used Lucene to index files and search. But my programme was too slow, it's why I was looking for another solution, and I thought I found it. I said I thought because I don't know if it's possible to use solar with this kind of XML files. lom xsi:schemaLocation=http://ltsc.ieee.org/xsd/lomv1.0 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd; general identifier catalogSTRING HERE/catalog entry STRING HERE /entry /identifier title string language=fr STRING HERE /string /title languagefr/language description string language=fr STRING HERE /string /description /general lifeCycle status sourceSTRING HERE/source valueSTRING HERE/value /status contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity /contribute /lifeCycle metaMetadata identifier catalogSTRING HERE/catalog entrySTRING HERE/entry /identifier contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity entitySTRING HERE/entity entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute metadataSchemaSTRING HERE/metadataSchema languageSTRING HERE/language /metaMetadata technical locationSTRING HERE /location /technical educational intendedEndUserRole sourceSTRING HERE/source valueSTRING HERE/value /intendedEndUserRole context sourceSTRING HERE/source valueSTRING HERE/value /context typicalAgeRange string language=frSTRING HERE/string /typicalAgeRange description string language=frSTRING HERE/string /description description string language=frSTRING HERE/string /description languageSTRING HERE/language /educational annotation entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /annotation classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE /string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification /lom I don't know how I can use this kind of file with Solr because the XML example are this one. 
add doc field name=idSOLR1000/field field name=nameSolr, the Enterprise Search Server/field field name=manuApache Software Foundation/field field name=catsoftware/field field name=catsearch/field field name=featuresAdvanced Full-Text Search Capabilities using Lucene/field field name=featuresOptimized for High Volume Web Traffic/field field name=featuresStandards Based Open Interfaces - XML and HTTP/field field name=featuresComprehensive HTML Administration Interfaces/field field name=featuresScalability - Efficient Replication to other Solr Search Servers/field field name=featuresFlexible and Adaptable with XML configuration and Schema/field field name=featuresGood unicode support: h#xE9;llo (hello with an accent over the e)/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field /doc /add I understood Solr need this kind of architecture, by Architecture I mean field + name=keywordValue/field or as you can see I can't use this kind of architecture because I'm not allow to change my XML files. I'm looking forward to read you. Mathieu Saeli -- Saeli Mathieu.
Question regarding ExtractingRequestHandler
Hello, I've recently started using this handler to index MS Word and PDF files. When I set ext.extract.only=true, I get back all the metadata that is associated with that file. If I want to index, I need to set ext.extract.only=false. If I want to index all that metadata along with the contents, what inputs do I need to pass to the http request? Do I have to specifically define all the fields in the schema or can Solr dynamically generate those fields? Thanks. -- View this message in context: http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SynonymFilterFactory usage
anyone? PS: my apologies if you guys think this is spamming, but I really need some help here. thanks! mani On Sun, Jul 5, 2009 at 12:49 PM, Mani Kumar manikumarchau...@gmail.com wrote: hi all, I am a bit confused about how to use the synonym filter configs. I am using Solr 1.4. The default config is, for the query analyzer: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> and for the index analyzer it is commented out. Looking more closely at the documentation on http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46 : Keep in mind that while the SynonymFilter will happily work with synonyms containing multiple words (ie: sea biscuit, sea biscit, seabiscuit), the recommended approach for dealing with synonyms like this is to expand the synonym when indexing. This is because there are two potential issues that can arise at query time. Considering this recommendation, I think the following is the best option for the synonym filter: for the query analyzer: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/> and for the index analyzer: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> Am I right? What do you guys suggest? thanks! mani kumar
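For what it's worth, the index-time expansion the wiki recommends would look roughly like the sketch below; the surrounding tokenizer and other filters are whatever your field type already uses, and whether you keep a non-expanding SynonymFilterFactory in the query analyzer (as you suggest) or drop it there entirely is a judgment call:

index analyzer:
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
query analyzer:
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>   (or omit the filter here)

Either way, the index-time change only takes effect after re-indexing.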
Browse indexed terms in a field
Hello, Here is what I would like to achieve: in an indexed document there is a fulltext indexed field, and I'd like to browse the terms in this field, i.e. get all the terms that match the beginning of a given word, for example. I can get all the field's facets for this document, but that is a lot of terms to process; is there a way to constrain the returned facets? Thank you for your insights. Kind regards, Pierre.
Re: Filtering MoreLikeThis results
At least in trunk, if you request for: http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price[100 TO 200] It will filter the MoreLikeThis results Bill Au wrote: I think fq only works on the main response, not the mlt matches. I found a couple of releated jira: http://issues.apache.org/jira/browse/SOLR-295 http://issues.apache.org/jira/browse/SOLR-281 If I am reading them correctly, I should be able to use DIsMax and MoreLikeThis together. I will give that a try and report back. Bill On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese marc.sturl...@gmail.comwrote: Using MoreLikeThisHandler you can use fq to filter your results. As far as I know bq are not allowed. Bill Au wrote: I have been trying to restrict MoreLikeThis results without any luck also. In additional to restricting the results, I am also looking to influence the scores similar to the way boost query (bq) works in the DisMaxRequestHandler. I think Solr's MoreLikeThis depends on Lucene's contrib queries MoreLikeThis, or at least it used to. Has anyone looked into enhancing Solrs' MoreLikeThis to support bq and restricting mlt results? Bill On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote: I could not find any support from http://wiki.apache.org/solr/MoreLikeThison how to restrict MLT results to certain subsets. I passed along a fq parameter and it is ignored. Since we can not incorporate the filters in the query itself which is used to retrieve the target for similarity comparison, it appears there is no way to filter MLT results. BTW. I am using Solr 1.3. Please let me know if there is way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Browse indexed terms in a field
You can use facet.prefix to match the beginning of a given word: http://wiki.apache.org/solr/SimpleFacetParameters#head-579914ef3a14d775a5ac64d2c17a53f3364e3cf6 Bill On Tue, Jul 7, 2009 at 11:02 AM, Pierre-Yves LANDRON pland...@hotmail.com wrote: Hello, Here is what I would like to achieve: in an indexed document there is a fulltext indexed field, and I'd like to browse the terms in this field, i.e. get all the terms that match the beginning of a given word, for example. I can get all the field's facets for this document, but that is a lot of terms to process; is there a way to constrain the returned facets? Thank you for your insights. Kind regards, Pierre.
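A quick sketch of such a request, assuming the fulltext field is called text and you want terms starting with abc for a single document (the field name and id are placeholders):

http://localhost:8983/solr/select?q=id:YOURDOC&rows=0&facet=true&facet.field=text&facet.prefix=abc&facet.limit=20

Note that facet.prefix is matched against the indexed terms, so if the field is lowercased at index time the prefix must be lowercase too.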
Re: Filtering MoreLikeThis results
I have been using the StandardRequestHandler (ie /solr/select). fq does work with the MoreLikeThisHandler. I will switch to use that. Thanks. Bill On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese marc.sturl...@gmail.comwrote: At least in trunk, if you request for: http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price[100http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price%5B100TO 200] It will filter the MoreLikeThis results Bill Au wrote: I think fq only works on the main response, not the mlt matches. I found a couple of releated jira: http://issues.apache.org/jira/browse/SOLR-295 http://issues.apache.org/jira/browse/SOLR-281 If I am reading them correctly, I should be able to use DIsMax and MoreLikeThis together. I will give that a try and report back. Bill On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese marc.sturl...@gmail.comwrote: Using MoreLikeThisHandler you can use fq to filter your results. As far as I know bq are not allowed. Bill Au wrote: I have been trying to restrict MoreLikeThis results without any luck also. In additional to restricting the results, I am also looking to influence the scores similar to the way boost query (bq) works in the DisMaxRequestHandler. I think Solr's MoreLikeThis depends on Lucene's contrib queries MoreLikeThis, or at least it used to. Has anyone looked into enhancing Solrs' MoreLikeThis to support bq and restricting mlt results? Bill On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote: I could not find any support from http://wiki.apache.org/solr/MoreLikeThison how to restrict MLT results to certain subsets. I passed along a fq parameter and it is ignored. Since we can not incorporate the filters in the query itself which is used to retrieve the target for similarity comparison, it appears there is no way to filter MLT results. BTW. I am using Solr 1.3. Please let me know if there is way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr set up
Hi, I am interested in creating a test environment where I can make use of Solr/Lucene. My objective is to be able to test various features of Solr (replication, performance, indexing, searching and so on). I would like someone to give me a start on the above. I am well versed with Lucene/Solr basics. Gaurav
Re: solr health check
solr jay wrote: Hi, I am looking at this piece of configuration in solrconfig.xml admin defaultQuerysolr/defaultQuery gettableFiles solrconfig.xml schema.xml /gettableFiles pingQueryq=solramp;version=2.0amp;start=0amp;rows=0/pingQuery !-- configure a healthcheck file for servers behind a loadbalancer -- healthcheck type=fileserver-enabled/healthcheck /admin I've never used this feature before, but reading source code... It wasn't clear to me what 'server-enabled' means here. Is it a file name? Yes, it is file name. If it is file name, where the file should be? The file name should be absolute path or relative path from solr work directory (if you start solr from example directory, make server-enabled file in example directory). I added healthcheck type=fileserver-enabled/healthcheckand admin/ping stopped working, which is good, but I couldn't make it work again, and admin UI generate an exception. Anyone used this feature before? I don't understand why you are getting the follwoing error... You should get HTTP ERROR: 503 Service disabled instead... Koji Thanks, J HTTP ERROR: 500 PWC6033: Unable to compile class for JSP PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp PWC6199: Generated servlet error: Type mismatch: cannot convert from Logger to Logger PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp PWC6199: Generated servlet error: The method log(Level, String) is undefined for the type Logger org.apache.jasper.JasperException: PWC6033: Unable to compile class for JSP PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp PWC6199: Generated servlet error: Type mismatch: cannot convert from Logger to Logger PWC6197: An error occurred at line: 28 in the jsp file: /admin/action.jsp PWC6199: Generated servlet error: The method log(Level, String) is undefined for the type Logger at org.apache.jasper.compiler.DefaultErrorHandler.javacError(DefaultErrorHandler.java:94) at org.apache.jasper.compiler.ErrorDispatcher.javacError(ErrorDispatcher.java:267) at org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:332) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:389) at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:579) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358) at javax.servlet.http.HttpServlet.service(HttpServlet.java:853) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:295) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:503) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:827) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:511) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:210) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:379) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) RequestURI=/solr/admin/action.jsp
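To make the file-based healthcheck concrete, a sketch assuming Solr is started from the stock example directory so that a relative path resolves there: create the file to report healthy, remove it to take the node out of rotation.

cd example
touch server-enabled    # admin/ping answers normally
rm server-enabled       # admin/ping returns 503 Service disabled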
posting binary file and metadata in two separate documents
Hi. I am currently using Solr Cell to extract content from binary files, and I am passing along some additional metadata with ext.literal params. Sample below: curl "http://localhost:8983/solr/update/extract?ext.literal.id=2&ext.literal.some_code1=code1&ext.literal.some_code2=code2&ext.idx.attr=true&ext.def.fl=text" -F myfi...@myfile.pdf Where I have large numbers of ext.literal params this becomes a bit of a chore, and it would be the same in an HTML form with many params. Can I pass both files to '/update/extract' as documents (files) linked together? Or are there any other options like this? Perhaps something I can do with SolrJ. Thanks in advance for your help, regards, Ross.
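On the SolrJ side, something along these lines lets you build the literal parameters in code instead of one long URL. This is only a sketch against the 1.4-era API: the class names are real, but double-check addFile's signature and whether your build expects the ext.* or the plain literal.* parameter names.

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPost {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(new File("myfile.pdf"));              // the binary document
    up.setParam("ext.literal.id", "2");              // metadata passed as literals
    up.setParam("ext.literal.some_code1", "code1");
    up.setParam("ext.literal.some_code2", "code2");
    up.setParam("ext.idx.attr", "true");
    up.setParam("ext.def.fl", "text");
    server.request(up);
    server.commit();
  }
}

As far as I know there is no way to post the binary file and the metadata as two linked documents; building the literals from a map in the client is the usual workaround.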
KStem download
Hi, I want to try KStem. I'm following the instructions on this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem ... but the download link doesn't work. Is anyone know the new location to download KStem? -- View this message in context: http://www.nabble.com/KStem-download-tp24375856p24375856.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query on the updation of synonym and stopword file.
Sagar, I am facing a problem here that even after the core reload and re-indexing the documents the new updated synonym or stop words are not loaded. Seems so the filters are not aware that these files are updated so the solution to me is to restart the whole container in which I have embedded the Solr server; it is not feasible in production. I am not a multicore user, but I can see the synonyms.txt updated after reloading the core (I verified it via analysis.jsp, not re-indexing), wothout restarting solr server. I'm using 1.4. What version are you using? Koji Sagar Khetkade wrote: Hello All, I was figuring out the issue with the synonym.txt and stopword.txt files being updated on regular interval. Here in my case I am updating the synonym.txt and stopword.txt files as the synonym and stop word dictionary is update. I am facing a problem here that even after the core reload and re-indexing the documents the new updated synonym or stop words are not loaded. Seems so the filters are not aware that these files are updated so the solution to me is to restart the whole container in which I have embedded the Solr server; it is not feasible in production. I came across the discussion with subject “ synonyms.txt file updated frequently” in which Grant had a view to write a new logic in SynonymFilterFactory which would take care of this issue. Is there any possible solution to this or is this the solution. Thanks in advance! Regards, Sagar Khetkade _ Missed any of the IPL matches ? Catch a recap of all the action on MSN Videos http://msnvideos.in/iplt20/msnvideoplayer.aspx
Re: Multiple values for custom fields provided in SOLR query
Hi Otis, Thanks for replying to my query. My query is, if multiple values are provided for a custom field then how can it be represented in a SOLR query. So if my field is fileID and its values are 111, 222 and 333 and my search string is ‘product’ then how can this be represented in a SOLR query? I want to perform the search on basis of fileIDs *and* search string provided. If i provide the query in the format, q=fileID:111+fileID:222+fileID:333+product, then how will it actually search? Can you please provide me the correct format of the query? Regards Suryasnat Das On Mon, Jul 6, 2009 at 10:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I actually don't fully understand your question. q=+fileID:111+fileID:222+fileID:333+apple looks like a valid query to me. (not sure what that space encoded as + is, though) Also not sure what you mean by: Basically the requirement is , if fileIDs are provided as search parameter then search should happen on the basis of fileID. Do you mean apple should be ignored if a term (field name:field value) is provided? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Suryasnat Das suryaatw...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, July 6, 2009 11:31:10 AM Subject: Multiple values for custom fields provided in SOLR query Hi, I have a requirement in which i need to have multiple values in my custom fields while forming the search query to SOLR. For example, fileID is my custom field. I have defined the fileID in schema.xml as name=fileID type=string indexed=true stored=true required=true multiValued=true/. Now fileID can have multiple values like 111,222,333 etc. So will my query be of the form, q=+fileID:111+fileID:222+fileID:333+apple where apple is my search query string. I tried with the above query but it did not work. SOLR gave invalid query error. Basically the requirement is , if fileIDs are provided as search parameter then search should happen on the basis of fileID. Is my approach correct or i need to do something else? Please, if immediate help is provided then that would be great. Regards Suryasnat Das Infosys.
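For what it's worth, one way to express that kind of query, sketched under the assumption of standard Lucene boolean syntax and that fileID really is the field name in your schema, is to keep the keyword in q and push the id restriction into a filter query:

q=product&fq=fileID:(111 OR 222 OR 333)

or as a single query string: q=+product +(fileID:111 fileID:222 fileID:333). Bear in mind that a literal '+' must be URL-encoded as %2B when the query is sent over HTTP, since an unencoded '+' decodes to a space.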
Re: Filtering MoreLikeThis results
I am not sure about the parameters for MLT the requestHandler plugin. Can one of you share the solrconfig.xml entry for MLT? Thanks in advance. -Yao Bill Au wrote: I have been using the StandardRequestHandler (ie /solr/select). fq does work with the MoreLikeThisHandler. I will switch to use that. Thanks. Bill On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese marc.sturl...@gmail.comwrote: At least in trunk, if you request for: http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price[100http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price%5B100TO 200] It will filter the MoreLikeThis results Bill Au wrote: I think fq only works on the main response, not the mlt matches. I found a couple of releated jira: http://issues.apache.org/jira/browse/SOLR-295 http://issues.apache.org/jira/browse/SOLR-281 If I am reading them correctly, I should be able to use DIsMax and MoreLikeThis together. I will give that a try and report back. Bill On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese marc.sturl...@gmail.comwrote: Using MoreLikeThisHandler you can use fq to filter your results. As far as I know bq are not allowed. Bill Au wrote: I have been trying to restrict MoreLikeThis results without any luck also. In additional to restricting the results, I am also looking to influence the scores similar to the way boost query (bq) works in the DisMaxRequestHandler. I think Solr's MoreLikeThis depends on Lucene's contrib queries MoreLikeThis, or at least it used to. Has anyone looked into enhancing Solrs' MoreLikeThis to support bq and restricting mlt results? Bill On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote: I could not find any support from http://wiki.apache.org/solr/MoreLikeThison how to restrict MLT results to certain subsets. I passed along a fq parameter and it is ignored. Since we can not incorporate the filters in the query itself which is used to retrieve the target for similarity comparison, it appears there is no way to filter MLT results. BTW. I am using Solr 1.3. Please let me know if there is way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24377360.html Sent from the Solr - User mailing list archive at Nabble.com.
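Re the solrconfig.xml entry: a minimal sketch of the MoreLikeThisHandler registration, with placeholder field names; mlt.fl should point at fields that are stored or have term vectors:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,text</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>

Requests then go to /solr/mlt?q=id:7468365&fq=... as in Marc's example above.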
Re: Indexing XML
Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to index different types of datasources including relational databases and XML files. In particular have a look at the XpathEntityProcessor ( http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d98ef0120671db5762e137e63f9d2) which allows you to use xpath syntax to map xml data to index fields. -Jay On Tue, Jul 7, 2009 at 7:25 AM, Saeli Mathieu saeli.math...@gmail.comwrote: Hello. I'm a new user of Solr, I already used Lucene to index files and search. But my programme was too slow, it's why I was looking for another solution, and I thought I found it. I said I thought because I don't know if it's possible to use solar with this kind of XML files. lom xsi:schemaLocation=http://ltsc.ieee.org/xsd/lomv1.0 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd; general identifier catalogSTRING HERE/catalog entry STRING HERE /entry /identifier title string language=fr STRING HERE /string /title languagefr/language description string language=fr STRING HERE /string /description /general lifeCycle status sourceSTRING HERE/source valueSTRING HERE/value /status contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity /contribute /lifeCycle metaMetadata identifier catalogSTRING HERE/catalog entrySTRING HERE/entry /identifier contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity entitySTRING HERE/entity entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute metadataSchemaSTRING HERE/metadataSchema languageSTRING HERE/language /metaMetadata technical locationSTRING HERE /location /technical educational intendedEndUserRole sourceSTRING HERE/source valueSTRING HERE/value /intendedEndUserRole context sourceSTRING HERE/source valueSTRING HERE/value /context typicalAgeRange string language=frSTRING HERE/string /typicalAgeRange description string language=frSTRING HERE/string /description description string language=frSTRING HERE/string /description languageSTRING HERE/language /educational annotation entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /annotation classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE /string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification /lom I don't know how I can use this kind of file with Solr because the XML example are this one. 
add doc field name=idSOLR1000/field field name=nameSolr, the Enterprise Search Server/field field name=manuApache Software Foundation/field field name=catsoftware/field field name=catsearch/field field name=featuresAdvanced Full-Text Search Capabilities using Lucene/field field name=featuresOptimized for High Volume Web Traffic/field field name=featuresStandards Based Open Interfaces - XML and HTTP/field field name=featuresComprehensive HTML Administration Interfaces/field field name=featuresScalability - Efficient Replication to other Solr Search Servers/field field name=featuresFlexible and Adaptable with XML configuration and Schema/field field name=featuresGood unicode support: h#xE9;llo (hello with an accent over the e)/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field /doc /add I understood Solr need this kind of architecture, by Architecture I mean field + name=keywordValue/field or as you can see I can't use this kind of architecture because I'm not allow to change my XML files. I'm looking forward to read you. Mathieu Saeli -- Saeli Mathieu.
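Following up on the DataImportHandler suggestion at the top of this thread: a rough data-config.xml sketch for a LOM file like the one above (the xpaths, Solr field names and the file path here are illustrative guesses, not a tested configuration) might be:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="lom"
              processor="XPathEntityProcessor"
              url="/path/to/lom-record.xml"
              forEach="/lom">
        <field column="title"       xpath="/lom/general/title/string"/>
        <field column="description" xpath="/lom/general/description/string"/>
        <field column="language"    xpath="/lom/general/language"/>
        <field column="location"    xpath="/lom/technical/location"/>
      </entity>
    </document>
  </dataConfig>

Each <field> maps an xpath in the source document to a field declared in schema.xml, so the original files never have to be rewritten into Solr's add/doc format.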
Solr Set Up
Hi, I was interested in creating a test environment where I can make use of Solr/Lucene. My objective is to be able to test various features of Solr (replication, performance, indexing, searching and so on). I would appreciate someone giving me a start on the above. I am well versed with Lucene/Solr basics. Gaurav
How to get various records in the result set
Hi buddy, I am working on a music search project and I have a special requirement about the ranking when querying the artist name. Ex: When I query the artist ne yo, there are 500 results and maybe 100 song names are repeated. So the ideal thing is to let users get more distinct songs on one page, and the results that have lyrics must be shown at the front. My current solr query is: ?q=ne+yo&qf=artist&defType=dismax&sort=lyric%20desc,links%20desc&start=0&rows=20&indent=on but then the results show the same song names together because those records always get the same score. How can I implement that effect? Thanks.
RE: Is there any other way to load the index beside using http connection?
I did try: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' It doesn't work Francis -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Tuesday, July 07, 2009 4:59 AM To: solr-user@lucene.apache.org Cc: Norberto Meijome Subject: Re: Is there any other way to load the index beside using http connection? Look at the error - it's bash (your command line shell) complaining. The '' terminates one command and puts it in the background. Surrounding the command with quotes will get you one step closer: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' -Yonik http://www.lucidimagination.com On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakinfya...@liquid.com wrote: Ok, I have a CSV file(called it test.csv) from database. When I tried to upload this file to solr using this cmd, I got stream.contentType=text/plain: No such file or directory error curl http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8 -bash: stream.contentType=text/plain: No such file or directory undefined field cat What did I do wrong? Francis -Original Message- From: Norberto Meijome [mailto:numard...@gmail.com] Sent: Monday, July 06, 2009 11:01 AM To: Francis Yakin Cc: solr-user@lucene.apache.org Subject: Re: Is there any other way to load the index beside using http connection? On Mon, 6 Jul 2009 09:56:03 -0700 Francis Yakin fya...@liquid.com wrote: Norberto, Thanks, I think my questions is: why not generate your SQL output directly into your oracle server as a file What type of file is this? a file in a format that you can then import into SOLR. _ {Beto|Norberto|Numard} Meijome Gravity cannot be blamed for people falling in love. Albert Einstein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
RE: Is there any other way to load the index beside using http connection?
With curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' No errors now. But , how can I verify if the update happening? Thanks Francis -Original Message- From: Francis Yakin [mailto:fya...@liquid.com] Sent: Tuesday, July 07, 2009 10:37 AM To: 'solr-user@lucene.apache.org'; 'yo...@lucidimagination.com' Cc: Norberto Meijome Subject: RE: Is there any other way to load the index beside using http connection? I did try: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' It doesn't work Francis -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Tuesday, July 07, 2009 4:59 AM To: solr-user@lucene.apache.org Cc: Norberto Meijome Subject: Re: Is there any other way to load the index beside using http connection? Look at the error - it's bash (your command line shell) complaining. The '' terminates one command and puts it in the background. Surrounding the command with quotes will get you one step closer: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' -Yonik http://www.lucidimagination.com On Mon, Jul 6, 2009 at 2:11 PM, Francis Yakinfya...@liquid.com wrote: Ok, I have a CSV file(called it test.csv) from database. When I tried to upload this file to solr using this cmd, I got stream.contentType=text/plain: No such file or directory error curl http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8 -bash: stream.contentType=text/plain: No such file or directory undefined field cat What did I do wrong? Francis -Original Message- From: Norberto Meijome [mailto:numard...@gmail.com] Sent: Monday, July 06, 2009 11:01 AM To: Francis Yakin Cc: solr-user@lucene.apache.org Subject: Re: Is there any other way to load the index beside using http connection? On Mon, 6 Jul 2009 09:56:03 -0700 Francis Yakin fya...@liquid.com wrote: Norberto, Thanks, I think my questions is: why not generate your SQL output directly into your oracle server as a file What type of file is this? a file in a format that you can then import into SOLR. _ {Beto|Norberto|Numard} Meijome Gravity cannot be blamed for people falling in love. Albert Einstein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Is there any other way to load the index beside using http connection?
The double quotes around the ampersand don't belong there. I think that UTF8 should also be the default, so the following should also work: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv' -Yonik http://www.lucidimagination.com On Tue, Jul 7, 2009 at 1:37 PM, Francis Yakinfya...@liquid.com wrote: I did try: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' It doesn't work Francis -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Tuesday, July 07, 2009 4:59 AM To: solr-user@lucene.apache.org Cc: Norberto Meijome Subject: Re: Is there any other way to load the index beside using http connection? Look at the error - it's bash (your command line shell) complaining. The '' terminates one command and puts it in the background. Surrounding the command with quotes will get you one step closer: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' -Yonik http://www.lucidimagination.com
RE: Is there any other way to load the index beside using http connection?
yeah, It works now. How can I verify if the new CSV file get uploaded? Thanks Francis -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Tuesday, July 07, 2009 10:49 AM To: solr-user@lucene.apache.org Cc: Norberto Meijome Subject: Re: Is there any other way to load the index beside using http connection? The double quotes around the ampersand don't belong there. I think that UTF8 should also be the default, so the following should also work: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv' -Yonik http://www.lucidimagination.com On Tue, Jul 7, 2009 at 1:37 PM, Francis Yakinfya...@liquid.com wrote: I did try: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' It doesn't work Francis -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Tuesday, July 07, 2009 4:59 AM To: solr-user@lucene.apache.org Cc: Norberto Meijome Subject: Re: Is there any other way to load the index beside using http connection? Look at the error - it's bash (your command line shell) complaining. The '' terminates one command and puts it in the background. Surrounding the command with quotes will get you one step closer: curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csvstream.contentType=text/plain;charset=utf-8' -Yonik http://www.lucidimagination.com
Re: Is there any other way to load the index beside using http connection?
On Tue, Jul 7, 2009 at 1:50 PM, Francis Yakinfya...@liquid.com wrote: yeah, It works now. How can I verify if the new CSV file get uploaded? point your browser at http://localhost:8983/solr/admin/stats.jsp Check out the UPDATE HANDLERS section -Yonik http://www.lucidimagination.com
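A related note: the CSV handler also accepts a commit parameter (at least in recent releases), so the upload and the commit can be done in one request, e.g.

  curl 'http://localhost:8983/solr/update/csv?stream.file=/opt/apache-1.2.0/example/exampledocs/test.csv&commit=true'

after which a normal query, or the stats page mentioned above, should reflect the new documents. This is a sketch based on the paths used earlier in the thread.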
Re: reindexed data on master not replicated to slave
It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Just a couple of files: from master instance: -rw-r--r-- 1 worun wheel 181 Jul 7 09:28 _6.fdt -rw-r--r-- 1 worun wheel 12 Jul 7 09:28 _6.fdx -rw-r--r-- 1 worun wheel 131 Jul 7 09:28 _6.fnm -rw-r--r-- 1 worun wheel 27 Jul 7 09:28 _6.frq -rw-r--r-- 1 worun wheel 11 Jul 7 09:28 _6.nrm from slave instance: -rw-r--r-- 1 jianhanguo admin 70 Jul 6 18:55 _14_5.del -rw-r--r-- 1 jianhanguo admin4016 Jul 6 18:55 _15.fdt -rw-r--r-- 1 jianhanguo admin 268 Jul 6 18:55 _15.fdx -rw-r--r-- 1 jianhanguo admin 131 Jul 6 18:55 _15.fnm -rw-r--r-- 1 jianhanguo admin 726 Jul 6 18:55 _15.frq Thanks, J 2009/7/7 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Jay , I am opening an issue SOLR-1264 https://issues.apache.org/jira/browse/SOLR-1264 I have attached a patch as well . I guess that is the fix. could you please confirm that. On Tue, Jul 7, 2009 at 12:59 AM, solr jaysolr...@gmail.com wrote: It looks that the problem is here or before that in SnapPuller.fetchLatestIndex(): terminateAndWaitFsyncService(); LOG.info(Conf files are not downloaded or are in sync); if (isSnapNeeded) { modifyIndexProps(tmpIndexDir.getName()); } else { successfulInstall = copyIndexFiles(tmpIndexDir, indexDir); } if (successfulInstall) { logReplicationTimeAndConfFiles(modifiedConfFiles); doCommit(); } Debugged into the place, and noticed that isSnapNeeded is true and therefore modifyIndexProps(tmpIndexDir.getName()); executed, but from the function name it looks that installing index actually happens in successfulInstall = copyIndexFiles(tmpIndexDir, indexDir); The function returns false, but the caller (doSnapPull) never checked the return value. Thanks, J On Mon, Jul 6, 2009 at 8:02 AM, solr jay solr...@gmail.com wrote: There is only one index directory: index/ Here is the content of index.properties #index properties #Fri Jul 03 14:17:12 PDT 2009 index=index.20090703021705 Thanks, J 2009/7/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com BTW , how many index dirs are there in the data dir ? what is there in the datadir/index.properties ? On Sat, Jul 4, 2009 at 12:15 AM, solr jaysolr...@gmail.com wrote: I tried it with the latest nightly build and got the same result. Actually that was the symptom and it made me looking at the index directory. The same log messages repeated again and again, never end. 2009/7/2 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com jay , I see updating index properties... twice this should happen rarely. in your case it should have happened only once. because you cleaned up the master only once On Fri, Jul 3, 2009 at 6:09 AM, Otis Gospodneticotis_gospodne...@yahoo.com wrote: Jay, You didn't mention which version of Solr you are using. It looks like some trunk or nightly version. Maybe you can try the latest nightly? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: solr jay solr...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, July 2, 2009 9:14:48 PM Subject: reindexed data on master not replicated to slave Hi, When index data were corrupted on master instance, I wanted to wipe out all the index data and re-index everything. 
I was hoping the newly created index data would be replicated to slaves, but it wasn't. Here are the steps I performed: 1. stop master 2. delete the directory 'index' 3. start master 4. disable replication on master 5. index all data from scratch 6. enable replication on master It seemed from log file that the slave instances discovered that new index are available and claimed that new index installed, and then trying to update index properties, but looking into the index directory on slaves, you will find that no index data files were updated or added, plus slaves keep trying to get new index. Here are some from slave's log file: Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process Jul 1, 2009 3:59:33 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Number
Re: Solr slave Heap space error and index size issue
: 5-6 days after fresh index index size suddenly increased (no optimization in : between) by 150GB and then query takes long time and java heap error comes. : I run optimize in this index Its takes long time and result it increase : index size more more then 200GB and it didn't show about optimize completed. : merge factor is default as given in solr build. did you check your logs? this smells like maybe a failure during commit or optimize (OOM maybe?) that resulted in old files not being cleaned up on disk ... particularly your "it didn't show about optimize completed" comment. There is a CheckIndex tool that you can use (google for details on the lucene-java mailing list) which *should* tell you if there are extra segments (i don't remember the details to be certain). -Hoss
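For reference, CheckIndex ships in the Lucene core jar and can be run from the command line roughly like this (the jar name and index path are examples and will vary with your Solr/Lucene version):

  java -cp lucene-core-2.4.1.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index

Run this way it only reports on the segments it finds; newer versions also take a -fix option that drops unreadable segments, so take a backup of the index before trying that.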
Re: Can´t use wildcard * on alphanumeric values?
On Tue, Jul 7, 2009 at 6:45 PM, gateway0 reiterwo...@yahoo.de wrote: Thank you, that was it. Why is the preserveOriginal=1 option nowhere documented? A simple case of oversight :) I've added a note on preserveOriginal and splitOnNumerics (another omission) to the wiki page http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters -- Regards, Shalin Shekhar Mangar.
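For anyone else hitting this, the relevant bit of the analyzer chain (a sketch; the surrounding tokenizer and filters are just a typical setup, adjust to taste) looks like:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1"
              splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With preserveOriginal="1" the unsplit token is kept in the index alongside the word/number parts, so prefix and wildcard queries against the original mixed alphanumeric form can still match.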
Re: how to shuffle the result while follow some priority rules at the same time
: I want to implement that effect that the results had better differ from each : other in one page, but I want to show some results first like those contains : more attributes. there is a RandomSortField that you can use as a tie breaker when all other fields are equal. info about using that can be found in the example schema.xml you could also test drive the FieldCollapsing patch (in Jira, not yet committed) which would let you collapse the results based on a common field name (ie: if the song title name was identical) : pages. My current solr query is: : : ?q=ne+yo&qf=artist&defType=dismax&sort=lyric%20desc,links%20desc&start=0&rows=20&indent=on : So the results will shows some same song names together cause their : scores are totally the same. : How to modify to support random, hash effect? you aren't using score in your sort at all -- so score isn't influencing your result order at all. assuming lyric is a boolean indicating you have lyrics, you might want something like sort=lyric+desc,+score+desc,+links+desc ... so it will make sure things with lyrics appear first, but all songs with lyrics will be in score order; if and only if two docs have identical scores (and both have lyrics) will it then do a secondary sort on links : BTW: I find the sorting with multi conditions does not work well. I : want to sort the second attribute : (links desc)based on the first condition. ( lyric desc) . The results : with lyric shows really in : the front, but the links attribute seems not in order. you haven't explained what links is so it's hard to guess what might be happening here. if you give concrete examples (ie: show us your schema, show us a real query, show us real results) then people might be able to help you. -Hoss
Re: reindexed data on master not replicated to slave
On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar.
Re: reindexed data on master not replicated to slave
I see. So I tried it again. Now index.properties has #index properties #Tue Jul 07 12:13:49 PDT 2009 index=index.20090707121349 but there is no such directory index.20090707121349 under the data directory. Thanks, J On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar.
Re: facets and stopwords
: http://projecte01.development.barcelonamedia.org/fonetic/ : you will see a Top Words list (in Spanish and stemmed) in the list there : is the word si which is in 20649 documents. : If you click at this word, the system will perform the query : (x) content:si, with no answers at all : The same for la it is in 17881 documents, but the query content:la will : give no answers at all ... : To see what's going on on the index I have tested with the analyzer : http://projecte01.development.barcelonamedia.org/solr/admin/analysis.jsp ... : las cosas que si no pasan la proxima vez si que no veràs but are you sure that example would actually cause a problem? i suspect if you index thta exact sentence as is you wouldn't see the facet count for si or que increase at all. If you do a query for {!raw field=content}que you bypass the query parsers (which is respecting your stopwords file) and see all docs that contain the raw term que in the content field. if you look at some of the docs that match, and paste their content field into the analysis tool, i think you'll see that the problem comes from using the whitespace tokenizer, and is masked by using the WDF after the stop filter ... things like Que? are getting ignored by the stopfilter, but ultimately winding up in your index as que -Hoss
Re: Indexing XML
I'm sorry I almost finish my script to format my xml in Solr's xml. I'll give it to you later, I think that can help some people like me in the future :) I just need to formate my output text and everything will be fine :) Cheers for your help guys ;) On Tue, Jul 7, 2009 at 7:06 PM, Jay Hill jayallenh...@gmail.com wrote: Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to index different types of datasources including relational databases and XML files. In particular have a look at the XpathEntityProcessor ( http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d98ef0120671db5762e137e63f9d2 ) which allows you to use xpath syntax to map xml data to index fields. -Jay On Tue, Jul 7, 2009 at 7:25 AM, Saeli Mathieu saeli.math...@gmail.com wrote: Hello. I'm a new user of Solr, I already used Lucene to index files and search. But my programme was too slow, it's why I was looking for another solution, and I thought I found it. I said I thought because I don't know if it's possible to use solar with this kind of XML files. lom xsi:schemaLocation=http://ltsc.ieee.org/xsd/lomv1.0 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd; general identifier catalogSTRING HERE/catalog entry STRING HERE /entry /identifier title string language=fr STRING HERE /string /title languagefr/language description string language=fr STRING HERE /string /description /general lifeCycle status sourceSTRING HERE/source valueSTRING HERE/value /status contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity /contribute /lifeCycle metaMetadata identifier catalogSTRING HERE/catalog entrySTRING HERE/entry /identifier contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute contribute role sourceSTRING HERE/source valueSTRING HERE/value /role entitySTRING HERE /entity entitySTRING HERE/entity entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /contribute metadataSchemaSTRING HERE/metadataSchema languageSTRING HERE/language /metaMetadata technical locationSTRING HERE /location /technical educational intendedEndUserRole sourceSTRING HERE/source valueSTRING HERE/value /intendedEndUserRole context sourceSTRING HERE/source valueSTRING HERE/value /context typicalAgeRange string language=frSTRING HERE/string /typicalAgeRange description string language=frSTRING HERE/string /description description string language=frSTRING HERE/string /description languageSTRING HERE/language /educational annotation entitySTRING HERE /entity date dateTimeSTRING HERE/dateTime /date /annotation classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification classification purpose sourceSTRING HERE/source valueSTRING HERE/value /purpose taxonPath source string language=frSTRING HERE /string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath taxonPath source string language=frSTRING HERE/string /source taxon idSTRING HERE/id entry string language=frSTRING HERE/string /entry /taxon /taxonPath /classification /lom I don't know how I can use this kind of file with Solr because the XML example are this one. 
add doc field name=idSOLR1000/field field name=nameSolr, the Enterprise Search Server/field field name=manuApache Software Foundation/field field name=catsoftware/field field name=catsearch/field field name=featuresAdvanced Full-Text Search Capabilities using Lucene/field field name=featuresOptimized for High Volume Web Traffic/field field name=featuresStandards Based Open Interfaces - XML and HTTP/field field name=featuresComprehensive HTML Administration Interfaces/field field name=featuresScalability - Efficient Replication to other Solr Search Servers/field field name=featuresFlexible and Adaptable with XML configuration and Schema/field field name=featuresGood unicode support: h#xE9;llo (hello with an accent over the e)/field field name=price0/field field name=popularity10/field field name=inStocktrue/field field name=incubationdate_dt2006-01-17T00:00:00.000Z/field /doc /add I understood Solr need this kind of architecture, by Architecture I mean field + name=keywordValue/field or as you can see I can't use this kind of architecture because I'm
Re: How to get various records in the result set
duplicate post? http://www.nabble.com/how-to-shuffle-the-result-while-follow-some-priority-rules-at-the--same-time-to24282025.html#a24282025 FYI: reposting the same question twice doesn't tend to get responses faster, it just increases the total volume of mail and slows down everyone's ability to read/reply to messages. what can help get a response: Replying to your own question with additional details like configs, concrete examples, debugging output, log messages, things you've tried to solve the problem, etc... -Hoss
Re: reindexed data on master not replicated to slave
Ok, Here is the problem. In the function, the two directories tmpIndexDir and indexDir are the same (in this case only?), and then at the end of the function, the directory tmpIndexDir is deleted, which deletes the new index directory. } finally { delTree(tmpIndexDir); } On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote: I see. So I tried it again. Now index.properties has #index properties #Tue Jul 07 12:13:49 PDT 2009 index=index.20090707121349 but there is no such directory index.20090707121349 under the data directory. Thanks, J On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar.
Re: reindexed data on master not replicated to slave
In fact, I saw the directory was created and then deleted. On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote: Ok, Here is the problem. In the function, the two directories tmpIndexDir and indexDir are the same (in this case only?), and then at the end of the function, the directory tmpIndexDir is deleted, which deletes the new index directory. } finally { delTree(tmpIndexDir); } On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote: I see. So I tried it again. Now index.properties has #index properties #Tue Jul 07 12:13:49 PDT 2009 index=index.20090707121349 but there is no such directory index.20090707121349 under the data directory. Thanks, J On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar.
Re: Filtering MoreLikeThis results
The answer to my owner question: ... requestHandler name=mlt class=solr.MoreLikeThisHandler lst name=defaults/ /requestHandler ... would work. -Yao Yao Ge wrote: I am not sure about the parameters for MLT the requestHandler plugin. Can one of you share the solrconfig.xml entry for MLT? Thanks in advance. -Yao Bill Au wrote: I have been using the StandardRequestHandler (ie /solr/select). fq does work with the MoreLikeThisHandler. I will switch to use that. Thanks. Bill On Tue, Jul 7, 2009 at 11:02 AM, Marc Sturlese marc.sturl...@gmail.comwrote: At least in trunk, if you request for: http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price[100http://localhost:8084/solr/core_A/mlt?q=id:7468365fq=price%5B100TO 200] It will filter the MoreLikeThis results Bill Au wrote: I think fq only works on the main response, not the mlt matches. I found a couple of releated jira: http://issues.apache.org/jira/browse/SOLR-295 http://issues.apache.org/jira/browse/SOLR-281 If I am reading them correctly, I should be able to use DIsMax and MoreLikeThis together. I will give that a try and report back. Bill On Tue, Jul 7, 2009 at 4:45 AM, Marc Sturlese marc.sturl...@gmail.comwrote: Using MoreLikeThisHandler you can use fq to filter your results. As far as I know bq are not allowed. Bill Au wrote: I have been trying to restrict MoreLikeThis results without any luck also. In additional to restricting the results, I am also looking to influence the scores similar to the way boost query (bq) works in the DisMaxRequestHandler. I think Solr's MoreLikeThis depends on Lucene's contrib queries MoreLikeThis, or at least it used to. Has anyone looked into enhancing Solrs' MoreLikeThis to support bq and restricting mlt results? Bill On Mon, Jul 6, 2009 at 2:16 PM, Yao Ge yao...@gmail.com wrote: I could not find any support from http://wiki.apache.org/solr/MoreLikeThison how to restrict MLT results to certain subsets. I passed along a fq parameter and it is ignored. Since we can not incorporate the filters in the query itself which is used to retrieve the target for similarity comparison, it appears there is no way to filter MLT results. BTW. I am using Solr 1.3. Please let me know if there is way (other than hacking the source code) to do this. Thanks! -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24360355.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24369257.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24374996.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Filtering-MoreLikeThis-results-tp24360355p24380408.html Sent from the Solr - User mailing list archive at Nabble.com.
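Written out with its markup, the entry is along these lines (a minimal sketch; the handler name and the mlt.fl default are illustrative — MLT needs mlt.fl either here or on each request):

  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
      <str name="mlt.fl">title,description</str>
    </lst>
  </requestHandler>

Named with a leading slash it is reached directly at /solr/mlt, as in the example URLs earlier in this thread; named plain "mlt" it would instead be selected with qt=mlt on the standard select path.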
Re: Faceting with MoreLikeThis
Faceting on MLT requires the use of the MoreLikeThisHandler. The standard request handler, while providing support for MLT via a search component, does not return facets on MLT results. To enable the MLT handler, add an entry like the one below to your solrconfig.xml: <requestHandler name="mlt" class="solr.MoreLikeThisHandler"> <lst name="defaults"/> </requestHandler> The query parameter syntax for faceting remains the same as for the standard request handler. -Yao Yao Ge wrote: Does Solr support faceting on MoreLikeThis search results? -- View this message in context: http://www.nabble.com/Faceting-with-MoreLikeThis-tp24356166p24380459.html Sent from the Solr - User mailing list archive at Nabble.com.
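For example (a sketch, assuming the handler above is reachable at /mlt and that cat is a facetable field in the schema):

  http://localhost:8983/solr/mlt?q=id:12345&mlt.fl=title&rows=10&facet=true&facet.field=cat&facet.mincount=1

returns the more-like-this matches for document 12345 together with facet counts computed over those matches.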
Re: Indexing XML
And here it's my code :) If you need some explanation feel free to ask :) You can test it on the first test file I gave you when I open the thread. At the moment that works only on one file, I have to change it a bit to make it works on repertory with lots of xml files, See you later guys :-) $repertory = 0.xml; $BaseObject = simplexml_load_file($repertory); $Prefix = $BaseObject-getName(); $Final = recu($BaseObject); format($Final); function OpenFile() { if (!file_exists(FinalParsin.xml)) $fd = fopen(FinalParsing.xml, w+); else $fd = fopen(FinalParsing.xml, x); if ($fd 0) { echo Fatal Error: Couldn't Create and open the tempory file.\n; exit -1; } return $fd; } function Xfwrite($fd, $String) { if (!fwrite($fd, $String)) { echo Fatal Error: Couldn't write in the tompory file.\n; exit -1; } return ; } function format($String) { $fd = OpenFile(); Xfwrite($fd, add\ndoc\n); $String = split(\n, $String); for ($i = 0; $String[$i]; ++$i) { $Parsing = split(=, $String[$i]); Xfwrite($fd, \t.'field name='.$Parsing[0].''.$Parsing[1].'field'.\n); } Xfwrite($fd, /doc\n/add); fclose($fd); } function recu($Object, $Prefix = null) { if ($Prefix === null) $Prefix = $Object-getName(); else $Prefix .= '::'.$Object-getName(); if (count($Object-Children()) 1) return $Prefix.'='.str_replace(\n, , $Object).\n; foreach($Object-Children() as $Child) $Save .= recu($Child, $Prefix); return $Save; } -- Saeli Mathieu.
Re: Can't limit return fields in custom request handler
: But I have a problem like this; when i call : http://localhost:8983/solr/select/?qt=cfacetq=%2BitemTitle:nokia%20%2BcategoryId:130start=0limit=3fl=id, : itemTitle : i'm getiing all fields instead of only id and itemTitle. Your custom handler is responsible for checking the fl and setting what you want the response fields to be on the response object. SolrPluginUtils.setReturnFields can be used if you want this to be done in the normal way. : Also i'm gettting no result when i give none null filter parameter in : getDocListAndSet(...). ... : DocListAndSet results = req.getSearcher().getDocListAndSet(q, : (Query)null, (Sort)null, solrParams.getInt(start), : solrParams.getInt(limit)); ...that should work. What does your query look like? what are you passing for the start and limit params (is it possible you are getting results, but limit=0 so there aren't any results on the current page of pagination?) what does the debug output look like? -Hoss
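As a rough sketch of the first point (parameter names are illustrative; this is a fragment meant to sit inside the handler's handleRequestBody, written against the 1.3/1.4-era API — adjust package names for other versions):

  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Sort;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.search.DocListAndSet;
  import org.apache.solr.util.SolrPluginUtils;

  // inside handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp):
  SolrParams params = req.getParams();
  DocListAndSet results = req.getSearcher().getDocListAndSet(
      query, (Query) null, (Sort) null,
      params.getInt("start", 0), params.getInt("limit", 10));
  rsp.add("response", results.docList);
  // honour the fl parameter so only the requested stored fields are serialized
  SolrPluginUtils.setReturnFields(req, rsp);

The setReturnFields call is what tells the response writer which fields the client asked for; without it every stored field is returned.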
RE: Is there any other way to load the index beside using http connection?
Norberto, You said last week: why not generate your SQL output directly into your oracle server as a file, upload the file to your SOLR server? Then the data file is local to your SOLR server , you will bypass any WAN and firewall you may be having. (or some variation of it, sql - SOLR server as file, etc..) I think this is the best solution that we are going to without changing too much on our setup. Like said we have file name test.xml which come from SQL output , we put it locally on the solr server under /opt/test.xml So, I need to execute the commands from solr system to add and update this to the solr data/indexes. What commands do I have to use, for example the xml file named /opt/test.xml ? Thanks Francis -Original Message- From: Norberto Meijome [mailto:numard...@gmail.com] Sent: Sunday, July 05, 2009 3:57 AM To: Francis Yakin Cc: solr-user@lucene.apache.org Subject: Re: Is there any other way to load the index beside using http connection? On Thu, 2 Jul 2009 11:02:28 -0700 Francis Yakin fya...@liquid.com wrote: Norberto, Thanks for your input. What do you mean with Have you tried connecting to SOLR over HTTP from localhost, therefore avoiding any firewall issues and network latency ? it should work a LOT faster than from a remote site. ? Here are how our servers lay out: 1) Database ( Oracle ) is running on separate machine 2) Solr master is running on separate machine by itself 3) 6 solr slaves ( these 6 pulll the index from master using rsync) We have a SQL(Oracle) script to post the data/index from Oracle Database machine to Solr Master over http. We wrote those script(Someone in Oracle Database administrator write it). You said in your other email you are having issues with slow transfers between 1) and 2). Your subject relates to the data transfer between 1) and 2, - 2) and 3) is irrelevant to this part. My question (what you quoted above) relates to the point you made about it being slow ( WHY is it slow?), and issues with opening so many connections through firewall. so, I'll rephrase my question (see below...) [] We can not do localhost since it's solr is not running on Oracle machine. why not generate your SQL output directly into your oracle server as a file, upload the file to your SOLR server? Then the data file is local to your SOLR server , you will bypass any WAN and firewall you may be having. (or some variation of it, sql - SOLR server as file, etc..) Any speed issues that are rooted in the fact that you are posting via HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler approach without changing too much of your current setup. Another alternative that we think of is to transform XML into CSV and import/export it. How about if LUSQL, some mentioned about this? Is this apps free(open source) application? Do you have any experience with this apps? Not i, sorry. Have you looked into DIH? It's designed for this kind of work. B _ {Beto|Norberto|Numard} Meijome Great spirits have often encountered violent opposition from mediocre minds. Albert Einstein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
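One way to do the local load (a sketch; it assumes remote streaming is enabled in solrconfig.xml via <requestParsers enableRemoteStreaming="true" .../>, and uses your host, port and file name):

  curl 'http://solr00:7001/solr/update?stream.file=/opt/test.xml&commit=true'

With stream.file the update handler reads the file from the Solr server's own disk, so nothing large goes over the wire; if your Solr version does not honour commit=true on the URL, follow up by POSTing a separate <commit/> document.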
about defaultSearchField
Hi, I have some problems. For my Solr program, I want to type only the query string and get results from all fields that include the query string. But right now I can't get any result without specifying a field. For example, a query for tina gets nothing, but Sentence:tina works. I have adjusted the *schema.xml* like this:

  <fields>
    <field name="CategoryNamePolarity" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="CategoryNameStrenth" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="CategoryNameSubjectivity" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="Sentence" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="allText" type="text" indexed="true" stored="true" multiValued="true"/>
  </fields>

  <uniqueKey required="false">Sentence</uniqueKey>

  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>allText</defaultSearchField>

  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="OR"/>

  <copyfield source="CategoryNamePolarity" dest="allText"/>
  <copyfield source="CategoryNameStrenth" dest="allText"/>
  <copyfield source="CategoryNameSubjectivity" dest="allText"/>
  <copyfield source="Sentence" dest="allText"/>

I think the problem is in defaultSearchField, but I don't know how to fix it. Could anyone help me? Thanks Yang
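One detail worth double-checking (this is only a guess on my part, not something confirmed in the thread): in schema.xml the copy directive is spelled copyField, with a capital F, and as far as I know a lowercase copyfield element is simply ignored — which would leave allText empty and make unfielded queries return nothing. The corrected lines would be:

  <copyField source="CategoryNamePolarity" dest="allText"/>
  <copyField source="CategoryNameStrenth" dest="allText"/>
  <copyField source="CategoryNameSubjectivity" dest="allText"/>
  <copyField source="Sentence" dest="allText"/>

After changing the schema the documents have to be re-indexed for the copied content to show up in allText.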
Re: reindexed data on master not replicated to slave
I guess in this case it doesn't matter whether the two directories tmpIndexDir and indexDir are the same or not. It looks that the index directory is switched to tmpIndexDir and then it is deleted inside finally. On Tue, Jul 7, 2009 at 12:31 PM, solr jay solr...@gmail.com wrote: In fact, I saw the directory was created and then deleted. On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote: Ok, Here is the problem. In the function, the two directories tmpIndexDir and indexDir are the same (in this case only?), and then at the end of the function, the directory tmpIndexDir is deleted, which deletes the new index directory. } finally { delTree(tmpIndexDir); } On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote: I see. So I tried it again. Now index.properties has #index properties #Tue Jul 07 12:13:49 PDT 2009 index=index.20090707121349 but there is no such directory index.20090707121349 under the data directory. Thanks, J On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages looks good. After one download and installed the index, it printed out *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.* but the files inside index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar. -- J
Re: Stopwords when facetting
: When indexing or querying text, i'm using the solr.StopFilterFactory ; it seems to works just fine... : : But I want to use the text field as a facet, and get all the commonly : used words in a set of results, without the stopwords. As far as I : tried, I always get stopwords, and numerical terms, that pollute my : facets results. How can I perform this ? perhaps you have the same problem as described here... http://www.nabble.com/facets-and-stopwords-to23952823.html#a24379679 ...it's hard to be certain without any actual concrete examples (what does your schema.xml look like, what are you stopwords, what terms are still showing up in your facet list even though they are stop words, what documents contain those terms (the raw parser can help you find them... q={!raw field=yourFieldName}wordYouDoNotExpect -Hoss
Re: Preparing the ground for a real multilang index
When using stemming, you have to know the query language. For your project, perhaps you should look into switching to a lemmatizer instead. I believe Lucid can provide integration with a commercial lemmatizer. This way you can expand the document field itself and do not need to know the query language. You may then want to do a copyfield from all your text_lang - text for convenient one- field-to-rule-them-all search. -- Jan Høydahl Gründer senior architect Cominvent AS, Stabekk, Norway www.cominvent.com +20 100930908 On 3. juli. 2009, at 08.43, Michael Lackhoff wrote: On 03.07.2009 00:49 Paul Libbrecht wrote: [I'll try to address the other responses as well] I believe the proper way is for the server to compute a list of accepted languages in order of preferences. The web-platform language (e.g. the user-setting), and the values in the Accept-Language http header (which are from the browser or platform). All this is not going to help much because the main application is a scientific search portal for books and articles with many users searching cross-language. The most typical use case is a German user searching multilingual. So we might even get the search multilingual, e.g. TITLE:cancer OR TITLE:krebs. No way here to watch out for Accept-headers or a language select field (would be left on any in most cases). Other popular use cases are citations (in whatever language) cut and pasted into the search field. Then you expand your query for surfing waves (say) to: - phrase query: surfing waves exactly (^2.0) - two terms, no stemming: surfing waves (^1.5) - iterate through the languages and query for stemmed variants: - english: surf wav ^1.0 - german surfing wave ^0.9 - - then maybe even try the phonetic analyzer (matched in a separate field probably) This is an even more sophisticated variant of the multiple OR I came up with. Oh well... I think this is a common pattern on the web where the users, browsers, and servers are all somewhat multilingual. indeed and often users are not even aware of it, especially in a scientific context they use their native tongue and English almost interchangably -- and they expect the search engine to cope with it. I think the best would be to process the data according to its language but don't make any assumptions about the query language and I am totally lost how to get a clever schema.xml out of all this. Thanks everyone for listening and I am still open for good suggestions to deal with this problem! -Michael
Re: Preparing the ground for a real multilang index
There is an alternative to knowing the language at query: multiply-process for stems or lemmas of all the possible languages. This may well be a cure much worse than the disease. Yes, LI can sell you our lemma-production capability. --benson margulies basis technology On Tue, Jul 7, 2009 at 6:50 PM, Jan Høydahlj...@cominvent.com wrote: When using stemming, you have to know the query language. For your project, perhaps you should look into switching to a lemmatizer instead. I believe Lucid can provide integration with a commercial lemmatizer. This way you can expand the document field itself and do not need to know the query language. You may then want to do a copyfield from all your text_lang - text for convenient one-field-to-rule-them-all search. -- Jan Høydahl Gründer senior architect Cominvent AS, Stabekk, Norway www.cominvent.com +20 100930908 On 3. juli. 2009, at 08.43, Michael Lackhoff wrote: On 03.07.2009 00:49 Paul Libbrecht wrote: [I'll try to address the other responses as well] I believe the proper way is for the server to compute a list of accepted languages in order of preferences. The web-platform language (e.g. the user-setting), and the values in the Accept-Language http header (which are from the browser or platform). All this is not going to help much because the main application is a scientific search portal for books and articles with many users searching cross-language. The most typical use case is a German user searching multilingual. So we might even get the search multilingual, e.g. TITLE:cancer OR TITLE:krebs. No way here to watch out for Accept-headers or a language select field (would be left on any in most cases). Other popular use cases are citations (in whatever language) cut and pasted into the search field. Then you expand your query for surfing waves (say) to: - phrase query: surfing waves exactly (^2.0) - two terms, no stemming: surfing waves (^1.5) - iterate through the languages and query for stemmed variants: - english: surf wav ^1.0 - german surfing wave ^0.9 - - then maybe even try the phonetic analyzer (matched in a separate field probably) This is an even more sophisticated variant of the multiple OR I came up with. Oh well... I think this is a common pattern on the web where the users, browsers, and servers are all somewhat multilingual. indeed and often users are not even aware of it, especially in a scientific context they use their native tongue and English almost interchangably -- and they expect the search engine to cope with it. I think the best would be to process the data according to its language but don't make any assumptions about the query language and I am totally lost how to get a clever schema.xml out of all this. Thanks everyone for listening and I am still open for good suggestions to deal with this problem! -Michael
A big question about Solr and SolrJ range query ?
Hi all: Suppose that my index has 3 fields: title, x and y. I know one range (10 < x < 100) can be queried like this: http://localhost:8983/solr/select?q=x:[10 TO 100]&fl=title If I want to query two ranges (10 < x < 100 AND 20 < y < 300), like the SQL (select title where x > 10 and x < 100 and y > 20 and y < 300), using a Solr range query or SolrJ, I do not know how to implement it. Anybody know? Thanks Email: enzhao...@gmail.com -- View this message in context: http://www.nabble.com/A-big-question-about-Solr-and-SolrJ-range-query---tp24384416p24384416.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: A big question about Solr and SolrJ range query ?
use Solr's Filter Query parameter fq: fq=x:[10 TO 100]&fq=y:[20 TO 300]&fl=title -Yao huenzhao wrote: Hi all: Suppose that my index has 3 fields: title, x and y. I know one range (10 < x < 100) can be queried like this: http://localhost:8983/solr/select?q=x:[10 TO 100]&fl=title If I want to query two ranges (10 < x < 100 AND 20 < y < 300), like the SQL (select title where x > 10 and x < 100 and y > 20 and y < 300), using a Solr range query or SolrJ, I do not know how to implement it. Anybody know? Thanks Email: enzhao...@gmail.com -- View this message in context: http://www.nabble.com/A-big-question-about-Solr-and-SolrJ-range-query---tp24384416p24384540.html Sent from the Solr - User mailing list archive at Nabble.com.
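On the SolrJ side the same query could look roughly like this (a sketch using the field names from the question; note [10 TO 100] is inclusive, use {10 TO 100} if the endpoints must be excluded):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class RangeQueryExample {
      public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          SolrQuery query = new SolrQuery("*:*");      // match everything...
          query.addFilterQuery("x:[10 TO 100]");       // ...then restrict by both ranges
          query.addFilterQuery("y:[20 TO 300]");
          query.setFields("title");                    // only return the title field
          QueryResponse rsp = server.query(query);
          System.out.println(rsp.getResults().getNumFound() + " matching documents");
      }
  }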
Re: about defaultSearchField
Try with fl=* or fl=*,score added to your request string. -Yao Yang Lin-2 wrote: Hi, I have some problems. For my solr progame, I want to type only the Query String and get all field result that includ the Query String. But now I can't get any result without specified field. For example, query with tina get nothing, but Sentence:tina could. I hava adjusted the *schema.xml* like this: fields field name=CategoryNamePolarity type=text indexed=true stored=true multiValued=true/ field name=CategoryNameStrenth type=text indexed=true stored=true multiValued=true/ field name=CategoryNameSubjectivity type=text indexed=true stored=true multiValued=true/ field name=Sentence type=text indexed=true stored=true multiValued=true/ field name=allText type=text indexed=true stored=true multiValued=true/ /fields uniqueKey required=falseSentence/uniqueKey !-- field for the QueryParser to use when an explicit fieldname is absent -- defaultSearchFieldallText/defaultSearchField !-- SolrQueryParser configuration: defaultOperator=AND|OR -- solrQueryParser defaultOperator=OR/ copyfield source=CategoryNamePolarity dest=allText/ copyfield source=CategoryNameStrenth dest=allText/ copyfield source=CategoryNameSubjectivity dest=allText/ copyfield source=Sentence dest=allText/ I think the problem is in defaultSearchField, but I don't know how to fix it. Could anyone help me? Thanks Yang -- View this message in context: http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Query on the updation of synonym and stopword file.
I am using Solr1.3 version.. Date: Wed, 8 Jul 2009 01:12:02 +0900 From: k...@r.email.ne.jp To: solr-user@lucene.apache.org Subject: Re: Query on the updation of synonym and stopword file. Sagar, I am facing a problem here that even after the core reload and re-indexing the documents the new updated synonym or stop words are not loaded. Seems so the filters are not aware that these files are updated so the solution to me is to restart the whole container in which I have embedded the Solr server; it is not feasible in production. I am not a multicore user, but I can see the synonyms.txt updated after reloading the core (I verified it via analysis.jsp, not re-indexing), wothout restarting solr server. I'm using 1.4. What version are you using? Koji Sagar Khetkade wrote: Hello All, I was figuring out the issue with the synonym.txt and stopword.txt files being updated on regular interval. Here in my case I am updating the synonym.txt and stopword.txt files as the synonym and stop word dictionary is update. I am facing a problem here that even after the core reload and re-indexing the documents the new updated synonym or stop words are not loaded. Seems so the filters are not aware that these files are updated so the solution to me is to restart the whole container in which I have embedded the Solr server; it is not feasible in production. I came across the discussion with subject “ synonyms.txt file updated frequently” in which Grant had a view to write a new logic in SynonymFilterFactory which would take care of this issue. Is there any possible solution to this or is this the solution. Thanks in advance! Regards, Sagar Khetkade _ Missed any of the IPL matches ? Catch a recap of all the action on MSN Videos http://msnvideos.in/iplt20/msnvideoplayer.aspx _ More than messages–check out the rest of the Windows Live™. http://www.microsoft.com/india/windows/windowslive/
Updating Solr index from XML files
I have the following curl cmd to update and doing commit to Solr ( I have 10 xml files just for testing) curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H 'Content-type:text/plain; charset=utf-8' curl http://solr00:7001/solr/update --data-binary @commit.txt -H 'Content-type:text/plain; charset=utf-8' It works so far. But I will have 3 xml files. What's the efficient way to do these things? I can script it with for loop using regular shell script or perl. I am also looking into solr.pm from this: http://wiki.apache.org/solr/IntegratingSolr BTW: We are using weblogic to deploy the solr.war and by default solr in weblogic using port 7001, but not 8983. Thanks Francis
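For a large batch, a simple shell loop over the files (a sketch along the lines of your existing commands; adjust the glob, host and port to your layout) avoids writing every command by hand:

  for f in xml_Artist-*.txt; do
      curl http://solr00:7001/solr/update --data-binary @"$f" -H 'Content-type:text/plain; charset=utf-8'
  done
  curl http://solr00:7001/solr/update --data-binary @commit.txt -H 'Content-type:text/plain; charset=utf-8'

Committing once at the end rather than per file keeps indexing cheap; for very large numbers of files it is also worth batching several documents into each POSTed file so far fewer HTTP requests are made.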
Re: reindexed data on master not replicated to slave
jay, Thanks. The test case was not enough. I have provided a new patch; I guess that should solve this. On Wed, Jul 8, 2009 at 3:48 AM, solr jay solr...@gmail.com wrote: I guess in this case it doesn't matter whether the two directories tmpIndexDir and indexDir are the same or not. It looks like the index directory is switched to tmpIndexDir and then it is deleted inside the finally block. On Tue, Jul 7, 2009 at 12:31 PM, solr jay solr...@gmail.com wrote: In fact, I saw the directory was created and then deleted. On Tue, Jul 7, 2009 at 12:29 PM, solr jay solr...@gmail.com wrote: Ok, here is the problem. In the function, the two directories tmpIndexDir and indexDir are the same (in this case only?), and then at the end of the function, the directory tmpIndexDir is deleted, which deletes the new index directory. } finally { delTree(tmpIndexDir); } On Tue, Jul 7, 2009 at 12:17 PM, solr jay solr...@gmail.com wrote: I see. So I tried it again. Now index.properties has #index properties #Tue Jul 07 12:13:49 PDT 2009 index=index.20090707121349 but there is no such directory index.20090707121349 under the data directory. Thanks, J On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 7, 2009 at 11:50 PM, solr jay solr...@gmail.com wrote: It seemed that the patch fixed the symptom, but not the problem itself. Now the log messages look good. After it downloaded and installed the index once, it printed out: Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. but the files inside the index directory did not change. Both index.properties and replication.properties were updated though. Note that in this case, Solr would have created a new index directory. Are you comparing the files on the slave in the new index directory? You can get the new index directory's name from index.properties. -- Regards, Shalin Shekhar Mangar. -- J -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Updating Solr index from XML files
If Perl is your choice: http://search.cpan.org/~bricas/WebService-Solr-0.07/lib/WebService/Solr.pm Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Francis Yakin fya...@liquid.com To: solr-user@lucene.apache.org Sent: Wednesday, July 8, 2009 1:16:04 AM Subject: Updating Solr index from XML files I have the following curl commands to update and commit to Solr (I have 10 XML files just for testing):
curl http://solr00:7001/solr/update --data-binary @xml_Artist-100170.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101062.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101238.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101400.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101513.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101517.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101572.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101691.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101694.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @xml_Artist-101698.txt -H 'Content-type:text/plain; charset=utf-8'
curl http://solr00:7001/solr/update --data-binary @commit.txt -H 'Content-type:text/plain; charset=utf-8'
It works so far. But I will have 3 xml files. What is the most efficient way to do this? I can script it with a for loop using a regular shell script or Perl. I am also looking into solr.pm from this: http://wiki.apache.org/solr/IntegratingSolr BTW: We are using WebLogic to deploy solr.war, and by default Solr in WebLogic uses port 7001, not 8983. Thanks Francis
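For completeness, the Solr example distribution also ships a simple post tool (post.jar under example/exampledocs) that can send many XML files in one invocation. This is a pointer rather than a tested recipe for the WebLogic setup above — in the SimplePostTool versions I have seen, the target URL is read from the url system property and a commit is issued automatically after the files are posted, but check the tool's documentation for your version:
java -Durl=http://solr00:7001/solr/update -jar post.jar xml_Artist-*.txt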