Re: Delete solr data from disk space
Hi Toby,
Thanks, but I tried this solution earlier. The problem is that optimizing takes too much disk space (more than twice the size of the original index). Do you have a better solution, or any other option that lets us optimize without using so much space?
Thanks, Ashish

Toby Cole-2 wrote: Hi Ashish, Have you optimized your index? When you delete documents in Lucene they are simply marked as 'deleted'; they aren't physically removed from the disk. To get the disk space back you must run an optimize, which re-writes the index out to disk without the deleted documents, then deletes the original. Toby

On 4 Aug 2009, at 14:41, Ashish Kumar Srivastava wrote: Hi, Sorry, but this solution will not work because I deleted the data with a query. How can I know which files should be deleted? I can't delete the whole data set.

Markus Jelsma - Buyways B.V. wrote: Hello, A rigorous but quite effective method is manually deleting the files in your SOLR_HOME/data directory and reindexing the documents you want. This will surely free some disk space. Cheers, Markus Jelsma, Buyways B.V., Technisch Architect, Friesestraatweg 215c, 9743 AD Groningen, Tel. 050-3118123, Fax. 050-3118124, http://www.buyways.nl, KvK 01074105

On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote: I am facing a problem deleting Solr data from disk. I had 80GB of Solr data. I deleted 30% of this data using a query in the solr-php client and committed. The deleted data is no longer visible from the Solr UI, but the used disk space is still 80GB. Please reply if you have any solution to free the disk space after deleting some Solr data. Thanks in advance.

--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.
Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
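[Editor's note] A concrete form of the optimize Toby describes, as a plain update request; host, port, and core path are the example defaults and will differ per setup:

    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'

The transient cost Ashish complains about is expected: while the optimize runs, Lucene writes a complete new copy of the index before deleting the old segments, so the disk needs roughly twice the index size free (more if an open searcher still holds the old files). Later releases also accept a maxSegments attribute on the optimize element for a partial optimize, which can reduce the transient space cost; check whether your version supports it.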
DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)
Hi all, the database from which I populate the SOLR index is refreshed partially: subsets of the data are deleted and re-added for a certain group identifier. Is it possible to do something similar in a (delta) import of the DataImportHandler?

Example:
SOLR-Index:
groupID: 1, PK: 1, refreshDate: [before last_index_time]
groupID: 1, PK: 2, refreshDate: [before last_index_time]
groupID: 1, PK: 3, refreshDate: [before last_index_time]
Refreshed DB:
groupID: 1, PK: 1, refreshDate: [after last_index_time]
groupID: 1, PK: 5, refreshDate: [after last_index_time]
groupID: 1, PK: 30, refreshDate: [after last_index_time]
(PK 2 and 3 are not there anymore. PK is unique across all groupIDs.)

deleteQuery=groupID:1 (An attribute of the entity element that the DocBuilder (1.3) reads and sends as a query once, before the delta import, unchanged to the SOLR writer to delete documents.) After that, the delta import loads the data with groupID=1 from the DB.

Could I plug into SOLR, maybe with a custom processor, to achieve something in the direction of:

deleteInput=select FIELD_VALUE from TABLE where CHANGED_DATE > '${dataimporter.last_index_time}' group by FIELD_VALUE
deleteQuery=field:${my_entity.FIELD_VALUE}

FIELD_VALUE is not the primary key, and the deleteInput query can return multiple rows. I am aware of SOLR-1060 and SOLR-1059 but I am not sure that those will help me. In those cases it looks like the delete is run per entity. I want the delete to run before the (delta) import, once. If that impression is wrong, I'll happily switch to 1.4, of course. Cheers! Chantal

-- Chantal Ackermann
Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)
did you explore the deletedPkQuery ?

On Wed, Aug 5, 2009 at 11:46 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: [...]

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
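[Editor's note] For context, deletedPkQuery is an attribute on the DIH entity that returns the primary keys of rows to purge during a delta import. A minimal sketch; the table and column names are made up for illustration:

    <entity name="item" pk="ID"
            query="select * from ITEM"
            deltaQuery="select ID from ITEM where LAST_MODIFIED > '${dataimporter.last_index_time}'"
            deletedPkQuery="select ID from ITEM_DELETIONS where DELETED_AT > '${dataimporter.last_index_time}'">
      ...
    </entity>

As Chantal explains below, this only works when the database can still tell you which keys disappeared; her reply walks through why that assumption fails in her case.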
Re: change sort order for MoreLikeThis
Thanks guys. I tried to boost it instead (as sort looks like it is not supported) but it's not taking effect. Here are the parameters that I'm using; I want to boost by the time_published field and I enable mlt.boost:

bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.boost=true

Regards, /Renz

2009/8/4 Avlesh Singh avl...@gmail.com: You lost me. Absolutely sorry about that, Bill :( How does boosting change the sort order? What I really meant here is that if you have more than one similarity field in your MLT query, you can boost the results found due to one over the other. It was not at all aimed to be an answer for sort. Actually, I was too prompt to respond! What about sorting on a field that is not the mlt field? Haven't tried this yet. It would be surprising if it does not work as expected. Cheers, Avlesh

On Tue, Aug 4, 2009 at 3:24 AM, Bill Au bill.w...@gmail.com wrote: Avlesh, You lost me. How does boosting change the sort order? What about sorting on a field that is not the mlt field? Bill

On Mon, Aug 3, 2009 at 3:13 AM, Avlesh Singh avl...@gmail.com wrote: You can boost the similarity field matches, if you want. Look for mlt.boost at http://wiki.apache.org/solr/MoreLikeThis Cheers, Avlesh

On Mon, Aug 3, 2009 at 11:33 AM, Renz Daluz renz052...@gmail.com wrote: Hi, I'm looking at changing the result order when searching by MLT. I tried the sort=field,order but it's not working. I checked the wiki and can't find anything. Is there a way to do this? Thanks, /Laurence
Re: change sort order for MoreLikeThis
Oh, and yes, I tried to sort on a field that is not the mlt field and it's not taking effect. Here are the whole parameters that I'm using:

mlt.fl=text,title&tie=0.01&mlt.mintf=1&mlt.match.include=true&fl=tagged_bucket,tagged_entities&bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.minwl=3&mm=5&mlt.boost=true&qf=text^0.5+title^0.4+description^0.01+keywords^0.01+bestlink_keywords^0.1+authors_t^0.05&mlt.maxwl=20&mlt.maxntp=200&mlt.maxqt=10&mlt.interestingTerms=details&rows=200&mlt.mindf=3&pf=text^300+title^10+tagged_entities^200+inbound_text^1+bestlink_keywords^1&q=id:story|25584945&cps=1&sort=time_published+desc

Thanks, Renz

2009/8/5 Renz Daluz renz052...@gmail.com wrote: [...]
query matching issue
Hello list, I have documents containing the words "Richard Nass". I need to match the "Richard Nass" documents for the query strings richard, nass, and rich. The search works for the following queries:

http://localhost:8983/solr/select?q=author:Richard nass
http://localhost:8983/solr/select?q=author:Richard Nass
http://localhost:8983/solr/select?q=author:richard nass

But it does not work for q=author:Richard, q=author:nass, q=author:rich... I tried a wildcard search like q=author:rich* also. Can anyone help me get a flexible search like the above? Thanks in advance.. Radha.C
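[Editor's note] One common way to make a prefix such as "rich" match "Richard" is to index the author field through an edge n-gram analyzer, so each token is expanded into its prefixes at index time. A sketch of such a field type; the type name and gram sizes are chosen only for illustration:

    <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With author copied into a field of this type, a query like q=author_prefix:rich matches documents whose author tokens start with "rich", without resorting to un-analyzed wildcard queries. The tradeoff is a larger index, since every prefix of every token is stored as a term.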
Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)
Hi Paul, yes, I did, and I just verified in the code. The deletedPkQuery is used to collect all primary keys of the root entity that shall be deleted from the index. The deletion is done on the SOLR writer by unique ID:

writer.deleteDoc(deletedKey.get(root.pk)); // DocBuilder

// SolrWriter.deleteDoc():
delCmd.id = id.toString();
delCmd.fromPending = true;
delCmd.fromCommitted = true;
processor.processDelete(delCmd);

// RunUpdateProcessorFactory:
@Override
public void processDelete(DeleteUpdateCommand cmd) throws IOException {
    if (cmd.id != null) {
        updateHandler.delete(cmd);        // writer.deleteDoc() uses that
    } else {
        updateHandler.deleteByQuery(cmd); // I would like to use that
    }
    super.processDelete(cmd);
}

My problem is that the IDs I have to delete are those that do not exist in the database anymore, so I have no means to return them by DB query. That is why I would like to use a different field that a group of documents has in common, which would allow me to get hold of the outdated documents in the index. (But I have to find out the value of that other field by DB query.)

Cheers, Chantal

Noble Paul നോബിള് नोब्ळ् wrote: did you explore the deletedPkQuery ? [...]
Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)
Thanks, Paul! :-) The wiki doesn't mark $deleteDocByQuery (and the other special commands) as 1.4, as it usually does. Maybe it's worth correcting that?

Noble Paul നോബിള് नोब्ळ् wrote: OK, writing an EntityProcessor/Transformer may help here. Use the special command (see http://wiki.apache.org/solr/DataImportHandler#head-5e9ebf5a2aaa1dc54464102c395ed1bf7cdb98c3): $deleteDocByQuery is what you need.
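[Editor's note] To make the suggestion concrete: a custom Transformer can attach the $deleteDocByQuery special command to a row, and DIH then deletes everything matching that query before the row itself is re-indexed. A minimal sketch against the 1.4 DIH API; the GROUP_ID column and the groupID index field are placeholders taken from Chantal's example, not real names:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class GroupDeleteTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
        // Ask DIH to delete every document of this group before re-adding it.
        row.put("$deleteDocByQuery", "groupID:" + row.get("GROUP_ID"));
        return row;
      }
    }

Wired in via the transformer attribute on the entity. One caveat to verify on your version: the delete fires per row, not once before the whole delta import, so a group touched by many rows gets deleted (harmlessly) more than once.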
Re: ClassCastException from custom request handler
OK, problem solved! Well, worked around. I gave up on the new-style plugin loading in a multicore Jetty setup, and packaged up my plugin in a rebuilt solr.war. I had tried this before, but only putting the class files in WEB-INF/lib. If I put a jar file in there, it works.

2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de: James Brady wrote: Yeah, I was thinking T would be SolrRequestHandler too. Eclipse's debugger can't tell me... You could try disassembling. Or: Eclipse opens classes in a very rudimentary format when there is no source code attached. Maybe it shows the actual return value there, instead of T. Lots of other handlers are created with no problem before my plugin falls over, so I don't think it's a problem with T not being what we expected. Do you know of any working examples of plugins I can download and build in my environment to see what happens? No, sorry. I've only overwritten the EntityProcessor from DataImportHandler, and that is not configured in solrconfig.xml.

2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de: Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the regular stable release, no svn checkout), lines 80-84:

@SuppressWarnings("unchecked")
protected T create(ResourceLoader loader, String name, String className, Node node) throws Exception {
    return (T) loader.newInstance(className, getDefaultPackages());
}

--
http://twitter.com/goodgravy
512 300 4210
http://webmynd.com/
Sent from Bury, United Kingdom
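[Editor's note] For anyone reproducing the workaround James describes, the jar tool can add a plugin jar to an existing solr.war in place. A sketch; the plugin jar name and build directory are made up:

    # Package the plugin classes, then add the jar to the war's WEB-INF/lib
    jar cf myplugin.jar -C build/classes .
    mkdir -p WEB-INF/lib
    cp myplugin.jar WEB-INF/lib/
    jar uf solr.war WEB-INF/lib/myplugin.jar

The "uf" invocation updates solr.war, storing the file under the relative path it has on disk, which is why the WEB-INF/lib directory is created locally first.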
mergeContiguous for multiple search terms
Hello, we would like to use the highlighting component with the mergeContiguous parameter set to true. We have a field with the value "Ökonom Charles Goodhart". If we search for all three words, they are found correctly:

<em>Ökonom</em> <em>Charles</em> <em>Goodhart</em>

But, as I set the mergeContiguous parameter to true, I expected:

<em>Ökonom Charles Goodhart</em>

Am I misunderstanding the behaviour of this parameter? We are using the dismax query parser and solr-1.3. Thank you very much for your time. Björn Hachmann
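[Editor's note] For reference, the request being discussed would look roughly like the sketch below; the field name is invented:

    http://localhost:8983/solr/select?qt=dismax&q=Ökonom Charles Goodhart
        &hl=true&hl.fl=myfield&hl.mergeContiguous=true

As documented on the wiki, hl.mergeContiguous collapses contiguous *fragments* into a single fragment; it does not merge separately highlighted terms inside one fragment, which would explain the output Björn sees.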
help getting started with spell check dictionary
Hi, I have downloaded a dictionary in plain-text format from http://icon.shef.ac.uk/Moby/mwords.html and added it to my /mnt directory. I then tried to add:

<lst name="dictionary">
  <str name="name">external</str>
  <str name="type">org.apache.solr.spelling.FileBasedSpellChecker</str>
  <str name="sourceLocation">/mnt/dictionary.txt</str>
  <str name="fieldType">text</str>
</lst>

within the <requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy"> block. I thought it would be as easy as running a query like:

http://localhost:8983/solr/select/?q=cancr&spellcheck=true&spellcheck.build=true

to get it to work. Can anyone tell me what steps I am missing here? Thanks for any help. I was trying to get the idea from the example here: https://issues.apache.org/jira/browse/SOLR-572 after reading through http://wiki.apache.org/solr/SpellCheckComponent

-- Regards, Ian Connor
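[Editor's note] The SpellCheckComponent page Ian cites configures the file-based checker as a search component rather than inside the older SpellCheckerRequestHandler. A sketch of that setup; the spellcheck index directory name is invented:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">external</str>
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="sourceLocation">/mnt/dictionary.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">./spellcheckerExternal</str>
      </lst>
    </searchComponent>

    http://localhost:8983/solr/select/?q=cancr&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=external

The query must go through a handler that lists the spellcheck component in its last-components, and spellcheck.dictionary selects the named dictionary; spellcheck.build=true is needed once to build the index from the file.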
SolrJ and ISO-8859-1
Hello, Is it possible to change the encoding of the SolrJ request and response? Regards, Rene
Index rebuilding.
Hi All, Am I right in saying that the Solr index is rebuilt when the 'commit' command is sent? Let's suppose yes. For instance, I have a Solr index with 1M documents, and then I commit one more million documents. Here are some questions:
- Will this (second) commit take longer than the first one? Much longer?
- Will it use some drive space for temporary data while rebuilding the index, which is then freed? How much?
- Is it possible to perform searches while this rebuilding is in progress?
Thanks!
Re: DisMax - fetching dynamic fields
My bad! Please disregard this post. Alex On Tue, Aug 4, 2009 at 9:21 PM, Alexey Serbaase...@gmail.com wrote: Solr 1.4 built from trunk revision 790594 ( 02 Jul 2009 ) On Tue, Aug 4, 2009 at 9:19 PM, Alexey Serbaase...@gmail.com wrote: Hi everybody, I have a couple of dynamic fields in my schema, e.g. rating_* popularity_* The problem I have is that if I try to specify existing fields rating_1 popularity_1 in fl parameter - DisMax handler just ignores them whereas StandardRequestHandler works fine. Any clues what's wrong? Thanks in advance, Alex
Re: Index rebuilding.
On Wed, Aug 5, 2009 at 8:21 PM, caezar caeza...@gmail.com wrote: Am I right in saying that the Solr index is rebuilt when the 'commit' command is sent? Let's suppose yes. For instance, I have a Solr index with 1M documents, and then I commit one more million documents. Here are some questions:

- Will this (second) commit take longer than the first one? Much longer?

When you do the second commit, the auto-warming of caches and/or queries on newSearcher may take longer. Also, during indexing segments may get merged, which may add some time.

- Will it use some drive space for temporary data while rebuilding the index, which is then freed? How much?

No. Commit should not need extra drive space. An optimize may need additional space temporarily. But it is always good to have extra free space on the disk.

- Is it possible to perform searches while this rebuilding is in progress?

Yes.

--
Regards, Shalin Shekhar Mangar.
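[Editor's note] The two commands being contrasted, in curl form; host and port are the example defaults:

    # Commit: makes pending documents visible; needs no significant extra disk space.
    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'

    # Optimize: merges down to a single segment; may transiently need roughly 2x the index size on disk.
    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'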
Re: Wild card search does not return any result
Thanks Otis and Avlesh, Below is the configuration I have.

1] solrconfig.xml

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-import.xml</str>
  </lst>
</requestHandler>
...
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">SPELL</str>
    <str name="spellcheckIndexDir">./spellcheckerIndex</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

2] data-import.xml

<document name="doc">
  <entity name="user" pk="ID" query="select * from user">
    <field column="ROLE" name="ROLE" />
    <field column="ID" name="ID" />
    <field column="BUS" name="BUS" />
...

3] schema.xml

<field name="ID" type="float" indexed="true" stored="true" />
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true" />
...
<field name="ID" type="float" indexed="true" stored="true" />
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true" />
<field name="SPELL" type="textSpell" indexed="true" stored="true" multiValued="true"/>
<copyField source="BUS" dest="SPELL" />
...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

To make it simple, I have only one record in the table: ID=1, BUS=ICS, ROLE=SSE. Like I said before, I don't get any match if I search for q=ics*. I get the match, which is the correct result, if I search for q=sse*. I have not done any query rewriting; I am just using the default configuration that comes with Solr. Otis, let me know if you need any more information. Avlesh, the above setup is just a stripped-down version to figure out what the issue is. In my real application, I have hundreds of columns in the table that I use for building the search index. I don't think it's a good option to copy over all the fields and create another 100-odd fields with just the lowercase filter applied.

Parvez

From: Otis Gospodnetic otis_gospodne...@yahoo.com
Date: Tue, Aug 4, 2009 at 8:25 PM
Subject: Re: Wild card search does not return any result
To: solr-user@lucene.apache.org

Hi, I doubt it's a bug. It's probably working correctly based on the config, etc. I just don't have enough details about the configuration, your request handler, query rewriting, the data in your index, etc. to tell you what exactly is happening. Otis

On Tue, Aug 4, 2009 at 11:13 PM, Avlesh Singh avl...@gmail.com wrote: You read it incorrectly, Parvez. The bug that Bill seems to have found is with the analysis tool and NOT the search handler itself. Results in your case are as expected. Wildcard queries are not analyzed, hence the inconsistency. A workaround is suggested on the same thread, here: http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:i5zxdbnvspgek2bp+state:results Cheers, Avlesh

On Wed, Aug 5, 2009 at 12:52 AM, Mohamed Parvez par...@gmail.com wrote: Thanks Otis, The thread suggests that this is a bug: http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k Both SSE and ICS are 3-letter words and both are not part of the English language. SSE* works
Re: Wild card search does not return any result
Looks like the earlier schema.xml has a typo; below is the correct schema.xml.

3] schema.xml

<field name="ID" type="float" indexed="true" stored="true" />
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true" />
<field name="SPELL" type="textSpell" indexed="true" stored="true" multiValued="true"/>
<copyField source="BUS" dest="SPELL" />
...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Thanks/Regards, Parvez

On Wed, Aug 5, 2009 at 10:53 AM, Mohamed Parvez par...@gmail.com wrote: [...]
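[Editor's note] A sketch of the workaround Avlesh's link points at. Since wildcard terms bypass analysis, keep a lowercase-only copy of the searchable field and lowercase the wildcard term on the client before querying; field and type names here are invented:

    <fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="BUS_LC" type="text_lc" indexed="true" stored="false"/>
    <copyField source="BUS" dest="BUS_LC"/>

A query such as q=BUS_LC:ics* then matches the indexed lowercase token "ics". This does not require duplicating all 100-odd columns: a single catch-all lowercase field fed by copyField from each searchable column is usually enough for the wildcard case.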
Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle
A big thanks to everyone who came out despite the heat! Hope to see you again the last week of August, probably at UW.

On Wed, Jul 29, 2009 at 4:52 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Don't forget this is tonight! Excited to see everyone there.

On Tue, Jul 28, 2009 at 11:25 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey everyone, SLIGHT change of plans. A few people have asked me to move to a place with air conditioning, since the temperature's in the 90's this week. So, here we go: Big Time Brewing Company, 4133 University Way NE, Seattle, WA 98105. Call me at 904-415-3009 if you have any questions.

On Mon, Jul 27, 2009 at 12:16 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hello again! Yes, I know some of us are still recovering from OSCON. It's time for another delicious meetup to chat about Hadoop, HBase, Solr, Lucene, and more! UW is quite a pain for us to access until August, so we're changing the venue to one pretty close: Piccolo's Pizza, 5301 Roosevelt Way NE (between 53rd St & 55th St), 6:45pm - 8:30 (or when we get bored)! As usual, people are more than welcome to give talks, whether they're long-format or lightning. I'd also really like to start thinking about hackathons; perhaps we could have one next month? I'll be talking about HBase .20 and the possibility of low-latency HBase analytics. I'd be very excited to hear what people are up to! Contact me if there's any questions: 904-415-3009 Cheers, Bradford

--
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Limit of Index size per machine..
Hi, We are planning to use Solr for indexing server log contents. The expected processed log file size per day is 100 GB, and we expect to retain these indexes for 30 days (100*30 ~ 3 TB). Can anyone advise what the optimal index size per server would be without hampering search performance? We are planning to use OS X servers with 16 GB of RAM (can go to 24 GB). We need to figure out how many servers are required to handle this amount of data. Any help would be greatly appreciated. Thanks, SilentSurfer
Re: solr 1.3: bug in phps response writer
Hey Otis, I don't think this issue has been solved yet. I am working with the Solr 1.3 release and I still get the same exception as the original post. I have the Solr 1.3 release with the localsolr jars. Any advice is helpful; for now I will use the JSON response writer and work around this bug. Thanks -- take care

Otis Gospodnetic wrote: Hi Alok, I don't think it's a known issue, and 2a) sounds like the best and most appreciated approach! :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

From: Alok Dhir ad...@symplicity.com
To: solr-user@lucene.apache.org
Sent: Monday, November 17, 2008 12:36:25 PM
Subject: solr 1.3: bug in phps response writer

Distributed queries:

curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'
curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml'
curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json'

All work fine, providing identical results in their respective formats (note the change in the wt param).

curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps'

fails with:

java.lang.IllegalArgumentException: Map size must not be negative
    at org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195)
    at org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392)
    at org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547)
    at org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147)
    at org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150)
    at org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71)
    at org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66)
    at org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Questions: 1) Is this known? I didn't see it in the issue tracker. 2) What's the better course of action: a) download source, fix, submit patch, wait for new release; b) drop phps and use json instead? Thanks
Re: Limit of Index size per machine..
I try to keep the index directory size less than the amount of RAM and rely on the OS to cache as it needs. Linux does a pretty good job here and I am sure OS X will do a good job also. Distributed search will be your friend here, so you can chunk the index up across a number of servers to keep your cost down (2GB RAM sticks are much cheaper than 4GB RAM sticks: roughly $20 vs. $100).

Ian.

On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer silentsurfe...@yahoo.com wrote: [...]

--
Regards, Ian Connor
1 Leighton St #723, Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1 (770) 818 5697
Skype: ian.connor
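[Editor's note] A sketch of what the distributed-search side looks like once the index is split across machines; hostnames, core paths, and the query term are placeholders. Each shard is a normal Solr instance holding part of the index, and any of them can aggregate a query:

    http://host1:8983/solr/select?q=ERROR&shards=host1:8983/solr,host2:8983/solr,host3:8983/solr

The shards parameter and its limitations are described at http://wiki.apache.org/solr/DistributedSearch.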
RE: 99.9% uptime requirement
Maintenance Questions: In a two slave one master setup where the two slaves are behind load balancers what happens if I have to restart solr? If I have to restart solr say for a schema update where I have added a new field then what is the recommended procedure? If I can guarantee no commits or optimizes happen on the master during the schema update so no new snapshots become available then can I safely leave rsyncd enabled? When I stop and start a slave server, should I first pull it out of the load balancers list or will solr gracefully release connections as it shuts down so no searches are lost? What do you guys do to push out updates? Thanks for any thoughts, Robi -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Tuesday, August 04, 2009 8:57 AM To: solr-user@lucene.apache.org Subject: Re: 99.9% uptime requirement Right. You don't get to 99.9% by assuming that an 8 hour outage is OK. Design for continuous uptime, with plans for how long it takes to patch around a single point of failure. For example, if your load balancer is a single point of failure, make sure that you can redirect the front end servers to a single Solr server in much less than 8 hours. Also, think about your SLA. Can the search index be more than 8 hours stale? How quickly do you need to be able to replace a failed indexing server? You might be able to run indexing locally on each search server if they are lightly loaded. wunder On Aug 4, 2009, at 7:11 AM, Norberto Meijome wrote: On Mon, 3 Aug 2009 13:15:44 -0700 Robert Petersen rober...@buy.com wrote: Thanks all, I figured there would be more talk about daemontools if there were really a need. I appreciate the input and for starters we'll put two slaves behind a load balancer and grow it from there. Robert, not taking away from daemon tools, but daemon tools won't help you if your whole server goes down. don't put all your eggs in one basket - several servers, load balancer (hardware load balancers x 2, haproxy, etc) and sure, use daemon tools to keep your services running within each server... B _ {Beto|Norberto|Numard} Meijome Why do you sit there looking like an envelope without any address on it? Mark Twain I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
enablereplication does not work
Hi,

http://localhost:8549/solr/replication?command=enablereplication does not seem to work. After making the request, I run http://localhost:8549/solr/replication?command=indexversion and here is the response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>

Notice the indexversion is 0, which is the value after you disable replication. On the other hand, http://localhost:8549/solr/replication?command=details returns:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
  </lst>
  <lst name="details">
    <str name="indexSize">692 bytes</str>
    <str name="indexPath">/tmp/solr/solrdata/index</str>
    <arr name="commits"/>
    <str name="isMaster">true</str>
    <str name="isSlave">false</str>
    <long name="indexVersion">1249517184279</long>
    <long name="generation">2</long>
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Notice that the indexversion is 1249517184279.

thanks,
-- J
Re: Limit of Index size per machine..
Hi, That means we need approximately 3000 GB (index size) / 24 GB (RAM) = 125 servers. It would be very hard to convince my org to go for 125 servers for log management of 3 terabytes of indexes. Has anyone used Solr for processing and handling indexes on the order of 3 TB? If so, how many servers were used for indexing alone? Thanks, sS

--- On Wed, 8/5/09, Ian Connor ian.con...@gmail.com wrote: [...]
Re: Limit of Index size per machine..
That is why people don't use search engines to manage logs. Look at a Hadoop cluster. wunder

On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote: [...]
Re: Limit of Index size per machine..
Hi, We initially went down the Hadoop path, but as it is one more software-based file system on top of the OS file system, we didn't get buy-in from our system engineers. That is, in case we run into any HDFS issues, the SEs won't be supporting us :( Regards, sS

--- On Thu, 8/6/09, Walter Underwood wun...@wunderwood.org wrote: [...]
Re: enablereplication does not work
How is the ReplicationHandler configured? If there was no commit/optimize, then it would show the version as '0'.

On Thu, Aug 6, 2009 at 5:50 AM, solr jay solr...@gmail.com wrote: [...]

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
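[Editor's note] For reference, a minimal master-side ReplicationHandler configuration of the kind being asked about; the confFiles list and the startup trigger are shown only as illustration:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

The indexversion command reports the last replicatable commit point, so with replicateAfter=commit and no commit since startup it can legitimately return 0, which matches Paul's question about whether a commit or optimize has actually happened.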