Re: Replication clients logs in solr 1.4
Oops, my mistake. Those logs actually exist only in the script-based replication distribution of Solr 1.3. And the config files synchronize only on change. J.

2010/1/20 Jérôme Etévé : > Hi All, > > I'm using the build in replication with master/slave(s) Solr and the > indices are replicating just fine. > > Just something troubles me: > > Nothing happens in my logs/ directory .. > On the slave(s), no logs/snapshot.current file. > And on the master, nothing either appears on logs/clients/ > > The logs directories belongs to the tomcat running solr and are writable > > Another thing I noticed is I've got some timesFailed=18 in the slave > replication.properties, although I cannot see any error in my > catalina.out :(, I just have: > 20-Jan-2010 16:11:00 org.apache.solr.handler.SnapPuller fetchLatestIndex > INFO: Slave in sync with master > > Is there any reason for this? > > What I also don't get is that no documents are being updated on my > master, the index versions are the same on my slave and master and > still timesFailed is increasing continuously. > > The master config files seems to fail to synchronize as well. > > Thanks for any help. > > Jerome. > > > > > -- > Jerome Eteve. > http://www.eteve.net > jer...@eteve.net > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
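For anyone hitting the same confusion: the snapshot.current and logs/clients/ files belong to the old script-based (rsync) replication shipped with Solr 1.3; the Java replication used here is configured entirely through the ReplicationHandler in solrconfig.xml. A minimal sketch of that setup (the host name, port and list of config files are illustrative, not taken from this thread):

  <!-- master solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <!-- config files are shipped to slaves only when their content changes -->
      <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8080/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Replication status then lives in the admin pages and in replication.properties on the slave, not under logs/.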
Replication clients logs in solr 1.4
Hi All, I'm using the built-in replication with master/slave(s) Solr and the indices are replicating just fine. Just one thing troubles me: nothing happens in my logs/ directory. On the slave(s), there is no logs/snapshot.current file, and on the master, nothing appears in logs/clients/ either. The logs directories belong to the Tomcat user running Solr and are writable.

Another thing I noticed is that I've got timesFailed=18 in the slave's replication.properties, although I cannot see any error in my catalina.out :(, I just have:

20-Jan-2010 16:11:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master

Is there any reason for this? What I also don't get is that no documents are being updated on my master, the index versions are the same on my slave and master, and still timesFailed keeps increasing. The master config files seem to fail to synchronize as well.

Thanks for any help. Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: exact match lookup
If feedClass acts as an identifier, better use string :) use sort=title asc,score desc (not sort:) J. 2009/11/4 Joel Nylund : > thank worked for me, changed to: > > http://localhost:8983/solr/select?q=feedClass:%22social%20news%22 > > and the matches are correct, I changed the feedClass field back to type > text. > > A followup question has to do with sorting these results. > > I have a field called title that I want the results sorted by. > > http://localhost:8983/solr/select?q=feedClass:%22social%20news%22&sort:title%20asc > > I tried this and the results are not sorted (they seem random) > > any ideas? > > thanks > Joel > > > > − > > 0 > 1 > − > > feedClass:"social news" > > > > − > > − > > Social News > F > Far > > > Social News > D > dig > > > Social News > T > Tech > > > Social News > M > Mix > > > > > On Nov 4, 2009, at 12:15 PM, Jérôme Etévé wrote: > >> Hi, >> you need to quote your phrase when you search for 'Social News': >> >> feedClass:"Social News" (URI encoded of course). >> >> otherwise your request will become (I assume you're using a standard >> query parser) feedClass:Social defaultField:News . Well that's the >> idea. >> >> It should then work using the type string. >> >> Cheers! >> >> J. >> >> >> 2009/11/4 Joel Nylund : >>> >>> Hi, >>> >>> I have a field that I want to do exact match lookups using. >>> (when I say exact match, im looking for equivalent to a sql query where >>> with >>> no like clause so where feedClass = "Social News") >>> >>> For example the field is called feedClass and im doing: >>> >>> http://localhost:8983/solr/select?q=feedClass:Blog >>> >>> http://localhost:8983/solr/select?q=feedClass:Social%20News >>> >>> I tried using "text" and it seems to work pretty well except for classes >>> with spaces in them. >>> >>> So I tried using field type string, that didnt work. Then I tried >>> defining a >>> new type called: >>> >>> >> positionIncrementGap="100"> >>> >>> >>> >>> This didnt seem to help either. >>> >>> When I do these queries for this field with spaces, I seem to get random >>> results >>> >>> For example: >>> >>> >>> − >>> >>> 0 >>> 5 >>> − >>> >>> feedClass:Social News >>> >>> >>> − >>> >>> − >>> >>> Blog >>> N >>> >>> >>> >>> any ideas? >>> >>> thanks >>> Joel >>> >>> >> >> >> >> -- >> Jerome Eteve. >> http://www.eteve.net >> jer...@eteve.net > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
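To make the advice above concrete, here is a rough schema.xml sketch. feedClass and title are the fields from the thread, while title_sort is a hypothetical extra field added because sorting requires a single-valued, untokenized field:

  <field name="feedClass"  type="string" indexed="true" stored="true"/>
  <field name="title_sort" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_sort"/>

The request then uses a sort parameter rather than sort:, e.g. ...&q=feedClass:"social news"&sort=title_sort asc,score desc (with the spaces and quotes URL-encoded).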
Re: character encoding issue
Hi, How do you post your data to solr? If it's by posting XML, then it should be properly encoded in UTF-8 (which is the XML default). Regardless of what's in the DB (which can be a mystery with MySQL). At query time, if the XML writer is used, then it's encoded in UTF-8. If the json one is used, I think it's the same. Because json is unicode compliant by nature (javascript). According to what you say, I would bet for a PHP problem. It seems PHP takes the correct UTF8 octets from solr and displays them as latin1 encoding (hence the strange characters). You need to - either output your pages in UTF-8 - or decode the octets given by solr to a unicode string and let it be encoded as latin1 for output (with the risk of loosing non-latin1 encodable characters). I hope it helps. J. 2009/11/4 Jonathan Hendler : > Hi Peter, > > I have the same set of issues and will look for a response here. > > Sometimes those other chars can be create at the time of input (like > extraction from a Microsoft Office doc from third part tool for example). > But MySQL looking OK in the browser might be because the encoding of MySQL > was not the same as the original text. Say for example that the collation of > MySQL is Latin, and the document was UTF-8. When a browser renders, it might > assume chars are UTF-8, but SOLR might be taking the table type literally in > the DIH (Latin1 Swedish for example). Could also be the way PHP doesn't > handle UTF-8 well and it depends on your client. > > Don't think it has anything to do with Jetty - I use Resin. > > Hope that helps, > > - Jonathan > > > On Nov 4, 2009, at 8:48 AM, Peter Hedlund wrote: > >> I'm having a problem with character encoding. The data that I'm indexing >> with SOLR is being pulled from a MySQL database and then the index is being >> integrated into a PHP application. When I display the text from the SOLR >> index it's full of strange characters (–, é, etc...). However, when I >> bypass SOLR and access the data from the MySQL table directly and write to >> the browser I don't see any problems with em-dashes and accented characters. >> >> Is this a JETTY issue or a SOLR issue or something else? (It's not simply >> an issue of including > content="text/html;charset=UTF-8"> either) >> >> Thanks for any help. >> >> Peter Hedlund >> >> > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: exact match lookup
Hi, you need to quote your phrase when you search for 'Social News': feedClass:"Social News" (URI encoded of course). otherwise your request will become (I assume you're using a standard query parser) feedClass:Social defaultField:News . Well that's the idea. It should then work using the type string. Cheers! J. 2009/11/4 Joel Nylund : > Hi, > > I have a field that I want to do exact match lookups using. > (when I say exact match, im looking for equivalent to a sql query where with > no like clause so where feedClass = "Social News") > > For example the field is called feedClass and im doing: > > http://localhost:8983/solr/select?q=feedClass:Blog > > http://localhost:8983/solr/select?q=feedClass:Social%20News > > I tried using "text" and it seems to work pretty well except for classes > with spaces in them. > > So I tried using field type string, that didnt work. Then I tried defining a > new type called: > > positionIncrementGap="100"> > > > > This didnt seem to help either. > > When I do these queries for this field with spaces, I seem to get random > results > > For example: > > > − > > 0 > 5 > − > > feedClass:Social News > > > − > > − > > Blog > N > > > > any ideas? > > thanks > Joel > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: Lock problems: Lock obtain timed out
Hi, It seems this situation is caused by some "No space left on device" exceptions:

SEVERE: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)

I'd better try to set my maxMergeDocs and mergeFactor to more adequate values for my app (I'm indexing ~15 GB of data on a 20 GB device, so I guess there's a problem when Solr tries to merge the index segments being built). At the moment, they are set to 100 and 2147483647.

Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
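For reference, both settings live in the <mainIndex> (and <indexDefaults>) section of solrconfig.xml; the values below are only illustrative, not a recommendation for this particular 20 GB device:

  <mainIndex>
    <!-- a lower mergeFactor merges fewer segments at once, which reduces the
         transient disk space needed while a merge (or an optimize) is running -->
    <mergeFactor>10</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
  </mainIndex>

Note that an optimize of a large index can still temporarily need extra free disk space on the order of the index size itself.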
Lock problems: Lock obtain timed out
Hi, I've got a few machines that post documents concurrently to a solr instance. They do not issue the commit themselves; instead, I've got autocommit set up on the solr server side: 5 6

This usually works fine, but sometimes the server goes into a deadlock state. Here are the errors I get from the log (these go on forever until I delete the index and restart everything from scratch):

02-Nov-2009 10:35:27 org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
... [ multiple messages like this ] ...
02-Nov-2009 10:35:27 org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/solrdata/jobs/index/lucene-703db99881e56205cb910a2e5fd816d3-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1538)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1395)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

I'm wondering what the reason for this could be (if a commit takes more than 60 seconds, for instance?), and whether I should use better locking or autocommitting options. Here's the locking conf I've got at the moment: 1000 1 native

I'm using solr trunk from 12 oct 2009 within tomcat. Thanks for any help. Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
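The autocommit and lock settings quoted above lost their XML markup in the archive; they live in solrconfig.xml in roughly the following shape (the numeric values here are illustrative, not the original ones):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>5000</maxDocs>    <!-- commit once this many documents are pending -->
      <maxTime>60000</maxTime>   <!-- or once the oldest pending document is this many ms old -->
    </autoCommit>
  </updateHandler>

  <mainIndex>
    <writeLockTimeout>1000</writeLockTimeout>
    <lockType>native</lockType>
  </mainIndex>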
Re: Slow Commits
Hi, here's two thing that can slow down commits: 1) Autowarming the caches. 2) The Java old generation object garbage collection. You can try: - Turning autowarming off (set autowarmCount="0" in the caches configuration) - If you use the sun jvm, use -XX:+UseConcMarkSweepGC to get a less blocking garbage collection. You may also try to: - Not wait for the new searcher when you commit. The commit will then be instant from your posting application point of view. ( option waitSearcher=false ). - Leave the commits to the server ( by setting autocommits in the solrconfig.xml). This is the best strategy if you've got lot of concurrent processes who posts. Cheers. Jerome. 2009/10/28 Jim Murphy : > > Hi All, > > We have 8 solr shards, index is ~ 90M documents 190GB. :) > > 4 of the shards have acceptable commit time - 30-60 seconds. The other 4 > have drifted over the last couple months to but up around 2-3 minutes. This > is killing our write throughput as you can imagine. > > I've included a log dump of a typical commit. Not the large time period > (3:40) between the start commit log message and the OnCommit log message. > So, I think warming issues are not relevant. > > Any ideas what to debug at this point? > > I'm about to issue an optimize and see where that goes. Its been a while > since I did that. > > Cheers, > > Jim > > > > > Oct 28, 2009 11:47:02 AM org.apache.solr.update.DirectUpdateHandler2 commit > INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) > Oct 28, 2009 11:50:43 AM org.apache.solr.core.SolrDeletionPolicy onCommit > INFO: SolrDeletionPolicy.onCommit: commits:num=2 > > commit{dir=/master/data/index,segFN=segments_8us4,version=1228872482131,generation=413140,filenames=[segments_8us4, > _alae.fnm, _ai > lk.tis, _ala9.fnm, _ala9.fdx, _alac.fnm, _al9w_h.del, _alab.prx, _ala9.fdt, > _a61p_b76.del, _alab.fnm, _al8x.frq, _al7i_2f.del, _akh1.tis, > _add1.frq, _alae.tis, _alad_1.del, _alaa.fnm, _alad.nrm, _al9w.frq, > _alae.tii, _ailk.tii, _add1.tis, _alac.tii, _akuu.tis, _add1.tii, _ail > k.frq, _alac.tis, _7zfh.tii, _962y.tis, _ala7.frq, _ah91.prx, _akuu.tii, > _alab_3.del, _ah91.fnm, _7zfh.tis, _ala8.frq, _962y.tii, _alae.pr > x, _a61p.fdt, _akuu.frq, _a61p.fdx, _al7i.fdx, _al2o.tis, _al9w.tis, > _ala7.fnm, _a61p.frq, _akzu.fnm, _9wzn.fnm, _akh1.prx, _al7i.fdt, _al > a9_2.del, _962y.prx, _al7i.prx, _al9w.tii, _alaa_4.del, _al7i.frq, > _ah91.tii, _ala8.nrm, _962y.fdt, _add1_62u.del, _alae.nrm, _ah91.tis, _ > 962y.fdx, _akh1.fnm, _al8x.prx, _al2o.tii, _ala7.fdx, _ala9.prx, _ala7.fdt, > _al9w.prx, _ala8.prx, _akh1.tii, _al2o.fdx, _7zfh.frq, _alac_3 > .del, _akzu.tii, _akzu.fdt, _alad.fnm, _akzu.tis, _alab.nrm, _akzu.fdx, > _al2o.fnm, _al2o.fdt, _alaa.prx, _alaa.nrm, _962y.fnm, _ala7.prx, > _alaa.tis, _ailk.fdt, _akzu_8d.del, _alac.frq, _akzu.prx, _ala9.nrm, > _ailk.prx, _ala9.tis, _alaa.tii, _alae.frq, _add1.fnm, _7zfh.prx, _al > 9w.fnm, _ala9.tii, _ala9.frq, _962y.nrm, _alab.frq, _ala8.fdx, _al8x.fnm, > _a61p.prx, _7zfh.fnm, _ala8.fdt, _ailk.fdx, _alaa.frq, _7zfh.fdx > , _al7i.tis, _ah91.fdt, _ailk.fnm, _9wzn_i0m.del, _ah91.fdx, _al7i.tii, > _ailk_24j.del, _alad.fdx, _al8x.tii, _alae.fdx, _add1.prx, _akuu.f > nm, _al8x.tis, _ah91.frq, _ala8.fnm, _7zfh.fdt, _alad.fdt, _alae_1.del, > _alae.fdt, _akzu.frq, _a61p.fnm, _9wzn.frq, _ala8.tii, _7zfh_1gsd. > del, _7zfh.nrm, _ala7_6.del, _a61p.tis, _9wzn.tii, _alad.frq, _alad.tii, > _akuu.fdt, _alab.tii, _ala8.tis, _962y_xgg.del, _akh1.frq, _akuu. 
> fdx, _alab.tis, _al7i.fnm, _alad.tis, _alac.nrm, _alab.fdx, _ala8_5.del, > _add1.fdx, _ala7.tii, _akuu_cc.del, _alab.fdt, _9wzn.prx, _alaa.f > dx, _al9w.fdt, _al2o.frq, _akh1_nf.del, _alac.prx, _akh1.fdx, _alaa.fdt, > _al9w.fdx, _al8x_17.del, _add1.fdt, _al2o.prx, _akh1.fdt, _alad.p > rx, _akuu.prx, _962y.frq, _al2o_66.del, _alac.fdt, _ala7.tis, _a61p.tii, > _alac.fdx, _al8x.fdt, _9wzn.tis, _9wzn.fdt, _al8x.fdx, _9wzn.fdx, > _ah91_35l.del] > > commit{dir=/master/data/index,segFN=segments_8us5,version=1228872482132,generation=413141,filenames=[_ala9.fnm, > _alaa_5.del, _alab > .fnm, _962y_xgh.del, _al8x.frq, _akh1.tis, _add1.frq, _alae.tis, > _7zfh_1gse.del, _alad.nrm, _alae.tii, _akuu.tis, _ah91_35m.del, _ailk.frq > , _7zfh.tii, _962y.tis, _akuu.tii, _ah91.prx, _7zfh.tis, _ala8.frq, > _962y.tii, _ala7.fnm, _akzu.fnm, _9wzn.fnm, _ala9_2.del, _ala8.nrm, _a > laf.fnm, _alae.nrm, _ala9.prx, _ailk_24k.del, _alaf.prx, _al9w.prx, > _ala8.prx, _akh1.tii, _akzu.tii, _akzu.tis, _alad.fnm, _al2o.fnm, _962 > y.fnm, _al8x_18.del, _ala7_7.del, _alaa.tis, _ala9.nrm, _ala9.tis, > _alaa.tii, _962y.nrm, _ala9.tii, _a61p.prx, _add1_62v.del, _al8x.fnm, _ > 7zfh.fnm, _al7i_2g.del, _ailk.fnm, _al8x.tii, _al8x.tis, _ala8.fnm, > _akzu.frq, _9wzn.frq, _7zfh.nrm, _akuu.fdt, _alad.tii, _akuu.fdx, _aku > u_cd.del, _a61p_b77.del, _alad.tis, _al2o_67.del, _add1.fdx, _9wzn.prx, > _al9w.fdt, _add1.fdt, _al9w.fdx, _akuu.prx, _962y.frq, _9wzn.fdt, > _alab_4.del, _9wzn.fdx, segments_8us5, _alac_4.del, _alae.fnm, _ailk
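A sketch of two of the suggestions above. A posting application can ask for a commit that does not wait for the new searcher to be warmed and registered by sending this message to the /update handler (the attribute names are the standard ones; the choice of values is up to you):

  <commit waitFlush="true" waitSearcher="false"/>

Alternatively, remove explicit commits from the posting applications entirely and configure <autoCommit> in solrconfig.xml so the server decides when to commit.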
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Mea maxima culpa, I had foolishly set the option omitTermFreqAndPositions="true" in an attempt to save space. It works when this is set back to 'false'. However, even when it's set to 'true', the highlighting of a field continues to work even though the search doesn't. Does the highlighter use a different strategy to match the query terms in the fields? Cheers! Jerome.

2009/10/27 Jérôme Etévé : > Actually here is the difference between the textgen analysis pipeline and our: > > For the phrase "ingenieur d'affaire senior" , > Our pipeline gives right after our tokenizer: > > term position 1 2 3 4 > term text ingenieur d affaire senior > > 'd' and 'affaire' are separated as different tokens straight away. Our > filters have no later effect for this phrase. > > * The textgen pipeline uses a whitespace tokenizer, so it gives first: > term position 1 2 3 > term text ingenieur d'affaire senior > term type wordwordword > source start,end0,9 10,19 20,26 > > * Then a word delimiter filter splits the token "d'affaire" (and > generate the concatenation): > erm position1 2 3 4 > term text ingenieur d affaire senior > daffaire > term type wordwordwordword > word > source start,end0,9 10,11 12,19 20,26 > 10,19 > > > Could you see a reason why title:"d affaire" works with textgen but > not with our type?
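For reference, the option in question is set per field (or per field type) in schema.xml; the field name and type below are illustrative:

  <!-- saves space, but phrase queries against this field can no longer match,
       because the postings carry no position information -->
  <field name="title" type="textgen" indexed="true" stored="true"
         omitTermFreqAndPositions="true"/>

The standard highlighter re-analyzes the stored field value instead of reading positions from the index, which would explain why highlighting keeps working while the phrase search does not.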
Re: facet.query and fq
Hi, you need to 'tag' your filter and then exclude it from the faceting. An example here: http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters J. 2009/10/27 David Giffin : > Hi There, > > Is there a way to get facet.query= to ignore the fq= param? We want to > do a query like this: > > select?fl=*&start=0&q=cool&fq=in_stock:true&facet=true&facet.query=in_stock:false&qt=dismax > > To understand the count of items not in stock, when someone has > filtered items that are in stock. Or is there a way to combine two > queries into one? > > Thanks, > David > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
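Applied to the query in the question, the tagged filter and excluding facet query look roughly like this (broken across lines for readability; everything needs the usual URL encoding):

  select?fl=*&start=0&q=cool&qt=dismax
        &fq={!tag=stock}in_stock:true
        &facet=true
        &facet.query={!ex=stock}in_stock:false

The {!tag=stock} local param labels the filter, and {!ex=stock} tells that facet.query to compute its count as if that filter were not applied.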
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Actually here is the difference between the textgen analysis pipeline and our:

For the phrase "ingenieur d'affaire senior", our pipeline gives, right after our tokenizer:

term position:  1          2   3        4
term text:      ingenieur  d   affaire  senior

'd' and 'affaire' are separated as different tokens straight away. Our filters have no later effect for this phrase.

* The textgen pipeline uses a whitespace tokenizer, so it gives first:

term position:     1          2          3
term text:         ingenieur  d'affaire  senior
term type:         word       word       word
source start,end:  0,9        10,19      20,26

* Then a word delimiter filter splits the token "d'affaire" (and generate the concatenation):

term position:     1          2         3        4
term text:         ingenieur  d         affaire  senior
                              daffaire
term type:         word       word      word     word
                              word
source start,end:  0,9        10,11     12,19    20,26
                              10,19

Could you see a reason why title:"d affaire" works with textgen but not with our type?

Thanks! Jerome.

2009/10/27 Jérôme Etévé : > Hum, > That's probably because of our own customized types/tokenizers/filters. > > I tried reindexing and querying our data using the default solr type > 'textgen' and it works fine. > > I need to investigate which features of the new lucene 2.9 API is not > implemented in our own tokenizers etc... > > Thanks. > > Jerome. > > 2009/10/27 Yonik Seeley : >> On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé wrote: >>> I don't really get why these two tokens are subsequently put together >>> in a phrase query. >> >> That's the way the Lucene query parser has always worked... phrase >> queries are made if multiple tokens are produced from one field query. >> >>> In solr 1.3, it didn't seem to be a problem though. title:"d affaire" >>> matches document where title contains "d'affaire" and all is fine. >> >> This should not have changed between 1.3 and 1.4... >> What's the fieldType and it's definition for your title field? >> >> -Yonik >> http://www.lucidimagination.com >> > > > > -- > Jerome Eteve. > http://www.eteve.net > jer...@eteve.net > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
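For comparison, the textgen pipeline being described is roughly the following (abridged from the Solr 1.4 example schema; the stop-word filter and the slightly different query-time word-delimiter settings are left out here):

  <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- splits "d'affaire" into "d" / "affaire" and also emits the catenated "daffaire" -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="0"
              splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>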
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Hum, That's probably because of our own customized types/tokenizers/filters. I tried reindexing and querying our data using the default solr type 'textgen' and it works fine. I need to investigate which features of the new lucene 2.9 API is not implemented in our own tokenizers etc... Thanks. Jerome. 2009/10/27 Yonik Seeley : > On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé wrote: >> I don't really get why these two tokens are subsequently put together >> in a phrase query. > > That's the way the Lucene query parser has always worked... phrase > queries are made if multiple tokens are produced from one field query. > >> In solr 1.3, it didn't seem to be a problem though. title:"d affaire" >> matches document where title contains "d'affaire" and all is fine. > > This should not have changed between 1.3 and 1.4... > What's the fieldType and it's definition for your title field? > > -Yonik > http://www.lucidimagination.com > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Multifield query parser and phrase query behaviour from 1.3 to 1.4
Hi All, I'm using a multifield query parser to generate weighted queries across different fields. For instance, perl developer gives me:

+(title:perl^10.0 keywords:perl company:perl^3.0) +(title:developer^10.0 keywords:developer company:developer^3.0)

Either in solr 1.3 or solr 1.4 (from 12 oct 2009), a query like "d'affaire" gives me:

title:"d affaire"^10.0 keywords:"d affaire" company:"d affaire"^3.0

nb: "d" is not a stopword.

That's the first thing I don't get: since "d'affaire" is parsed as two separate tokens 'd' and 'affaire', why do these phrase queries appear? When I use the analysis interface of solr, "d'affaire" gives (for query or indexing, since the analyzer is the same):

term position:     1     2
term text:         d     affaire
term type:         word  word
source start,end:  0,1   2,9

You can't see it in this email, but 'd' and 'affaire' are both purple, indicating a match with the query tokens. I don't really get why these two tokens are subsequently put together in a phrase query.

In solr 1.3, it didn't seem to be a problem though: title:"d affaire" matches documents where title contains "d'affaire" and all is fine. That's the behaviour we should expect, since the title field uses exactly the same analyzer at index and query time. Now that I'm using solr 1.4, title:"d affaire" does not give any results back.

Is there any behaviour change that could be responsible for this, and what's the correct way to fix it? Thanks for your help. Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Where the new replication pulls the files?
Hi all, I'm wondering where a slave puts the files it pulls from the master during replication. Do they go directly into the index/ directory, or somewhere else until the download is complete and then get copied to index/? Cheers! Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: QTime always a multiple of 50ms ?
2009/10/23 Andrzej Bialecki : > Jérôme Etévé wrote: >> >> Hi all, >> >> I'm using Solr trunk from 2009-10-12 and I noticed that the QTime >> result is always a multiple of roughly 50ms, regardless of the used >> handler. >> >> For instance, for the update handler, I get : >> >> INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=0 >> INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=104 >> INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=52 >> ... >> >> Is this a known issue ? > > It may be an issue with System.currentTimeMillis() resolution on some > platforms (e.g. Windows)? I don't know, I'm using linux 2.6.22 and a jvm 1.6.0 -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
QTime always a multiple of 50ms ?
Hi all, I'm using Solr trunk from 2009-10-12 and I noticed that the QTime result is always a multiple of roughly 50ms, regardless of the used handler. For instance, for the update handler, I get : INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=0 INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=104 INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=52 ... Is this a known issue ? Cheers! J. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Disable replication on master while slaves are pulling
Hi there, I'm planning to reindex all my data on my master server every day, so here's what I intend to do on the master:

1 - Disable replication on the master
2 - Empty the index
3 - Reindex everything
4 - Optimize
5 - Enable replication again

There's something I'm wondering about with this strategy: what would happen if a slave has not finished pulling the data when I start step 1? Is there a better strategy for achieving a complete daily reindex?

Thanks! Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
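If it helps, the Solr 1.4 ReplicationHandler exposes HTTP commands for steps 1 and 5; the host and port below are illustrative:

  http://master-host:8080/solr/replication?command=disablereplication
  http://master-host:8080/solr/replication?command=enablereplication

Slaves have matching disablepoll / enablepoll commands if you prefer to pause things from the slave side.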
Re: Is Relational Mapping (foreign key) possible in solr ??
Hi, here's what you could do: * Use multivalued fields instead of 'comma separated values', so you won't need a separator. * Store project identifiers in the user index. Denormalised projects informations in a user entry will fatally need re-indexing lot of user entries when project info changes. * You could have a mixed index with user and project entries in the same index, so if you search for a name, you'd find users and projects matching that name. Jerome. 2009/10/19 ashokcz : > > Hi i browsed through the solr docs and user forums and what i infer is we > cant use solr to store Relational > Mapping(foreign key) in solr . > > but just want to know if any chances of doing the same. > > I have two tables User table (with 1,00,000 entries ) and project table > with (200 entries ). > User table columns : userid , name ,country , location , etc. > Project tables Columns : project name , description , business unit , > project type . > Here User Location , Country , Project Name , Project business unit , > project type are faceted > A user can be mapped to multiple projects. > In solr i store the details like this > [ > { > userId:1234; > userName:ABC; > Country:US; > Location:NY; > Project Name:Project1,Project2; > Project Description:Project1,Project2; > Project business unit:unit1,unit2; > Project type:Type1,Type2 > } > ] > > With this structure i could get faceted details about both user data and > project data . > > But here i face 2 Problems . > > 1.A project can be mapped to many users say 10,000 Users . So if i change a > project name then i end > up indexing 10,000 Records which is a very time consuming work. > > 2.for Fields like Project Description i could not find any proper delimiter > . for other fields comma (,) is > > okay but being description i could not use any specific delimiter .This is > not faceted but still in search results i need to take this out and show the > project details in tabular format. and i use delimiter to split it .For > other project fields like Project Name and Type i could do it but not for > this Project Description field > > So i expect is there any way of storing the data as relational records like > in user details where we will have field called project Id and data will be > 1,2 which refers to project records primary key in solr and still preserve > the faceted approach. > > As for my knowledge my guess is it cant be done ??? > Am i correct ??? > If so then how we can achieve the solutions to my problem?? > Pls if someone could share some ideas it will be useful. > -- > View this message in context: > http://www.nabble.com/Is-Relational-Mapping-%28foreign-key%29-possible-in-solrtp25955068p25955068.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
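A rough schema.xml sketch of the multivalued approach for the user index (the field names are illustrative):

  <field name="projectId"   type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="projectName" type="string" indexed="true" stored="true" multiValued="true"/>

Each project a user belongs to becomes one more value of these fields, so there is no delimiter to escape, and faceting on projectName still works.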
Fwd: Replication filelist command failure on container restart
-- Forwarded message -- From: Jérôme Etévé Date: 2009/10/16 Subject: Re: Replication filelist command failure on container restart To: yo...@lucidimagination.com Thanks Yonik, It works now! J. 2009/10/16 Yonik Seeley : > I think you may need to tell the replication handler to enable > replication after startup too? > >commit >startup > > -Yonik > http://www.lucidimagination.com > > > On Fri, Oct 16, 2009 at 12:58 PM, Jérôme Etévé wrote: >> Hi All, >> I'm facing a small problem with the replication handler: >> >> After restarting my master container (tomcat), >> /admin/replication/index.jsp shows me the right information, >> basically the same indexversion as before the restart (no >> commits/optimize have been done after restart): >> >> Local Index Index Version: 1255709893043, Generation: 8 >> >> However, if I query the handler with the filelist command and this >> version number : >> /replication?command=filelist&indexversion=1255709893043 , the handler >> gives me an error: >> >> invalid indexversion >> >> So I think my slaves will get confused if this information doesn't >> remain consistent after a master container restart. >> >> Is there a way to go around this problem, for instance by triggering a >> commit on startup (or reload) ? >> >> >> Thanks! >> >> Jerome. >> >> -- >> Jerome Eteve. >> http://www.eteve.net >> jer...@eteve.net >> > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
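The two values that lost their markup in Yonik's quoted reply are presumably the replicateAfter entries; on the master they would look like this inside the /replication handler's master section:

  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- also publish the index version that exists right after a container restart -->
    <str name="replicateAfter">startup</str>
  </lst>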
Replication filelist command failure on container restart
Hi All, I'm facing a small problem with the replication handler: After restarting my master container (tomcat), /admin/replication/index.jsp shows me the right information, basically the same indexversion as before the restart (no commits/optimize have been done after restart): Local Index Index Version: 1255709893043, Generation: 8 However, if I query the handler with the filelist command and this version number : /replication?command=filelist&indexversion=1255709893043 , the handler gives me an error: invalid indexversion So I think my slaves will get confused if this information doesn't remain consistent after a master container restart. Is there a way to go around this problem, for instance by triggering a commit on startup (or reload) ? Thanks! Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Solr 1.4 Release date/ lucene 2.9 API ?
Hi all, Is there a planned release date for solr 1.4? If I understand correctly, it will use the lucene 2.9 release from last Sept. 24th, with a stable API? Thanks. Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: Where do I need to install Solr
Solr is a separate service, in the same way a RDMS is a separate service. Whether you install it on the same machine as your webserver or not, it's logically separated from your server. Jerome. 2009/9/30 Claudio Martella : > Kevin Miller wrote: >> Does Solr have to be installed on the web server, or can I install Solr >> on a different server and access it from my web server? >> >> Kevin Miller >> Web Services >> >> > you can access it from your webserver (or browser) via HTTP/XML requests > and responses. > have a look at solr tutorial: http://lucene.apache.org/solr/tutorial.html > and this one: http://www.xml.com/lpt/a/1668 > > -- > Claudio Martella > Digital Technologies > Unit Research & Development - Engineer > > TIS innovation park > Via Siemens 19 | Siemensstr. 19 > 39100 Bolzano | 39100 Bozen > Tel. +39 0471 068 123 > Fax +39 0471 068 129 > claudio.marte...@tis.bz.it http://www.tis.bz.it > > Short information regarding use of personal data. According to Section 13 of > Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we > process your personal data in order to fulfil contractual and fiscal > obligations and also to send you information regarding our services and > events. Your personal data are processed with and without electronic means > and by respecting data subjects' rights, fundamental freedoms and dignity, > particularly with regard to confidentiality, personal identity and the right > to personal data protection. At any time and without formalities you can > write an e-mail to priv...@tis.bz.it in order to object the processing of > your personal data for the purpose of sending advertising materials and also > to exercise the right to access personal data and other rights referred to in > Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation > Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete > information on the web site www.tis.bz.it. > > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: delay while adding document to solr index
Hi,

- Try to let Solr do the commits for you (set up the autocommit feature, and stop committing after each inserted document). This should greatly reduce the delays you're experiencing.
- If you never optimize, it's normal that your index size only grows. Optimize regularly, at a time when your load is minimal.

Jerome.

2009/9/30 swapna_here : > thanks again for your immediate response > > yes, i am running the commit after a document is indexed > > here i don't understand why my index size is increased to 625MB(for the > 10 documents) > which was previously 250MB > is this due to i have not optimized at all my index or since i am adding > documents individually > > i need solution for this urgently > thanks a lot > -- > View this message in context: > http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25679463.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
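For the periodic optimize, the message to POST to the /update handler during a quiet period is simply:

  <optimize waitSearcher="false"/>

(waitSearcher="false" only makes the call return without waiting for the new searcher to be opened and warmed; the optimize itself still runs to completion.)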
init parameters for queryParser
Hi all, I've got my own query parser plugin defined thanks to the queryParser tag: The QParserPlugin class has got an init method like this: public void init(NamedList args); Where and how do I put my args to be passed to init for my query parser plugin? I'm trying value1 value1 But I'm not sure if it's the right way. Could we also update the wiki about this? http://wiki.apache.org/solr/SolrPlugins#QParserPlugin Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
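For what it's worth, a sketch of the registration I would expect to work: the child elements of the <queryParser> declaration in solrconfig.xml end up in the NamedList handed to init(). The class and parameter names here are illustrative:

  <queryParser name="myparser" class="com.example.MyQParserPlugin">
    <str name="someParam">value1</str>
    <int name="someLimit">10</int>
  </queryParser>

This is an assumption based on how other solrconfig plugins receive their init args, so it is worth verifying against the plugin loading code.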
What options would you recommend for the Sun JVM?
Hi solr addicts, I know there's no one-size-fits-all set of options for the Sun JVM, but I think it'd be useful to everyone to share your tips on using the Sun JVM with Solr. For instance, I recently figured out that setting the tenured generation garbage collector to concurrent mark-and-sweep ( -XX:+UseConcMarkSweepGC ) has dramatically decreased the amount of time Java hangs on tenured-generation garbage collection. With my settings, the old generation garbage collection went from big chunks of 1~2 seconds to multiple small slices of ~0.2 s. As a result, the commits (hence the searcher drop/rebuild) are much less painful from the application performance point of view. What other options would you recommend? Cheers! Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
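A concrete example of such a command line (the heap sizes, GC logging options and the Jetty-style start.jar invocation are purely illustrative):

  java -Xms2g -Xmx2g \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:+PrintGCDetails -Xloggc:logs/gc.log \
       -jar start.jar

The GC log is handy for checking whether the long pauses really come from old-generation collections before and after switching collectors.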
Re: do NOT want to stem plurals for a particular field, or words
Hi, You can enable/disable stemming per field type in schema.xml by removing the stemming filters from the type definition. Basically, copy your preferred type, rename it to something like 'text_nostem', remove the stemming filter from it, and use 'text_nostem' for your field 'type'. From what you say, I guess your field 'type' would be even happier simply being of type 'string'. Jerome.

2009/9/15 DHast : > > I have a field where there are items that are plurals, and used as very > specific locators, so i do a solr search type:articles, and it translates it > into : type:article, then into type:articl... is tehre a way to stop it from > doing this on either the field "type" or on a list of words "articles, > notes, etc" > > i tried enering into the protwords.txt file and dont seem to get any where > -- > View this message in context: > http://www.nabble.com/do-NOT-want-to-stem-plurals-for-a-particular-field%2C-or-words-tp25455570p25455570.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
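A sketch of what that looks like in schema.xml (the filters shown are just an example of a stem-free chain; keep whatever non-stemming filters your current type already has):

  <fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- no stemming filter (e.g. solr.SnowballPorterFilterFactory) in this chain -->
    </analyzer>
  </fieldType>

  <!-- or, since "type" holds a small fixed vocabulary, simply: -->
  <field name="type" type="string" indexed="true" stored="true"/>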
Best strategy to commit often under load.
Hi all, I've got a Solr server under significant load ( ~40/s ) and a single process which can potentially commit as often as it likes. Typically, when it commits every 5 or 10 s, my Solr server slows down quite a lot and this can lead to congestion problems on my client side. What would you recommend in this situation? Is it better to let Solr perform the commits automatically with reasonable autocommit parameters? What are Solr's best practices on this point? Thanks for your help! Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: Implementing customized Scorer with solr API 1.4
Hi , Thanks for your help. So do I have to do: public Scorer scorer(IndexReader reader) throws IOException { SolrIndexReader solrReader = (SolrIndexReader) reader; int offset = solrReader.getBase() ; Or is it a bit more complex than that? Jerome. 2009/8/20 Mark Miller : > Jérôme Etévé wrote: >> Hi all, >> >> I'm kind of struggling with a customized lucene.Scorer of mine, since >> I use solr 1.4. >> >> Here's the problem: >> >> I wrote a DocSetQuery which inherit from a lucene.Query. This query >> is a decorator for a lucene.Query that filters out the documents which >> are not in a given set of predefined documents (a solr.DocSet which I >> call docset ). >> >> So In my Weight / Scorer, I implemented the method nextDoc like that: >> >> public int nextDoc() throws IOException { >> do { >> if (decoScorer.nextDoc() == NO_MORE_DOCS) { >> return NO_MORE_DOCS; >> } >> // DO THIS UNTIL the doc is in the docset >> } while (!docset.exists(decoScorer.docID())); >> return decoScorer.docID(); >> } >> >> The decoScorer here is the decorated scorer. >> >> My problem here is that in docset, there are 'absolute' documents IDs, >> but now solr uses a number of sub readers each with a kind of offset, >> so decoScorer.docID() gives 'relative' document ID . Because of this, >> I happen to test relative document IDs against a set of absolute >> docIDs. >> >> So my DocSetQuery does not work anymore. The solution would be I think >> to have a way of getting the offset of the SolrReader being used in >> the context to be able to do docset.exists(decoScorer.docID() + >> offset) . >> >> But how can I get this offset? >> The scorer is built with a lucene.IndexReader in parameter: >> public Scorer scorer(IndexReader reader) . >> >> Within solr, this IndexReader happens to be an instance of >> SolrIndexReader so I though maybe I could downcast reader to a >> SolrIndexReader to be able to call the offset related methods on it >> (getBase() etc...). >> > It may not feel super clean, but it should be fine - Solr always uses a > SolrIndexSearcher which always wraps all of the IndexReaders in > SolrIndexReader. I'm fairly sure anyway ;) > > By getting the base of the subreader wihtin the top reader, you can add > it to the doc id to get the top reader doc id. >> I feel quite unconfortable with this solution since my DocSetQuery >> inherits from a lucene thing, so it would be quite odd to downcast >> something to a solr class inside it, plus I didn't really figured out >> how to use those offset related methods. >> >> Thanks for your help! >> >> All the best! >> >> Jerome Eteve. >> >> > > > -- > - Mark > > http://www.lucidimagination.com > > > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Implementing customized Scorer with solr API 1.4
Hi all, I'm kind of struggling with a customized lucene.Scorer of mine, since I use solr 1.4. Here's the problem: I wrote a DocSetQuery which inherit from a lucene.Query. This query is a decorator for a lucene.Query that filters out the documents which are not in a given set of predefined documents (a solr.DocSet which I call docset ). So In my Weight / Scorer, I implemented the method nextDoc like that: public int nextDoc() throws IOException { do { if (decoScorer.nextDoc() == NO_MORE_DOCS) { return NO_MORE_DOCS; } // DO THIS UNTIL the doc is in the docset } while (!docset.exists(decoScorer.docID())); return decoScorer.docID(); } The decoScorer here is the decorated scorer. My problem here is that in docset, there are 'absolute' documents IDs, but now solr uses a number of sub readers each with a kind of offset, so decoScorer.docID() gives 'relative' document ID . Because of this, I happen to test relative document IDs against a set of absolute docIDs. So my DocSetQuery does not work anymore. The solution would be I think to have a way of getting the offset of the SolrReader being used in the context to be able to do docset.exists(decoScorer.docID() + offset) . But how can I get this offset? The scorer is built with a lucene.IndexReader in parameter: public Scorer scorer(IndexReader reader) . Within solr, this IndexReader happens to be an instance of SolrIndexReader so I though maybe I could downcast reader to a SolrIndexReader to be able to call the offset related methods on it (getBase() etc...). I feel quite unconfortable with this solution since my DocSetQuery inherits from a lucene thing, so it would be quite odd to downcast something to a solr class inside it, plus I didn't really figured out how to use those offset related methods. Thanks for your help! All the best! Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Writing and using your own Query class in solr 1.4 (trunk)
That's right. I just had another decorator which was not adapted for the new API. My fault .. Thanks, Jerome. 2009/8/18 Mark Miller : > I'm pretty sure one of them is called. In the version you have: > > public void search(Query query, HitCollector results) > throws IOException { > search(createQueryWeight(query), null, new HitCollectorWrapper(results)); > } > > protected QueryWeight createQueryWeight(Query query) throws IOException { > return query.queryWeight(this); > } > > > Query.queryWeight will in turn call Query.createQueryWight (either for your > Query, or for the primitive Query > it rewrites itself too). > > > > -- > - Mark > > http://www.lucidimagination.com > > > > Jérôme Etévé wrote: >> >> Hi Mark, >> >> >> Thanks for clarifying this. So should I keep both sets of method >> implemented? I guess it won't hurt when solr trunk will use the >> updated version of lucene without those methods. >> >> What I don't get is that neither my createWeight or createQueryWeight >> methods seem to be called when I call >> rb.req.getSearcher().search(limitedQuery, myCollector); >> >> I'll look at the code to find out. >> >> Thanks! >> >> Jerome >> >> 2009/8/18 Mark Miller : >> >>> >>> You have run into some stuff that has been somewhat rolled back in >>> Lucene. >>> >>> QueryWieght, and the methods it brought have been reverted. >>> >>> Shortly (when Solr trunk updates Lucene), Solr will go back to just >>> createWeight and weight. >>> >>> The main change that will be left is that Weight will be an abstract >>> class >>> rather than an interface. >>> >>> >>> -- >>> - Mark >>> >>> http://www.lucidimagination.com >>> >>> Jérôme Etévé wrote: >>> >>>> >>>> Hi all, >>>> >>>> I have a custom search component which uses a query I wrote. >>>> Basically, this Query (called DocSetQuery) is a Query decorator that >>>> skips any document which is not in a given document set. My code used >>>> to work perfectly in solr 1.3 but in solr 1.4, it seems that my >>>> DocSetQuery has lost all its power. >>>> >>>> I noticed that to be compliant with solr 1.4 trunk and the lucene it >>>> contains, I should implement two new methods: >>>> >>>> createQueryWeight >>>> and >>>> queryWeight >>>> >>>> So I did. It was very easy, because basically it's only about re-using >>>> the deprecated Weight createWeight and wrapping the result with a >>>> QueryWeightWrapper. >>>> >>>> So now I believe my DocSetQuery complies with the new >>>> solr1.4/lucene2.9-dev api. And I've got those methods: >>>> >>>> public QueryWeight queryWeight(Searcher searcher) throws IOException { >>>> return createQueryWeight(searcher); >>>> } >>>> public QueryWeight createQueryWeight(Searcher searcher) throws >>>> IOException >>>> { >>>> log.info("[sponsoring] creating QueryWeight calling createQueryWeight >>>> "); >>>> return new QueryWeightWrapper(createWeight(searcher)); >>>> } >>>> public Weight weight(Searcher searcher) throws IOException { >>>> return createWeight(searcher); >>>> } >>>> >>>> //and of course >>>> >>>> protected Weight createWeight(final Searcher searcher) throws >>>> IOException >>>> { >>>> log.info("[sponsoring] creating weight with DoCset " + docset.size()); >>>> ... >>>> } >>>> >>>> I'm then using my DocSetQuery in my custom SearchComponent like that: >>>> >>>> Query limitedQuery = new DocSetQuery(decoratedQuery , ... 
); >>>> >>>> Then I simply perform a search by doing >>>> >>>> rb.req.getSearcher().search(limitedQuery, myCollector); >>>> >>>> My problem is neither of createQueryWeight or createWeight is called >>>> by the solr Searcher, and I'm wondering what I did wrong. >>>> Should I build the Weight myself and call the search method which >>>> accepts a Weight object? >>>> >>>> This is quite confusing because: >>>> - it used to work perfectly in solr 1.3 >>>> - in the nightly build version of lucene API, those new methods >>>> createQueryWeight and queryWeight have disappeared but with the lucene >>>> solr1.4trunk uses, they exists plus the old ones ( createWeight and >>>> weight) are deprecated. >>>> >>>> >>>> Thanks for your help. >>>> >>>> Jerome Eteve. >>>> >>>> >>> >>> >>> >>> >> >> >> >> > > > > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Writing and using your own Query class in solr 1.4 (trunk)
Hi Mark, Thanks for clarifying this. So should I keep both sets of method implemented? I guess it won't hurt when solr trunk will use the updated version of lucene without those methods. What I don't get is that neither my createWeight or createQueryWeight methods seem to be called when I call rb.req.getSearcher().search(limitedQuery, myCollector); I'll look at the code to find out. Thanks! Jerome 2009/8/18 Mark Miller : > You have run into some stuff that has been somewhat rolled back in Lucene. > > QueryWieght, and the methods it brought have been reverted. > > Shortly (when Solr trunk updates Lucene), Solr will go back to just > createWeight and weight. > > The main change that will be left is that Weight will be an abstract class > rather than an interface. > > > -- > - Mark > > http://www.lucidimagination.com > > Jérôme Etévé wrote: >> >> Hi all, >> >> I have a custom search component which uses a query I wrote. >> Basically, this Query (called DocSetQuery) is a Query decorator that >> skips any document which is not in a given document set. My code used >> to work perfectly in solr 1.3 but in solr 1.4, it seems that my >> DocSetQuery has lost all its power. >> >> I noticed that to be compliant with solr 1.4 trunk and the lucene it >> contains, I should implement two new methods: >> >> createQueryWeight >> and >> queryWeight >> >> So I did. It was very easy, because basically it's only about re-using >> the deprecated Weight createWeight and wrapping the result with a >> QueryWeightWrapper. >> >> So now I believe my DocSetQuery complies with the new >> solr1.4/lucene2.9-dev api. And I've got those methods: >> >> public QueryWeight queryWeight(Searcher searcher) throws IOException { >> return createQueryWeight(searcher); >> } >> public QueryWeight createQueryWeight(Searcher searcher) throws IOException >> { >> log.info("[sponsoring] creating QueryWeight calling createQueryWeight "); >> return new QueryWeightWrapper(createWeight(searcher)); >> } >> public Weight weight(Searcher searcher) throws IOException { >> return createWeight(searcher); >> } >> >> //and of course >> >> protected Weight createWeight(final Searcher searcher) throws IOException >> { >> log.info("[sponsoring] creating weight with DoCset " + docset.size()); >> ... >> } >> >> I'm then using my DocSetQuery in my custom SearchComponent like that: >> >> Query limitedQuery = new DocSetQuery(decoratedQuery , ... ); >> >> Then I simply perform a search by doing >> >> rb.req.getSearcher().search(limitedQuery, myCollector); >> >> My problem is neither of createQueryWeight or createWeight is called >> by the solr Searcher, and I'm wondering what I did wrong. >> Should I build the Weight myself and call the search method which >> accepts a Weight object? >> >> This is quite confusing because: >> - it used to work perfectly in solr 1.3 >> - in the nightly build version of lucene API, those new methods >> createQueryWeight and queryWeight have disappeared but with the lucene >> solr1.4trunk uses, they exists plus the old ones ( createWeight and >> weight) are deprecated. >> >> >> Thanks for your help. >> >> Jerome Eteve. >> > > > > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Writing and using your own Query class in solr 1.4 (trunk)
Hi all, I have a custom search component which uses a query I wrote. Basically, this Query (called DocSetQuery) is a Query decorator that skips any document which is not in a given document set. My code used to work perfectly in solr 1.3 but in solr 1.4, it seems that my DocSetQuery has lost all its power. I noticed that to be compliant with solr 1.4 trunk and the lucene it contains, I should implement two new methods: createQueryWeight and queryWeight So I did. It was very easy, because basically it's only about re-using the deprecated Weight createWeight and wrapping the result with a QueryWeightWrapper. So now I believe my DocSetQuery complies with the new solr1.4/lucene2.9-dev api. And I've got those methods: public QueryWeight queryWeight(Searcher searcher) throws IOException { return createQueryWeight(searcher); } public QueryWeight createQueryWeight(Searcher searcher) throws IOException { log.info("[sponsoring] creating QueryWeight calling createQueryWeight "); return new QueryWeightWrapper(createWeight(searcher)); } public Weight weight(Searcher searcher) throws IOException { return createWeight(searcher); } //and of course protected Weight createWeight(final Searcher searcher) throws IOException { log.info("[sponsoring] creating weight with DoCset " + docset.size()); ... } I'm then using my DocSetQuery in my custom SearchComponent like that: Query limitedQuery = new DocSetQuery(decoratedQuery , ... ); Then I simply perform a search by doing rb.req.getSearcher().search(limitedQuery, myCollector); My problem is neither of createQueryWeight or createWeight is called by the solr Searcher, and I'm wondering what I did wrong. Should I build the Weight myself and call the search method which accepts a Weight object? This is quite confusing because: - it used to work perfectly in solr 1.3 - in the nightly build version of lucene API, those new methods createQueryWeight and queryWeight have disappeared but with the lucene solr1.4trunk uses, they exists plus the old ones ( createWeight and weight) are deprecated. Thanks for your help. Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: facet performance tips
Thanks everyone for your advices. I increased my filterCache, and the faceting performances improved greatly. My faceted field can have at the moment ~4 different terms, so I did set a filterCache size of 5 and it works very well. However, I'm planning to increase the number of terms to maybe around 500 000, so I guess this approach won't work anymore, as I doubt a 500 000 sized fieldCache would work. So I guess my best move would be to upgrade to the soon to be 1.4 version of solr to benefit from its new faceting method. I know this is a bit off-topic, but do you have a rough idea about when 1.4 will be an official release? As well, is the current trunk OK for production? Is it compatible with 1.3 configuration files? Thanks ! Jerome. 2009/8/13 Stephen Duncan Jr : > Note that depending on the profile of your field (full text and how many > unique terms on average per document), the improvements from 1.4 may not > apply, as you may exceed the limits of the new faceting technique in Solr > 1.4. > -Stephen > > On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher wrote: > >> Yes, increasing the filterCache size will help with Solr 1.3 performance. >> >> Do note that trunk (soon Solr 1.4) has dramatically improved faceting >> performance. >> >>Erik >> >> >> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote: >> >> Hi everyone, >>> >>> I'm using some faceting on a solr index containing ~ 160K documents. >>> I perform facets on multivalued string fields. The number of possible >>> different values is quite large. >>> >>> Enabling facets degrades the performance by a factor 3. >>> >>> Because I'm using solr 1.3, I guess the facetting makes use of the >>> filter cache to work. My filterCache is set >>> to a size of 2048. I also noticed in my solr stats a very small ratio >>> of cache hit (~ 0.01%). >>> >>> Can it be the reason why the faceting is slow? Does it make sense to >>> increase the filterCache size so it matches more or less the number >>> of different possible values for the faceted fields? Would that not >>> make the memory usage explode? >>> >>> Thanks for your help ! >>> >>> -- >>> Jerome Eteve. >>> >>> Chat with me live at http://www.eteve.net >>> >>> jer...@eteve.net >>> >> >> > > > -- > Stephen Duncan Jr > www.stephenduncanjr.com > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Solr 1.3 and JDK1.6
Hi, I'm running solr 1.3 with java -version java version "1.6..." . No problem to report. Cheers. J 2009/8/12 vaibhav joshi : > > Hi > > I am using Solr 1.3 ( official released version) and JDk1.5. My company is > moving towards upgrading all systems to JDK1.6. is it safe to upgrade to > JDK1.6 with Solr 1.3 wars? Are there any compatible issues with JDK1.6? > > Thanks > Vaibhav > > _ > Sports, news, fashion and entertainment. Pick it all up in a package called > MSN India > http://in.msn.com -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
facet performance tips
Hi everyone, I'm using some faceting on a Solr index containing ~160K documents. I facet on multivalued string fields, and the number of possible distinct values is quite large. Enabling facets degrades the performance by a factor of 3. Because I'm using Solr 1.3, I guess the faceting makes use of the filter cache. My filterCache is set to a size of 2048, and I also noticed in my Solr stats a very small cache hit ratio (~0.01%). Can that be the reason why the faceting is slow? Does it make sense to increase the filterCache size so it matches more or less the number of distinct values of the faceted fields? Would that not make the memory usage explode? Thanks for your help! -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
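For reference, the filterCache is configured in solrconfig.xml; in Solr 1.3, faceting on a multivalued field issues one filter per distinct term, so the cache only helps once its size is in the same ballpark as the number of distinct values being faceted on. The figures below are illustrative:

  <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="0"/>

Each cached filter is essentially a document set (a bitset, or a smaller hash set for sparse filters), so memory use grows with both the cache size and the index size.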
Re: Synonym aware string field type
2009/8/4 Otis Gospodnetic : > Yes, you need to specify one or the other then, index-time or query-time, > depending on where you want your synonyms to kick in. Ok great. Thx ! > Eh, hitting reply to this email used your personal email instead of > solr-user@lucene.apache.org . Eh eh. Making it hard for people replying to > keep the discussion on the list without doing extra work It did the same for me with your message. I had to click 'reply all' . Maybe it's a gmail problem. J. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Jérôme Etévé >> To: Otis Gospodnetic >> Cc: solr-user@lucene.apache.org >> Sent: Tuesday, August 4, 2009 12:39:33 PM >> Subject: Re: Synonym aware string field typ >> >> Hi Otis, >> >> Thanks. Yep, this synonym behaviour is the one I want. >> >> So if I don't want the synonyms to be applied at index time, I need >> to specify an index time analyzer right ? >> >> Jerome. >> >> >> 2009/8/4 Otis Gospodnetic : >> > Hi, >> > >> > KeywordTokenizer will not tokenize your string. I have a feeling that >> > won't >> work with synonyms, unless your field value entirely match a synonym. Maybe >> an >> example would help: >> > >> > If you have: >> > foo canine bar >> > Then KeywordTokenizer won't break this into 3 tokens. >> > And then canine/dog synonym won't work. >> > >> > Yes, if you define the analyzer like that, it will be used both at index >> > and >> query time. >> > >> > Otis >> > -- >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> > >> > >> > >> > - Original Message >> >> From: Jérôme Etévé >> >> To: solr-user@lucene.apache.org >> >> Sent: Tuesday, August 4, 2009 7:33:28 AM >> >> Subject: Synonym aware string field typ >> >> >> >> Hi all, >> >> >> >> I'd like to have a string type which is synonym aware at query time. >> >> Is it ok to have something like that: >> >> >> >> >> >> >> >> >> >> >> >> tokenizerFactory="solr.KeywordTokenizerFactory" >> >> synonyms="my_synonyms.txt" ignoreCase="true"/> >> >> >> >> >> >> >> >> >> >> >> >> My questions are: >> >> >> >> - Will the index time analyzer stay the default for the type >> >> solr.StrField . >> >> - Is the KeywordTokenizerFactory the right one to use for the query >> >> time analyzer ? >> >> >> >> Cheers! >> >> >> >> Jerome. >> >> >> >> -- >> >> Jerome Eteve. >> >> >> >> Chat with me live at http://www.eteve.net >> >> >> >> jer...@eteve.net >> > >> > >> >> >> >> -- >> Jerome Eteve. >> >> Chat with me live at http://www.eteve.net >> >> jer...@eteve.net > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Synonym aware string field type
Hi Otis, Thanks. Yep, this synonym behaviour is the one I want. So if I don't want the synonyms to be applied at index time, I need to specify an index time analyzer right ? Jerome. 2009/8/4 Otis Gospodnetic : > Hi, > > KeywordTokenizer will not tokenize your string. I have a feeling that won't > work with synonyms, unless your field value entirely match a synonym. Maybe > an example would help: > > If you have: > foo canine bar > Then KeywordTokenizer won't break this into 3 tokens. > And then canine/dog synonym won't work. > > Yes, if you define the analyzer like that, it will be used both at index and > query time. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Jérôme Etévé >> To: solr-user@lucene.apache.org >> Sent: Tuesday, August 4, 2009 7:33:28 AM >> Subject: Synonym aware string field typ >> >> Hi all, >> >> I'd like to have a string type which is synonym aware at query time. >> Is it ok to have something like that: >> >> >> >> >> >> tokenizerFactory="solr.KeywordTokenizerFactory" >> synonyms="my_synonyms.txt" ignoreCase="true"/> >> >> >> >> >> >> My questions are: >> >> - Will the index time analyzer stay the default for the type solr.StrField . >> - Is the KeywordTokenizerFactory the right one to use for the query >> time analyzer ? >> >> Cheers! >> >> Jerome. >> >> -- >> Jerome Eteve. >> >> Chat with me live at http://www.eteve.net >> >> jer...@eteve.net > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Synonym aware string field type
Hi all, I'd like to have a string type which is synonym aware at query time. Is it ok to have something like that: My questions are: - Will the index time analyzer stay the default for the type solr.StrField . - Is the KeywordTokenizerFactory the right one to use for the query time analyzer ? Cheers! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
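The field type definition itself did not survive in the message above; below is a hedged reconstruction of the kind of type being discussed, based on the attributes quoted earlier in this thread (the type name string_syn is made up, the synonyms file name comes from the original mail). It uses solr.TextField rather than solr.StrField, since only TextField takes analyzers, and it declares both analyzers explicitly, which is the point Otis makes in his reply: with no index-time synonym filter, synonyms only kick in at query time.

    <fieldType name="string_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="my_synonyms.txt"
                ignoreCase="true" expand="true"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>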
Faceting in more like this
Hi all, Is there a way to enable faceting when using a more like this handler? I'd like to have facets from my similar documents. Cheers ! J. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Reasonable number of maxWarming searchers
Hi All, I'm planning to have a certain number of processes posting independently to a solr instance. This instance will solely act as a master instance; no client queries on it. Is there a problem if I set maxWarmingSearchers to something like 30 or 40? Also, how do I disable the cache warming? Is setting the autowarmCounts to 0 enough? Regards, Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
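Both knobs live in solrconfig.xml; a minimal sketch with illustrative values only (the documentCache takes no autowarmCount since it cannot be autowarmed). On a pure master, removing any static newSearcher/firstSearcher warm-up queries also helps.

    <maxWarmingSearchers>30</maxWarmingSearchers>

    <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache    class="solr.LRUCache" size="512" initialSize="512"/>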
Re: Mailing list: Change the reply too ?
2009/7/30 Erik Hatcher : > > On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote: > >> Hi all, >> >> I don't know if it does the same from everyone, but when I use the >> reply function of my mail agent, it sets the recipient to the user who >> sent the message, and not the mailing list. >> >> So it's quite annoying cause I have to change the recipient each time >> I reply to someone on the list. >> >> Can the list admins fix this issue ? > > All my replies go to the list. > > From your message, the header says: > > Reply-To: solr-user@lucene.apache.org > >Erik It works with your messages. It might depends on mail agents. Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Mailing list: Change the reply too ?
Hi all, I don't know if it does the same from everyone, but when I use the reply function of my mail agent, it sets the recipient to the user who sent the message, and not the mailing list. So it's quite annoying cause I have to change the recipient each time I reply to someone on the list. Can the list admins fix this issue ? Cheers ! J. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Posting data in JSON
Hi, Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3. J. 2009/7/30 Shalin Shekhar Mangar : > On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé > wrote: >> >> Hi All, >> >> I'm wondering if it's possible to post documents to solr in JSON format. >> >> JSON is much faster than XML to get the queries results, so I think >> it'd be great to be able to post data in JSON to speed up the indexing >> and lower the network load. > > If you are using Java,Solrj on 1.4 (trunk), you can use the binary format > which is extremely compact and efficient. Note that with Solr/Solrj 1.3, > binary became the default response format for Solrj clients. > > -- > Regards, > Shalin Shekhar Mangar. > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
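For anyone on the Java side, a minimal sketch of what Shalin describes, assuming Solr/Solrj 1.4 with the /update/javabin handler enabled in solrconfig.xml on the server (the URL and field names are placeholders):

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryIndexer {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // send updates in the compact javabin format instead of XML
        server.setRequestWriter(new BinaryRequestWriter());

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "hello");

        server.add(doc);
        server.commit();
      }
    }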
Posting data in JSON
Hi All, I'm wondering if it's possible to post documents to solr in JSON format. JSON is much faster than XML to get the queries results, so I think it'd be great to be able to post data in JSON to speed up the indexing and lower the network load. All the best ! Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Synchronisation problem with replication
Hi All, I've got a small problem here with replication. Let's say I post a document on the master server, and the slaves do a snappuller/snapinstaller via crontab every minute. That means that for around 30 seconds on average, my search servers are out of sync with the master. Is there a way to improve this situation ? Cheers !!! J. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Disable unique-key for Solr index
Hi ! Is there any " primary table " in your view with a unique single key you could use ? J. 2009/5/11 jcott28 : > > I have a case where I would like a solr index created which disables the > unique-key option. > > I've tried commenting out the option and that just spits out an > error: > > SEVERE: org.apache.solr.common.SolrException: QueryElevationComponent > requires the schema to have a uniqueKeyField > > > I've tried something like this : > > Nothing seems to do the trick. > > The problem with a unique key is that the uniqueness for my results are > actually based on all the fields in my document. There isn't one specific > field which is unique. All the fields combined are unique though (they are > taken directly from a View inside an RDBMS whose primary key is all of the > columns). > > Any help would be greatly appreciated! > > Thanks, > Jeff > > -- > View this message in context: > http://www.nabble.com/Disable-unique-key-for-Solr-index-tp23487249p23487249.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Concurrent run of snapshot scripts.
Hi Everyone, I'm running solr 1.3 and I was wondering if there's a problem with running the snapshot scripts concurrently. For instance, I have a cron job which performs a snappuller/snapinstaller every minute on my slave servers. Sometimes (for instance after an optimize), the snappuller can take more than one minute. Is it a problem if another snappuller is spawned while one that started more than a minute ago is still running ? Cheers !! Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Very long commit time.
On Wed, Mar 4, 2009 at 1:21 PM, Yonik Seeley wrote: > On Wed, Mar 4, 2009 at 5:25 AM, Jérôme Etévé wrote: >> Great, >> >> It went down to less than 10 secs now :) >> What I don't really understand is that my autowarmCount were pretty >> low ( like 128 ) and still the autowarming of the caches were very >> slow. >> >> Can you explain more why it can be that slow ? > > One possibility is a lack of physical memory available to the OS for > caching reads on both the old index and the new index. This would > cause all of the queries to be slower if they ended up doing real disk > IO for each query/filter being warmed. Strange, we've got plenty of memory on this box and the swap is zero. But well, I'm happy we went around the problem. What's your experience with commits with ~10M docs ( and ~128 autowarming count in caches ) ? Cheers. Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Very long commit time.
Great, It went down to less than 10 secs now :) What I don't really understand is that my autowarmCount were pretty low ( like 128 ) and still the autowarming of the caches were very slow. Can you explain more why it can be that slow ? Cheers ! Jerome. On Tue, Mar 3, 2009 at 8:00 PM, Yonik Seeley wrote: > Looks like cache autowarming. > If you have statically defined warming queries in solrconfig.xml, you > could try setting autowarmCount=0 for all the caches. > > -Yonik > http://www.lucidimagination.com > > > On Tue, Mar 3, 2009 at 2:37 PM, Jérôme Etévé wrote: >> Dear solr fans, >> >> I have a solr index of roughly 8M docs and I have here a little >> problem when I commit some insertion into it. >> >> The insert itself is very fast, but my commit takes 163 seconds. >> >> Here's the solr trace the commit leaves: >> >> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) >> 03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher >> INFO: Opening searc...@7de212f9 main >> 03-Mar-2009 20:20:35 org.apache.solr.update.DirectUpdateHandler2 commit >> INFO: end_commit_flush >> 03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher warm >> INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main >> >> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0} >> 03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm >> INFO: autowarming result for searc...@7de212f9 main >> >> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=76905,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0} >> 03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm >> INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main >> >> queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522} >> 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm >> INFO: autowarming result for searc...@7de212f9 main >> >> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=32,evictions=0,size=32,warmupTime=85591,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522} >> 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm >> INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main >> >> documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105} >> 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm >> INFO: autowarming result for searc...@7de212f9 main >> >> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105} >> 03-Mar-2009 20:23:17 org.apache.solr.core.QuerySenderListener newSearcher >> INFO: QuerySenderListener sending requests to searc...@7de212f9 main >> >> // Then the few warm up queries defined in solrconfig.xml >> >> INFO: Closing searc...@732d8b11 main >> >> 
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0} >> >> queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522} >> >> documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105} >> 03-Mar-2009 20:23:18 org.apache.solr.update.processor.LogUpdateProcessor >> finish >> INFO: {commit=} 0 163189 >> 03-Mar-2009 20:23:18 org.apache.solr.core.SolrCore execute >> INFO: [jobs] webapp=/cjsolr path=/update/ params={} status=0 QTime=163189 >> >> >> I'm sure I'm doing something wrong. Does this 163 seconds commit time >> have to do with the commit parameters : >> (optimize=false,waitFlush=false,waitSearcher=true) ?? >> >> Thanks for any help. >> >> Cheers !! >> >> Jerome. >> >> -- >> Jerome Eteve. >> >> Chat with me live at http://www.eteve.net >> >> jer...@eteve.net >> > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Very long commit time.
Dear solr fans, I have a solr index of roughly 8M docs and I have here a little problem when I commit some insertion into it. The insert itself is very fast, but my commit takes 163 seconds. Here's the solr trace the commit leaves: INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) 03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher INFO: Opening searc...@7de212f9 main 03-Mar-2009 20:20:35 org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush 03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0} 03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@7de212f9 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=76905,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0} 03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522} 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@7de212f9 main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=32,evictions=0,size=32,warmupTime=85591,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522} 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105} 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@7de212f9 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105} 03-Mar-2009 20:23:17 org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@7de212f9 main // Then the few warm up queries defined in solrconfig.xml INFO: Closing searc...@732d8b11 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0} queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522} documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105} 03-Mar-2009 20:23:18 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {commit=} 0 163189 03-Mar-2009 20:23:18 
org.apache.solr.core.SolrCore execute INFO: [jobs] webapp=/cjsolr path=/update/ params={} status=0 QTime=163189 I'm sure I'm doing something wrong. Does this 163 seconds commit time have to do with the commit parameters : (optimize=false,waitFlush=false,waitSearcher=true) ?? Thanks for any help. Cheers !! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Collection distribution in a multicore environment
Hi fellow Solr fans, I'm setting up some collection distribution along with multicore solr. I'm using version 1.3. I have no problem with the snapshooter, since this can be set within each core in solrconfig.xml. My question is more about the rsyncd. The rsyncd-start script creates a rsyncd.conf in the conf directory relative to where it lies, so what I did is copy bin/rsyncd-start into each core directory:

solr/
    core1/
        bin/
            rsyncd-start
        conf/
            rsyncd.conf
    core2/
        - same thing -

Then for each core, I launch a rsyncd:

    /../solr/core1/bin/rsyncd-start -p 18080 -d /../solr/core1/data/

This way, it can be stopped properly with

    /../solr/core1/bin/rsyncd-stop

(rsyncd-stop grabs the data from the conf/rsyncd.conf of the containing core). The problem is I'm not very comfortable with having one running daemon per core (each on a different port), plus a copy of each script inside each core. Is there any better way to set this up ? Cheers !! Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Precisions on solr.xml about cross context forwarding.
I was thinking, maybe we should write a patch to fix this issue. For instance by making a dispatch servlet (with a "core" parameter or request attribute) that would act the same way as the filter but provide a cross context addressable entry point. What do you think ? Jerome On Wed, Dec 17, 2008 at 6:24 PM, Jérôme Etévé wrote: > Maybe there's an 'internal query' concept in j2ee that could be a workaround ? > I'm not really a j2ee expert .. > > Jerome. > > On Wed, Dec 17, 2008 at 5:09 PM, Smiley, David W. wrote: >> This bothers me too. I find it really strange that Solr's entry-point is a >> servlet filter instead of a servlet. >> >> ~ David >> >> >> On 12/17/08 12:07 PM, "Jérôme Etévé" wrote: >> >> Hi all, >> >> In solr.xml ( /lucene/solr/trunk/src/webapp/web/WEB-INF/web.xml >> ),it's written that >> >> "It is unnecessary, and potentially problematic, to have the >> SolrDispatchFilter >> configured to also filter on forwards. Do not configure >> this dispatcher as FORWARD." >> >> The problem is that if filters do not have this FORWARD thing, then >> cross context forwarding doesn't work. >> >> Is there a workaround to this problem ? >> >> Jerome. >> >> -- >> Jerome Eteve. >> >> Chat with me live at http://www.eteve.net >> >> jer...@eteve.net >> >> > > > > -- > Jerome Eteve. > > Chat with me live at http://www.eteve.net > > jer...@eteve.net > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Precisions on solr.xml about cross context forwarding.
Maybe there's an 'internal query' concept in j2ee that could be a workaround ? I'm not really a j2ee expert .. Jerome. On Wed, Dec 17, 2008 at 5:09 PM, Smiley, David W. wrote: > This bothers me too. I find it really strange that Solr's entry-point is a > servlet filter instead of a servlet. > > ~ David > > > On 12/17/08 12:07 PM, "Jérôme Etévé" wrote: > > Hi all, > > In solr.xml ( /lucene/solr/trunk/src/webapp/web/WEB-INF/web.xml > ),it's written that > > "It is unnecessary, and potentially problematic, to have the > SolrDispatchFilter > configured to also filter on forwards. Do not configure > this dispatcher as FORWARD." > > The problem is that if filters do not have this FORWARD thing, then > cross context forwarding doesn't work. > > Is there a workaround to this problem ? > > Jerome. > > -- > Jerome Eteve. > > Chat with me live at http://www.eteve.net > > jer...@eteve.net > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Precisions on solr.xml about cross context forwarding.
Hi all, In solr.xml ( /lucene/solr/trunk/src/webapp/web/WEB-INF/web.xml ),it's written that "It is unnecessary, and potentially problematic, to have the SolrDispatchFilter configured to also filter on forwards. Do not configure this dispatcher as FORWARD." The problem is that if filters do not have this FORWARD thing, then cross context forwarding doesn't work. Is there a workaround to this problem ? Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: AW: Cross-context-forward to solr-instance
Hi Lance, Can you tell us what's this parameter and how to set it ? I'm also stucked with the same problem :( Thanks !! Jerome On Mon, Sep 8, 2008 at 6:02 PM, Lance Norskog wrote: > You can give a default core set by adding a default parameter to the query > in solrconfig.xml. This is hacky, but it gives you a set of cores instead of > just one core. > > -Original Message- > From: David Smiley @MITRE.org [mailto:dsmi...@mitre.org] > Sent: Monday, September 08, 2008 7:54 AM > To: solr-user@lucene.apache.org > Subject: Re: AW: Cross-context-forward to solr-instance > > > FWIW, I'm also using the SolrRequestFilter for forwards, despite the > warning. > Solr1.3 doesn't have the concept of a default core anymore yet I want this > feature. I made an uber-simple JSP like this: > " > /> > And so now my clients don't need to update their URL just because I've > migrated to Solr 1.3. Oh, I needed to set up the dispatcher FORWARD as you > mentioned and I also remapped the /select/* servlet mapping to my jsp.: > >selectDefaultCore >/selectDefaultCore.jsp > > > >selectDefaultCore >/select/* > > > The only problem I've seen so far is that if I echo the params > (echoParams=all), I see the output doubled. Weird but inconsequential. > > ~ David Smiley > > > Hachmann wrote: >> >> Hi, >> >> I made a mistake. At least with Tomcat 5.5.x, if you configure the >> SolrRequestFilter with FORWARD it indeed gets >> called even when you forward from another web-context! >> >> Note, that the documentation says this might be problematic! >> >> Sorry for the previous overhasty post. >> Björn >> >>> -Ursprüngliche Nachricht- >>> Von: >>> solr-user-return-13537-hachmann.bjoern=guj...@lucene.apache.or >>> g >>> [mailto:solr-user-return-13537-hachmann.bjoern=guj...@lucene.a >> pache.org] Im Auftrag von Hachmann, Bjoern >>> Gesendet: Samstag, 6. September 2008 08:01 >>> An: solr-user@lucene.apache.org >>> Betreff: Cross-context-forward to solr-instance >>> >>> Hi, >>> >>> yesterday I tried the Solr-1.3-RC2 and everything seems to work fine >>> using the traditional single-core setup. But while troubleshooting >>> the new multi-core feature, I realized for the first time, that I >>> have been using the deprecated (even in 1.2) class SolrServlet. This >>> is a huge problem for us, as we run the solr-web-app parallel to our >>> main web-app in the same servlet-container. Using this approach we >>> can internally forward update- and select-requests to the >>> Solr-instance currently in use. >>> >>> ServletContext ctx = getServletContext().getContext("solr1"); >>> RequestDispatcher rd = ctx.getNamedDispatcher("SolrServer"); >>> rd.forward(request, response); >>> >>> As you can see, this approach only works for the servlet named >>> 'SolrServer' which references the deprecated class. >>> >>> The attempt of using a path based dispatcher >>> (ctx.getRequestDispatcher) was not successful, even though I >>> configured the SolrRequestFilter in the solr-web.xml to work on >>> forwards (FORWARD), which the documentation >>> discourages. Maybe this is because of the cross-context-dispatch? >>> >>> At the moment I ran totally out of ideas, apart from completely >>> redesigning our whole setup. Any ideas are highly appreciated. >>> >>> Thanks in advance, >>> Björn >> >> > > -- > View this message in context: > http://www.nabble.com/Cross-context-forward-to-solr-instance-tp19343349p1937 > 3757.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Accessing multiple core via cross context queries.
Hi All, I'm developing a webapp that needs access to the solr webapp. I can get my solr context like that: ServletContext solrContext = getServletContext().getContext("/solr"); but when I do solrContext.getRequestDispatcher("/core0/select/").forward(request, response); I get a 404 error: HTTP Status 404 - /solr/core0/select/ type Status report message /solr/core0/select/ description The requested resource (/solr/core0/select/) is not available. Besides that, if I access /solr/core0/select/ directly then everything is fine. From what I saw in the sources, solr relies on a Filter to deal with queries involving multicore, but I cannot see why this should have an influence on which resources are visible to whom. Can't a webapp see the same things as web users do ? j2ee gurus help ! Is there something I'm missing here ? (both webapps are with crossContext=true ) Cheers! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
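For context, the plain servlet-API pattern for a cross-context forward looks like the sketch below (the servlet class name, context name and target path are assumptions). The catch discussed in this thread is on the Solr side: its entry point is a dispatch Filter, and a filter is only invoked on a forward if its filter-mapping declares the FORWARD dispatcher, which Solr's web.xml warns against.

    import java.io.IOException;
    import javax.servlet.RequestDispatcher;
    import javax.servlet.ServletContext;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SolrProxyServlet extends HttpServlet {
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws ServletException, IOException {
        // both webapps must be deployed with crossContext="true"
        ServletContext solrContext = getServletContext().getContext("/solr");
        RequestDispatcher rd = solrContext.getRequestDispatcher("/core0/select/");
        rd.forward(req, resp);
      }
    }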
MoreLikeThis and boost functions
Hi everyone, I'm wondering if the MoreLikeThis handler takes the boost function parameter into account for the scoring (hence the sorting I guess) of the similar documents it finds. Thanks for your help ! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Different tokenizing algorithms for the same stream
Hi, you have to keep track of the character position yourself in your custom Tokenizer. See org.apache.lucene.analysis.CharTokenizer for a starting example. Cheers, J. On Fri, Nov 7, 2008 at 3:33 PM, Yoav Caspi <[EMAIL PROTECTED]> wrote: > Thanks, Jerome. > > My problem is that in Token next(Token result) there is no information about > the location inside the stream. > I can read characters from the input Reader, but couldn't find a way to know > if it's the beginning of the input or not. > > -J > > On Fri, Nov 7, 2008 at 6:13 AM, Jérôme Etévé <[EMAIL PROTECTED]> wrote: >> >> Hi, >> >> I think you could implement your personalized tokenizer in a way it >> changes its behaviour after it has delivered X tokens. >> >> This implies a new tokenizer instance is build from the factory for >> every string analyzed, which I believe is true. >> >> Can this be confirmed ? >> >> Cheers ! >> >> Jerome. >> >> >> On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <[EMAIL PROTECTED]> wrote: >> > Hello all, >> > >> > I'm trying to implement a tokenizer that will behave differently on >> > different parts of the incoming stream. >> > For example, for the first X words in the stream I would like to use one >> > tokenizing algorithm, while for the rest of the stream a different >> > tokenizing algorithm will be used. >> > >> > What is the best way to implement that? >> > Where should I store this stream-related data? >> > >> > Thanks, >> > Yuri >> > >> >> >> >> -- >> Jerome Eteve. >> >> Chat with me live at http://www.eteve.net >> >> [EMAIL PROTECTED] > > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED] -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
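A toy sketch of the idea, written against the Lucene 2.x API that Solr 1.3 bundles: the tokenizer tracks the character position itself and switches behaviour after it has emitted a given number of tokens. The class name and the switch-after-N-tokens rule are made up, the "second algorithm" is just lowercasing to keep the example short, and it uses the simple Token(String, int, int) constructor (deprecated in 2.4 but still present) for brevity.

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.Tokenizer;

    public class SwitchingWhitespaceTokenizer extends Tokenizer {

      private final int switchAfter; // how many tokens use the first algorithm
      private int emitted = 0;       // tokens produced so far
      private int offset = 0;        // character position in the stream

      public SwitchingWhitespaceTokenizer(Reader in, int switchAfter) {
        super(in);
        this.switchAfter = switchAfter;
      }

      public Token next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int start = -1;
        int c;
        while ((c = input.read()) != -1) {
          offset++;
          if (Character.isWhitespace((char) c)) {
            if (sb.length() > 0) break;        // end of the current token
          } else {
            if (sb.length() == 0) start = offset - 1;
            sb.append((char) c);
          }
        }
        if (sb.length() == 0) return null;     // end of the stream

        // the first 'switchAfter' tokens are kept as-is, the rest are lowercased
        String term = (emitted < switchAfter)
            ? sb.toString() : sb.toString().toLowerCase();
        emitted++;
        return new Token(term, start, start + sb.length());
      }
    }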
Re: Batch and Incremental mode of indexing
Hi, For batch indexing, what you could do is to use two cores: one in production and one used for your update. Once your update core is built (delete *:* plus batch insert), you can swap the cores to put it in production: http://wiki.apache.org/solr/CoreAdmin#head-928b872300f1b66748c85cebb12a59bb574e501b Cheers, J On Fri, Nov 7, 2008 at 12:18 PM, Vaijanath N. Rao <[EMAIL PROTECTED]> wrote: > Hi Solr-Users, > > I am not sure but does there exist any mechanism where-in we can specify > solr as Batch and incremental indexing. > What I mean by batch indexing is solr would delete all the records which > existed in the index and will create an new index form the given data. > For incremental I want solr to just do the operation ( add/delete/... ). > > This is how we currently do batch-indexing, issue an command to solr delete > q=*:* commit and than start the indexing. > For incremental operation we just take the data and the operation specified. > > Kindly let me know if there exist a smarter way to get this working. > > --Thanks and Regards > Vaijanath > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
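Concretely, with two cores declared in solr.xml (the core names below are just an example), the swap mentioned above is a single CoreAdmin call once the rebuild core has been committed:

    http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1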
delivering customized results using a SearchComponent plugin
Hi there, I developed a personalized SearchComponent in which I'm building a docset from a personalized Query, and a personalized Priority Queue. To be short, I'm doing that (in the process method) : HitCollector hitCol = new HitCollector() { @Override public void collect(int doc, float score) { myQueue.insert(new ScoreDoc(doc, score)); myNumHits[0]++; } }; rb.req.getSearcher().search(myQuery, hitCol); After popping the ids from myQueue etc ..., I add a nice DocSlice to the output: rb.rsp.add("myResponse", new DocSlice(0, mySliceLen, myIds, myScores, myNumHits[0], myMaxScore)); The effect of that, is that the given online response automagically (well, as far as I understand :D) contains the documents of my docSlice (under the key 'myResponse') , each one of them containing the fields defined in the 'fl' parameter (including the score). What I'd like to do is to add some fields to the returned documents. I thought about doing this in the way the QueryComponent adds the score ( see returnFields method in QueryComponent ), but my own 'handleResponses' method is not called, plus I can't access to rb._responseDocs (which seems to be imperative to have an effect on the returned online response). Here's what could help me a lot: - Where does the solr framework transforms the doc Ids (which are just integers) ? - How is the standard QueryComponent given the possibility to add this 'score' field to the returned document ? - How to hook in that process so I can add my own field ? Cheers !! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Re: Different tokenizing algorithms for the same stream
Hi, I think you could implement your personalized tokenizer in a way it changes its behaviour after it has delivered X tokens. This implies a new tokenizer instance is build from the factory for every string analyzed, which I believe is true. Can this be confirmed ? Cheers ! Jerome. On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <[EMAIL PROTECTED]> wrote: > Hello all, > > I'm trying to implement a tokenizer that will behave differently on > different parts of the incoming stream. > For example, for the first X words in the stream I would like to use one > tokenizing algorithm, while for the rest of the stream a different > tokenizing algorithm will be used. > > What is the best way to implement that? > Where should I store this stream-related data? > > Thanks, > Yuri > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
DocSet: BitDocSet or HashDocSet ?
Hi all, In my code, I'd like to keep a subset of my 14M docs which is around 100k large. What is, according to you, the best option in terms of speed and memory usage ? Some basic thinking tells me the BitDocSet should be the fastest for lookups, but it takes one bit per document in the index (~14M bits, so roughly 1.75MB) in memory, whereas the HashDocSet takes just ~100k * sizeof(int) (roughly 400KB), but has slightly slower lookups. The doc of HashDocSet says "It can be a better choice if there are few docs in the set". What does 'few' mean in this context ? Cheers ! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Re: Deadlock problem on searcher at warm up.
Great, it works now. Thanks ! J On Fri, Oct 24, 2008 at 4:45 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Fri, Oct 24, 2008 at 8:21 AM, Jérôme Etévé <[EMAIL PROTECTED]> wrote: >> I though it'd be ok to trigger this the very first time the process >> method is called by doing something like that: >> >> private boolean firstTime= true ; >> >> public void process(ResponseBuilder rb) throws IOException { >>if ( firstTime ){ >>firstTime = false ; >>buildMyStuff(rb) ; >>} >> } >> >> >> The problem is that my method buildMyStuff hangs when calling >> rb.req.getCore().getSearcher() ; , >> and I believe this is happening when the warm up queries are executed. > > getSearcher() can wait for a searcher to be registered. > getNewestSearcher() can be used from places like inform(), but if you > are already in process() > then the one you should use is the one bound to the request (the > SolrQueryRequest object) - rb.req.getSearcher() > > -Yonik > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
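For anyone hitting the same hang: the fix is the one Yonik describes, i.e. use the searcher already bound to the request instead of asking the core for one inside process(). A minimal sketch (everything apart from the SearchComponent/ResponseBuilder API is made up):

    private volatile boolean firstTime = true;

    public void process(ResponseBuilder rb) throws IOException {
      if (firstTime) {
        firstTime = false;
        // use the searcher bound to this request; SolrCore.getSearcher()
        // can block while the very first searcher is still being warmed
        // and registered
        buildMyStuff(rb.req.getSearcher());
      }
      // ... normal per-request work ...
    }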
Re: One document inserted but nothing showing up ? SOLR 1.3
Hi there, Are you sure you did a commit after your insertion ? On Fri, Oct 24, 2008 at 8:11 AM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Even that doesn't work, > How can I check properly, I did insert one document but I can't get it back > ??? > > > Feak, Todd wrote: >> >> Unless "q=ALL" is a special query I don't know about, the only reason you >> would get results is if "ALL" showed up in the default field of the single >> document that was inserted/updated. >> >> You could try a query of "*:*" instead. Don't forget to URL encode if you >> are doing this via URL. >> >> -Todd >> >> >> -Original Message- >> From: sunnyfr [mailto:[EMAIL PROTECTED] >> Sent: Thursday, October 23, 2008 9:17 AM >> To: solr-user@lucene.apache.org >> Subject: One document inserted but nothing showing up ? SOLR 1.3 >> >> >> Hi >> >> Can somebody help me ? >> How can I see all my documents, I just did a full import : >> >> Indexing completed. Added/Updated: 1 documents. Deleted 0 documents. >> >> >> and when I do :8180/solr/video/select/?q=ALL, I've no result ? >> >> − >> >> 0 >> 0 >> − >> >> ALL >> >> >> >> >> >> Thanks a lot, >> >> -- >> View this message in context: >> http://www.nabble.com/One-document-inserted-but-nothing-showing-up---SOLR-1.3-tp20134357p20134357.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> > > -- > View this message in context: > http://www.nabble.com/One-document-inserted-but-nothing-showing-up---SOLR-1.3-tp20134357p20145343.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Deadlock problem on searcher at warm up.
Hi everyone, I'm implementing a search component inherited from SearchComponent . This component has to build a data structure from the index. Like in the SpellChecker, I trigger this building by giving a special argument at query time (from the process method) and I'm using the searcher I get like this: RefCounted search = rb.req.getCore() .getSearcher(); ... search.decref(); I included this component at the end of the chain in my search handler. What I'd like to do is to trigger this building for a first time at solr startup so I don't need to artificially trigger it for a first time. I though it'd be ok to trigger this the very first time the process method is called by doing something like that: private boolean firstTime= true ; public void process(ResponseBuilder rb) throws IOException { if ( firstTime ){ firstTime = false ; buildMyStuff(rb) ; } } The problem is that my method buildMyStuff hangs when calling rb.req.getCore().getSearcher() ; , and I believe this is happening when the warm up queries are executed. Furthermore, any regular queries on a solr instance like this would hang and wait forever. I there any way I can get around this problem, or is there a better way to buildMyStuff a first time when solr is started up? Cheers, Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Re: solr 1.3 database connection latin1/stored utf8 in mysql?
Hi, See http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html and http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String) Also note that you cannot transform a latin1 string in a utf-8 string. What you can do is to decode a latin1 octet array to a String (java uses its own internal representation for String which you shouldn't even know about), and you can encode a String to an utf-8 bytes array. Cheers. J. On Wed, Oct 22, 2008 at 10:11 AM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Hi Shalin > Thanks for your answer but it doesn't work just with Dfile.encoding > I was hoping it could work. > > I definitely can't change the database so I guess I must change java code. > I've a function to change latin-1 string to utf8 but I don't know really > where should I put it? > > Thanks for your answer, > > > Shalin Shekhar Mangar wrote: >> >> Hi, >> >> The best way to manage international characters is to keep everything in >> UTF-8. Otherwise it will be difficult to figure out the source of the >> problem. >> >> 1. Make sure the program which writes data into MySQL is using UTF-8 >> 2. Make sure the MySQL tables are using UTF-8. >> 3. Make sure MySQL client connections use UTF-8 by default >> 4. If the SQL written in your data-config has international characters, >> start Solr with "-Dfile.encoding=UTF-8" as a command line parameter >> >> http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html >> >> I don't think there is any easy way to go about this. You may need to >> revisit all the parts of your system. >> >> On Wed, Oct 22, 2008 at 12:52 PM, sunnyfr <[EMAIL PROTECTED]> wrote: >> >>> >>> Hi, >>> >>> I'm using solr1.3 mysql and tomcat55, can you please help to sort this >>> out? >>> How can I index data in UTF8 ? I tried to add the parameter >>> encoding="UTF-8" >>> in the datasource in data-config.xml. >>> >>> | character_set_client| latin1 >>> | character_set_connection| latin1 >>> But data are stored in UTF8 inside database, not very logic but I can't >>> change it. >>> >>> But still doesn't work, Help would be more than welcome, >>> Thanks >>> -- >>> View this message in context: >>> http://www.nabble.com/solr-1.3-database-connection-latin1-stored-utf8-in-mysql--tp20105301p20105301.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> >> > > -- > View this message in context: > http://www.nabble.com/solr-1.3-database-connection-latin1-stored-utf8-in-mysql--tp20105342p20106791.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
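In Java terms, the decode/encode distinction above looks roughly like this (a sketch; the class and method names are arbitrary):

    import java.io.UnsupportedEncodingException;

    public class Recode {
      /** Decode latin1 bytes into a String, then encode that String as UTF-8. */
      public static byte[] latin1ToUtf8(byte[] latin1Bytes) throws UnsupportedEncodingException {
        String text = new String(latin1Bytes, "ISO-8859-1"); // decode
        return text.getBytes("UTF-8");                       // encode
      }
    }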
Re: tomcat55/solr1.3 - Indexing data, doesnt take in consideration utf8!
Looks like you have a double encoding problem. It might be because you fetch UTF-8 binary data from mysql (I know that for instance the perl driver has an issue with that) and you then encode it a second time in UTF-8 when you post to solr. Make sure the string you're getting from mysql are actually proper unicode strings and not the raw UTF-8 encoded binary form. You may want to have a look at http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html for the proper option to use with your connection. What you can try to check you're posting actual UTF-8 data to solr is to dump your xml post in a file (don't forget to set the input encoding to UTF-8 ). Then you can check if this file is readable with any UTF-8 aware editor. Cheers, Jerome. On Tue, Oct 21, 2008 at 10:43 AM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Hi, > > I've solr 1.3 and tomcat55. > When I try to index a bit of data and I request ALL, obviously my accent and > UTF8 encoding is not took in consideration. > > 2006-12-14T15:28:27Z > > Le 1er film de Goro Miyazaki (fils de Hayao) > je suis allÃ(c)e ... > > 渡邊 å‰ å· vs 三ç"°ä¸‹ç"° 1 > > > My database Mysql is well in UTF8, if I request data manually from mysql I > will get accent even japan characters properly > > I index my data, my data-config is : >driver="com.mysql.jdbc.Driver" > url="jdbc:mysql://master-spare.videos.com/videos" > user="solr" > password="pass" > batchSize="-1" > responseBuffering="adaptive"/> > > My schema config file start by : > > I've add in my server.xml : because my localhost point on 8180 > maxThreads="150" minSpareThreads="25" maxSpareThreads="75" > enableLookups="false" redirectPort="8443" acceptCount="100" > connectionTimeout="2" disableUploadTimeout="true" > URIEncoding="UTF-8" useBodyEncodingForURI="true" /> > > What can I check? > I'm using a linux server. > If I do dpkg-reconfigure -plow locales > Generating locales... > fr_BE.UTF-8... up-to-date > fr_CA.UTF-8... up-to-date > fr_CH.UTF-8... up-to-date > fr_FR.UTF-8... up-to-date > fr_LU.UTF-8... up-to-date > Generation complete. > > Would that be a problem, I would say no but maybe, do I miss a package??? > > > > -- > View this message in context: > http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20086167.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
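If the double encoding happens on the JDBC side, Connector/J can be told which encoding to use on the connection. In the data-config quoted above that would look roughly like this (same driver, URL and credentials as in the mail, with the connection-charset parameters added; note the & must be written &amp; inside the XML):

    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://master-spare.videos.com/videos?useUnicode=true&amp;characterEncoding=UTF-8"
                user="solr" password="pass"
                batchSize="-1"/>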
Re: Discarding undefined fields in query
On Tue, Oct 7, 2008 at 12:56 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : req.getSchema().getQueryAnalyzer(); > : > : I think it's in this analyzer that the undefined field error happens > : (because for instance the field 'foo' doesn't exists in the schema, > : and so it's impossible to find a specific analyzer to this field in > : the schema). > > Correct. > > : The strange thing is that any QueryParser (Lucene API) is supposed to > : raise a ParseException if anything wrong happens with the parsing with > : the parse(String) method. > : > : But here, it seems that the Analyzer from the schema (the one we get > : from getQueryAnalyzer()) is creating it's own error ( the undefined > : field one, instance of SolrException) and instead of propagating it to > : the QueryParser which could have a chance to propagate it as a > : standard ParseException, it seems it stops solr processing the query > : directly. > > Solr isn't doing anything magical here -- it's just throwing a > SolrException, which is a RuntimeExcepttion -- the Lucene > QueryParser.parse method only throws a ParseException in th event of > TooManyClauses, TokenMgrError, or an inner ParseException. > Ook, I get it now. Runtime exceptions don't have to be checked at compile time, ( and couldn't be here since the Analyzer could be anything throwing anything). I'll catch that and deal with it then (Or is it bad programming ?) . Thanks for your help . -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Re: Discarding undefined fields in query
Hi, yes I've got the stack trace giving me the beginning of an explanation. One of the QueryParsers I use in my Query parser plugin is a multifiedQueryParser and it needs a fields aware Analyzer, which I get from the schema like this: req.getSchema().getQueryAnalyzer(); I think it's in this analyzer that the undefined field error happens (because for instance the field 'foo' doesn't exists in the schema, and so it's impossible to find a specific analyzer to this field in the schema). The strange thing is that any QueryParser (Lucene API) is supposed to raise a ParseException if anything wrong happens with the parsing with the parse(String) method. But here, it seems that the Analyzer from the schema (the one we get from getQueryAnalyzer()) is creating it's own error ( the undefined field one, instance of SolrException) and instead of propagating it to the QueryParser which could have a chance to propagate it as a standard ParseException, it seems it stops solr processing the query directly. Here's the full stack (with the undefined field being 'hwss' ) org.apache.solr.common.SolrException: undefined field hwss at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1053) at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getAnalyzer(IndexSchema.java:373) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.tokenStream(IndexSchema.java:348) at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:473) at org.apache.lucene.queryParser.MultiFieldQueryParser.getFieldQuery(MultiFieldQueryParser.java:120) at org.apache.lucene.queryParser.MultiFieldQueryParser.getFieldQuery(MultiFieldQueryParser.java:135) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1248) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1135) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1092) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1052) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:168) at my.organisation.lucene.queryParser.MyLuceneQueryParser.parse(Unknown Source) at my.organisation.solr.search.MyQParser.parse(Unknown Source) at org.apache.solr.search.QParser.getQuery(QParser.java:88) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:155) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Cheers ! Jerome. On Tue, Sep 30, 2008 at 10:34 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Tue, Sep 30, 2008 at 2:42 PM, Jérôme Etévé <[EMAIL PROTECTED]> wrote: >> But still I have an error from the webapp when I try to query my >> schema with non existing fields in my query ( like foo:bar ). >> >> I'm wondering if the query q is parsed in a very simple way somewhere >> else (and independently from any customized QParserPlugin) and checked >> against the schema. > > It should not be. Are you sure your QParser is being used? > Does the error contain a stack trace that can pinpoint where it's coming from? > > -Yonik > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Discarding undefined fields in query
Hi All, I wrote a customized query parser which discards non-schema fields from the query (I'm using the schema field names from req.getSchema().getFields().keySet() ) . This parser works fine in unit tests. But still I have an error from the webapp when I try to query my schema with non existing fields in my query ( like foo:bar ). I'm wondering if the query q is parsed in a very simple way somewhere else (and independently from any customized QParserPlugin) and checked against the schema. Is there an option to modify this behaviour so undefined fields in a query could be simply discarded instead of throwing an error ? Cheers ! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Re: Multicore and custom jars loading
My mistake, Using the sharedLib="lib/" attribute in the solr tag of solr.xml solved the problem. J. On Mon, Sep 29, 2008 at 2:43 PM, Jérôme Etévé <[EMAIL PROTECTED]> wrote: > Hello all. > > I'm using a multicore installation and I've got a small issue with > the loading of our customized jars. > > Let's say I've got a class my.company.MyAnalyzer which is distributed > in a jar called company-solr.jar > > If I put this jar in the lib directory, at the solr home like this: > > $solr_home/: >solr.xml >core1/ >core2/ >lib/company-solr.jar > > , then the solr class loader adds properly the company-solr.jar to the > class loader, but then it's not possible to find those classes from > the cores. > For instance if you have core1/conf/schema.xml which makes use of the > my.company.MyAnalyzer class, it won't work because this class won't be > found. > > At the moment, I solved the pb by duplicating the jar inside the two > cores like that: > > core1/lib/company-solr.jar > ... > core2/lib/company-solr.jar > > But I'm not very happy with this solution. > > Is there anyway to allow core shema files to references classes loaded > as jars in the top level lib path ? > > I'm running solr1.3.0 in tomcat 6.0.18 > > > Cheers !! > > Jerome. > > -- > Jerome Eteve. > > Chat with me live at http://www.eteve.net > > [EMAIL PROTECTED] > -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
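For reference, the relevant bit of solr.xml looks like this (core names and instance dirs are the ones from the layout described in the original mail; the other attributes are the usual defaults):

    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores">
        <core name="core1" instanceDir="core1"/>
        <core name="core2" instanceDir="core2"/>
      </cores>
    </solr>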
Multicore and custom jars loading
Hello all. I'm using a multicore installation and I've got a small issue with the loading of our customized jars. Let's say I've got a class my.company.MyAnalyzer which is distributed in a jar called company-solr.jar. If I put this jar in the lib directory at the solr home, like this:

$solr_home/:
    solr.xml
    core1/
    core2/
    lib/company-solr.jar

, then the solr class loader properly adds company-solr.jar to the class loader, but it's not possible to find those classes from the cores. For instance, if you have core1/conf/schema.xml which makes use of the my.company.MyAnalyzer class, it won't work because this class won't be found. At the moment, I solved the problem by duplicating the jar inside the two cores like that: core1/lib/company-solr.jar ... core2/lib/company-solr.jar But I'm not very happy with this solution. Is there any way to allow core schema files to reference classes loaded as jars in the top level lib path ? I'm running solr 1.3.0 in tomcat 6.0.18 Cheers !! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Querying multicore
Hi everyone, I'm planning to use the multicore cause it seems more convenient than having multiple instances of solr in the same container. I'm wondering if it's possible to query different cores ( hence different schemas / searchers ... indices !) from a customized SolrRequestHandler to build a response. ? If not I'll have to build my own webapp and query solr through crossContext requests. Has someone done that already ? Kind regards, Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
Re: Solr deployment in tomcat
On 10/9/07, Chris Laux <[EMAIL PROTECTED]> wrote: > Jérôme Etévé wrote: > [...] > > /var/solr/foo/ is the solr home for this instance (where you'll put > > your schema.xml , solrconfig.xml etc.. ) . > > Thanks for the input Jérôme, I gave it another try and discovered that > what I was doing wrong was copying the solr/example/ directory to what > you call "/var/solr/foo/", while copying solr/example/solr/ is what > works now. > > Maybe I should add a note to the Wiki... Sounds like a good idea ! Actually I remember struggling a bit to have multiple instance of solr in tomcat. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: Solr deployment in tomcat
Hi, Here's what I've got (multiple solr instances within the same tomcat server). In /var/tomcat/conf/Catalina/localhost/, for an instance 'foo', I have a context fragment foo.xml (the fragment is sketched below). /var/tomcat/solrapp/solr.war is the path to the solr war file. It can be anywhere on the disk. /var/solr/foo/ is the solr home for this instance (where you'll put your schema.xml, solrconfig.xml etc..). Restart tomcat and you should see your foo app appear in your deployed apps. Jerome. On 10/9/07, Chris Laux <[EMAIL PROTECTED]> wrote: > > Hello Group, > > Does anyone able to deploy solr.war @ tomcat. I just tried to deploy it as > > per wiki and it gives bunch of exceptions and I dont think those exceptions > > have any relevance with the actual cause. I was wondering if there is any > > speciaf configuration needed? > > I had that very same problem while trying to set solr up with tomcat > (and multiple instances). I have given up for now and am working with > Jetty instead. > > Chris Laux > > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
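The content of that foo.xml did not survive in the message above; below is a reconstruction along the lines of the standard Solr-on-Tomcat context fragment, using the paths quoted in the mail (the debug attribute is optional):

    <Context docBase="/var/tomcat/solrapp/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String"
                   value="/var/solr/foo" override="true"/>
    </Context>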
Re: Problem with html code inside xml
If I understand, you want to keep the raw html code in solr like that (in your posting xml file): I think you should encode your content to protect these xml entities: < -> < > -> > " -> " & -> & If you use perl, have a look at HTML::Entities. On 9/25/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Hello, > > I've got some problem with html code who is embedded in xml file: > > Sample source . > > > > > Les débats > > > Le premier tour des élections fédérales se déroulera > le 21 > octobre prochain. D'ici là, La 1ère vous propose plusieurs rendez- > vous, dont plusieurs grands débats à l'enseigne de Forums. > > > > > my para textehere > > > Vous trouverez sur cette page toutes les > dates et les heures de > ces différents rendez-vous ainsi que le nom et les partis des > débatteurs. De plus, vous pourrez également écouter ou réécouter > l'ensemble de ces émissions. > > > > - > When a make a query on solr I've got something like that in the > source code of the xml result: > > http://www.w3.org/1999/xhtml";> > < > div > class > = > "paragraph" > > > < > div > class > = > "paragraphTitle" > /> > − > < > ... > > It is not exactly what I want. I want to keep the html tags, that all > without formatting. > > So the br tags and a tags are well formed in xml and json result, but > the div tags are not kept. > - > In the schema.xml I've got this for the html content > > > >stored="true" multiValued="true"/> > > - > > Any help would be appreciate. > > Thanks in advance. > > S. Christin > > > > > > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: How to get all the search results - python
By design, it's not very efficient to ask for a large number of results with solr/lucene. I think you will face performance and memory problems if you do that. On 9/24/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote: > On Mon, 2007-09-24 at 16:29 +0530, Roopesh P Raj wrote: > > > Hi Roopesh, > > > > > I am not sure whether I understand your problem. > > > > > Is it the limitation of rows/pagination? > > > If so why not using a real high number (like rows=100)? > > > > > salu2 > > > > Hi, > > > > Assigning a high number will solve my problem. (I thought that there will > > something like rows='all' to do it). > > > > Can I do pagination using the python client? > > I am not a python expert but I think so. > > > How can I specify the starting position, offset etc for > > pagination through the python client? > > http://wiki.apache.org/solr/CommonQueryParameters > > It should work as described in the above document (with the start > parameter. > > e.g. > data = c.search(q='query', fl='id score unique_id Message-ID To From > Subject',rows=50, wt='python',start=50) > > HTH > -- > Thorsten Scherler thorsten.at.apache.org > Open Source Java consulting, training and solutions > > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
-field:[* TO *] doesn't seem to work
Hi all, I've got a problem here with the '-field:[* TO *]' syntax. It doesn't seem to work as expected (see http://wiki.apache.org/solr/SolrQuerySyntax ). My request is 'word -fieldD:[* TO *]' and the debugQuery=1 solr option shows that it's properly transformed as : +(fieldA:chef^10.0 fieldB:chef fieldC:chef^2.0) -fieldD:[* TO *] but solr still gives back documents with a non-empty fieldD. My fieldD is defined with the type text_ws, the standard solr text field that only splits on whitespace for exact matching of words. Did I miss something ? I'm using solr 1.2.1-dev . Thanks for any help ! Jerome. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: Index HotSwap
On 8/21/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : I'm wondering what's the best way to completely change a big index > : without losing any requests. > > use the snapinstaller script -- or adopt the same atomic copying approach > it uses. I'm having a look :) > : - Between the two mv's, the directory dir does not exist, which can > : cause some solr failure. > > this shouldn't cause any failure unless you tell Solr to try and reload > during the move (ie: you send it a commit) ... either way an atomic copy > in place of a mv should work much better. Why, does the reloading of the searcher trigger a re-loading of the files from disk ? Thx ! > > -Hoss > > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
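As an illustration of the atomic-swap idea being discussed, here is a rough sketch in Java with java.nio.file (all paths are invented for the example; the Solr collection distribution scripts do the equivalent with shell commands):

import java.nio.file.*;

// Sketch only: repoint a "live" symlink from the old index directory to the new one
// with a single atomic rename, so there is never a moment where the path is missing.
public class IndexSwap {
    public static void main(String[] args) throws Exception {
        Path newIndex = Paths.get("/data/solr/index.new");
        Path live     = Paths.get("/data/solr/index");      // what solr's dataDir points at
        Path tmpLink  = Paths.get("/data/solr/index.tmp");

        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, newIndex);
        // On POSIX filesystems this is a rename(2), which replaces the existing link
        // atomically, so readers see either the old target or the new one, never nothing.
        Files.move(tmpLink, live,
                   StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        // Solr still has to be told to reopen its searcher (e.g. by sending a commit).
    }
}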
Re: Indexing HTML content... (Embed HTML into XML?)
You need to encode your html content so it can be included as a normal 'string' value in your xml element. As far as I remember, the only unsafe characters you have to encode as entities are: < -> &lt; , > -> &gt; , " -> &quot; , & -> &amp; (google xml entities to be sure). I don't know what language you use, but for perl for instance, you can use something like: use HTML::Entities ; my $xmlString = encode_entities($rawHTML , '<>&"' ); Also you need to make sure your Html is encoded in UTF-8, to comply with solr's need for UTF-8 encoded xml. I hope it helps. J. On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote: > Hello, > > Sorry for stupid question. I'm trying to index html file as one of > the fields in Solr, I've setup appropriate analyzer in schema but I'm > not sure how to add html content to Solr. Encapsulating HTML content > within field tag is obviously not valid. How do I add html content? > Hope the query is clear > > Thanks, > Ravi > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Index HotSwap
Hi all, I'm wondering what's the best way to completely change a big index without losing any requests. That's how I do it at the moment: the solr index is a soft link to a directory dir. When I want to install a new index (in dir.new), I do a mv dir dir.old ; mv dir.new dir Then I ask for a reload of the solr application (within tomcat). I can see two problems with this method: - Between the two mv's, the directory dir does not exist, which can cause some solr failure. - Apparently it's not that safe to reload a webapp within tomcat. I thought it was the equivalent of the apache graceful reloading (completing current requests and putting incoming ones into a queue while the application restarts), but it's apparently not. I noticed we have a couple of queries lost when it happens. One is a 503 This application is not currently available, and the one just after is a 404 /solr//select/ - The requested resource (/solr//select/) is not available. Does anybody know how to avoid this behaviour, and what the best way is to swap between two big indexes? Thanks for any help ! Jerome. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Using MMapDirectory instead of FSDirectory
Hi ! Is there a way to use an MMapDirectory instead of FSDirectory within Solr ? Our index is quite big and it takes a long time to get loaded into the OS cache. I'm wondering if an MMapDirectory could help get our data into memory quicker (our index on disk is bigger than our available memory). Do you have any tips on optimizing such a thing ? Thanks !!! Jerome. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
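For context, one approach that circulated at the time relied on the pre-2.4 Lucene FSDirectory choosing its concrete implementation from a system property; this is an assumption to verify against the Lucene jar actually bundled with your Solr, not a confirmed recipe:

// Sketch only, not verified against every Lucene release of that era.
// The same effect would usually be achieved by passing the -D flag to the
// servlet container, e.g. in Tomcat's JAVA_OPTS:
//   -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory
public class MMapSwitch {
    public static void main(String[] args) {
        System.setProperty("org.apache.lucene.FSDirectory.class",
                           "org.apache.lucene.store.MMapDirectory");
        // ... start the container / embedded Solr after this point.
    }
}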
Re: Solr Search any Lucene Index ?
Hi, From my personal experience, solr is capable of searching an index generated with CLucene. Of course, you have to be careful about the type mappings. J. On 7/16/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote: Hello, AFAIK, Solr Search is only capable of searching in a Lucene index that is created by Solr (at least, this seems logical to me)...or, the exact same fields and analyzers must have been indexed the way solr would have done it. Ard > > Hi, > Can Solr Search any Lucene Index. If "YES" what should > be changed in > configuration. > > Thanks > Narendra > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: Pluggable IndexSearcher Proposal
Actually, I implemented that feature for the 1.2.0 version of solr (the one I use). It allows you to specify the IndexSearcher used by solr in the schema configuration file, via a <searcher class="..." /> element. If the specified class can't be loaded, a severe message is issued in the log and solr falls back to the hardcoded lucene IndexSearcher . The patch to apply is attached to this email. I also created an issue in the solr jira: https://issues.apache.org/jira/browse/SOLR-288 but I didn't find the way to upload the patch. Thanks for your comments. Jerome. On 7/5/07, Jérôme Etévé <[EMAIL PROTECTED]> wrote: Hi all ! I need a new feature in solr : to allow the configuration of the IndexSearcher class in the schema configuration to override the lucene IndexSearcher . I noticed that there's only one point in the code where the searcher is built: in org/apache/solr/search/SolrIndexSearcher.java: private SolrIndexSearcher(IndexSchema schema, String name, IndexReader r, boolean closeReader, boolean enableCache) { this.schema = schema; this.name = "Searcher@" + Integer.toHexString(hashCode()) + (name!=null ? " "+name : ""); log.info("Opening " + this.name); reader = r; /** HERE */ searcher = new IndexSearcher(r); I'd like to allow a new tag in the schema : I don't exactly know what is the best way to do it. I was thinking of: * In IndexSchema: implement a method String getLuceneIndexSearcherClassName() * In SolrIndexSearcher in private SolrIndexSearcher: String idxSearcherClassName = schema.getLuceneIndexSearcherClassName() // Then load the class itself // Then build a new instance of this class with the IndexReader r What solr special class loader and instance builder do I have to use to do the last two operations ? Can I use directly : Class idxSearcherClass = Config.findClass(idxSearcherClassName) and then build an idxSearcher by using the standard java.lang.Class methods ? Am I on the right track and does it fit with the solr architecture to do that ? I'd be perfectly happy to implement that and submit a patch. Thanks for your comments and answers. Jerome -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/ -- Jerome Eteve.
[EMAIL PROTECTED] http://jerome.eteve.free.fr/

diff -Nurp src_old/java/org/apache/solr/schema/IndexSchema.java src/java/org/apache/solr/schema/IndexSchema.java
--- src_old/java/org/apache/solr/schema/IndexSchema.java 2007-05-30 16:51:06.0 +0100
+++ src/java/org/apache/solr/schema/IndexSchema.java 2007-07-05 16:46:11.0 +0100
@@ -125,6 +125,13 @@ public final class IndexSchema {
   public Collection getRequiredFields() { return requiredFields; }

   private Similarity similarity;
+  private String searcherClassName = null ;
+
+  /**
+   * Returns the indexSearcherClassName to use with this index
+   */
+  public String getSearcherClassName() { return searcherClassName ;}
+
   /**
    * Returns the Similarity used for this index
@@ -449,6 +456,15 @@ public final class IndexSchema {
       similarity = (Similarity)Config.newInstance(node.getNodeValue().trim());
       log.fine("using similarity " + similarity.getClass().getName());
     }
+
+    // Grab indexSearcher class
+    node = (Node) xpath.evaluate("/schema/searcher/@class" , document, XPathConstants.NODE);
+    if ( node != null ){
+      searcherClassName = node.getNodeValue().trim() ;
+      log.info("will use " + searcherClassName + " for IndexSearcher class");
+    }else{
+      log.info("No customized index searcher class - will use default");
+    }

     node = (Node) xpath.evaluate("/schema/defaultSearchField/text()", document, XPathConstants.NODE);
     if (node==null) {
diff -Nurp src_old/java/org/apache/solr/search/SolrIndexSearcher.java src/java/org/apache/solr/search/SolrIndexSearcher.java
--- src_old/java/org/apache/solr/search/SolrIndexSearcher.java 2007-05-30 16:51:15.0 +0100
+++ src/java/org/apache/solr/search/SolrIndexSearcher.java 2007-07-05 17:45:18.0 +0100
@@ -41,6 +41,8 @@ import java.util.*;
 import java.util.logging.Level;
 import java.util.logging.Logger;

+import java.lang.reflect.Constructor ;
+
 /**
  * SolrIndexSearcher adds schema awareness and caching functionality
@@ -104,7 +106,33 @@ public class SolrIndexSearcher extends S
     log.info("Opening " + this.name);

     reader = r;
-    searcher = new IndexSearcher(r);
+    //searcher = new IndexSearcher(r);
+
+    // Eventually build a searcher according to configuration
+    String idxSearcherClassName = schema.getSearcherClassName() ;
+    if ( idxSearcherClassName == null ){
+      log.info("Using hardcoded standard lucene IndexSearcher");
+      searcher = new IndexSearcher(r);
+    }else{
+      log.info("Attempting to load " + idxSearcherClassName );
+      IndexSearcher customsearcher ;
+      try{
+        Class idx
Pluggable IndexSearcher Proposal
Hi all ! I need a new feature in solr : to allow the configuration of the IndexSearcher class in the schema configuration, to override the lucene IndexSearcher . I noticed that there's only one point in the code where the searcher is built, in org/apache/solr/search/SolrIndexSearcher.java:

private SolrIndexSearcher(IndexSchema schema, String name, IndexReader r, boolean closeReader, boolean enableCache) {
  this.schema = schema;
  this.name = "Searcher@" + Integer.toHexString(hashCode()) + (name!=null ? " "+name : "");
  log.info("Opening " + this.name);
  reader = r;
  /** HERE */
  searcher = new IndexSearcher(r);

I'd like to allow a new tag in the schema, something like <searcher class="..." /> . I don't exactly know what is the best way to do it. I was thinking of: * In IndexSchema: implement a method String getLuceneIndexSearcherClassName() * In SolrIndexSearcher, in private SolrIndexSearcher: String idxSearcherClassName = schema.getLuceneIndexSearcherClassName() // Then load the class itself // Then build a new instance of this class with the IndexReader r What solr special class loader and instance builder do I have to use to do the last two operations ? Can I use directly : Class idxSearcherClass = Config.findClass(idxSearcherClassName) and then build an idxSearcher by using the standard java.lang.Class methods ? Am I on the right track, and does it fit with the solr architecture to do that ? I'd be perfectly happy to implement that and submit a patch. Thanks for your comments and answers. Jerome -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Specific fields with DisMaxQueryHandler
Hi, when we use the DisMaxQueryHandler, queries that include specific fields which are not part of the boost string don't seem to work. For instance, if the boost string ( qf ) is 'a^3 b^4' and my query is 'term +c:term2' , it doesn't produce any results. Am I using this QueryHandler the wrong way ? Thanks for your help. Jerome. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
MultifieldSolrQueryParser ?
Hi, Solr uses a default query parser which is a SolrQueryParser based on org.apache.lucene.queryParser.QueryParser. I wonder what the best way is to make the IndexSchema use some kind of MultifieldSolrQueryParser, which could be based on org.apache.lucene.queryParser.MultiFieldQueryParser, for per-field boost factors. Thank you for any help ! Jerome. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: Log levels setting
On 6/29/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Hi, : is there a way to avoid going to the web interface to set up the solr : log level ? The web interface for tweaking the log level is actually a mis-feature in my opinion ... it's a handy way to quickly crank the logging level up if something weird is happening and you want to see why, but the best way to configure logging for Solr is via whatever configuration mechanism your Servlet Container provides for managing JDK logging. Thanks for that information ! I'm using tomcat 6, does somebody have a snippet of conf file to set up the log level for all org.apache.solr.* classes ? Resin, Tomcat, and Jetty all support different configuration mechanisms for controlling the logging level of individual loggers (which is one way you can say I want INFO level from these classes, but only WARNINGs from these other classes) ... in the absolute worst case scenario, if your servlet container doesn't support any special logging configuration, you can use the JDK system properties to specify a logging.properties file the JDK should load on startup... http://java.sun.com/j2se/1.5.0/docs/guide/logging/overview.html -Hoss -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
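A possible fragment for Tomcat 6's conf/logging.properties, sketched from standard JDK logging configuration (the handler setup is assumed to already exist in that file; the WARNING level is just an example choice):

# Keep the global default at INFO, but quiet the per-request chatter from Solr.
.level = INFO
org.apache.solr.level = WARNING
# Individual Solr loggers can still be re-enabled if their startup info is wanted, e.g.:
# org.apache.solr.schema.IndexSchema.level = INFO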
Log levels setting
Hi, is there a way to avoid going to the web interface to set up the solr log level ? I'm also a bit confused about the INFO log level. Actually it's very nice to see some startup info about the schema, solr home setting, customized modules loaded .. But this INFO log level also gives two lines for every request done, which very quickly fills up the log file with not-so-useful information. Is there a way to isolate that request information from the INFO log level ? Thanks for your comments and advice ! Jerome. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
float field indexed with clucene, access with solr
Hi, I have an index which I generated with clucene where there is a float field. This float field is stored as a simple verbatim character string. The solr schema doc has comments describing both the plain float field type and the sortable float field type. What exactly does 'a string value that isn't human-readable in its internal form' mean ? Does that mean that such a field has to be indexed as a binary representation of the number to allow the use of the sfloat type ? I noticed that in the FloatField class, the method getSortField is like that:

public SortField getSortField(SchemaField field,boolean reverse) {
  return new SortField(field.name,SortField.FLOAT, reverse);
}

It seems to return the right type, SortField.FLOAT, adapted to my field. In SortableFloatField:

public SortField getSortField(SchemaField field,boolean reverse) {
  return getStringSort(field,reverse);
}

I'm not sure I understand all of this, but what I feel is that since the type 'FloatField' gives that 'new SortField(field.name,SortField.FLOAT)' , it should suit my verbatim float data for sorting the query results. Do I have the right feeling ? Thanks for your help -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: Problems querying Russian content
On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote: > I'm in trouble now about how to issue queries against Solr using in my "q" > parameter content in Russian (it applies to Chinese and Arabic as well). > > The problem is I can't send any Russian special character in URL's because > they don't fit in ASCII domain, so I'm doing a POST to accomplish that. You can send unicode in URLs (it's done as the UTF-8 bytes percent encoded). http://www.ietf.org/rfc/rfc3986.txt But a POST should work too. You just need to make sure the Content-type contains the character encoding, and that it actually matches what is being sent. If this is a browser doing the POST, it can be a bit tricky to get it to post UTF-8... basically, I think the browser uses the charset of the HTML page containing the form when it does the POST (so make sure that's UTF8). You can also ensure the browser sends an utf8 encoded post by http://jerome.eteve.free.fr/
Re: XML vs JSON writer performance issues
2007/6/27, Yonik Seeley <[EMAIL PROTECTED]>: > > It would be helpful if you could try out the patch at > https://issues.apache.org/jira/browse/SOLR-276 > > -Yonik I just tried it out and it works. json output is now as fast as xml ! Well done :) thank you ! J. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: XML vs JSON writer performance issues
On 6/26/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 6/26/07, Jérôme Etévé <[EMAIL PROTECTED]> wrote: > I'm currently running some tests with solr on a small index and I > noticed a big difference in the response time of queries depending on > the use of XML or json as a response format. > On average, my test queries (including http connections open and close > ) take 6 ms to perform when I ask for XML and they take 30 ms when I > ask for JSON. Wow, that's a surprise. The only thing I can figure is that perhaps during the string escaping the JSON writer is writing to the stream character-by-character. Could you try the python writer and see if there is a speed difference? It uses a StringBuilder when escaping the string. I just tried the python writer and it's as fast as XML. I'm still looking at the code, trying to pin down the reason for that. Thanks for any help. J -- Jerome Eteve. [EMAIL PROTECTED] http://www.eteve.net
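To make that hypothesis concrete, here is a minimal sketch (not Solr's actual writer code; the escaping rule is simplified) contrasting per-character writes with building the escaped value first and writing it in one call:

import java.io.StringWriter;
import java.io.Writer;

// Sketch only: many tiny write(int) calls on a Writer can be much slower than
// appending to a StringBuilder and writing the whole escaped value at once.
public class EscapeComparison {
    static void escapePerChar(String s, Writer out) throws Exception {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') out.write('\\');
            out.write(c);                       // one call per character
        }
    }

    static void escapeBuffered(String s, Writer out) throws Exception {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') sb.append('\\');
            sb.append(c);
        }
        out.write(sb.toString());               // a single call for the whole value
    }

    public static void main(String[] args) throws Exception {
        Writer w = new StringWriter();
        escapePerChar("a \"quoted\" value", w);
        escapeBuffered("a \"quoted\" value", w);
        System.out.println(w);
    }
}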
XML vs JSON writer performance issues
Hi all. I'm currently running some tests with solr on a small index and I noticed a big difference in the response time of queries depending on the use of XML or json as the response format. On average, my test queries (including http connection open and close) take 6 ms to perform when I ask for XML, and they take 30 ms when I ask for JSON. When I'm running lots of test clients at the same time, the same 30/6 factor seems to apply. I looked at the code and didn't see any major difference between the two writers. I'd rather use json instead of XML, but that performance issue prevents me from doing so. I'm using apache-solr-1.2.0 / apache-tomcat-6.0.13 / java version "1.5.0_09" (Sun) Thanks for any comments or help. -- Jerome Eteve. [EMAIL PROTECTED] http://www.eteve.net