Re: does solr support distributed index storage?
On Mon, Oct 12, 2009 at 10:27 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: How to set up a master/slave setup for Solr? Index documents only on the master. Put the slaves behind a load balancer and query only the slaves. Set up replication between the master and slaves. See http://wiki.apache.org/solr/SolrReplication -- Regards, Shalin Shekhar Mangar.
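[The wiki page linked above has the details; the Solr 1.4 Java replication configuration boils down to something like the following sketch. The hostname and confFiles values here are placeholders.]

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Each slave polls the master's /replication handler and pulls changed index files after every commit on the master.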
Re: Facet query help
On Mon, Oct 12, 2009 at 6:07 AM, Tommy Chheng tommy.chh...@gmail.com wrote: The dummy data set is composed of 6 docs. My query is set for 'tommy' with the facet query of Memory_s:1+GB http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&facet.query=Memory_s:1+GB&q=tommy&indent=on However, in the response (http://pastie.org/650932), I get two docs: one which has the correct field Memory_s:1 GB and a second document which has Memory_s:3+GB. Why did the second document match if I set the facet.query to just 1+GB? facet.query does not limit documents. It is used for finding the number of documents matching the query. In order to filter the result set you should use a filter query, e.g. fq=Memory_s:1 GB -- Regards, Shalin Shekhar Mangar.
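[To make the escaping concrete, here is one way to build such a request URL — a Python sketch; the host and values follow the thread's example. The point is that fq filters the result set, while facet.query only reports a count.]

```python
from urllib.parse import urlencode

# fq filters the result set; facet.query would only report a count.
# The value contains a space, so it is quoted as a phrase.
params = {
    "q": "tommy",
    "fq": 'Memory_s:"1 GB"',
    "facet": "true",
    "facet.field": "Memory_s",
    "wt": "ruby",
    "indent": "on",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Note that only the value is quoted (Memory_s:"1 GB"), not the whole field:value pair — quoting the pair turns it into a phrase query against the default field.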
Re: Is negative boost possible?
Yonik Seeley wrote: On Sun, Oct 11, 2009 at 6:04 PM, Lance Norskog goks...@gmail.com wrote: And the other important thing to know about boost values is that the dynamic range is about 6-8 bits That's an index-time boost - an 8 bit float with 5 bits of mantissa and 3 bits of exponent. Query time boosts are normal 32 bit floats. To be more specific: index-time float encoding does not permit negative numbers (see SmallFloat), but query-time boosts can be negative, and they DO affect the score - see below. BTW, standard Collectors collect only results with positive scores, so if you want to collect results with negative scores as well then you need to use a custom Collector.

--- BeanShell 2.0b4 - by Pat Niemeyer (p...@pat.net)
bsh % import org.apache.lucene.search.*;
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.store.*;
bsh % import org.apache.lucene.document.*;
bsh % import org.apache.lucene.analysis.*;
bsh % tq = new TermQuery(new Term("a", "b"));
bsh % print(tq);
a:b
bsh % tq.setBoost(-1);
bsh % print(tq);
a:b^-1.0
bsh % q = new BooleanQuery();
bsh % tq1 = new TermQuery(new Term("a", "c"));
bsh % tq1.setBoost(10);
bsh % q.add(tq1, BooleanClause.Occur.SHOULD);
bsh % q.add(tq, BooleanClause.Occur.SHOULD);
bsh % print(q);
a:c^10.0 a:b^-1.0
bsh % dir = new RAMDirectory();
bsh % w = new IndexWriter(dir, new WhitespaceAnalyzer());
bsh % doc = new Document();
bsh % doc.add(new Field("a", "b c d", Field.Store.YES, Field.Index.ANALYZED));
bsh % w.addDocument(doc);
bsh % w.close();
bsh % r = IndexReader.open(dir);
bsh % is = new IndexSearcher(r);
bsh % td = is.search(q, 10);
bsh % sd = td.scoreDocs;
bsh % print(sd.length);
1
bsh % print(is.explain(q, 0));
0.1373985 = (MATCH) sum of:
  0.15266499 = (MATCH) weight(a:c^10.0 in 0), product of:
    0.99503726 = queryWeight(a:c^10.0), product of:
      10.0 = boost
      0.30685282 = idf(docFreq=1, numDocs=1)
      0.32427183 = queryNorm
    0.15342641 = (MATCH) fieldWeight(a:c in 0), product of:
      1.0 = tf(termFreq(a:c)=1)
      0.30685282 = idf(docFreq=1, numDocs=1)
      0.5 = fieldNorm(field=a, doc=0)
  -0.0152664995 = (MATCH) weight(a:b^-1.0 in 0), product of:
    -0.099503726 = queryWeight(a:b^-1.0), product of:
      -1.0 = boost
      0.30685282 = idf(docFreq=1, numDocs=1)
      0.32427183 = queryNorm
    0.15342641 = (MATCH) fieldWeight(a:b in 0), product of:
      1.0 = tf(termFreq(a:b)=1)
      0.30685282 = idf(docFreq=1, numDocs=1)
      0.5 = fieldNorm(field=a, doc=0)
bsh %

-- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
Re: rollback and cumulative_add
Koji Sekiguchi wrote: Hello, I found that rollback resets the adds and docsPending counts, but doesn't reset cumulative_adds.

$ cd example/exampledocs
# comment out the <commit/> line in post.sh to avoid committing
$ ./post.sh *.xml
=> docsPending=19, adds=19, cumulative_adds=19
# do rollback
$ curl http://localhost:8983/solr/update?rollback=true
=> rollbacks=1, docsPending=0, adds=0, cumulative_adds=19

Is this correct behavior? Koji (forwarded from the dev list) I think this is a bug that was introduced by me when I contributed the first patch for rollback, and the bug was inherited by the successive patches. I'll reopen SOLR-670 and attach the fix soon: https://issues.apache.org/jira/browse/SOLR-670 Koji -- http://www.rondhuit.com/
Re: Is negative boost possible?
On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki a...@getopt.org wrote: BTW, standard Collectors collect only results with positive scores, so if you want to collect results with negative scores as well then you need to use a custom Collector. Solr never discarded non-positive hits, and now Lucene 2.9 no longer does either. -Yonik
two facet.prefix on one facet field in a single query
Is it possible to have two different facet.prefix parameters on the same facet field in a single query? I want to get facet counts for two prefixes, xx and yy. I tried using two facet.prefix parameters (i.e. facet.prefix=xx&facet.prefix=yy) but the second one seems to have no effect. Bill
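[A single request honors only one facet.prefix per field, so one workaround — an assumption, not something confirmed in the thread — is to issue one request per prefix and merge the facet counts client-side. Sketched here in Python; "myfield" is a placeholder field name.]

```python
from urllib.parse import urlencode

# Shared parameters; "myfield" is a placeholder for the real facet field.
base = {"q": "*:*", "rows": "0", "facet": "true", "facet.field": "myfield"}

# One request per prefix; the caller merges the returned facet counts.
urls = []
for prefix in ("xx", "yy"):
    params = dict(base)
    params["facet.prefix"] = prefix
    urls.append("http://localhost:8983/solr/select?" + urlencode(params))

for u in urls:
    print(u)
```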
Re: Facet query help
ok, so fq != facet.query. I thought it was an alias. I'm trying your suggestion fq=Memory_s:1 GB and now it's returning zero documents, even though there is one document that has tommy and Memory_s:1 GB, as seen in the original pastie (http://pastie.org/650932). I tried the fq query body with quotes and without quotes. http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&fq=%22Memory_s:1+GB%22&q=tommy&indent=on Any thoughts? thanks, tommy On 10/12/09 1:00 AM, Shalin Shekhar Mangar wrote: On Mon, Oct 12, 2009 at 6:07 AM, Tommy Chheng tommy.chh...@gmail.com wrote: The dummy data set is composed of 6 docs. My query is set for 'tommy' with the facet query of Memory_s:1+GB http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&facet.query=Memory_s:1+GB&q=tommy&indent=on However, in the response (http://pastie.org/650932), I get two docs: one which has the correct field Memory_s:1 GB and a second document which has Memory_s:3+GB. Why did the second document match if I set the facet.query to just 1+GB? facet.query does not limit documents. It is used for finding the number of documents matching the query. In order to filter the result set you should use a filter query, e.g. fq=Memory_s:1 GB
Re: format of sort parameter in Solr::Request::Standard
I did an experiment that worked. In Solr::Request::Standard, in the to_hash() method, I changed the commented line below to the two lines following it.

sort = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]
# START OF CHANGES
#hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
hash[:q] = @params[:query]
hash[:sort] = sort if sort != nil
# END OF CHANGES
hash["q.op"] = @params[:operator]
hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of the solr-ruby gem?

Paul Rosen wrote: Hi all, I'm using solr-ruby 0.0.7 and am having trouble getting sort to work. I have the following statement:

req = Solr::Request::Standard.new(:start => start, :rows => max,
  :sort => [ :title_sort => :ascending ],
  :query => query, :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

That produces no results, but removing the :sort parameter does give results. Here is the output from solr:

INFO: [merged] webapp=/solr path=/select params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged} status=0 QTime=19

It looks to me like the string should have sort=title_sort+asc instead of ;title_sort+asc tacked on to the query, but I'm not sure about that. Any clues what I'm doing wrong? Thanks, Paul
Re: format of sort parameter in Solr::Request::Standard
Paul- Trunk solr-ruby has this instead:

hash[:sort] = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]

The ;sort... stuff is now deprecated in Solr itself. I suppose the 0.8 gem needs to be pushed to RubyForge, eh? Erik

On Oct 12, 2009, at 11:03 AM, Paul Rosen wrote: I did an experiment that worked. In Solr::Request::Standard, in the to_hash() method, I changed the commented line below to the two lines following it.

sort = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]
# START OF CHANGES
#hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
hash[:q] = @params[:query]
hash[:sort] = sort if sort != nil
# END OF CHANGES
hash["q.op"] = @params[:operator]
hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of the solr-ruby gem?

Paul Rosen wrote: Hi all, I'm using solr-ruby 0.0.7 and am having trouble getting sort to work. I have the following statement:

req = Solr::Request::Standard.new(:start => start, :rows => max,
  :sort => [ :title_sort => :ascending ],
  :query => query, :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

That produces no results, but removing the :sort parameter does give results. Here is the output from solr:

INFO: [merged] webapp=/solr path=/select params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged} status=0 QTime=19

It looks to me like the string should have sort=title_sort+asc instead of ;title_sort_asc tacked on to the query, but I'm not sure about that. Any clues what I'm doing wrong? Thanks, Paul
Solr over DRBD
Hi there, I have a 2-node cluster running Apache and Solr over a shared partition on top of DRBD. Think of it like a SAN. I'm curious as to how I should do load balancing / sharing with Solr in this setup. I'm already using DNS round robin for Apache. My Solr installation is on /cluster/Solr. I've been starting an instance of Solr on each server out of the same installation / working directory. Is this safe? I haven't noticed any problems so far. Does this mean they'll share the same index? Is there a better way to do this? Should I perhaps only do commits on one of the servers (and set up heartbeat to determine which server to run the commit on)? I'm running Solr 1.3, but I'm not against upgrading if that provides me with a better way of load balancing. Kind regards, Pieter
capitalization and delimiters
In my search docs, I have content such as 'powershot' and 'powerShot'. I would expect 'powerShot' to be searched as 'power', 'shot' and 'powershot', so that results for all of these are returned. Instead, only results for 'power' and 'shot' are returned. Any suggestions?

In the schema, index analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>

In the schema, query analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

Thanks, Audrey
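[One likely fix — an assumption, not something confirmed in the thread — is to also split on case change at index time, so 'powerShot' is indexed as power, shot, and powershot. A sketch of the index-time filter; reindexing is required after any index-analyzer change.]

```xml
<!-- index-time analyzer: generate the word parts AND the catenated form,
     splitting on case changes so "powerShot" yields power, shot, powershot -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
```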
Re: Default query parameter for one core
Thanks for your input, Shalin. On Sun, Oct 11, 2009 at 12:30 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: - I can't use a variable like ${shardsParam} in a single shared solrconfig.xml, because the line <str name="shards">${shardsParam}</str> has to be in there, and that forces a (possibly empty) shards parameter onto cores that *don't* need one, causing a NullPointerException. Well, we can fix the NPE :) Please raise an issue. The NPE may be the correct behavior -- I'm causing an empty shards= parameter, which doesn't have a defined behavior AFAIK. The deficiency I was pointing out was that using ${shardsParam} doesn't help me achieve my real goal, which is to have the entire <str> tag disappear for some shards. So I think my best bet is to make two mostly-identical solrconfig.xml files, and point core0 to the one specifying a shards= parameter: <core name="core0" config="core0_solrconfig.xml"/> I don't like the duplication of config, but at least it accomplishes my goal! There is another way too. Each plugin in Solr now supports a configuration attribute named enable which can be true or false. You can control the value (true/false) through a variable. So you can duplicate just the handler instead of the complete solrconfig.xml I had looked into this, but thought it doesn't help because I'm not disabling an entire plugin -- just a <str> tag specifying a default parameter to a requestHandler. Individual <str> tags don't have an enable flag for me to conditionally set to false. Maybe I'm misunderstanding what you're suggesting? Thanks again, Michael
Re: Is negative boost possible?
Yonik Seeley wrote: On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki a...@getopt.org wrote: BTW, standard Collectors collect only results with positive scores, so if you want to collect results with negative scores as well then you need to use a custom Collector. Solr never discarded non-positive hits, and now Lucene 2.9 no longer does either. Hmm ... The code that I pasted in my previous email uses Searcher.search(Query, int), which in turn uses search(Query, Filter, int), and it doesn't return any results if only the first clause is present (the one with the negative boost) even though it's a matching clause. I think this is related to the fact that in TopScoreDocCollector:48 pqTop.score is initialized to 0, and then all results that have a lower score than this are discarded. Perhaps this should be initialized to Float.MIN_VALUE? -- Best regards, Andrzej Bialecki
Re: Scoring for specific field queries
Avlesh, I got it, finally, by doing an OR between the two fields, one with an exact-match keyword and the other grouped: q=suggestion:formula xxx OR tokenized_suggestion:(formula ) Thanks for all your help! Rih On Fri, Oct 9, 2009 at 4:26 PM, R. Tan tanrihae...@gmail.com wrote: I ended up with the same set of results earlier but I don't get results such as the champion, I think because of the EdgeNGram filter. With NGram, I'm back to the same problem: Result for q=ca

<doc>
  <float name="score">0.8717008</float>
  <str name="tokenized_suggestion">Blu Jazz Cafe</str>
</doc>
<doc>
  <float name="score">0.8717008</float>
  <str name="tokenized_suggestion">Café in the Pond</str>
</doc>
Letters with accent in query
Hi, I'm querying with an accented keyword such as café but the debug info shows that it is only searching for caf. I'm using the ISOLatin1Accent filter as well. Query: http://localhost:8983/solr/select?q=%E9&debugQuery=true Params return shows this:

<lst name="params">
  <str name="q"/>
  <str name="debugQuery">true</str>
</lst>

What am I missing here? Rih
Re: Default query parameter for one core
OK, a hacky but working solution to making one core shard to all others: have the default parameter *name* vary, so that one core gets shards=foo and all other cores get dummy=foo.

# solr.xml
<solr ...>
  <property name="shardsKey" value="dummy" />
  <property name="shardsValue" value="" />
  <cores ...>
    <core name="core0" instanceDir="./">
      <property name="shardsKey" value="shards" />
      <property name="shardsValue" value="localhost:9990/solr/core1,..." />
    </core>
    <core name="core1" instanceDir="./" dataDir="/search/1/" ... />
  </cores>
</solr>

# solrconfig.xml
<requestHandler ...>
  <lst name="defaults">
    <str name="${shardsKey}">${shardsValue}</str>
    ...

Michael

On Mon, Oct 12, 2009 at 12:00 PM, Michael solrco...@gmail.com wrote: Thanks for your input, Shalin. On Sun, Oct 11, 2009 at 12:30 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: - I can't use a variable like ${shardsParam} in a single shared solrconfig.xml, because the line <str name="shards">${shardsParam}</str> has to be in there, and that forces a (possibly empty) shards parameter onto cores that *don't* need one, causing a NullPointerException. Well, we can fix the NPE :) Please raise an issue. The NPE may be the correct behavior -- I'm causing an empty shards= parameter, which doesn't have a defined behavior AFAIK. The deficiency I was pointing out was that using ${shardsParam} doesn't help me achieve my real goal, which is to have the entire <str> tag disappear for some shards. So I think my best bet is to make two mostly-identical solrconfig.xml files, and point core0 to the one specifying a shards= parameter: <core name="core0" config="core0_solrconfig.xml"/> I don't like the duplication of config, but at least it accomplishes my goal! There is another way too. Each plugin in Solr now supports a configuration attribute named enable which can be true or false. You can control the value (true/false) through a variable. So you can duplicate just the handler instead of the complete solrconfig.xml I had looked into this, but thought it doesn't help because I'm not disabling an entire plugin -- just a <str> tag specifying a default parameter to a requestHandler. Individual <str> tags don't have an enable flag for me to conditionally set to false. Maybe I'm misunderstanding what you're suggesting? Thanks again, Michael
Re: Letters with accent in query
What tokenizer and filters are you using in what order? See schema.xml. Also, you may wish to use ASCIIFoldingFilter, which covers more cases than ISOLatin1AccentFilter. Michael On Mon, Oct 12, 2009 at 12:42 PM, R. Tan tanrihae...@gmail.com wrote: Hi, I'm querying with an accented keyword such as café but the debug info shows that it is only searching for caf. I'm using the ISOLatin1Accent filter as well. Query: http://localhost:8983/solr/select?q=%E9debugQuery=true Params return shows this: lst name=params str name=q/ str name=debugQuerytrue/str /lst What am I missing here? Rih
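[For reference, the folding filter Michael mentions is exposed as a filter factory; a sketch of the line to add to both the index and query analyzers of the relevant field type.]

```xml
<!-- folds accented characters to their ASCII equivalents (café -> cafe);
     must appear in BOTH the index and query analyzer chains -->
<filter class="solr.ASCIIFoldingFilterFactory"/>
```

Remember that changing the index-time analyzer requires reindexing before queries will match.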
Search results order
Hi, I have indexed my xml which contains the following data.

<add>
<doc>
  <field name="url">http://www.yahoo.com</field>
  <field name="title">yahoomail</field>
  <field name="description">yahoo has various links and gives in detail about the all the links in it</field>
</doc>
<doc>
  <field name="url">http://www.rediff.com</field>
  <field name="title">It is a good website</field>
  <field name="description">Rediff has a interesting homepage</field>
</doc>
<doc>
  <field name="url">http://www.ndtv.com</field>
  <field name="title">Ndtv has a variety of good links</field>
  <field name="description">The homepage of Ndtv is very good</field>
</doc>
</add>

In my solr home page, when I search for “good”, by default it displays first the docs which have the highest number of occurrences of “good”. The output comes as follows.

<doc>
  <field name="url">http://www.ndtv.com</field>
  <field name="title">Ndtv has a variety of good links</field>
  <field name="description">The homepage of Ndtv is very good</field>
</doc>
<doc>
  <field name="url">http://www.rediff.com</field>
  <field name="title">It is a good website</field>
  <field name="description">Rediff has a interesting homepage</field>
</doc>

If I need to display the doc which has the fewest occurrences of the search input “good” as the first result, what changes should I make in the solrconfig file to achieve this? Any suggestions would be helpful. For me the output should come as below.

<doc>
  <field name="url">http://www.rediff.com</field>
  <field name="title">It is a good website</field>
  <field name="description">Rediff has a interesting homepage</field>
</doc>
<doc>
  <field name="url">http://www.ndtv.com</field>
  <field name="title">Ndtv has a variety of good links</field>
  <field name="description">The homepage of Ndtv is very good</field>
</doc>

Regards Bhaskar
Re: does solr support distributed index storage?
Hi, How should we set up master and slaves in Solr? What configuration files and parameters do we need to change, and how? Thanks, Chaitali --- On Mon, 10/12/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: From: Shalin Shekhar Mangar shalinman...@gmail.com Subject: Re: dose solr sopport distribute index storage ? To: solr-user@lucene.apache.org Date: Monday, October 12, 2009, 3:17 AM On Mon, Oct 12, 2009 at 10:27 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: How to set master/slave setup for solr. Index documents only on the master. Put the slaves behind a load balancer and query only on slaves. Setup replication between the master and slaves. See http://wiki.apache.org/solr/SolrReplication -- Regards, Shalin Shekhar Mangar.
Conditional copyField
Hi, I am pushing data to Solr from two different sources: Nutch and a CMS. I have a data clash in that in Nutch a copyField is required to push the url field to the id field, as it is used as the primary lookup in the Nutch/Solr integration update. The other source, the CMS, also uses the url field but populates the id field with a different value. Now I can't really change either source definition, so is there a way in solrconfig or the schema to check if id is empty and only copy if true, or is there a better way via the UpdateProcessor? Thanks for your help in advance. Regards David
Re: format of sort parameter in Solr::Request::Standard
I've just pushed a new 0.0.8 gem to Rubyforge that includes the fix I described for the sort parameter. Erik On Oct 12, 2009, at 11:03 AM, Paul Rosen wrote: I did an experiment that worked. In Solr::Request::Standard, in the to_hash() method, I changed the commented line below to the two lines following it.

sort = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]
# START OF CHANGES
#hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
hash[:q] = @params[:query]
hash[:sort] = sort if sort != nil
# END OF CHANGES
hash["q.op"] = @params[:operator]
hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of the solr-ruby gem?

Paul Rosen wrote: Hi all, I'm using solr-ruby 0.0.7 and am having trouble getting sort to work. I have the following statement:

req = Solr::Request::Standard.new(:start => start, :rows => max,
  :sort => [ :title_sort => :ascending ],
  :query => query, :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

That produces no results, but removing the :sort parameter does give results. Here is the output from solr:

INFO: [merged] webapp=/solr path=/select params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged} status=0 QTime=19

It looks to me like the string should have sort=title_sort+asc instead of ;title_sort_asc tacked on to the query, but I'm not sure about that. Any clues what I'm doing wrong? Thanks, Paul
Re: does solr support distributed index storage?
On 10/12/2009 10:49 AM, Chaitali Gupta wrote: Hi, How should we setup master and slaves in Solr? What configuration files and parameters should we need to change and how ? Thanks, Chaitali Hi - I think Shalin was pretty clear on that; it is documented very well at http://wiki.apache.org/solr/SolrReplication . I am responding, however, to explain something that took me a bit of time to wrap my brain around, in the hope that it helps you and perhaps some others. Solr in itself does not replicate. Instead, Solr relies on an underlying rsync setup to keep these indices sync'd throughout the collective. When you break it down, it's simply rsync with a configuration file making all the nodes aware that they participate in this configuration. Wrap a cron around this between all the nodes, and they simply replicate raw data from one master to one or more slaves. I would suggest reading up on how snapshots are performed and how the log files are created/what they do. Of course it would benefit you to know the ins and outs of all the elements that help Solr replicate, but it's been my experience that most of it has to do with those particular items. Thanks -dant
Re: does solr support distributed index storage?
Sorry for the hijack, but is replication necessary when using a cluster file-system such as GFS2, where the files are the same for any instance of Solr? On Mon, Oct 12, 2009 at 8:36 PM, Dan Trainor dtrai...@toolbox.com wrote: On 10/12/2009 10:49 AM, Chaitali Gupta wrote: Hi, How should we setup master and slaves in Solr? What configuration files and parameters should we need to change and how ? Thanks, Chaitali Hi - I think Shalin was pretty clear on that, it is documented very well at http://wiki.apache.org/solr/SolrReplication . I am responding, however, to explain something that took me a bit of time to wrap my brain around in the hopes that it helps you and perhaps some others. Solr in itself does not replicate. Instead, Solr relies on an underlying rsync setup to keep these indices sync'd throughout the collective. When you break it down, its simply rsync with a configuration file making all the nodes aware that they participate in this configuration. Wrap a cron around this between all the nodes, and they simply replicate raw data from one master to one or more slave. I would suggest reading up on how snapshots are preformed and how the log files are created/what they do. Of course it would benefit you to know the ins and outs of all the elements that help Solr replicate, but its been my experience that most of it has to do with those particular items. Thanks -dant
Re: Search results order
You can reverse the sort order. In this case, you want score ascending: sort=score+asc If you just want documents without that keyword, then try using the minus sign: q=-good http://wiki.apache.org/solr/CommonQueryParameters -Nick On Mon, Oct 12, 2009 at 1:19 PM, bhaskar chandrasekar bas_s...@yahoo.co.inwrote: Hi, I have indexed my xml which contains the following data. add doc field name=urlhttp://www.yahoo.com /field field name=titleyahoomail/field field name=descriptionyahoo has various links and gives in detail about the all the links in it/field /doc doc field name=urlhttp://www.rediff.com/field field name=titleIt is a good website/field field name=descriptionRediff has a interesting homepage/field /doc doc field name=urlhttp://www.ndtv.com/field field name=titleNdtv has a variety of good links/field field name=descriptionThe homepage of Ndtv is very good/field /doc /add In my solr home page , when I search input as “good” It displays the docs which has “good” as highest occurrences by default. The output comes as follows. doc field name=urlhttp://www.ndtv.com/field field name=titleNdtv has a variety of good links/field field name=descriptionThe homepage of Ndtv is very good/field /doc doc field name=urlhttp://www.rediff.com/field field name=titleIt is a good website/field field name=descriptionRediff has a interesting homepage/field /doc If I need to display doc which has least occurrence of search input “good” as first result. What changes should I make in solrconfig file to achieve the same?. Any suggestions would be helpful. For me the output should come as below. doc field name=urlhttp://www.rediff.com/field field name=titleIt is a good website/field field name=descriptionRediff has a interesting homepage/field /doc doc field name=urlhttp://www.ndtv.com/field field name=titleNdtv has a variety of good links/field field name=descriptionThe homepage of Ndtv is very good/field /doc Regards Bhaskar
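[Concretely, a request sorted by ascending score could be built like this — a Python sketch; the host is a placeholder.]

```python
from urllib.parse import urlencode

# sort=score asc puts the least-relevant matches first
params = {"q": "good", "sort": "score asc"}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```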
Re: Boosting of words
The easiest way to boost your query is to modify your query string. q=product:red color:red^10 In the above example, I have boosted the color field. If red is found in that field, it will get a boost of 10. If it is only found in the product field, then there will be no boost. Here's more information: http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms Once you're comfortable with that, I suggest that you look into using the DisMax request handler. It will allow you to easily search across multiple fields with custom boost values. http://wiki.apache.org/solr/DisMaxRequestHandler -Nick On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, I would like to know how can i give boosting to search input in Solr. Where exactly should i make the changes?. Regards Bhaskar
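[Putting the same boost into a DisMax handler might look roughly like this — a sketch only; the handler name is arbitrary, and the field names and boost come from the example above.]

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- search both fields; matches in color weigh 10x matches in product -->
    <str name="qf">product^1.0 color^10.0</str>
  </lst>
</requestHandler>
```

With this in place, clients send only q=red and the boosting stays server-side.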
Re: Is negative boost possible?
On Mon, Oct 12, 2009 at 12:03 PM, Andrzej Bialecki a...@getopt.org wrote: Solr never discarded non-positive hits, and now Lucene 2.9 no longer does either. Hmm ... The code that I pasted in my previous email uses Searcher.search(Query, int), which in turn uses search(Query, Filter, int), and it doesn't return any results if only the first clause is present (the one with negative boost) even though it's a matching clause. I think this is related to the fact that in TopScoreDocCollector:48 the pqTop.score is initialized to 0, and then all results that have lower score that this are discarded. Perhaps this should be initialized to Float.MIN_VALUE? Hmmm, You're actually seeing this with Lucene 2.9? The HitQueue (subclass of PriorityQueue) is pre-populated with sentinel objects with scores of -Inf, not zero. -Yonik http://www.lucidimagination.com
Re: Conditional copyField
Hi, I am pushing data to solr from two different sources nutch and a cms. I have a data clash in that in nutch a copyField is required to push the url field to the id field as it is used as the primary lookup in the nutch solr intergration update. The other cms also uses the url field but also populates the id field with a different value. Now I can't really change either source definition so is there a way in solrconfig or schema to check if id is empty and only copy if true or is there a better way via the updateprocessor? The copyField declaration has only three attributes: source, dest and maxChars, so there is no way to do this in schema.xml. Luckily, the wiki [1] has a quick example that implements a conditional copyField. [1] http://wiki.apache.org/solr/UpdateRequestProcessor
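[For orientation, the wiki's approach wires a custom processor factory into an update chain along these lines — a sketch; the package name is a placeholder, and the factory class is the wiki's example, which copies one field into another only when the destination is empty.]

```xml
<!-- solrconfig.xml: run the conditional copy before the normal update -->
<updateRequestProcessorChain name="conditional-copy">
  <processor class="my.pkg.ConditionalCopyProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then has to be referenced by name from the update request handler so that incoming documents pass through it.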
doing searches from within an UpdateRequestProcessor
Is it possible to do searches from within an UpdateRequestProcessor? The documents in my index reference each other. When a document is deleted, I would like to update all documents containing a reference to the deleted document. My initial idea is to use a custom UpdateRequestProcessor. Is there a better way to do this? Bill
Lucene Merge Threads
Hi, I'm attempting to optimize a pretty large index, and even though the optimize request timed out, I watched it using a profiler and saw that the optimize thread continued executing. Eventually it completed, but in the background I still see a thread performing a merge:

Lucene Merge Thread #0 [RUNNABLE, IN_NATIVE] CPU time: 17:51
java.io.RandomAccessFile.readBytes(byte[], int, int)
java.io.RandomAccessFile.read(byte[], int, int)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[], int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentMergeInfo.next()
org.apache.lucene.index.SegmentMerger.mergeTermInfos(FormatPostingsFieldsConsumer)
org.apache.lucene.index.SegmentMerger.mergeTerms()
org.apache.lucene.index.SegmentMerger.merge(boolean)
org.apache.lucene.index.IndexWriter.mergeMiddle(MergePolicy$OneMerge)
org.apache.lucene.index.IndexWriter.merge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run()

This has taken quite a while, and hasn't really been fully utilizing the machine's resources. After looking at the Lucene source, I noticed that you can set a MaxThreadCount parameter in this class. Is this parameter exposed by Solr somehow? I see the class mentioned, commented out, in my solrconfig.xml, but I'm not sure of the correct way to specify the parameter:

<!-- Expert: The Merge Scheduler in Lucene controls how merges are performed. The ConcurrentMergeScheduler (Lucene 2.3 default) can perform merges in the background using separate threads. The SerialMergeScheduler (Lucene 2.2 default) does not. -->
<!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->

Also, if I can specify this parameter, is it safe to just start/stop my servlet server (Tomcat) mid-merge? Thanks in advance, Gio.
Re: Lucene Merge Threads
Try this in solrconfig.xml:
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">1</int>
</mergeScheduler>
Yes, you can stop the process mid-merge. The partially merged files will be deleted on restart. We need to update the wiki? On Mon, Oct 12, 2009 at 4:05 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: [...]
RE: Lucene Merge Threads
Do you have to make a new call to optimize to make it start the merge again? -----Original Message----- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Monday, October 12, 2009 7:28 PM To: solr-user@lucene.apache.org Subject: Re: Lucene Merge Threads [...]
Re: two facet.prefix on one facet field in a single query
It looks like there is a JIRA covering this: https://issues.apache.org/jira/browse/SOLR-1387 On Mon, Oct 12, 2009 at 11:00 AM, Bill Au bill.w...@gmail.com wrote: Is it possible to have two different facet.prefix parameters on the same facet field in a single query? I want to get facet counts for two prefixes, xx and yy. I tried using two facet.prefix parameters (i.e. facet.prefix=xx&facet.prefix=yy) but the second one seems to have no effect. Bill
XSLT Response for multivalue fields
I am having trouble generating the xsl file for multivalue entries. I'm not sure if I'm missing something, or if this is how it is supposed to function. I have two authors and I'd like to have separate ByLine nodes in my translation. Here is what Solr returns normally:
<arr name="author">
  <str>Crista Souza</str>
  <str>Darrell Dunn</str>
</arr>
Here is my xsl:
<xsl:for-each select="arr[@name='author']">
  <ByLine>
    <xsl:value-of select="."/>
  </ByLine>
</xsl:for-each>
And here is what it is returning:
<ByLine>Crista SouzaDarrell Dunn</ByLine>
I was expecting it to return:
<ByLine>Crista Souza</ByLine>
<ByLine>Darrell Dunn</ByLine>
I've tried other variations and using templates instead, but it keeps displaying the same thing: one ByLine field with the values mushed together. Any clues whether this is an issue with my xslt code, the XSLT Response Writer, Xalan, or Solr? I have no clue where to go from here. Any ideas to point me in the right direction are appreciated. -- View this message in context: http://www.nabble.com/XSLT-Response-for-multivalue-fields-tp25865618p25865618.html Sent from the Solr - User mailing list archive at Nabble.com.
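[Editor's sketch, not part of the original thread: one likely fix is to iterate over each str child of the array rather than the arr node itself, so the for-each body runs once per author. Element names mirror the poster's example.]

```xml
<!-- Sketch: selecting arr[@name='author']/str makes the for-each
     iterate once per value, giving each author its own ByLine -->
<xsl:for-each select="arr[@name='author']/str">
  <ByLine>
    <xsl:value-of select="."/>
  </ByLine>
</xsl:for-each>
```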
Re: Solr 1.4 Release Party
It is my email signature. It is a sort of hybrid/mashup from different sources. On Mon, Oct 12, 2009 at 6:49 PM, Michael Masters mmast...@gmail.com wrote: Where does the quote come from :) On Sat, Oct 10, 2009 at 6:38 AM, Israel Ekpo israele...@gmail.com wrote: I can't wait... -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Boosting of words
Hi Nicholas, Thanks for your input. Where exactly should the query q=product:red color:red^10 be used and defined? Help me. Regards Bhaskar --- On Mon, 10/12/09, Nicholas Clark clark...@gmail.com wrote: From: Nicholas Clark clark...@gmail.com Subject: Re: Boosting of words To: solr-user@lucene.apache.org Date: Monday, October 12, 2009, 2:13 PM The easiest way to boost your query is to modify your query string. q=product:red color:red^10 In the above example, I have boosted the color field. If red is found in that field, it will get a boost of 10. If it is only found in the product field, then there will be no boost. Here's more information: http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms Once you're comfortable with that, I suggest that you look into using the DisMax request handler. It will allow you to easily search across multiple fields with custom boost values. http://wiki.apache.org/solr/DisMaxRequestHandler -Nick On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, I would like to know how I can give boosting to search input in Solr. Where exactly should I make the changes? Regards Bhaskar
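[Editor's sketch, not part of the original thread: for the DisMax route Nick mentions, the boosts move out of the query string and into solrconfig.xml. The handler name /boosted is illustrative; product and color are the field names from the example.]

```xml
<requestHandler name="/boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- search both fields; a match in color scores ten times a match in product -->
    <str name="qf">product^1.0 color^10.0</str>
  </lst>
</requestHandler>
```

With this in place the client sends plain queries like q=red and the boosting is applied server-side.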
RE: Lucene Merge Threads
This didn't end up working. I got the following error when I tried to commit:
Oct 12, 2009 8:36:42 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class ' 5 '
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:81)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:178)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: 5
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
at java.security.AccessController.doPrivileged(Unknown Source)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.$$YJP$$forName0(Native Method)
at java.lang.Class.forName0(Unknown Source)
at java.lang.Class.forName(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
... 28 more
I believe it's because maxThreadCount is not a public property of the ConcurrentMergeScheduler class. You have to call this method to set it:
public void setMaxThreadCount(int count) {
  if (count < 1)
    throw new IllegalArgumentException("count should be at least 1");
  maxThreadCount = count;
}
Is that possible through the solrconfig? Thanks, Gio.
-----Original Message----- From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, October 12, 2009 7:53 PM To: solr-user@lucene.apache.org Subject: RE: Lucene Merge Threads Do you have to make a new call to optimize to make it start the merge again? [...]
SpellCheck Index not building
Hi, I am using Solr 1.3 for spell checking. I am facing a strange problem of the spell checking index not being generated. When I have a small number of documents indexed (less than 1000) the spell check index builds, but when there are more documents (around 40K), the index for spell checking does not build. I can see the directory for the spell checking build, and there are two files under it: segments_3 and segments.gen. I am using the following query to build the spell checking index: /select?spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2 In the logs I see:
INFO: [] webapp=/solr path=/select params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2} hits=37467 status=0 QTime=44
Please help me solve this problem. Here is my configuration:
*schema.xml:*
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<field name="a_spell" type="textSpell"/>
<copyField source="title" dest="a_spell"/>
<copyField source="content" dest="a_spell"/>
*solrconfig.xml:*
<requestHandler name="contentsearch" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.dictionary">jarowinkler</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">a_spell</str>
    <str name="field">a_spell</str>
    <str name="spellcheckIndexDir">./spellchecker_a_spell</str>
    <str name="accuracy">0.7</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">a_spell</str>
    <!-- Use a different Distance Measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker_a_spell</str>
    <str name="accuracy">0.7</str>
  </lst>
</searchComponent>
-- Thanks Varun Gupta
Re: SpellCheck Index not building
On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta varun.vgu...@gmail.com wrote: [...] It seems that you might be running out of memory with a larger index. Can you check the logs to see if it has any exceptions recorded? -- Regards, Shalin Shekhar Mangar.
Re: SpellCheck Index not building
No, there are no exceptions in the logs. -- Thanks Varun Gupta On Tue, Oct 13, 2009 at 8:46 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [...]
Re: doing searches from within an UpdateRequestProcessor
A custom UpdateRequestProcessor is the solution. You can access the searcher in an UpdateRequestProcessor. On Tue, Oct 13, 2009 at 4:20 AM, Bill Au bill.w...@gmail.com wrote: [...] -- - Noble Paul | Principal Engineer | AOL | http://aol.com
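[Editor's sketch, not part of the original thread: a minimal outline of Noble's suggestion against Solr 1.4-era APIs. The class name RefCleanupFactory and the field name "ref" are illustrative, not from the thread, and the search-and-reindex step is left as a comment.]

```java
import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.DeleteUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class RefCleanupFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processDelete(DeleteUpdateCommand cmd) throws IOException {
        // The request gives access to the current searcher.
        SolrIndexSearcher searcher = req.getSearcher();
        // Sketch: find documents whose hypothetical "ref" field points at
        // the deleted id, e.g. with a TermQuery(new Term("ref", cmd.id)),
        // load and re-index them, then forward the delete down the chain.
        super.processDelete(cmd);
      }
    };
  }
}
```

Note the searcher reflects the last commit, so documents added in the same uncommitted batch won't be visible to it.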
Re: search by some functionality
: Maybe I'm missing something, but function queries aren't involved in : determining whether a document matches or not, only its score. How is a : custom function / value-source going to filter? it's not ... i didn't realize that was the context of the question, i was just answering the specific question about how to create custom functions. -Hoss
Re: Weird Facet and KeywordTokenizerFactory Issue
: I had to be brief as my facets are in the order of 100K over 800K documents : and also if I give the complete schema.xml I was afraid nobody would read my : long message :-) ..Hence I showed only relevant pieces of the result showing : different fields having the same problem relevant is good, but you have to provide a consistent picture from start to finish ... you don't need to show 1,000 lines of facet field output, but you at least need to show the field names.
: <fieldType name="keywordText" class="solr.TextField"
: sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
: <analyzer type="index">
: <tokenizer class="solr.KeywordTokenizerFactory"/>
: <filter class="solr.TrimFilterFactory"/>
: <filter class="solr.StopFilterFactory" ignoreCase="true"
: words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
:
: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
: ignoreCase="true" expand="false"/>
: <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
: </analyzer>
...have you used analysis.jsp to see what terms that analyzer produces based on the strings you are indexing for your documents? Because combined with synonyms like this...
: New York, N.Y., NY => New York
...it doesn't surprise me that you're getting New as an indexed term. By default SynonymFilter uses whitespace to delimit tokens in multi-token synonyms, so for some input like NY you should see it produce the tokens New and York. You can use the tokenizerFactory attribute on SynonymFilterFactory to specify a TokenizerFactory class to use when parsing synonyms.txt -Hoss
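[Editor's sketch, not part of the original thread: what the tokenizerFactory attribute Hoss mentions could look like in this fieldType. With the keyword tokenizer parsing synonyms.txt, a mapping target like "New York" stays one token instead of being split on whitespace, which matches a field that keyword-tokenizes its input. Treat this as a plausible configuration, not the poster's tested fix.]

```xml
<!-- synonyms.txt line from the thread:
     New York, N.Y., NY => New York -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```

After changing it, analysis.jsp should show a single "New York" token for the input NY rather than "New" and "York".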
Re: Question about PatternReplace filter and automatic Synonym generation
: There is a Solr.PatternTokenizerFactory class which likely fits the bill in : this case. The related question I have is this - is it possible to have : multiple Tokenizers in your analysis chain? No .. Tokenizers consume CharReaders and produce a TokenStream ... what's needed here is a TokenFilter that consumes a TokenStream and produces a TokenStream -Hoss
Re: De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process
: In the code I'm working with, I generate a cache of calculated values as a : by-product within a Filter.getDocIdSet implementation (and within a Query-ized : version of the filter and its Scorer method). These values are keyed off the : IndexReader's docID values, since that's all that's accessible at that level. : Ultimately, however, I need to be able to access these values much higher up : in the stack (Solr's QueryComponent.process method), so that I can inject the my suggestion would be to change your Filter to use the FieldCache to look up the uniqueKey for your docid, and base your cache off that ... then other uses of your cache (higher up the chain) will have an id that makes sense outside the context of a segment reader. -Hoss
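[Editor's sketch, not part of the original thread: a fragment of Hoss's suggestion against the Lucene 2.9-era FieldCache API. The field name "id" stands in for whatever the schema's uniqueKey actually is, and the cache itself is assumed to exist elsewhere.]

```java
// Inside Filter.getDocIdSet(IndexReader reader) -- sketch only:
// one uniqueKey string per document in this (segment) reader
String[] ids = FieldCache.DEFAULT.getStrings(reader, "id");
// ...as each matching doc is found, key the side cache on the
// uniqueKey instead of the segment-relative docid:
//   cache.put(ids[docid], computedValue);
// Higher up (e.g. QueryComponent.process), read the uniqueKey back
// from each returned document and look the value up in the cache.
```

The point of the indirection is that segment-relative docids are meaningless outside their reader, while the uniqueKey is stable across the whole stack.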
Re: DIH and EmbeddedSolr
Hey, any reason why it may be happening? Regards Rohan On Sun, Oct 11, 2009 at 9:25 PM, rohan rai hiroha...@gmail.com wrote: Small data set:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <test>
    <id>11</id>
    <name>11</name>
    <type>11</type>
  </test>
  <test>
    <id>22</id>
    <name>22</name>
    <type>22</type>
  </test>
  <test>
    <id>33</id>
    <name>33</name>
    <type>33</type>
  </test>
</root>
data-config:
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="test" processor="XPathEntityProcessor" forEach="/root/test" url="/home/test/test_data.xml">
      <field column="id" name="id" xpath="/root/test/id"/>
      <field column="name" name="name" xpath="/root/test/name"/>
      <field column="type" name="type" xpath="/root/test/type"/>
    </entity>
  </document>
</dataConfig>
schema:
<?xml version="1.0" ?>
<schema name="test" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="type" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="name" type="string" indexed="true" stored="true" multiValued="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>
Sometimes it creates the index, sometimes it gives a thread pool exception. It does not consistently create the index. Regards Rohan On Sun, Oct 11, 2009 at 3:56 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Oct 10, 2009 at 7:44 PM, rohan rai hiroha...@gmail.com wrote: This is pretty unstable... anyone has any clue... Sometimes it even creates the index, sometimes it does not. Most DataImportHandler tests run Solr in an embedded-like mode and they run fine. Can you tell us which version of Solr you are using? Also, any data which can help us reproduce the problem would be nice. -- Regards, Shalin Shekhar Mangar.