Re: Facets with an IDF concept
Hi again,

I guess nobody has used facets in the way I described below. Do any of the experts have ideas on how to do this efficiently and correctly? Any thoughts would be greatly appreciated.

Thanks,

Asif

On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman wrote:
> Hi all,
>
> We have an index of news articles that are tagged with news topics.
> Currently, we use Solr facets to see which topics are popular for a given
> query or time period. I'd like to apply the concept of IDF to the facet
> counts so as to penalize topics that occur broadly throughout our index.
> I've begun to write a custom facet component that applies the IDF to the
> facet counts, but I also wanted to check whether anyone has experience
> using facets in this way.
>
> Thanks,
>
> Asif

--
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com
Re: Facets with an IDF concept
Hi Asif,

I was holding back because we have a similar problem, but we're not sure how best to approach it, or even whether approaching it at all is the right thing to do.

Background:
- large index (~35m documents)
- about 120k of these include full-text book contents plus metadata; the rest are metadata only
- we plan to increase the number of full-text books to around 1m, and the number of records will grow greatly

We've found that because of the sheer volume of content in full text, we get lots of full-text results of very low relevance. The Lucene relevance ranking works wonderfully to "hide" these way down the list, and when these are the only results at all, the user may be delighted to find obscure hits.

But when you search for, say, "soldier of fortune", one of the 55k+ results is Huck Finn, with 4 "soldier(s)" and 6 "fortune(s)", but it probably isn't relevant. The searcher will find it in the result set, but should the author, subject, dates, formats, etc. (our facets) of Huck Finn contribute to the facets shown to the user as much as, say, the top 500 results? Maybe, but perhaps they are "diluting" the value of the facets contributed by the more relevant results.

So, we are considering restricting the contents of the result bit set used for faceting to exclude results with a very, very low score (with our own QueryComponent). But there are problems:

- What's a low score? How will a low-score threshold vary across queries? (Or should we use a rank cutoff instead, which is much more expensive to compute, or some combination that also works for queries whose only results have very low relevance?)

- Should we do this for all facets, or just some (where the less relevant results seem particularly annoying, as they can "mask" facets from the most relevant results; the authors, years and subjects we have full text for are not representative of the whole corpus)?

- If a searcher pages through to the 1000th result page, down to these less relevant results, should we somehow include those results in the facets we show?

Sorry, only more questions!

Regards,

Kent Fitch

On Tue, Jun 23, 2009 at 5:58 PM, Asif Rahman wrote:
> [full message quoted above]
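Kent's score-cutoff idea can be sketched in plain Java. This is illustrative only, not a real QueryComponent: the constant name, the Map-based representation of scores, and the 5% fraction are all invented for the example; a real implementation would operate on Lucene doc ids and a Solr DocSet.

```java
import java.util.*;

public class FacetScoreCutoff {
    // Keep only docs scoring at least this fraction of the top score
    // (an arbitrary placeholder value; tuning this is exactly Kent's open question).
    static final float FACET_SCORE_FRACTION = 0.05f;

    static Set<Integer> selectFacetDocs(Map<Integer, Float> docScores) {
        float max = 0f;
        for (float s : docScores.values()) max = Math.max(max, s);
        float threshold = max * FACET_SCORE_FRACTION;
        Set<Integer> kept = new HashSet<>();
        for (Map.Entry<Integer, Float> e : docScores.entrySet()) {
            if (e.getValue() >= threshold) kept.add(e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Integer, Float> scores = new HashMap<>();
        scores.put(1, 9.0f);   // highly relevant
        scores.put(2, 1.2f);   // moderately relevant
        scores.put(3, 0.01f);  // the "Huck Finn" case: barely relevant
        Set<Integer> facetDocs = selectFacetDocs(scores);
        System.out.println(facetDocs.contains(1) && facetDocs.contains(2) && !facetDocs.contains(3));
    }
}
```

A relative threshold like this adapts to each query's score range, which partially answers "what's a low score?", but it still misbehaves when every hit is low-relevance.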
No wildcards with solr.ASCIIFoldingFilterFactory?
Hi all,

Could somebody help me understand why I cannot search with a wildcard when I use solr.ASCIIFoldingFilterFactory? I get results if I search for "münchen", "munchen" or "munchen*", but I get no results if I search for "münchen*". The original records contain the terms "München" and "Münchener". The solr.ASCIIFoldingFilterFactory is configured on both sides, index and query. We are using the 1.4-dev version from trunk.

Thank you very much!

Regards,
Vladimir

--
View this message in context: http://www.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--tp24162104p24162104.html
Sent from the Solr - User mailing list archive at Nabble.com.
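One common workaround, since wildcard terms are not run through the analysis chain at query time, is to fold the term on the client before building the query. Below is a sketch using only the JDK; the helper name is ours, and this NFD-based folding covers accented Latin characters, which is only a subset of what ASCIIFoldingFilter handles.

```java
import java.text.Normalizer;

public class WildcardFolding {
    // Decompose "ü" into "u" plus a combining diaeresis, then strip the marks,
    // so the wildcard term matches the ASCII-folded terms in the index.
    static String foldToAscii(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(foldToAscii("münchen*"));
    }
}
```

Sending the folded form ("munchen*") matches the indexed terms because the index side already applied ASCIIFoldingFilter.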
Search returning 0 results for "U2"
We are moving from Lucene indexers and readers to a hybrid solution where we still use the Lucene indexer but use Solr for querying the index. I am indexing our content using Lucene 2.3 and have a field called "contents" which is tokenized and stored. When I search the contents field for "U2" using Luke, the correct document turns up. However, searching for U2 in Solr returns nothing. Any ideas?

The Lucene field is as follows:

doc.add(new Field("contents", body.replaceAll(" ", ""), Field.Store.YES, Field.Index.TOKENIZED));

Lucene is using the following analyzer chain:

result = new ISOLatin1AccentFilter(result);
result = new StandardFilter(result);
result = new LowerCaseFilter(result);
result = new StopFilter(result, StandardAnalyzer.STOP_WORDS); //, stopTable);
result = new PorterStemFilter(result);

In Solr I have the field mapped as a text field, stored and indexed. The text field uses

Regards,
John

*** The information in this e-mail is confidential and may be legally privileged. It is intended solely for the addressee. Access to this e-mail by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Please note that emails to, from and within RTÉ may be subject to the Freedom of Information Act 1997 and may be liable to disclosure.
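A likely cause is an index-time vs. query-time analysis mismatch: the chain above lowercases tokens, so the indexed term is "u2", and any query path that does not apply an equivalent chain will look up "U2" verbatim and miss. Here is a toy illustration in plain Java (not Lucene code; all names are invented for the sketch):

```java
import java.util.*;

public class AnalyzerMismatch {
    // Mimics the relevant part of the index-time chain: whitespace split + LowerCaseFilter.
    static List<String> indexTimeAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String tok : text.split("\\s+")) {
            terms.add(tok.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> indexed = indexTimeAnalyze("Concert review of U2 in Dublin");
        System.out.println(indexed.contains("U2")); // raw query term misses
        System.out.println(indexed.contains("u2")); // analyzed query term hits
    }
}
```

The fix is to make the Solr field type's query analyzer match the Lucene 2.3 chain used at index time (including the lowercase and stem steps), or to reindex with Solr's own analyzer.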
Bug with QueryParser.
Hi,

I have tried both QueryParser and MultiFieldQueryParser. Suppose you want to search for Douglas Adams with the default operator set to AND, say on the field author. Instead of generating a query like +(author:douglas author:dougla) +(author:adams author:adam), it generates +author:douglas +author:adams +author:dougla +author:adam. Can anyone tell me how to fix this?

TIA
Saurabh

--
View this message in context: http://www.nabble.com/Bug-with-QueryParser.-tp24163501p24163501.html
Sent from the Solr - User mailing list archive at Nabble.com.
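The behaviour Saurabh expects can be sketched as follows: when the analyzer emits a stemmed token at the same position as the original (position increment 0), the parser should OR the same-position variants and AND across positions. The input format (token, positionIncrement pairs) and the buildQuery helper are invented for illustration; they are not the QueryParser API.

```java
import java.util.*;

public class SamePositionGrouping {
    static String buildQuery(String field, String[][] tokens) {
        // Group tokens: a positionIncrement of 0 means "same position as previous token".
        List<List<String>> positions = new ArrayList<>();
        for (String[] t : tokens) {
            if (Integer.parseInt(t[1]) > 0 || positions.isEmpty()) {
                positions.add(new ArrayList<String>());
            }
            positions.get(positions.size() - 1).add(field + ":" + t[0]);
        }
        // OR within a position group, AND (+) across groups.
        StringBuilder q = new StringBuilder();
        for (List<String> group : positions) {
            if (q.length() > 0) q.append(' ');
            q.append(group.size() == 1 ? "+" + group.get(0)
                                       : "+(" + String.join(" ", group) + ")");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        String[][] tokens = {
            {"douglas", "1"}, {"dougla", "0"},  // original + stem at the same position
            {"adams", "1"}, {"adam", "0"},
        };
        System.out.println(buildQuery("author", tokens));
    }
}
```

If the analyzer emits the stem with a position increment of 1 instead of 0, the parser has no way to know the tokens are variants of one word, which would produce exactly the four-clause query Saurabh reports.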
Re: ExtractRequestHandler - not properly indexing office docs?
Can you change the text field to be stored and then point the LukeRequestHandler at that field (/admin/luke) and report back? Also, can you post your full schema and config? Finally, can you get the example to work?

On Jun 23, 2009, at 1:41 AM, cloax wrote:
> I've tried 'text' (taken from the example config) and then tried creating a
> new field called doc_content and using that. Neither has worked.
>
> Grant Ingersoll-6 wrote:
>> What's your default search field?
>>
>> On Jun 22, 2009, at 12:29 PM, cloax wrote:
>>> Yep, I've tried both of those and still no joy. Here's both my curl
>>> statement and the resulting Solr log output.
>>>
>>> curl http://localhost:8983/solr/update/extract?ext.def.fl=text\&ext.literal.id=1\&ext.map.div=text\&ext.capture=div -F "myfi...@dj_character.doc"
>>>
>>> curl's output: 0317
>>>
>>> Solr log:
>>>
>>> Jun 22, 2009 12:21:42 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=544
>>> Jun 22, 2009 12:22:26 PM org.apache.solr.update.processor.LogUpdateProcessor finish
>>> INFO: {add=[1]} 0 317
>>> Jun 22, 2009 12:22:26 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=317
>>> Jun 22, 2009 12:22:37 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=kondel&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=2
>>>
>>> The submitted document has "kondel" in it numerous times, so Solr should
>>> have a hit. Yet it returns nothing. I also made sure I committed, but that
>>> didn't seem to help either.
>>>
>>> Grant Ingersoll-6 wrote:
>>>> Do you have a default field declared? &ext.default.fl=
>>>> Either that, or you need to explicitly capture the fields you are
>>>> interested in using &ext.capture=
>>>> You could add this to your curl statement to try out.
-Grant

--
View this message in context: http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24159267.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Facets with an IDF concept
On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:
> [full message quoted above]

I'm not sure I'm following. Would you be faceting on one field but using the DF from some other field? Faceting is already a count of all the documents that contain the term in a given field for that search. If I'm understanding, you would still do the typical faceting, but then rerank by the global DF values, right?

Backing up, what is the problem you are seeing that you are trying to solve?

I think you could do this, but you'd have to hook it in yourself. By penalize, do you mean remove, or just rank them lower in the sort? Generally speaking, looking up the DF value can be expensive, especially if you do a lot of skipping around. I don't know how pluggable the sort capabilities are for faceting, but that might be the place to start if you are just looking at the sorting options.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Facets with an IDF concept
Hi Kent,

Your problem is a close cousin of the one we're tackling. We have experienced the same problem as you when calculating facets on MoreLikeThis queries, since those queries tend to match a lot of documents. We used one of the solutions you mentioned, a rank cutoff, to solve it. We first run the MoreLikeThis query, then use the top N documents' unique ids as a filter query for a second query. The performance is still acceptable; however, our index is smaller than yours by an order of magnitude.

Regards,

Asif

On Tue, Jun 23, 2009 at 10:34 AM, Kent Fitch wrote:
> [full message quoted above]

--
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com
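The two-pass approach Asif describes (take the top-N ids from the first query, then use them as a filter query for the faceting pass) can be sketched as follows. The class and method names are invented for illustration; the output is just a Solr fq clause.

```java
import java.util.*;

public class RankCutoffFilter {
    // Build an fq like id:(12 OR 34 OR 56) from the top-N unique ids
    // returned by the first (MoreLikeThis) query.
    static String buildIdFilterQuery(String idField, List<String> topIds) {
        StringBuilder fq = new StringBuilder(idField + ":(");
        for (int i = 0; i < topIds.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append(topIds.get(i));
        }
        return fq.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> topIds = Arrays.asList("12", "34", "56");
        System.out.println(buildIdFilterQuery("id", topIds));
        // The second request would then be something like:
        //   q=*:*&fq=id:(12 OR 34 OR 56)&facet=true&facet.field=topic
    }
}
```

One caveat: for large N this produces a very large boolean query, so the parser's maximum clause count (and general query-building overhead) bounds how far this scales.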
Re: SolrCore, reload, synonyms not reloaded
Hello,

I was having a similar problem during indexing with my synonyms file, but I was able to resolve it using the steps you had outlined. Thanks!

I am wondering if there is a way to reload the index with a new synonyms file without Solr multicore support? I would really appreciate it if you could post the steps to accomplish this.

hossman wrote:
>
> : I'm using Solr 1.3 and I've never been able to get the SolrCore (formerly
> : MultiCore) reload feature to pick up changes I made to my synonyms file. At
> : index time I expand synonyms. If I change my synonyms.txt file then do a
> : MultiCore RELOAD and then reindex my data and then do a query that should
> : work now that I added a synonym, it doesn't work. If I go to the analysis
> : page and try putting in the text I see that it did pick up the changes. I'm
> : forced to bring down the webapp for the changes to truly be reloaded.
> : Has anyone else seen this?
>
> David: I don't really use the MultiCore support, but your problem
> description intrigued me so I tried it out, and I can *not* reproduce the
> problem you are having.
>
> Steps I took:
>
> 1) applied the patch listed at the end of this email to the Solr trunk.
> Note that it adds a "text" field to the multicore "core1" example configs.
> This field uses SynonymFilter at index time. I also added a synonyms file
> with "chris, hostetter" as the only entry.
>
> 2) cd example; java -Dsolr.solr.home=multicore -jar start.jar
>
> 3) java -Ddata=args -Durl=http://localhost:8983/solr/core1/update -jar
> post.jar '1chris and david'
>
> 4) checked luke handler, confirmed that chris, hostetter, and, & david
> were indexed terms.
>
> 5) added "david, smiley" to my synonyms file
>
> 6) http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1
>
> 7) repeated step #3
>
> 8) confirmed with luke that "smiley" was now an indexed term. Also
> confirmed that a query for text:smiley found my doc
>
> Here's the patch...
> Index: example/multicore/core1/conf/schema.xml
> ===
> --- example/multicore/core1/conf/schema.xml (revision 693303)
> +++ example/multicore/core1/conf/schema.xml (working copy)
> @@ -19,6 +19,18 @@
> [the XML element markup of this hunk was stripped by the list archive; it
> adds a "text" fieldType with positionIncrementGap="100" whose index analyzer
> includes a SynonymFilter with synonyms="index_synonyms.txt" ignoreCase="true"
> expand="true"]
> @@ -27,6 +39,7 @@
> [markup stripped; adds a "text" field declaration with multiValued="false"]
>
> Index: example/multicore/core1/conf/index_synonyms.txt
> ===
> --- example/multicore/core1/conf/index_synonyms.txt (revision 0)
> +++ example/multicore/core1/conf/index_synonyms.txt (revision 0)
> @@ -0,0 +1,2 @@
> +chris, hostetter
> +
>
> Property changes on: example/multicore/core1/conf/index_synonyms.txt
> ___
> Name: svn:keywords
>+ Date Author Id Revision HeadURL
> Name: svn:eol-style
>+ native

--
View this message in context: http://www.nabble.com/SolrCore%2C-reload%2C-synonyms-not-reloaded-tp19339767p24164306.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Numerical range faceting
gwk wrote:
> Hi,
>
> I'm currently using facet.query to do my numerical range faceting. I
> basically use a fixed price range of €0 to €10,000 in steps of €500, which
> means 20 facet.queries plus an extra facet.query for anything above €10,000.
> I use the inclusive/exclusive query as per my question two days ago, so the
> facets add up to the total number of products. This is done so that the
> javascript on my search page can accurately show the number of products
> returned for a specified range before submitting it to the server, by adding
> up the facet counts for the selected range.
>
> I'm a bit concerned about the amount and size of my requests to the server,
> especially because there are other numerical values which might be
> interesting to facet on, and I've noticed the server won't respond correctly
> if I add (many) more facet.queries by decreasing the step size. I was really
> hoping for faceting options for numerical ranges similar to the date
> faceting options. The functionality would be practically identical as far as
> I can tell (which isn't very far, as I know very little about the internals
> of Solr), so I was wondering if such options are planned or if I'm
> overlooking something.
>
> Regards,
>
> gwk

Hello,

Well, since I got no response, I flexed my severely atrophied Java muscles (the last time I used the language, Swing was new) and dove straight into the Solr code. Well, not really; mostly I did some copy-pasting, and with some assistance from the API reference I was able to add numerical faceting on sortable numerical fields (it seems to work for both integers and floating-point numbers) with a syntax similar to the date faceting. I also added an extra parameter for whether the ranges should be inclusive or exclusive (on either end). And it seems to work, although the quality of my code is not of the same grade as the rest of the Solr code (I was amazed how easy it was for me to add this feature).

I was wondering if someone is interested in a patch file and if so, where should I post it?
Regards,

gwk

As an example, the following query:

http://localhost:8080/select/?q=*%3A*&echoParams=none&rows=0&indent=on&facet=true&facet.number=price&f.price.facet.number.start=0&f.price.facet.number.end=100&f.price.facet.number.gap=1&f.price.facet.number.other=all&f.price.facet.number.exclusive=end

yields the following results (the XML markup of the response was stripped by the list archive; only the counts and values survive):

0 3 1820 2697 2588 2622 2459 2455 2597 2530 2518 2389 18 54 19 23 43 67 1.0 100.0 0 2733 60974
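gwk's original fixed-bucket approach (before any built-in numeric range faceting) can be sketched by generating the facet.query parameters client-side. The helper below is hypothetical; note that the `[a TO b}` exclusive-upper-bound syntax is shown for illustration of the inclusive/exclusive idea and is not what classic facet.query range syntax supported.

```java
import java.util.*;

public class PriceRangeFacets {
    // Emit one facet.query per bucket, plus an open-ended "other" bucket.
    static List<String> buildRangeFacetQueries(String field, int start, int end, int gap) {
        List<String> queries = new ArrayList<>();
        for (int lo = start; lo < end; lo += gap) {
            int hi = lo + gap;
            // upper bound exclusive so adjacent buckets don't double-count
            queries.add("facet.query=" + field + ":[" + lo + " TO " + hi + "}");
        }
        queries.add("facet.query=" + field + ":[" + end + " TO *]");
        return queries;
    }

    public static void main(String[] args) {
        // gwk's setup: €0 to €10,000 in €500 steps = 20 buckets + 1 open-ended.
        List<String> qs = buildRangeFacetQueries("price", 0, 10000, 500);
        System.out.println(qs.size());
        System.out.println(qs.get(0));
        System.out.println(qs.get(20));
    }
}
```

This also makes the request-size concern concrete: halving the step doubles the number of facet.query parameters, which is exactly what gwk's built-in facet.number patch avoids.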
Re: Facets with an IDF concept
Hi Grant,

I'll give a real-life example of the problem we are trying to solve. We index a large number of current news articles on a continuing basis. We tag these articles with news topics (e.g. Barack Obama, Iran, etc.) and then use these tags to facet our queries. For example, we might issue a query for all articles in the last 24 hours. The facets would then tell us which news topics have been written about the most in that period.

The problem is that "Barack Obama", for example, is always written about in high frequency, whereas "Iran" is currently very hot in the news but has not always been. In this case, we'd like to see "Iran" show up higher than "Barack Obama" in the facet results.

To me, this seems identical to the tf-idf scoring expression used in normal search. The facet count is analogous to the tf, and I can access the facet terms' IDFs through the Similarity API. Is my reasoning sound? Can you provide any guidance as to the best way to implement this?

Thanks for your help,

Asif

On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll wrote:
> [full message quoted above]

--
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com
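Asif's tf-idf analogy can be sketched in plain Java: treat a topic's facet count within the query's doc set as the "tf" and weight it by an IDF derived from the topic's global document frequency. The formula below mirrors Lucene's DefaultSimilarity idf, ln(numDocs/(df+1)) + 1; the counts are made up, and a real version would plug into a custom facet component rather than a standalone class.

```java
import java.util.*;

public class IdfWeightedFacets {
    static double idf(int df, int numDocs) {
        return Math.log((double) numDocs / (df + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        // topic -> { facet count in the last 24h, global doc frequency }
        Map<String, int[]> topics = new LinkedHashMap<>();
        topics.put("Barack Obama", new int[]{500, 200_000}); // always high volume
        topics.put("Iran",         new int[]{400, 20_000});  // hot right now

        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, int[]> e : topics.entrySet()) {
            double score = e.getValue()[0] * idf(e.getValue()[1], numDocs);
            if (score > bestScore) { bestScore = score; best = e.getKey(); }
        }
        System.out.println(best);
    }
}
```

With these numbers, "Iran" (400 × idf of a rare topic) outscores "Barack Obama" (500 × idf of a very common topic), which is exactly the reordering Asif wants.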
Re: Numerical range faceting
On Tue, Jun 23, 2009 at 4:55 PM, gwk wrote: > > I was wondering if someone is interested in a patch file and if so, where > should I post it? > This seems useful. Please open an issue and submit a patch. I'm sure there will be interest. http://wiki.apache.org/solr/HowToContribute -- Regards, Shalin Shekhar Mangar.
Re: Facets with an IDF concept
Asif Rahman wrote:
> [real-life example quoted above]

You're not looking for an IDF-based function. You need to figure out what a "normal" amount of news flow for a given topic is, and then determine when an abnormal amount is happening. Note that an abnormal amount can be positive or negative.

We use a similar method to this on http://love.com, so we know, for example, that something is going on with Ed McMahon as I type.

I wouldn't be looking at using Solr to do this kind of thing, by the way. Try something like Esper; I think it might hold some promise for this kind of thing (Esper is an open-source event stream processing engine).

Regards

> [remainder of Asif's and Grant's messages quoted above]
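The "normal vs. abnormal news flow" alternative suggested above can be sketched as a simple z-score over a sliding window of daily topic counts, flagging deviations in either direction. All names and numbers are invented for illustration.

```java
public class TopicTrend {
    // z-score of today's count against the topic's historical baseline
    static double zScore(int[] history, int today) {
        double mean = 0;
        for (int c : history) mean += c;
        mean /= history.length;
        double var = 0;
        for (int c : history) var += (c - mean) * (c - mean);
        double stddev = Math.sqrt(var / history.length);
        return stddev == 0 ? 0 : (today - mean) / stddev;
    }

    public static void main(String[] args) {
        int[] obama = {480, 510, 495, 505, 500}; // consistently high: not "hot"
        int[] iran  = {20, 25, 15, 30, 22};      // low baseline
        System.out.println(zScore(obama, 500) > 2.0); // normal flow
        System.out.println(zScore(iran, 400) > 2.0);  // abnormal spike
    }
}
```

Unlike an IDF weight, this catches negative anomalies too (a usually busy topic going quiet), which is what "positive or negative" refers to above.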
Re: Solr Authentication Problem
Hi All,

I am using an SVN build of Solr 1.4, and in it I am not able to find this method. Has something changed in the Solr 1.4 Java client API?

Thanks in advance.

Regards,
Allahbaksh

2009/6/23 Noble Paul നോബിള്‍ नोब्ळ्
> I have raised an issue: https://issues.apache.org/jira/browse/SOLR-1238
>
> There is a patch attached to the issue.
>
> On Mon, Jun 22, 2009 at 1:40 PM, Allahbaksh Asadullah wrote:
> >
> > Hi All,
> > I am getting an error when I am using authentication in Solr. I followed
> > the wiki. The error does not appear when I am searching. Below is the
> > code snippet and the error.
> >
> > Please note I am using a Solr 1.4 development build from SVN.
> >
> > HttpClient client = new HttpClient();
> > AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT, null, null);
> > client.getState().setCredentials(scope, new UsernamePasswordCredentials("guest", "guest"));
> > SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr", client);
> >
> > SolrInputDocument doc1 = new SolrInputDocument();
> > // Add fields to the document
> > doc1.addField("employeeid", "1237");
> > doc1.addField("employeename", "Ann");
> > doc1.addField("employeeunit", "etc");
> > doc1.addField("employeedoj", "1995-11-31T23:59:59Z");
> > server.add(doc1);
> >
> > Exception in thread "main"
> > org.apache.solr.client.solrj.SolrServerException:
> > org.apache.commons.httpclient.ProtocolException: Unbuffered entity
> > enclosing request can not be repeated.
> > > >at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:468) > > > >at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) > > > >at > org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) > > > >at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63) > > > >at > test.SolrAuthenticationTest.(SolrAuthenticationTest.java:49) > > > >at > test.SolrAuthenticationTest.main(SolrAuthenticationTest.java:113) > > > > Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered > > entity enclosing request can not be repeated. > > > >at > org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487) > > > >at > org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) > > > >at > org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) > > > >at > org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) > > > >at > org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) > > > >at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) > > > >at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) > > > >at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415) > > > >... 5 more. > > > > Thanks and regards, > > Allahbaksh > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > -- Allahbaksh Mohammedali Asadullah, Software Engineering & Technology Labs, Infosys Technolgies Limited, Electronic City, Hosur Road, Bangalore 560 100, India. (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927. Fax: 91-80-28520362 | Mobile: 91-9845505322.
RE: Data Import Handler
With the data-config file filled out, I am receiving errors telling me that the indexing of my database has failed. I think I have filled out everything I need to in the data-config file and that I have everything in the right directory. My details are described below, including the locations of the files, the contents of the data-config file, and the errors I am seeing. Has anyone else seen problems like this?

As of right now, I have data-config.xml in /usr/local/tomcat6.0.20/webapps/solr/, and I have the database bell_labs.sql in the solr/home directory /usr/local/tomcat6.0.20/solr/.

Data-config.xml has the following contents: [the XML was stripped by the list archive]

When I go to http://localhost:8080/solr/dataimport, I see the following displayed in my browser (response XML markup stripped by the archive):

0 0 /usr/local/tomcat6.0.20/webapps/solr/data-config.xml idle 0:0:35.614 0 0 0 0 2009-06-23 09:24:15 Indexing failed. Rolled back all changes. 2009-06-23 09:24:15 This response format is experimental. It is likely to change in the future.

When I go to http://localhost:8080/solr/admin/dataimport.jsp, I see two frames, the left frame having the DataImportHandler Development Console, and the right frame displaying the following (again with markup stripped):

0 24 /usr/local/tomcat6.0.20/webapps/solr/data-config.xml full-import debug idle Configuration Re-loaded sucessfully 0:0:0.19 0 0 0 0 2009-06-23 09:26:15 Indexing failed. Rolled back all changes. 2009-06-23 09:26:15 This response format is experimental. It is likely to change in the future.
-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Monday, June 22, 2009 1:55 PM To: solr-user@lucene.apache.org Subject: Re: Data Import Handler On Mon, Jun 22, 2009 at 10:51 PM, Mukerjee, Neiloy (Neil) < neil.muker...@alcatel-lucent.com> wrote: > > I suspect that the fact that the data-config file is blank is causing these > issues, but per the documentation on the website, there is no indication of > what, if anything, should go there - is there an alternate resource that > anyone knows of which I could use? > > The data-config.xml is the file which specifies how and from where Solr can pull data. For example, look at the full-import from a database data-config.xml at http://wiki.apache.org/solr/DataImportHandler#head-c24dc86472fa50f3e87f744d3c80ebd9c31b791c Or, look at the Slashdot feed example at http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476 -- Regards, Shalin Shekhar Mangar.
Re: SolrCore, reload, synonyms not reloaded
A single-core setup has no core admin commands, but it is possible to set up multicore with only one core, and then you will be able to reload that core. On Tue, Jun 23, 2009 at 5:09 PM, ranjitr wrote: > > Hello, > > I was having a similar problem during indexing with my synonyms file. But I > was able to resolve it using the steps you had outlined. Thanks! > > I am wondering if there is a way to reload the index with a new synonym file > without Solr/Multi core support? I would really appreciate it if you could > post the steps to accomplish this. > > > > > > hossman wrote: > > > > > > : I'm using Solr 1.3 and I've never been able to get the SolrCore > > (formerly > > : MultiCore) reload feature to pick up changes I made to my synonyms file. > > At > > : index time I expand synonyms. If I change my synonyms.txt file then do > > a > > : MultiCore RELOAD and then reindex my data and then do a query that > > should > > : work now that I added a synonym, it doesn't work. If I go to the > > analysis > > : page and try putting in the text I see that it did pick up the changes. > > I'm > > : forced to bring down the webapp for the changes to truly be > > reloaded. > > : Has anyone else seen this? > > > > David: I don't really use the Multi Core support, but your problem > > description intrigued me so i tried it out, and i can *not* reproduce the > > problem you are having. > > > > Steps i took > > > > 1) applied the patch listed at the end of this email to the Solr trunk. > > note that it adds a "text" field to the multicore "core1" example configs. > > this field uses SynonymFilter at index time. I also added a synonyms file > > with "chris, hostetter" as the only entry. > > > > 2) cd example; java -Dsolr.solr.home=multicore -jar start.jar > > > > 3) java -Ddata=args -Durl=http://localhost:8983/solr/core1/update -jar > > post.jar '1chris and > > david' > > > > 4) checked luke handler, confirmed that chris, hostetter, and, & david > > were indexed terms.
> > > > 5) added "david, smiley" to my synonyms file > > > > 6) http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1 > > > > 7) repeated step #3 > > > > 8) confirmed with luke that "smiley" was now an indexed term. also > > confirmed that query for text:smiley found my doc > > > > Here's the patch... > > > > Index: example/multicore/core1/conf/schema.xml > > === > > --- example/multicore/core1/conf/schema.xml (revision 693303) > > +++ example/multicore/core1/conf/schema.xml (working copy) > > [the hunk itself was mangled by the list archive: it adds a "text" fieldType with positionIncrementGap="100" whose index analyzer includes a synonym filter with synonyms="index_synonyms.txt" ignoreCase="true" expand="true", plus a "text" field using that type] > > > > Index: example/multicore/core1/conf/index_synonyms.txt > > === > > --- example/multicore/core1/conf/index_synonyms.txt (revision 0) > > +++ example/multicore/core1/conf/index_synonyms.txt (revision 0) > > @@ -0,0 +1,2 @@ > > +chris, hostetter > > + > > > > Property changes on: example/multicore/core1/conf/index_synonyms.txt > > ___ > > Name: svn:keywords > > + Date Author Id Revision HeadURL > > Name: svn:eol-style > > + native > > > > > > > > > > -- > View this message in context: > http://www.nabble.com/SolrCore%2C-reload%2C-synonyms-not-reloaded-tp19339767p24164306.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Numerical range faceting
Shalin Shekhar Mangar wrote: On Tue, Jun 23, 2009 at 4:55 PM, gwk wrote: I was wondering if someone is interested in a patch file and if so, where should I post it? This seems useful. Please open an issue and submit a patch. I'm sure there will be interest. Hi, I cleaned up the code a bit, added some javadoc (I hope I did it correctly) and created a ticket: http://issues.apache.org/jira/browse/SOLR-1240 Regards, gwk
Re: Search returning 0 results for "U2"
To answer my own question, the issue was with the WordDelimiter filter. Issue now resolved. J On Tue, 2009-06-23 at 11:13 +0100, John G. Moylan wrote: > We are moving from Lucene indexer and readers to a Hybrid solution where > we still use the Lucene Indexer but use Solr for querying the index. > > I am indexing our content using Lucene 2.3 and have a field called > "contents" which is tokenized and stored. When I search the contents > field for "U2" using Luke the correct document turns up. However, > searching for U2 in solr returns nothing. > > Any ideas? > > Lucene field is as follows: > > doc.add(new Field("contents", body.replaceAll(" ", ""), > Field.Store.YES, Field.Index.TOKENIZED)); > > Lucene is using the following analyzer: > > result = new ISOLatin1AccentFilter(result); > result = new StandardFilter(result); > result = new LowerCaseFilter(result); > result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//, > stopTable); > result = new PorterStemFilter(result); > > > In Solr I have the field mapped as a text field stored and indexed. The > text field uses > > positionIncrementGap="100"> > > > > ignoreCase="true" > words="stopwords.txt" > enablePositionIncrements="true" > /> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > words="stopwords.txt"/> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > > Regards, > John > > > > > *** > The information in this e-mail is confidential and may be legally privileged. > It is intended solely for the addressee. Access to this e-mail by anyone else > is unauthorised. 
If you are not the intended recipient, any disclosure, > copying, distribution, or any action taken or omitted to be taken in reliance > on it, is prohibited and may be unlawful. > Please note that emails to, from and within RTÉ may be subject to the Freedom > of Information Act 1997 and may be liable to disclosure. > ***
building custom RequestHandlers
I am using solr and php quite nicely. Currently the work flow includes some manipulation on the php side so I correctly format the query string and pass it to tomcat/solr. I somehow want to build my own request handler in java so I can skip the whole apache/php request that is just for formatting. This will save me tons of requests to apache since I use solr directly from javascript. I would like to ask if there is something ready that I can use and adjust. I am kinda new to Java but once I get the pointers I think I should be able to pull it off. Thanks, JD
why EnglishPorterFilterFactory transforms germany to germani?
Hi, Might be normal but I am confused why EnglishPorterFilterFactory transforms germany to germani Cheers
Re: building custom RequestHandlers
Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. Currently the work flow includes some manipulation on php side so I correctly format the query string and pass to tomcat/solr. I somehow want to build own request handler in java so I skip the whole apache/php request that is just for formating. This will saves me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think should be able to pull out. Thanks, JD - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: why EnglishPorterFilterFactory transforms germany to germani?
That is how Porter stemmers work. They do not produce dictionary stems. They produce a common token for different inflections of the same word. The stem for "germanies" is also "germani". Example sentence: "The two Germanies merged in 1990." More info here: http://tartarus.org/~martin/PorterStemmer/ wunder On 6/23/09 7:36 AM, "Julian Davchev" wrote: > Hi, > Might be normal but I am confused why EnglishPorterFilterFactory > transforms germany to germani > > Cheers
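The y-to-i mapping comes from one small rule near the end of the algorithm, step 1c: a terminal "y" becomes "i" when the rest of the word contains a vowel. A self-contained sketch of just that rule, not the full stemmer (lowercase input assumed):

```java
public class PorterStep1c {
    static boolean hasVowel(String s) {
        for (char c : s.toCharArray()) {
            if ("aeiou".indexOf(c) >= 0) return true;
        }
        return false;
    }

    // Porter step 1c: (*v*) Y -> I, i.e. replace a terminal 'y' with 'i'
    // if the stem before it contains a vowel.
    static String step1c(String word) {
        if (word.endsWith("y") && hasVowel(word.substring(0, word.length() - 1))) {
            return word.substring(0, word.length() - 1) + "i";
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(step1c("germany")); // germani
        System.out.println(step1c("happy"));   // happi
        System.out.println(step1c("by"));      // by (no vowel before the y)
    }
}
```

The point is that "germani" is not meant to be a word; it is just the shared bucket that "germany" and "germanies" both land in at index and query time.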
Re: building custom RequestHandlers
Never used it.. I am just looking in docs how can I extend solr but no luck so far :( Hoping for some docs or real extend example. Eric Pugh wrote: > Are you using the JavaScript interface to Solr? > http://wiki.apache.org/solr/SolrJS > > It may provide much of what you are looking for! > > Eric > > On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: > >> I am using solr and php quite nicely. >> Currently the work flow includes some manipulation on php side so I >> correctly format the query string and pass to tomcat/solr. >> I somehow want to build own request handler in java so I skip the whole >> apache/php request that is just for formating. >> This will saves me tons of requests to apache since I use solr directly >> from javascript. >> >> Would like to ask if there is something ready that I can use and adjust. >> I am kinda new in Java but once I get the pointers >> I think should be able to pull out. >> Thanks, >> JD >> >> > > - > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | > http://www.opensourceconnections.com > Free/Busy: http://tinyurl.com/eric-cal > > > >
Re: spellcheck. limit the suggested words by some field
I seem to have found an answer to this one by digging 30 mins in the archives. One approach is to use copyField and store only the stuff that is interesting, e.g. a spell_city field that is populated only where type is city. The second approach involves extending IndexBasedSpellChecker... alas, I can find nowhere in the docs how this is done. Julian Davchev wrote: > Hi, > I have built a spellcheck dictionary based on the name field. > It works like a charm but I'd like to limit the returned suggestions. > For example we have the following structure: > > id name type > 1 Berlin city > 2 bergan phony > > So when I search for suggested words for "ber" I would get both Berlin > and bergan, but I somehow want to limit it to only those of type city. > I tried with fq=type:city but this didn't help either. > > Any pointers are more than welcome. The other approach would be making > different spellcheck dictionaries based on type and just using the > specific dictionary, but then again I didn't see an option for how to build > a dictionary based on type. > > Thanks. >
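For the copyField-style approach, a hedged sketch of the two config pieces (field, type and dictionary names here are invented): define a dedicated spell field, populate it from your indexing client only for documents whose type is city (a plain copyField cannot be made conditional, so the filtering happens at feed time), and point a named spellchecker at that field:

```xml
<!-- schema.xml: a field that only city documents populate -->
<field name="spell_city" type="textSpell" indexed="true" stored="false"/>

<!-- solrconfig.xml: a dictionary built from that field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">city</str>
    <str name="field">spell_city</str>
    <str name="spellcheckIndexDir">./spellchecker_city</str>
  </lst>
</searchComponent>
```

Queries then select it with spellcheck.dictionary=city, so only terms that came from city documents are ever suggested.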
Re: Data Import Handler
On Tue, Jun 23, 2009 at 7:12 PM, Mukerjee, Neiloy (Neil) < neil.muker...@alcatel-lucent.com> wrote: > With the data-config file filled out, I am receiving errors telling me that > the indexing of my database has failed. I think I have filled out everything > I need to in the data-config file and that I have everything in the right > directory. My details are described below, including locations of files, > contents of the data-config file, and the errors I am seeing. Has anyone > else seen problems like this? > What error are you seeing? Can you please post the stack trace? > As of right now, I have data-config.xml in > /usr/local/tomcat6.0.20/webapps/solr/, and I have the database bell_labs.sql > in the solr/home directory /usr/local/tomcat6.0.20/solr/. What is bell_labs.sql? DataImportHandler imports from databases not from sql dumps. Is your database jdbc:mysql://localhost/bell_labs running? > Data-config.xml has the following contents: > > url="jdbc:mysql://localhost/bell_labs" user="root" password=""/> > > > > > > > > > > > > > Note that if the column name in your database and the name of the field in Solr is same, then you do not need to write the 'name' attribute in the field tags. > > When I go to http://localhost:8080/solr/dataimport, I see the following > displayed to my browser: > This XML file does not appear to have any style information associated with > it. The document tree is shown below. > > Indexing failed. Rolled back all changes. > 2009-06-23 09:24:15 > > It says that indexing failed. You should be able to see some exceptions in the solr log. If you can post them here, we might be able to help you more. -- Regards, Shalin Shekhar Mangar.
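Since the posted config was mangled by the archive, here is roughly what a minimal data-config.xml for a single MySQL table looks like. Only the JDBC URL is taken from the original mail; the driver, table and column names are illustrative:

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/bell_labs"
              user="root" password=""/>
  <document>
    <entity name="article" query="SELECT id, title, body FROM article">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="body"/>
    </entity>
  </document>
</dataConfig>
```

Note the dataSource points at a running database, not at a .sql dump file; the dump would first have to be loaded into MySQL.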
Trie vs long string for sorting
I'm having trouble understanding how the Trie type compares (speed- and memory-wise) with dealing with long *strings* (as opposed to integers). My data are library call numbers, normalized to be comparable, resulting in (maximum) 21-character strings of the form "RK 052180H359~999~999" Now, these are fine -- they work for sorting and ranges and the whole thing, but right now I can't use them because I've got two or three for each of my 6M documents and on a 32-bit machine I run out of heap. Another option would be to turn them into longs (using roughly 56 bits of the 64-bit space) and use a trie type. Is there any sort of a win involved there? -- Bill Dueber Library Systems Programmer University of Michigan Library
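For intuition on the long option: 64 bits can only hold an order-preserving encoding of a short prefix — for example 8 ASCII characters at 7 bits each is exactly 56 bits — so sorting by the long really means sorting by a fixed-length prefix of the normalized call number, with the full string kept elsewhere for tie-breaking or display. A sketch of one such packing; this is my own assumption about how the 56 bits would be used, not anything Solr's trie fields do for you:

```java
public class CallNumberKey {
    // Pack the first 8 characters of an ASCII string into a long,
    // big-endian, 7 bits per character. Shorter strings are padded
    // with 0, so "AB" sorts before "ABC", matching string order.
    static long pack(String s) {
        long key = 0;
        for (int i = 0; i < 8; i++) {
            char c = i < s.length() ? s.charAt(i) : 0;
            key = (key << 7) | (c & 0x7F);
        }
        return key;
    }

    public static void main(String[] args) {
        String full = "RK 052180H359~999~999";
        // Only the first 8 characters influence the key.
        System.out.println(Long.toHexString(pack(full)));
        System.out.println(pack("RK 05218") < pack("RK 05219")); // true
    }
}
```

The memory argument is that a sort field of longs costs 8 bytes per document in the FieldCache, versus a String entry per distinct value plus per-document pointers.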
Re: building custom RequestHandlers
Like most things JavaScript, I found that I had to just dig through it and play with it. However, the Reuters demo site was very easy to customize to interact with my own Solr instance, and I went from there. On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: Never used it.. I am just looking in docs how can I extend solr but no luck so far :( Hoping for some docs or real extend example. Eric Pugh wrote: Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. Currently the work flow includes some manipulation on php side so I correctly format the query string and pass to tomcat/solr. I somehow want to build own request handler in java so I skip the whole apache/php request that is just for formating. This will saves me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think should be able to pull out. Thanks, JD - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Initialize SOLR DataImportHandler
We use the DataImportHandler for indexes from a RDBMS. Is there any way to make sure that the import is run when the SOLR webapp/core starts up? Do we need to send a command to SOLR to make this happen? -- View this message in context: http://www.nabble.com/Initialize-SOLR-DataImportHandler-tp24167359p24167359.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: building custom RequestHandlers
Is it possible to change the javascript output? I find some of the information choices (e.g., that facet information is returned in a flat list, with facet names in the even-numbered indexes and number-of-items following them in the odd-numbered indexes) kind of annoying. On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh wrote: > Like most things JavaScript, I found that I had to just dig through it and > play with it. However, the Reuters demo site was very easy to customize to > interact with my own Solr instance, and I went from there. > > > On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: > > Never used it.. I am just looking in docs how can I extend solr but no >> luck so far :( >> Hoping for some docs or real extend example. >> >> >> >> Eric Pugh wrote: >> >>> Are you using the JavaScript interface to Solr? >>> http://wiki.apache.org/solr/SolrJS >>> >>> It may provide much of what you are looking for! >>> >>> Eric >>> >>> On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: >>> >>> I am using solr and php quite nicely. Currently the work flow includes some manipulation on php side so I correctly format the query string and pass to tomcat/solr. I somehow want to build own request handler in java so I skip the whole apache/php request that is just for formating. This will saves me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think should be able to pull out. Thanks, JD >>> - >>> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | >>> http://www.opensourceconnections.com >>> Free/Busy: http://tinyurl.com/eric-cal >>> >>> >>> >>> >>> >> > - > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | > http://www.opensourceconnections.com > Free/Busy: http://tinyurl.com/eric-cal > > > > > -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: Facets with an IDF concept
On Jun 23, 2009, at 8:05 AM, Asif Rahman wrote: Hi Grant, I'll give a real life example of the problem that we are trying to solve. We index a large number of current news articles on a continuing basis. We tag these articles with news topics (e.g. Barack Obama, Iran, etc.). We then use these tags to facet our queries. For example, we might issue a query for all articles in the last 24 hours. The facets would then tell us which news topics have been written about the most in that period. The problem is that "Barack Obama", for example, is always written about in high frequency, as opposed to "Iran" which is currently very hot in the news, but which has not always been the case. In this case, we'd like to see "Iran" show up higher than "Barack Obama" in the facet results. To me, this seems identical to the tf-idf scoring expression that is used in normal search. The facet count is analogous to the tf and I can access the facet term idf's through the Similarity API. I'd say faceting is akin to the DF (doc freq) part of search, not TF. TF is per document, DF is across all the docs. Faceting is just counting all of docs that contain the various terms in that field across the results set. Regardless of the semantics, it doesn't sound like DF would give you what you want. It could be entirely possible that in some short timespan the number of docs on Iran could match up w/ the number on Obama (maybe not for that particular example) in which case your "hot" item would no longer appear hot. One idea is that you could take baselines of all the facets nightly for that field (via *:* or something) and then you could track the trends that way by calculating the diffs. Of course, you could then do this hour to hour and get into all kinds of trend detection stuff. In other words, it does seem like it's something you could do with Solr. 
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
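Whichever semantics you settle on, the re-weighting Asif describes is cheap to apply client-side once you have the raw facet counts plus each topic's index-wide docFreq (e.g. via IndexReader.docFreq() or, later, the TermsComponent). A sketch of the arithmetic with invented numbers — the idf formula here is the classic Lucene-style one, not something Solr applies to facets for you:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdfWeightedFacets {
    // Classic idf: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Re-weight raw facet counts by each term's idf across the whole index.
    static Map<String, Double> reweight(Map<String, Integer> facetCounts,
                                        Map<String, Integer> docFreqs,
                                        int numDocs) {
        Map<String, Double> scores = new LinkedHashMap<String, Double>();
        for (Map.Entry<String, Integer> e : facetCounts.entrySet()) {
            scores.put(e.getKey(),
                       e.getValue() * idf(numDocs, docFreqs.get(e.getKey())));
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("barack obama", 120); // always heavily tagged
        counts.put("iran", 90);          // rare overall, hot right now
        Map<String, Integer> dfs = new LinkedHashMap<String, Integer>();
        dfs.put("barack obama", 500000);
        dfs.put("iran", 20000);
        // "iran" now outscores "barack obama" despite the lower raw count.
        System.out.println(reweight(counts, dfs, 1000000));
    }
}
```

Grant's nightly-baseline idea amounts to replacing the static docFreq with a trailing count, which penalizes topics by their recent rather than all-time frequency.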
Question about index sizes.
Can anyone give me a rule of thumb for knowing when you need to go to multicore or shards? How many records can be in an index before it breaks down? Does it break down? Is it 10 million? 20 million? 50 million? Thanks, Jim
Re: Upgrading 1.2.0 to 1.3.0 solr
Actually, it was a very straightforward installation. I just tweaked the configurations afterward to better support for the new 1.3.0 features I wanted to use (spelling suggestions and faceting). Ryan T. Grange, IT Manager DollarDays International, Inc. rgra...@dollardays.com (480)922-8155 x106 Francis Yakin wrote: DO you have experience to upgrade from 1.2.0 to 1.3.0? In other words, do you have any suggestions or best if you have any docs or instructions for doing this. I appreciate if you can help me. Thanks Francis -Original Message- From: Ryan Grange [mailto:rgra...@dollardays.com] Sent: Thursday, June 11, 2009 8:39 AM To: solr-user@lucene.apache.org Subject: Re: Upgrading 1.2.0 to 1.3.0 solr I disagree with waiting that month. At this point, most of the kinks in the upgrade from 1.2 to 1.3 have been worked out. Waiting for 1.4 to come out risks you becoming a guinea pig for the upgrade procedure. Plus, if any show-stoppers come along delaying 1.4, you delay implementation of your auto-complete function. When 1.4 comes out, if it has any features you feel compel an upgrade, you can begin another round of testing and migration, but don't upgrade a production system just for the sake of being bleeding edge. Ryan T. Grange, IT Manager DollarDays International, Inc. rgra...@dollardays.com (480)922-8155 x106 Otis Gospodnetic wrote: Francis, If you can wait another month or so, you could skip 1.3.0, and jump to 1.4 which will be released soon. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Francis Yakin To: "solr-user@lucene.apache.org" Sent: Wednesday, June 10, 2009 1:17:25 AM Subject: Upgrading 1.2.0 to 1.3.0 solr I am in process to upgrade our solr 1.2.0 to solr 1.3.0 Our solr 1.2.0 now is working fine, we just want to upgrade it cause we have an application that requires some function from 1.3.0( we call it autocomplete). 
Currently our config files on 1.2.0 are as follows: Solrconfig.xml Schema.xml (we wrote this in house) Index_synonyms.txt (we also modified and wrote this in house) Scripts.conf Protwords.txt Stopwords.txt Synonyms.txt I understand 1.3.0 has a new solrconfig.xml. My questions are: 1) What config files can I reuse from 1.2.0 for 1.3.0? Can I use the same schema.xml? 2) Solrconfig.xml: can I use the 1.2.0 version or do I have to stick with 1.3.0? If I need to stick with 1.3.0, what do I need to change? As of right now I am testing it in my sandbox, and it doesn't work. Please advise; if you have any docs for upgrading 1.2.0 to 1.3.0 let me know. Thanks in advance Francis Note: I attached my solrconfig and schema.xml in this email -Inline Attachment Follows- {edited out by Ryan for brevity}
RE: Question about index sizes.
That's a great question. And the answer is, of course, it depends. Mostly on the size of the documents you are indexing. 50 million rows from a database table with a handful of columns is very different from 50 million web pages, pdf documents, books, etc. We currently have about 50 million documents split across 2 servers with reasonable performance - sub-second response time in most cases. The total size of the 2 indices is about 300G. I'd say most of the size is from stored fields, though we index just about everything. This is on 64-bit ubuntu boxes with 32G of memory. We haven't pushed this into production yet, but initial load-testing results look promising. Hope this helps! > -Original Message- > From: Jim Adams [mailto:jasolru...@gmail.com] > Sent: Tuesday, June 23, 2009 1:24 PM > To: solr-user@lucene.apache.org > Subject: Question about index sizes. > > Can anyone give me a rule of thumb for knowing when you need to go to > multicore or shards? How many records can be in an index before it > breaks > down? Does it break down? Is it 10 million? 20 million? 50 million? > > Thanks, Jim
Function query using Map
Hi, I'm trying to use the map function with a function query. I want to map a particular value to 1 and all other values to 0. We currently use the map function that has 4 parameters with no problem. However, for the map function with 5 parameters, I get a parse error. The following are the query and error returned: _query_ id:[* TO *] _val_:"map(ethnicity,3,3,1,0)" _error message_ *type* Status report *message* _org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* TO *] _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 20 in 'map(ethnicity,3,3,1,0)'_ *description* _The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* TO *] _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 20 in 'map(ethnicity,3,3,1,0)'). _ It appears that the parser never evaluates the map string for anything other than the 4 parameters version. Could anyone give me some insight into this? Thanks in advance.
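For reference, the five-argument form map(x,min,max,target,value) returns target when min <= x <= max and value otherwise, whereas the four-argument form passes x through unchanged outside the range — so the goal of "3 maps to 1, everything else to 0" genuinely needs the fifth argument. A parser that stops at the fourth argument, as in the error above, suggests a Solr release that predates the optional fifth argument (I believe it arrived after 1.3). The intended semantics, sketched in plain Java:

```java
public class MapFunction {
    // map(x, min, max, target): 4-arg form, x passes through outside [min, max].
    static double map(double x, double min, double max, double target) {
        return (x >= min && x <= max) ? target : x;
    }

    // map(x, min, max, target, value): 5-arg form, 'value' outside the range.
    static double map(double x, double min, double max, double target, double value) {
        return (x >= min && x <= max) ? target : value;
    }

    public static void main(String[] args) {
        System.out.println(map(3, 3, 3, 1, 0)); // 1.0: ethnicity == 3
        System.out.println(map(7, 3, 3, 1, 0)); // 0.0: everything else
        System.out.println(map(7, 3, 3, 1));    // 7.0: 4-arg form passes x through
    }
}
```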
Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle
Greetings, I've gotten a few replies on this, but I'd really like to know who else is coming. Just send me a quick note :) Cheers, Bradford On Mon, Jun 22, 2009 at 5:40 PM, Bradford Stephens wrote: > Hey all, just a friendly reminder that this is Wednesday! I hope to see > everyone there again. Please let me know if there's something interesting > you'd like to talk about -- I'll help however I can. You don't even need a > Powerpoint presentation -- there's many whiteboards. I'll try to have a > video cam, but no promises. > Feel free to call at 904-415-3009 if you need directions or any questions :) > ~~` > Greetings, > > On the heels of our smashing success last month, we're going to be > convening the Pacific Northwest (Oregon and Washington) > Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the > 24th. The meeting should start at 6:45, organized chats will end > around 8:00, and then there shall be discussion and socializing :) > > The meeting will be at the University of Washington in > Seattle again. It's in the Computer Science building (not electrical > engineering!), room 303, located > here: http://www.washington.edu/home/maps/southcentral.html?80,70,792,660 > > If you've ever wanted to learn more about distributed computing, or > just see how other people are innovating with Hadoop, you can't miss > this opportunity. Our focus is on learning and education, so every > presentation must end with a few questions for the group to research > and discuss. (But if you're an introvert, we won't mind). > > The format is two or three 15-minute "deep dive" talks, followed by > several 5 minute "lightning chats". We had a few interesting topics > last month: > > -Building a Social Media Analysis company on the Apache Cloud Stack > -Cancer detection in images using Hadoop > -Real-time OLAP on HBase -- is it possible? > -Video and Network Flow Analysis in Hadoop vs. 
Distributed RDBMS > -Custom Ranking in Lucene > > We already have one "deep dive" scheduled this month, on truly > scalable Lucene with Katta. If you've been looking for a way to handle > those large Lucene indices, this is a must-attend! > > Looking forward to seeing everyone there again. > > Cheers, > Bradford > > http://www.roadtofailure.com -- The Fringes of Distributed Computing, > Computer Science, and Social Media.
Re: building custom RequestHandlers
I am not sure we are talking about the same thing at all. I want to extend solr (java) so that I have another request handler in java and I can do, for example, /select?qt=myhandler&q=querystring Then in this myhandler class in java, or wherever, I will parse querystring so I build the final correct query to pass to the engine. So the question is how to extend the class, where to place the file, how to recompile, set it in solrconfig.xml etc... so that it's all glued together and I can make use of it. Bill Dueber wrote: > Is it possible to change the javascript output? I find some of the > information choices (e.g., that facet information is returned in a flat > list, with facet names in the even-numbered indexes and number-of-items > following them in the odd-numbered indexes) kind of annoying. > > On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh >> wrote: >> > > >> Like most things JavaScript, I found that I had to just dig through it and >> play with it. However, the Reuters demo site was very easy to customize to >> interact with my own Solr instance, and I went from there. >> >> >> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: >> >> Never used it.. I am just looking in docs how can I extend solr but no >> >>> luck so far :( >>> Hoping for some docs or real extend example. >>> >>> >>> >>> Eric Pugh wrote: >>> >>> Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. > Currently the work flow includes some manipulation on php side so I > correctly format the query string and pass to tomcat/solr. > I somehow want to build own request handler in java so I skip the whole > apache/php request that is just for formating. > This will saves me tons of requests to apache since I use solr directly > from javascript. > > Would like to ask if there is something ready that I can use and adjust.
> I am kinda new in Java but once I get the pointers > I think should be able to pull out. > Thanks, > JD > > > > - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal >> - >> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | >> http://www.opensourceconnections.com >> Free/Busy: http://tinyurl.com/eric-cal >> >> >> >> >> >> > > >
Re: Auto suggest.. how to do mixed case
hi shalin, can you please share code or tutorial documents for these (it'd be a great help)? 1. Prefix search on shingles 2. Exact (phrase) search on n-grams "The regular prefix search also works. The good thing with these is that you can filter and different stored value is also possible." ?? thanks! mani On Mon, Jun 22, 2009 at 4:41 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner wrote: > > > > > Hi Shalin, > > > > I think > >> that by naming it as /autoSuggest, a lot of users have been misled since > >> there are other techniques available. > >> > > > > what would you suggest? > > > > > There are many techniques. Personally, I've used > > 1. Prefix search on shingles > 2. Exact (phrase) search on n-grams > > The regular prefix search also works. The good thing with these is that you > can filter and different stored value is also possible. > > -- > Regards, > Shalin Shekhar Mangar. >
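Not Shalin, but here is roughly what the edge n-gram variant looks like in schema.xml (type name and gram sizes are arbitrary). At index time the filter expands "berlin" into "b", "be", "ber", ..., so an ordinary term query on whatever prefix the user has typed so far matches, and normal fq filtering and stored fields still apply. The shingle variant is the same idea with a ShingleFilterFactory in front, so multi-word phrases get suggested too:

```xml
<fieldType name="autosuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```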
Solr Logging in Weblogic
Has anyone been able to successfully configure logging from Solr in Weblogic? I am trying to increase the verbosity of the logs as I am seeing some strange behavior during indexing, but have not been able to make this work. Changes to the Server -> Configuration -> Logging section in the Weblogic admin console don't seem to have any effect. Is there even a way to configure the log level using Weblogic, or does it need to be done using logging.properties in the JVM? Ryan -- Ryan Heinen, Sr. Software Engineer Phone 604.408.8078 ext. 243 Email: ryan.hei...@elasticpath.com Elastic Path Software, Inc. 800 - 1045 Howe Street, Vancouver, BC V6Z 2A9 Fax: 604.408.8079 Web: www.elasticpath.com Blog: www.getelastic.com Community: http://grep.elasticpath.com
Re: building custom RequestHandlers
: Is it possible to change the javascript output? I find some of the : information choices (e.g., that facet information is returned in a flat : list, with facet names in the even-numbered indexes and number-of-items : following them in the odd-numbered indexes) kind of annoying. Did you look at the optional params for the JSON output format? (ie: json.nl)... http://wiki.apache.org/solr/SolJSON -Hoss
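As a side note for client authors: if switching `json.nl` is not an option, the default flat list is easy to pair up client-side. A small illustrative sketch (Python here, but the same one-liner idea works in JavaScript):

```python
def facet_list_to_dict(flat):
    """Pair up Solr's default flat facet list: names sit at the even
    indexes, counts at the following odd indexes. Solr itself can return
    a map directly when json.nl=map is set on the request."""
    return dict(zip(flat[0::2], flat[1::2]))

counts = facet_list_to_dict(["news", 1100, "obama", 1000, "iran", 800])
print(counts["iran"])  # 800
```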
Re: building custom RequestHandlers
: So question is howto extend the class where to place the file, howto : recomplie, set in solrconfig.xml etc... so that it's all glued together : and can make use of it. I would start here... http://wiki.apache.org/solr/SolrPlugins ...and then ask specific questions as you encounter them. -Hoss
Re: building custom RequestHandlers
Is it just me, or is this a thread hijack? It has nothing to do with what the thread is originally about. Cheers Bill Dueber wrote: > Is it possible to change the javascript output? I find some of the > information choices (e.g., that facet information is returned in a flat > list, with facet names in the even-numbered indexes and number-of-items > following them in the odd-numbered indexes) kind of annoying. > > On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh >> wrote: >> > > >> Like most things JavaScript, I found that I had to just dig through it and >> play with it. However, the Reuters demo site was very easy to customize to >> interact with my own Solr instance, and I went from there. >> >> >> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: >> >> Never used it.. I am just looking in docs how can I extend solr but no >> >>> luck so far :( >>> Hoping for some docs or real extend example. >>> >>> >>> >>> Eric Pugh wrote: >>> >>> Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. > Currently the work flow includes some manipulation on php side so I > correctly format the query string and pass to tomcat/solr. > I somehow want to build own request handler in java so I skip the whole > apache/php request that is just for formating. > This will saves me tons of requests to apache since I use solr directly > from javascript. > > Would like to ask if there is something ready that I can use and adjust. > I am kinda new in Java but once I get the pointers > I think should be able to pull out.
> Thanks, > JD > > > > - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal >> - >> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | >> http://www.opensourceconnections.com >> Free/Busy: http://tinyurl.com/eric-cal >> >> >> >> >> >> > > >
Re: No wildcards with solr.ASCIIFoldingFilterFactory?
Wildcard queries are not analyzed, so you are getting what you type - which doesn't match what went through an analyzer and into the index. I don't think Solr has a solution for this at the moment. I think Lucene has a special analyzer which deals with this to some degree, but I have never used it. -- - Mark http://www.lucidimagination.com vladimirneu wrote: Hi all, could somebody help me to understand why I cannot search with a wildcard if I use the solr.ASCIIFoldingFilterFactory? So I get results if I search for "münchen", "munchen" or "munchen*", but I get no results if I search for "münchen*". The original records contain the terms "München" and "Münchener". The solr.ASCIIFoldingFilterFactory is configured on both sides, index and query. We are using the 1.4-dev version from trunk. Thank you very much! Regards, Vladimir
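A rough illustration of why exactly the wildcard case fails (a sketch using Python's unicodedata as a stand-in for ASCIIFoldingFilter; the real filter folds many more characters than NFKD stripping does):

```python
import unicodedata

def ascii_fold(text):
    """Rough analogue of ASCIIFoldingFilter: strip diacritics."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

# Index time: the term passes through the analyzer, so 'München' ends up
# in the index folded (and lowercased) as 'munchen'.
indexed_term = ascii_fold("München").lower()

# Query time: a wildcard query like 'münchen*' skips analysis entirely,
# so the raw prefix 'münchen' is compared against the indexed term...
print(indexed_term.startswith("münchen"))            # False -> no hits

# ...whereas folding the prefix client-side before sending it would match.
print(indexed_term.startswith(ascii_fold("münchen")))  # True
```

One common workaround is therefore to apply the same folding to the wildcard prefix in the client before building the query string.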
Re: Facets with an IDF concept
: Regardless of the semantics, it doesn't sound like DF would give you what you : want. It could be entirely possible that in some short timespan the number of : docs on Iran could match up w/ the number on Obama (maybe not for that : particular example) in which case your "hot" item would no longer appear hot.

but if the numbers match up in that timespan then the "hot" item isn't as "hot" anymore.

Maybe I'm misunderstanding: but it sounds like Asif's question essentially boils down to getting facet constraints sorted after applying some normalizing fraction ... the simplest case being the inverse ratio (this is where I think Asif is comparing it to IDF) of the number of matches for that facet in some larger docset to the size of that docset -- typically that docset could be the entire index, but it could also be the same search over a large window of time.

So if I was doing a news search for all docs in the last 24 hours, I could multiply each of those facet counts by the ratio of the total number of articles from the past month to the corresponding facet counts from the past month, to see how much "hotter" they are in my smaller result set...

current result set facet counts (X)...
  News:1100 Obama:1000 Iran:800 Miley Cyrus:700 iPod:500
facet counts from the past month (Y), during which time 9000 (Z) documents were published...
  News:9000 Obama:7000 Iran:1000 Miley Cyrus:4000 iPod:5000
X*(Z/Y)...
  Iran:7200 Miley Cyrus:1575 Obama:1285.7 News:1100 iPod:900

Doing this in a Solr plugin would be the best way to do this -- because otherwise your "hot" terms might not even show up in the facet lists. Any attempt to do it on the client would just be an approximation, and could easily miss the "hottest" item if it was just below the cutoff for the number of constraints to be returned.

-Hoss
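Hoss's arithmetic above can be sketched in a few lines (illustrative Python, not Solr plugin code; the counts are the ones from his example):

```python
def hotness(current, baseline, baseline_total):
    """Scale each current facet count X by Z/Y: the ratio of the baseline
    docset size (Z) to that term's baseline count (Y). This is the
    'IDF-like' normalization being discussed."""
    return {term: count * (baseline_total / baseline[term])
            for term, count in current.items()}

current = {"News": 1100, "Obama": 1000, "Iran": 800, "Miley Cyrus": 700, "iPod": 500}
baseline = {"News": 9000, "Obama": 7000, "Iran": 1000, "Miley Cyrus": 4000, "iPod": 5000}

for term, score in sorted(hotness(current, baseline, 9000).items(),
                          key=lambda kv: -kv[1]):
    print(f"{term}: {score:.1f}")
# Iran sorts first (7200.0) even though Obama had the higher raw count.
```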
Re: Solr relevancy score - conversion
On Mon, 8 Jun 2009, Vijay_here wrote: : Would need an more proportionate score like rounded to 100% (95% relevant, : 80 % relevant and so on). Is there a way to make solr returns such scores of : such relevance. Any other approach to arrive at this scores also be : appreciated There is a reason Solr doesn't return scores like that -- they are meaningless; more info is available on the Lucene-Java wiki... http://wiki.apache.org/lucene-java/ScoresAsPercentages -Hoss
Re: qf boost Versus field boost for Dismax queries
On Tue, 9 Jun 2009, ashokc wrote: : When 'dismax' queries are used, where is the best place to apply boost : values/factors? While indexing by supplying the 'boost' attribute to the : field, or in solrconfig.xml by specifying the 'qf' parameter with the same : boosts? What are the advantages/disadvantages to each? What happens if both This is discussed in the Lucene-Java FAQ... http://wiki.apache.org/lucene-java/LuceneFAQ#head-246300129b9d3bf73f597facec54ac2ee54e15d7 What is the difference between field (or document) boosting and query boosting? Index time field boosts (field.setBoost(boost)) are a way to express things like "this document's title is worth twice as much as the title of most documents". Query time boosts (query.setBoost(boost)) are a way to express "I care about matches on this clause of my query twice as much as I do about matches on other clauses of my query". Index time field boosts are worthless if you set them on every document. Index time document boosts (doc.setBoost(float)) are equivalent to setting a field boost on every field in that document. -Hoss
Re: Search returning 0 results for "U2"
John, This is simply the case of mismatching index-time and query-time analyzers. When you use Luke you get the match, but Luke doesn't use the tokenizer+filters you specified in Solr for your field. In your Solr installation, go to Solr Admin page, then to Analysis page, enter U2 and select all other relevant checkboxes to see how U2 is getting analyzed by Solr. You should be able to spot the incompatibility then. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: John G. Moylan > To: solr-user@lucene.apache.org > Sent: Tuesday, June 23, 2009 6:13:32 AM > Subject: Search returning 0 results for "U2" > > We are moving from Lucene indexer and readers to a Hybrid solution where > we still use the Lucene Indexer but use Solr for querying the index. > > I am indexing our content using Lucene 2.3 and have a field called > "contents" which is tokenized and stored. When I search the contents > field for "U2" using Luke the correct document turns up. However, > searching for U2 in solr returns nothing. > > Any ideas? > > Lucene field is as follows: > > doc.add(new Field("contents", body.replaceAll(" ", ""), > Field.Store.YES, Field.Index.TOKENIZED)); > > Lucene is using the following analyzer: > > result = new ISOLatin1AccentFilter(result); > result = new StandardFilter(result); > result = new LowerCaseFilter(result); > result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//, > stopTable); > result = new PorterStemFilter(result); > > > In Solr I have the field mapped as a text field stored and indexed. 
> The text field uses:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Regards,
> John
Re: Facets with an IDF concept
Hi, Hm, I don't think facets (nor pure search/Solr) are the right tool for this job. I think you have to do what Ian said, which is to compute the baseline for various concepts of interest (Barack Obama and Iran in your example), and then compare. Look at point #2 on http://www.sematext.com/product-key-phrase-extractor.html . I think this is what you are after, and you will even see an example that matches yours very closely. My guess is that's how http://www.google.com/trends/hottrends works, too. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Asif Rahman > To: solr-user@lucene.apache.org > Sent: Tuesday, June 23, 2009 8:05:48 AM > Subject: Re: Facets with an IDF concept > > Hi Grant, > > I'll give a real life example of the problem that we are trying to solve. > > We index a large number of current news articles on a continuing basis. We > tag these articles with news topics (e.g. Barack Obama, Iran, etc.). We > then use these tags to facet our queries. For example, we might issue a > query for all articles in the last 24 hours. The facets would then tell us > which news topics have been written about the most in that period. The > problem is that "Barack Obama", for example, is always written about in high > frequency, as opposed to "Iran" which is currently very hot in the news, but > which has not always been the case. In this case, we'd like to see "Iran" > show up higher than "Barack Obama" in the facet results. > > To me, this seems identical to the tf-idf scoring expression that is used in > normal search. The facet count is analogous to the tf and I can access the > facet term idf's through the Similarity API. > > Is my reasoning sound? Can you provide any guidance as to the best way to > implement this? 
> > Thanks for your help, > > Asif > > > On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll wrote: > > > > > On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote: > > > > Hi again, > >> > >> I guess nobody has used facets in the way I described below before. Do > >> any > >> of the experts have any ideas as to how to do this efficiently and > >> correctly? Any thoughts would be greatly appreciated. > >> > >> Thanks, > >> > >> Asif > >> > >> On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman wrote: > >> > >> Hi all, > >>> > >>> We have an index of news articles that are tagged with news topics. > >>> Currently, we use solr facets to see which topics are popular for a given > >>> query or time period. I'd like to apply the concept of IDF to the facet > >>> counts so as to penalize the topics that occur broadly through our index. > >>> I've begun to write custom facet component that applies the IDF to the > >>> facet > >>> counts, but I also wanted to check if anyone has experience using facets > >>> in > >>> this way. > >>> > >> > > > > I'm not sure I'm following. Would you be faceting on one field, but using > > the DF from some other field? Faceting is already a count of all the > > documents that contain the term on a given field for that search. If I'm > > understanding, you would still do the typical faceting, but then rerank by > > the global DF values, right? > > > > Backing up, what is the problem you are seeing that you are trying to > > solve? > > > > I think you could do this, but you'd have to hook it in yourself. By > > penalize, do you mean remove, or just have them in the sort? Generally > > speaking, looking up the DF value can be expensive, especially if you do a > > lot of skipping around. I don't know how pluggable the sort capabilities > > are for faceting, but that might be the place to start if you are just > > looking at the sorting options. 
> > > > > > > > -- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > > Solr/Lucene: > > http://www.lucidimagination.com/search > > > > > > > -- > Asif Rahman > Lead Engineer - NewsCred > a...@newscred.com > http://platform.newscred.com
Query regarding Solr search options..
Hi, Can Solr search be customized to provide N lines before and after the line that matches the keyword? For eg: Suppose I have a document with 10 lines, and the 5th line contains the keyword 'X' I am interested in. Now if I fire a Solr search for the keyword 'X', is there any preference/option available in Solr which can be set so the search results contain only the 3 lines above and 3 lines below the line where the keyword matches successfully? Thanks, Silent Surfer
Re: Query regarding Solr search options..
Hello, Not quite "lines", but look at the various Highlighter options on the Wiki and in the example solrconfig.xml. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Silent Surfer > To: Solr User > Sent: Tuesday, June 23, 2009 11:04:53 PM > Subject: Query regarding Solr search options.. > > > Hi, > > Can Solr search be customized to provide N number of lines before and after > the > line that contains matches the keyword. > > For eg: Suppose i have a document with 10 lines, and 5th line contains the > key > word 'X' I am interested in. Now if I am fire a Solr search for the keyword > 'X'. > Is there any preference/option available in Solr, which can be set so the > search > results contains only the 3 lines above and 3 lines after the line where the > Keyword match successfully. > > Thanks, > Silent Surfer
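For reference, a typical highlighting request looks something like this (field and instance names are hypothetical; note that `hl.fragsize` is measured in characters, not lines, which is why this only approximates the "N lines of context" behavior asked about):

```
http://localhost:8983/solr/select?q=X&hl=true&hl.fl=contents&hl.snippets=2&hl.fragsize=200
```

Here `hl.fl` names the field(s) to highlight, `hl.snippets` caps the number of fragments per field, and `hl.fragsize` controls the approximate fragment length.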
Re: Facets with an IDF concept
On Jun 23, 2009, at 6:23 PM, Chris Hostetter wrote:

: Regardless of the semantics, it doesn't sound like DF would give you what you : want. It could be entirely possible that in some short timespan the number of : docs on Iran could match up w/ the number on Obama (maybe not for that : particular example) in which case your "hot" item would no longer appear hot.

but if the numbers match up in that timespan then the "hot" item isn't as "hot" anymore.

Not necessarily true. Consider the case where over the year there are 50 stories about Obama. Then, in the span of 5 days, there are 50 stories about Iran. Iran, in my view, is still hotter than Obama. In Asif's case, he was suggesting comparing against the global DF. Not to worry, though, your proposal is much the same as mine, namely take a baseline based on some set of docs (I chose *:*, you chose past month) and then compare.

Maybe I'm misunderstanding: but it sounds like Asif's question essentially boils down to getting facet constraints sorted after applying some normalizing fraction ... the simplest case being the inverse ratio (this is where I think Asif is comparing it to IDF) of the number of matches for that facet in some larger docset to the size of that docset -- typically that docset could be the entire index, but it could also be the same search over a large window of time. So if I was doing a news search for all docs in the last 24 hours, I could multiply each of those facet counts by the ratio of the total number of articles from the past month to the corresponding facet counts from the past month, to see how much "hotter" they are in my smaller result set...

current result set facet counts (X)...
  News:1100 Obama:1000 Iran:800 Miley Cyrus:700 iPod:500
facet counts from the past month (Y), during which time 9000 (Z) documents were published...
  News:9000 Obama:7000 Iran:1000 Miley Cyrus:4000 iPod:5000
X*(Z/Y)...
  Iran:7200 Miley Cyrus:1575 Obama:1285.7 News:1100 iPod:900

Doing this in a Solr plugin would be the best way to do this -- because otherwise your "hot" terms might not even show up in the facet lists. Any attempt to do it on the client would just be an approximation, and could easily miss the "hottest" item if it was just below the cutoff for the number of constraints to be returned.

-Hoss

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Function query using Map
The five-parameter form of map() was added in Solr 1.4. Which version of Solr are you using? On Wed, Jun 24, 2009 at 12:57 AM, David Baker wrote: > > Hi, > > I'm trying to use the map function with a function query. I want to map a > particular value to 1 and all other values to 0. We currently use the map > function that has 4 parameters with no problem. However, for the map > function with 5 parameters, I get a parse error. The following are the query > and error returned: > > _query_ > id:[* TO *] _val_:"map(ethnicity,3,3,1,0)" > > _error message_ > > *type* Status report > *message* _org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* > TO *] _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 20 in > 'map(ethnicity,3,3,1,0)'_ > *description* _The request sent by the client was syntactically incorrect > (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* TO *] > _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 20 in > 'map(ethnicity,3,3,1,0)'). > _ > > It appears that the parser never evaluates the map string for anything other > than the 4 parameters version. Could anyone give me some insight into this? > Thanks in advance. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
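For reference, map(x,min,max,target) returns x itself when x falls outside [min,max], while the five-argument form returns the supplied default instead. A little illustrative Python mimicking that contract (a sketch of the documented behavior, not Solr source code):

```python
def solr_map(x, min_val, max_val, target, default=None):
    """Mimic Solr's map() function query: if min_val <= x <= max_val,
    return target; otherwise return the default, or x itself when no
    default is given (the 4-argument form)."""
    if min_val <= x <= max_val:
        return target
    return x if default is None else default

print(solr_map(3, 3, 3, 1, 0))  # 1 : ethnicity == 3 maps to 1
print(solr_map(5, 3, 3, 1, 0))  # 0 : everything else maps to 0
print(solr_map(5, 3, 3, 1))     # 5 : 4-arg form passes x through
```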
Re: Initialize SOLR DataImportHandler
Yes, you will need to fire a command. On Tue, Jun 23, 2009 at 9:51 PM, ice wrote: > > > We use the DataImportHandler for indexes from a RDBMS. Is there any way to > make sure that the import is run when the SOLR webapp/core starts up? Do we > need to send a command to SOLR to make this happen? > -- > View this message in context: > http://www.nabble.com/Initialize-SOLR-DataImportHandler-tp24167359p24167359.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
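Something along these lines, assuming the handler is registered at /dataimport as in the example solrconfig.xml (host, port, and path will vary with your setup):

```
http://localhost:8983/solr/dataimport?command=full-import
```

This URL can be hit with curl (or wget) from a startup script once the webapp is up, which is one way to get the "import on startup" effect being asked about.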
Re: building custom RequestHandlers
This part of the doc explains what you should do to write a custom request handler: http://wiki.apache.org/solr/SolrPlugins#head-7c0d03515c496017f6c0116ebb096e34a872cb61 On Wed, Jun 24, 2009 at 3:35 AM, Julian Davchev wrote: > Is it just me or this is thread steal? nothing todo with what thread is > originally about. > Cheers > > Bill Dueber wrote: >> Is it possible to change the javascript output? I find some of the >> information choices (e.g., that facet information is returned in a flat >> list, with facet names in the even-numbered indexes and number-of-items >> following them in the odd-numbered indexes) kind of annoying. >> >> On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh > >>> wrote: >>> >> >> >>> Like most things JavaScript, I found that I had to just dig through it and >>> play with it. However, the Reuters demo site was very easy to customize to >>> interact with my own Solr instance, and I went from there. >>> >>> >>> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: >>> >>> Never used it.. I am just looking in docs how can I extend solr but no >>> luck so far :( Hoping for some docs or real extend example. Eric Pugh wrote: > Are you using the JavaScript interface to Solr? > http://wiki.apache.org/solr/SolrJS > > It may provide much of what you are looking for! > > Eric > > On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: > > I am using solr and php quite nicely. > >> Currently the work flow includes some manipulation on php side so I >> correctly format the query string and pass to tomcat/solr. >> I somehow want to build own request handler in java so I skip the whole >> apache/php request that is just for formating. >> This will saves me tons of requests to apache since I use solr directly >> from javascript. >> >> Would like to ask if there is something ready that I can use and adjust. >> I am kinda new in Java but once I get the pointers >> I think should be able to pull out.
>> Thanks, >> JD >> >> >> >> > - > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | > http://www.opensourceconnections.com > Free/Busy: http://tinyurl.com/eric-cal > > > > > > >>> - >>> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | >>> http://www.opensourceconnections.com >>> Free/Busy: http://tinyurl.com/eric-cal >>> >>> >>> >>> >>> >>> >> >> >> > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Changing the score of a document based on the value of a field
The SolrRelevancyFAQ has a heading that's the same as my message's subject: http://wiki.apache.org/solr/SolrRelevancyFAQ#head-f013f5f2811e3ed28b200f326dd686afa491be5e There's a TODO on the wiki to provide an actual example. Does anybody happen to have an example handy that I could model my query after? Thank you -- Martin
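In the absence of the FAQ's promised example, here is a hedged guess at the usual pattern (the field name "popularity" and the query terms are hypothetical): with the dismax handler, a boost function can be supplied via the bf parameter; with the standard handler, the same idea can be expressed inline with a _val_ clause.

```
qt=dismax&q=ipod&bf=log(popularity)
q=ipod _val_:"log(popularity)"
```

Both add a function of the field's value to each document's score, so higher popularity raises the ranking without filtering anything out.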
Solrj no search results
Hi, I'm using an EmbeddedSolrServer. Adding documents to the example jetty server using this method worked fine: doc1.addField( "id", "id1"); doc1.addField( "name", "doc1"); doc1.addField( "price", 10); server.add(doc1) However, now I have changed the schema.xml so I can use my own fields, and the documents are added to the index (no errors), but I do not get any search results back whatsoever. Any ideas? Thanks. -- View this message in context: http://www.nabble.com/Solrj-no-search-results-tp24179484p24179484.html Sent from the Solr - User mailing list archive at Nabble.com.