Re: Spellcheck in solr-nutch integration
Hello Anurag, I'm facing the same problem. Will you please elaborate on how you solved it? It would be great if you could give me a step-by-step description, as I'm new to Solr.
Re: Spellcheck in solr-nutch integration
First, go through the schema.xml file and look at the different components. -- Kumar Anurag
Re: Solr Indexing Performance
I indexed 1,000 PDF files with the same configuration; it completed in about 32 minutes.
Re: DataImportHandler: no queries when using entity=something
Sorry, add clean=false to the URL: http://solr:8983/solr/dataimport?command=full-import&entity=games&clean=false (this was sent by mistake; it was intended for somebody else).
Re: Performance optimization of Proximity/Wildcard searches
Correct me if I am wrong: a commit flushes the Solr caches, but of course the OS cache would still be useful? If an index is updated every hour, then a warm-up that takes less than 5 minutes should be more than enough, right?

On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Salman, Warming up may be useful if your caches are getting decent hit ratios. Plus, you are warming up the OS cache when you warm up. Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Fri, February 4, 2011 3:33:41 PM Subject: Re: Performance optimization of Proximity/Wildcard searches

I know, so we are not really using it for regular warm-ups (in any case the index is updated on an hourly basis). I just tried a few times to compare results. The issue is I am not even sure whether warming up is useful for such frequent updates.

On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Salman, I only skimmed your email, but wanted to say that this part sounds a little suspicious: "Our warm up script currently executes all distinct queries in our logs having count > 5. It was run yesterday (with all the indexing update every..." It sounds like this will make warmup take a long time, assuming you have more than a handful of distinct queries in your logs. Otis

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk Sent: Tue, January 25, 2011 6:32:48 AM Subject: Re: Performance optimization of Proximity/Wildcard searches

By "warmed index" do you only mean warming the Solr cache, or the OS cache too? As I said, our index is updated every hour, so I am not sure how much the Solr cache would help, but the OS cache should still be helpful, right? I haven't compared the results with a proper script, but from manual testing here are some observations.

'Recent' queries which are in the cache of course return immediately (only if they are exactly the same, even if they took 3-4 minutes the first time). I will need to test how many recent queries stay in the cache, but this would still only work for very common queries. Users can run different queries, and I want at least those to be at an 'acceptable' level (5-10 secs) even if not very fast.

Our warm-up script currently executes all distinct queries in our logs having count > 5. It was run yesterday (with all the indexing updates every hour after that), and today when I executed some of the same queries again their time seemed a little lower (around 15-20%); I am not sure if that means anything. However, their time is still not acceptable. What do you think is the best way to compare results? First run all the warm-up queries and then execute the same ones randomly and compare?

We are using a Windows server; would it make a big difference if we moved to Linux? Our load is not high, but some queries are really complex. Also, I was hoping to move to SSD last, after trying out all software options. Is it an agreed fact that on large indexes (which don't fit in RAM) proximity/wildcard/phrase queries (on common words) will be slow, and that this can only be improved by cache warm-up and better hardware? Otherwise, with an index of around 150GB, such queries will take more than a minute?

If that's the case, I know this question is very subjective, but if a single query takes 2 minutes on SAS 10K RPM, what would its approximate time be on a good SSD (everything else the same)? Thanks!

On Tue, Jan 25, 2011 at 3:44 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote: "Cache warming is a good option too but the index gets updated every hour so not sure how much that would help." What is the time difference between queries with a warmed index and a cold one? If the warmed index performs satisfactorily, then one answer is to upgrade your underlying storage. As always for IO-caused performance problems in Lucene/Solr-land, SSD is the answer.

-- Regards, Salman Akram
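For anyone setting this up, a minimal sketch of how warm-up queries are typically wired into solrconfig.xml via a newSearcher event listener; the queries shown are placeholders and should come from your own logs:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- placeholder warm-up queries; use real high-frequency queries from the logs -->
    <lst><str name="q">a common query from the logs</str><str name="rows">10</str></lst>
    <lst><str name="q">"a common phrase query"</str></lst>
  </arr>
</listener>

With hourly commits, keeping this list short matters: warming must finish well before the next commit opens another searcher.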
Re: Performance optimization of Proximity/Wildcard searches
Since all queries return the total count as well: on average a query matches 10% of the total documents. The index I am talking about has around 13 million documents, so that means around 1.3 million documents match on average. Of course they won't all be overlapping, so I am guessing that around 30-50% of the documents match the daily queries. I tried hard to find out whether you can tell Solr to stop searching after a certain count. I don't mean the number of rows, but something like MySQL's LIMIT, so that it doesn't have to spend time calculating the total count when it's only returning a few rows to the UI (we are OK with showing the count as "1000+" if it's more than 1000), but I couldn't find any way.

On Sat, Feb 5, 2011 at 7:45 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Heh, I'm not sure if this is valid thinking. :) By *matching* doc distribution I meant: what proportion of your millions of documents actually ever get matched, and then how many of those make it to the UI. If you have 1000 queries in a day and they all end up matching only 3 of your docs, the system will need less RAM than a system where 1000 queries match 5 different docs. Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Fri, February 4, 2011 3:38:55 PM Subject: Re: Performance optimization of Proximity/Wildcard searches

Well, I assume many people out there have indexes larger than 100GB, and I don't think you will normally have more than 32GB or 64GB of RAM! As I mentioned, the queries are mostly phrase, proximity, wildcard, and combinations of these. What exactly do you mean by distribution of documents? In this index our documents are no more than a few hundred KB on average (file system size) and there are around 14 million documents. 80% of the index size is taken up by the position file. I am not sure if this is what you asked?

On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi, "Sharding is an option too but that too comes with limitations so want to keep that as a last resort but I think there must be other things because 150GB is not too big for one drive/server with 32GB RAM." Hmm, what makes you think 32GB is enough for your 150GB index? It depends on queries and the distribution of matching documents, for example. What's yours like? Otis

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Tue, January 25, 2011 4:20:34 AM Subject: Performance optimization of Proximity/Wildcard searches

Hi, I am facing performance issues with three types of queries (and their combinations). Some of the queries take more than 2-3 minutes. The index size is around 150GB.

- Wildcard
- Proximity
- Phrases (with common words)

I know CommonGrams and stop words are a good way to resolve such issues, but they don't fulfill our functional requirements (CommonGrams seems to have issues with phrase proximity, stop words have issues with exact match, etc.). Sharding is an option too, but that also comes with limitations, so I want to keep it as a last resort; I think there must be other things, because 150GB is not too big for one drive/server with 32GB RAM. Cache warming is a good option too, but the index gets updated every hour, so I'm not sure how much that would help.

What are the other main tips that can help in performance optimization of the above queries? Thanks

-- Regards, Salman Akram
TermVector query using Solr Tutorial
Hello all, I am following this tutorial: http://lucene.apache.org/solr/tutorial.html and playing with term vectors. Here are my steps:

1. Launch the example server: java -jar start.jar

2. Index monitor.xml with java -jar post.jar monitor.xml, which contains the following:

<add><doc>
  <field name="id">3007WFP</field>
  <field name="name">Dell Widescreen UltraSharp 3007WFP</field>
  <field name="manu">Dell, Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">monitor</field>
  <field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field>
  <field name="includes">USB cable</field>
  <field name="weight">401.6</field>
  <field name="price">2199</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc></add>

3. Execute a query searching for 25 (as you can see, "25" occurs twice in the features field):

http://localhost/solr/select/?q=25&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv.all=true

4. The term vectors in the result do not make sense to me:

<lst name="termVectors">
  <lst name="doc-2">
    <str name="uniqueKey">3007WFP</str>
    <lst name="includes">
      <lst name="cabl">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">4</int>
          <int name="end">9</int>
        </lst>
        <lst name="positions">
          <int name="position">1</int>
        </lst>
        <int name="df">1</int>
        <double name="tf-idf">1.0</double>
      </lst>
      <lst name="usb">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">0</int>
          <int name="end">3</int>
        </lst>
        <lst name="positions">
          <int name="position">0</int>
        </lst>
        <int name="df">1</int>
        <double name="tf-idf">1.0</double>
      </lst>
    </lst>
  </lst>
  <str name="uniqueKeyFieldName">id</str>
</lst>

What I want to know is the relative position of the keywords within a field. Can anyone explain the above result to me? Thanks.
Re: Highlighting with/without Term Vectors
Yeah, I was going to reply to that thread, but then it just slipped my mind. :)

Actually we have two indexes. One is used for searching and the other for highlighting. Their structure is different too: the first one has all the metadata + document contents indexed (just for searching). This has around 13 million rows. In the second one we mainly have the document PAGE contents indexed/stored with term vectors. This has around 130 million rows (since each row is a page). What we do is search on the 1st index (around 150GB), get document IDs based on the page size (20/50/100), and then just search on those document IDs in the 2nd index (but on pages, as we need to show results based on page numbers), with the text for highlighting as well. The 2nd index is around 700GB (which has that 450GB TVF file I was talking about), but since it's only consulted for a small number of documents, mostly that is not an issue (some queries are slow there too, but its size is the main issue). On average, more than 90% of the query time is taken by the 1st index (searching plus the total count).

The confusion I had was about the 1st index, which didn't have term vectors on any of the fields in the Solr schema file but still had a TVF file. The reason turned out to be Lucene indexing: some of the initial documents were indexed through Lucene, and there one of the fields did have term vectors! Sorry for that...

Keeping in mind the above description, any other ideas you would like to suggest? Thanks!!

On Sat, Feb 5, 2011 at 7:40 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi Salman, Ah, so in the end you *did* have TV enabled on one of your fields! :) (I think this was a problem we were trying to solve a few weeks ago here.) How many docs you have in the index doesn't matter here - only the N docs/fields that you need to display on a page with N results need to be reanalyzed for highlighting purposes - so follow Grant's advice: make a small index without TV, and compare highlighting speed with and without TV. Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Fri, February 4, 2011 8:03:06 AM Subject: Re: Highlighting with/without Term Vectors

Basically term vectors are only on one main field, i.e. Contents. The average size of each document would be a few KB, but there are around 130 million documents, so what do you suggest now?

On Fri, Feb 4, 2011 at 5:24 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Salman, It also depends on the size of your documents. Re-analyzing 20 fields of 500 bytes each will be a lot faster than re-analyzing 20 fields with 50 KB each. Otis

- Original Message From: Grant Ingersoll gsing...@apache.org To: solr-user@lucene.apache.org Sent: Wed, January 26, 2011 10:44:09 AM Subject: Re: Highlighting with/without Term Vectors

On Jan 24, 2011, at 2:42 PM, Salman Akram wrote:

Hi, Does anyone have any benchmarks for how much highlighting speeds up with term vectors (compared to without them)? E.g. if highlighting 20 documents takes 1 sec with term vectors, any idea how long it will take without them?

I need to know because the index used for highlighting has a TVF file of around 450GB (approx 65% of the total index size), so I am trying to see whether decreasing the index size by dropping the TVF would help performance more (less RAM, should be good for I/O too, I guess), or whether keeping it is still better. I know the best way is to try it out, but indexing takes a very long time, so I'm trying to see whether it's even worth it.

Try testing on a smaller set. In general, you are saving the process of re-analyzing the content, so to some extent it is going to depend on how fast your analyzer chain is. At the size you are at, I don't know if storing TVs is worth it.

-- Regards, Salman Akram
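For reference, a sketch of the schema.xml attributes that control this (the field name and type here are hypothetical); term vectors exist only for fields declared like the following, which is why reindexing without these attributes removes the TVF data:

<field name="contents" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>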
jndi datasource in dataimport
Hi list, it looks like you can use a JNDI datasource in the DataImportHandler; however, I can't find any syntax for this. Where is the best place to look? (And can anyone confirm that JNDI does work in the DataImportHandler?)
Re: jndi datasource in dataimport
Ah, should this work, or am I doing something obviously wrong in the config?

<dataSource jndiName="java:sourcepathName" type="JdbcDataSource" user="xxx" password="xxx"/>

and in the dataimport config:

<dataSource type="JdbcDataSource" name="java:sourcepathName"/>

What am I doing wrong?
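For comparison, a sketch of the pattern usually shown for JNDI with the DataImportHandler; the JNDI path and entity details below are placeholders, and the container (e.g. Tomcat) must define the resource under that name:

<dataConfig>
  <!-- jndiName points at a container-managed connection pool; no user/password here -->
  <dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/solrDS"/>
  <document>
    <entity name="games" query="select id, name from games">
      <field column="id" name="id"/>
    </entity>
  </document>
</dataConfig>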
How to use q.op
Hi friends, please tell me how to use q.op with the dismax and standard request handlers. I found that q.op=AND was not working for dismax.
AND operator and dismax request handler
Hi friends, please suggest how I can set the query operator to AND for the dismax request handler. My problem is that I am searching for the string "water treatment plant" using the dismax request handler. The query formed is of this type:

http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true

My dismax request handler in solrconfig.xml is:

<requestHandler name="dismax" class="solr.DisMaxRequestHandler" default="true">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.2</float>
    <str name="qf">
      TDR_SUBIND_SUBTDR_SHORT^3 TDR_SUBIND_SUBTDR_DETAILS^2
      TDR_SUBIND_COMP_NAME^1.5 TDR_SUBIND_LOC_STATE^3
      TDR_SUBIND_PROD_NAMES^2.5 TDR_SUBIND_LOC_CITY^3
      TDR_SUBIND_LOC_ZIP^2.5 TDR_SUBIND_NAME^1.5 TDR_SUBIND_TENDER_NO^1
    </str>
    <str name="pf">
      TDR_SUBIND_SUBTDR_SHORT^15 TDR_SUBIND_SUBTDR_DETAILS^10 TDR_SUBIND_COMP_NAME^20
    </str>
    <str name="qs">1</str>
    <int name="ps">0</int>
    <str name="mm">20%</str>
  </lst>
</requestHandler>

The final parsed query is:

+((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 | TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water | TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 | TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 | TDR_SUBIND_NAME:water^1.5)~0.2
(TDR_SUBIND_PROD_NAMES:treatment^2.5 | TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 | TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 | TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0 | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
(TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 | TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant | TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 | TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 | TDR_SUBIND_NAME:plant^1.5)~0.2)
(TDR_SUBIND_SUBTDR_DETAILS:"water treatment plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 | TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2

Now it returns results if any of the words from "water treatment plant" is found. I think the OR operator is at work here, which finally combines the results. I want only those results in which the complete text "water treatment plant" matches.

1. I do not want to make any change to the dismax handler in solrconfig.xml. If necessary, suggest another handler to deal with this.

2. Is there really an OR operator at work in the query? Basically, when I query like this:

q=%2Bwater+%2Btreatment+%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score desc,TDR_SUBIND_SUBTDR_OPEN_DATE asc&omitHeader=true&debugQuery=true&qt=dismax

or

q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score desc,TDR_SUBIND_SUBTDR_OPEN_DATE asc&omitHeader=true&debugQuery=true&qt=dismax

then I get different results. Can you explain the difference between the two queries above? Please advise on full-text search for "water treatment plant". Thanks for your response.
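As a sketch rather than a drop-in fix: dismax in Solr 1.4 ignores q.op, but the mm (minimum-should-match) parameter can be raised per request to make every term mandatory, overriding the handler's mm=20% default:

http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&qt=dismax&mm=100%25&start=0&rows=5

(mm=100%, URL-encoded as 100%25, requires all terms; an exact-phrase-only match would instead need the phrase quoted in q.)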
Re: Performance optimization of Proximity/Wildcard searches
Yes, the OS cache mostly remains (obviously, index files that are no longer around will remain in the OS cache for a while, but they will be useless and gradually replaced by new index files). What matters is not how long the warm-up takes, but which queries you use to warm up the index and how much you auto-warm the caches. Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Is there anything like MultiSearcher?
Dear Solr experts, Could you recommend some strategies, or perhaps tell me if I am approaching my problem from the wrong side? I was hoping to use MultiSearcher to search across multiple indexes in Solr, but there is no such thing, and MultiSearcher was removed according to this post: http://osdir.com/ml/solr-user.lucene.apache.org/2011-01/msg00250.html I thought I had two use cases: 1. maintenance - I wanted to build two separate indexes, one for fulltext and one for metadata (the docs have unique ids); indexing them separately would make things much simpler. 2. the ability to switch indexes at search time (i.e. for testing purposes - one fulltext index could be built by Solr's standard mechanism, the other by a rather different process - an independent instance of Lucene). I think the recommended approach is to use distributed search - I found a nice solution here: http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set - however, it seems to me that data are sent over HTTP (5M from one core and 5M from the other core being merged by a 3rd Solr core?), and I would like to do it only for local indexes and without the network overhead. Could you please shed some light on whether an optimal solution to my use cases already exists? And if not, whether I could just try to build a new SolrQuerySearcher that extends Lucene's MultiSearcher instead of IndexSearcher - or do you think there are some deeply rooted problems there and the MultiSearcher cannot work inside Solr? Thank you, Roman
Re: Index Not Matching
One other thing. After blowing away your index and doing a complete reindex, look at the Solr stats page for numDocs and maxDocs. If these numbers are not identical, you're somehow deleting records when reindexing, possibly because the uniqueKey in your schema is the same for some documents. Of course this is nonsense if your uniqueKey is also your database table's primary key, but I thought I'd mention it.

On Fri, Feb 4, 2011 at 8:54 AM, Stefan Matheis matheis.ste...@googlemail.com wrote:

Try http://localhost:8080/solr/select?q=*:* or, when using Solr's default port, http://localhost:8983/solr/select?q=*:*

On Fri, Feb 4, 2011 at 2:50 PM, Esclusa, Will william.escl...@bonton.com wrote:

Hello Grijesh, the URL below returns a 404 with the following error: The requested resource (/select/) is not available.

-Original Message- From: Grijesh [mailto:pintu.grij...@gmail.com] Sent: Friday, February 04, 2011 12:17 AM To: solr-user@lucene.apache.org Subject: RE: Index Not Matching

http://localhost:8080/select/?q=*:* will return all records from Solr - Thanx: Grijesh http://lucidimagination.com
Re: geodist and spatial search
Use the {!geofilt} param like Grant suggested. IMO it works the best, especially on larger datasets. Adam

Sent from my iPhone

On Feb 4, 2011, at 10:56 PM, Bill Bell billnb...@gmail.com wrote:

Why not just:

q=*:*
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc

http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.077721&d=40&fq={!bbox}&sort=geodist%28%29%20asc

That will sort, and filter up to 40km. No need for the fq={!func}geodist() sfield=store pt=49.45031,11.077721. Bill

On 2/4/11 4:30 AM, Eric Grobler impalah...@googlemail.com wrote:

Hi Grant, thanks for the tip. This seems to work:

q=*:*
fq={!func}geodist()
sfield=store
pt=49.45031,11.077721
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc

On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll gsing...@apache.org wrote:

Use a filter query? See the {!geofilt} stuff on the wiki page. That gives you your filter to restrict down your result set; then you can sort by exact distance to get your sort of just those docs that make it through the filter.

On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote:

Hi Erick, Thanks, I saw that example, but I am trying to sort by distance AND specify the max distance in one query. The reason is: running bbox on 2 million documents with a 20km distance takes only 200ms, but sorting 2 million documents by distance takes over 1.5 seconds! So it will be much faster for Solr to first filter the 20km documents and then sort them. Regards, Ericz

On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson erickerick...@gmail.com wrote:

Further down that very page <G>... Here's an example of sorting by distance ascending:

...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc
http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc

The key is just the sort=geodist(); I'm pretty sure that's independent of the bbox, but I could be wrong. Best, Erick

On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote:

Hi, In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks, Ericz

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
keepword file with phrases
Hi List, I'm trying to achieve the following. Text: "this aisle contains preserves and savoury spreads". The desired index entry for a field to be used for faceting (i.e. a strict set of normalised terms) is "jams", "savoury spreads", i.e. two facet terms. The current setup for the field is:

<fieldType name="facet" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.KeepWordFilterFactory" words="goodForKeepWords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.KeepWordFilterFactory" words="goodForKeepWords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

The thinking here is:

- get rid of any markup nonsense
- split into tokens based on whitespace => this, aisle, contains, preserves, and, savoury, spreads
- produce shingles of 1 or 2 tokens => this, this aisle, aisle, aisle contains, contains, contains preserves, preserves, and, and savoury, savoury, savoury spreads, spreads
- expand synonyms using a synonym file (preserves -> jam) => this, this aisle, aisle, aisle contains, contains, contains preserves, preserves, jam, and, and savoury, savoury, savoury spreads, spreads
- produce a normalised term list using a keepword file with "jam" and "savoury spreads" in it, which should place "jam" and "savoury spreads" into the index field "facet"

However, I don't get "savoury spreads" in the index. From analysis.jsp everything goes to plan up to the last step, where the keepword file does not like keeping the phrase "savoury spreads". I've tried naively quoting the phrase in the keepword file :-) What is the best way to achieve the above? Is this the correct approach, or is there a better way? Thanks in advance, lee
Re: keepword file with phrases
Just to add: things are not going as expected even before the keepword filter; the synonym list is not being expanded for the shingles. I think I don't understand term positions.
Re: UIMA Error
Hi Darx, are you running it without an internet connection? The problem seems to be that the OpenCalais service host cannot be resolved. Remember that you can select which UIMA annotators run inside the OverridingParamsAggregateAEDescriptor.xml. Hope this helps. Tommaso

2011/2/5, Darx Oman darxo...@gmail.com:

Hi guys, I'm trying to use the UIMA contrib, but I got the following error:

INFO: [] webapp=/solr path=/select params={clean=false&commit=true&command=status&qt=/dataimport} status=0 QTime=0
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.UIMAUpdateRequestProcessor processText
INFO: Analazying text
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.ae.OverridingParamsAEProvider getAE
INFO: setting cat_apikey : 0449a72fe7ec5cb3497f14e77f338c86f2fe
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.ae.OverridingParamsAEProvider getAE
INFO: setting keyword_apikey : 0449a72fe7ec5cb3497f14e77f338c86f2fe
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.ae.OverridingParamsAEProvider getAE
INFO: setting concept_apikey : 0449a72fe7ec5cb3497f14e77f338c86f2fe
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.ae.OverridingParamsAEProvider getAE
INFO: setting entities_apikey : 0449a72fe7ec5cb3497f14e77f338c86f2fe
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.ae.OverridingParamsAEProvider getAE
INFO: setting lang_apikey : 0449a72fe7ec5cb3497f14e77f338c86f2fe
05/02/2011 10:54:53 ص org.apache.solr.uima.processor.ae.OverridingParamsAEProvider getAE
INFO: setting oc_licenseID : g6h9zamsdtwhb93nc247ecrs
05/02/2011 10:54:53 ص WhitespaceTokenizer initialize
INFO: Whitespace tokenizer successfully initialized
05/02/2011 10:54:56 ص org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={clean=false&commit=true&command=status&qt=/dataimport} status=0 QTime=0
05/02/2011 10:54:57 ص WhitespaceTokenizer typeSystemInit
INFO: Whitespace tokenizer typesystem initialized
05/02/2011 10:54:57 ص WhitespaceTokenizer process
INFO: Whitespace tokenizer starts processing
05/02/2011 10:54:57 ص WhitespaceTokenizer process
INFO: Whitespace tokenizer finished processing
05/02/2011 10:54:57 ص org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
    at org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCalaisAnnotator.java:206)
    at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:122)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:69)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:291)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:626)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:266)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Caused by: java.net.UnknownHostException: api.opencalais.com
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:177)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at
Re: keepword file with phrases
You need to switch the order: do synonyms and expansion first, then shingles. Have you tried using analysis.jsp?
Re: geodist and spatial search
Sure. I just didn't understand why you would use fq={!func}geodist() sfield=store pt=49.45031,11.077721; you would normally use {!geofilt}.
Re: Is there anything like MultiSearcher?
Why not just use sharding across the 2 cores?
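For reference, a sketch of what a distributed query across two local cores looks like; the core names, ports, and query below are placeholders:

http://localhost:8983/solr/core0/select?q=some+query&shards=localhost:8983/solr/core0,localhost:8983/solr/core1

The shards parameter lists every core to query; the core receiving the request merges the results, subject to the limitations on the DistributedSearch wiki page.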
Re: geodist and spatial search
On Sat, Feb 5, 2011 at 10:59 AM, Estrada Groups estrada.adam.gro...@gmail.com wrote: Use the {!geofilt} param like Grant suggested. IMO, it works the best especially on larger datasets. Right, use geofilt if you need to restrict to a radius, or bbox if a bounding box is sufficient (which is often the case if you are going to sort by distance anyway). -Yonik http://lucidimagination.com
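A sketch of the combined filter-then-sort request being discussed, reusing the example host, field, and coordinates from earlier in this thread; {!geofilt} restricts results to a 40km radius and geodist() sorts the survivors by distance:

http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfield=store&pt=49.45031,11.077721&d=40&fl=store&sort=geodist()%20asc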
Re: prices
Jonathan - right in one! Using floats for prices will lead to madness. My mortgage UI kept changing the loan's interest rate.

On Fri, Feb 4, 2011 at 12:13 PM, Dennis Gearon gear...@sbcglobal.net wrote:

That's a good idea, Yonik. So fields that aren't stored don't get displayed, and the float field in the schema never gets seen by the user. Good, I like it. Dennis Gearon

- Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-user@lucene.apache.org Sent: Fri, February 4, 2011 10:49:42 AM Subject: Re: prices

On Fri, Feb 4, 2011 at 12:56 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Using Solr 1.4. I have a price in my schema; currently it's a tfloat. Somewhere along the way from PHP, JSON, Solr, and back, extra zeroes are getting truncated, along with the decimal point for even dollar amounts. So I have two questions, neither of which seemed to be findable with Google. A) Any way to keep both zeroes going into a float field? (In the analyzer, with XML output, the values are shown with 1 zero.) B) Can strings be used in range queries like a float and work well for prices?

You could do a copyField into a stored string field and use the tfloat (or tint and store cents) for range queries, searching, etc., and the string field just for display. -Yonik http://lucidimagination.com

-- Lance Norskog goks...@gmail.com
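A minimal schema.xml sketch of Yonik's suggestion, with hypothetical field names; the indexed numeric field drives range queries and searching, while copyField preserves the original input string (e.g. "12.00") for display:

<field name="price" type="tfloat" indexed="true" stored="false"/>
<field name="price_display" type="string" indexed="false" stored="true"/>
<copyField source="price" dest="price_display"/>

Since copyField copies the raw input value before analysis, the trailing zeroes survive in price_display.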
Re: keepword file with phrases
: You need to switch the order. Do synonyms and expansion first, then
: shingles..

Except then he would be building shingles out of all the permutations of words in his synonyms, including the multi-word synonyms. I don't *think* that's what he wants based on his example (but I may be wrong).

: Have you tried using analysis.jsp ?

He already mentioned he has, in his original mail, and that's how he can tell it's not working.

lee: based on your followup post about seeing problems in the synonyms output, I suspect the problem you are having is with how the SynonymFilter parses the synonyms file. By default it assumes it should split on certain characters to create multi-word synonyms, but in your case the tokens you are feeding the synonym filter (the output of your shingle filter) really do have whitespace in them.

There is a tokenizerFactory option that Koji added a while back to the SynonymFilterFactory that lets you specify the classname of a TokenizerFactory to use when parsing the synonyms file. That may be what you need to get your synonyms with spaces in them (so they work properly with your shingles).

(assuming of course that I really understand your problem)

-Hoss
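A sketch of the option Hoss describes, reusing the file names from Lee's config; KeywordTokenizerFactory is one assumed choice here, picked because it keeps each side of a synonym rule as a single whitespace-containing token, matching the shingled tokens:

<filter class="solr.SynonymFilterFactory" synonyms="goodForSynonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>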
Re: keepword file with phrases
OK, that makes sense. If you double-quote the entries in the synonyms file, will that help with whitespace? Bill
Re: How to use q.op
: Dismax uses a strategy called Min-Should-Match which emulates the binary
: operator in the Standard Handler. In a nutshell, this parameter (called mm)
: specifies how many of the entered terms need to be present in your matched
: documents. You can either specify an absolute number or a percentage.
:
: More information can be found here:
: http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29

In future versions of Solr, dismax will use the q.op param to provide a default for mm, but in Solr 1.4 and prior you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND.

-Hoss
Re: How to use q.op
That sentence would be great to add to the Wiki. I changed the Wiki to add that.
Re: Is there anything like MultiSearcher?
Unless I am wrong, sharding across two cores is done over HTTP and has the limitations listed at http://wiki.apache.org/solr/DistributedSearch. MultiSearcher, on the other hand, is just a decorator over IndexSearcher, so those limitations would (?) not apply, and if the indexes reside locally it would also be faster.

Cheers, Roman

On Sat, Feb 5, 2011 at 10:02 PM, Bill Bell billnb...@gmail.com wrote:

Why not just use sharding across the 2 cores?
Re: UIMA Error
Hi Tommaso, yes, my server isn't connected to the internet. What other UIMA annotators can I run that don't require an internet connection?
Re: UIMA Error
Hi Darx, the other one in the base configuration is the AlchemyAPIAnnotator. Cheers, Tommaso
Optimize searches; business is progressing with my Solr site
Thanks to LOTS of information from you guys, my site is up and working. It's only an API now; I need to work on my OWN front end, LOL! I have my second customer. I'm finding my general-purpose repository API very useful. I will soon be in the business of optimizing the search engine part.

For example: I have a copy field that contains the words 'boogie woogie ballroom' on lots of records. I cannot find those records using 'boogie/boogi/boog', or the woogie versions of those, but I can with 'ballroom'. For my VERY first lesson in search optimization: what might be causing that, and where are the places to read about this on the Solr site?

All the best on a Sunday, guys and gals.

Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.