Re: Solr and UIMA
You can test our UIMA-to-Solr CAS consumer. It is based on JulieLab's Lucas and uses their CAS, but is transformed to generate XML which can be saved to a file or posted directly to Solr. In the map file you can define which information is generated for each token, and how it is concatenated, allowing the generation of things like the|AD car|NC, which can then be processed using payloads. You can now get it from my page: http://www.barcelonamedia.org/personal/joan.codina/en -- View this message in context: http://old.nabble.com/Solr-and-UIMA-tp24567504p27753399.html Sent from the Solr - User mailing list archive at Nabble.com.
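The token|annotation output described above is the format that Solr's DelimitedPayloadTokenFilterFactory consumes. A minimal field-type sketch, assuming a Solr version that ships that filter (the fieldType name here is hypothetical):

```xml
<!-- Hypothetical field type: stores the annotation after "|" as a payload -->
<fieldType name="text_payload" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="identity"/>
  </analyzer>
</fieldType>
```

Indexing "the|AD car|NC" through such a field yields the tokens "the" and "car", with "AD" and "NC" attached as payloads for later query-time use.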
Solr Version
Hi, This is probably a really dumb question, but how can I find out which version of Solr is currently running on my (Windows) system? I can't seem to find anything in either the Solr Admin interface or the Tomcat Manager. Thanks, Marc
Re: Solr Version
Go to the Solr admin and then click on Info; right in the first line you see the Solr version. - Original Message - From: Marc Wilson [mailto:wo...@fancydressoutfitters.co.uk] Sent: Tuesday, 2 March 2010 09:55 To: Solr Subject: Solr Version
Re: Query from User Session to Documents with Must-Have Permissions
A little question: what's the difference between a must-have permission and a protected document? At the moment we are developing a new search for our intranet using Solr. We also have some protected documents and implemented this kind of filter like you. I would just consider using a true filter query (fq=xxx) instead of adding conditions to the query: filters are cached, which improves performance, and moreover they do not affect the scoring of matched documents! Markus - Original Message - From: _jochen [mailto:jgai...@kbs.kaba.com] Sent: Monday, 1 March 2010 14:09 To: solr-user@lucene.apache.org Subject: Query from User Session to Documents with Must-Have Permissions Hi all, I am trying to create a query out of a web-based content management system. In the CMS there are some protected documents. While feeding the documents to Solr I have the information: a document is either not protected, or someone with userGroup:group1 has access. So the query can look like: collection:collection1 AND textbody:text AND (unprotected:true OR userGroups:group1 OR userGroups:group2 OR ... all other userGroups from the user session). What does the query look like if a document contains must-have permissions? I have this information while feeding, so I have the possibility to feed mustHaveGroups. I need to get all results matching both the user session and mustHaveUserGroup. Thanks for posting ideas!
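A hypothetical sketch of the same request with the permission clause moved into a cached filter query, reusing the field names from this thread:

```text
q=textbody:text
fq=collection:collection1
fq=unprotected:true OR userGroups:group1 OR userGroups:group2
```

Each fq result set is cached independently in the filterCache and is excluded from relevance scoring, which is the point Markus makes above.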
Re: Cyrillic problem
Thank you very much! But I have a problem with the URL. :) If I send the request using the GET method, I get: http://localhost/russian/result.php?search=%EF%F0%E8%E2%B3%F2 I use the (PHP) urldecode function. If I print the result, I get привіт! But if I send the request to Solr, my q param = пїЅпїЅпїЅпїЅпїЅ!
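The bytes %EF%F0%E8%E2%B3%F2 decode as привіт under windows-1251, while Solr expects the q parameter percent-encoded as UTF-8; that mismatch is exactly what produces пїЅ garbage. A sketch of re-encoding before calling Solr, assuming the browser really is submitting windows-1251:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Cp1251ToUtf8 {
    // Decode single-byte windows-1251 input and re-encode it as a UTF-8 query parameter
    static String recode(byte[] raw) {
        try {
            String text = new String(raw, "windows-1251");
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The raw bytes behind %EF%F0%E8%E2%B3%F2
        byte[] raw = {(byte) 0xEF, (byte) 0xF0, (byte) 0xE8,
                      (byte) 0xE2, (byte) 0xB3, (byte) 0xF2};
        System.out.println(recode(raw));
    }
}
```

In PHP, iconv('windows-1251', 'UTF-8', $s) before urlencode() does the equivalent conversion.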
Re: solr for reporting purposes
It doesn't sound like you need to add the complexity of breaking it up into 500-record chunks. With plenty of memory and a quad-core+ system you should be fine with the kind of load you are talking about. After all, you should load test it first before you try any optimization tricks like this, right? - Original Message - From: adeelmahmood adeelmahm...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, 1 March, 2010 2:05:44 PM Subject: Re: solr for reporting purposes Well, thanks for your reply. As far as the load goes, I think most of the reports will be for 1000-4000 records and we don't have that many users. It's an internal system, so we have about 400 users per day, and we are opening this up for only half of those people (a specific role), so close to 200 people could potentially use it. Practically speaking, I think we can have up to 50 requests at a given time, but since these are reports they aren't going to be needed every day: once you get a report you have it for a while. So overall I don't think it's that much user load. What do you think? Also, I was thinking about handling requests in a 500-record-limit fashion, so a request for 2000 records would be handled as 5 separate requests (refreshed on a 5-second timeout). Do you think it's a good idea to ask Solr to return 500 rows at a time across repeated requests, or is it better to just ask for 2000 rows altogether? Ron Chan wrote: we've done it successfully for similar requirements. The resource requirements depend on how many concurrent people will be running those types of reports. Up to 4000 records is not a problem at all, one report at a time, but if you had concurrent requests running into the thousands as well then you may have a problem, although you will probably run into memory problems at the rendering end before you have problems with Solr, i.e.
not a Solr problem as such, but a general problem of unrestricted ad-hoc reporting. - Original Message - From: adeelmahmood adeelmahm...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday, 27 February, 2010 5:57:00 AM Subject: Re: solr for reporting purposes I just want to clarify, if it's not obvious, that the reason I am concerned about Solr's performance is that for reporting requests I will probably have to request all result rows at the same time, instead of 10 or 20. adeelmahmood wrote: We are trying to use Solr for somewhat of a reporting system too (along with search), since it provides such amazing control over queries and basically over the data that the user wants; they might as well be able to dump that data into an Excel file too if needed. Our data isn't too much: close to 25K docs with 15-20 fields in each doc, and mostly these reports will be for close to 500-4000 records. I am thinking about setting up a simple servlet that submits the user query to Solr over HTTP, grabs all the results data, and dumps it into an Excel file. I was just hoping to get some idea of whether this is going to cause any performance impact on Solr search, especially since it's all on the same server and some users will be doing reports while others will be searching. Right now search is working GREAT, it's blazing fast, and I don't want to lose this, but at the same time reporting is an important requirement as well. Also, I would appreciate any hints towards creative ways of doing it, something like getting 500 records in a single request and then using some timer task to repeat the process. Thanks for your help.
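If the chunked approach is used anyway, Solr's standard start/rows parameters do the paging; the client only needs to advance the offset. A small sketch of generating the per-chunk parameters for the scenario discussed above (a 2000-row report fetched 500 rows at a time):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedPaging {
    // Build the start/rows parameter pairs needed to page through `total` rows
    static List<String> pageParams(int total, int pageSize) {
        List<String> pages = new ArrayList<String>();
        for (int start = 0; start < total; start += pageSize) {
            pages.add("start=" + start + "&rows=" + pageSize);
        }
        return pages;
    }

    public static void main(String[] args) {
        // 2000 rows at 500 per request
        for (String p : pageParams(2000, 500)) {
            System.out.println(p);
        }
    }
}
```

Each chunk is appended to the query as q=...&start=N&rows=500. Note that deep offsets get progressively more expensive for Solr to serve, which is one more argument for Ron's advice to load test before optimizing.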
Re: Query from User Session to Documents with Must-Have Permissions
We have two different options in our ACL: someone has access using group1 OR group2, ..., or someone has access using role1 (group1 AND group2, ...). I could solve this problem by resolving the roles when the user logs in, so the session knows which roles (group1 AND group2, ...) the user has:

queryString.append(" AND (unprotected:true");
if (user != null) {
    Collection<String> groups = user.getGroups();
    for (String group : groups) {
        queryString.append(" OR groups:");
        queryString.append("\"" + group + "\"");
    }
    Collection<String> andRoles = user.getAndRoles();
    if (!andRoles.isEmpty()) {
        for (String role : andRoles) {
            queryString.append(" OR roles:");
            queryString.append("\"" + role + "\"");
        }
    }
}
queryString.append(")");
Simultaneous Writes to Index
Hi, I am planning to develop an application where users can update their account data after login; this is on top of the search facility users already have. The basic workflow is: 1) user logs in, 2) searches for some data, 3) gets the results from the Solr index, 4) saves some of the search results into their repository, 5) later on they may view their repository. For this, at step 4 I am planning to write that data into a separate Solr index, as a user may search within his repository and get the results, facets, etc. In this plan, how do simultaneous writes to the user-history index work? What are the best practices for scenarios where different users update an index at the same time? The alternative is to store such user info in a DB and schedule the indexing process at regular intervals. But that won't keep the system live with user actions, as there would be some delay: users can't see the data they saved in their repository until it's indexed. That is the reason I am planning to use a Solr XML post request to update the index silently. But how about multiple users writing to the same index? Best Regards, Kranti K K Parisa
Issue on stopword list
Hi, how can I search using stopwords? My queries look like this: "This" - 0 results because it is a stopword; "is" - 0 results because it is a stopword; "that" - 0 results because it is a stopword. If I search "This is that", it must give results. What do I need to change in my schema file to get results for "This is that"?
RE: Implementing hierarchical facet
Hi Andy, It sounds like you may want to have a look at tree faceting: https://issues.apache.org/jira/browse/SOLR-792 Date: Mon, 1 Mar 2010 18:23:51 -0800 From: angelf...@yahoo.com Subject: Implementing hierarchical facet To: solr-user@lucene.apache.org I read that a simple way to implement a hierarchical facet is to concatenate the levels with a separator, something like level1/level2/level3. A problem with this approach is that the number of facet values will greatly increase. For example, I have a facet Location with the hierarchy country/state/city. Using the above approach, every single city leads to a separate facet value. With tens of thousands of cities in the world, the response from Solr will be huge, and then on the client side I'd have to loop through all the facet values and combine those with the same country into a single value. Ideally Solr would be aware of the hierarchy structure and send back responses accordingly: at level 1 Solr sends back facet values based on country (100 or so values); at level 2 the facet values are based on the states within the selected country (a few dozen values); the next level will be cities within that state, and so on. Is it possible to implement hierarchical faceting this way using Solr?
Re: Simultaneous Writes to Index
As long as the document id is unique, concurrent writes are fine. If for some reason the same doc id is used, the document is overwritten, so the last one in will be the one that is in the index. Ron - Original Message - From: Kranti™ K K Parisa kranti.par...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 2 March, 2010 10:40:37 AM Subject: Simultaneous Writes to Index
Re: Simultaneous Writes to Index
Hi Ron, Thanks for the reply. So does this mean that the writer lock has nothing to do with concurrent writes? Best Regards, Kranti K K Parisa On Tue, Mar 2, 2010 at 4:19 PM, Ron Chan rc...@i-tao.com wrote: as long as the document id is unique, concurrent writes are fine; if for some reason the same doc id is used, it is overwritten, so the last one in will be the one that is in the index
Optimize Index
Hi All, Is there a post request method to clean the index? I have removed my index folder and restarted Solr, and it's still showing documents in the stats. I have run this post request: http://localhost:8983/solr/core1/update?optimize=true I get no errors, but the stats still show my 4 documents. Hope you can advise. Thanks
fieldType text
Hi, I'm using the default text field type that comes with the example. When searching for simple words such as 'HP' or 'TCS', Solr is returning results that contain 'HP1' or 'TCS'. Is there a way to avoid this? Thanks, Frederico
search and count ocurrences
Hi, I need to implement a search that counts the number of times a string appears in the search field, i.e. only return articles that mention the word 'HP' at least twice. I'm currently doing this after the Solr search with my own methods. Is there a way for Solr to do this type of operation for me? Thanks, Frederico
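For the record: this is version-dependent, but later Solr releases expose a termfreq() function that can be combined with a frange filter to enforce a minimum within-document count, along these lines (body is an assumed field name):

```text
q=body:HP&fq={!frange l=2}termfreq(body,'HP')
```

The l=2 lower bound keeps only documents where the term occurs at least twice; on older releases without termfreq(), post-processing as Frederico describes remains the fallback.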
Re: Solr Cell and Deduplication - Get ID of doc
Thanks for the responses. This is exactly what I had to resort to. I will definitely put in a feature request to get the generated ID back from the extract request. I am doing this with PHP cURL for extraction and pecl php solr for querying. I am then saving the unique id and dupe hash in a MySQL table, which I check against after the doc is indexed in Solr. If it is a dupe, I delete the Solr record and discard the file. My problem now is that the dupe hash sometimes comes back NULL from Solr, although when I check it through Solr Admin it is there. I am working through this now to isolate it. I had to set Solr to ALLOW duplicates because I have to somehow know that the file is a dupe and then remove the duplicate files on my filesystem. Based on the extract response, I have no way of knowing this if duplicates are disallowed. -Bill On Tue, Mar 2, 2010 at 2:11 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : To quote from the wiki, ... That's all true ... but Bill explicitly said he wanted to use SignatureUpdateProcessorFactory to generate a uniqueKey from the content field post-extraction so he could dedup documents with the same content ... his question was how to get that key after adding a doc. Using a unique literal.field value will work -- but only as the value of a secondary field that he can then query on to get the uniqueKeyField value. : : You could create your own unique ID and pass it in with the : : literal.field=value feature. : : By which Lance means you could specify a unique value in a different : field from your uniqueKey field, and then query on that field:value pair : to get the doc after it's been added -- but that query will only work : until some other version of the doc (with some other value) overwrites it, : so you'd essentially have to query for the field:value to look up the : uniqueKey.
: : It seems like it should definitely be feasible for the : update RequestHandlers to return the uniqueKeyField values for all the : added docs (regardless of whether the key was included in the request, or : added by an UpdateProcessor) -- but I'm not sure how that would fit in with : the SolrJ API. : : Would you mind opening a feature request in Jira? : : -Hoss : : -- : Lance Norskog : goks...@gmail.com -Hoss
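For readers landing here, the deduplication setup under discussion lives in solrconfig.xml. A sketch based on the Solr Deduplication wiki (the signatureField name and the fields list are assumptions; overwriteDupes=false matches Bill's allow-duplicates approach):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With overwriteDupes=false, duplicate documents are kept but share a signature value, which is what makes the post-index MySQL check above possible.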
Re: Issue on stopword list
This is a classic problem with stopword removal. Have you tried just removing the stopword filter from both the indexing definition and the query definition, and reindexing? You can't search on stopwords, no matter what you do, if they've been removed; they just aren't there. HTH, Erick On Tue, Mar 2, 2010 at 5:47 AM, Suram reactive...@yahoo.com wrote:
get Server Status, TotalDocCount .... PHP !
Hello, I use Solr in my CakePHP framework. How can I get status information for my Solr cores? I don't want to analyze the response XML every time. Does anybody know a nice way to get status messages from Solr? thx ;) Jonas
Re: fieldType text
I think that's because of the internal tokenization that Solr does. If a document contains HP1 and you're using the default text field type, Solr would tokenize that to HP and 1, so that document figures in the list of documents containing HP, and hence that document appears in the search results for HP. Creating a separate text field type which does not tokenize like that might be what you want. The various filter/tokenizer types are listed here - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters On Tue, Mar 2, 2010 at 6:07 PM, Frederico Azeiteiro frederico.azeite...@cision.com wrote: Hi, I'm using the default text field type that comes with the example. When searching for simple words as 'HP' or 'TCS' solr is returning results that contains 'HP1' or 'TCS' Is there a solution to avoid this? Thanks, Frederico -- - Siddhant
Re: Optimize Index
My very first guess would be that you're removing an index that isn't the one your SOLR configuration points at. Second guess would be that your browser is caching the results of your first query and not going to SOLR at all. Stranger things have happened G. Third guess is you've mis-identified the core in your URL. Can you check those three things and let us know if you still have the problem? Erick On Tue, Mar 2, 2010 at 7:36 AM, Lee Smith l...@weblee.co.uk wrote: Hi All Is there a post request method to clean the index? I have removed my index folder and restarted solr and its still showing documents in the stats. I have run this post request: http://localhost:8983/solr/core1/update?optimize=true I get no errors but the stats are still show my 4 documents Hope you can advise. Thanks
Re: fieldType text
Expanding on Siddhant's comment, look carefully at WordDelimiterFilterFactory; as I remember, it's in the default schema definition. This page helps: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Erick On Tue, Mar 2, 2010 at 8:51 AM, Siddhant Goel siddhantg...@gmail.com wrote:
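The HP-matches-HP1 behaviour comes from WordDelimiterFilterFactory splitting tokens on letter/digit boundaries. One option is a field type that simply omits that filter, so HP1 stays a single token. A sketch (the fieldType name is hypothetical):

```xml
<!-- Hypothetical field type: whitespace tokens, lowercased, no word-delimiter splitting -->
<fieldType name="text_nosplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this analysis, a query for HP only matches documents containing the standalone token HP, not HP1; the trade-off is losing the splitting behaviour everywhere else in that field.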
exact search
Hi, how do I search for an exact match, like "The Books of Three"? If I search for this, Solr finds the exact result plus some results related only to Books. In my schema.xml file I changed the field type to string instead of text, but I'm not seeing any change.
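Two standard options, depending on what "exact" means here: on a tokenized text field, quoting the query turns it into a phrase query (the terms must appear adjacent and in order); on a string field, the whole stored value must match character for character, and the index must be rebuilt after the schema change before it takes effect. A sketch, with title as an assumed field name:

```text
phrase match on a text field:    q=title:"The Books of Three"
whole-value match on a string:   q=title_exact:"The Books of Three"
```

If "exact plus related" results are appearing, the query is most likely still running unquoted against the tokenized field.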
Re: Implementing hierarchical facet
Ideally Solr would be aware of the hierarchy structure and send back responses accordingly. If I understand it correctly, SOLR-64 supports them, I think? So at level 1 Solr will send back facet values based on country (100 or so values). facet=on&facet.depth=1 ? Level 2 the facet values will be based on the states within the selected country (a few dozen values). facet=on&facet.prefix=selected-country&facet.depth=2 ? Next level will be cities within that state, and so on. facet=on&facet.prefix=selected-country/selected-state&facet.depth=3 ? Koji -- http://www.rondhuit.com/en/
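Without the SOLR-64/SOLR-792 patches (facet.depth comes from those patches, not stock Solr), a similar drill-down is commonly approximated with stock facet.prefix on a field whose tokens carry a depth prefix. A sketch with an assumed location field:

```text
indexed values per document:   0/France    1/France/Provence    2/France/Provence/Nice

level 1 (countries):           facet=on&facet.field=location&facet.prefix=0/
level 2 (France chosen):       facet=on&facet.field=location&facet.prefix=1/France/
level 3 (Provence chosen):     facet=on&facet.field=location&facet.prefix=2/France/Provence/
```

Because the prefix pins both the depth and the selected path, Solr only returns the children of the current node, which addresses the huge-response concern raised above.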
Re: get Server Status, TotalDocCount .... PHP !
Hi, have you tried the php_solr extension from PECL? It has a handy SolrPingResponse class. Or you could just call the CORENAME/admin/ping?wt=phps URL and unserialize it. Regards, -- I N S T A N T | L U X E - 44 rue de Montmorency | 75003 Paris | France Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web : www.instantluxe.com On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote: hello I use Solr in my cakePHP Framework. How can i get status information of my solr cores ??
Re: Issue on stopword list
Don't remove stopwords if you want to search on them. --wunder On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote: This is a classic problem with Stopword removal.
Re: get Server Status, TotalDocCount .... PHP !
Hey, no, I use the SolrPHPClient: http://code.google.com/p/solr-php-client/ I don't really want to use two different PHP libs. ^^ What do you mean by unserialize? XD Guillaume Rossolini-2 wrote: Have you tried the php_solr extension from PECL? It has a handy SolrPingResponse class. Or you could just call the CORENAME/admin/ping?wt=phps URL and unserialize it.
Re: Optimize Index
Ha, now I feel stupid! I had a misspelling in the data path and you were correct. Can I ask, Erick, was the command correct though? Thank you, Lee On 2 Mar 2010, at 13:54, Erick Erickson wrote: My very first guess would be that you're removing an index that isn't the one your SOLR configuration points at.
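For reference, optimize=true only merges index segments; it does not remove documents. To empty an index over HTTP, the usual approach is a delete-by-query followed by a commit, for example via the update handler (this sketch assumes remote streaming is enabled in solrconfig.xml; otherwise POST the same XML bodies to the update URL):

```text
http://localhost:8983/solr/core1/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8983/solr/core1/update?stream.body=<commit/>
```

The stats should show 0 documents once the commit completes and the searcher reopens.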
Indexing HTML document
Hi, how do I properly index HTML documents? All the documents are HTML, some containing characters encoded like &#x17E; or &#xED;. Is there a character filter for handling these codes? Is there a way to strip the HTML tags out? Does Solr weight the terms in the document based on where they appear? Words in headers (H1, H2, ...) would be supposed to describe the document more than words in paragraphs. Thanks for the help, Georg
Re: Indexing HTML document
There is an HTML strip filter documented here, which might be of some help - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory Control characters can be eliminated using code like this - http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-449 On Tue, Mar 2, 2010 at 9:37 PM, György Frivolt gyorgy.friv...@gmail.com wrote: -- - Siddhant
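A minimal analyzer sketch wiring in that char filter (the fieldType name is hypothetical): HTMLStripCharFilterFactory removes tags and also resolves character references such as &#x17E; before tokenization.

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips tags and decodes character references before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Per-location weighting (H1 versus body text) is not automatic; the usual approach is to copy headings into a separate field at indexing time and boost that field at query time.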
Re: Issue on stopword list
Or you can try the CommonGrams filter, which combines tokens next to a stopword. On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org wrote: Don't remove stopwords if you want to search on them. --wunder
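A sketch of what that looks like in the schema (the fieldType name is hypothetical): CommonGramsFilterFactory indexes pairs such as "this_is" alongside the original tokens, and the query-side variant keeps phrase queries over stopwords workable.

```xml
<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The index grows (every stopword-adjacent pair becomes an extra token), but queries like "This is that" can then match, which is the trade-off against simply dropping the stopword filter.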
Re: replication issue
Hi Paul, Thank you for your answer. I did put the entire directory structure on /raid (/raid/solr_env/solr ..., /raid/solr_env/jetty ...) and it still didn't work, even after I applied patch SOLR-1736. I am investigating whether this is because tempDir and the data dir are not on the same partition. matt --- On Mon, 3/1/10, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: From: Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com Subject: Re: replication issue To: solr-user@lucene.apache.org Date: Monday, March 1, 2010, 10:30 PM The data/index.20100226063400 dir is a temporary dir and is created in the same dir where the index dir is located. I'm wondering if the symlink is causing the problem. Why don't you set the data dir as /raid/data instead of /solr/data On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I am still having issues with the replication and wonder if things are working properly. I have 1 master and 1 slave. On the slave, I deleted the data/index directory and the data/replication.properties file and restarted Solr. When the slave is pulling data from the master, I can see that the size of the data directory is growing: r...@slr8:/raid/data# du -sh 3.7M . r...@slr8:/raid/data# du -sh 4.7M . and I can see that the data/replication.properties file got created, and also a directory data/index.20100226063400. Soon after, index.20100226063400 disappears and the size of data/index is back to 12K: r...@slr8:/raid/data/index# du -sh 12K . And when I look at the number of documents via the admin interface, I still see 0 documents, so I feel something is wrong. One more thing: I have a symlink for /solr/data --- /raid/data Thank you for your help! matt -- - Noble Paul | Systems Architect | AOL | http://aol.com
Re: Warning : no lockType configured for...
Hi Mani, Mani EZZAT wrote: I'm dynamically creating cores with a new index, using the same schema and solrconfig.xml Does the problem occur if you use the same configuration in a single, static core? Tom
Re: get Server Status, TotalDocCount .... PHP !
The last time I tried using SolrPHPClient for this, it did not handle the response very well because of the JSON generated on the server side; the JSON could not be parsed properly. I am not sure if anything has changed since then. If you do not want to analyze the XML response each time, and you are not using the PECL extension, you will need to send a request to the Solr server manually using cURL and specify the response format as phps.

On Tue, Mar 2, 2010 at 9:59 AM, stocki st...@shopgate.com wrote:

Hey. No, I use the SolrPHPClient http://code.google.com/p/solr-php-client/ and I don't really want to use two different PHP libs. ^^ What do you mean with unserialize? XD

Guillaume Rossolini-2 wrote:

Hi, have you tried the php_solr extension from PECL? It has a handy SolrPingResponse class. Or you could just call the CORENAME/admin/ping?wt=phps URL and unserialize it.

Regards,
--
I N S T A N T | L U X E - 44 rue de Montmorency | 75003 Paris | France
Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web : www.instantluxe.com

On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote:

Hello, I use Solr in my cakePHP framework. How can I get status information on my Solr cores? I don't want to analyze the response XML every time. Does anybody know a nice way to get status messages from Solr?

thx ;) Jonas

-- View this message in context: http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756852.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
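The wt=phps response is simply PHP's native serialize() format, so in PHP a single unserialize() call turns it into an array. For readers outside PHP, here is a rough sketch of what decoding that format involves: a minimal Python parser for a small subset (ints, doubles, booleans, null, strings, arrays). This is illustrative only, not a robust replacement for PHP's unserialize; in particular it treats string lengths as character counts, whereas PHP counts bytes, so it is ASCII-only.

```python
# Minimal decoder for a subset of PHP's serialize() format
# (what Solr returns for wt=phps). Illustrative sketch only.

def _parse(s, i):
    t = s[i]
    if t == 'i':                      # i:123;
        j = s.index(';', i)
        return int(s[i + 2:j]), j + 1
    if t == 'd':                      # d:1.5;
        j = s.index(';', i)
        return float(s[i + 2:j]), j + 1
    if t == 'b':                      # b:1;
        return s[i + 2] == '1', i + 4
    if t == 'N':                      # N;
        return None, i + 2
    if t == 's':                      # s:6:"status";
        j = s.index(':', i + 2)
        n = int(s[i + 2:j])
        start = j + 2                 # skip the :" before the payload
        return s[start:start + n], start + n + 2   # skip the closing ";
    if t == 'a':                      # a:2:{key;value;key;value;}
        j = s.index(':', i + 2)
        n = int(s[i + 2:j])
        i = j + 2                     # skip the :{ opening the array
        out = {}
        for _ in range(n):
            key, i = _parse(s, i)
            val, i = _parse(s, i)
            out[key] = val
        return out, i + 1             # skip the closing }
    raise ValueError("unsupported type %r at offset %d" % (t, i))

def php_unserialize(s):
    value, _ = _parse(s, 0)
    return value

# A hypothetical fragment of a core-status response:
print(php_unserialize('a:2:{s:4:"name";s:5:"core0";s:7:"numDocs";i:42;}'))
# -> {'name': 'core0', 'numDocs': 42}
```

In practice, if you are in PHP anyway, the PECL extension or plain unserialize() is the right tool; this sketch only shows that the format is easy to consume elsewhere too.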
Re: replication issue
I think this issue is not related to patch SOLR-1736. Here is the error I get. Thank you for any help.

[2010-03-02 19:07:26] [pool-3-thread-1] ERROR(ReplicationHandler.java:266) - SnapPull failed
org.apache.solr.common.SolrException: Unable to download _7bre.fdt completely. Downloaded 0!=15591
    at org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1036)
    at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:916)
    at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:541)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:294)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:146)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:170)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)

--- On Tue, 3/2/10, Matthieu Labour matthieu_lab...@yahoo.com wrote:
From: Matthieu Labour matthieu_lab...@yahoo.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 11:23 AM

Hi Paul, thank you for your answer. I did put all the directory structure on /raid ... /raid/solr_env/solr ..., /raid/solr_env/jetty ...
Logging in Embedded SolrServer - What a nightmare.
Hello all,

I'm having a hard time trying to change Solr's query logging level. I've tried a lot of things I've found on the internet, on this mailing list, and in the Solr docs. What I've found so far:

- The embedded Solr server uses the slf4j lib to intermediate logging. Here I'm using Log4j as my logging framework.
- Changing .../jre/lib/logging.properties worked, but only when querying Solr over HTTP, not with embedded Solr.
- A log4j.xml that I added is not being respected (it is logging with a totally different layout and appenders).
- I've searched for other log4j config files in the classpath, and found nothing.
- I even tried to call Logger.getLogger("org.apache.solr") and set its level manually inside the app; nothing changed.

So the embedded Solr server keeps logging queries and other stuff to my stdout. Most docs and guides I've found on the internet talk about Solr over HTTP; that is fine for me, and with HTTP I got everything working, but not with embedded Solr. Has anyone achieved this with embedded?

Thanks a lot ppl,

[]s,
Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter.com/lucastex
Ignore accents
Hi guys,

I have a Solr index and I need it to ignore accents and special characters, e.g. São Paulo = Sao Paulo, cadarço = cadarco. I know we could use a synonym, but I guess Solr already has a filter or plugin for these cases. Does anyone know how to do it?

Att,
Paulo Marinho
Re: Ignore accents
I have a solr index, and i need it to ignore accents and special characters. Eg: São Paulo = Sao Paulo, cadarço=cadarco. I know we could use a synonim, but i guess solr already has a filter or plugin for theses cases. Anyone knows how to do it?

ASCIIFoldingFilterFactory [1], or:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
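As a sketch of how the two options plug into a schema.xml analyzer chain (the field type name and tokenizer choice here are illustrative, not from the thread; normally you would pick one of the two folding mechanisms, not both):

```xml
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <!-- option 1: map accented characters before tokenizing -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- option 2: fold accents to their ASCII equivalents after tokenizing -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Either way, both "São Paulo" at index time and "Sao Paulo" at query time end up as the same folded tokens, as long as the same analysis is applied on both sides.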
CoreAdminHandler question
The action CREATE creates a new core based on a preexisting instanceDir/solrconfig.xml/schema.xml and registers it; that's what the documentation states. Is there a way to instruct Solr to create the instanceDir if it does not exist? I'm trying to create a new core based on an existing schema/config in order to rebuild the index, and after that swap it with the existing old core. The problem is that the instanceDir of the new core must exist before the core creation, and it would be nice to be able to create the instanceDir programmatically through the CoreAdminHandler. Maybe I'm missing something.

Thanks in advance.

[ ]'s
Leonardo da S. Souza
°v° Linux user #375225
/(_)\ http://counter.li.org/
^ ^
Unindexed Fields Are Searchable?
I've noticed that fields that I define as index=false in the schema.xml are still searchable. Here's the definition of the field:

<field name="object_id" type="string" index="false" stored="true" multiValued="false"/>

or

<field name="object_id" type="string" index="false" stored="false" multiValued="false"/>

I can then add a new document with the field object_id=26 and have the document returned when searching for +object_Id=26. On the other hand, if I add the document using the Lucene API, the Solr search does not return the document. Is there a bug in Solr 1.4 that allows for searchable unindexed fields for documents added by Solr?
Re: Unindexed Fields Are Searchable?
I've noticed that fields that I define as index=false in the schema.xml are still searchable.

Fields defined with indexed=false are neither searchable nor sortable. Did you restart the servlet container and re-index your documents after changing this attribute in schema.xml?
Returning function result in results
Is there a way to return a function value in the search results besides using the score?

-- This email is confidential to the intended recipient. If you have received it in error, please notify the sender and delete it from your system. Any unauthorized use, disclosure or copying is not permitted. The views or opinions presented are solely those of the sender and do not necessarily represent those of Public Library of Science unless otherwise specifically stated. Please note that neither Public Library of Science nor any of its agents accept any responsibility for any viruses that may be contained in this e-mail or its attachments and it is your responsibility to scan the e-mail and attachments (if any).
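For what it's worth, in this era of Solr there is no direct way to return a function's value as its own field in the results; the usual workaround is to make the function query the whole query, so that the returned score *is* the function value. A sketch, with hypothetical field names:

```
q={!func}sum(popularity,num_votes)&fl=id,score
```

Here the score reported for each document is simply sum(popularity,num_votes) for that document, which you can read out of the normal response.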
Different weights to different fields
Hi everyone, I'm new to Solr and just getting it set up and testing it out. I'd like to know if there's a way to give a different weight to different data fields. For an example, I'm going to be storing song information. I have the fields: Artist, Title, Description, and Tags. I'd like occurrences of the search term in Artist and Title to count more than the ones found in Description and Tags. For instance, a search for Bruce Springsteen against all the fields should return the ones where artist=Bruce Springsteen higher than ones that just have that within the description. Is this possible either in the indexing or with a query option? Thanks, Alex -- Alex Thurlow Blastro Networks http://www.blastro.com http://www.roxwel.com http://www.yallwire.com
Setting the return query fields
Hi, I would like Solr to return the record from /exampledocs/hd.xml when I search for the value 6H500F0 (which is the ID field of the 2nd record in that file). I know there is a setting I should change to get this done, but I can't locate it. The field name ID is already included in the schema.xml file. Thanks, Dhanushka
RE: Unindexed Fields Are Searchable?
My schema has always had index=false for that field. I only stopped and restarted the servlet container when I added a document to the index using the Lucene API instead of Solr. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Tuesday, March 02, 2010 1:01 PM To: solr-user@lucene.apache.org Subject: Re: Unindexed Fields Are Searchable? I've noticed that fields that I define as index=false in the schema.xml are still searchable. indexed=false defined fields are neither searchable nor sortable. Did you re-start servlet container and re-index your documents after changing this attribute in schema.xml?
Re: Different weights to different fields
I'm new to Solr and just getting it set up and testing it out. I'd like to know if there's a way to give a different weight to different data fields. For an example, I'm going to be storing song information. I have the fields: Artist, Title, Description, and Tags. I'd like occurrences of the search term in Artist and Title to count more than the ones found in Description and Tags. For instance, a search for Bruce Springsteen against all the fields should return the ones where artist=Bruce Springsteen higher than ones that just have that within the description. Is this possible either in the indexing or with a query option?

You can do it at either query time or index time. At query time you can assign different boost values with the caret operator, e.g.:

Artist:(Bruce Springsteen)^10 Title:(Bruce Springsteen)^5

The dismax [1] request handler might also be useful to you.

[1] http://wiki.apache.org/solr/DisMaxRequestHandler

At index time you can give different boost values to different fields [2], e.g.:

<field name="Artist" boost="10.0">

[2] http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
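As a sketch of the dismax route in solrconfig.xml (handler name and boost values are illustrative; the field names are taken from the question):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- search these fields, weighting Artist and Title higher -->
    <str name="qf">Artist^10 Title^5 Description Tags</str>
  </lst>
</requestHandler>
```

With this in place, a plain query like q=Bruce Springsteen against that handler is expanded across all four fields with the configured weights, so the user never has to write field-qualified queries.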
Re: Setting the return query fields
Hi, I would like to solr to return to record from /exampledocs/hd.xml when I search for the value 6H500F0 (which is the ID field for the 2'nd record in that file). I know there is a setting that I should change to get this done, but I can't locate it. Field name ID is alread included in schema.xml file.

If you want to retrieve the document with ID=6H500F0, use id:6H500F0 as the query. If you don't explicitly specify a field name in your query, the defaultSearchField (which is defined in schema.xml) is used/queried.

http://localhost:8983/solr/select/?q=id%3A6H500F0&version=2.2&start=0&rows=10&indent=on
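For reference, the schema.xml element in question looks like this (the field name here is the one used in the example schema; yours may differ):

```xml
<defaultSearchField>text</defaultSearchField>
```

A bare query such as q=6H500F0 is searched against this field, while q=id:6H500F0 targets the id field explicitly regardless of the default.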
RE: Unindexed Fields Are Searchable?
My schema has always had index=false for that field. I only stopped and restarted the servlet container when I added a document to the index using the Lucene API instead of Solr.

Is there a special reason/use-case for adding documents using the Lucene API?
Re: Setting the return query fields
Thanks for the reply. Is there a place in the config file where I can set it to explicitly search the fields I want?

On Tue, Mar 2, 2010 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi, I would like to solr to return to record from /exampledocs/hd.xml when I search for the value 6H500F0 (which is the ID field for the 2'nd record in that file). I know there is a setting that I should change to get this done, but I can't locate it. Field name ID is alread included in schema.xml file.

If you want to retrieve the document with ID=6H500F0, use id:6H500F0 as the query. If you don't explicitly specify a field name in your query, the defaultSearchField (which is defined in schema.xml) is used/queried.

http://localhost:8983/solr/select/?q=id%3A6H500F0&version=2.2&start=0&rows=10&indent=on
RE: Unindexed Fields Are Searchable?
For testing purposes. I just wanted to see if unindexed fields in documents added via the Lucene API were searchable by Solr. This is after discovering that the unindexed fields in documents added by Solr are searchable.

-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Tuesday, March 02, 2010 1:23 PM To: solr-user@lucene.apache.org Subject: RE: Unindexed Fields Are Searchable?

My schema has always had index=false for that field. I only stopped and restarted the servlet container when I added a document to the index using the Lucene API instead of Solr.

Is there a special reason/use-case for adding documents using the Lucene API?
Re: Setting the return query fields
Thanks for the reply. Is there a place in the config file where I can set it to explicitly search the fields I want?

If you don't want to specify your fields at query time (and you want to query more than one field at the same time), you can use the DisMaxRequestHandler [1]. There are two example configurations (name="dismax" and name="partitioned") in solrconfig.xml. You can invoke them by appending &qt=dismax or &qt=partitioned to your search URL.

[1] http://wiki.apache.org/solr/DisMaxRequestHandler
Re: Different weights to different fields
If you get the PACKT Solr 1.4 book, there are extensive examples of this very thing. It's *well* worth the time it'll save you... Erick On Tue, Mar 2, 2010 at 4:11 PM, Ahmet Arslan iori...@yahoo.com wrote: I'm new to Solr and just getting it set up and testing it out. I'd like to know if there's a way to give a different weight to different data fields. For an example, I'm going to be storing song information. I have the fields: Artist, Title, Description, and Tags. I'd like occurrences of the search term in Artist and Title to count more than the ones found in Description and Tags. For instance, a search for Bruce Springsteen against all the fields should return the ones where artist=Bruce Springsteen higher than ones that just have that within the description. Is this possible either in the indexing or with a query option? You can do it in either query time or index time. In query time you can assign different boost values with carat operator. e.g. Artist:(Bruce Springsteen)^10 Title:(Bruce Springsteen)^5 Also dismax[1] request handler might useful to you. [1]http://wiki.apache.org/solr/DisMaxRequestHandler At index time you can give different boost values to different fields. [2] e.g. field name=Artist boost=10.0 [2] http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
Re: Logging in Embedded SolrServer - What a nightmare.
Not sure if it will solve your specific problem. We use Solr as a WAR as well as Solrj. The main Solr distribution comes with slf4j-jdk14-1.5.5.jar (the java.util.logging binding). I just deleted that and replaced it with slf4j-log4j12-1.5.5.jar, and then it used my existing log4j.properties file.

From: Lucas F. A. Teixeira lucas...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, March 2, 2010 11:14:26 AM
Subject: Logging in Embedded SolrServer - What a nightmare.
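Once the slf4j-log4j12 binding is the one on the classpath, quieting the embedded server's per-request logging is a normal Log4j configuration matter. A sketch of a log4j.properties (the appender and pattern here are illustrative):

```properties
# keep the application itself at INFO on stdout
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %-5p %c - %m%n

# suppress Solr's query/update chatter below WARN
log4j.logger.org.apache.solr=WARN
```

The key point from the reply above is that no log4j configuration takes effect while the slf4j-jdk14 binding is still on the classpath, because slf4j routes everything to java.util.logging instead.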
Re: Unindexed Fields Are Searchable?
Again, note that it should be index_ed_=false. ed - very important! If you're saying index=false, Solr is not reading that attribute at all, and going with the default for the field type. Erik On Mar 2, 2010, at 4:31 PM, Thomas Nguyen wrote: For testing purposes. I just wanted to see if unindex fields in documents added by Lucene API were searchable by Solr. This is after discovering that the unindexed fields in documents added by Solr are searchable. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Tuesday, March 02, 2010 1:23 PM To: solr-user@lucene.apache.org Subject: RE: Unindexed Fields Are Searchable? My schema has always had index=false for that field. I only stopped and restarted the servlet container when I added a document to the index using the Lucene API instead of Solr. Is there a special reason/use-case for to add documents using Lucene API?
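With the attribute name corrected, the field definition from the original question would read (values copied from the question):

```xml
<field name="object_id" type="string" indexed="false" stored="true" multiValued="false"/>
```

Note that indexed="false" stored="true" gives a field that is returned in results but cannot be searched or sorted on; with the misspelled index attribute, Solr silently falls back to the field type's default, which is why the field stayed searchable.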
Re: replication issue
The replication does not work for me. I have a big master Solr and I want to start replicating it. I can see that the slave is downloading data from the master: a directory index.20100302093000 gets created in data/ next to index, and I can see its size growing, but then the directory gets deleted. Here is the complete trace (I added a couple of LOG messages and recompiled Solr):

[2010-03-02 21:24:00] [pool-3-thread-1] DEBUG(MultiThreadedHttpConnectionManager.java:961) - Notifying no-one, there are no waiting threads
[2010-03-02 21:24:00] [pool-3-thread-1] INFO (SnapPuller.java:278) - Number of files in latest index in master: 163
[2010-03-02 21:24:00] [pool-3-thread-1] DEBUG(SnapPuller.java:536) - downloadIndexFiles(downloadCompleteIndex=false,tmpIdxDir=../solr/data/index.20100302092400,latestVersion=1266003907838)
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(SnapPuller.java:541) - --localIndexFile=/opt/solr_env/solr/data/index/_7h0y.fdx
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(SnapPuller.java:900) - fetchFile()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpClient.java:321) - enter HttpClient.executeMethod(HttpMethod)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpClient.java:374) - enter HttpClient.executeMethod(HostConfiguration,HttpMethod,HttpState)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(MultiThreadedHttpConnectionManager.java:405) - enter HttpConnectionManager.getConnectionWithTimeout(HostConfiguration, long)
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(MultiThreadedHttpConnectionManager.java:412) - HttpConnectionManager.getConnection: config = HostConfiguration[host=http://myserver.com:8983], timeout = 0
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(MultiThreadedHttpConnectionManager.java:805) - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(MultiThreadedHttpConnectionManager.java:805) - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(MultiThreadedHttpConnectionManager.java:839) - Getting free connection, hostConfig=HostConfiguration[host=http://myserver.com:8983]
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodDirector.java:379) - Attempt number 1 to process request
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:1079) - enter HttpMethodBase.execute(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2057) - enter HttpMethodBase.writeRequest(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2212) - enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:1496) - enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(Wire.java:70) - POST /solr/replication HTTP/1.1[\r][\n]
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:1032) - enter HttpConnection.print(String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:942) - enter HttpConnection.write(byte[])
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:963) - enter HttpConnection.write(byte[], int, int)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2175) - enter HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:370) - enter EntityEnclosingMethod.addRequestHeaders(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(ExpectContinueMethod.java:183) - enter ExpectContinueMethod.addRequestHeaders(HttpState, HttpConnection)
Re: Different weights to different fields
That's great information. Thanks! -Alex Alex Thurlow Blastro Networks http://www.blastro.com http://www.roxwel.com http://www.yallwire.com On 3/2/2010 3:11 PM, Ahmet Arslan wrote: I'm new to Solr and just getting it set up and testing it out. I'd like to know if there's a way to give a different weight to different data fields. For an example, I'm going to be storing song information. I have the fields: Artist, Title, Description, and Tags. I'd like occurrences of the search term in Artist and Title to count more than the ones found in Description and Tags. For instance, a search for Bruce Springsteen against all the fields should return the ones where artist=Bruce Springsteen higher than ones that just have that within the description. Is this possible either in the indexing or with a query option? You can do it in either query time or index time. In query time you can assign different boost values with carat operator. e.g. Artist:(Bruce Springsteen)^10 Title:(Bruce Springsteen)^5 Also dismax[1] request handler might useful to you. [1]http://wiki.apache.org/solr/DisMaxRequestHandler At index time you can give different boost values to different fields. [2] e.g.field name=Artist boost=10.0 [2]http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
Re: Unindexed Fields Are Searchable?
Again, note that it should be index_ed_=false. ed - very important! If you're saying index=false, Solr is not reading that attribute at all, and going with the default for the field type. Perfect catch :)
Re: replication issue
One more piece of information: I deleted the index on the master, restarted the master, restarted the slave, and now the replication works. Could it be that replication doesn't work well when started against an already existing big index?

Thank you

--- On Tue, 3/2/10, Matthieu Labour matthieu_lab...@yahoo.com wrote:
From: Matthieu Labour matthieu_lab...@yahoo.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 3:35 PM

The replication does not work for me. I have a big master Solr and I want to start replicating it. I can see that the slave is downloading data from the master: a directory index.20100302093000 gets created in data/ next to index, and I can see its size growing, but then the directory gets deleted.
Re: replication issue
Hi Matthieu,

Does this happen over and over? Is this with Solr 1.4 or some other version? Is there anything unusual about _7h0y.fdx? Does _7h0y.fdx still exist on the master when the replication fails? ...

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

- Original Message From: Matthieu Labour matthieu_lab...@yahoo.com To: solr-user@lucene.apache.org Sent: Tue, March 2, 2010 4:35:46 PM Subject: Re: replication issue

The replication does not work for me. I have a big master Solr and I want to start replicating it. I can see that the slave is downloading data from the master: a directory index.20100302093000 gets created in data/ next to index, and I can see its size growing, but then the directory gets deleted.
EntityEnclosingMethod.clearRequestBody() [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody() [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody() [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter PostMethod.addParameter(String, String) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - enter EntityEnclosingMethod.clearRequestBody() [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpClient.java:321) - enter HttpClient.executeMethod(HttpMethod) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpClient.java:374) - enter HttpClient.executeMethod(HostConfiguration,HttpMethod,HttpState) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(MultiThreadedHttpConnectionManager.java:405) - enter HttpConnectionManager.getConnectionWithTimeout(HostConfiguration, long) [2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(MultiThreadedHttpConnectionManager.java:412) - HttpConnectionManager.getConnection: config = HostConfiguration[host=http://myserver.com:8983], timeout = 0 [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(MultiThreadedHttpConnectionManager.java:805) - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(MultiThreadedHttpConnectionManager.java:805) - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration) [2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(MultiThreadedHttpConnectionManager.java:839) - Getting free connection, hostConfig=HostConfiguration[host=http://myserver.com:8983] [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodDirector.java:379) - Attempt number 1 to 
process request [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:1079) - enter HttpMethodBase.execute(HttpState, HttpConnection) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2057) - enter HttpMethodBase.writeRequest(HttpState, HttpConnection) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2212) - enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection) [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:1496) - enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String) [2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(Wire.java:70) - POST /solr/replication HTTP/1.1[\r][\n] [2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:1032) - enter HttpConnection.print(String) [2010-03-02 21:24:40] [pool-3-thread-1]
RE: Unindexed Fields Are Searchable?
Great catch! Thanks for spotting my error :)

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, March 02, 2010 2:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Unindexed Fields Are Searchable?

Again, note that it should be index_ed_=false. ed - very important! If you're saying index=false, Solr is not reading that attribute at all, and going with the default for the field type. Perfect catch :)
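The fix above comes down to one attribute spelling in schema.xml. A minimal sketch (the field name is illustrative): the attribute Solr reads is "indexed", and an unknown attribute like "index" is silently ignored, so the fieldType's default applies instead.

```xml
<!-- Correct: the attribute is "indexed", not "index". -->
<field name="internal_notes" type="string" indexed="false" stored="true"/>

<!-- Wrong: "index" is not a recognized attribute, so this field
     falls back to whatever the fieldType declares (often indexed="true"),
     which is why the field was unexpectedly searchable. -->
<field name="internal_notes" type="string" index="false" stored="true"/>
```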
Re: replication issue
Otis,

Thank you for your response. I apologize for not being specific enough:
- yes, it happens over and over
- apache-solr-1.4.0
- I restarted the indexing+replication from scratch. Before I did that, I backed up the master index directory. I don't see _7h0y.fdx in it. What could possibly have happened?

--- On Tue, 3/2/10, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

From: Otis Gospodnetic otis_gospodne...@yahoo.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 4:40 PM

Hi Matthieu,
Does this happen over and over? Is this with Solr 1.4 or some other version? Is there anything unusual about _7h0y.fdx? Does _7h0y.fdx still exist on the master when the replication fails?
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

- Original Message
From: Matthieu Labour matthieu_lab...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Tue, March 2, 2010 4:35:46 PM
Subject: Re: replication issue

The replication does not work for me. I have a big master Solr and I want to start replicating it. I can see that the slave is downloading data from the master... I see a directory index.20100302093000 gets created in data/ next to index...
I can see its size growing but then the directory gets deleted. Here is the complete trace (I added a couple of LOG messages and compiled Solr):
[... quoted trace omitted; identical to the trace in the message above ...]
Re: Implementing hierarchical facet
If it's a requirement to let Solr handle the facet-hierarchy please disregard this post, but an alternative would be to have your app control when to ask for which 'facet-level' (e.g. country, state, city) in the hierarchy, as follows.

Each doc has 3 separate fields (indexed=true, stored=false):
- countryid
- stateid
- cityid

facet on country:
facet=on&facet.field=countryid

facet on state (country selected; functionally you probably don't want to show states without the user having selected a country anyway):
facet=on&facet.field=stateid&fq=countryid:somecountryid

facet on city (state selected, same functional analogy as above):
facet=on&facet.field=cityid&fq=stateid:somestateid

or facet on city (country selected, same functional analogy as above):
facet=on&facet.field=cityid&fq=countryid:somecountryid

Grab the resulting facet and drop it under Location.

pros:
- reusing fq's (good performance; I've never used hierarchical facets, but would be surprised if they gave a (major) speed increase over this method)
- flexible (you get multiple hierarchies: country --> state --> city and country --> city)

cons:
- a little more application logic

Hope that helps,
Geert-Jan

2010/3/2 Andy angelf...@yahoo.com

I read that a simple way to implement hierarchical facets is to concatenate strings with a separator, something like level1>level2>level3 with > as the separator. A problem with this approach is that the number of facet values will greatly increase. For example I have a facet Location with the hierarchy country>state>city. Using the above approach every single city will lead to a separate facet value. With tens of thousands of cities in the world the response from Solr will be huge. And then on the client side I'd have to loop through all the facet values and combine those with the same country into a single value. Ideally Solr would be aware of the hierarchy structure and send back responses accordingly: at level 1 Solr will send back facet values based on country (100 or so values). At level 2 the facet values will be based on the states within the selected country (a few dozen values). The next level will be cities within that state, and so on. Is it possible to implement hierarchical facets this way using Solr?
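The drill-down described above amounts to one request per level, carrying the selections from the levels above forward as fq filters. A sketch (field names and id values are illustrative):

```text
# level 1: facet on countries
/select?q=*:*&facet=on&facet.field=countryid

# level 2: user picked country 42; facet on its states
/select?q=*:*&facet=on&facet.field=stateid&fq=countryid:42

# level 3: user picked state 7; facet on its cities
/select?q=*:*&facet=on&facet.field=cityid&fq=countryid:42&fq=stateid:7
```

Because each fq is a whole-field filter, the filter cache entries are reused across users drilling into the same country or state.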
Re: Implementing hierarchical facet
Using Solr 1.4, even fewer changes to the frontend:

facet=on&facet.field={!key=Location}countryid
...
facet=on&facet.field={!key=Location}cityid&fq=countryid:somecountryid

etc. will consistently render the resulting facet under the name Location.

2010/3/3 Geert-Jan Brits gbr...@gmail.com
[... quoted message omitted; identical to the previous message ...]
Need suggestion regarding custom transformer
Hi, I am new to Solr. I am trying location-aware search with spatial Lucene in a Solr 1.5 nightly build. My table in MySQL has just lat, lng and some text. I want to add geohash, lat_rad (lat in radians) and lng_rad fields to the document before indexing. I have used dataimport to get my table into Solr. I have to use GeohashUtils.Encode() to get the geohash from the corresponding lat,lng of each row, and the *ToRads function to get lat in radians. Can I use custom transformers so that after retrieving each row, these fields are added and then indexed while using dataimport? Or do I have to migrate the data to XML and make the required changes before indexing? Thanks in advance.
--
View this message in context: http://old.nabble.com/Need-suggestion-regarding-custom-transformer-tp27763576p27763576.html
Sent from the Solr - User mailing list archive at Nabble.com.
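A DIH custom transformer is a reasonable fit here: DIH looks up a transformRow(Map<String,Object> row) method by reflection on the class named in the entity's transformer attribute, and that method can add the derived columns to each row before indexing. The math itself is small; below is a self-contained sketch of the standard geohash base-32 encoding plus the radians conversion, written standalone so it runs without Solr on the classpath (in a real transformer you would likely call Lucene's GeohashUtils rather than hand-roll the encoding; class and method names here are illustrative):

```java
public class GeoPrep {
    // Standard geohash base-32 alphabet (no a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Classic geohash: alternately bisect the longitude and latitude
    // ranges, emitting one bit per step, 5 bits per output character.
    static String geohash(double lat, double lon, int precision) {
        double[] latR = {-90, 90}, lonR = {-180, 180};
        StringBuilder sb = new StringBuilder();
        boolean evenBit = true; // even bits refine longitude, odd bits latitude
        int bit = 0, ch = 0;
        while (sb.length() < precision) {
            double[] r = evenBit ? lonR : latR;
            double v = evenBit ? lon : lat;
            double mid = (r[0] + r[1]) / 2;
            if (v > mid) { ch = (ch << 1) | 1; r[0] = mid; }
            else         { ch = ch << 1;       r[1] = mid; }
            evenBit = !evenBit;
            if (++bit == 5) { sb.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the transformer would compute per row: geohash plus lat/lng in radians.
        double lat = 57.64911, lng = 10.40744; // canonical geohash test point
        System.out.println(geohash(lat, lng, 11));
        System.out.println(Math.toRadians(lat) + " " + Math.toRadians(lng));
    }
}
```

Inside a transformer, row.put("geohash", ...), row.put("lat_rad", Math.toRadians(lat)) and so on would be called from transformRow, and the fields mapped in schema.xml as usual.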
Getting total term count
Hi, I want a way to get the total term count per document. I am using Solr 1.4. My query looks something like this:
http://192.168.1.50:8080/solr1/core_SFS/select/?q=content%3Apresident&version=2.2&start=0&rows=10&indent=on
I tried to use TermVectorComponent but it just gives me the number of documents where the term was found. (This was the option I used: qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true.) Can anyone please guide me on how to get the total term count per document. Thanks.
--
View this message in context: http://old.nabble.com/Getting-total-term-count-tp27763844p27763844.html
Sent from the Solr - User mailing list archive at Nabble.com.
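One point worth separating out (a sketch, not a definitive answer to the question): with TermVectorComponent, tv.tf is the frequency of each term *within* a document, while tv.df is the collection-wide document frequency, which is probably the "number of documents" value being seen. A hypothetical request asking only for per-document term frequencies:

```text
/select?q=content:president&qt=tvrh&tv=true&tv.tf=true
```

The response then lists each term of the returned documents with its tf; summing those tf values client-side gives a total count of term occurrences per document, if that approximation is acceptable.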
Re: Implementing hierarchical facet
Thanks. I didn't know about the {!key=Location} trick. Thanks everyone for your help. From what I could gather, there are 3 approaches:

1) SOLR-64
Pros:
- can have arbitrary levels of hierarchy without modifying the schema
Cons:
- each combination of all the levels in the hierarchy results in a separate filter cache entry. This number could be huge, which would lead to poor performance

2) SOLR-792
Pros:
- each level of the hierarchy separately results in a filter cache entry. A much smaller number of entries; better performance
Cons:
- only 2 levels are supported

3) Separate fields for each hierarchy level
Pros:
- same as SOLR-792: good performance
Cons:
- can only handle a fixed number of levels in the hierarchy. Adding any levels beyond that requires schema modification

Does that sound right? Option 3 is probably the best match for my use case. Is there any trick to make it able to deal with an arbitrary number of levels? Thanks.

--- On Tue, 3/2/10, Geert-Jan Brits gbr...@gmail.com wrote:

From: Geert-Jan Brits gbr...@gmail.com
Subject: Re: Implementing hierarchical facet
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 8:02 PM
[... quoted message omitted; identical to the earlier messages in this thread ...]
Re: question regarding coord() value
The first 2 queries have 'electORnics' instead of 'electROnics'. The third query shows the situation: the first clause has 1 out of 2 matches, and the second has 1 out of 3 matches. Look for the two 'coord' entries. They are 1/2 and 1/3.

str name="SP2514N"
0.61808145 = (MATCH) sum of:
  0.16856766 = (MATCH) product of:
    0.33713531 = (MATCH) sum of:
      0.33713531 = (MATCH) weight(name:samsung in 0), product of:
        0.39687544 = queryWeight(name:samsung), product of:
          3.3978953 = idf(docFreq=1, maxDocs=22)
          0.116800375 = queryNorm
        0.84947383 = (MATCH) fieldWeight(name:samsung in 0), product of:
          1.0 = tf(termFreq(name:samsung)=1)
          3.3978953 = idf(docFreq=1, maxDocs=22)
          0.25 = fieldNorm(field=name, doc=0)
    0.5 = coord(1/2)
  0.44951376 = (MATCH) product of:
    1.3485413 = (MATCH) sum of:
      1.3485413 = (MATCH) weight(manu:electronics in 0), product of:
        0.39687544 = queryWeight(manu:electronics), product of:
          3.3978953 = idf(docFreq=1, maxDocs=22)
          0.116800375 = queryNorm
        3.3978953 = (MATCH) fieldWeight(manu:electronics in 0), product of:
          1.0 = tf(termFreq(manu:electronics)=1)
          3.3978953 = idf(docFreq=1, maxDocs=22)
          1.0 = fieldNorm(field=manu, doc=0)
    0.3334 = coord(1/3)

On Tue, Mar 2, 2010 at 3:35 AM, Smith G gudumba.sm...@gmail.com wrote:

Hello, I have been trying to find out what exactly the coord value is. I have executed different queries where I have observed strange behaviour. Leave the numerator value in the coord fraction aside for the moment, as I am really confused about what exactly the denominator is. Here are the examples.

Query 1) (+text:samsung +text:electron +name:samsung) (+manu:samsung +features:samsung (+manu:electronics +name:electronics)) manu:electornics name:one name:two
coord value is: 1/5 [consider only the denominator]. I guess as there are 5 clauses (combinations) it could be five.

Query 2) ((+text:samsung +(text:electron name:samsung)) (+manu:samsung +features:samsung (+manu:electronics +name:electronics))) (manu:electornics name:one) name:two
coord value is: 1/3. The same logic works here [for the denominator value 3].

Query 3) (name:samsung features:abc) (features:name name:electronics manu:electronics)
But here, coord value is: 1/3. I have been trying to reckon how it could be 3, but I could not. I have tried to correlate the info present in the Java documentation, but I was not successful again. Please clarify. Thanks.

--
Lance Norskog
goks...@gmail.com
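The coord factor itself is simple: it is the fraction of a BooleanQuery's clauses that matched, computed per BooleanQuery (which is why the explain output above shows a separate 1/2 and 1/3 for the two nested clauses rather than one global fraction). A minimal sketch mirroring the behavior of Lucene's DefaultSimilarity.coord:

```java
public class CoordDemo {
    // coord(overlap, maxOverlap): matched optional clauses / total clauses,
    // evaluated independently for each (nested) BooleanQuery.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // (name:samsung features:abc): 1 of 2 clauses matched
        System.out.println(coord(1, 2));
        // (features:name name:electronics manu:electronics): 1 of 3 matched
        System.out.println(coord(1, 3));
    }
}
```

Each sub-score is multiplied by its own coord, so a document matching more of a clause's terms is rewarded within that clause.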
Re: Simultaneous Writes to Index
Locking is at a lower level than indexing and queries. Solr coordinates multi-threaded indexing and query operations in memory, and a separate thread writes data to disk. There are no performance problems with multiple searches and index updates happening at the same time.

2010/3/2 Kranti™ K K Parisa kranti.par...@gmail.com:

and also about the time when two update requests come at the same time. Then whichever request comes first will be updating the index while other requests wait until the lock timeout that we have configured??
Best Regards,
Kranti K K Parisa

2010/3/2 Kranti™ K K Parisa kranti.par...@gmail.com

Hi Ron,
Thanks for the reply. So does this mean that the writer lock has nothing to do with concurrent writes?
Best Regards,
Kranti K K Parisa

On Tue, Mar 2, 2010 at 4:19 PM, Ron Chan rc...@i-tao.com wrote:

as long as the document id is unique, concurrent writes are fine. if for some reason the same doc id is used then it is overwritten, so the last one in will be the one that is in the index
Ron

- Original Message -
From: Kranti™ K K Parisa kranti.par...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 2 March, 2010 10:40:37 AM
Subject: Simultaneous Writes to Index

Hi,
I am planning to develop an application in which users can update their account data after login, on top of the search facility users have. The basic workflow is:
1) user logs in
2) searches for some data
3) gets the results from the solr index
4) saves some of the search results into their repository
5) later on they may view their repository

For this, at step 4 I am planning to write that into a separate solr index, as a user may search within his repository and get the results, facets, etc. So I am thinking of writing such data/info to a separate solr index. In this plan, how do simultaneous writes to the user history index work? What are the best practices in such scenarios of updating an index at the same time from different users?
The other alternative is to store such user info in a DB and schedule an indexing process at regular intervals. But that won't make the system live with user actions, as there would be some delay; users can't see the data they saved in their repository until it is indexed. That is the reason I am planning to use a SOLR XML post request to update the index silently. But how about multiple users writing to the same index?
Best Regards,
Kranti K K Parisa

--
Lance Norskog
goks...@gmail.com
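Ron's last-in-wins point can be seen with two XML update messages for the same uniqueKey (a sketch; the id value and field names are illustrative):

```xml
<!-- first save of a search result into the user's repository -->
<add><doc>
  <field name="id">user42-result-7</field>
  <field name="saved_title">First version</field>
</doc></add>

<!-- a later add with the same id replaces the whole document -->
<add><doc>
  <field name="id">user42-result-7</field>
  <field name="saved_title">Second version</field>
</doc></add>
```

After a commit, only the second document remains in the index; no application-level locking is needed for this to be safe.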
DIH onError question
Hi all,

I am using Solr 1.5 from trunk. I am getting the below error on a full load, and it is causing the import to fail and roll back. I am not concerned about the error, but rather that I cannot seem to tell the indexing to continue. I have two entities, and I have tried all 4 combinations of skip and continue for their onError attributes.

SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
    ... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
    ... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback

My data-config file:

<dataConfig>
  <dataSource name="binaryFile" type="BinFileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            transformer="RegexTransformer,TemplateTransformer"
            baseDir="C:\Docs" fileName=".*pdf" recursive="true"
            rootEntity="false" pk="id" dataSource="binaryFile" onError="skip">
      <field column="id" sourceColName="fileAbsolutePath" regex="\\" replaceWith="/" />
      <entity dataSource="binaryFile" name="x" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" onError="continue">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

Thanks,
Nirmal
Re: How can I get Solr-Cell to extract to multi-valued fields?
It is a bug. I just filed this; it is just a unit test that displays the behavior. http://issues.apache.org/jira/browse/SOLR-1803

On Tue, Mar 2, 2010 at 9:07 AM, Mark Roberts mark.robe...@red-gate.com wrote:

Hi, I have a schema with a multivalued field like so:

<field name="product" type="string" indexed="true" stored="true" multiValued="true"/>

I am uploading HTML documents to the Solr extraction handler which contain meta in the head, like so:

<meta name="product" content="firstproduct" />
<meta name="product" content="anotherproduct" />
<meta name="product" content="andanotherproduct" />

I want the extraction handler to map each of these pieces of meta onto the product field. However, there seems to be a problem: only the last item, andanotherproduct, is mapped; the earlier ones seem to be ignored. It does work, however, if I pass the values as literals in the query string (e.g. literal.product=firstproduct&literal.product=anotherproduct&literal.product=andanotherproduct). I've tried the release version 1.4 of Solr and a recent nightly build of 1.5, and neither works. Is this a bug in Solr Cell or am I doing something wrong?
Many thanks,
Mark.

--
Lance Norskog
goks...@gmail.com
Re: Warning : no lockType configured for...
I don't know, I didn't try, because I need to create a different core each time. I'll do some tests with the default config and will report back to all of you. Thank you for your time.

Tom Hill wrote:

Hi Mani,

Mani EZZAT wrote:
I'm dynamically creating cores with a new index, using the same schema and solrconfig.xml

Does the problem occur if you use the same configuration in a single, static core?

Tom