TZ parameter
Hi, I'm a little stumped on the TZ param for running a date range query. I've indexed a single doc with a dateTime field value of 2013-07-08T00:00:00Z. My query is basically this: ?q=date_dt:[* TO 2013-07-07T23:00:00Z]&TZ=America/New_York From what I'm seeing here: http://wiki.apache.org/solr/CoreQueryParameters#TZ ... the date literal I'm passing in should (I think?) be converted to UTC (with DST applied), ending up as something like 2013-07-08T03:00:00Z -- am I on the right track here? When I run that query, the doc set is empty. I would expect to see my document in the result, because the date range (UTC via TZ) includes the document's date_dt value (UTC). What am I doing wrong here? Thanks, Matt
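One thing worth checking (my reading of that wiki page, so verify it): TZ overrides the timezone used for date *math* rounding (e.g. NOW/DAY), while an explicit literal ending in Z is always interpreted as UTC, so the range endpoint would not be shifted at all. If that's the case, a workaround is to convert the local time to UTC on the client before building the query. A minimal sketch with a hypothetical helper:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def to_solr_utc(local_str, tz="America/New_York"):
    """Convert a naive local timestamp string to the UTC 'Z' literal
    Solr expects in range queries (hypothetical client-side helper)."""
    local = datetime.fromisoformat(local_str).replace(tzinfo=ZoneInfo(tz))
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# 23:00 in New York on 2013-07-07 (EDT, UTC-4) is 03:00 UTC the next day,
# which would make the endpoint 2013-07-08T03:00:00Z
print(to_solr_utc("2013-07-07T23:00:00"))
```

With the endpoint pre-converted, the query needs no TZ param at all: q=date_dt:[* TO 2013-07-08T03:00:00Z].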
solr home in jar?
Hi, I'd like to bundle up a jar file with a complete solr home and index. This jar file is a dependency for another application, which uses an instance of embedded solr, multi-core. Is there any way to have the application's embedded solr read the configs/index data from the jar dependency? I attempted using CoreContainer with a resource loader (and many other ways), but no luck! Any ideas? - Matt
solr geospatial / spatial4j
Hi, I'm researching options for handling a better geospatial solution. I'm currently using Solr 3.5 for a read-only database, and the point/radius searches work great. But I'd like to start doing point-in-polygon searches as well. I've skimmed through some of the geospatial jira issues, and read about spatial4j, which is very interesting. I see on the github page that this will soon be part of lucene; can anyone confirm this? I attempted to build the spatial4j demo but no luck. It had problems finding lucene 4.0-SNAPSHOT, which I guess is because there are no 4.0-SNAPSHOT nightly builds? If anyone knows how I can get around this, please let me know! Other than spatial4j, is there a way to do point-in-polygon searches with solr 3.5.0 right now? Is there some tricky indexing/querying strategy that would allow this? Thanks! - Matt
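One pragmatic fallback on Solr 3.5, while waiting for spatial4j: use the existing point/radius (or bounding box) support to get candidate docs from Solr, then post-filter them in the application with a point-in-polygon test. A sketch of the standard ray-casting algorithm (illustrative only; this is application code, not Solr or spatial4j API):

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: cast a horizontal ray from pt and count how many
    polygon edges it crosses; an odd count means the point is inside."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Edge straddles the ray's y-level, and the crossing is right of x?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```

Post-filtering obviously costs extra round trips for large candidate sets, so it only works if the bounding box keeps the candidate count reasonable.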
KeywordTokenizerFactory and stopwords
Hi, I have an autocomplete fieldType that works really well, but because the KeywordTokenizerFactory (if I understand correctly) is emitting a single token, the stopword filter will not detect any stopwords. Anyone know of a way to strip out stopwords when using KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm not sure I want to add a bunch of reg-exps for replacing every stopword. Thanks, Matt

Here's the fieldType definition:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
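The "tokenize normally, drop stopwords, then glue the tokens back into one keyword string before edge n-gramming" idea can be simulated outside Solr. A rough Python sketch (the stopword list and helper are made up for illustration; this is not Solr code):

```python
# Hypothetical stopword set, standing in for stopwords.txt
STOPWORDS = {"the", "a", "an", "of", "and"}

def keyword_minus_stopwords(text, max_gram=50):
    """Simulate: whitespace-tokenize, drop stopwords, rejoin into a single
    'keyword' token, then emit edge n-grams for autocomplete matching."""
    kept = [t for t in text.lower().split() if t not in STOPWORDS]
    token = " ".join(kept)
    return [token[:n] for n in range(1, min(len(token), max_gram) + 1)]

grams = keyword_minus_stopwords("The Sound of Music")
print(grams[:5])  # ['s', 'so', 'sou', 'soun', 'sound']
```

The key point the sketch illustrates: stopword removal has to happen while the text is still split into tokens, before it is collapsed back into the single keyword token that the edge n-gram filter consumes.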
Re: KeywordTokenizerFactory and stopwords
Hi Erik. Yes, something like what you describe would do the trick. I did find this: http://lucene.472066.n3.nabble.com/Concatenate-multiple-tokens-into-one-td1879611.html I might try the pattern replace filter with stopwords, even though that feels kinda clunky. Matt

On Wed, Jun 8, 2011 at 11:04 AM, Erik Hatcher erik.hatc...@gmail.com wrote: This seems like it deserves some kind of collecting TokenFilter(Factory) that will slurp up all incoming tokens and glue them together with a space (and allow the separator to be configurable). Hmmm, surprised one of those doesn't already exist. With something like that you could have a standard tokenization chain, and put it all back together at the end. Erik

On Jun 8, 2011, at 10:59, Matt Mitchell wrote: Hi, I have an autocomplete fieldType that works really well, but because the KeywordTokenizerFactory (if I understand correctly) is emitting a single token, the stopword filter will not detect any stopwords. Anyone know of a way to strip out stopwords when using KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm not sure I want to add a bunch of reg-exps for replacing every stopword. Thanks, Matt Here's the fieldType definition:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
Solr throwing exception when evicting from filterCache
I have a recent build of solr (4.0.0.2011.02.25.13.06.24). I am seeing this error when making a request (with fq's), right at the point where the eviction count goes from 0 up:

SEVERE: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry

If you then make another request, Solr responds with the expected result. Is this a bug? Has anyone seen this before? Any tips/help/feedback/questions would be much appreciated! Thanks, Matt
Re: Solr throwing exception when evicting from filterCache
Here's the full stack trace:

[Ljava.lang.Object; cannot be cast to [Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry;
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry;
    at org.apache.solr.common.util.ConcurrentLRUCache$PQueue.myInsertWithOverflow(ConcurrentLRUCache.java:377)
    at org.apache.solr.common.util.ConcurrentLRUCache.markAndSweep(ConcurrentLRUCache.java:329)
    at org.apache.solr.common.util.ConcurrentLRUCache.put(ConcurrentLRUCache.java:144)
    at org.apache.solr.search.FastLRUCache.put(FastLRUCache.java:131)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:613)
    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:652)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1233)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1086)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:337)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:431)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:340)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.h

On Thu, Mar 24, 2011 at 1:54 PM, Matt Mitchell goodie...@gmail.com wrote: I have a recent build of solr (4.0.0.2011.02.25.13.06.24). I am seeing this error when making a request (with fq's), right at the point where the eviction count goes from 0 up: SEVERE: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry If you then make another request, Solr responds with the expected result. Is this a bug? Has anyone seen this before? Any tips/help/feedback/questions would be much appreciated! Thanks, Matt
embedded solr and tomcat
I'm considering running an embedded instance of Solr in Tomcat (Amazon's beanstalk). Has anyone done this before? I'd be very interested in how I can instantiate Embedded solr in Tomcat. Do I need a resource loader to instantiate? If so, how? Thanks, Matt
api key filtering
Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to allowed db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, we'll use a filter query on api_key. The only concern of mine is: what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
Re: api key filtering
Hey thanks, I'll definitely have a read. The only problem with this, though, is that our api is a thin layer of app-code, with solr only (no db); we index data from our sql db into solr, and push the index off for consumption. The only other idea I had was to send a list of the allowed document ids along with every solr query, but then I'm sure I'd run into a filter query limit. Each key could be associated with up to 2k documents, so that's 2k values in an fq, which would probably be too many for lucene (I think its limit is 1024). Matt

On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon gear...@sbcglobal.net wrote: The only way that you would have that many api keys per record is if one of them represented 'public', right? 'public' is a ROLE. Your answer is to use RBAC style techniques. Here are some links that I have on the subject. Sorry for formatting, Firefox is freaking out. I cut and pasted these from an email in my sent box. I hope the links came out.

Part 1: http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
Part 2: Role-based access control in SQL, part 2 at Xaprb

ACL/RBAC Bookmarks:
- UserRbac - symfony - Trac
- A Role-Based Access Control (RBAC) system for PHP
- Appendix C: Task-Field Access
- PHP Access Control - PHP5 CMS Framework Development | PHP Zone
- Linux file and directory permissions
- per RECORD/Entity permissions? - symfony users | Google Groups
- Special Topics: Authentication and Authorization | The Definitive Guide to Yii | Yii Framework
- Solr - User - Modelling Access Control
- PHP Generic Access Control Lists
- Row-level Model Access Control for CakePHP « some flot, some jet
- Class that acts as a client to a JSON service : JSON « GWT « Java
- Re: [symfony-users] Implementing an existing ACL API in symfony
- php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow
- W3C ACL System
- makeAclTables.sql
- SchemaWeb - Classes And Properties - ACL Schema
- Reardon's Ruminations: Spring Security ACL Schema for Oracle
- trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla
- Acl.php - kohana-mptt - Project Hosting on Google Code
- Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform

Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

- Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 11:48:22 AM Subject: api key filtering Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to allowed db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, they'll be queried using a filter query on api_key. The only concern of mine is: what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
Re: api key filtering
I think that indexing the access information is going to work nicely, and I agree that sticking with the simplest/solr way is best. The constraint is super simple... you can view this set of documents or you can't... based on an api key: fq=api_key:xxx Thanks for the feedback on this, guys! Matt

2011/1/22 Jonathan Rochkind rochk...@jhu.edu If you COULD solve your problem by indexing 'public', or other tokens from a limited vocabulary of document roles, in a field -- then I'd definitely suggest you look into doing that, rather than doing odd things with Solr instead. If the only barrier is not currently having sufficient logic at the indexing stage to do that, then it is going to end up being a lot less of a headache in the long term to simply add a layer at the indexing stage to add that in, than trying to get Solr to do things outside of its, well, 'comfort zone'. Of course, depending on your requirements, it might not be possible to do that; maybe you can't express the semantics in terms of a limited set of roles applied to documents. And then maybe your best option really is sending an up-to-2k-element list (not exactly the same list every time, presumably) of acceptable documents to Solr with every query, and maybe you can get that to work reasonably. Depending on how many different complete lists of documents you have, maybe there's a way to use Solr caches effectively in that situation, or maybe that's not even necessary, since lookup by unique id should be pretty quick anyway; not really sure. But if the semantics are possible, much better to work with Solr rather than against it; it's going to take a lot less tinkering to get Solr to perform well if you can just send an fq=role:public or something, instead of a list of document IDs. You won't need to worry about it, it'll just work, because you know you're having Solr do what it's built to do. Totally worth a bit of work to add a logic layer at the indexing stage. IMO.
From: Erick Erickson [erickerick...@gmail.com] Sent: Saturday, January 22, 2011 4:50 PM To: solr-user@lucene.apache.org Subject: Re: api key filtering

1024 is the default number; it can be increased. See maxBooleanClauses in solrconfig.xml. This shouldn't be a problem with 2K clauses, but expanding it to tens of thousands is probably a mistake (but test to be sure). Best, Erick

On Sat, Jan 22, 2011 at 3:50 PM, Matt Mitchell goodie...@gmail.com wrote: Hey thanks, I'll definitely have a read. The only problem with this, though, is that our api is a thin layer of app-code, with solr only (no db); we index data from our sql db into solr, and push the index off for consumption. The only other idea I had was to send a list of the allowed document ids along with every solr query, but then I'm sure I'd run into a filter query limit. Each key could be associated with up to 2k documents, so that's 2k values in an fq, which would probably be too many for lucene (I think its limit is 1024). Matt

On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon gear...@sbcglobal.net wrote: The only way that you would have that many api keys per record is if one of them represented 'public', right? 'public' is a ROLE. Your answer is to use RBAC style techniques. Here are some links that I have on the subject. [RBAC link list snipped; see Dennis's earlier message in this thread]
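To make Erick's point concrete: all the allowed IDs have to go into a single fq (separate fq parameters intersect, so splitting one key's ID list across several fq's would AND them together and match nothing), which means the OR clause count is what has to stay under maxBooleanClauses. A toy guard (hypothetical helper, not part of SolrJ):

```python
def ids_fq(field, ids, max_clauses=1024):
    """Build one fq string like 'id:(1 OR 2 OR ...)'. Raise if the clause
    count would exceed maxBooleanClauses (which solrconfig.xml can raise)."""
    if len(ids) > max_clauses:
        raise ValueError(
            f"{len(ids)} clauses > maxBooleanClauses={max_clauses}; "
            "raise the limit in solrconfig.xml")
    return field + ":(" + " OR ".join(map(str, ids)) + ")"

print(ids_fq("id", [1, 2, 3]))  # id:(1 OR 2 OR 3)
```

For a 2k-document key, you'd pass max_clauses=2048 or more to mirror a raised limit, per Erick's caveat that tens of thousands is probably too far.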
Re: snapshot-4.0 and maven
Hey thanks Tommy. To be more specific, I'm trying to use SolrJ in a clojure project. When I try to use SolrJ using what you showed me, I get errors saying lucene classes can't be found, etc. Is there a way to build everything SolrJ (snapshot-4.0) needs into one jar? Matt

On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chheng tommy.chh...@gmail.com wrote: Once you've built the solr 4.0 jar, you can use mvn's install command like this:

mvn install:install-file -DgroupId=org.apache -DartifactId=solr -Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar -DgeneratePom=true

@tommychheng On 10/18/10 7:28 PM, Matt Mitchell wrote: I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started? Thanks, Matt
snapshot-4.0 and maven
I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started? Thanks, Matt
using score to find high confidence duplicates
I have a solr index full of documents that contain lots of duplicates. The duplicates are not exact duplicates though. Each may vary slightly in content. After indexing, I have a bit of code that loops through the entire index just to get what I'm calling target documents. For each target document, I then send another query to find similar documents to the target. This similarity query includes a clause to match the target to itself, so I can have a normalized max score. This was the only way I could figure out how to reasonably fix the scoring range. The response always includes the target at the top, and similar documents afterward. So I take the scores and scale to 0-100, where 100 is always the target matching itself. So far so good... What I want to do is create a confidence score threshold, so I can automatically accept similar documents that have a score above the threshold. If my query *structure* never changes, but only the values in the query change... is it possible to produce a reliable threshold score that I could use? Hope this makes sense :) Matt
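The normalization described above can be sketched as plain post-processing of a result list (hypothetical data; assumes the target's self-match always carries the top raw score):

```python
def similar_above_threshold(results, threshold=80.0):
    """results: list of (doc_id, raw_score) with the target document's
    self-match first (the max score). Scale scores to 0-100 relative to
    the self-match and keep docs at or above the threshold, excluding
    the target itself."""
    top = results[0][1]
    scaled = [(doc, 100.0 * score / top) for doc, score in results]
    return [(doc, sc) for doc, sc in scaled[1:] if sc >= threshold]

hits = [("target", 12.5), ("dup1", 11.0), ("near", 7.0)]
print(similar_above_threshold(hits))  # [('dup1', 88.0)]
```

Whether a fixed threshold like 80 stays reliable as the query values change is exactly the open question: the self-match anchor removes absolute-score drift, but the *relative* gap between true duplicates and near misses still depends on the data, so calibrating against known duplicates (as discussed later in this thread) seems prudent.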
Re: dynamic stop words?
Great, thanks Hoss. I'll try dismax out today and see what happens with this. Matt On Tue, Oct 12, 2010 at 7:35 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is it possible to have certain query terms not affect score, if that : same query term is present in a field? For example, I have an index of that use case is precisely what the DisjunctionMaxQuery (generated by the dismax parser) does for you if you set the tie param to 0 -- when one of the words in the query results in a high score in fieldA, the contribution to the score from that word in all of the other fields is ignored (the tie attribute is multiplied by the score of all the fields that are not the max score contribution) -Hoss
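Hoss's description of the tie param can be sketched numerically (my reading of DisjunctionMaxQuery scoring; the field scores are made up):

```python
def dismax_score(field_scores, tie=0.0):
    """Per-term DisjunctionMax score: the max field score plus tie times
    the sum of the non-max field scores. With tie=0, only the best field
    counts; with tie=1, it degrades to a plain sum across fields."""
    top = max(field_scores)
    return top + tie * (sum(field_scores) - top)

# e.g. a term scoring 4.0 in 'name', 1.5 in 'city', 0.5 in 'description'
print(dismax_score([4.0, 1.5, 0.5], tie=0.0))  # 4.0
print(dismax_score([4.0, 1.5, 0.5], tie=0.5))  # 5.0
```

So with tie=0, "Denver" matching both name and city contributes only its best single-field score, which is the behavior being asked for.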
Re: using score to find high confidence duplicates
No, this isn't the MLT, just the standard query parser for now. I did try the heuristic approach, and I might stick with that actually. I ran the process on known duplicates and created a collection of all scores. I was then able to see how well the query worked. The scores seemed focused in one range, which is promising. I totally forgot about the de-duper; I'll have a look at that and see if I can get it to work. Thanks for your help, Matt On Wed, Oct 13, 2010 at 3:00 PM, Peter Karich peat...@yahoo.de wrote: Hi, are you using moreLikeThis for that feature? I have no suggestion for a reliable threshold; I think this depends on the domain you are operating in and is IMO only solvable with a heuristic. It also depends on fields, boosts, ... It could be that there is a 'score gap' between duplicates and non-duplicates which you can try to find, but I don't know. BTW: did you check http://wiki.apache.org/solr/Deduplication ? If you need deduplication while querying, you could determine a hash value from the procedure above and index that into a different field. Then you can use the collapse feature on that field to remove duplicates. Regards, Peter. I have a solr index full of documents that contain lots of duplicates. The duplicates are not exact duplicates though. Each may vary slightly in content. After indexing, I have a bit of code that loops through the entire index just to get what I'm calling target documents. For each target document, I then send another query to find similar documents to the target. This similarity query includes a clause to match the target to itself, so I can have a normalized max score. This was the only way I could figure out how to reasonably fix the scoring range. The response always includes the target at the top, and similar documents afterward. So I take the scores and scale to 0-100, where 100 is always the target matching itself. So far so good... 
What I want to do is create a confidence score threshold, so I can automatically accept similar documents that have a score above the threshold. If my query *structure* never changes, but only the values in the query change... is it possible to produce a reliable threshold score that I could use? Hope this makes sense :) Matt -- http://jetwick.com twitter search prototype
Re: dynamic stop words?
Thanks for the feedback. I thought about stop words, but since I have a lot of documents spanning lots of different countries, I won't know all of the possible cities, so stop-words could get hard to manage. Also, the city name is in the same field. I think I might try creating a new field called name_no_city, and at index time just strip the city name out? Matt On Sat, Oct 9, 2010 at 11:17 AM, Geert-Jan Brits gbr...@gmail.com wrote: That might work, although depending on your use-case it might be hard to have a good controlled vocab on citynames (hotel metropole bruxelles, hotel metropole brussels, hotel metropole brussel, etc.) Also 'hotel paris bruxelles' stinks... given your example: Doc 1 name = Holiday Inn city = Denver Doc 2 name = Holiday Inn, Denver city = Denver q=name:(Holiday Inn, Denver) turning it upside down, perhaps an alternative would be to query on: q=name:Holiday Inn+city:Denver and configure field 'name' in such a way that doc1 and doc2 score the same. I believe that must be possible, just not sure how to config it exactly at the moment. Of course, it depends on your scenario if you have enough knowledge on the clientside to transform: q=name:(Holiday Inn, Denver) to q=name:Holiday Inn+city:Denver Hth, Geert-Jan 2010/10/9 Otis Gospodnetic otis_gospodne...@yahoo.com Matt, The first thing that came to my mind is that this might be interesting to try with a dictionary (of city names) if this example is not a made-up one. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, October 8, 2010 11:22:36 AM Subject: dynamic stop words? Is it possible to have certain query terms not affect score, if that same query term is present in a field? For example, I have an index of hotels. Each hotel has a name and city. 
If the name of a hotel has the name of the city in its name field, I want to completely ignore that and not have it influence score. Example: Doc 1 name = Holiday Inn city = Denver Doc 2 name = Holiday Inn, Denver city = Denver q=name:(Holiday Inn, Denver) I'd like those docs to have the same score in the response. I don't want Doc2 to have a higher score just because it has all of the query terms. Is this possible without using stop words? I hope this makes sense! Thanks, Matt
Re: dynamic stop words?
Exactly yep. I think that'll work nicely. Thanks Jonathan, Matt On Tue, Oct 12, 2010 at 9:47 AM, Jonathan Rochkind rochk...@jhu.edu wrote: You can identify what words are the city name at index time, because they're the ones in the city field, right? So why not just strip those words out at index time? Create a new field, name_search, and search on that, not name. Doc 1 name = Holiday Inn name_search = Holiday Inn [analyzed, perhaps lowercase normalized etc] city = Denver Doc 2 name = Holiday Inn, Denver name_search = Holiday Inn city = Denver Jonathan From: Matt Mitchell [goodie...@gmail.com] Sent: Tuesday, October 12, 2010 9:24 AM To: solr-user@lucene.apache.org Subject: Re: dynamic stop words? Thanks for the feedback. I thought about stop words but since I have a lot of documents spanning lots of different countries, I won't know all of the possible cities so stop-words could get hard to manage. Also, the city name is in the same field. I think I might try creating a new field called name_no_city, and at index time just strip the city name out? Matt On Sat, Oct 9, 2010 at 11:17 AM, Geert-Jan Brits gbr...@gmail.com wrote: That might work, although depending on your use-case it might be hard to have a good controlled vocab on citynames (hotel metropole bruxelles, hotel metropole brussels, hotel metropole brussel, etc.) Also 'hotel paris bruxelles' stinks... given your example: Doc 1 name = Holiday Inn city = Denver Doc 2 name = Holiday Inn, Denver city = Denver q=name:(Holiday Inn, Denver) turning it upside down, perhaps an alternative would be to query on: q=name:Holiday Inn+city:Denver and configure field 'name' in such a way that doc1 and doc2 score the same. I believe that must be possible, just not sure how to config it exactly at the moment. 
Of course, it depends on your scenario if you have enough knowledge on the clientside to transform: q=name:(Holiday Inn, Denver) to q=name:Holiday Inn+city:Denver Hth, Geert-Jan 2010/10/9 Otis Gospodnetic otis_gospodne...@yahoo.com Matt, The first thing that came to my mind is that this might be interesting to try with a dictionary (of city names) if this example is not a made-up one. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, October 8, 2010 11:22:36 AM Subject: dynamic stop words? Is it possible to have certain query terms not affect score, if that same query term is present in a field? For example, I have an index of hotels. Each hotel has a name and city. If the name of a hotel has the name of the city in its name field, I want to completely ignore that and not have it influence score. Example: Doc 1 name = Holiday Inn city = Denver Doc 2 name = Holiday Inn, Denver city = Denver q=name:(Holiday Inn, Denver) I'd like those docs to have the same score in the response. I don't want Doc2 to have a higher score just because it has all of the query terms. Is this possible without using stop words? I hope this makes sense! Thanks, Matt
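The index-time stripping Jonathan describes might look roughly like this in the ETL layer that builds the name_search field (illustrative helper, not Solr code):

```python
def strip_city(name, city):
    """Index-time helper: remove the city's tokens from the hotel name to
    build a name_search field, so city words can't inflate the score."""
    city_tokens = set(city.lower().split())
    kept = [t for t in name.replace(",", " ").split()
            if t.lower() not in city_tokens]
    return " ".join(kept)

print(strip_city("Holiday Inn, Denver", "Denver"))  # Holiday Inn
print(strip_city("Holiday Inn", "Denver"))          # Holiday Inn
```

After this, both example docs index the same name_search value ("Holiday Inn"), so a query against name_search scores them identically. Multi-word cities ("New York") work too, though a city token that is a genuine part of the name (e.g. "Hotel Paris" in Paris) would be stripped as well, which is the edge case Geert-Jan flags.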
Re: case-insensitive phrase query for string fields
Hey thanks guys! This all makes sense now. I'm using a text field and it's giving good results of course. Matt On Fri, Oct 8, 2010 at 6:08 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Matt - https://issues.apache.org/jira/browse/SOLR-2145 Erik On Oct 7, 2010, at 23:38, Jonathan Rochkind wrote: If you are going to put explicit phrase quotes in the query string like that, an ordinary text field will match fine, on phrase searches or other searches. That is a solr.TextField, not a solr.StrField as you're using. And then you can put a LowerCaseFilter on it of course. And use an ordinary tokenizer, whitespace or worddelimiter or what have you, not the non-tokenizing keywordtokenizer. Just an ordinary solr.TextField. I've never been entirely sure what an indexed solr.StrField is good for exactly. Oh, facets, right. But it's not generally good for matching in an actual 'q', because it's not a tokenized field. Not sure what happens telling a StrField that isn't ever tokenized to use a KeywordTokenizerFactory; maybe it just ignores it, or maybe that's part of the problem. If you mean you only want it to match on _exact_ matches (rather than phrase matches), I haven't quite figured out how to do that in a dismax query where you only want one field of many to behave that way. But for a single field query (in an fq, or as the only field in a standard query parser q), the field defType will do it. Although now I'm wondering if there is a way to trick a StrField into doing that. From: Matt Mitchell [goodie...@gmail.com] Sent: Thursday, October 07, 2010 10:53 PM To: solr-user@lucene.apache.org Subject: case-insensitive phrase query for string fields What's the recommended approach for handling case-insensitive phrase queries? 
I've got this setup, but no luck:

<fieldType name="ci_string" class="solr.StrField">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

So if I index a doc with a title of Golden Master, then I'd expect a query of q=title:golden master to work, but no go... I know I must be missing something super obvious! Matt
dynamic stop words?
Is it possible to have certain query terms not affect score, if that same query term is present in a field? For example, I have an index of hotels. Each hotel has a name and city. If the name of a hotel has the name of the city in its name field, I want to completely ignore that and not have it influence score. Example: Doc 1 name = Holiday Inn city = Denver Doc 2 name = Holiday Inn, Denver city = Denver q=name:(Holiday Inn, Denver) I'd like those docs to have the same score in the response. I don't want Doc2 to have a higher score just because it has all of the query terms. Is this possible without using stop words? I hope this makes sense! Thanks, Matt
case-insensitive phrase query for string fields
What's the recommended approach for handling case-insensitive phrase queries? I've got this setup, but no luck:

<fieldType name="ci_string" class="solr.StrField">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

So if I index a doc with a title of Golden Master, then I'd expect a query of q=title:golden master to work, but no go... I know I must be missing something super obvious! Matt
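Two things worth noting about the config above (my understanding; see the replies in this thread and SOLR-2145): solr.StrField is never analyzed, so the analyzer is effectively ignored, and inside an <analyzer> the tokenizer element is expected to come before the filters anyway. The behavior being asked for amounts to normalizing the whole value the same way at index time and at query time, which a tiny simulation makes concrete:

```python
def normalize(s):
    """What the ci_string field is after: treat the whole value as one
    token, trimmed and lowercased, identically at index and query time."""
    return s.strip().lower()

# 'index' a doc, then look it up with a differently-cased query
index = {normalize("Golden Master"): "doc1"}
print(index.get(normalize("golden master")))  # doc1
```

In Solr terms that is a solr.TextField with KeywordTokenizerFactory followed by LowerCaseFilterFactory; the match only works because both sides go through the exact same normalization.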
Re: clustering component
Hey thanks Stanislaw! I'm going to try this against the current trunk tonight and see what happens. Matt On Wed, Jul 28, 2010 at 8:41 AM, Stanislaw Osinski stanislaw.osin...@carrotsearch.com wrote: The patch should also work with trunk, but I haven't verified it yet. I've just added a patch against solr trunk to https://issues.apache.org/jira/browse/SOLR-1804. S.
clustering component
Hi, I'm attempting to get the carrot based clustering component (in trunk) to work. I see that the clustering contrib has been disabled for the time being. Does anyone know if this will be re-enabled soon, or even better, know how I could get it working as it is? Thanks, Matt
Re: solr with tomcat in cluster mode
We have a similar setup and I'd be curious to see how folks are doing this as well. Our setup: A few servers and an F5 load balancer. Each Solr instance points to a shared index. We use a separate server for indexing. When the index is complete, we do some juggling using the Core Admin SWAP function and update the shared index. I've wondered about having a shared index across multiple instances of (read-only) Solr -- any problems there? Matt On Fri, Jan 22, 2010 at 9:35 AM, ZAROGKIKAS,GIORGOS g.zarogki...@multirama.gr wrote: Hi I'm using solr 1.4 with tomcat in a single pc and I want to turn it in cluster mode with 2 nodes and load balancing But I can't find info how to do Is there any manual or a recorded procedure on the internet to do that Or is there anyone to help me ? Thanks in advance Ps : I use windows server 2008 for OS
Re: solr with tomcat in cluster mode
Hey Otis, We're indexing on a separate machine because we want to keep our production nodes away from processes like indexing. The indexing server also has a ton of resources available, more so than the production nodes. We set it up as an indexing server at one point and have decided to stick with it. We're not indexing the same index as the search indexes because we want to be able to step back a day or two if needed. So we do the SWAP when things are done and OK. So that last part you mentioned about the searchers needing to re-open will happen with a SWAP right? Is your concern that there will be a lag time, making it so the slaves will be out of sync for some small period of time? Would it be simpler/better to move to using Solrs native slave/master feature? I'd love to hear any suggestions you might have. Thanks, Matt On Fri, Jan 22, 2010 at 1:58 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: This should work fine. But why are you indexing to a separate index/core? Why not index in the very same index you are searching? Slaves won't see changes until their searchers re-open. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, January 22, 2010 9:44:03 AM Subject: Re: solr with tomcat in cluster mode We have a similar setup and I'd be curious to see how folks are doing this as well. Our setup: A few servers and an F5 load balancer. Each Solr instance points to a shared index. We use a separate server for indexing. When the index is complete, we do some juggling using the Core Admin SWAP function and update the shared index. I've wondered about having a shared index across multiple instances of (read-only) Solr -- any problems there? 
Matt On Fri, Jan 22, 2010 at 9:35 AM, ZAROGKIKAS,GIORGOS g.zarogki...@multirama.gr wrote: Hi I'm using solr 1.4 with tomcat in a single pc and I want to turn it in cluster mode with 2 nodes and load balancing But I can't find info how to do Is there any manual or a recorded procedure on the internet to do that Or is there anyone to help me ? Thanks in advance Ps : I use windows server 2008 for OS
Re: Solr - data flattening
Can you post a few examples of your source data? What kinds of relationships are you having to deal with? If you want to retain a link to the source then that's pretty simple (field for the file, url etc.). If your relationships will be between the Solr documents themselves, then I think you'd really need to show source examples, and then describe what it is you want in the output/Solr application. Matt On Sun, Jan 17, 2010 at 8:05 PM, Ankit Bhatnagar abhatna...@vantage.comwrote: Hi guys, I have a Question regarding flattening of data for indexing. Scenario - We have tons of records however they come from disparate data sources. How to flatten data so as to retain the relationship? Thanks Ankit
java heap space error when faceting
I have an index with more than 6 million docs. All is well, until I turn on faceting and specify a facet.field. There are only about 20 unique values for this particular facet throughout the entire index. I was able to make things a little better by using facet.method=enum. That seems to work, until I add another facet.field to the request, which is another facet that doesn't have that many unique values. I ultimately end up running out of heap space memory. I should also mention that in every case, the rows param is set to 0. I've thrown as much memory as I can at the JVM (+3G for start-up and max), tweaked filter cache settings etc.. I can't seem to get this error to go away. Anyone have any tips to throw my way? -- using a recent nightly build of solr 1.5 dev and Jetty as my servlet container. Thanks! Matt
Re: java heap space error when faceting
These are single valued fields. Strings and integers. Is there more specific info I could post to help diagnose what might be happening? Thanks! Matt On Sat, Jan 16, 2010 at 10:42 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Sat, Jan 16, 2010 at 10:01 AM, Matt Mitchell goodie...@gmail.com wrote: I have an index with more than 6 million docs. All is well, until I turn on faceting and specify a facet.field. There is only about unique 20 values for this particular facet throughout the entire index. Hmmm, that doesn't sound right... unless you're already near max memory usage due to other things. Is this a single-valued or multi-valued field? If multi-valued, how many values does each document have on average? -Yonik http://www.lucidimagination.com
Re: java heap space error when faceting
I'm embarrassed (but hugely relieved) to say that, the script I had for starting Jetty had a bug in the way it set java options! So, my heap start/max was always set at the default. I did end up using jconsole and learned quite a bit from that too. Thanks for your help Yonik :) Matt On Sat, Jan 16, 2010 at 11:13 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Sat, Jan 16, 2010 at 11:04 AM, Matt Mitchell goodie...@gmail.com wrote: These are single valued fields. Strings and integers. Is there more specific info I could post to help diagnose what might be happening? Faceting on either should currently take ~24MB (6M docs @ 4 bytes per doc + size_of_unique_values) With that small number of values, facet.enum may be faster in general (and take up less room: 6M/8*20 or 15MB). But you certainly shouldn't be running out of space with the heap sizes you mentioned. Perhaps look at the stats.jsp page in the admin and see what's listed in the fieldCache? And verify that your heap is really as big as you think it is. You can also use something like jconsole that ships with the JDK to manually do a GC and check out how much of the heap is in use before you try to facet. -Yonik http://www.lucidimagination.com
Re: Getting solr response data in a JS query
I remember having a difficult time getting jquery to work as I thought it would. Something to do with the wt. I ended up creating a little client lib. Maybe this will be useful in finding your problem? example: http://github.com/mwmitchell/get_rest/blob/master/solr_example.html lib: http://github.com/mwmitchell/get_rest/blob/master/solr_client.jquery.js Matt On Mon, Jan 11, 2010 at 11:22 AM, Gregg Hoshovsky hosho...@ohsu.edu wrote: You might be running into an Ajax restriction. See if an article like this helps. http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/ On 1/9/10 11:37 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Dan, You didn't mention whether you tried wt=json . Does it work if you use that to tell Solr to return its response in JSON format? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Dan Yamins dyam...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:05:54 PM Subject: Getting solr response data in a JS query Hi: I'm trying to figure out how to get solr responses and use them in my website. I'm having some problems figuring out how. 1) My initial thought is to use ajax, and insert a line like this in my script: data = eval($.get(http://localhost:8983/solr/select/?q=*:* ).responseText) ... and then do what I want with the data, with logic being done in Javascript on the front page. However, this is just not working technically: no matter what alternative I use, I always seem to get no response to this query.
I think I'm having exactly the same problem as described here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html and here: http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code Just like those two OPs, I can definitely access my solr responses through a web browser, but my jquery is getting nothing. Unfortunately, in neither thread did the answer seem to have been figured out satisfactorily. Does anybody know what the problem is? 2) As an alternative, I _can_ use the ajax-solr library. Code like this: var Manager; (function ($) { $(function () { Manager = new AjaxSolr.Manager({ solrUrl: 'http://localhost:8983/solr/' }); Manager.init(); Manager.store.addByValue('q', '*:*'); Manager.store.addByValue('rows', '1000'); Manager.doRequest(); }); })(jQuery); does indeed load solr data into my DOM. Somehow, ajax-solr's doRequest method is doing something that makes it possible to receive the proper response from the solr servlet, but I don't know what it is so I can't replicate it with my own ajax. Does anyone know what is happening? (Of course, I _could_ just use ajax-solr, but doing so would mean figuring out how to re-write my existing application for how to display search results in a form that works with the ajax-solr api, and I'd rather avoid this if possible since it looks somewhat nontrivial.) Thanks! Dan
Re: why is XMLWriter declared as final?
OK thanks Shalin. Matt On Wed, Nov 25, 2009 at 8:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell goodie...@gmail.com wrote: Is there any reason the XMLWriter is declared as final? I'd like to extend it for a special case but can't. The other writers (ruby, php, json) are not final. I don't think it needs to be final. Maybe it is final because it wasn't designed to be extensible. Please open a jira issue. -- Regards, Shalin Shekhar Mangar.
Re: why is XMLWriter declared as final?
Interesting. Well just to clarify my intentions a bit, I'll quickly explain what I was trying to do. I'm using the MLT component but because some of my stored fields are really big, I don't need (or want) all of the fields for my MLT docs in the response. I want my MLT docs to have only 2 fields, but I need my main docs fl to have all fields. So a simple override of the XMLWriter writeNamedList method would do the trick. All you have to do is check if the name == moreLikeThis. If so, process the docs and specify a different field list. If not, just call super(). Worked like a charm, but oh well. I really only need the Ruby response anyway, so I'll move on to that. I'm glad this spurred some interest though. -- It'd be great to let components have control over their fl value instead of having a global fl value for all doc lists within a writer? Matt On Wed, Nov 25, 2009 at 2:33 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : I don't think it needs to be final. Maybe it is final because it wasn't : designed to be extensible. Please open a jira issue. it really wasn't, and it probably shouldn't be ... there is another thread currently in progress (in response to SOLR-1592) about this. Given how kludgy the entire API is, i'd really prefer it not be made un-final .. it would need some serious overhaul/review to make it possible to subclass in a sensical way, and coming up with a new API is likely to make a lot more sense then trying to retrofit that one. -Hoss
why is XMLWriter declared as final?
Is there any reason the XMLWriter is declared as final? I'd like to extend it for a special case but can't. The other writers (ruby, php, json) are not final. Thanks, Matt
Re: multicore and ruby
Hey Paul, In rsolr, you could use the #request method to set a request handler path: solr.request('/core1/select', :q => '*:*') Alternatively (rsolr and solr-ruby), you could probably handle this by creating a new instance of a connection object per core, and then have some kind of factory to return connection objects by core name. What kinds of things were you hoping to find when looking for multicore support in either solr-ruby or rsolr? Matt On Wed, Sep 9, 2009 at 12:38 PM, Paul Rosen p...@performantsoftware.com wrote: Hi all, I'd like to start experimenting with multicore in a ruby on rails app. Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it. We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either. I can certainly hack something together, but it seems like this is a common problem. How are others doing multicore from ruby? Thanks, Paul
Re: multicore and ruby
Yep same thing in rsolr and just use the :shards param. It'll return whatever solr returns. Matt On Wed, Sep 9, 2009 at 4:17 PM, Paul Rosen p...@performantsoftware.comwrote: Hi Erik, Yes, I've been doing that in my tests, but I also have the case of wanting to do a search over all the cores using the shards syntax. I was thinking that the following wouldn't work: solr = Solr::Connection.new(' http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1 ') because it has a ? in it. Erik Hatcher wrote: With solr-ruby, simply put the core name in the URL of the Solr::Connection... solr = Solr::Connection.new('http://localhost:8983/solr/core_name') Erik On Sep 9, 2009, at 6:38 PM, Paul Rosen wrote: Hi all, I'd like to start experimenting with multicore in a ruby on rails app. Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it. We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either. I can certainly hack something together, but it seems like this is a common problem. How are others doing multicore from ruby? Thanks, Paul
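The per-core connection factory mentioned above might look like the following minimal sketch. The class name and core names are hypothetical, no network calls are made here, and the shards value follows the convention shown earlier in the thread (host:port/path, no http:// scheme):

```ruby
# Build one connection URL per core from a shared base URL,
# plus a shards parameter value for cross-core searches.
class CoreRegistry
  def initialize(base_url, cores)
    @base  = base_url.chomp('/')
    @cores = cores
  end

  # URL a per-core connection object would be constructed with
  def url_for(core)
    "#{@base}/#{core}"
  end

  # shards param entries conventionally omit the http:// scheme
  def shards_param
    @cores.map { |c| url_for(c).sub(%r{^https?://}, '') }.join(',')
  end
end

reg = CoreRegistry.new('http://localhost:8983/solr', %w[core0 core1])
reg.url_for('core0')   # => "http://localhost:8983/solr/core0"
reg.shards_param       # => "localhost:8983/solr/core0,localhost:8983/solr/core1"
```

A client could then pass `reg.shards_param` as the :shards param on any single core's connection to search across all of them.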
Re: response issues with ruby and json
Thanks Yonik. Yeah the additional facet fields being added by the client do make it a little more complicated. Matt On Sun, Aug 23, 2009 at 12:47 PM, Yonik Seeley yo...@lucidimagination.com wrote: The spellcheck issue needs to be resolved. It doesn't seem like a good idea to access facet.fields by position though - there has never been any guarantee about the order that these come back in, and additional ones could be added as default parameters for example. -Yonik http://www.lucidimagination.com On Thu, Aug 20, 2009 at 10:54 PM, Matt Mitchell goodie...@gmail.com wrote: Hi, I was using the spellcheck component a while ago and noticed that parts of the response are hashes that use duplicate keys. This is the issue here: http://issues.apache.org/jira/browse/SOLR-1071 Also, the facet/facet_fields response is a hash, where the keys are field names. This is mostly fine BUT, when eval'd in Ruby, the resulting key order is not consistent; I think this is pretty normal for most languages. It seems to me that an array of hashes would be more useful to preserve the ordering. For example, we have an application that uses a custom handler that specifies the facet fields. It'd be nice if the response ordering could also be controlled in the solrconfig.xml. I guess I have 2 questions: 1. Does anyone know if the spellcheck component is going to get updated so there are no duplicate keys? 2. How could we get the facet fields into arrays instead of hashes for the ruby response writer? Should I submit a patch? Is this important to anyone else? I guess the alternative is to use the xml response. Thanks, Matt
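One way to sidestep the hash-ordering problem without patching a response writer is Solr's json.nl parameter, which controls how NamedLists (facet_fields is a NamedList) are rendered; json.nl=flat yields an order-preserving flat list that is easy to re-pair client-side. Worth verifying that your wt honors json.nl in your build; a sketch assuming a flat list as input:

```ruby
# facet counts rendered with json.nl=flat come back as
# ["term1", count1, "term2", count2, ...] -- order preserved.
flat = ['Holiday Inn', 12, 'Denver', 9, 'Hilton', 3]

# Re-pair into an ordered array of [term, count] tuples.
pairs = flat.each_slice(2).to_a
# pairs => [["Holiday Inn", 12], ["Denver", 9], ["Hilton", 3]]
```

This keeps whatever order the handler produced, so facet fields no longer need to be accessed by hash key or by fragile position.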
Using DirectConnection or EmbeddedSolrServer, within a component
Hi, I'm experimenting with Solr components. I'd like to be able to use a nice-high-level querying interface like the DirectSolrConnection or EmbeddedSolrServer provides. Would it be considered absolutely insane to use one of those *within a component* (using the same core instance)? Matt
Re: Indexing XML
Saeli, Solr expects a certain XML structure when adding documents. You'll need to come up with a mapping that translates the original structure to one that Solr understands. You can then search Solr and get those Solr documents back. If you want to keep the original XML, you can store it in a field within the Solr document: original data -> mapping -> Solr XML document (with a field for the original data). Does that make sense? Can you describe what it is you want to do with the results of a search? Matt On Tue, Jul 7, 2009 at 10:25 AM, Saeli Mathieu saeli.math...@gmail.com wrote: Hello. I'm a new user of Solr; I already used Lucene to index files and search. But my program was too slow, which is why I was looking for another solution, and I thought I found it. I said I thought because I don't know if it's possible to use Solr with this kind of XML file:

<lom xsi:schemaLocation="http://ltsc.ieee.org/xsd/lomv1.0 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd">
  <general>
    <identifier>
      <catalog>STRING HERE</catalog>
      <entry>STRING HERE</entry>
    </identifier>
    <title>
      <string language="fr">STRING HERE</string>
    </title>
    <language>fr</language>
    <description>
      <string language="fr">STRING HERE</string>
    </description>
  </general>
  <lifeCycle>
    <status>
      <source>STRING HERE</source>
      <value>STRING HERE</value>
    </status>
    <contribute>
      <role>
        <source>STRING HERE</source>
        <value>STRING HERE</value>
      </role>
      <entity>STRING HERE</entity>
    </contribute>
  </lifeCycle>
  <metaMetadata>
    <identifier>
      <catalog>STRING HERE</catalog>
      <entry>STRING HERE</entry>
    </identifier>
    <contribute>
      <role>
        <source>STRING HERE</source>
        <value>STRING HERE</value>
      </role>
      <entity>STRING HERE</entity>
      <date>
        <dateTime>STRING HERE</dateTime>
      </date>
    </contribute>
    <contribute>
      <role>
        <source>STRING HERE</source>
        <value>STRING HERE</value>
      </role>
      <entity>STRING HERE</entity>
      <entity>STRING HERE</entity>
      <entity>STRING HERE</entity>
      <date>
        <dateTime>STRING HERE</dateTime>
      </date>
    </contribute>
    <metadataSchema>STRING HERE</metadataSchema>
    <language>STRING HERE</language>
  </metaMetadata>
  <technical>
    <location>STRING HERE</location>
  </technical>
  <educational>
    <intendedEndUserRole>
      <source>STRING HERE</source>
      <value>STRING HERE</value>
    </intendedEndUserRole>
    <context>
      <source>STRING HERE</source>
      <value>STRING HERE</value>
    </context>
    <typicalAgeRange>
      <string language="fr">STRING HERE</string>
    </typicalAgeRange>
    <description>
      <string language="fr">STRING HERE</string>
    </description>
    <description>
      <string language="fr">STRING HERE</string>
    </description>
    <language>STRING HERE</language>
  </educational>
  <annotation>
    <entity>STRING HERE</entity>
    <date>
      <dateTime>STRING HERE</dateTime>
    </date>
  </annotation>
  <classification>
    <purpose>
      <source>STRING HERE</source>
      <value>STRING HERE</value>
    </purpose>
  </classification>
  <classification>
    <purpose>
      <source>STRING HERE</source>
      <value>STRING HERE</value>
    </purpose>
    <taxonPath>
      <source>
        <string language="fr">STRING HERE</string>
      </source>
      <taxon>
        <id>STRING HERE</id>
        <entry>
          <string language="fr">STRING HERE</string>
        </entry>
      </taxon>
    </taxonPath>
  </classification>
  <classification>
    <purpose>
      <source>STRING HERE</source>
      <value>STRING HERE</value>
    </purpose>
    <taxonPath>
      <source>
        <string language="fr">STRING HERE</string>
      </source>
      <taxon>
        <id>STRING HERE</id>
        <entry>
          <string language="fr">STRING HERE</string>
        </entry>
      </taxon>
    </taxonPath>
    <taxonPath>
      <source>
        <string language="fr">STRING HERE</string>
      </source>
      <taxon>
        <id>STRING HERE</id>
        <entry>
          <string language="fr">STRING HERE</string>
        </entry>
      </taxon>
    </taxonPath>
  </classification>
</lom>

I don't know how I can use this kind of file with Solr, because the XML examples look like this one:
<add>
  <doc>
    <field name="id">SOLR1000</field>
    <field name="name">Solr, the Enterprise Search Server</field>
    <field name="manu">Apache Software Foundation</field>
    <field name="cat">software</field>
    <field name="cat">search</field>
    <field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
    <field name="features">Optimized for High Volume Web Traffic</field>
    <field name="features">Standards Based Open Interfaces - XML and HTTP</field>
    <field name="features">Comprehensive HTML Administration Interfaces</field>
    <field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
    <field name="features">Flexible and Adaptable with XML configuration and Schema</field>
    <field name="features">Good unicode support: h&#xE9;llo (hello with an accent over the e)</field>
    <field name="price">0</field>
    <field name="popularity">10</field>
    <field name="inStock">true</field>
    <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  </doc>
</add>

I understood Solr needs this kind of architecture; by architecture I mean <field name="keyword">Value</field>. As you can see, I can't use this kind of architecture because I'm not allowed to change my XML files. I'm looking forward to your reply. Mathieu Saeli -- Saeli Mathieu.
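The mapping step described above can be sketched in Ruby with the standard library's REXML. This is a hypothetical example: the Solr field names (id, title_t, language, original_xml) and the LOM sample are assumptions, not from a real schema, and only a couple of the LOM paths are mapped to keep it short:

```ruby
require 'rexml/document'

# A tiny stand-in for Saeli's LOM file (hypothetical values).
LOM = <<-XML
<lom>
  <general>
    <identifier><catalog>cat1</catalog><entry>e1</entry></identifier>
    <title><string language="fr">Titre ici</string></title>
    <language>fr</language>
  </general>
</lom>
XML

# Map a LOM record to a flat Solr <doc> element, keeping the
# original XML in a stored field as Matt suggests.
def lom_to_solr_doc(xml)
  lom = REXML::Document.new(xml)
  doc = REXML::Element.new('doc')
  { 'id'       => 'general/identifier/entry',   # solr_field => LOM path
    'title_t'  => 'general/title/string',
    'language' => 'general/language' }.each do |solr_field, path|
    el = lom.elements["lom/#{path}"]
    next unless el
    field = doc.add_element('field', 'name' => solr_field)
    field.text = el.text.to_s.strip
  end
  # stash the untouched source document in its own field
  doc.add_element('field', 'name' => 'original_xml').text = xml
  doc
end

doc = lom_to_solr_doc(LOM)
```

The resulting <doc> element can then be wrapped in <add> and posted to Solr's update handler; the original LOM file itself never needs to change.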
transforming an XML field using the XSL tr param
I know you can transform Solr document fields, but is it possible to have Solr transform XML that might be embedded (as a string) in a field? Matt
Re: [OT] New Book: Search User Interfaces
This is great! Thanks for this. Matt On Mon, Jun 29, 2009 at 12:30 AM, Ian Holsman li...@holsman.net wrote: not directly related to SOLR I know.. but I think most people would find it interesting. http://searchuserinterfaces.com/book/ from the preface: Search is an integral part of people's online lives; people turn to search engines for help with a wide range of needs and desires, from satisfying idle curiosity to finding life-saving health remedies, from learning about medieval art history to finding video game solutions and pop music lyrics. Web search engines are now the second most frequently used online computer application, after email. Not long ago, most software applications did not contain a search module. Today, search is fully integrated into operating systems and is viewed as an essential part of most information systems. Many books on information retrieval describe the algorithms behind search engines and information retrieval systems. By contrast, this book focuses on the human users of search systems and the tool they use to interact with them: the search user interface. Because of their global reach, search user interfaces must be understandable by and appealing to a wide variety of people of all ages, cultures and backgrounds, and for an enormous variety of information needs.
Re: are there any good samples / tutorials on making queries facets ?
Yeah the lucid imagination articles are great! Jonathan, you can also use the dismax query parser and apply boosts using the qf (query fields) param: q=my query hereqf=title^0.5 author^0.1 http://wiki.apache.org/solr/DisMaxRequestHandler#head-af452050ee272a1c88e2ff89dc0012049e69e180 Matt On Sat, Jun 20, 2009 at 10:11 PM, Michel Bottan freakco...@gmail.comwrote: Hi Jonathan, I think this is the best article related to faceted search. http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr On Sat, Jun 20, 2009 at 9:56 PM, Jonathan Vanasco jvana...@2xlp.com wrote: i've gone through the official docs a few times, and then found some offsite stuff of varying quality regarding how-tos. can anyone here recommend either howtos/tutorials or sample applications that they have found worthwhile ? specifically i'm looking to do the following: - with regular searching, query the system with a single term, and have solr search multiple fields - each one having a different weight In order to search into multiple fields and have a different weight for each of them, you could use the Dismax requesthandler and boost each field. - use dismax - boost weights of each field using bq parameter bq=foofield:term^0.5 http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3 - implement faceted browsing i know this is quite easy to do with solr, i'm just not seeing docs that resonate with me yet. thanks! Cheers, Michel
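In a Ruby client the dismax request above is just a params hash; a minimal sketch of building one (the field names and boost values are illustrative, and no request is actually sent here):

```ruby
# Build dismax query params: one user query searched across
# several fields, each with its own boost weight.
def dismax_params(query, boosts)
  {
    :q       => query,
    :defType => 'dismax',
    # qf is a space-separated list of field^boost entries
    :qf      => boosts.map { |field, boost| "#{field}^#{boost}" }.join(' ')
  }
end

params = dismax_params('my query here', [['title', 0.5], ['author', 0.1]])
# params[:qf] => "title^0.5 author^0.1"
```

The hash can be handed to a solr-ruby or rsolr connection as the request parameters for the select handler.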
moreLikeThis fl
I'd like to have a MLT query return similar docs, but the fl for those mlt docs should be different from the main fl. For example, the main fl is *, score -- but I only want the title and id in my MLT results. Is this possible? Matt
Re: grouping response docs together
Thomas, Here's what I get after patching nightly build (yesterday) and running ant test compileTests: [javac] Compiling 1 source file to /Users/mwm4n/Downloads/apache-solr-nightly/build/tests [javac] /Users/mwm4n/Downloads/apache-solr-nightly/src/test/org/apache/solr/search/TestDocSet.java:138: checkEqual(org.apache.lucene.util.OpenBitSet,org.apache.solr.search.DocSet) in org.apache.solr.search.TestDocSet cannot be applied to (org.apache.solr.search.DocSet,org.apache.solr.search.DocSet) [javac] checkEqual(a1, NegatedDocSet.negation(NegatedDocSet.negation(b1))); [javac] ^ [javac] 1 error Matt On Mon, May 25, 2009 at 7:59 PM, Matt Mitchell goodie...@gmail.com wrote: Hi Thomas, In a 5-24-09 nightly build, I applied the patch: cd apache-solr-nightly patch -p0 ~/Projects/apache-solr-patches/SOLR-236_collapsing.patch patching file src/common/org/apache/solr/common/params/CollapseParams.java patching file src/java/org/apache/solr/handler/component/CollapseComponent.java patching file src/java/org/apache/solr/search/CollapseFilter.java patching file src/java/org/apache/solr/search/NegatedDocSet.java patching file src/java/org/apache/solr/search/SolrIndexSearcher.java Hunk #1 succeeded at 1444 (offset -39 lines). patching file src/test/org/apache/solr/search/TestDocSet.java Hunk #1 succeeded at 134 (offset 42 lines). ... 
and got this when running ant dist docs: [mkdir] Created dir: /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/dist/doc [java] Exception in thread main java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main [java] at JsRun.main(Unknown Source) BUILD FAILED /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:338: The following error occurred while executing this line: /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:215: The following error occurred while executing this line: /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1 Not sure what any of that means, but the ant dist task worked fine before the patch. Any ideas? Thanks, Matt On Mon, May 25, 2009 at 3:59 PM, Thomas Traeger t.trae...@kabuco.dewrote: Hello Matt, the patch should work with trunk and after a small fix with 1.3 too (see my comment in SOLR-236). I just made a successful build to be sure. Do you see any error messages? Thomas Matt Mitchell schrieb: Thanks guys. I looked at the dedup stuff, but the documents I'm adding aren't really duplicates. They're very similar, but different. I checked out the field collapsing feature patch, applied the patch but can't get it to build successfully. Will this patch work with a nightly build? Thanks! On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt - you may also want to detect near duplicates at index time: http://wiki.apache.org/solr/Deduplication Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 6:52:48 PM Subject: grouping response docs together Is there a built-in mechanism for grouping similar documents together in the response? I'd like to make it look like there is only one document with multiple hits. Matt
Re: grouping response docs together
Thanks Otis. I'll give the dedup a test drive today. I'll explain what I'm trying to do a little better though because I don't think I have yet! So, I'm indexing an XML file. There are different sections in the XML file. Each of those sections gets a solr doc (the xml text-only is indexed). Each solr doc also has a field to specify the source filename. What I'd like to have happen is, when I do a search, I want my search results to combine all documents that have the same filename... I want to group by filename if that makes sense. Or at the very least, show only one and indicate that there are more. Matt On Tue, May 26, 2009 at 12:58 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt, The Deduplication feature in Solr does support near-duplicate scenario. It comes with a few components to help you detect near-duplicates, and you should be able to write a custom near-dupe detection component and plug it in. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, May 25, 2009 3:30:42 PM Subject: Re: grouping response docs together Thanks guys. I looked at the dedup stuff, but the documents I'm adding aren't really duplicates. They're very similar, but different. I checked out the field collapsing feature patch, applied the patch but can't get it to build successfully. Will this patch work with a nightly build? Thanks! On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt - you may also want to detect near duplicates at index time: http://wiki.apache.org/solr/Deduplication Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 6:52:48 PM Subject: grouping response docs together Is there a built-in mechanism for grouping similar documents together in the response? 
I'd like to make it look like there is only one document with multiple hits. Matt
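Until a field-collapsing build works, the group-by-filename behavior described above can be approximated client-side over an already-fetched result page. A sketch with hypothetical field names, keeping the first hit per file plus a count of the rest:

```ruby
# Collapse a flat doc list into one entry per source filename,
# keeping the top hit and noting how many more hits that file had.
docs = [
  { 'id' => 1, 'filename' => 'a.xml', 'score' => 2.0 },
  { 'id' => 2, 'filename' => 'a.xml', 'score' => 1.5 },
  { 'id' => 3, 'filename' => 'b.xml', 'score' => 1.2 }
]

grouped = docs.group_by { |d| d['filename'] }.map do |file, hits|
  { 'filename' => file, 'top_hit' => hits.first, 'more' => hits.size - 1 }
end
```

The caveat is that this only collapses within the fetched page, so total counts and pagination get tricky; true collapsing has to happen in Solr itself.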
Re: grouping response docs together
Thanks guys. I looked at the dedup stuff, but the documents I'm adding aren't really duplicates. They're very similar, but different. I checked out the field collapsing feature patch, applied the patch but can't get it to build successfully. Will this patch work with a nightly build? Thanks! On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt - you may also want to detect near duplicates at index time: http://wiki.apache.org/solr/Deduplication Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 6:52:48 PM Subject: grouping response docs together Is there a built-in mechanism for grouping similar documents together in the response? I'd like to make it look like there is only one document with multiple hits. Matt
Re: highlighting performance
Thanks Otis. I added termVector=true for those fields, but there isn't a noticeable difference. So, just to be a little more clear, the dynamic fields I'm adding... there might be hundreds. Do you see this as a problem? Thanks, Matt On Fri, May 15, 2009 at 7:48 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt, I believe indexing those fields that you will use for highlighting with term vectors enabled will make things faster (and your index a bit bigger). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 5:08:23 PM Subject: highlighting performance Hi, I'm experimenting with highlighting and am noticing a big drop in performance with my setup. I have documents that use quite a few dynamic fields (20-30). The fields are multiValued stored/indexed text fields, each with a few paragraphs worth of text. My hl.fl param is set to *_t What kinds of things can I tweak to make this faster? Is it because I'm highlighting so many different fields? Thanks, Matt
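The term-vector change Otis suggests is a schema attribute; a hedged sketch for the *_t dynamic fields (the attribute spelling in the schema is termVectors, the "text" field type is assumed from the stock schema, and documents must be re-indexed before the vectors exist):

```
<dynamicField name="*_t" type="text" indexed="true" stored="true"
              multiValued="true"
              termVectors="true" termPositions="true" termOffsets="true"/>
```

Positions and offsets are what let the highlighter skip re-analyzing the stored text; the trade-off is a larger index.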
Re: grouping response docs together
Hi Thomas, In a 5-24-09 nightly build, I applied the patch: cd apache-solr-nightly patch -p0 ~/Projects/apache-solr-patches/SOLR-236_collapsing.patch patching file src/common/org/apache/solr/common/params/CollapseParams.java patching file src/java/org/apache/solr/handler/component/CollapseComponent.java patching file src/java/org/apache/solr/search/CollapseFilter.java patching file src/java/org/apache/solr/search/NegatedDocSet.java patching file src/java/org/apache/solr/search/SolrIndexSearcher.java Hunk #1 succeeded at 1444 (offset -39 lines). patching file src/test/org/apache/solr/search/TestDocSet.java Hunk #1 succeeded at 134 (offset 42 lines). ... and got this when running ant dist docs: [mkdir] Created dir: /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/dist/doc [java] Exception in thread main java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main [java] at JsRun.main(Unknown Source) BUILD FAILED /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:338: The following error occurred while executing this line: /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:215: The following error occurred while executing this line: /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1 Not sure what any of that means, but the ant dist task worked fine before the patch. Any ideas? Thanks, Matt On Mon, May 25, 2009 at 3:59 PM, Thomas Traeger t.trae...@kabuco.de wrote: Hello Matt, the patch should work with trunk and after a small fix with 1.3 too (see my comment in SOLR-236). I just made a successful build to be sure. Do you see any error messages? Thomas Matt Mitchell schrieb: Thanks guys. I looked at the dedup stuff, but the documents I'm adding aren't really duplicates. They're very similar, but different. I checked out the field collapsing feature patch, applied the patch but can't get it to build successfully. Will this patch work with a nightly build? Thanks! 
On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt - you may also want to detect near duplicates at index time: http://wiki.apache.org/solr/Deduplication Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 6:52:48 PM Subject: grouping response docs together Is there a built-in mechanism for grouping similar documents together in the response? I'd like to make it look like there is only one document with multiple hits. Matt
highlighting performance
Hi, I'm experimenting with highlighting and am noticing a big drop in performance with my setup. I have documents that use quite a few dynamic fields (20-30). The fields are multiValued stored/indexed text fields, each with a few paragraphs worth of text. My hl.fl param is set to *_t What kinds of things can I tweak to make this faster? Is it because I'm highlighting so many different fields? Thanks, Matt
grouping response docs together
Is there a built-in mechanism for grouping similar documents together in the response? I'd like to make it look like there is only one document with multiple hits. Matt
Re: highlighting html content
Hi Christian, I decided to do something very similar. How do you handle cases where the highlighting is inside of html/xml tags though? I'm getting stuff like this: ?q=jackson <entry type="song" author="Michael <em>Jackson</em>">Bad by Michael <em>Jackson</em></entry> I wrote a regular expression to take care of the html/xml problem (highlighting inside of the tag). I'd be interested in seeing your and others' approaches to this, even if it's a regular expression. Matt On Tue, Apr 28, 2009 at 3:21 AM, Christian Vogler christian.vog...@gmail.com wrote: Hi Matt, On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com wrote: I've been toying with setting custom pre/post delimiters and then removing them in the client, but I thought I'd ask the list before I go too far with that idea :) this is what I do. I define the custom highlight delimiters as [solr:hl] and [/solr:hl], and then do a string replace with <em class="highlight">...</em> on the search results. It is simple to implement, and effective. Best regards - Christian
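A minimal Ruby sketch of the two ideas combined — Christian's custom delimiters plus a cleanup pass for delimiters that land inside a tag. The method and constant names here are illustrative, not from the thread:

```ruby
# Solr is assumed to be configured with hl.simple.pre=[solr:hl] and
# hl.simple.post=[/solr:hl]. The client then strips any delimiters that
# fell inside an html/xml tag, and converts the survivors to <em> tags.

HL_OPEN  = "[solr:hl]"
HL_CLOSE = "[/solr:hl]"

def clean_highlights(fragment)
  # First remove delimiters that appear inside a tag (between < and >),
  # e.g. inside an attribute value
  cleaned = fragment.gsub(/<[^>]*>/) do |tag|
    tag.gsub(HL_OPEN, "").gsub(HL_CLOSE, "")
  end
  # Then turn the remaining delimiters into real highlight tags
  cleaned.gsub(HL_OPEN, '<em class="highlight">').gsub(HL_CLOSE, "</em>")
end
```

Matches inside attribute values are simply un-highlighted, while matches in text content get the `<em class="highlight">` wrapper.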
Re: Can we provide context dependent faceted navigation from SOLR search results
Wow, this looks great. Thanks for this Koji! Matt On Tue, Apr 28, 2009 at 12:13 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Thanh Doan wrote: Assuming a solr search returns 10 listing items as below 1) 4 digital cameras 2) 4 LCD televisions 3) 2 clothing items If we navigate to /electronics we want solr to show us facets specific to 8 electronics items (e.g. brand, price). If we navigate to /electronics/cameras we want solr to show us facets specific to 4 camera items (e.g. mega-pixels, screen-size, brand, price). If we navigate to /electronics/televisions we want to see different facets and their counts specific to TV items. If we navigate to /clothing we want to obtain totally different facets and their counts. I am not sure if we can think of this as a Hierarchical Facet Navigation system or not. From the UI perspective, we can think of /electronics/cameras as a hierarchical classification. There is a patch for Hierarchical Facet Navigation: https://issues.apache.org/jira/browse/SOLR-64 But how about electronics/cameras/canon vs electronics/canon/cameras? In this case both navigations should show the same result set no matter which facet is selected first. The patch supports a document having multiple hierarchical facet fields, for example: <add> <doc> <field name="name">Canon Brand-new Digital Camera</field> <field name="cat">electronics/cameras/canon</field> <field name="cat">electronics/canon/cameras</field> </doc> </add> Koji My question is with the current solr implementation can we provide context dependent faceted navigation from SOLR search results? Thank you. Thanh Doan
field type for serialized code?
Hi, I'm attempting to serialize a simple ruby object into a solr.StrField - but it seems that what I'm getting back is munged up a bit, in that I can't de-serialize it. Is there a field type for doing this type of thing? Thanks, Matt
highlighting html content
Hi, I've been looking around but can't seem to find any clear instruction on how to do this... I'm storing html content and would like to enable highlighting on the html content. The problem is that the search can sometimes match html element names or attributes, and when the highlighter adds the highlight tags, the html is bad. I've been toying with setting custom pre/post delimiters and then removing them in the client, but I thought I'd ask the list before I go too far with that idea :) Thanks, Matt
storing xml - how to highlight hits in response?
Hi, I'm storing some raw xml in solr (stored and non-tokenized). I'd like to highlight hits in the response, obviously this is problematic as the highlighting elements are also xml. So if I match an attribute value or tag name, the xml response is messed up. Is there a way to highlight only text, that is not part of an xml element? As in, only the text content? Matt
Re: storing xml - how to highlight hits in response?
Yeah great idea, thanks. Does anyone know if there is code out there that will do this sort of thing? Matt On Thu, Apr 23, 2009 at 3:23 PM, Ensdorf Ken ensd...@zoominfo.com wrote: Hi, I'm storing some raw xml in solr (stored and non-tokenized). I'd like to highlight hits in the response, obviously this is problematic as the highlighting elements are also xml. So if I match an attribute value or tag name, the xml response is messed up. Is there a way to highlight only text, that is not part of an xml element? As in, only the text content? You could create a custom Analyzer or Tokenizer that strips everything but the text content. -Ken
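Ken's suggestion is a server-side Analyzer/Tokenizer; a lighter client-side variant (my own sketch, not from the thread) is to derive a text-only copy of the stored xml at index time, put it in a separate field, and highlight against that field:

```ruby
# Rough client-side analogue of a strip-tags tokenizer: drop anything that
# looks like a tag, then normalize whitespace. Good enough for indexing a
# parallel "text only" field; not a full xml parser.
def text_content(xml)
  text = xml.gsub(/<[^>]+>/, " ")  # replace tags with a space
  text.gsub(/\s+/, " ").strip      # collapse runs of whitespace
end
```

The highlighted fragments then contain no markup of their own, so the highlighter's `<em>` tags can't collide with tag names or attributes.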
Re: Solr webinar
Thanks Erik! Looking forward to it. Matt On Mon, Apr 20, 2009 at 11:00 AM, ahammad ahmed.ham...@gmail.com wrote: Hello Erik, I'm interested in attending the Webinar. I just have some questions to verify whether or not I am fit to attend... 1) How will it be carried out? What software or application would I need? 2) Do I have to have any experience or can I attend for the purpose of learning about Solr? Thanks for taking time to do this. Regards Erik Hatcher wrote: (excuse the cross-post) I'm presenting a webinar on Solr. Registration is limited, so sign up soon. Looking forward to seeing some of you there! Thanks, Erik Got data? You can build your own Solr-powered Search Engine! Erik Hatcher, Lucene/Solr Committer and author, will show you how you how to use Solr to build an Enterprise Search engine that indexes a variety data sources all in a matter of minutes! Thursday, April 30, 2009 11:00AM - 12:00PM PDT / 2:00PM - 3:00PM EDT Sign up for this free webinar today at http://www2.eventsvc.com/lucidimagination/?trk=E1 -- View this message in context: http://www.nabble.com/Solr-webinar-tp23138157p23138451.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dismax query not working with 1.4
Do you have qf set? Just last week I had a problem where no results were coming back, and it turned out that my qf param was empty. Matt On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender blaven...@gmail.com wrote: Hello, I'm using the March 18th 1.4 nightly, and I can't get a dismax query to return results. The standard and partitioned query types return data fine. I'm using jetty, and the problem occurs with the default solrconfig.xml as well as the one I am using, which is the Drupal module, beta 6. The problem occurs in the admin interface for solr, though, not just in the end application. And...that's it? I don't know what else to say or offer other than dismax doesn't work, and I'm not sure where else to go to troubleshoot. Any ideas? Ben
Re: Tomcat5 + Solr. Problems in deploying the Webapp
Hi, Have you looked at this page: http://wiki.apache.org/solr/SolrTomcat It almost sounds like you're deploying twice? Putting the solr.war in webapps would be one way, and the other would be a context config file + using the web manager. If you're using the config/context, then don't put the solr.war in webapps, tomcat should do that for you after deploying with the manager. Matt On Wed, Mar 4, 2009 at 8:55 AM, Sudharshan S sudha...@gmail.com wrote: Hi all, I am trying to setup a solr instance with Tomcat5 on a Fedora10 machine. Here is what I did, 1.) Copy the apache-solr-nightly.war to webapps/solr.war 2.) Set solr.solr.home in tomcat.conf 3.) Use the Manager interface of tomcat to deploy the webapp But, while doing so, I get the following exceptions. Mar 4, 2009 6:55:09 PM org.apache.catalina.core.StandardContext filterStart SEVERE: Exception starting filter SolrRequestFilter java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:76) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1173) at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:549) at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:105) at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Thread.java:636) What am I missing? If it matters I am running the nightly build from March 3 2009. Thanks and Regards Sudharshan S Blog : http://www.sudharsh.wordpress.com IRC : Sup3rkiddo @ Freenode, Gimpnet
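For reference, the context-file deployment method mentioned above looks roughly like this (paths and filename are examples, following the SolrTomcat wiki's convention):

```xml
<!-- e.g. conf/Catalina/localhost/solr.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr/home" override="true"/>
</Context>
```

With this in place, the war should stay out of webapps/ and Tomcat deploys it from docBase, with solr/home resolved via JNDI.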
Re: Column Specific Query with q parameter
The syntax for the q param when using dismax is different from standard. Check this out: http://wiki.apache.org/solr/DisMaxRequestHandler#head-df8184dddf870336839490ba276ea6ac566d0bdf q.alt under dismax is parsed using the standard query parser though: http://wiki.apache.org/solr/DisMaxRequestHandler#head-9d23a23915b7932490069d3262ef7f3625e398ff Using dismax with that query... you could do it using the fq param: ?fq=prdMainTitle_product_s:math&qt=dismaxrequest&q.alt=*:* But make sure you understand how the fq param works and how solr uses its caching: http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002 Hope this helps, Matt On Thu, Mar 5, 2009 at 1:30 AM, dabboo ag...@sapient.com wrote: Hi, I am implementing a column-specific query with the q query parameter, e.g. ?q=prdMainTitle_product_s:math&qt=dismaxrequest The above query doesn't work, while if I use the same query with the q.alt parameter, it works: ?q=&q.alt=prdMainTitle_product_s:math&qt=dismaxrequest Please suggest how to achieve this with the q query. Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Column-Specific-Query-with-q-parameter-tp22345960p22345960.html Sent from the Solr - User mailing list archive at Nabble.com.
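To make the fq + q.alt suggestion concrete, here is a small Ruby sketch that assembles the query string with proper & separators and URL escaping (parameter values taken from the example in this thread; the handler name is whatever your solrconfig defines):

```ruby
require "cgi"

# Under dismax, filter on the field via fq and let q.alt match everything.
params = {
  "qt"    => "dismaxrequest",
  "q.alt" => "*:*",
  "fq"    => "prdMainTitle_product_s:math"
}

# Each value is escaped so characters like ':' survive the trip
query_string = params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join("&")
url = "/solr/select?#{query_string}"
```

Forgetting the escaping (or the & separators, as in the mangled URLs above) is an easy way to get zero results with no error.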
Re: solr and tomcat
Hi Matthew, The problem is that we have multiple instances of solr running under one tomcat. So setting -Dsolr.data.dir=foo would set the home for every solr. I guess multi-core might solve my problem, but that'd change our app architecture too much, maybe some other day. I *kind* of have a solution for the permissions thing though: - The project user is part of the tomcat group. - The tomcat user is part of the project user group. - We're making a call to umask 002 in the tomcat catalina.sh file (means all files created will have group write) So when solr (tomcat) creates the index, the files are group writable now and I can remove them, etc.! So, I still need to figure out the data.dir problem. Hmm. Thanks for your help, Matt On Tue, Mar 3, 2009 at 11:31 AM, Matthew Runo mr...@zappos.com wrote: It looks like if you set a -Dsolr.data.dir=foo then you could specify where the index would be stored, yes? Are you properly setting your solr.home? I've never had to set the data directory specifically, Solr has always put it under my home. From solrconfig.xml: <dataDir>${solr.data.dir:./solr/data}</dataDir> Since Solr is running under tomcat, I'd assume that the index will always appear to be owned by tomcat as well. I don't think there is any way to have a different user for the written files - but someone else might want to chime in before you believe me 100% on this one. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote: Hi. I'm sorry if this is the second time this message comes through! A few questions here... #1 Does anyone know how to set the user/group and/or permissions on the index that solr creates? It's always the tomcat user. Is it possible to change this in my context file? Help! #2 I'm deploying Solr via Tomcat and really thought I had this stuff down. But it seems that with some recent system upgrades, my scheme is failing to set the data dir correctly.
I'm deploying solr to tomcat, using a context file as described here: http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac But when I deploy, Tomcat says that it can't find a ./data/index directory -- relative to the tomcat home directory. How can I set the data dir relative to the solr home value I'm specifying in the tomcat context file? Note: a hard-coded absolute path works, but I want to configure at deployment time. In the past, I tried setting the data dir in the same way the solr home is set in the context file without luck. Does this now work in the latest solr nightly? Thanks,
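The group/umask scheme described above can be sketched as follows (user and group names are the examples from the message; run as root, and the exact flags may vary by distribution):

```shell
# Put each user in the other's group so both can touch the index files
usermod -a -G tomcat project    # project user joins the tomcat group
usermod -a -G project tomcat    # tomcat user joins the project group

# Near the top of catalina.sh, before Tomcat starts:
umask 002                       # files Solr creates get group-write
```

Files the Tomcat process writes are then group-writable, so the project user can delete or rebuild the index without sudo.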
Re: solr and tomcat
That's exactly what we're doing (setting the value in each config). The main problem with that is we have multiple people working on each of these solr projects, in different environments. Their data.dir path is always the same (relative) value which works fine under Jetty. But running under tomcat, the data dir is relative to tomcat's home. So an absolute hard-coded path is the only solution. My hope was that we'd be able to override it using the same method as setting the solr/home value in the tomcat context file. The thought of running multiple tomcats is interesting. Do you have any issues with memory or cpu performance? Thanks, Matt On Tue, Mar 3, 2009 at 11:45 AM, Matthew Runo mr...@zappos.com wrote: Perhaps you could hard code it in the solrconfig.xml file for each solr instance? Other than that, what we did was run multiple instances of Tomcat. That way if something goes bad in one, it doesn't affect the others. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 3, 2009, at 8:39 AM, Matt Mitchell wrote: Hi Matthew, The problem is that we have multiple instances of solr running under one tomcat. So setting -Dsolr.data.dir=foo would set the home for every solr. I guess multi-core might solve my problem, but that'd change our app architecture too much, maybe some other day. I *kind* of have a solution for the permissions thing though: - The project user is part of the tomcat group. - The tomcat user is part of the project user group. - We're making a call to umask 002 in the tomcat catalina.sh file (means all files created will have group write) So when solr (tomcat) creates the index, they're group writable now and I can remove etc.! So, I still need to figure out the data.dir problem. Hmm. Thanks for your help, Matt On Tue, Mar 3, 2009 at 11:31 AM, Matthew Runo mr...@zappos.com wrote: It looks like if you set a -Dsolr.data.dir=foo then you could specify where the index would be stored, yes? 
Are you properly setting your solr.home? I've never had to set the data directory specifically, Solr has always put it under my home. From solrconfig.xml: dataDir${solr.data.dir:./solr/data}/dataDir Since Solr is running under tomcat, I'd assume that the index will always appear to be owned by tomcat as well. I don't think there is any way to have a different user for the written files - but someone else might want to chime in before you believe me 100% on this one. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote: Hi. I'm sorry if this is the second time this message comes through! A few questions here... #1 Does anyone know how to set the user/group and/or permissions on the index that solr creates? It's always the tomcat user. Is it possible to change this in my context file? Help! #2 I'm deploying Solr via Tomcat and really thought I had this stuff down. But it seems that with some recent system upgrades, my scheme is failing to set the data dir correctly. I'm deploying solr to tomcat, using a context file as described here: http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac But when I deploy, Tomcat says that it can't find a ./data/index directory -- relative to the tomcat home directory. How can I set the data dir relative to the solr home value I'm specifying in the tomcat context file? Note: a hard-coded absolute path works, but I want to configure at deployment time. In the past, I tried setting the data dir in the same way the solr home is set in the context file without luck. Does this now work in the latest solr nightly? Thanks,
solr and tomcat
Hi. I'm sorry if this is the second time this message comes through! A few questions here... #1 Does anyone know how to set the user/group and/or permissions on the index that solr creates? It's always the tomcat user. Is it possible to change this in my context file? Help! #2 I'm deploying Solr via Tomcat and really thought I had this stuff down. But it seems that with some recent system upgrades, my scheme is failing to set the data dir correctly. I'm deploying solr to tomcat, using a context file as described here: http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac But when I deploy, Tomcat says that it can't find a ./data/index directory -- relative to the tomcat home directory. How can I set the data dir relative to the solr home value I'm specifying in the tomcat context file? Note: a hard-coded absolute path works, but I want to configure at deployment time. In the past, I tried setting the data dir in the same way the solr home is set in the context file without luck. Does this now work in the latest solr nightly? Thanks,
Re: [ANNOUNCE] Solr Logo Contest Results
Love it! Congratulations Michiel. Matt On Wed, Dec 17, 2008 at 9:15 PM, Chris Hostetter hossman_luc...@fucit.org wrote: (replies to solr-user please) On behalf of the Solr Committers, I'm happy to announce that the Solr Logo Contest is officially concluded. (Woot!) And the Winner Is... https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg ...by Michiel We ran into a few hiccups during the contest making it take longer than intended, but the result was a thorough process in which everyone went above and beyond to ensure that the final choice best reflected the wishes of the community. You can expect to see the new logo appear on the site (and in the Solr app) in the next few weeks. Congrats Michiel! -Hoss
strange difference between json and xml responses
Hi, A while ago, we had a field called word which was used as a spelling field. We switched this to spell. When querying our solr instance with just q=*:*, we get back the expected results. When querying our solr instance with q=*:*&wt=json, we get this (below). When setting the qt to dismax, the error goes away but no results come back. Is this a bug in the json response writer? Or more than likely, something I'm completely glossing over? Matt HTTP Status 400 - undefined field word -- *type* Status report *message* *undefined field word* *description* *The request sent by the client was syntactically incorrect (undefined field word).* -- Apache Tomcat/6.0.18
Re: strange difference between json and xml responses
Actually, the dismax thing was a bad example. So, forget about the qt param for now. I did however, search the schema and didn't find a reference to word. The problem comes in when I switch the wt param from xml to json (or ruby). q=*:*&wt=xml == success q=*:*&wt=json == error q=*:*&wt=ruby == error Matt On Tue, Dec 9, 2008 at 5:10 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Matt, You need to edit your solrconfig.xml and look for the word word in the dismax section of the config and change it to spell. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, December 9, 2008 2:08:43 PM Subject: strange difference between json and xml responses Hi, A while ago, we had a field called word which was used as a spelling field. We switched this to spell. When querying our solr instance with just q=*:*, we get back the expected results. When querying our solr instance with q=*:*&wt=json, we get this (below). When setting the qt to dismax, the error goes away but no results come back. Is this a bug in the json response writer? Or more than likely, something I'm completely glossing over? Matt HTTP Status 400 - undefined field word -- *type* Status report *message* *undefined field word* *description* *The request sent by the client was syntactically incorrect (undefined field word).* -- Apache Tomcat/6.0.18
Re: strange difference between json and xml responses
Thanks Yonik. Should submit this as a bug ticket? Currently it's not a deal breaker as we're setting fl manually anyway. Matt On Tue, Dec 9, 2008 at 5:38 PM, Yonik Seeley [EMAIL PROTECTED] wrote: There is probably a document in your index with the field word. The json writers may be less tolerant when encountering a field that is not known. We should perhaps change the json/text based writers to handle this case gracefully also. -Yonik On Tue, Dec 9, 2008 at 5:18 PM, Matt Mitchell [EMAIL PROTECTED] wrote: Actually, the dismax thing was a bad example. So, forget about the qt param for now. I did however, search the schema and didn't find a reference to word. The problem comes in when I switch the wt param from xml to json (or ruby). q=*:*wt=xml == success q=*:*wt=json == error q=*:*wt=ruby == error Matt On Tue, Dec 9, 2008 at 5:10 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Matt, You need to edit your solrconfig.xml and look for the word word in the dismax section of the config and change it to spell. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, December 9, 2008 2:08:43 PM Subject: strange difference between json and xml responses Hi, A while ago, we had a field called word which was used as a spelling field. We switched this to spell. When querying our solr instance with just q=*:*, we get back the expected results. When querying our solr instance with q=*:*wt=json, we get this (below). When setting the qt to dismax, the error goes away but no results come back. Is this a bug in the json response writer? Or more than likely, something I'm completely glossing over? Matt HTTP Status 400 - undefined field word -- *type* Status report *message* *undefined field word* *description* *The request sent by the client was syntactically incorrect (undefined field word).* -- Apache Tomcat/6.0.18
admin/luke and EmbeddedSolrServer
Is it possible to send a request to admin/luke using the EmbeddedSolrServer?
Re: solr-ruby gem
I've been using solr-ruby with 1.3 for quite a while now. It's powering our experimental, open-source OPAC, Blacklight: blacklight.rubyforge.org I've got a custom query builder and response wrapper, but it's using solr-ruby underneath. Matt On Tue, Nov 18, 2008 at 2:57 PM, Erik Hatcher [EMAIL PROTECTED]wrote: On Nov 18, 2008, at 2:41 PM, Kashyap, Raghu wrote: Anyone knows if the solr-ruby gem is compatible with solr 1.3?? Yes, the gem at rubyforge is compatible with 1.3. Also, the library itself is distributed with the binary release of Solr, in client/ruby/solr-ruby/lib Also anyone using acts_as_solr plugin? Off late the website is down and can't find any recent activities on that From my perspective, acts_as_solr is a mess. [My apologies for creating the initial hack that then morphed out of control] There are a lot of users of various versions of acts_as_solr, and discussion of that continues here: http://groups.google.com/group/acts_as_solr. There are numerous github branches each with various patches applied - take your pick and run with one of them :) Or go lighter weight and roll-your-own acts_as_solr by simply putting in after_save/after_destroy hooks. See slide 13 of http://code4lib.org/files/solr-ruby.pdf Erik
questions about Solr connection methods
I'm implementing connection adapters in ruby/jruby and wondering how all of the different solr connection classes relate. Is the only difference between EmbeddedSolrServer and DirectSolrConnection, that EmbeddedSolrServer provides some higher level methods for adding, deleting etc.? Or is there something else happening underneath the covers? If the higher level methods in EmbeddedSolrServer aren't really of use to me, would it be better to use the simpler DirectSolrConnection? Does DirectSolrConnection support multicore? Thanks, Matt
Question about CoreContainer
Hi, I'm using CoreContainer in jRuby. I'd like my data directory to be the standard solr-home/data. But since CoreContainer == multi-core, I need to supply a core name. Is it possible to use CoreContainer without a core? Is it possible to set the dataDir? Also, it seems that no matter what I set as the solr home, a solr directory always gets created in the same directory that I'm executing my script. Thanks, Matt
Re: solr 1.3 - spellcheck doesn't seems to get any data?
Did you send in a spellcheck.build=true? Matt On Fri, Oct 17, 2008 at 7:31 AM, sunnyfr [EMAIL PROTECTED] wrote: Hi, How come I have nothing in my spellchecker directories? I've updated it, but I started from an empty data directory and: [EMAIL PROTECTED]:/data/solr/video/data# ls spellchecker1/ segments.gen segments_1 [EMAIL PROTECTED]:/data/solr/video/data# ls spellchecker2/ segments.gen segments_1 [EMAIL PROTECTED]:/data/solr/video/data# ls spellcheckerFile/ segments.gen segments_1 [EMAIL PROTECTED]:/data/solr/video/data# ls index/ _a.fdt _a.fnm _a.nrm _a.tii _a.tvd _a.tvx _b.fdx _b.frq _b.prx _b.tis _b.tvf _c.fdt _c.fnm _c.nrm _c.tii _c.tvd _c.tvx segments_e _a.fdx _a.frq _a.prx _a.tis _a.tvf _b.fdt _b.fnm _b.nrm _b.tii _b.tvd _b.tvx _c.fdx _c.frq _c.prx _c.tis _c.tvf segments.gen My Files: http://www.nabble.com/file/p20031572/solrconfig.xml solrconfig.xml http://www.nabble.com/file/p20031572/schema.xml schema.xml Thanks, -- View this message in context: http://www.nabble.com/solr-1.3---spellcheck-doesn%27t-seems-to-get-any-data--tp20031572p20031572.html Sent from the Solr - User mailing list archive at Nabble.com.
delete field from index
Hi, I was using a field called word but have changed it to spell. Do I need to delete this field from the index and if so, how? I'm concerned because when I do a query like: ?q.alt=*:*&qt=dismax I get an error saying the word field was not found. Matt
Re: delete field from index
OK I figured it out. It's because my fl had * in it. So, I'm guessing a re-index will remove the word field for good? + Erik for the tip :) Matt On Fri, Oct 17, 2008 at 2:57 PM, Matt Mitchell [EMAIL PROTECTED] wrote: Hi, I was using a field called word but have changed it to spell. Do I need to delete this field from the index and if so, how? I'm concerned because when I do a query like: ?q.alt=*:*&qt=dismax I get an error saying the word field was not found. Matt
populating a spellcheck dictionary
I'm starting to implement the new SpellCheckComponent. The solr 1.3 dist example is using a file based dictionary, but I'd like to figure out the best way to populate the dictionary from our index. Should the spellcheck field be multivalued? Thanks, Matt
Re: populating a spellcheck dictionary
Woops, I was looking at the wrong example solrconfig.xml. Thanks Grant! Matt On Thu, Oct 9, 2008 at 10:01 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: The example in example/solr/conf/solrconfig.xml should show a couple of different options: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="name">default</str> <str name="field">spell</str> <str name="spellcheckIndexDir">./spellchecker1</str> </lst> <lst name="spellchecker"> <str name="name">jarowinkler</str> <str name="field">spell</str> <!-- Use a different Distance Measure --> <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> <str name="spellcheckIndexDir">./spellchecker2</str> </lst> <lst name="spellchecker"> <str name="classname">solr.FileBasedSpellChecker</str> <str name="name">file</str> <str name="sourceLocation">spellings.txt</str> <str name="characterEncoding">UTF-8</str> <str name="spellcheckIndexDir">./spellcheckerFile</str> </lst> </searchComponent> The first two are index based. The spell field for the example is: <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/> HTH, Grant On Oct 9, 2008, at 9:38 AM, Matt Mitchell wrote: I'm starting to implement the new SpellCheckComponent. The solr 1.3 dist example is using a file based dictionary, but I'd like to figure out the best way to populate the dictionary from our index. Should the spellcheck field be multivalued? Thanks, Matt -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Spellchecker Question
I'm using the Spellchecker handler but am a little confused. The docs say to run the cmd=rebuild when building the first time. Do I need to supply a q param with that cmd=rebuild? The examples show a url with the q param set while rebuilding, but the main section on the cmd param doesn't say much about it. My hunch is that I need to supply a q? Thanks, Matt
Re: Installation help
What does the Jetty log output say in the console after you start it? It should mention the port # on one of the last lines. If it does, try using curl or wget to do a local request: curl http://localhost:8983/solr/ wget http://localhost:8983/solr/ Matt On Wed, Apr 16, 2008 at 5:08 PM, Shawn Carraway [EMAIL PROTECTED] wrote: Hi all, I am trying to install Solr with Jetty (as part of another application) on a Linux server running Gentoo linux and JDK 1.6.0_05. When I try to start Jetty (and Solr), it doesn't open a port. I know you will need more info, but I'm not sure what you would need as I'm not clear on how this part works. Thanks, Shawn
custom request handler; standard vs dismax
Hi, I recently started playing with the dismax handler and custom request handlers. When using the solr.StandardRequestHandler class, I get the response that I want: lots of facet values. When I switch to the dismax class, I get none. I've posted my request handler definitions here. Am I missing something totally obvious? Thanks, Matt

p.s. using the latest/nightly build of solr

* an example url:

http://localhost:8983/solr/select/?facet.limit=6&wt=ruby&rows=0&facet=true&facet.mincount=1&facet.offset=0&q=*:*&fl=*,score&qt=catalog&facet.missing=true&facet.field=source_facet&facet.sort=true

* no facet values with this:

<requestHandler name="catalog" class="solr.DisMaxRequestHandler">
  <str name="q.alt">*:*</str>
  <str name="hl">on</str>
</requestHandler>

* lots of facet values with this:

<requestHandler name="catalog" class="solr.StandardRequestHandler">
  <str name="q.alt">*:*</str>
  <str name="hl">on</str>
</requestHandler>
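One thing worth checking (an assumption on my part, not something stated in the thread): a dismax handler normally declares which fields it searches via a qf parameter, and behaves oddly without one. A sketch of what that might look like; the field names below are examples only:

```xml
<requestHandler name="catalog" class="solr.DisMaxRequestHandler">
  <str name="q.alt">*:*</str>
  <!-- qf is hypothetical here; use your own searchable fields -->
  <str name="qf">title^2 text</str>
  <str name="hl">on</str>
</requestHandler>
```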
Re: search for non empty field
Thanks Erik. I think this is the thread here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200709.mbox/[EMAIL PROTECTED]

Matt

On Sun, Mar 30, 2008 at 9:50 PM, Erik Hatcher [EMAIL PROTECTED] wrote:

Documents with a particular field can be matched using:

  field:[* TO *]

Or documents without a particular field with:

  -field:[* TO *]

An empty field? Meaning one that was indexed but with no terms? I'm not sure about that one. Seems like Hoss replied to something similar on this list in the last week or so though; check the archives. Erik

On Mar 30, 2008, at 9:43 PM, Matt Mitchell wrote: I'm looking for the exact same thing.

On Sun, Mar 30, 2008 at 8:45 PM, Ismail Siddiqui [EMAIL PROTECTED] wrote: Hi all, I have a situation where I have to filter results on a non-empty field. A wildcard won't work, since it would have to match a letter. How can I form a query to return results where a particular field is non-empty? Ismail
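The open-ended range trick above drops straight into a filter query. A small Ruby sketch; the field name (title) and the handler URL are examples, not from the thread:

```ruby
require 'uri'

# Presence/absence queries built from the [* TO *] range trick.
field     = 'title'
has_value = "#{field}:[* TO *]"    # documents where the field has a value
no_value  = "-#{field}:[* TO *]"   # documents missing the field

# Example: everything, filtered down to docs that have the field.
url = "http://localhost:8983/solr/select?" +
      URI.encode_www_form('q' => '*:*', 'fq' => has_value)
puts url
```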
Re: search for non empty field
I'm looking for the exact same thing. On Sun, Mar 30, 2008 at 8:45 PM, Ismail Siddiqui [EMAIL PROTECTED] wrote: Hi all, I have a situation where I have to filter results on a non-empty field. A wildcard won't work, since it would have to match a letter. How can I form a query to return results where a particular field is non-empty? Ismail
Using Ruby to POST to Solr
Hi, I just posted this to the ruby/google group. It probably belongs here! Also, anyone know exactly what the @ symbol in the curl command is doing? Thanks, Matt

I've got a script that uses curl, and would like (for educational purposes mind you) to use ruby instead. This is the curl command that works:

F=./my_data.xml
curl 'http://localhost:8080/update' --data-binary @$F -H 'Content-type:text/xml; charset=utf-8'

I've been messing with Net::HTTP using something like below, with variations (Base64.encode64), but nothing works yet. Anyone know the ruby equivalent to the curl version above? Thanks!

# NOT WORKING:
my_url = 'http://localhost:8080/update'
data = File.read('my_data.xml')
url = URI.parse(my_url)
post = Net::HTTP::Post.new(url.path)
post.body = data
post.content_type = 'application/x-www-form-urlencoded; charset=utf-8'
response = Net::HTTP.start(url.host, url.port) do |http|
  http.request(post)
end
puts response.body
Re: Using Ruby to POST to Solr
Hi Michael, Thanks for that. I've got something that's working now:

data = File.read('my_solr_docs.xml')
url = URI.parse('http://localhost:8080/my_solr/update')
http = Net::HTTP.new(url.host, url.port)
response, body = http.post(url.path, data, {'Content-type' => 'text/xml; charset=utf-8'})

Matt

On Sep 11, 2007, at 9:42 AM, Michael Kimsal wrote:

The curl man page states: If you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin. The contents of the file must already be url-encoded. Multiple files can also be specified. Posting data from a file named 'foobar' would thus be done with --data @foobar.

-- Michael Kimsal http://webdevradio.com

Matt Mitchell
Digital Scholarship Services
Box 400129 Alderman Library
University of Virginia
Charlottesville, VA 22904
[EMAIL PROTECTED]
Re: Using Ruby to POST to Solr
Yes! Beautiful. I'll be checking that out. matt

On Sep 11, 2007, at 12:18 PM, Erik Hatcher wrote:

Matt, Try this instead:

gem install solr-ruby # ;)

Then in irb or wherever:

solr = Solr::Connection.new('http://localhost:8983/solr')
solr.add(:id => 123, :title => 'insert title here')
solr.commit
solr.query('title')

Visit us over on the [EMAIL PROTECTED] e-mail list for more on working with Solr from Ruby. Erik

On Sep 11, 2007, at 10:55 AM, Matt Mitchell wrote:

Hi Michael, Thanks for that. I've got something that's working now:

data = File.read('my_solr_docs.xml')
url = URI.parse('http://localhost:8080/my_solr/update')
http = Net::HTTP.new(url.host, url.port)
response, body = http.post(url.path, data, {'Content-type' => 'text/xml; charset=utf-8'})

Matt

Matt Mitchell
Digital Scholarship Services
Box 400129 Alderman Library
University of Virginia
Charlottesville, VA 22904
[EMAIL PROTECTED]
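As an aside, the XML body being posted in this thread can also be assembled in Ruby rather than read from a file. A minimal hand-rolled sketch (this is not solr-ruby's API, and the field names are examples only):

```ruby
require 'cgi'

# Hand-rolled Solr <add> document; field names are examples.
# CGI.escapeHTML handles &, <, > in field values.
def solr_add_xml(fields)
  body = fields.map { |k, v| %(<field name="#{k}">#{CGI.escapeHTML(v.to_s)}</field>) }.join
  "<add><doc>#{body}</doc></add>"
end

puts solr_add_xml('id' => '123', 'title' => 'Hello & goodbye')
# => <add><doc><field name="id">123</field><field name="title">Hello &amp; goodbye</field></doc></add>
```

The resulting string can be posted with the Net::HTTP snippet from this thread.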
solr/home
Hi, I recently upgraded to Solr 1.2. I've set it up through Tomcat using context fragment files, and I deploy using the Tomcat web manager. In the context fragment I set the environment variable solr/home. This used to work as expected: the solr/home value pointed to the directory where data, conf etc. live. Now that value doesn't get used; instead, Tomcat creates new solr and solr/data directories in the same directory as the context fragment file. It's not really a problem in this particular instance. I like the idea of it defaulting to solr in the same location as the context fragment file, as long as I can depend on it always working like that. It is a little puzzling why the value in my environment setting doesn't work, though. Has anyone else experienced this behavior? Matt
Re: solr/home
Here you go:

<Context docBase="/usr/local/lib/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/usr/local/projects/my_app/current/solr-home" />
</Context>

This is the same file I'm putting into the Tomcat manager "XML Configuration file URL" form input. Matt

On Sep 6, 2007, at 3:25 PM, Tom Hill wrote: It works for me (fragments with Solr 1.2 on Tomcat 5.5.20). Could you post your fragment file? Tom
Updating index on cluster
Hi, I'm currently working on an application that lives in a clustered server environment. There is a hardware-based load balancer, and each node in the cluster has a separate install of Solr. The application code and files are on an NFS mount, along with the solr home. The first node has been acting as the master. My question is about reindexing, and in some circumstances schema updates as well. For a reindex, I post to Solr on the master node and then restart the remaining nodes. Is there a better way to do this? For a schema update, I stop the master, delete the data/index dir, start Solr, and then post to Solr on the master node. Then I restart the remaining nodes. Is there a better way to do this? Any tips or feedback are much appreciated! Matt
Delete entire index
Hi, Is there a way to have Solr completely remove the current index? Something like <deleteAll/>? We're still in development and so our schema is wavering. Anytime we make a change and want to re-index, we first have to:

stop tomcat (or the solr webapp)
manually remove data/index
restart tomcat (or the solr webapp)

Removing the data/index directory is where we have the most trouble, because of file permissions. The data/index directory is owned by tomcat/tomcat, so in order to remove it we have to issue sudo rm, which we'd like to avoid. Ideally, if we could just tell Solr to delete all data without having to do any more manual work, it'd be great! : ) Something else that would help is if we could tell Tomcat/Solr which user/group and/or permissions to use on the data/index directory when it's created. Any thoughts on this? Matt
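For what it's worth, Solr's update handler does accept a delete-by-query, which can empty the index over HTTP without touching data/index on disk. A Ruby sketch; the update URL is an assumption for a default install, and the actual HTTP calls are commented out since they need a running Solr:

```ruby
require 'net/http'
require 'uri'

# POST an update message to Solr. The update URL is an assumption.
def post_update(xml, url_string = 'http://localhost:8983/solr/update')
  url = URI.parse(url_string)
  Net::HTTP.new(url.host, url.port).post(url.path, xml,
    'Content-type' => 'text/xml; charset=utf-8')
end

# Delete everything via delete-by-query, then commit.
delete_all = '<delete><query>*:*</query></delete>'
# post_update(delete_all)    # requires a running Solr
# post_update('<commit/>')
puts delete_all
```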
Tomcat: The requested resource (/solr/update) is not available.
Hi, I've got an app using Cocoon and Solr, both running through Tomcat. The post.sh file has been modified to grab local files and send them to Cocoon (via HTTP); the Solr-fied XML from Cocoon is then sent to the update URL in Tomcat/Solr. Not sure any of that is relevant though! I'm running the post.sh file like:

post.sh ../xml/*.xml

which sends all of the files in xml to the post.sh script. Most of the POSTs work fine, but every once in a while I'll get:

The requested resource (/solr/update) is not available.

So my question is this: is there a problem with sending all of those POST requests to Solr at once? Should I be waiting for an OK response before posting the next? Or is it OK to just blast Solr like that? I'm wondering if it's a Tomcat issue? Matt
Commit failing with EOFException
Hi, I've had this application running before and I'm not sure what has changed to cause this error. When trying to do a clean update (removed the index dir and restarted Solr) with just a <commit/>, Solr is returning status 1 with this error at the top:

java.io.EOFException: input contained no data

Does anyone have any idea as to why that's happening? The same thing occurs when I try to use the post.sh script with a valid XML file. Thank you! Matt
Re: Commit failing with EOFException
OK, figured this out. The short of it is: make sure your schema is always up to date! : ) The schema did not match the XML docs being posted. And because we had a previous Solr update with those docs, even trying to post a <commit/> was failing, because there was already bad data waiting to be committed. Matt