Re: Geographical distance searching
Hi Patrick, On 9/27/07, patrick o'leary [EMAIL PROTECTED] wrote: p.s after a little tidy up I'll be adding this to both lucene and solr's repositories if folks feel that it's a useful addition. It's definitely very interesting. Did you compare the performance of Lucene with a database that lets you perform real GIS queries? I'm more of a PostgreSQL guy and I must admit we usually use the cube contrib or PostGIS for this sort of thing; with both, we are able to use indexes for proximity queries and they can be pretty fast. The method you used with MySQL is definitely too slow to be usable once you have a certain amount of data in your table. Regards, -- Guillaume
Re: searching for non-empty fields
While in theory -URL:"" should be valid syntax, the Lucene query parser doesn't accept it and throws a ParseException. I've considered raising this issue on lucene-dev but it didn't seem to affect many users so I decided not to pursue the matter. On 27/09/2007, Chris Hostetter [EMAIL PROTECTED] wrote: ...and to work around the problem until you reindex... q=(URL:[* TO *] -URL:"") ...at least: i'm 97% certain that will work. it won't help if your empty values are really "" or " " or ...
Re: Geographical distance searching
As far as I'm concerned nothing's going to beat PG's GIS calculations, but its tsearch was a lot slower than MyISAM. My goal was a single solution to reduce our complexity, but I'm interested to know if combining an RDBMS and Lucene works for you. Definitely let me know how it goes! P Guillaume Smet wrote: Hi Patrick, On 9/27/07, patrick o'leary [EMAIL PROTECTED] wrote: p.s after a little tidy up I'll be adding this to both lucene and solr's repositories if folks feel that it's a useful addition. It's definitely very interesting. Did you compare the performance of Lucene with a database that lets you perform real GIS queries? I'm more of a PostgreSQL guy and I must admit we usually use the cube contrib or PostGIS for this sort of thing; with both, we are able to use indexes for proximity queries and they can be pretty fast. The method you used with MySQL is definitely too slow to be usable once you have a certain amount of data in your table. Regards, -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile
Re: searching for non-empty fields
thanks Peter, Hoss and Ryan.. q=(URL:[* TO *] -URL:"") This gives me a 400 query parsing error: Cannot parse '(URL:[* TO *] -URL:"")': Lexical error at line 1, column 29. Encountered: "\"" (34). Adding something like:

<filter class="solr.LengthFilterFactory" min="1" max="1"/>

I'll do this but the problem here is I have to wait around for all these docs to re-index.. Your query will work if you make sure the URL field is omitted from the document at index time when the field is blank. The thing is, I thought I was omitting the field if it's blank. It's in a solrj instance that takes a Lucene Document, so maybe it's a solrj issue?

if (URL != null && URL.length() > 5)
    doc.add(new Field("URL", URL, Field.Store.YES, Field.Index.UN_TOKENIZED));

And then during indexing:

SimpleSolrDoc solrDoc = new SimpleSolrDoc();
solrDoc.setBoost(null, new Float(doc.getBoost()));
for (Enumeration<Field> e = doc.fields(); e.hasMoreElements();) {
    Field field = e.nextElement();
    if (!ignoreFields.contains(field.name())) {
        solrDoc.addField(field.name(), field.stringValue());
    }
}
try {
    solr.add(solrDoc);
...
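One subtlety in the guard above: a check like URL.length() > 5 still lets whitespace-only strings through, which would put a "blank" URL field into the index. A minimal sketch in plain Java that closes that hole by trimming first (FieldGuard and shouldIndex are illustrative names, not SolrJ API):

```java
// Illustrative helper, not part of SolrJ: decide whether a field value is
// worth indexing, so blank or whitespace-only URLs never reach the index
// and -URL:"" style query workarounds become unnecessary.
public class FieldGuard {
    static boolean shouldIndex(String value) {
        // Trim before checking length; the original poster's bare
        // length() > 5 check would accept a string of six spaces.
        return value != null && value.trim().length() > 5;
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex("http://example.com")); // true
        System.out.println(shouldIndex(""));                   // false
        System.out.println(shouldIndex("      "));             // false
        System.out.println(shouldIndex(null));                 // false
    }
}
```

The guard would wrap the doc.add(...) call shown above, so the field is simply absent for blank values.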
LockObtainFailedException
will anyone help me why and how?

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:70)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)

Thanks, Jae Joo
Re: LockObtainFailedException
quick fix: look for a lucene lock file in your tmp directory and delete it, then restart solr; it should start. I am an idiot though, so be careful; in fact, I'm worse than an idiot, I know a little :-) you've got a lock file somewhere though, and deleting that will help you out. for me it was in my /tmp directory. On 27 Sep 2007, at 14:10, Jae Joo wrote: will anyone help me why and how? org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:70) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579) at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341) at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259) at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166) at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84) Thanks, Jae Joo
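Matt's cleanup can be sketched as a shell session. The real lock path varies; the trace above points at data/index/write.lock under the Solr home, and a SimpleFSLock may also land in /tmp. This sketch uses a throwaway directory as a stand-in so it is safe to run anywhere:

```shell
# Simulate clearing a stale Lucene write lock. With a real index, stop
# Solr first (never delete the lock while a writer may still be alive)
# and use the path shown in the LockObtainFailedException message.
INDEX_DIR="$(mktemp -d)"              # stand-in for your index directory
touch "$INDEX_DIR/write.lock"         # pretend a crashed writer left this

rm -f "$INDEX_DIR/write.lock"         # the actual fix: remove the stale lock

if [ ! -e "$INDEX_DIR/write.lock" ]; then
  echo "lock cleared"                 # restart Solr at this point
fi
rmdir "$INDEX_DIR"
```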
Re: What is facet?
On Sep 26, 2007, at 7:28 PM, Chris Hostetter wrote: cool = (popularity:[100 TO *] (+numFeatures:[10 TO *] +price:[0 TO 10])) lame = (+popularity:[* TO 99] +numFeatures:[* TO 9] +price:[11 TO *]) That example is definitely in the cool category. I couldn't resist creating a SolrTerminology wiki page linking to your post and breaking out the definitions we Solr folks want to embrace. I think it's a good idea to have some common language definitions we agree upon here. Erik
Re: searching for non-empty fields
On 9/27/07, Pieter Berkel [EMAIL PROTECTED] wrote: While in theory -URL: should be valid syntax, the Lucene query parser doesn't accept it and throws a ParseException. I don't have time to work on that now, but I did just open a bug: https://issues.apache.org/jira/browse/LUCENE-1006 -Yonik
Request for graphics
I am trying to make a presentation on SOLR and have been unable to find the SOLR graphic in high quality. Could someone point me in the right direction or provide the graphics? Thanks, Benjamin Liles Lead Software Application Developer Digital Initiatives - Web Services University Libraries Texas A&M University [EMAIL PROTECTED] 3.109E Library Annex | 5000 TAMU | College Station, TX 77843 Tel. 979.862.4948x122 http://library.tamu.edu
Re: moving index
On 9/27/07, Jae Joo [EMAIL PROTECTED] wrote: I do need to move the index files, but have a concern: any potential problem, including performance? Do I have to keep the original documents for querying? I assume you posted XML documents in Solr XML format (like <add><doc>...)? If so, that is just an example way to get the data into Solr. Those XML files aren't needed, and any high-speed indexing will avoid creating files at all - just create the XML doc in memory and send it to Solr via HTTP POST. -Yonik
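For reference, the Solr XML update format Yonik mentions looks like this (the field names are illustrative and must match your schema):

```xml
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="name">Example document</field>
  </doc>
</add>
```

This payload is sent via HTTP POST to the update handler; once indexed, the file (or in-memory buffer) that carried it serves no further purpose.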
Re: Converting German special characters / umlaute
Chris Hostetter wrote: : is there an analyzer which automatically converts all german special : characters to their specific dissected form, such as ü to ue and ä to : ae, etc.?! See also the ISOLatin1TokenFilter which does this regardless of language. Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to /ae/, etc. Instead, it converts /ü/ to /u/, /ä/ to /a/, etc. It *does* convert /ß/ to /ss/, though I've seen some people write that the correct substitution for /ß/ in German is /sz/ - I don't speak or read German, so I don't know. Maybe there should be an option on ISOLatin1TokenFilter to use German substitutions, in addition to the current behavior of simply stripping diacritics? Does anyone know if there are other (Latin-1-utilizing) languages besides German with standardized diacritic substitutions that involve something other than just stripping the diacritics? Steve
Problem with handle hold deleted files
Hi, I'm using EmbeddedSolrServer, and when I start the snapinstaller process I call the commit method of the embedded Solr through a servlet, but the JVM holds deleted files open at the operating system level and disk usage grows excessively. Here is a sample line from `lsof | grep deleted`:

java 17255 weblogic 419r REG 104,6 437821226462 /domains/solr-indexes/q/OPNPrecoIndex/datasolr/index_22746_preCommit/_2kb.cfs (deleted)

When I restart the JVM process, the open deleted files are released and the disk space is freed. I need help with this case.
Re: LockObtainFailedException
In solrconfig.xml:

<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<maxBufferedDocs>25000</maxBufferedDocs>
<maxMergeDocs>1400</maxMergeDocs>
<maxFieldLength>500</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1</commitLockTimeout>

Is writeLockTimeout too small? Thanks, Jae

On 9/27/07, matt davies [EMAIL PROTECTED] wrote: quick fix: look for a lucene lock file in your tmp directory and delete it, then restart solr; it should start. I am an idiot though, so be careful; in fact, I'm worse than an idiot, I know a little :-) you've got a lock file somewhere though, and deleting that will help you out. for me it was in my /tmp directory. On 27 Sep 2007, at 14:10, Jae Joo wrote: will anyone help me why and how? org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:70) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579) at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341) at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259) at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166) at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84) Thanks, Jae Joo
Re: searching for non-empty fields
On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/27/07, Pieter Berkel [EMAIL PROTECTED] wrote: While in theory -URL:"" should be valid syntax, the Lucene query parser doesn't accept it and throws a ParseException. I don't have time to work on that now, but I did just open a bug: https://issues.apache.org/jira/browse/LUCENE-1006 OK, I lied :-) It was simple (and a nice diversion). -Yonik
Re: Converting German special characters / umlaute
At 12:13 PM -0400 9/27/07, Steven Rowe wrote: Chris Hostetter wrote: : is there an analyzer which automatically converts all german special : characters to their specific dissected form, such as ü to ue and ä to : ae, etc.?! See also the ISOLatin1TokenFilter which does this regardless of language. Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to /ae/, etc. Instead, it converts /ü/ to /u/, /ä/ to /a/, etc. It *does* convert /ß/ to /ss/, though I've seen some people write that the correct substitution for /ß/ in German is /sz/ - I don't speak or read German, so I don't know. You and lots of other people, including myself... Thus while there is indeed a specific dissected form -- certainly German speakers clearly understand that when an input mechanism doesn't allow for umlauted vowels (e.g. ASCII, non-German typewriters) the /ue/, /ae/, etc. equivalents are to be used -- if maximally flexible matching between input texts and queries is desired, an information system used by non-German speakers has to account for them simply ignoring the umlaut and entering /u/, /e/ etc., while /ß/ needs to be matched as itself, /ss/, /sz/ (/ß/ is read as 'ess zed'), and I expect even /b/. So perhaps it would make sense to translate into a canonical format (/ü/ to /ue/ and /ß/ to /ss/) at both index and query time, but also to then emit synonym (overlapping) tokens with /ue/ -> /u/, /sz/ -> /ss/, and perhaps even /b/ -> /ss/. (This is just thinking aloud and I'd love to be corrected by someone with more experience in this realm.) Maybe there should be an option on ISOLatin1TokenFilter to use German substitutions, in addition to the current behavior of simply stripping diacritics? As for implementation, the first part could easily and flexibly be accomplished with the current PatternReplaceFilter, and I'm thinking the second could be done with an extension to that, or better yet a new Filter which allows parsing synonymous tokens from a flat to an overlaid format, e.g.
something on the order of:

<!-- tokensep is not currently implemented -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(.*)(ü|ue)(.*)"
        replacement="$1ue$3|$1u$3" tokensep="|" replace="first"/>

or perhaps better,

<filter class="solr.PatternReplaceFilterFactory" pattern="(.*)(ü|ue)(.*)"
        replacement="$1ue$3|$1u$3" replace="first"/>
<!-- OverlayTokenFilterFactory is not currently implemented -->
<filter class="solr.OverlayTokenFilterFactory" tokensep="|"/>

which in my fantasy implementation would map:

Müller -> Mueller|Muller
Mueller -> Mueller|Muller
Muller -> Muller

and could be run at index-time and/or query-time as appropriate. Does anyone know if there are other (Latin-1-utilizing) languages besides German with standardized diacritic substitutions that involve something other than just stripping the diacritics? I'm curious about this too. - J.J.
Re: Converting German special characters / umlaute
Accent transforms are language-specific, so an accent filter should take an ISO language code as an argument. Some examples:

* In French and English, a dieresis is a hint to pronounce neighboring vowels separately, as in coöp, naïve, or Noël.
* In German, ü transforms to ue.
* In Swedish, ö is a different letter than o, and should not be transformed. The same is true for ø in Danish and Norwegian.
* Then there is Motörhead and Mötley Crüe, see: http://en.wikipedia.org/wiki/Heavy_metal_umlaut
* I don't know of an ISO language code for Tolkien's Elvish, so we're out of luck for Manwë.

Another approach would be to generate the accent-transformed terms as synonyms at the same token position. Then you could generate multiple options. Obviously, we had to do this right for Ultraseek a few years ago. wunder On 9/27/07 9:13 AM, Steven Rowe [EMAIL PROTECTED] wrote: Maybe there should be an option on ISOLatin1TokenFilter to use German substitutions, in addition to the current behavior of simply stripping diacritics? Does anyone know if there are other (Latin-1-utilizing) languages besides German with standardized diacritic substitutions that involve something other than just stripping the diacritics?
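The language-specific substitution table this thread describes can be sketched in a few lines of plain Java. This is a toy illustration, not the Lucene/Solr filter API; GermanFold and fold are invented names, and the table covers only the German cases discussed above:

```java
import java.util.Map;

// Toy German transliteration: umlauts expand to vowel + e, and ß becomes
// ss, as discussed in the thread. A Swedish or Danish table would differ,
// which is why the filter should take a language code.
public class GermanFold {
    static final Map<Character, String> GERMAN = Map.of(
        'ä', "ae", 'ö', "oe", 'ü', "ue",
        'Ä', "Ae", 'Ö', "Oe", 'Ü', "Ue",
        'ß', "ss");

    static String fold(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            // Characters outside the table pass through unchanged.
            out.append(GERMAN.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Müller")); // Mueller
        System.out.println(fold("Straße")); // Strasse
    }
}
```

A real token filter would also need to emit the synonym tokens (Mueller and Muller at the same position) that the overlay idea above calls for; this sketch shows only the canonical mapping.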
Re: Date facetting and ranges overlapping
: I'm now using date facetting to browse events. It works really fine : and is really useful. The only problem so far is that if I have an : event which is exactly on the boundary of two ranges, it is referenced : 2 times. yeah, this is one of the big caveats with date faceting right now ... i struggled with this a bit when designing it, and ultimately decided to punt on the issue. the biggest hangup was that even if the facet counting code was smart about making sure the ranges don't overlap, the range query syntax in the QueryParser doesn't support ranges that exclude one endpoint (so there wouldn't be a lot you could do with the ranges once you know the counts in them). one idea i had in SOLR-258 was that we could add an interval option that would define how much to add to the end of one range to get the start of the next range (think of the current implementation as having interval hardcoded to 0), which would solve the problem and work with range queries that are inclusive of both endpoints, but would require people to use -1MILLI a lot. a better option (assuming a query parser change) would be a new option that says whether each computed range should be inclusive of the low point, the high point, both end points, neither end point, or be smart (where smart is the same as low, except for the last range, where it includes both endpoints). (i think there's already a lucene issue to add the query parser support, i just haven't had time to look at it.) The simple workaround: if you know all of your data is indexed with perfect 0.000-second precision, then put -1MILLI at the end of your start and end date faceting params. -Hoss
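To make the -1MILLI workaround concrete, here is a small standalone Java sketch (not Solr code; class and variable names are illustrative) that builds non-overlapping ranges by pulling each upper bound back one millisecond, which is what appending -1MILLI to the facet end parameter accomplishes when all data has .000-second precision:

```java
import java.time.Duration;
import java.time.Instant;

// Build date-facet-style ranges whose upper bounds are shifted back by
// one millisecond, so an event sitting exactly on a boundary falls into
// only one range instead of being counted twice.
public class DateRanges {
    public static void main(String[] args) {
        Instant start = Instant.parse("2007-09-27T00:00:00.000Z");
        Duration gap = Duration.ofHours(6);
        for (int i = 0; i < 4; i++) {
            Instant lo = start.plus(gap.multipliedBy(i));
            // Upper bound minus 1 ms: inclusive in query syntax, but
            // effectively exclusive of the next range's start.
            Instant hi = lo.plus(gap).minusMillis(1);
            System.out.println(lo + " TO " + hi);
        }
    }
}
```

This also shows the cosmetic cost Guillaume notes later in the thread: the boundaries come back as ...59:59.999 rather than clean ...00:00.000 values, so client code has to expect that.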
Re: custom sorting
On Sep 27, 2007, at 2:50 PM, Chris Hostetter wrote: to answer the broader question of using customized Lucene SortComparatorSource objects in solr -- it is in fact possible. In Solr, all decisions about how to sort are driven by FieldTypes. You can subclass any of the FieldTypes that come with Solr and override just the getSortField method to use whatever sort logic you want, and then use your new FieldType as you would any other plugin... http://wiki.apache.org/solr/SolrPlugins In the case where you have a custom SortComparatorSource that is not field specific (or uses data from more than one field) you would need to make your field type smart enough to let you configure (via the fieldType declaration in the schema) which fields (if any) to get its data from, and then create a marker field of that type, which you don't use to index or store any data, but which you use to indicate when to trigger your custom sort logic, ie...

<fieldType name="distance" class="solr.YourField" latFieldName="latitude" lonFieldName="longitude" stored="false" indexed="false"/>
...
<field name="latitude" type="sint" indexed="true" stored="true"/>
<field name="longitude" type="sint" indexed="true" stored="true"/>
<field name="distance" type="distance"/>

...and then use sort=distance+asc in your query. Using something like this, how would the custom SortComparatorSource get a parameter from the request to use in sorting calculations? I haven't looked under the covers of the local-solr stuff that flew by earlier, but it looks quite well done. I think I can speak for many who would love to have geo field types / sorting capability built into Solr. Erik
Selecting Distinct values?
Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? Thanks, DW
Re: custom sorting
On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote: Using something like this, how would the custom SortComparatorSource get a parameter from the request to use in sorting calculations? perhaps hook in via function query: dist(10.4,20.2,geoloc) And either manipulate the score with that and sort by score, q=+(foo bar)^0 dist(10.4,20.2,geoloc) sort=score asc or extend solr's sorting mechanisms to allow specifying a function to sort by. sort=dist(10.4,20.2,geoloc) asc -Yonik
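Whichever hook is used, the dist() function itself boils down to a per-document great-circle computation. Here is a standalone haversine sketch in plain Java; the class, method name, and radius constant are assumptions for illustration, not taken from local-solr or Solr source:

```java
// Standalone haversine great-circle distance: the kind of computation a
// dist(lat,lon,field) function query or a custom SortComparatorSource
// would run per document when sorting by distance.
public class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0;

    static double distKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        // Distance from a point to itself is zero.
        System.out.println(distKm(10.4, 20.2, 10.4, 20.2));
        // New York to Los Angeles, on the order of 3900 km.
        System.out.println(distKm(40.7128, -74.0060, 34.0522, -118.2437));
    }
}
```

Sorting by such a value is exactly why a marker FieldType or a sort=dist(...) extension is needed: the score is computed from request parameters (the origin point) plus per-document field data, not from the index alone.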
Re: Date facetting and ranges overlapping
On 9/27/07, Chris Hostetter [EMAIL PROTECTED] wrote: a better option (assuming a query parser change) would be a new option that says whether each computed range should be inclusive of the low point, the high point, both end points, neither end point, or be smart (where smart is the same as low, except for the last range, where it includes both) That could be really cool. The simple workaround: if you know all of your data is indexed with perfect 0.000-second precision, then put -1MILLI at the end of your start and end date faceting params. Good idea. The only problem is that I'll have to modify my client code to deal with the fact that solr now returns 17:59:59 instead of 18:00:00. Not difficult, but less clean than before. Thanks for the advice. I'll give it a try. -- Guillaume
Re: Selecting Distinct values?
On 27-Sep-07, at 12:01 PM, David Whalen wrote: Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3 cheers, -Mike
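The relevant knob on that wiki page is facet.limit, which defaults to 100; setting it to -1 lifts the cap so every distinct value comes back. A hypothetical request (host, core, and field name are illustrative):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.limit=-1&facet.mincount=1
```

rows=0 suppresses the document results so the response contains only the facet counts, i.e. the distinct value list.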
Re: Date facetting and ranges overlapping
On 9/27/07, Chris Hostetter [EMAIL PROTECTED] wrote: The simple workaround: if you know all of your data is indexed with perfect 0.000-second precision, then put -1MILLI at the end of your start and end date faceting params. It fixed my problem. Thanks. -- Guillaume
RE: What is facet?
Thank you Ezra and Chris for explaining this, and I like your idea, Erik. This will make the intro to Solr easier for newcomers, and make Solr more popular. -Kuro That example is definitely in the cool category. I couldn't resist creating a SolrTerminology wiki page linking to your post and breaking out the definitions we Solr folks want to embrace. I think it's a good idea to have some common language definitions we agree upon here. Erik
RE: Selecting Distinct values?
<grin> Silly me. Thanks! -----Original Message----- From: Mike Klaas [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 27, 2007 4:46 PM To: solr-user@lucene.apache.org Subject: Re: Selecting Distinct values? On 27-Sep-07, at 12:01 PM, David Whalen wrote: Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3 cheers, -Mike
Re: anyone can send me jetty-plus
If you're using Jetty 6, there's no need for a separate Jetty Plus download. The plus jarfiles come in the standard distribution. --matt On Sep 27, 2007, at 12:10 AM, James liu wrote: i can't download it from http://jetty.mortbay.org/jetty5/plus/ index.html -- regards jl -- Matt Kangas / [EMAIL PROTECTED]