[jira] Commented: (LUCENE-1815) Geohash encode/decode floating point problems
[ https://issues.apache.org/jira/browse/LUCENE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787934#action_12787934 ] patrick o'leary commented on LUCENE-1815: - What Google Code implementation are you working with?
Geohash encode/decode floating point problems - Key: LUCENE-1815 URL: https://issues.apache.org/jira/browse/LUCENE-1815 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Affects Versions: 2.9 Reporter: Wouter Heijke Priority: Minor
I'm finding the Geohash support in the spatial package to be rather unreliable. Here is the outcome of a test that encodes and decodes the same lat/lon and geohash a few times, in the format: action geohash=(latitude, longitude)
encode u173zq37x014=(52.3738007,4.8909347)
decode u173zq37x014=(52.3737996,4.890934)
encode u173zq37rpbw=(52.3737996,4.890934)
decode u173zq37rpbw=(52.3737996,4.89093295)
encode u173zq37qzzy=(52.3737996,4.89093295)
If I now switch to the Google Code implementation:
encode u173zq37x014=(52.3738007,4.8909347)
decode u173zq37x014=(52.37380061298609,4.890934377908707)
encode u173zq37x014=(52.37380061298609,4.890934377908707)
decode u173zq37x014=(52.37380061298609,4.890934377908707)
encode u173zq37x014=(52.37380061298609,4.890934377908707)
Note the differences between the geohashes and the lat/lons in the two runs! Things get worse with low-precision geohashes:
decode u173=(52.0,4.0)
encode u14zg429yy84=(52.0,4.0)
decode u14zg429yy84=(52.0,3.99)
encode u14zg429ywx6=(52.0,3.99)
and with Google Code:
decode u173=(52.20703125,4.5703125)
encode u173=(52.20703125,4.5703125)
decode u173=(52.20703125,4.5703125)
encode u173=(52.20703125,4.5703125)
We use geohashes extensively and will now, unfortunately, use the Google Code version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
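For reference, the standard geohash scheme (http://en.wikipedia.org/wiki/Geohash) bisects longitude and latitude alternately, starting with longitude, and emits the bits five at a time as base-32 characters. Working the bits of u173 by hand gives the cell latitude [52.20703125, 52.3828125] by longitude [4.5703125, 4.921875]: the Google Code values above are exactly its south-west corner, while the contrib decode (52.0, 4.0) lies outside the cell. A minimal sketch, assuming nothing about either implementation (`GeoHashSketch` and its methods are hypothetical names): decoding to the cell center and re-encoding at the same length reproduces the hash, whereas feeding a rounded decode back into the encoder can land in a different cell, which is consistent with the drift in the first run.

```java
/** Minimal geohash sketch (hypothetical; not the contrib/spatial or Google Code implementation). */
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes (lat, lon) into a geohash with the given number of characters. */
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int ch = 0;
        while (hash.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double value = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) { // upper half: emit 1
                ch |= 1;
                range[0] = mid;
            } else {            // lower half: emit 0
                range[1] = mid;
            }
            lonBit = !lonBit;
            if (++bitCount == 5) { // five bits per base-32 character
                hash.append(BASE32.charAt(ch));
                bitCount = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    /** Returns the cell bounds {minLat, maxLat, minLon, maxLon} of a geohash. */
    static double[] decodeBounds(String hash) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        boolean lonBit = true;
        for (int i = 0; i < hash.length(); i++) {
            int cd = BASE32.indexOf(Character.toLowerCase(hash.charAt(i)));
            if (cd < 0) {
                throw new IllegalArgumentException("invalid geohash character: " + hash.charAt(i));
            }
            for (int mask = 16; mask > 0; mask >>= 1) { // most significant bit first
                double[] range = lonBit ? lonRange : latRange;
                double mid = (range[0] + range[1]) / 2;
                if ((cd & mask) != 0) {
                    range[0] = mid;
                } else {
                    range[1] = mid;
                }
                lonBit = !lonBit;
            }
        }
        return new double[] {latRange[0], latRange[1], lonRange[0], lonRange[1]};
    }

    /** Decodes to the cell center, which re-encodes to the same hash at the same length. */
    static double[] decodeCenter(String hash) {
        double[] b = decodeBounds(hash);
        return new double[] {(b[0] + b[1]) / 2, (b[2] + b[3]) / 2};
    }
}
```

Because every cell boundary is a dyadic fraction of the full range, the center is exactly representable as a double and lies strictly inside the cell, so `encode(decodeCenter(h), h.length())` returns `h` without any rounding tolerance.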
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1571) DistanceFilter problem with deleted documents
[ https://issues.apache.org/jira/browse/LUCENE-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719272#action_12719272 ] patrick o'leary commented on LUCENE-1571: - Patch looks good to me; tests passed as well.
DistanceFilter problem with deleted documents - Key: LUCENE-1571 URL: https://issues.apache.org/jira/browse/LUCENE-1571 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Environment: N/A Reporter: Phillip Rhodes Assignee: Michael McCandless Fix For: 2.9 Attachments: LUCENE-1571.patch
I know this is the locallucene lib, but I wanted to make sure we don't get this bug when it gets into Lucene contrib. I suspect the filter is trying to evaluate deleted documents: I did some debugging and confirmed (using Luke) that it is bombing on a document that is marked as deleted. Thanks!
Using the locallucene library 1.51, I get a NullPointerException at line 123 of DistanceFilter, in the method public BitSet bits(IndexReader reader), on the line double x = NumberUtils.SortableStr2double(sx); The stack trace is:
java.lang.NullPointerException
  at org.apache.solr.util.NumberUtils.SortableStr2long(NumberUtils.java:149)
  at org.apache.solr.util.NumberUtils.SortableStr2double(NumberUtils.java:104)
  at com.pjaol.search.geo.utils.DistanceFilter.bits(DistanceFilter.java:123)
  at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:140)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:112)
  at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:113)
  at org.apache.lucene.search.Hits.<init>(Hits.java:90)
  at org.apache.lucene.search.Searcher.search(Searcher.java:72)
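The guard the reporter suspects is missing can be sketched without Lucene: skip documents marked as deleted (in Lucene 2.x, `IndexReader.isDeleted(int)` reports this) and treat a missing stored value as "no match" instead of decoding it. In this hypothetical sketch, plain arrays stand in for one index segment, and the coordinates are plain decimal strings rather than Solr's sortable-string encoding; it is not the locallucene `DistanceFilter` or the LUCENE-1571 patch.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a radius filter over one segment (not the locallucene DistanceFilter). */
final class DistanceFilterSketch {
    /**
     * @param deleted   bit i is set if document i is deleted
     * @param latValues stored latitude per document; may be null (e.g. for deleted documents)
     * @param lngValues stored longitude per document; may be null
     */
    static BitSet bits(BitSet deleted, String[] latValues, String[] lngValues,
                       double lat, double lng, double radiusMiles) {
        int maxDoc = latValues.length;
        BitSet result = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (deleted.get(doc)) {
                continue; // deleted documents never match, so never decode their values
            }
            String sx = latValues[doc];
            String sy = lngValues[doc];
            if (sx == null || sy == null) {
                continue; // a document without coordinates cannot be inside the radius
            }
            double x = Double.parseDouble(sx);
            double y = Double.parseDouble(sy);
            if (distanceMiles(lat, lng, x, y) <= radiusMiles) {
                result.set(doc);
            }
        }
        return result;
    }

    /** Haversine great-circle distance in miles, assuming a mean Earth radius of 3958.8 miles. */
    static double distanceMiles(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * 3958.8 * Math.asin(Math.sqrt(a));
    }
}
```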
ReadOnlyMultiSegmentReader bitset id vs doc id
hey, I've got a filter that's storing document IDs with a geo distance for spatial lucene, using a bitset position for the doc id. However, with a MultiSegmentReader that's no longer going to work. What's the most appropriate way to go from bitset position to doc id now? Thanks Patrick
Re: ReadOnlyMultiSegmentReader bitset id vs doc id
Think I may have found it: it was multiple runs of the filter, one for each segment reader, and I was generating a new map to hold distances each time, so only the distances from the last segment reader were stored. Currently it looks like those segmented searches are done serially (well, in Solr they are). I presume the end goal is to make them multi-threaded? I'll need to make my map synchronized. On Tue, Apr 28, 2009 at 4:42 PM, Uwe Schindler u...@thetaphi.de wrote: What is the problem exactly? Maybe you use the new Collector API, where the search is done for each segment, so caching does not work correctly? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -- *From:* patrick o'leary [mailto:pj...@pjaol.com] *Sent:* Tuesday, April 28, 2009 10:31 PM *To:* java-dev@lucene.apache.org *Subject:* ReadOnlyMultiSegmentReader bitset id vs doc id hey, I've got a filter that's storing document IDs with a geo distance for spatial lucene, using a bitset position for the doc id. However, with a MultiSegmentReader that's no longer going to work. What's the most appropriate way to go from bitset position to doc id now? Thanks Patrick
Re: ReadOnlyMultiSegmentReader bitset id vs doc id
Ok, finally, with some pointers from Ryan, I figured out the last problem. So, as a note to anyone else who might encounter the same problems with a multireader:
A) Directories can contain multiple segments, and a reader for those segments
B) Searches are replayed within each reader in a serial fashion **
C) If you are utilizing FieldCache / BitSet or anything related to document position within a reader, and you need the docId: document id = (sum of the previous readers' maxDocs) + bitset position
e.g.
int nextOffset; // sum of maxDoc() over the readers already processed

public DocIdSet getDocIdSet(IndexReader reader) {
  OpenBitSet bitset = new OpenBitSet(reader.maxDoc());
  for (int i = 0; i < reader.maxDoc(); i++) {
    // ... filter stuff
    if (good) {
      bitset.set(i);
      int docId = i + nextOffset; // position in this reader + offset of this reader
      // ...
    }
  }
  nextOffset += reader.maxDoc(); // advance once per reader, after it has been processed
  // ...
}
K, works, time for sleep P On Tue, Apr 28, 2009 at 5:44 PM, patrick o'leary pj...@pjaol.com wrote: Think I may have found it: it was multiple runs of the filter, one for each segment reader, and I was generating a new map to hold distances each time, so only the distances from the last segment reader were stored. Currently it looks like those segmented searches are done serially (well, in Solr they are). I presume the end goal is to make them multi-threaded? I'll need to make my map synchronized. On Tue, Apr 28, 2009 at 4:42 PM, Uwe Schindler u...@thetaphi.de wrote: What is the problem exactly? Maybe you use the new Collector API, where the search is done for each segment, so caching does not work correctly? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -- *From:* patrick o'leary [mailto:pj...@pjaol.com] *Sent:* Tuesday, April 28, 2009 10:31 PM *To:* java-dev@lucene.apache.org *Subject:* ReadOnlyMultiSegmentReader bitset id vs doc id hey, I've got a filter that's storing document IDs with a geo distance for spatial lucene, using a bitset position for the doc id. However, with a MultiSegmentReader that's no longer going to work.
What's the most appropriate way to go from bitset position to doc id now? Thanks Patrick
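The offset rule in point C can be sketched in plain Java: keep one running base equal to the sum of `maxDoc()` over the readers already visited, add it to each per-segment bit position, and advance it by the current reader's `maxDoc()` only after that segment is done. The distances go into a single map shared across the per-segment runs, so later segments no longer replace earlier ones. `SegmentOffsetSketch` and its parameters are hypothetical stand-ins for the sub-readers and the real filter logic; the `ConcurrentHashMap` makes the map itself safe to share, but the running base still assumes segments are visited one after another, as observed in this thread.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntPredicate;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: map per-segment bit positions to top-level doc ids. */
final class SegmentOffsetSketch {
    private int docBase = 0; // sum of maxDoc() of all segments already processed
    final Map<Integer, Double> distances = new ConcurrentHashMap<>(); // shared, not recreated per segment

    /** Processes one segment; 'matches' and 'distanceOf' stand in for the real filter logic. */
    BitSet filterSegment(int maxDoc, IntPredicate matches, IntToDoubleFunction distanceOf) {
        BitSet bits = new BitSet(maxDoc); // positions are relative to this segment
        for (int i = 0; i < maxDoc; i++) {
            if (matches.test(i)) {
                bits.set(i);
                int docId = docBase + i; // top-level document id
                distances.put(docId, distanceOf.applyAsDouble(i));
            }
        }
        docBase += maxDoc; // advance only after this segment is finished
        return bits;
    }
}
```

For segments with maxDoc 3, 2 and 4, the third segment's local position 0 maps to document id 5.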
Re: Spatial package plans
Free world, help yourself :-) On Wed, Apr 22, 2009 at 6:39 PM, Wouter Heijke whei...@xs4all.nl wrote: The number of replies and the state of the code make me think that making my own distance filter using a real GIS solution like GeoTools is the way to go. I wonder anyway whether GIS code should be in any Lucene package.. Wouter
Re: Spatial package plans
Yeah, it's hard-coded to use miles; 5 years in the US gets to you.. But the functionality doesn't change: the radius is a double, so you just need to convert km to miles for the DistanceQueryBuilder, and convert back from miles to km for display. On Mon, Apr 20, 2009 at 8:14 AM, Wouter Heijke whei...@xs4all.nl wrote: I'm working on local search functionality and am about to use the spatial code in contrib. I managed to get a proof of concept running using LatLongDistanceFilter. The only problem I have with this filter is that it is hardcoded to use Miles! Basically my question is: what are the plans for the spatial code? Is it going to stay the way it is? Wouter
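Since the radius is just a double in miles, the conversion can live entirely on the caller's side. A minimal sketch using the international mile, which is exactly 1.609344 km; the `DistanceUnits` helper is hypothetical, and the converted value is what would be passed as the radius to DistanceQueryBuilder.

```java
/** Hypothetical helpers for using a miles-only distance API with kilometres. */
final class DistanceUnits {
    static final double KM_PER_MILE = 1.609344; // international mile, exact by definition

    /** Converts a user-supplied radius in km to the miles the spatial code expects. */
    static double kmToMiles(double km) {
        return km / KM_PER_MILE;
    }

    /** Converts a distance reported in miles back to km for display. */
    static double milesToKm(double miles) {
        return miles * KM_PER_MILE;
    }
}
```

For example, a 10 km search radius becomes `DistanceUnits.kmToMiles(10.0)`, roughly 6.2137 miles.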
[jira] Updated: (LUCENE-1588) Update Spatial Lucene sort to use FieldComparatorSource
[ https://issues.apache.org/jira/browse/LUCENE-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary updated LUCENE-1588: Attachment: LUCENE-1588.patch Deprecate DistanceSortSource and add DistanceFieldComparator; updated the test case to use DistanceFieldComparator. Usage
{code}
// Create a distance sort.
// As the radius filter has already performed the distance calculations,
// pass in the filter to reuse the results.
DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(dq.distanceFilter);
Sort sort = new Sort(new SortField("foo", dsort, false));

// Perform the search, using the term query, the serial chain filter, and the
// distance sort
Hits hits = searcher.search(customScore, dq.getFilter(), sort);
{code}
If nobody objects I'll apply this later today. Update Spatial Lucene sort to use FieldComparatorSource --- Key: LUCENE-1588 URL: https://issues.apache.org/jira/browse/LUCENE-1588 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Affects Versions: 2.9 Reporter: patrick o'leary Assignee: patrick o'leary Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1588.patch Update distance sorting to use FieldComparator sorting as opposed to SortComparator
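The reason for handing `dq.distanceFilter` to the comparator source is that distances computed while filtering are reused for sorting instead of being recomputed. A plain-Java sketch of that reuse, assuming the filter left a map from doc id to distance; this shape and the `PrecomputedDistanceSort` name are hypothetical, not the internals of DistanceFieldComparatorSource.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sort by distances a filter has already computed. */
final class PrecomputedDistanceSort {
    /** Returns matching doc ids ordered by stored distance; reverse=false means nearest first. */
    static List<Integer> sortByDistance(Map<Integer, Double> distances, boolean reverse) {
        Comparator<Integer> byDistance = Comparator.comparingDouble(distances::get); // lookup, no recomputation
        List<Integer> docs = new ArrayList<>(distances.keySet());
        docs.sort(reverse ? byDistance.reversed() : byDistance);
        return docs;
    }
}
```

The `reverse` flag plays the same role as the `false` passed to `SortField` in the usage above.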
Re: List Moderators
Is it also worthwhile to check whether a static signature with instructions, or a link to the Apache mail instructions, can be added to mails? It would reduce a lot of repeat questions. On Thu, Mar 26, 2009 at 2:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Every now and again, someone emails me off list asking to be removed from the : list and I always forward them to Erik, b/c I know he is a moderator. : However, I was wondering who else is besides Erik, since, AIUI, there needs to : be at least 3 in ASF-land, right? : : So, if you're a list moderator for dev/user, please stand up. the docs for say committers have instructions for checking the moderators for any list, however the process seems to no longer work (probably because mail handling got moved onto a different box)... http://www.apache.org/dev/committers.html#mailing-list-moderators https://svn.apache.org/repos/private/committers/docs/resources.txt ...might be worth following up with INFRA to sanity check the list of moderators on all lucene lists, make sure we have three *active* moderators on each list. -Hoss
[jira] Created: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
Fix for NPE's in Spatial Lucene for searching bounding box only --- Key: LUCENE-1568 URL: https://issues.apache.org/jira/browse/LUCENE-1568 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Reporter: patrick o'leary Assignee: patrick o'leary Priority: Minor NPE occurs when using DistanceQueryBuilder for minimal bounding box search without the distance filter.
[jira] Updated: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary updated LUCENE-1568: Attachment: LUCENE-1568.patch Fixes an NPE when using DistanceQueryBuilder for just minimal bounding box searches, e.g.
{code}
final DistanceQueryBuilder dq = new DistanceQueryBuilder(
    latitude,
    longitude,
    radius,
    latField,   // name of latitude field in index
    lngField,   // name of longitude field in index
    tierPrefix, // prefix of tier fields in index
    false       /* filter by radius; false means mbb search */
);
{code}
Fix for NPE's in Spatial Lucene for searching bounding box only --- Key: LUCENE-1568 URL: https://issues.apache.org/jira/browse/LUCENE-1568 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Reporter: patrick o'leary Assignee: patrick o'leary Priority: Minor Attachments: LUCENE-1568.patch NPE occurs when using DistanceQueryBuilder for minimal bounding box search without the distance filter.
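The failure mode described here is a query that always consults its radius stage even when the last constructor argument asks for a bounding-box-only search. A hypothetical sketch of the guard pattern, with predicates standing in for the real spatial filters (`BoxThenRadiusQuery` is not part of contrib/spatial): the radius stage is only kept, and only consulted, when it was requested.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch of an optional radius stage on top of a bounding-box stage. */
final class BoxThenRadiusQuery {
    private final IntPredicate inBoundingBox;
    private final IntPredicate withinRadius; // null for a bounding-box-only (mbb) search

    BoxThenRadiusQuery(IntPredicate inBoundingBox, IntPredicate withinRadius, boolean needsDistance) {
        this.inBoundingBox = inBoundingBox;
        this.withinRadius = needsDistance ? withinRadius : null;
    }

    BitSet matches(int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            // Guard: skip the radius check instead of dereferencing a filter that was never built.
            if (inBoundingBox.test(doc) && (withinRadius == null || withinRadius.test(doc))) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```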
[jira] Commented: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683951#action_12683951 ] patrick o'leary commented on LUCENE-1568: - If nobody objects I'll commit this later today Fix for NPE's in Spatial Lucene for searching bounding box only --- Key: LUCENE-1568 URL: https://issues.apache.org/jira/browse/LUCENE-1568 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Reporter: patrick o'leary Assignee: patrick o'leary Priority: Minor Fix For: 2.9 Attachments: LUCENE-1568.patch NPE occurs when using DistanceQueryBuilder for minimal bounding box search without the distance filter.
[jira] Closed: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary closed LUCENE-1568. --- resolved Fix for NPE's in Spatial Lucene for searching bounding box only --- Key: LUCENE-1568 URL: https://issues.apache.org/jira/browse/LUCENE-1568 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Reporter: patrick o'leary Assignee: patrick o'leary Priority: Minor Fix For: 2.9 Attachments: LUCENE-1568.patch NPE occurs when using DistanceQueryBuilder for minimal bounding box search without the distance filter.
Committed revision 735928.
Committed revision 735928. Adding myself to the contrib committers list / testing karma. Thanks Patrick
scootie:site pjaol$ svn diff docs/*.html
Index: docs/whoweare.html
===
--- docs/whoweare.html (revision 735927)
+++ docs/whoweare.html (working copy)
@@ -285,6 +285,9 @@
 <b>Wolfgang Hoschek</b> (whosc...@...)</li>
 <li>
+<b>Patrick O'Leary</b> (pj...@...)</li>
+
+<li>
 <b>Uwe Schindler</b> (uschind...@...)</li>
 <li>
@@ -300,7 +303,7 @@
 </div>
-<a name="N10087"></a><a name="emeritus"></a>
+<a name="N1008C"></a><a name="emeritus"></a>
 <h2 class="boxed">Emeritus Committers</h2>
 <div class="section">
 <ul>
scootie:site pjaol$ svn diff src/documentation/content/xdocs/whoweare.xml
Index: src/documentation/content/xdocs/whoweare.xml
===
--- src/documentation/content/xdocs/whoweare.xml (revision 735927)
+++ src/documentation/content/xdocs/whoweare.xml (working copy)
@@ -31,6 +31,7 @@
 <section id="contrib"><title>Contrib Committers</title>
 <ul>
 <li><b>Wolfgang Hoschek</b> (whosc...@...)</li>
+<li><b>Patrick O'Leary</b> (pj...@...)</li>
 <li><b>Uwe Schindler</b> (uschind...@...)</li>
 <li><b>Andi Vajda</b> (va...@...)</li>
 <li><b>Karl Wettin</b> (ka...@...)</li>
[jira] Updated: (LUCENE-1512) Incorporate GeoHash in contrib/spatial
[ https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary updated LUCENE-1512: Attachment: LUCENE-1512.patch Made the necessary changes:
* Formatting fixed
* Removed dependency on LUCENE-1504
* Moved GeoHash elements into o.a.l.spatial.geohash
Incorporate GeoHash in contrib/spatial -- Key: LUCENE-1512 URL: https://issues.apache.org/jira/browse/LUCENE-1512 Project: Lucene - Java Issue Type: New Feature Components: contrib/spatial Reporter: patrick o'leary Priority: Minor Attachments: LUCENE-1512.patch, LUCENE-1512.patch Based on comments from Yonik and Ryan in SOLR-773. GeoHash provides the ability to store latitude / longitude values in a single, consistent hash field. This eliminates the need to maintain 2 field caches for latitude / longitude fields, reducing the size of an index and the amount of memory needed for a spatial search.
[jira] Commented: (LUCENE-1304) Memory Leak when using Custom Sort (i.e., DistanceSortSource) of LocalLucene with Lucene
[ https://issues.apache.org/jira/browse/LUCENE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12661199#action_12661199 ] patrick o'leary commented on LUCENE-1304: - How will LUCENE-1483 impact this immediately? I'd really like to get this patch in first and refactor if and when 1483 goes in; the benefit of bypassing the static comparator cache is really needed. Memory Leak when using Custom Sort (i.e., DistanceSortSource) of LocalLucene with Lucene Key: LUCENE-1304 URL: https://issues.apache.org/jira/browse/LUCENE-1304 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3 Environment: Windows/JDK 1.6 Reporter: Ethan Tao Attachments: LUCENE-1304.patch
We had a memory leak when using the DistanceSortSource of LocalLucene for repeated queries/searches: within about 450 queries, we hit an out of memory error. After digging into the code, we found that the problem comes from the Lucene package, in the way it handles custom type comparators. Lucene internally caches all created comparators. When querying with LocalLucene, we create a new comparator for every search because of the different lon/lat and query terms. This causes a major memory leak, as the cached comparators also hold memory for other large objects (e.g., bit sets). The solution we came up with (the proposed changes to Lucene are 1 and 3 below):
1. In the Lucene package, create a new file SortComparatorSourceUncacheable.java:
package org.apache.lucene.search;

import org.apache.lucene.index.IndexReader;
import java.io.IOException;
import java.io.Serializable;

public interface SortComparatorSourceUncacheable extends Serializable {
}
2. Have your custom sort class implement the interface:
public class LocalSortSource extends DistanceSortSource implements SortComparatorSourceUncacheable {
  ...
}
3. Modify Lucene's FieldSortedHitQueue.java to bypass caching for custom sort comparators:
Index: FieldSortedHitQueue.java
===
--- FieldSortedHitQueue.java (revision 654583)
+++ FieldSortedHitQueue.java (working copy)
@@ -53,7 +53,12 @@
     this.fields = new SortField[n];
     for (int i=0; i<n; ++i) {
       String fieldname = fields[i].getField();
-      comparators[i] = getCachedComparator (reader, fieldname, fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());
+
+      if(fields[i].getFactory() instanceof SortComparatorSourceUncacheable) { // no caching to avoid memory leak
+        comparators[i] = getComparator (reader, fieldname, fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());
+      } else {
+        comparators[i] = getCachedComparator (reader, fieldname, fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());
+      }

       if (comparators[i].sortType() == SortField.STRING) {
         this.fields[i] = new SortField (fieldname, fields[i].getLocale(), fields[i].getReverse());
@@ -157,7 +162,18 @@
   SortField[] getFields() {
     return fields;
   }
-
+
+  static ScoreDocComparator getComparator (IndexReader reader, String field, int type, Locale locale, SortComparatorSource factory)
+    throws IOException {
+    if (type == SortField.DOC) return ScoreDocComparator.INDEXORDER;
+    if (type == SortField.SCORE) return ScoreDocComparator.RELEVANCE;
+    FieldCacheImpl.Entry entry = (factory != null)
+      ? new FieldCacheImpl.Entry (field, factory)
+      : new FieldCacheImpl.Entry (field, type, locale);
+    return (ScoreDocComparator)Comparators.createValue(reader, entry);
+  }
+
+
Otis suggested that I put this in Jira. I'll attach a patch shortly for review. -Ethan
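The idea behind steps 1 to 3 can be shown without Lucene: a comparator cache that refuses to retain anything produced by a factory tagged with a marker interface. All names here are hypothetical; `Uncacheable` plays the role of the proposed `SortComparatorSourceUncacheable`, and `ComparatorCache` mirrors the role, not the code, of the cache behind `getCachedComparator`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Marker: comparators from this factory are per-query and must not be cached. */
interface Uncacheable {}

/** Hypothetical comparator factory; one instance per query in the leaking scenario. */
interface ComparatorFactory {
    Object newComparator(String field);
}

final class ComparatorCache {
    // Keyed by (field, factory). A new factory per query would add an entry on every search.
    private final Map<List<Object>, Object> cache = new HashMap<>();

    Object get(String field, ComparatorFactory factory) {
        if (factory instanceof Uncacheable) {
            return factory.newComparator(field); // build fresh and keep no reference to it
        }
        List<Object> key = Arrays.<Object>asList(field, factory);
        return cache.computeIfAbsent(key, k -> factory.newComparator(field));
    }

    int size() {
        return cache.size();
    }
}
```

A marker interface keeps the opt-out visible in the type of the custom sort source, so the cache needs only one `instanceof` check rather than new configuration.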
[jira] Created: (LUCENE-1512) Incorporate GeoHash in contrib/spatial
Incorporate GeoHash in contrib/spatial -- Key: LUCENE-1512 URL: https://issues.apache.org/jira/browse/LUCENE-1512 Project: Lucene - Java Issue Type: New Feature Components: contrib/spatial Reporter: patrick o'leary Priority: Minor Based on comments from Yonik and Ryan in SOLR-773. GeoHash provides the ability to store latitude / longitude values in a single, consistent hash field. This eliminates the need to maintain 2 field caches for latitude / longitude fields, reducing the size of an index and the amount of memory needed for a spatial search.
[jira] Updated: (LUCENE-1512) Incorporate GeoHash in contrib/spatial
[ https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary updated LUCENE-1512: Attachment: LUCENE-1512.patch spatial-lucene GeoHash implementation based on http://en.wikipedia.org/wiki/Geohash, with a removable dependency on the refactoring in LUCENE-1504. Incorporate GeoHash in contrib/spatial -- Key: LUCENE-1512 URL: https://issues.apache.org/jira/browse/LUCENE-1512 Project: Lucene - Java Issue Type: New Feature Components: contrib/spatial Reporter: patrick o'leary Priority: Minor Attachments: LUCENE-1512.patch Based on comments from Yonik and Ryan in SOLR-773. GeoHash provides the ability to store latitude / longitude values in a single, consistent hash field. This eliminates the need to maintain 2 field caches for latitude / longitude fields, reducing the size of an index and the amount of memory needed for a spatial search.
[jira] Updated: (LUCENE-1504) SerialChainFilter should use DocSet API rather then deprecated BitSet API
[ https://issues.apache.org/jira/browse/LUCENE-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary updated LUCENE-1504: Attachment: LUCENE-1504.patch Changed filter calls from bits to getDocIdSet; the ISerialChainFilter will maintain a method called bits(IndexReader, BitSet). SerialChainFilter should use DocSet API rather then deprecated BitSet API - Key: LUCENE-1504 URL: https://issues.apache.org/jira/browse/LUCENE-1504 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Reporter: Ryan McKinley Fix For: 2.9 Attachments: LUCENE-1504.patch, LUCENE-1504.patch From Erik's comments in LUCENE-1387: * Maybe the Filters should be using the DocIdSet API rather than the deprecated BitSet stuff? We can refactor that after it's committed, I suppose, but it's not something we want to leave like that. We should also look at moving SerialChainFilter out of the spatial contrib, since it is more generally useful than just for spatial search.
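Keeping `bits(IndexReader, BitSet)` lets each filter in the chain see the previous stage's result. A Lucene-free sketch of that serial chaining over `java.util.BitSet`: `ChainStep` and the AND/OR combination are hypothetical simplifications, not the actual `ISerialChainFilter` contract. An AND step only re-checks documents that survived earlier steps, and `docIds` walks set bits with `nextSetBit`, the iteration style a DocIdSet-based consumer uses.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

final class SerialChainSketch {
    enum Op { AND, OR }

    /** Hypothetical chain step: a per-document test plus how to combine it with the prior result. */
    static final class ChainStep {
        final IntPredicate accepts;
        final Op op;

        ChainStep(IntPredicate accepts, Op op) {
            this.accepts = accepts;
            this.op = op;
        }

        BitSet bits(int maxDoc, BitSet previous) {
            BitSet result = previous == null ? new BitSet(maxDoc) : (BitSet) previous.clone();
            if (previous != null && op == Op.AND) {
                // Serial benefit: only re-check documents that survived the earlier steps.
                for (int doc = previous.nextSetBit(0); doc >= 0; doc = previous.nextSetBit(doc + 1)) {
                    if (!accepts.test(doc)) {
                        result.clear(doc);
                    }
                }
                return result;
            }
            // First step in the chain, or an OR step: test every document and add matches.
            for (int doc = 0; doc < maxDoc; doc++) {
                if (accepts.test(doc)) {
                    result.set(doc);
                }
            }
            return result;
        }
    }

    /** Runs the steps in order, each receiving the previous step's bits. */
    static BitSet run(int maxDoc, List<ChainStep> steps) {
        BitSet bits = null;
        for (ChainStep step : steps) {
            bits = step.bits(maxDoc, bits);
        }
        return bits == null ? new BitSet(maxDoc) : bits;
    }

    /** Collects matching doc ids by walking set bits in increasing order. */
    static List<Integer> docIds(BitSet bits) {
        List<Integer> ids = new ArrayList<>();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            ids.add(doc);
        }
        return ids;
    }
}
```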
Re: LocalLucene or GeoHash for spatial search ?
Hey Marc, LocalLucene has been rewritten since then to use a Cartesian grid for its bounding box lookups: http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html GeoHash is a method of consistent hashing that produces an id whose length determines the precision of the point, as in 123ab6789 might be (42.12345, -73.12345) and 123ab would be (42.12, -73.12). It's a great way to store individual points or areas in a compressed format, kind of like a tiny url to a particular point on the globe. LocalLucene works differently, by placing points within boxes at different zoom levels. At minimum zoom level 0 (_localTier0) everything exists within 1 box; at zoom level 1 it's 4 boxes, at zoom level 2 it's 16 boxes ... at zoom level 15 it's 1,073,741,824 boxes. Obviously the index will only contain box ids for the boxes that have points inside them (thus if you're indexing only the land mass of the planet, you're only going to use at most 30% of those boxes). Based on the radius of your search, LocalLucene will select the appropriate zoom level to find your results in. So while LocalLucene can benefit from changing our notation for box ids to something similar to geohash to reduce index size, the concept for search is different. A couple of us are looking at including geohash in the LocalLucene code base; it would make our distance calculation less memory intensive, having to load only one field cache for a point rather than the current 2 lat/long fields we use, but I have to test the decoding speed to see if it slows us down. GeoHash's main benefit comes in the form of lookup by id, say for an image or tile map at a point, or for geocoding. It probably has more benefits than that, and I'm sure someone will correct me on that. I should also warn you that I'm the guy who wrote LocalLucene, so I have a natural bias towards it, but I'll be honest: this is how I see most geo searches working.
- P squaro wrote: Hello everybody, I would like to have your opinion about spatial search techniques using Lucene. In your opinion, is it better to use http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm LocalLucene, or to encode lat and long with http://geohash.org/ Geohash (and then use a RangeFilter between the two boundary hashes)? In my mind, using geohash should be better because the comparison is done on one field only. What is your opinion about it? Best regards Marc -- Patrick O'Leary AOL Local Search Technologies Phone: + 1 703 265 8763 You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein
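The tier arithmetic above is easy to check: tier n splits each axis into 2^n intervals, so it has (2^n)^2 = 4^n boxes, giving 1 box at tier 0, 4 at tier 1, 16 at tier 2 and 1,073,741,824 at tier 15. A minimal sketch that also assigns a point to its box, assuming a plain equal-angle latitude/longitude grid; LocalLucene uses its own projection and Cartesian tier ids, so `TierGridSketch.boxId` is a hypothetical simplification, not its id format.

```java
/** Hypothetical equal-angle tier grid (not LocalLucene's projected Cartesian tiers). */
final class TierGridSketch {
    /** Number of boxes at a tier: (2^tier)^2 = 4^tier. */
    static long boxCount(int tier) {
        long perAxis = 1L << tier;
        return perAxis * perAxis;
    }

    /** Returns {column, row} of the box containing (lat, lng) at the given tier. */
    static int[] boxId(double lat, double lng, int tier) {
        int perAxis = 1 << tier;
        int col = (int) Math.floor((lng + 180.0) / 360.0 * perAxis);
        int row = (int) Math.floor((lat + 90.0) / 180.0 * perAxis);
        // Points on the east or north edge belong to the last box, not a nonexistent one.
        col = Math.min(col, perAxis - 1);
        row = Math.min(row, perAxis - 1);
        return new int[] {col, row};
    }
}
```

Only boxes that actually receive a point need a term in the index, which is why sparse coverage (such as land only) keeps the number of distinct box ids far below `boxCount(tier)`.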
[jira] Updated: (LUCENE-1387) Add LocalLucene
[ https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary updated LUCENE-1387: Attachment: spatial-lucene.zip Latest version of local / spatial lucene, with LGPL dependencies removed and working unit tests. The code's only dependency is on JUnit, for tests during compilation. All the code's headers should be changed to the Apache License as well. Add LocalLucene --- Key: LUCENE-1387 URL: https://issues.apache.org/jira/browse/LUCENE-1387 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Reporter: Grant Ingersoll Priority: Minor Attachments: spatial-lucene.zip, spatial.zip Local Lucene (Geo-search) has been donated to the Lucene project, per https://issues.apache.org/jira/browse/INCUBATOR-77. This issue is to handle the Lucene portion of integration. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene
Re: solr NumberUtils to lucene?
It would be great to get it consistent; I cherry-picked when someone pointed it out to me. Erik Hatcher wrote: My thoughts... bring over any simple functions like these that are generally useful. At a quick glance, the functions in Solr's NumberUtils are generally useful and fit well in Lucene's NumberTools. What's the harm? Erik On Dec 16, 2008, at 9:14 PM, Ryan McKinley wrote: I posted this same question for the same reasons a while back... http://markmail.org/message/mji7jnpa5xjfflmw I'm looking at local lucene and trying to figure out how it could go into lucene. As is, locallucene depends on solr since it needs NumberUtils. Any change of heart for moving it into lucene?
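For context, functions such as `SortableStr2double` exist so that a stored double can be compared and range-filtered as a term whose ordering matches numeric ordering. A sketch of the bit trick that sortable-double encodings of this kind commonly build on (flip the non-sign bits of negative values); it is not Solr's NumberUtils string format, which additionally packs the bits into characters, and `SortableDouble` is a hypothetical name.

```java
/** Hypothetical sketch of an order-preserving double-to-long mapping. */
final class SortableDouble {
    /** Maps a double to a long whose signed order matches numeric order (NaN aside). */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Non-negative doubles already compare correctly as longs; for negatives,
        // flip every bit except the sign so that larger magnitudes sort lower.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    /** Inverse of toSortableLong: the transform leaves the sign bit alone, so it is its own inverse. */
    static double fromSortableLong(long sortable) {
        return Double.longBitsToDouble(sortable ^ ((sortable >> 63) & 0x7fffffffffffffffL));
    }
}
```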
Re: [Fwd: Re: 2.9, 3.0 and deprecation]
Yes, typo.. long day yesterday Uwe Schindler wrote: I've only read through the jdoc of tier so far, but I'm guessing it's doing a dictionary search and splitting the index reader's position based on the result being less than or greater than upper / lower values. Which may be faster than a TermDocs seek, and certainly worthwhile investigating. Do you mean JDOC of "Trie" here? Uwe
Re: [Fwd: Re: 2.9, 3.0 and deprecation]
I think we need a more incremental approach, somehow, for StandardTokenizer, like having it do its own internal versioning or something. There have been lots of little cases over time where it needs fixing, yet fixing them would break back compatibility.

11. Fieldable. Ah, Fieldable. I believe this is going to become an abstract base class, or go away. This is a biggie and nobody's stepped up so far to tackle it... I would say don't hold up 2.9 for this.

Maybe add these ones:

12. LUCENE-1483: running Scorer/HitCollector "per segment". We are making good progress here, and uncovering some nice per-query performance wins plus much faster searcher warming (since FieldCache is only used per-segment). On the current path it looks likely to deprecate the current Field sorting classes, so it'd be great to get this in before 2.9.

13. LUCENE-831 (new FieldCache API). This is long-standing and there's a fair amount of interest, and through our iterations with LUCENE-1483 (one of the primary users of the FieldCache API, field sorting) we are getting more clarity on what a new FieldCache API should look like. It'd be nice to resolve before 2.9, and I'd like to spend time doing so (after / with LUCENE-1483).

Mike
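Mike's point that per-segment collection makes searcher warming cheaper ("since FieldCache is only used per-segment") can be illustrated with a toy model. This is a hypothetical simplification, not Lucene's Scorer/HitCollector or FieldCache API: each segment's values are cached under the segment's name, hits are reported as segment-local ids shifted by a running docBase, and after a new segment appears only that segment has to be loaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-segment collection; names are hypothetical, not Lucene's API.
class PerSegmentSketch {

    // An immutable "segment" holding one sort value per document.
    static final class Segment {
        final String name;
        final int[] values;
        Segment(String name, int[] values) { this.name = name; this.values = values; }
    }

    // Per-segment cache keyed by segment name, so a reopened searcher that
    // still contains a segment reuses its entry instead of reloading it.
    private final Map<String, int[]> cache = new HashMap<>();
    int loads = 0;

    private int[] valuesFor(Segment seg) {
        return cache.computeIfAbsent(seg.name, key -> {
            loads++; // counts how often a segment had to be "warmed"
            return seg.values.clone();
        });
    }

    // Visits each segment with segment-local doc ids and converts them to
    // global ids by adding the segment's docBase.
    List<Integer> collect(List<Segment> segments, int minValue) {
        List<Integer> hits = new ArrayList<>();
        int docBase = 0;
        for (Segment seg : segments) {
            int[] vals = valuesFor(seg);
            for (int local = 0; local < vals.length; local++) {
                if (vals[local] >= minValue) hits.add(docBase + local);
            }
            docBase += vals.length;
        }
        return hits;
    }

    public static void main(String[] args) {
        PerSegmentSketch searcher = new PerSegmentSketch();
        List<Segment> segs = new ArrayList<>(List.of(
                new Segment("_0", new int[] {5, 1}),
                new Segment("_1", new int[] {7})));
        System.out.println(searcher.collect(segs, 5) + " loads=" + searcher.loads);
        segs.add(new Segment("_2", new int[] {9})); // a newly flushed segment
        System.out.println(searcher.collect(segs, 5) + " loads=" + searcher.loads);
    }
}
```

The second search loads only the new segment; a cache keyed by the whole top-level reader would instead have to rebuild everything after each reopen.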
Re: [Fwd: Re: 2.9, 3.0 and deprecation]
Hi Uwe,

True, it's not a generic solution, but then again I wouldn't really consider geo-search a generic ask. The indexing format for locallucene uses something I call a tier approach, similar to zoom levels in other mapping solutions. Each tier has a separate set of projections, or Cartesian IDs, and the projection interface has a bestFit function that gives you the optimal tier to search on. A quick explanation with graphics is here: http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

The number of Cartesian IDs grows with the tier level: tier 0 has a single Cartesian ID, "0.0", representing all results; tier 1 has 4 possible IDs; tier 2 has 16; and so on, where # IDs = (2^tier)^2. Again, this is specialized, but on purpose, to provide optimal lookup. Intersections and overlapping MBBs (minimum bounding boxes) can also be handled with best-fit lookups using TermDocs seeks.

I've only read through the jdoc of tier so far, but I'm guessing it's doing a dictionary search and splitting the index reader's position based on the result being less than or greater than the upper / lower values. That may be faster than a TermDocs seek, and it's certainly worth investigating. But reducing your search from two range-filter lookups to a minimal set of term seeks is what will give you better performance. If I remember correctly, the bestFit function is designed to give you between 1 and 6 terms to look up, and for a bounding box search that's all you need.

Thanks
Patrick

Uwe Schindler wrote:
Hi Patrick,
very interesting approach. In my opinion, to compare: a standard RangeFilter is like using a standard relational database with separate lat/lon fields, no index, and the operator “between”. TrieRangeFilter is like adding indexes to the lat/lon fields, which is enough for most cases. Your LocalLucene approach is like using a relational database (e.g. Oracle) that can directly handle point/bbox coordinates and index them efficiently. Is this correct?
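The tier arithmetic above (1 ID at tier 0, 4 at tier 1, 16 at tier 2, (2^tier)^2 in general) can be sketched in Java. This is a hedged illustration, not locallucene's code: the equirectangular grid, the hypothetical "col.row" ID format (chosen only so tier 0 yields "0.0"), and the simplified bestFit rule are all assumptions. Per the text, the real bestFit yields 1 to 6 terms, while this rule yields at most 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tier grid; not locallucene's projection or ID format.
class TierSketch {

    // Number of cells at a tier: (2^tier)^2, i.e. 1, 4, 16, 64, ...
    static long cellCount(int tier) {
        long side = 1L << tier;
        return side * side;
    }

    // Column/row index on a 2^tier-wide axis, clamped so the max edge stays in range.
    static int index(double value, double min, double span, int side) {
        int i = (int) Math.floor((value - min) / span * side);
        return Math.max(0, Math.min(side - 1, i));
    }

    // Assumed equirectangular projection; the ID is formatted as "col.row".
    static String cellId(int tier, double lat, double lon) {
        int side = 1 << tier;
        return index(lon, -180.0, 360.0, side) + "." + index(lat, -90.0, 180.0, side);
    }

    // Hypothetical bestFit: the deepest tier whose cells are still at least as
    // wide and tall as the box, so (ignoring floating-point rounding at cell
    // edges) the box spans at most 2 cells per axis.
    static int bestFit(double minLat, double minLon, double maxLat, double maxLon, int maxTier) {
        int tier = 0;
        while (tier < maxTier) {
            int nextSide = 1 << (tier + 1);
            if (maxLon - minLon > 360.0 / nextSide || maxLat - minLat > 180.0 / nextSide) break;
            tier++;
        }
        return tier;
    }

    // Every cell ID at `tier` touched by the box: the terms a filter would look up.
    static List<String> cellsForBox(int tier, double minLat, double minLon, double maxLat, double maxLon) {
        int side = 1 << tier;
        List<String> ids = new ArrayList<>();
        for (int c = index(minLon, -180.0, 360.0, side); c <= index(maxLon, -180.0, 360.0, side); c++) {
            for (int r = index(minLat, -90.0, 180.0, side); r <= index(maxLat, -90.0, 180.0, side); r++) {
                ids.add(c + "." + r);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        for (int tier = 0; tier <= 3; tier++) {
            System.out.println("tier " + tier + ": " + cellCount(tier) + " ids");
        }
        int tier = bestFit(52.30, 4.80, 52.45, 5.00, 20);
        System.out.println("bestFit tier " + tier + " -> " + cellsForBox(tier, 52.30, 4.80, 52.45, 5.00));
    }
}
```

Picking the tier from the box size is what keeps the lookup to a handful of term seeks instead of two full range scans.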
The more specialized the implementation is for the underlying data structure, the faster it is :-) The drawback of your solution is that it is too specialized, and TrieRangeQuery is optimal for ranges in wider usage outside of local queries. How does your implementation behave when the query hits, e.g., half of all documents, or somebody selects (-180,-90,180,90) to get all documents? How does it behave with half-open ranges and intersections?
Uwe

UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax: +49 421 218 65505
http://www.pangaea.de/
E-mail: uschind...@pangaea.de

From: patrick o'leary [mailto:polear...@aol.com]
Sent: Monday, December 15, 2008 9:14 PM
To: java-dev@lucene.apache.org
Subject: Re: [Fwd: Re: 2.9, 3.0 and deprecation]

Hey Jason,
o.a.l.s.trie looks interesting and has a lot of potential, but the locallucene 1.5+ release moved to a Cartesian tier system, away from the bounding box filter, a while ago. A TrieRange or RangeFilter like the one I used in v1.0 was a little inefficient, as you have to do a bitwise AND on 2 range lookups, e.g. RangeFilter(min-latitude, max-latitude) AND RangeFilter(min-longitude, max-longitude). (I extended the Filter class with an ISerialChainFilter to improve performance.)

The 1.5+ version of locallucene does it differently: I pre-generate the bounding shape's Cartesian IDs (i.e., all the boxes that make up the overall bounding box) and simply pull the matching doc IDs out of the TermEnumerator. Take a look at the CartesianShapeFilter:
http://locallucene.svn.sourceforge.net/viewvc/locallucene/trunk/locallucene/src/java/com/pjaol/search/geo/utils/CartesianShapeFilter.java?revision=66&view=markup
This gives you a bounding box lookup of about 3-4 ms on a 3 million doc index.
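The cost difference described here, two range lookups combined with a bitwise AND versus pulling postings for a handful of pre-generated cell IDs, can be modeled with java.util.BitSet. This is a hypothetical sketch: plain arrays stand in for indexed lat/lon fields, a HashMap stands in for the term dictionary, and the grid and "col.row" cell-ID format are assumptions rather than CartesianShapeFilter's actual logic.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins for indexed fields and terms; not Lucene's Filter API.
class BoxFilterSketch {

    // v1.0-style: two independent range scans, then a bitwise AND of the results.
    static BitSet rangeAnd(double[] lats, double[] lons,
                           double minLat, double maxLat, double minLon, double maxLon) {
        BitSet latBits = new BitSet(lats.length);
        BitSet lonBits = new BitSet(lons.length);
        for (int doc = 0; doc < lats.length; doc++) {
            if (lats[doc] >= minLat && lats[doc] <= maxLat) latBits.set(doc);
            if (lons[doc] >= minLon && lons[doc] <= maxLon) lonBits.set(doc);
        }
        latBits.and(lonBits);
        return latBits;
    }

    // Assumed grid cell ID at a fixed tier (equirectangular, "col.row").
    static String cellId(int tier, double lat, double lon) {
        int side = 1 << tier;
        int col = Math.min(side - 1, (int) Math.floor((lon + 180.0) / 360.0 * side));
        int row = Math.min(side - 1, (int) Math.floor((lat + 90.0) / 180.0 * side));
        return col + "." + row;
    }

    // Index time: each doc is posted under its cell ID, like a term in a field.
    static Map<String, BitSet> buildPostings(int tier, double[] lats, double[] lons) {
        Map<String, BitSet> postings = new HashMap<>();
        for (int doc = 0; doc < lats.length; doc++) {
            postings.computeIfAbsent(cellId(tier, lats[doc], lons[doc]), k -> new BitSet()).set(doc);
        }
        return postings;
    }

    // Query time: OR together the postings of the pre-generated cell IDs.
    // The result is a candidate set; docs near the box edge still need an exact check.
    static BitSet cellUnion(Map<String, BitSet> postings, List<String> cellIds) {
        BitSet result = new BitSet();
        for (String id : cellIds) {
            BitSet docs = postings.get(id);
            if (docs != null) result.or(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] lats = {52.37, 48.86, 40.71};  // Amsterdam, Paris, New York
        double[] lons = {4.89, 2.35, -74.01};
        BitSet exact = rangeAnd(lats, lons, 52.0, 53.0, 4.0, 6.0);
        Map<String, BitSet> postings = buildPostings(4, lats, lons);
        // At tier 4 the box 52..53 N, 4..6 E falls inside a single cell.
        BitSet candidates = cellUnion(postings, List.of(cellId(4, 52.0, 4.0)));
        System.out.println("exact=" + exact + " candidates=" + candidates);
    }
}
```

In this example Paris lands in the same tier-4 cell as Amsterdam, so the cell lookup returns it as a candidate; an exact box or distance check would still be needed to drop it.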
Thanks
Patrick

Sean Timm wrote:
Subject: Re: 2.9, 3.0 and deprecation
From: "Jason Rutherglen" jason.rutherg...@gmail.com
Date: Mon, 15 Dec 2008 12:29:38 -0500
To: java-dev@lucene.apache.org

LocalLucene would benefit (be faster) from integrating with TrieRangeQuery for the bounding box filter.

On Sun, Dec 14, 2008 at 3:54 AM, Michael McCandless luc...@mikemccandless.com wrote:
I'd also personally like to see 2.9 released sooner rather than later, maybe earlyish next year? I don't think we should hold up 2.9 for s
[jira] Commented: (LUCENE-1387) Add LocalLucene
[ https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633276#action_12633276 ]

patrick o'leary commented on LUCENE-1387:

Yeah, the test numbers are wrong; I'll put together better tests for it later today. It was brought to my attention recently when someone was trying Lucene 2.4; I just didn't get around to resolving it.

Add LocalLucene
---
Key: LUCENE-1387
URL: https://issues.apache.org/jira/browse/LUCENE-1387
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Grant Ingersoll
Priority: Minor
Attachments: spatial.zip

Local Lucene (Geo-search) has been donated to the Lucene project, per https://issues.apache.org/jira/browse/INCUBATOR-77. This issue is to handle the Lucene portion of integration. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene