[jira] Commented: (LUCENE-1815) Geohash encode/decode floating point problems

2009-12-08 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787934#action_12787934
 ] 

patrick o'leary commented on LUCENE-1815:
-

What google code are you working with?


 Geohash encode/decode floating point problems
 -

 Key: LUCENE-1815
 URL: https://issues.apache.org/jira/browse/LUCENE-1815
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: Wouter Heijke
Priority: Minor

 i'm finding the Geohash support in the spatial package to be rather 
 unreliable.
 Here is the outcome of a test that encodes/decodes the same lat/lon and 
 geohash a few times.
 the format:
 action geohash=(latitude, longitude)
 the result:
 encode u173zq37x014=(52.3738007,4.8909347)
 decode u173zq37x014=(52.3737996,4.890934)
 encode u173zq37rpbw=(52.3737996,4.890934)
 decode u173zq37rpbw=(52.3737996,4.89093295)
 encode u173zq37qzzy=(52.3737996,4.89093295)
 if I now change to the google code implementation:
 encode u173zq37x014=(52.3738007,4.8909347)
 decode u173zq37x014=(52.37380061298609,4.890934377908707)
 encode u173zq37x014=(52.37380061298609,4.890934377908707)
 decode u173zq37x014=(52.37380061298609,4.890934377908707)
 encode u173zq37x014=(52.37380061298609,4.890934377908707)
 Note the differences between the geohashes in both situations and the 
 lat/lon's!
 Now things get worse if you work on low-precision geohashes:
 decode u173=(52.0,4.0)
 encode u14zg429yy84=(52.0,4.0)
 decode u14zg429yy84=(52.0,3.99)
 encode u14zg429ywx6=(52.0,3.99)
 and google:
 decode u173=(52.20703125,4.5703125)
 encode u173=(52.20703125,4.5703125)
 decode u173=(52.20703125,4.5703125)
 encode u173=(52.20703125,4.5703125)
 We are using geohashes extensively and will now use the google code version 
 unfortunately.
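
For reference, the encode/decode round trip reported above can be sketched with the textbook geohash bisection. This is only an illustration under stated assumptions: it is neither the contrib/spatial code nor the Google Code implementation, and the class and method names are made up for the example. It decodes to the centre of the cell, so re-encoding the decoded point at the same length gives back the same hash.

```java
final class GeohashRoundTrip {
    // Standard geohash base-32 alphabet (digits plus letters without a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point by alternately bisecting the longitude and latitude ranges. */
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;            // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double coord = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            value <<= 1;
            if (coord >= mid) {
                value |= 1;               // upper half of the current range
                range[0] = mid;
            } else {
                range[1] = mid;           // lower half of the current range
            }
            lonBit = !lonBit;
            if (++bits == 5) {            // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Decodes a geohash to {latitude, longitude} at the centre of its cell. */
    static double[] decode(String hash) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        boolean lonBit = true;
        for (char c : hash.toCharArray()) {
            int idx = BASE32.indexOf(c);
            if (idx < 0) {
                throw new IllegalArgumentException("not a geohash character: " + c);
            }
            for (int mask = 16; mask > 0; mask >>= 1) {
                double[] range = lonBit ? lonRange : latRange;
                double mid = (range[0] + range[1]) / 2;
                if ((idx & mask) != 0) {
                    range[0] = mid;
                } else {
                    range[1] = mid;
                }
                lonBit = !lonBit;
            }
        }
        return new double[] {
            (latRange[0] + latRange[1]) / 2,
            (lonRange[0] + lonRange[1]) / 2
        };
    }

    public static void main(String[] args) {
        String hash = encode(52.3738007, 4.8909347, 12);
        double[] point = decode(hash);
        System.out.println(hash + " -> (" + point[0] + ", " + point[1] + ")");
        // Re-encoding the decoded point should reproduce the same hash.
        System.out.println(encode(point[0], point[1], 12).equals(hash));
    }
}
```

Because every midpoint is computed the same way in both directions, the decoded cell centre always falls back into the same cell, which is the stability the report expects from repeated encode/decode calls.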

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1571) DistanceFilter problem with deleted documents

2009-06-14 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719272#action_12719272
 ] 

patrick o'leary commented on LUCENE-1571:
-

patch looks good to me, tests passed as well

 DistanceFilter problem with deleted documents
 -

 Key: LUCENE-1571
 URL: https://issues.apache.org/jira/browse/LUCENE-1571
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
 Environment: N/A
Reporter: Phillip Rhodes
Assignee: Michael McCandless
 Fix For: 2.9

 Attachments: LUCENE-1571.patch


 I know this is the locallucene lib, but wanted to make sure we don't get this 
 bug when it gets into lucene contrib.
 I suspect that the issue is that deleted documents are trying to be evaluated 
 by the filter.  I did some debugging and I confirmed that it is bombing on a 
 document that is marked as deleted (using Luke).
 Thanks!
 Using the locallucene library 1.51, I get a NullPointerException at line 123 
 of DistanceFilter
 The method is public BitSet bits(IndexReader reader) 
 The line is double x = NumberUtils.SortableStr2double(sx);
 The stack trace is:
 java.lang.NullPointerException
   at org.apache.solr.util.NumberUtils.SortableStr2long(NumberUtils.java:149)
   at org.apache.solr.util.NumberUtils.SortableStr2double(NumberUtils.java:104)
   at com.pjaol.search.geo.utils.DistanceFilter.bits(DistanceFilter.java:123)
   at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:140)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:112)
   at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:113)
   at org.apache.lucene.search.Hits.<init>(Hits.java:90)
   at org.apache.lucene.search.Searcher.search(Searcher.java:72)




ReadOnlyMultiSegmentReader bitset id vs doc id

2009-04-28 Thread patrick o'leary
hey

I've got a filter that's storing document ids with a geo distance for
spatial lucene, using the bitset position as the doc id.
However, with a MultiSegmentReader that's no longer going to work.

What's the most appropriate way to go from bitset position to doc id now?

Thanks
Patrick


Re: ReadOnlyMultiSegmentReader bitset id vs doc id

2009-04-28 Thread patrick o'leary
Think I may have found it, it was multiple runs of the filter, one for each
segment reader, I was generating a new map to hold distances each time. So
only the distances from the
last segment reader were stored.

Currently it looks like those segmented searches are done serially, well in
solr they are-
I presume the end goal is to make them multi-threaded ?
I'll need to make my map synchronized


On Tue, Apr 28, 2009 at 4:42 PM, Uwe Schindler u...@thetaphi.de wrote:

  What is the problem exactly? Maybe you use the new Collector API, where
 the search is done for each segment, so caching does not work correctly?



 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
   --

 *From:* patrick o'leary [mailto:pj...@pjaol.com]
 *Sent:* Tuesday, April 28, 2009 10:31 PM
 *To:* java-dev@lucene.apache.org
 *Subject:* ReadOnlyMultiSegmentReader bitset id vs doc id



 hey

 I've got a filter that's storing document id's with a geo distance for
 spatial lucene using a bitset position for doc id,
 However with a MultiSegmentReader that's no longer going to work.

 What's the most appropriate way to go from bitset position to doc id now?

 Thanks
 Patrick



Re: ReadOnlyMultiSegmentReader bitset id vs doc id

2009-04-28 Thread patrick o'leary
Ok finally with some pointers from Ryan, figured out the last problem.
So as a note to anyone else who might encounter the same problems with
multireader

A) Directories can contain multiple segments and a reader for those segments
B) Searches are replayed within each reader in a serial fashion **
C) If utilizing FieldCache / BitSet or anything related to document position
within a reader, and you need docId
   -- document id = (sum of previous reader maxdocs )+ bitset position

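e.g.
int offset;      // running total of maxDoc() over every reader seen so far
int nextOffset;  // doc id base for the reader currently being filtered

public DocIdSet getDocIdSet(IndexReader reader) {

   OpenBitSet bitset = new OpenBitSet(reader.maxDoc());
   offset += reader.maxDoc();
   for (int i = 0; i < reader.maxDoc(); i++) {
      // ... filter stuff ...

      if ( good ) {
         bitset.set( i );

         // global doc id = sum of previous readers' maxDocs + bitset position
         int docId = i + nextOffset;
         ...
      }
   }

   // the next reader starts at the sum of all maxDocs seen so far
   nextOffset = offset;
   ...
}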
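
The offset arithmetic in point C can be checked without Lucene. The following is a small sketch under assumed names (`SegmentDocBase`, `docBases`, `globalDocId` are hypothetical helpers, not Lucene API): each segment's base is the sum of the maxDoc values of the segments before it, and the document id is that base plus the bitset position.

```java
import java.util.Arrays;

final class SegmentDocBase {
    /** For each segment, returns the sum of maxDoc over all earlier segments. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int offset = 0;
        for (int s = 0; s < maxDocs.length; s++) {
            bases[s] = offset;       // base for this segment
            offset += maxDocs[s];    // running total for the next one
        }
        return bases;
    }

    /** Maps a position inside one segment's bitset to an index-wide doc id. */
    static int globalDocId(int[] bases, int segment, int bitsetPosition) {
        return bases[segment] + bitsetPosition;
    }

    public static void main(String[] args) {
        // Three hypothetical segments holding 5, 3 and 4 documents.
        int[] bases = docBases(new int[] {5, 3, 4});
        System.out.println(Arrays.toString(bases));      // [0, 5, 8]
        System.out.println(globalDocId(bases, 2, 1));    // 9
    }
}
```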


K, works time for sleep

P







Re: Spatial package plans

2009-04-22 Thread patrick o'leary
Free world, help yourself :-)

On Wed, Apr 22, 2009 at 6:39 PM, Wouter Heijke whei...@xs4all.nl wrote:

 The amount of replies and the state of the code make me think making my
 own distance filter using a real GIS solution like geotools is the way to
 go.
 I wonder anyway if GIS code should be in any Lucene package..

 Wouter

  Yeah it's hard coded to use miles, 5 years in the US gets to you..
  But the functionality doesn't change radius is double so you just need to
  convert km to miles
  for the DistanceQueryBuilder and just convert back from miles to km to
  display.
 
  On Mon, Apr 20, 2009 at 8:14 AM, Wouter Heijke whei...@xs4all.nl
 wrote:
 
 
  I'm working on local search functionality and am about to use the
  spatial
  code in contrib.
  I managed to have a proof of concept running using
  LatLongDistanceFilter.
  The only problem I have with this filter is that it is hardcoded to use
  Miles!
 
  Basically my question is what are the plans for the spatial code? Is it
  going to stay the way it is?
 
  Wouter
 







Re: Spatial package plans

2009-04-20 Thread patrick o'leary
Yeah it's hard coded to use miles, 5 years in the US gets to you..
But the functionality doesn't change: radius is a double, so you just need to
convert km to miles
for the DistanceQueryBuilder and convert back from miles to km for
display.
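
A minimal sketch of that conversion, assuming a hypothetical helper class `DistanceUnits` (not part of contrib/spatial): convert the search radius to miles before building the query, and convert each returned distance back to km for display.

```java
final class DistanceUnits {
    // One international mile is exactly 1.609344 km.
    static final double KM_PER_MILE = 1.609344;

    static double kmToMiles(double km) {
        return km / KM_PER_MILE;
    }

    static double milesToKm(double miles) {
        return miles * KM_PER_MILE;
    }

    public static void main(String[] args) {
        double radiusKm = 10.0;
        double radiusMiles = kmToMiles(radiusKm);   // value to hand to the miles-based filter
        System.out.println(radiusMiles);            // roughly 6.21
        System.out.println(milesToKm(radiusMiles)); // back to roughly 10 km for display
    }
}
```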

On Mon, Apr 20, 2009 at 8:14 AM, Wouter Heijke whei...@xs4all.nl wrote:


 I'm working on local search functionality and am about to use the spatial
 code in contrib.
 I managed to have a proof of concept running using LatLongDistanceFilter.
 The only problem I have with this filter is that it is hardcoded to use
 Miles!

 Basically my question is what are the plans for the spatial code? Is it
 going to stay the way it is?

 Wouter






[jira] Updated: (LUCENE-1588) Update Spatial Lucene sort to use FieldComparatorSource

2009-04-06 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1588:


Attachment: LUCENE-1588.patch

Deprecate DistanceSortSource and Add DistanceFieldComparator
updated Test case to use DistanceFieldComparator

Usage
{code}
// Create a distance sort
// As the radius filter has performed the distance calculations
// already, pass in the filter to reuse the results.
// 
DistanceFieldComparatorSource dsort =
    new DistanceFieldComparatorSource(dq.distanceFilter);
Sort sort = new Sort(new SortField("foo", dsort, false));

// Perform the search, using the term query, the serial chain filter, and the
// distance sort
Hits hits = searcher.search(customScore, dq.getFilter(),sort);
{code}

If nobody objects I'll apply this later today

 Update Spatial Lucene sort to use FieldComparatorSource
 ---

 Key: LUCENE-1588
 URL: https://issues.apache.org/jira/browse/LUCENE-1588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1588.patch


 Update distance sorting to use FieldComparator sorting as opposed to 
 SortComparator




Re: List Moderators

2009-03-26 Thread patrick o'leary
Is it also worthwhile to check whether a static signature with instructions,
or a link to the Apache mail instructions, can be added to mails?
It would cut down on a lot of repeat questions.



On Thu, Mar 26, 2009 at 2:46 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Every now and again, someone emails me off list asking to be removed from
 the
 : list and I always forward them to Erik, b/c I know he is a moderator.
 : However, I was wondering who else is besides Erik, since, AIUI, there
 needs to
 : be at least 3 in ASF-land, right?
 :
 : So, if you're a list moderator for dev/user, please stand up.

 the docs for say committers have instructions for checking the moderators
 for any list, however the process seems to no longer work (probably
 because mail handling got moved onto a different box)...

 http://www.apache.org/dev/committers.html#mailing-list-moderators
 https://svn.apache.org/repos/private/committers/docs/resources.txt

 ...might be worth following up with INFRA to sanity check the list of
 moderators on all lucene lists, make sure we have three *active*
 moderators on each list.


 -Hoss






[jira] Created: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only

2009-03-20 Thread patrick o'leary (JIRA)
Fix for NPE's in Spatial Lucene for searching bounding box only
---

 Key: LUCENE-1568
 URL: https://issues.apache.org/jira/browse/LUCENE-1568
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Minor


NPE occurs when using DistanceQueryBuilder for minimal bounding box search 
without the distance filter.




[jira] Updated: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only

2009-03-20 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1568:


Attachment: LUCENE-1568.patch

Fixes an NPE when using DistanceQueryBuilder for just minimal bounding box 
searches
e.g.
{code}
final DistanceQueryBuilder dq = new DistanceQueryBuilder(
    latitude, longitude,
    radius,
    latField,    // name of latitude field in index
    lngField,    // name of longitude field in index
    tierPrefix,  // prefix of tier fields in index
    false        /* filter by radius; false means mbb search */ );
{code}

 Fix for NPE's in Spatial Lucene for searching bounding box only
 ---

 Key: LUCENE-1568
 URL: https://issues.apache.org/jira/browse/LUCENE-1568
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Minor
 Attachments: LUCENE-1568.patch


 NPE occurs when using DistanceQueryBuilder for minimal bounding box search 
 without the distance filter.




[jira] Commented: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only

2009-03-20 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683951#action_12683951
 ] 

patrick o'leary commented on LUCENE-1568:
-

If nobody objects I'll commit this later today

 Fix for NPE's in Spatial Lucene for searching bounding box only
 ---

 Key: LUCENE-1568
 URL: https://issues.apache.org/jira/browse/LUCENE-1568
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1568.patch


 NPE occurs when using DistanceQueryBuilder for minimal bounding box search 
 without the distance filter.




[jira] Closed: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only

2009-03-20 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary closed LUCENE-1568.
---


resolved

 Fix for NPE's in Spatial Lucene for searching bounding box only
 ---

 Key: LUCENE-1568
 URL: https://issues.apache.org/jira/browse/LUCENE-1568
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/spatial
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1568.patch


 NPE occurs when using DistanceQueryBuilder for minimal bounding box search 
 without the distance filter.




Committed revision 735928.

2009-01-19 Thread patrick o'leary
Committed revision 735928.
Adding myself to contrib committers list / testing karma

Thanks
Patrick

scootie:site pjaol$ svn diff docs/*.html
Index: docs/whoweare.html
===
--- docs/whoweare.html (revision 735927)
+++ docs/whoweare.html (working copy)
@@ -285,6 +285,9 @@
 <b>Wolfgang Hoschek</b> (whosc...@...)</li>
 
 <li>
+<b>Patrick O'Leary</b> (pj...@...)</li>
+
+<li>
 <b>Uwe Schindler</b> (uschind...@...)</li>
 
 <li>
@@ -300,7 +303,7 @@
 </div>
 
 
-<a name="N10087"></a><a name="emeritus"></a>
+<a name="N1008C"></a><a name="emeritus"></a>
 <h2 class="boxed">Emeritus Committers</h2>
 <div class="section">
 <ul>

scootie:site pjaol$ svn diff src/documentation/content/xdocs/whoweare.xml
Index: src/documentation/content/xdocs/whoweare.xml
===
--- src/documentation/content/xdocs/whoweare.xml (revision 735927)
+++ src/documentation/content/xdocs/whoweare.xml (working copy)
@@ -31,6 +31,7 @@
 <section id="contrib"><title>Contrib Committers</title>
 <ul>
 <li><b>Wolfgang Hoschek</b> (whosc...@...)</li>
+<li><b>Patrick O'Leary</b> (pj...@...)</li>
 <li><b>Uwe Schindler</b> (uschind...@...)</li>
 <li><b>Andi Vajda</b> (va...@...)</li>
 <li><b>Karl Wettin</b> (ka...@...)</li>


[jira] Updated: (LUCENE-1512) Incorporate GeoHash in contrib/spatial

2009-01-07 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1512:


Attachment: LUCENE-1512.patch

Made necessary changes 
* Formatting fixed
* Removed dependency on LUCENE-1504
* Moved GeoHash elements into o.a.l.spatial.geohash


 Incorporate GeoHash in contrib/spatial
 --

 Key: LUCENE-1512
 URL: https://issues.apache.org/jira/browse/LUCENE-1512
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spatial
Reporter: patrick o'leary
Priority: Minor
 Attachments: LUCENE-1512.patch, LUCENE-1512.patch


 Based on comments from Yonik and Ryan in SOLR-773.
 GeoHash provides the ability to store latitude / longitude values in a single,
 consistent hash field.
 This eliminates the need to maintain 2 field caches for the latitude / longitude
 fields, reducing the size of an index
 and the amount of memory needed for a spatial search.




[jira] Commented: (LUCENE-1304) Memory Leak when using Custom Sort (i.e., DistanceSortSource) of LocalLucene with Lucene

2009-01-06 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12661199#action_12661199
 ] 

patrick o'leary commented on LUCENE-1304:
-

How will LUCENE-1483 impact this immediately?
I'd really like to get this patch in first and refactor if and when 1483 goes 
in, the benefit of bypassing static comparator is
really needed. 

 Memory Leak when using Custom Sort (i.e., DistanceSortSource) of LocalLucene 
 with Lucene
 

 Key: LUCENE-1304
 URL: https://issues.apache.org/jira/browse/LUCENE-1304
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.3
 Environment: Windows/JDK 1.6
Reporter: Ethan Tao
 Attachments: LUCENE-1304.patch


 We had the memory leak issue when using DistanceSortSource of LocalLucene for 
 repeated query/search. In about 450 queries, we are experiencing out of 
 memory error. After dig in the code, we found the problem source is coming 
 from Lucene package, the way how it handles custom type comparator. Lucene 
 internally caches all created comparators. In the case of query using 
 LocalLucene, we create new comparator for every search due to different 
 lon/lat and query terms. This causes major memory leak as the cached 
 comparators are also holding memory for other large objects (e.g., bit sets). 
 The solution we came up with: ( the proposed change from Lucene is 1 and 3 
 below)
 1.In Lucene package, create new file SortComparatorSourceUncacheable.java:
 package org.apache.lucene.search;
 import org.apache.lucene.index.IndexReader;
 import java.io.IOException;
 import java.io.Serializable;
 public interface SortComparatorSourceUncacheable extends Serializable {
 }
 2.Have your custom sort class to implement the interface
 public class LocalSortSource extends DistanceSortSource implements 
 SortComparatorSourceUncacheable {
 ...
 }
 3.Modify Lucene's FieldSortedHitQueue.java to bypass caching for custom 
 sort comparator:
 Index: FieldSortedHitQueue.java
 ===
 --- FieldSortedHitQueue.java (revision 654583)
 +++ FieldSortedHitQueue.java  (working copy)
 @@ -53,7 +53,12 @@
  this.fields = new SortField[n];
  for (int i=0; i<n; ++i) {
String fieldname = fields[i].getField();
 -  comparators[i] = getCachedComparator (reader, fieldname, 
 fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());
 +
 +  if(fields[i].getFactory() instanceof SortComparatorSourceUncacheable) 
 { // no caching to avoid memory leak
 +comparators[i] = getComparator (reader, fieldname, 
 fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());
 +  } else {
 +comparators[i] = getCachedComparator (reader, fieldname, 
 fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());
 +  }

if (comparators[i].sortType() == SortField.STRING) {
   this.fields[i] = new SortField (fieldname, 
 fields[i].getLocale(), fields[i].getReverse());
 @@ -157,7 +162,18 @@
SortField[] getFields() {
  return fields;
}
 -  
 +
 +  static ScoreDocComparator getComparator (IndexReader reader, String field, 
 int type, Locale locale, SortComparatorSource factory)
 +throws IOException {
 +  if (type == SortField.DOC) return ScoreDocComparator.INDEXORDER;
 +  if (type == SortField.SCORE) return ScoreDocComparator.RELEVANCE;
 +  FieldCacheImpl.Entry entry = (factory != null)
 +? new FieldCacheImpl.Entry (field, factory)
 +: new FieldCacheImpl.Entry (field, type, locale);
 +  return (ScoreDocComparator)Comparators.createValue(reader, entry);
 +}
 +
 +
 Otis suggests that I put this in Jira. I 'll attach a patch shortly for 
 review. 
 -Ethan
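
The idea behind the proposed patch, a marker interface that lets per-query comparators skip the cache, can be sketched in isolation. The code below is a toy model and not Lucene's FieldSortedHitQueue: `ComparatorSource`, `UncacheableSource` and `ComparatorCache` are hypothetical names, and for brevity the marker extends the source interface here rather than being a separate Serializable interface as in the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a comparator factory; key() identifies it. */
interface ComparatorSource {
    String key();
}

/** Marker: comparators built from these sources must never be cached. */
interface UncacheableSource extends ComparatorSource {
}

final class ComparatorCache {
    private final Map<String, Object> cache = new HashMap<>();

    Object comparatorFor(ComparatorSource source) {
        if (source instanceof UncacheableSource) {
            // Built fresh and never retained, so per-query state can be collected.
            return new Object();
        }
        return cache.computeIfAbsent(source.key(), k -> new Object());
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        ComparatorCache cache = new ComparatorCache();
        // 450 distinct per-query sources, as in the report; none is retained.
        for (int q = 0; q < 450; q++) {
            final String key = "distance-query-" + q;
            cache.comparatorFor((UncacheableSource) () -> key);
        }
        System.out.println(cache.size());   // 0
    }
}
```

Without the marker check, each of those per-query sources would add an entry that is never reused, which is the growth pattern described in the report.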




[jira] Created: (LUCENE-1512) Incorporate GeoHash in contrib/spatial

2009-01-06 Thread patrick o'leary (JIRA)
Incorporate GeoHash in contrib/spatial
--

 Key: LUCENE-1512
 URL: https://issues.apache.org/jira/browse/LUCENE-1512
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spatial
Reporter: patrick o'leary
Priority: Minor


Based on comments from Yonik and Ryan in SOLR-773.
GeoHash provides the ability to store latitude / longitude values in a single,
consistent hash field.
This eliminates the need to maintain 2 field caches for the latitude / longitude
fields, reducing the size of an index
and the amount of memory needed for a spatial search.




[jira] Updated: (LUCENE-1512) Incorporate GeoHash in contrib/spatial

2009-01-06 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1512:


Attachment: LUCENE-1512.patch

spatial-lucene GeoHash implementation based on 
http://en.wikipedia.org/wiki/Geohash
removable dependency on refactoring in LUCENE-1504

 Incorporate GeoHash in contrib/spatial
 --

 Key: LUCENE-1512
 URL: https://issues.apache.org/jira/browse/LUCENE-1512
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spatial
Reporter: patrick o'leary
Priority: Minor
 Attachments: LUCENE-1512.patch


 Based on comments from Yonik and Ryan in SOLR-773.
 GeoHash provides the ability to store latitude / longitude values in a single,
 consistent hash field.
 This eliminates the need to maintain 2 field caches for the latitude / longitude
 fields, reducing the size of an index
 and the amount of memory needed for a spatial search.




[jira] Updated: (LUCENE-1504) SerialChainFilter should use DocSet API rather then deprecated BitSet API

2009-01-05 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1504:


Attachment: LUCENE-1504.patch

Changed filter calls from bits to getDocIdSet, the ISerialChainFilter will 
maintain
a method called bits(IndexReader, BitSet)


 SerialChainFilter should use DocSet API rather then deprecated BitSet API
 -

 Key: LUCENE-1504
 URL: https://issues.apache.org/jira/browse/LUCENE-1504
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Reporter: Ryan McKinley
 Fix For: 2.9

 Attachments: LUCENE-1504.patch, LUCENE-1504.patch


 From erik's comments in LUCENE-1387
 * Maybe the Filter's should be using the DocIdSet API rather than the 
 BitSet deprecated stuff? We can refactor that after being committed I 
 supposed, but not something we want to leave like that.
 We should also look at moving SerialChainFilter out of the spatial contrib 
 since it is more generally useful then just spatial search.




Re: LocalLucene or GeoHash for spatial search ?

2008-12-29 Thread patrick o'leary




Hey Marc

LocalLucene has been rewritten since then to use a Cartesian grid for
its bounding box lookups
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

GeoHash is a method of consistent hashing to produce an id where the
length of the id
determines the precision of the point, as in 123ab6789 might be
(42.12345, -73.12345)
and 123ab would be (42.12, -73.12)
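
How much precision each extra character buys can be worked out from the encoding itself. This is a sketch based on the standard geohash scheme (5 bits per character, bits alternating between longitude and latitude, longitude first); `GeohashCellSize` is a hypothetical helper, not part of locallucene.

```java
final class GeohashCellSize {
    /** Returns {latitude span, longitude span} in degrees for a geohash of the given length. */
    static double[] cellSizeDegrees(int length) {
        int bits = 5 * length;          // each base-32 character carries 5 bits
        int lonBits = (bits + 1) / 2;   // longitude takes the first bit of every pair
        int latBits = bits / 2;
        return new double[] {
            180.0 / (1L << latBits),
            360.0 / (1L << lonBits)
        };
    }

    public static void main(String[] args) {
        for (int length : new int[] {4, 5, 9, 12}) {
            double[] size = cellSizeDegrees(length);
            System.out.println(length + " chars: " + size[0] + " x " + size[1] + " degrees");
        }
    }
}
```

A 4-character hash, for example, covers a cell of 0.17578125 by 0.3515625 degrees, so truncating a hash always widens the cell around the same point.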

It's a great way to store individual points or areas in a compressed
format, kind of like a tiny url to a particular point on the globe.

Locallucene works differently by placing points within boxes at
different zoom levels.
At minimum zoom level 0 (_localTier0) everything exists within 1 box,
zoom level 1 it's 4 boxes
zoom level 2 it's 16 boxes
...
zoom level 15 it's 1,073,741,824 boxes
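
The box counts above follow from each zoom level splitting every box into four, so level n holds 4^n boxes. A short sketch of that arithmetic (`TierBoxes` is a hypothetical name, not a locallucene class):

```java
final class TierBoxes {
    /** Number of boxes at a zoom level: every level quarters each box, giving 4^level. */
    static long boxesAtLevel(int level) {
        return 1L << (2 * level);
    }

    public static void main(String[] args) {
        for (int level : new int[] {0, 1, 2, 15}) {
            System.out.println("zoom level " + level + ": " + boxesAtLevel(level) + " boxes");
        }
    }
}
```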

Obviously the index will only contain box ids for the boxes that have
points inside them (thus if you're indexing only
the land mass of the planet, you're only going to use at most 30% of
those boxes)

Based on the radius of your search, locallucene will select the
appropriate zoom level to find your results in.

So while locallucene could benefit from changing our notation for box ids to
something similar to geohash to reduce index size,
the concept for search is different. A couple of us are looking at
including geohash in the locallucene code base; it would make
our distance calculation less memory intensive, having to load only one
field cache for a point rather than the current 2 lat & long
fields we use, but I have to test the decoding speed to see if it slows
us down.

GeoHash's main benefit comes in the form of lookup by id, say for an
image or tile map at a point or for geocoding.
It probably has more benefits than that, and I'm sure someone will
correct me on that.

I should also warn you, that I'm the guy who wrote locallucene so I
have a natural bias towards it, but I'll be honest this is how I see
most geo searches working. 

- P

squaro wrote:

  Hello everybody

I would like to hear your thoughts about spatial search techniques using Lucene

According to you is it better to use LocalLucene
(http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm)
or encoding lat and long with Geohash (http://geohash.org/)
(and then use a RangeFilter between the two boundary hashes)?

In my mind I think using geohash should be better because the comparison is
done on one field only.

What is your opinion about it ?

Best regards

Marc
  


-- 
Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






[jira] Updated: (LUCENE-1387) Add LocalLucene

2008-12-18 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1387:


Attachment: spatial-lucene.zip

Latest version of local / spatial lucene with LGPL dependencies removed
and working unit tests. The code's only dependency is on JUnit for tests during 
compilation.

All the code's headers should be changed to Apache License as well.

 Add LocalLucene
 ---

 Key: LUCENE-1387
 URL: https://issues.apache.org/jira/browse/LUCENE-1387
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: spatial-lucene.zip, spatial.zip


 Local Lucene (Geo-search) has been donated to the Lucene project, per 
 https://issues.apache.org/jira/browse/INCUBATOR-77.  This issue is to handle 
 the Lucene portion of integration.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




Re: solr NumberUtils to lucene?

2008-12-17 Thread patrick o'leary




It would be great to get it consistent. I cherry-picked when someone
pointed it out to me.

Erik Hatcher wrote:
My thoughts... bring over any simple functions like these
that are generally useful. At a quick glance, the functions in
Solr's NumberUtils are generally useful and fit well in Lucene's
NumberTools. What's the harm?
  
  
 Erik
  
  
On Dec 16, 2008, at 9:14 PM, Ryan McKinley wrote:
  
  
  I posted this same question for the same
reasons a while back...

http://markmail.org/message/mji7jnpa5xjfflmw


I'm looking at local lucene and trying to figure out how it could go
into lucene. As is, locallucene depends on solr since it needs
NumberUtils.


Any change of heart for moving it into lucene?


  
  







Re: [Fwd: Re: 2.9, 3.0 and deprecation]

2008-12-16 Thread patrick o'leary




Yes, typo..  long day yesterday

Uwe Schindler wrote:

  
I've only read through the jdoc of tier so far, but I'm guessing it's
doing a dictionary search and splitting the the index readers position
based on the result being less than or greater than upper / lower values.
Which may be faster than a TermDocs seek, and certainly
worth while investigating.

  
  
Do you mean JDOC of "Trie" here?

Uwe




  







Re: [Fwd: Re: 2.9, 3.0 and deprecation]

2008-12-15 Thread patrick o'leary
  

I think we need a more incremental approach, somehow, for
StandardTokenizer. Like it does its own internal versioning or
something. There have been lots of little cases over time where it
needs fixing, yet, it would be a break in back compat to fix them.



11. Fieldable. Ah, Fieldable. I believe this is going to become an
abstract base class, or go away.



This is a biggie and nobody's stepped up so far to tackle it... I would
say don't hold up 2.9 for this.


Maybe add these ones:

12. LUCENE-1483 -- running Scorer & HitCollector "per segment". We
are making good progress here, and uncovering some nice per-query
performance wins plus much faster searcher warming (since FieldCache is
only used per-segment). On the current path it looks likely to
deprecate current Field sorting classes, so it'd be great to get this
in before 2.9.

13. LUCENE-831 (new FieldCache API). This is long standing and there's
a fair amount of interest, and through our iterations with LUCENE-1483
(one of the primary users of the FieldCache API, field sorting) we are
getting more clarity on what a new FieldCache API should look like.
It'd be nice to resolve before 2.9, and I'd like to spend time doing
so (after / with LUCENE-1483).

Mike




    

  
  
  








Re: [Fwd: Re: 2.9, 3.0 and deprecation]

2008-12-15 Thread patrick o'leary




Hi Uwe

True it's not a generic solution, but then again I wouldn't really
consider geo-search a generic ask.
The indexing format for locallucene uses something I call a tier
approach, similar to zoom levels in other mapping solutions.

Each tier has a separate set of projections or Cartesian id's, and the
projection interface has a bestFit function providing you with the
optimal tier to search on.

A quick explanation with graphics is here:
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

The number of Cartesian id's grows with the tier level: tier
level 0 has 1 Cartesian id of "0.0" representing all results,
tier level 1 has 4 possible id's, level 2 has 16, etc.,
where # ids = (2^tier)^2. Again, specialized, but on purpose, to
provide optimal lookup.
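
As a quick check of the count above, a tiny sketch (the function name
ids_per_tier is made up for this illustration, it is not a locallucene
API):

```python
def ids_per_tier(tier):
    """Number of possible Cartesian ids at a tier: (2^tier)^2."""
    return (2 ** tier) ** 2

# Print the first few tiers to show how quickly the grid refines.
for tier in range(4):
    print(tier, ids_per_tier(tier))
```

Each tier splits every box of the previous tier into 4, which is why the
count quadruples per level.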

Intersections and overlapping MBB's can also be done using best fit
lookups using the TermDocs seek.

I've only read through the jdoc of tier so far, but I'm guessing it's
doing a dictionary search and splitting the index reader's position
based on the result being less than or greater than upper / lower
values. Which may be faster than a TermDocs seek, and certainly
worthwhile investigating.

But the ability to reduce your search from a 2 range filter lookup to
just a minimal set of term seeks is what will give you better
performance.
The bestFit function, if I remember correctly, is designed to give you
between 1 and 6 terms to look up, and for a boundary box search that's
all you need.
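
To make the idea of "a minimal set of term seeks" concrete, here is a
rough sketch. It is NOT the locallucene projection or its bestFit
function: it assumes a plain equirectangular grid of 2^tier by 2^tier
boxes over the whole world, and the names cell_index, cells_for_bbox and
best_fit_tier are invented for this example.

```python
def cell_index(lat, lon, tier):
    """Grid position of a point in a simplified 2^tier x 2^tier grid."""
    n = 2 ** tier
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y

def cells_for_bbox(min_lat, min_lon, max_lat, max_lon, tier):
    """Ids of all boxes that make up the bounding box at one tier."""
    x0, y0 = cell_index(min_lat, min_lon, tier)
    x1, y1 = cell_index(max_lat, max_lon, tier)
    return [f"{x}.{y}" for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

def best_fit_tier(min_lat, min_lon, max_lat, max_lon, max_cells=6, max_tier=15):
    """Deepest tier whose covering set still has at most max_cells boxes.

    The limits max_cells=6 and max_tier=15 are assumptions for this sketch.
    """
    best = 0
    for tier in range(max_tier + 1):
        cells = cells_for_bbox(min_lat, min_lon, max_lat, max_lon, tier)
        if len(cells) <= max_cells:
            best = tier
        else:
            break  # boxes nest, so deeper tiers only need more cells
    return best
```

Each returned id would be indexed as a term, so a bounding box search
becomes a handful of exact term lookups instead of two range scans.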

Thanks
Patrick


Uwe Schindler wrote:

  
  

  
  

  
  
  
Hi Patrick,

very interesting approach. In my opinion, to compare:

A standard RangeFilter is like using a standard relational database with
separate lat/lon fields, no index, and the operator "between".
TrieRangeFilter is like adding indexes to the lat/lon fields, which is
enough for most cases. Your LocalLucene approach is like using a
relational database (e.g. Oracle) that is able to directly handle
point/bbox coordinates and index them efficiently.

Is this correct? The more specialized the implementation is for the
underlying data structure, the faster it is :-). The drawback of your
solution is that it is too specialized, while TrieRangeQuery is optimal
for ranges in wider usage outside of local queries.

How does your implementation behave when the query hits e.g. half of all
documents, or somebody selects (-180,-90,180,90) to get all documents?
How does it behave with half-open ranges and intersections?

Uwe
  
  
-
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax:  +49 421 218 65505
http://www.pangaea.de/
E-mail: uschind...@pangaea.de

From: patrick o'leary [mailto:polear...@aol.com]
Sent: Monday, December 15, 2008 9:14 PM
To: java-dev@lucene.apache.org
Subject: Re: [Fwd: Re: 2.9, 3.0 and deprecation]
  
   
Hey Jason

o.a.l.s.trie looks interesting and has a lot of potential. The locallucene
1.5+ release moved to a Cartesian tier system, away from the boundary box
filter, a while ago though.

A TierRange or RangeFilter like the one I used in v1.0 was a little
inefficient, as you have to do a bit AND on 2 range lookups, e.g.

RangeFilter(min-latitude, max-latitude) AND
RangeFilter(min-longitude, max-longitude)
(I extended the Filter class with an ISerialChainFilter to improve
performance)
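
A toy sketch of that v1.0-style lookup, for illustration only. It uses
plain Python sets in place of Lucene bit sets, and range_filter and
bbox_filter are hypothetical names, not the RangeFilter or
ISerialChainFilter classes mentioned above.

```python
def range_filter(docs, field, lo, hi):
    """Ids of all docs whose field value lies in [lo, hi]."""
    return {doc_id for doc_id, doc in docs.items() if lo <= doc[field] <= hi}

def bbox_filter(docs, min_lat, max_lat, min_lon, max_lon):
    """Bounding box as the AND (intersection) of two range lookups."""
    lat_hits = range_filter(docs, "lat", min_lat, max_lat)
    lon_hits = range_filter(docs, "lon", min_lon, max_lon)
    return lat_hits & lon_hits

# Made-up sample documents.
docs = {
    1: {"lat": 52.37, "lon": 4.89},   # Amsterdam
    2: {"lat": 51.50, "lon": -0.12},  # London
    3: {"lat": 48.85, "lon": 2.35},   # Paris
}
print(bbox_filter(docs, 50.0, 53.0, 0.0, 10.0))
```

Each range lookup on its own matches documents far outside the box (the
whole latitude band, the whole longitude band), and that extra work is
the inefficiency described above.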
  
The 1.5+ version of locallucene does it differently: I pre-generate the
bounding shape's Cartesian id's, i.e. all the boxes that make up the
overall bounding box, and simply pull the matching doc id's out of the
TermEnumerator.

Take a look at the CartesianShapeFilter:
http://locallucene.svn.sourceforge.net/viewvc/locallucene/trunk/locallucene/src/java/com/pjaol/search/geo/utils/CartesianShapeFilter.java?revision=66&view=markup

This gives you a bounding box lookup of about 3 - 4 ms on a 3 million doc
index.

Thanks
Patrick
  
Sean Timm wrote:
  
  
  
  
   
  

  


Subject: Re: 2.9, 3.0 and deprecation
From: "Jason Rutherglen" jason.rutherg...@gmail.com
Date: Mon, 15 Dec 2008 12:29:38 -0500
To: java-dev@lucene.apache.org

  

  
  
About LocalLucene, it would benefit (be faster) by integrating with
TrieRangeQuery for the bounding box filter.

On Sun, Dec 14, 2008 at 3:54 AM, Michael McCandless
luc...@mikemccandless.com wrote:

I'd also personally like to see 2.9 released sooner rather than later,
maybe earlyish next year?
  
I don't think we should hold up 2.9 for s

[jira] Commented: (LUCENE-1387) Add LocalLucene

2008-09-22 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12633276#action_12633276
 ] 

patrick o'leary commented on LUCENE-1387:
-

Yeah, the test numbers are wrong. I'll put together better tests later today 
for it.
It was brought to my attention recently when someone was trying lucene 2.4; I 
just didn't get around to resolving it.



 Add LocalLucene
 ---

 Key: LUCENE-1387
 URL: https://issues.apache.org/jira/browse/LUCENE-1387
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: spatial.zip


 Local Lucene (Geo-search) has been donated to the Lucene project, per 
 https://issues.apache.org/jira/browse/INCUBATOR-77.  This issue is to handle 
 the Lucene portion of integration.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

