Solr GeoHash Field (Solr 4.5)

2014-05-29 Thread Chris Atkinson
Hi,

I've been reading up a lot on what David has written about GeoHash fields
and would like to use them.

I'm trying to create a nice way to display cluster counts of geo points on
a Google map. Sending marker information for 40k points over the wire to
cluster on the client naturally isn't feasible, so I figured GeoHash would
be perfect.

I'm running Solr 4.5. I've seen https://github.com/dsmiley/SOLR-2155.
Is that what I should use? It looks really old, and I noticed that there is
now a core "solr.GeoHashField" field type...

However, the documentation at https://wiki.apache.org/solr/SpatialSearchDev
says:

> Solr includes the field type "solr.GeoHashField" but it unfortunately
> doesn't realize any of the intrinsic properties of the geohash to its
> advantage. *You shouldn't use it.* Instead, check out
> http://wiki.apache.org/solr/SpatialSearch#SOLR-2155. The main feature is
> multi-valued field support.

Does this mean that there isn't any way to use GeoHash with my version of
Solr?

Should I just implement a multi-valued field and add all of the values
myself?

(Also, can you confirm that I'm on the right track using GeoHash for
clustering? I don't need anything perfect; I just want to be able to break
up the markers into groups.)
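A minimal sketch of the do-it-yourself multi-valued approach, in plain Python. It assumes a hypothetical string field (say `geohash_prefixes`) that receives every prefix of each point's geohash; counting points per prefix at one fixed length then gives one grid of cluster counts per zoom level. Faceting on such a field is the kind of request that could return comparable counts from Solr, but that part is not shown. This is not the SOLR-2155 or `solr.GeoHashField` code, and the helper names are made up for illustration.

```python
from collections import Counter

# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a lat/lon pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    even = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def prefixes(gh: str) -> list[str]:
    """All prefixes of a geohash: the values for a multi-valued string field."""
    return [gh[:i] for i in range(1, len(gh) + 1)]


def cluster_counts(points, precision: int) -> Counter:
    """Count (lat, lon) points per geohash cell of the given length."""
    return Counter(geohash_encode(lat, lon, precision) for lat, lon in points)


points = [(51.5074, -0.1278), (51.5080, -0.1290), (48.8566, 2.3522)]
print(cluster_counts(points, 2))  # two London points share "gc", Paris is "u0"
```

Shorter prefixes mean bigger cells, so a client would pick the prefix length from the map's zoom level.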

Thanks


Re: Geo spatial clustering of points

2013-08-21 Thread Chris Atkinson
Did you get any resolution for this? I'm about to implement something
identical.
On 3 Jul 2013 23:03, "Jeroen Steggink"  wrote:

> Hi,
>
> I'm looking for a way to cluster (or should I call it group) geospatial
> points on a map based on the current zoom level, and to get the median
> coordinate for each cluster.
> Let's say I'm at the world level and I want to cluster spatial points
> within a 1000 km radius. When I zoom in, I only want the clustered points
> for that boundary: say, all the points within the US, clustered within a
> 500 km radius.
>
> I'm using Solr 4.3.0 and looked into SpatialRecursivePrefixTreeFieldType
> with faceting. However, I'm not sure whether the geohashes are of any use
> for clustering points.
>
> Does anyone have any experience with geospatial clustering in Solr?
>
> Regards,
>
> jeroen
>
>
>
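Jeroen's zoom-dependent grouping can be approximated without geohashes by bucketing points into a square grid whose cell size follows the zoom level. The Python sketch below is one workable approach under stated assumptions, not a Solr feature: `grid_clusters` is a hypothetical helper, the cell size comes from a rough 111.32 km per degree, and "median coordinate" is taken to mean the per-axis median of latitude and longitude.

```python
import math
from collections import defaultdict
from statistics import median

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def grid_clusters(points, cell_km: float):
    """Group (lat, lon) points into grid cells roughly cell_km wide.

    Returns {cell_key: (count, median_lat, median_lon)}. Simplification:
    longitude degrees shrink toward the poles, so cells are only roughly
    cell_km wide away from the equator.
    """
    size = cell_km / KM_PER_DEGREE  # cell size in degrees
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / size), math.floor(lon / size))
        cells[key].append((lat, lon))
    return {
        key: (len(pts), median(p[0] for p in pts), median(p[1] for p in pts))
        for key, pts in cells.items()
    }


pts = [(51.5, -0.12), (51.6, -0.10), (48.85, 2.35)]
for cell, (count, mlat, mlon) in grid_clusters(pts, 500).items():
    print(cell, count, round(mlat, 3), round(mlon, 3))
```

Zooming in would mean filtering to the visible bounding box and passing a smaller `cell_km` (for example 500 instead of 1000).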


Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-05 Thread Chris Atkinson
Everything is working great now.
Thanks David


On Wed, Jun 5, 2013 at 12:07 AM, David Smiley (@MITRE.org) <
dsmi...@mitre.org> wrote:

> maxDistErr should be something like 0.3, based on earlier parts of this
> discussion, since your times fall on one of a couple of hours of the day,
> not on whole days. If it were whole days, you would use 1. Changing this
> requires a re-index, and so does changing worldBounds if you do so.
> distErrPct should be 0. Changing it does not require a re-index, because
> you are indexing points, not other shapes; it only affects other shapes.
>
> Speaking of the slight buffer to the query shape I mentioned in my last
> email, it should be less than half of maxDistErr, whatever you set that
> to. So use something like 0.1.
>
> ~ David
>
>
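David's numbers can be folded into a small query builder. This Python sketch is hypothetical (the function name and defaults are illustrative); it assumes bookings are indexed as (check-in day, departure day) points, the Solr 4 rectangle order `minX minY maxX maxY`, and a world running from 0 to 3650 so that 0 and 3650 stand in for -Inf and Inf.

```python
def availability_fq(field: str, check_in: float, check_out: float,
                    max_dist_err: float = 0.3, buffer: float = 0.1,
                    world_max: float = 3650.0) -> str:
    """Build a negated overlap filter for stays indexed as (start, end) points.

    A booking overlaps [check_in, check_out] when start <= check_out and
    end >= check_in, i.e. its point lies in the rectangle
    (minX=0, minY=check_in, maxX=check_out, maxY=world_max).
    The buffer shrinks that range so merely adjacent bookings do not match.
    """
    if not buffer < max_dist_err / 2:
        raise ValueError("buffer should be less than half of maxDistErr")
    min_y = check_in + buffer
    max_x = check_out - buffer
    return f'-{field}:"Intersects(0 {min_y:g} {max_x:g} {world_max:g})"'


print(availability_fq("availability_spatial", 30, 115))
```

For a stay from day 30 to day 115 this yields `-availability_spatial:"Intersects(0 30.1 114.9 3650)"`.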

Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-04 Thread Chris Atkinson
Hi David,
Thanks for your continued help.

I think you've hit the nail on the head for me. I'm 100% sure that I had
previously tried that query without success. Perhaps I had the wrong
distErrPct or maxDistErr values...
It's getting late, so I'm going to call it a night (I'm on GMT), but I'll
put your example into practice tomorrow and get confirmation that it's
working as expected.

I'll keep playing around with the distErrPct values as well.
Do I need to do a reindex if I change these values? (I think yes?)


On Tue, Jun 4, 2013 at 10:44 PM, Smiley, David W.  wrote:

> So "availability" is the absence of any other document's indexed time
> duration overlapping with your availability query duration.  So I think
> you should negate an overlaps query.  The overlaps query looks like:
> Intersects(-Inf start end Inf).  And remember the slight buffering needed
> as described on the wiki.  You'd add a small fraction to the start time
> and subtract a small fraction from the end time, so that you don't
> accidentally match a document that is adjacent.
>
> -availability_spatial:"Intersects( 0 30.5 114.5 3650 )"
>
> Does that work against your data?  If it doesn't, can you conjecture why
> it doesn't work based on a sample point in a document that it matched, or
> a document that should have matched but didn't?
>
> ~ David
>
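David's reasoning can be checked outside Solr with exact arithmetic against the sample document posted in this thread. The Python below is a sketch under the same assumptions as his query (points are (check-in, departure), rectangles are `minX minY maxX maxY`, 0 and 3650 bound the world); the helper names are illustrative. It ignores the prefix-tree approximation controlled by maxDistErr, which is the reason the buffer exists in the real query.

```python
def in_rect(point, rect):
    """True if the (x, y) point lies inside rect = (minX, minY, maxX, maxY)."""
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


def overlap_rect(check_in, check_out, buffer=0.5, world_max=3650.0):
    """Intersects(-Inf start end Inf) for [check_in, check_out],
    shrunk by `buffer` on each side (0.5 as in the example query)."""
    return (0.0, check_in + buffer, check_out - buffer, world_max)


def is_available(bookings, check_in, check_out, buffer=0.5):
    """Available means no booking point falls inside the overlap rectangle,
    which is what negating the Intersects filter expresses in Solr."""
    rect = overlap_rect(check_in, check_out, buffer)
    return not any(in_rect(p, rect) for p in bookings)


# availability_spatial values of document 38197, as posted in this thread
doc_38197 = [(147.6, 163.4), (164.6, 178.4), (192.6, 220.4), (241.6, 264.4)]

print(overlap_rect(30, 115))              # (0.0, 30.5, 114.5, 3650.0)
print(is_available(doc_38197, 30, 115))   # True: every booking starts after 114.5
print(is_available(doc_38197, 150, 170))  # False: overlaps two bookings
```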

Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-04 Thread Chris Atkinson
Here is an example I have tried.

So let's assume that I want to checkIn on the 30th day, and leave on the
115th day.

My query would be:

-availability_spatial:"Intersects(   30 0  3650 115 )"

However, that wouldn't match anything. Here is an example document below so
you can see. (I've not negated the spatial field in the filter query so you
can see the field coordinates)

In case the formatting is bad: See here

http://pastie.org/pastes/8006249/text



Response (q=id:38197, fq=availability_spatial:"Intersects( 30 0 3650 115 )",
fl=availability_spatial; status 0, QTime 1), returned availability_spatial
values:

147.6 163.4
164.6 178.4
192.6 220.4
241.6 264.4




Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-04 Thread Chris Atkinson
Thanks David.
Query times are really quick, and my index is only 20 MB now, which is
about what I would expect.
I'm having some problems figuring out which query to use to find
*available* properties with this new points system.


I'm storing bookings against each document, so I have X Y coordinates,
where X is the check-in of a previous booking and Y is the departure.

So, for illustrative purposes, a week's booking from 10th January to the
17th would be X Y => 10 17

10 17
22 27

I might have several bookings.
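One way to produce the X Y values described above, sketched in Python. It assumes 1 January 2013 is day 0 (the illustrative `10 17` above uses calendar day numbers instead) and rounds to one decimal the way the earlier rectangles did, so 2pm becomes .6 and 10am becomes .4; `day_offset` and `booking_point` are hypothetical helpers.

```python
from datetime import datetime

EPOCH = datetime(2013, 1, 1)  # assumed day 0 of the ~10-year range


def day_offset(ts: datetime) -> float:
    """Days since EPOCH, rounded to one decimal (2pm -> .6, 10am -> .4)."""
    return round((ts - EPOCH).total_seconds() / 86400, 1)


def booking_point(check_in: datetime, departure: datetime) -> str:
    """Index value for one booking: 'X Y' = check-in offset, departure offset."""
    return f"{day_offset(check_in)} {day_offset(departure)}"


# A week from 10 January (2pm check-in) to 17 January (10am departure).
print(booking_point(datetime(2013, 1, 10, 14), datetime(2013, 1, 17, 10)))
```

With these assumptions the example booking becomes `9.6 16.4`; each booking is one value of the multi-valued field.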

Now, I want to find available properties with my search, but I'm just not
sure about the ordering of the end/start values in the Intersects rectangle.

I've looked at this document very carefully and tried to draw it all out on
paper.

https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

Here are the suggestions:

q=fieldX:"Intersects(-∞ end start ∞)"
q=fieldX:"Intersects(-∞ start end ∞)"
q=fieldX:"Intersects(start -∞ ∞ end)"

All of these are great for finding the existence of an indexed coordinate,
but I need to make sure that the property is available. So I thought I
could use one of these three queries in the negative, as
-fieldX:"Intersects(...)", but none of them work.

Can you shed some light on what I might be missing?
What ordering would I want for *availability*?
Thanks very much.
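Reading the three slide queries in the Solr 4 rectangle order (`minX minY maxX maxY`) against points of the form (booking start, booking end) gives three different interval relations. The Python sketch below encodes that reading; it is a derivation from the rectangle order, not code from the slides, and the names are illustrative.

```python
INF = float("inf")


def in_rect(point, rect):
    """True if (x, y) lies in rect = (minX, minY, maxX, maxY)."""
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


def query_rects(start, end):
    """The three Intersects rectangles for a query range [start, end]."""
    return {
        # Intersects(-inf end start inf): booking.start <= start, booking.end >= end
        "contains query": (-INF, end, start, INF),
        # Intersects(-inf start end inf): booking.start <= end, booking.end >= start
        "overlaps query": (-INF, start, end, INF),
        # Intersects(start -inf inf end): booking.start >= start, booking.end <= end
        "within query": (start, -INF, INF, end),
    }


def relations(booking, start, end):
    """Names of the rectangles that a (start, end) booking point falls into."""
    return sorted(name for name, rect in query_rects(start, end).items()
                  if in_rect(booking, rect))


print(relations((10, 17), 12, 15))  # booking 10-17 vs. a stay 12-15
print(relations((22, 27), 12, 15))  # booking 22-27 vs. the same stay
```

Only the overlap rectangle answers the availability question: a stay is free when no booking point falls inside it, which is why the overlap query is the one to negate.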



On Mon, Jun 3, 2013 at 11:45 PM, Smiley, David W.  wrote:

> Hi Chris:
>
> Have you read: http://wiki.apache.org/solr/SpatialForTimeDurations
> You're modeling your data sub-optimally.  Full precision rectangles
> (distErrPct=0) don't scale well and you're seeing that.  You should
> represent your durations as a point and it will take up a fraction of the
> space (see above).  Furthermore, because your detail gets into one digit
> to the right of the decimal, your maxDistErr should definitely be smaller
> than 1 -- use something like 0.5 (given you have two levels of precision
> below a full day) but to be safer (more certain it's not a problem) use
> 0.3 -- a little less.  Please report back how that goes.
>
> ~ David
>
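David's suggestion amounts to storing each duration as a point (start, end) instead of a thin rectangle. A small Python sketch of that conversion, assuming values in the `Rectangle minX minY maxX maxY` form used in this thread; `rect_to_point` is a hypothetical helper. Because the Y axis would then hold end days, the worldBounds would presumably need to span 0 to 3650 on both axes (for example `0 0 3650 3650`) rather than `0 0 3650 1`; changing worldBounds or maxDistErr requires a re-index.

```python
def rect_to_point(rect_value: str) -> str:
    """Turn 'Rectangle minX minY maxX maxY' (duration on X, dummy Y of 0..1)
    into the point 'minX maxX', i.e. (start, end)."""
    parts = rect_value.split()
    if len(parts) != 5 or parts[0] != "Rectangle":
        raise ValueError(f"unexpected shape: {rect_value!r}")
    start, end = float(parts[1]), float(parts[3])
    if end < start:
        raise ValueError("duration ends before it starts")
    return f"{start:g} {end:g}"


print(rect_to_point("Rectangle 100.6 0 120.4 1"))  # 100.6 120.4
```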


Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Chris Atkinson
Also, here is a sample query, and the debugQuery output

fq={!cost=200}*:* -availability_spatial:"Intersects(182.6 0 199.4 1)"

In case the formatting is bad, here is a raw paste of the debugQuery output:

http://pastie.org/pastes/872/text?key=ksjyboect4imrha0rck8sa


responseHeader: status 0, QTime 8171
q: *:*
fq: {!cost=200}*:* -availability_spatial:"Intersects(182.6 0 199.4 1)"
parsedquery: MatchAllDocsQuery(*:*) (QParser: LuceneQParser)
parsed_filter_queries: +MatchAllDocsQuery(*:*)
  -ConstantScore(org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter@42ce603b)
timing (ms): total 8171.0; prepare 1.0; process 8170.0, of which the query
  component took 8170.0 and every other component 0.0



On Mon, Jun 3, 2013 at 12:27 PM, Chris Atkinson  wrote:

> Hi,
> I'm seeing really slow query times (7-25 seconds) when I run a simple
> filter query that uses my SpatialRecursivePrefixTreeFieldType field.
>
> My index has about 30k documents. Prior to adding the spatial field, the
> on-disk size was about 100 MB, so it's a really tiny index. Once I add the
> spatial field (which is multi-valued), the index size jumps up to 2 GB.
> (Is this normal?)
>
> Only about 10k documents will have any spatial data. Typically they will
> have at most 10 shapes each, but the majority have just one or two
> rectangles.
>
> This is my fieldType definition.
>
> <fieldType name="..." class="solr.SpatialRecursivePrefixTreeFieldType"
>     geo="false"
>     worldBounds="0 0 3650 1"
>     distErrPct="0"
>     maxDistErr="1"
>     units="degrees"
> />
>
> And the field:
>
> <field name="..." type="..." indexed="true" stored="false" multiValued="true" />
>
>
> I am using the field to represent approximately 10 years starting January
> 1st 2013, with each day along the X-axis. Because availability starts at
> 2pm and ends at 10am, I used one decimal place when creating my shapes to
> capture that detail. (Is this approach wrong?)
>
> So a typical rectangle when indexed would be (minX minY maxX maxY)
>
> Rectangle 100.6 0 120.4 1
>
> Is it wrong that my X and Y values are not on the same scale? Since I
> don't care about the Y axis at all, I just always set its height to 1.
>
> I'm running Solr 4.3 with a small JVM of 768 MB (can be increased), and I
> have 2 GB of RAM (again, can be increased).
>
> Thanks
>
>
>
>


SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Chris Atkinson
Hi,
I'm seeing really slow query times (7-25 seconds) when I run a simple filter
query that uses my SpatialRecursivePrefixTreeFieldType field.

My index has about 30k documents. Prior to adding the spatial field, the
on-disk size was about 100 MB, so it's a really tiny index. Once I add the
spatial field (which is multi-valued), the index size jumps up to 2 GB.
(Is this normal?)

Only about 10k documents will have any spatial data. Typically they will
have at most 10 shapes each, but the majority have just one or two
rectangles.

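This is my fieldType definition:

<fieldType name="..." class="solr.SpatialRecursivePrefixTreeFieldType"
    geo="false"
    worldBounds="0 0 3650 1"
    distErrPct="0"
    maxDistErr="1"
    units="degrees"
/>

And the field:

<field name="..." type="..." indexed="true" stored="false" multiValued="true" />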


I am using the field to represent approximately 10 years starting January
1st 2013, with each day along the X-axis. Because availability starts at
2pm and ends at 10am, I used one decimal place when creating my shapes to
capture that detail. (Is this approach wrong?)

So a typical rectangle when indexed would be (minX minY maxX maxY)

Rectangle 100.6 0 120.4 1

Is it wrong that my X and Y values are not on the same scale? Since I don't
care about the Y axis at all, I just always set its height to 1.

I'm running Solr 4.3 with a small JVM of 768 MB (can be increased), and I
have 2 GB of RAM (again, can be increased).

Thanks