Re: Longitude Index: B-Tree or Cluster

John J Foster via 4D_Tech Fri, 10 Apr 2020 07:39:26 -0700

Hi Tom,

I didn’t give a complete scenario which is why the example looks so broad. In 
this case I have imported data to which I need to match with the correct Olson 
Time Zone. So in the example searching for Russia (RU) and then the longitude 
range returns many many records (aka, broad). However in this case this works 
because all records seem to be in the same timezone. So I just grab the first 
record and return the timezone.


I just want it to be as fast as possible.

Of course in a search for a specific city in a country that’s unique it finds 
it in a split second.

The B-Tree seems to be working fairly quickly unless you need a sort. Then it 
has to load records into the cache, etc. It would be faster the following times.

Appreciate,
John…


> Make sure you include testing a single query, if you haven’t already, then 
> see what 4D’s query analyzer does with it in the query plan. Your current use 
> of two queries may very well be optimal for your data set, since most 
> countries don’t span much of the world in terms of longitude. Your example of 
> Russia is the broadest case. You may see more benefits for Luxembourg, or 
> even Chile, or not. Only testing will tell.
> 
> Years ago I worked with Josh Fletcher to understand the different types of 4D 
> indexes. The result was a recommendation to leave them as the default B-Tree. 
> Any benefit to fiddling with each index was negligible. However, with a 
> specialized data set, such as longitude, it might make a bigger difference.
> 
> HTH,
> 
> Tom Benedict
> 
>> On Apr 10, 2020, at 06:26, John J Foster via 4D_Tech <4d_tech@lists.4d.com> 
>> wrote:
>> 
>> Hi Arnaud,
>> 
>>> what is the field type? 
>> 
>> The field type is a real number.
>> 
>> There are so many similar values in terms of the integer part but many many 
>> variations of the decimal part. Currently I am using a B-Tree index.
>> 
>> I’ve thought about breaking it the two parts but that just adds an extra 
>> field and storage and if I did that then maybe a Cluster index would be 
>> obvious.
>> 
>> I'll play with the Query plan and see if it’s helpful. Having almost 5 
>> million cities is a good testing ground and a good reference database when I 
>> am matching smaller sets of GEO data (currently imported and massaged from 
>> another database).
>> 
>> In this case I am searching by country (or country code) and longitude to 
>> find (if found) an Olson TimeZone. The fastest search I've found so far 
>> (B-Tree index):
>> 
>> // [PE_Geo_Location]lon_dec = 28.6
>> // [PE_Geo_Location]Country_Code = RU
>> 
>> GEO_GetTimeZone([PE_Geo_Location]Country_Code;[PE_Geo_Location]lon_dec)
>> 
>> $cntryCode:=$1
>> $lon_r:=Int($2)    // 28
>> 
>> QUERY([City];[City]country code=$cntryCode)                          
>> QUERY SELECTION([City];[City]longitude>=$lon_r;*)
>> QUERY SELECTION([City]; & ;[City]longitude<=$lon_r)
>> ….
>> 
>> $0:=[City]timezone
>> 
>> It’s not a “live” database so I don't want to spend too much time with it. 
>> But when I need to match data then, of course, I want it to search as fast 
>> as is possible.
>> 
>> Appreciate,
>> John…
>> 
>> 
>> 
>>>> Le 10 avr. 2020 à 02:03, John J Foster via 4D_Tech <4d_tech@lists.4d.com 
>>>> <mailto:4d_tech@lists.4d.com>> a écrit :
>>>> 
>>>> Hi All,
>>>> 
>>>> I have a GEO database of almost 5 million cities. They include various 
>>>> data and latitude and longitude.
>>>> 
>>>> I need to use the Longitude file in searches. To speed it up I need to 
>>>> index. Many of the values are similar like 141.235 or 141 3.23 and so on. 
>>>> So for those who might know would a Cluster index be better suited or a 
>>>> B-Tree?
>>> 
>>> I'd try distinct values first, a cluster is supposed to be better when 
>>> there's enough repeated values. And most probably I'd try… I didn't 
>>> understand from "141.235 or 141 3.23": what is the field type? 
>>> You may also consider using another field with "simplified value" from 
>>> longitude, depending on what's expected from that search. Or, if the search 
>>> is always combined with other fields than longitude, make some tests and 
>>> analyse query plan/query path to see if that index is required. 
>>> 
>>> -- 
>>> Arnaud de Montard 
>> 
>> 
> 

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************

Re: Longitude Index: B-Tree or Cluster

Reply via email to