Hi Lasantha,

So far, we have been taking the following approach to get the geo location
from the stated dump.

1. We added two columns holding the long values of the lower and upper
bounds of each network's IP address range. For a given IP, we convert it to
a long and look up the geoname_id of the row whose bounds contain that
value. Hope this clarifies our approach. Is there any way to do a bitwise
operation in order to derive these bounds from the network_cidr value?
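
For context, here is a minimal sketch of the kind of bitwise conversion we
have in mind (plain Java, IPv4 only; the class and method names are just
illustrative, not existing code):

public final class CidrUtil {

    // Convert a dotted-quad IPv4 address (e.g. "192.168.0.1") to its
    // unsigned 32-bit value held in a long.
    public static long ipToLong(String ip) {
        long value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    // Derive the lower (network) and upper (broadcast) long values of a
    // CIDR block such as "192.168.0.0/20" using only bitwise operations.
    public static long[] cidrToRange(String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        long mask = prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        long lower = ipToLong(parts[0]) & mask;      // host bits cleared
        long upper = lower | (~mask & 0xFFFFFFFFL);  // host bits set
        return new long[] {lower, upper};
    }
}

For example, cidrToRange("192.168.0.0/20") gives the long values of
192.168.0.0 and 192.168.15.255, which are exactly the two columns we are
populating today.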

Thanks
Tharindu

On Mon, Mar 7, 2016 at 12:05 AM, Lasantha Fernando <lasan...@wso2.com>
wrote:

> Hi all,
>
> I think what Sachith suggests also makes sense, but I am also rooting for
> the in-memory cache implementation suggested by Sanjeewa, using the
> IP/netmask approach.
>
> Please find my comments inline.
>
> On 5 March 2016 at 23:50, Sachith Withana <sach...@wso2.com> wrote:
>
>> Hi all,
>>
>> From what I understand/was told, this happens once a day (or relatively
>> infrequently), and you want to avoid searching through all the geo data
>> per IP (since you are grouping the requests by IP).
>>
>> If that's the case, it would be better to use a separate DB table to
>> cache this data (IP, geoID, etc.) with the IP as the primary key (which
>> would improve the lookup time). Even though there will be cache misses,
>> it would gradually reduce the cache-miss-to-hit ratio.
>>
>> Having a DB cache would also be better since you do want to persist this
>> data for use over time.
>>
>> By the way, on a cache miss, if we can figure out a way to limit the
>> search range on the original table, or at least stop the search once a
>> match is found, it would greatly improve the cache-miss time as well.
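>>
>> As a rough sketch of that flow (the table, column, and helper names below
>> are only assumptions for illustration, not an existing schema or API;
>> uses plain java.sql Connection/PreparedStatement/ResultSet):
>>
>> // Try the cache table first via a primary-key lookup; fall back to the
>> // range scan on the original dump only on a miss, then persist the result.
>> public Long lookupGeonameId(Connection conn, long ipAsLong) throws SQLException {
>>     try (PreparedStatement ps = conn.prepareStatement(
>>             "SELECT geoname_id FROM ip_geo_cache WHERE ip = ?")) {
>>         ps.setLong(1, ipAsLong);
>>         try (ResultSet rs = ps.executeQuery()) {
>>             if (rs.next()) {
>>                 return rs.getLong(1);              // cache hit
>>             }
>>         }
>>     }
>>     // Cache miss: range query against the MaxMind dump. The LIMIT 1
>>     // (where the DB supports it) stops the scan at the first match.
>>     try (PreparedStatement ps = conn.prepareStatement(
>>             "SELECT geoname_id FROM network_blocks "
>>             + "WHERE ? BETWEEN ip_from AND ip_to LIMIT 1")) {
>>         ps.setLong(1, ipAsLong);
>>         try (ResultSet rs = ps.executeQuery()) {
>>             if (!rs.next()) {
>>                 return null;                       // unknown IP
>>             }
>>             long geonameId = rs.getLong(1);
>>             insertIntoCache(conn, ipAsLong, geonameId);  // hypothetical helper
>>             return geonameId;
>>         }
>>     }
>> }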
>>
>> That's my two cents.
>>
>> Cheers,
>> Sachith
>>
>> On Sun, Mar 6, 2016 at 8:24 AM, Janaka Ranabahu <jan...@wso2.com> wrote:
>>
>>> Hi Sanjeewa,
>>>
>>> On Sun, Mar 6, 2016 at 7:25 AM, Sanjeewa Malalgoda <sanje...@wso2.com>
>>> wrote:
>>>
>>>> Implementing a cache is better than having another table mapping, IMO.
>>>> What if we query the database and keep the IP range and network name in
>>>> memory? Then we could do a quick search on the network name, and the
>>>> rest could be loaded some other way.
>>>> WDYT?
>>>>
>>> We thought of having an in-memory cache, but we faced several issues
>>> along the way. Let me explain the situation as it stands right now.
>>>
>>> The MaxMind DB stores each network as an IP address plus a netmask.
>>> Ex: 192.168.0.0/20
>>>
>>> The calculation of the IP address range would be like the following.
>>>
>>> Address:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>> Netmask:   255.255.240.0 = 20    11111111.11111111.1111 0000.00000000
>>> Wildcard:  0.0.15.255            00000000.00000000.0000 1111.11111111
>>> Network:   192.168.0.0/20        11000000.10101000.0000 0000.00000000 (Class C)
>>> Broadcast: 192.168.15.255        11000000.10101000.0000 1111.11111111
>>> HostMin:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>> HostMax:   192.168.15.254        11000000.10101000.0000 1111.11111110
>>> Hosts/Net: 4094                  (Private Internet, RFC 1918: http://www.ietf.org/rfc/rfc1918.txt)
>>>
>>>
>>> Therefore, what we are currently doing is calculating the start and end
>>> IPs for every entry in the MaxMind DB and altering the tables with those
>>> values up front (this is a one-time operation). When the Spark script
>>> executes, we check whether the given IP falls between any of the start
>>> and end ranges in the tables. That is why it takes a long time to fetch
>>> results for a given IP.
>>>
>>> As a solution for this, we discussed what Tharindu has mentioned:
>>> 1. Have an in-memory caching mechanism.
>>> 2. Have a DB-based caching mechanism.
>>>
>>> The one point we have to highlight is that in both of the above
>>> mechanisms we need to cache the plain IP address (not the IP/netmask as
>>> it is in the MaxMind DB) against the geo location.
>>>
>>> Ex:-
>>> For 192.168.0.1       - Colombo, Sri Lanka
>>> For 192.168.15.254 - Colombo, Sri Lanka
>>>
>>> So, as per the example above, if there are requests from all 4094
>>> possible addresses, we will end up caching each IP with its geo location
>>> (since introducing range queries in a cache is not good practice).
>>>
>>
> Since we are implementing a custom cache, won't we be doing a bitwise
> operation for the lookup with the netmask and network IP? So basically, we
> would keep the network IP and the netmask in the cache and simply do a
> bitwise AND to determine whether it is a match or not, right? I am
> thinking such an operation would not incur much of a performance hit, and
> it won't be as prohibitive as a normal range query in a cache. If that is
> the case, I think we can go with the approach suggested by Sanjeewa.
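>
> Something like the following is what I have in mind (a minimal sketch,
> assuming IPv4 addresses already converted to longs; the class and field
> names are only illustrative):
>
> // Cache entry keeping the network address and netmask as longs.
> class NetworkEntry {
>     final long network;    // e.g. long value of 192.168.0.0
>     final long netmask;    // e.g. long value of 255.255.240.0
>     final long geonameId;
>
>     NetworkEntry(long network, long netmask, long geonameId) {
>         this.network = network;
>         this.netmask = netmask;
>         this.geonameId = geonameId;
>     }
>
>     // The IP belongs to this block if masking it with the netmask
>     // yields the network address.
>     boolean matches(long ip) {
>         return (ip & netmask) == network;
>     }
> }
>
> The lookup is then just a scan over the cached entries calling
> matches(ip), i.e. one AND and one comparison per entry.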
>
> WDYT?
>
>
>>> Please find my comments about both the approaches.
>>>
>>> 1. Having an in-memory cache would speed things up, but depending on the
>>> IPs in the data set, there could be a large number of entries for IPs in
>>> the same range. One problem with this approach is that, after a server
>>> restart, the initial script execution would take a lot of time. Also, in
>>> certain scenarios (a high number of distinct IPs), the cache would not
>>> have a significant effect on script execution performance.
>>>
>>> 2. Having a DB-based cache would persist the data across restarts, and
>>> the data-fetching query would search for a specific value (not a range
>>> query as against the MaxMind DB). The downside is that a cache miss
>>> would require a minimum of 3 DB queries (one for the cache table lookup,
>>> one for the MaxMind DB lookup, and one to persist the new cache entry).
>>>
>>> That is why we have initiated this thread: to finalize the caching
>>> approach we should take.
>>>
>>> Thanks,
>>> Janaka
>>>
>>>
>>>
>>>> Thanks,
>>>> sanjeewa.
>>>>
>>>
> Thanks,
> Lasantha
>
>
>>
>>>> On Fri, Mar 4, 2016 at 3:12 PM, Tharindu Dharmarathna <
>>>> tharin...@wso2.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are going to implement a client-IP-based geo-location graph in API
>>>>> Manager Analytics. After going through the options discussed in [1],
>>>>> we selected [2] as the most suitable approach.
>>>>>
>>>>>
>>>>> *Overview of MaxMind's DB*
>>>>>
>>>>> As shown in the structure of the DB (image attached), there are two
>>>>> tables that work together to resolve the location:
>>>>>
>>>>> find the geoname_id according to the network, then get the country and
>>>>> city from the locations table.
>>>>>
>>>>> *Limitations*
>>>>>
>>>>> With their database dump, we cannot directly resolve an IP from those
>>>>> tables. We need to check whether the given IP is between the network's
>>>>> min and max IPs. This query takes a long time (about 10 seconds even on
>>>>> indexed data). If we do this directly from the Spark script for each
>>>>> and every IP in the summary table (regardless of whether two rows share
>>>>> the same IP), it will query the tables repeatedly. Therefore this will
>>>>> have a performance impact on this graph.
>>>>>
>>>>> *Solution*
>>>>>
>>>>> 1. Implement an LRU cache of IP address vs. location.
>>>>>
>>>>> This would need to be implemented as a custom UDF in Spark. If the IP
>>>>> being queried from Spark is available in the cache, the location is
>>>>> returned from the cache; if not, it is retrieved from the DB and put
>>>>> into the cache (see the sketch below this list).
>>>>>
>>>>> 2. Persist in a table.
>>>>>
>>>>> Use the IP as the primary key, with country and city as the other
>>>>> columns, and retrieve data from that table.
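>>>>>
>>>>> A minimal sketch of option 1 (just a LinkedHashMap-based LRU map; the
>>>>> Spark UDF wiring and the DB lookup on a miss are left out, and the
>>>>> class name is only illustrative):
>>>>>
>>>>> import java.util.LinkedHashMap;
>>>>> import java.util.Map;
>>>>>
>>>>> // Keeps at most maxEntries IP -> location mappings, evicting the
>>>>> // least recently used entry when the cache is full.
>>>>> public class IpLocationCache extends LinkedHashMap<Long, String> {
>>>>>     private final int maxEntries;
>>>>>
>>>>>     public IpLocationCache(int maxEntries) {
>>>>>         super(16, 0.75f, true);   // accessOrder = true gives LRU ordering
>>>>>         this.maxEntries = maxEntries;
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
>>>>>         return size() > maxEntries;
>>>>>     }
>>>>> }
>>>>>
>>>>> Inside the UDF, the flow would roughly be: call get(ipAsLong); on null,
>>>>> run the DB lookup and put(ipAsLong, location) before returning it.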
>>>>>
>>>>>
>>>>> Please feel free to suggest the most suitable way of implementing this
>>>>> solution.
>>>>>
>>>>> [1] - Implementing Geographical based Analytics in API Manager mail
>>>>> thread.
>>>>>
>>>>> [2] - http://dev.maxmind.com/geoip/geoip2/geolite2/
>>>>>
>>>>>
>>>>> *Thanks*
>>>>>
>>>>> *Tharindu Dharmarathna*
>>>>> Associate Software Engineer
>>>>> WSO2 Inc.; http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> mobile: *+94779109091*
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Sanjeewa Malalgoda*
>>>> WSO2 Inc.
>>>> Mobile : +94713068779
>>>>
>>>> blog: http://sanjeewamalalgoda.blogspot.com/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Janaka Ranabahu*
>>> Associate Technical Lead, WSO2 Inc.
>>> http://wso2.com
>>>
>>>
>>> *E-mail: jan...@wso2.com*
>>> *M: +94 718370861*
>>>
>>> Lean . Enterprise . Middleware
>>>
>>
>>
>>
>> --
>> Sachith Withana
>> Software Engineer; WSO2 Inc.; http://wso2.com
>> E-mail: sachith AT wso2.com
>> M: +94715518127
>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> *Lasantha Fernando*
> Senior Software Engineer - Data Technologies Team
> WSO2 Inc. http://wso2.com
>
> email: lasan...@wso2.com
> mobile: (+94) 71 5247551
>
> _______________________________________________
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 

*Tharindu Dharmarathna*
Associate Software Engineer
WSO2 Inc.; http://wso2.com
lean.enterprise.middleware

mobile: *+94779109091*
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
