Hi All,

After going through the discussion above, we have implemented a pluggable,
user-defined extension point. With this configuration we can write our own
implementation to get the country and state for a given IP.
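
As a rough illustration of what such an extension point could look like, here is a minimal sketch. All names in it (`Location`, `LocationResolver`, `StaticLocationResolver`, `load_resolver`, `REGISTRY`) are hypothetical and are not taken from the actual implementation; the sample IP and location are illustrative data only.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple, Optional


class Location(NamedTuple):
    country: str
    state: str


class LocationResolver(ABC):
    """Hypothetical extension point: maps an IP string to a Location."""

    @abstractmethod
    def resolve(self, ip: str) -> Optional[Location]:
        ...


class StaticLocationResolver(LocationResolver):
    """Toy implementation backed by a fixed dict, for illustration only."""

    def __init__(self, table):
        self._table = dict(table)

    def resolve(self, ip):
        # Returns None when the IP is unknown.
        return self._table.get(ip)


def load_resolver(registry, name, *args):
    """Pick an implementation by its configured name (hypothetical registry)."""
    return registry[name](*args)


# The configured name selects which implementation is plugged in.
REGISTRY = {"static": StaticLocationResolver}
resolver = load_resolver(
    REGISTRY, "static", {"192.168.0.1": Location("Sri Lanka", "Western")}
)
print(resolver.resolve("192.168.0.1"))
```

The point of the design is that the caller only depends on `resolve`, so a MaxMind-backed or table-backed implementation can be swapped in through configuration.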

*Caching Implementation*

We defined two levels of caching, as below.

When an IP address is checked from the *UDF*, it first checks the cache for the
location information. If it is not in the cache, it checks another
database which contains a direct IP-to-location mapping, as *Sajith* mentioned.
If it is there, it returns the location and caches it. If the location is not in
that database, the IP is checked against the *MAXMIND* database, and the
location is stored in the cache and in the above table.
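
The lookup order above can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the direct IP-to-location table, a callable stands in for the MaxMind query, and the LRU eviction and `capacity` parameter are assumptions carried over from the LRU idea earlier in the thread. The class name `TwoLevelLocationCache` is hypothetical.

```python
from collections import OrderedDict


class TwoLevelLocationCache:
    def __init__(self, db_table, maxmind_lookup, capacity=10000):
        self._memory = OrderedDict()            # level 1: in-memory LRU cache
        self._db_table = db_table               # level 2: stand-in for the IP -> location table
        self._maxmind_lookup = maxmind_lookup   # stand-in for the MaxMind range query
        self._capacity = capacity

    def _remember(self, ip, location):
        self._memory[ip] = location
        self._memory.move_to_end(ip)
        if len(self._memory) > self._capacity:
            self._memory.popitem(last=False)    # evict the least recently used entry

    def get(self, ip):
        # 1. In-memory cache.
        if ip in self._memory:
            self._memory.move_to_end(ip)
            return self._memory[ip]
        # 2. Direct IP -> location table.
        location = self._db_table.get(ip)
        if location is None:
            # 3. Fall back to MaxMind and persist the result in the table.
            location = self._maxmind_lookup(ip)
            if location is not None:
                self._db_table[ip] = location
        if location is not None:
            self._remember(ip, location)
        return location


calls = []

def fake_maxmind(ip):
    # Records each call so we can see how often the expensive path is hit.
    calls.append(ip)
    return ("Sri Lanka", "Colombo") if ip == "192.168.0.1" else None

table = {}
cache = TwoLevelLocationCache(table, fake_maxmind, capacity=2)
first = cache.get("192.168.0.1")    # miss on both levels, goes to MaxMind
second = cache.get("192.168.0.1")   # served from the in-memory cache
```

In this sketch the expensive MaxMind lookup runs once per distinct IP, and the table entry survives even if the in-memory cache is lost on restart.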

Thanks
Tharindu


On Tue, Mar 8, 2016 at 2:34 PM, Tharindu Dharmarathna <tharin...@wso2.com>
wrote:

> Hi All,
>
> We have come across the following ways to do the above task after the initial
> POC.
>
> 1. Use the file-type database provided by max-mind (.mmdb) together with their
> database readers.
>
> With this approach we got a lower time to get the location, using a JAX-RS
> service that wraps the above database. This JAX-RS implementation by default
> uses max-mind's cache implementation, which can be found at [1].
>
> *Limitations*
>
>
>    - The JAX-RS app has to be hosted on another server.
>    - The number of HTTP calls will be high.
>
>
> 2. Call the query server as described earlier in this thread and cache the
> location against the IP.
>
> Below are the execution times for a single query with each method.
>
>
> *Method 1 : 4.5 seconds*
>
> *Method 2: 4.76 seconds*
>
>
> Thanks
> Tharindu
>
>
> On Tue, Mar 8, 2016 at 8:29 AM, Lasantha Fernando <lasan...@wso2.com>
> wrote:
>
>> Hi Tharindu,
>>
>> On 7 March 2016 at 21:10, Sajith Ravindra <saji...@wso2.com> wrote:
>>
>>>
>>> 2. Having a DB based cache would persist the data even on a restart and
>>>> the data fetching query would be searching for a specific value (not a
>>>> range query as against the max-mind DB). But the downside is that for a
>>>> cache miss there would be minimum 3 DB queries (one for the cache table
>>>> lookup and one for the max-mind db lookup and one for the
>>>> cache persistence).
>>>>
>>>
>>> In order to avoid expensive cache misses we may eagerly populate the DB
>>> table cache, i.e. when there's a cache miss we do the lookup in the max-mind
>>> DB and then add entries for multiple IPs of that network_cidr to the
>>> cache DB table instead of only for that particular IP. That way we reduce
>>> the chance of a cache miss being very expensive, as we increase the chance of
>>> the IP being found on the first DB lookup.
>>>
>>> We might need to do some evaluation to determine how many entries
>>> we are going to add to the DB cache for IPs belonging to a particular
>>> network_cidr. For example, if requests from a certain network_cidr are
>>> frequent, we may want to add more entries compared to a less frequent
>>> network_cidr.
>>>
>>> The downside is that the DB cache tends to grow bigger.
>>>
>>> Thanks,
>>> *Sajith Ravindra*
>>> Senior Software Engineer
>>> WSO2 Inc.; http://wso2.com
>>> lean.enterprise.middleware
>>>
>>> mobile: +94 77 2273550
>>> blog: http://sajithr.blogspot.com/
>>> <http://lk.linkedin.com/pub/shani-ranasinghe/34/111/ab>
>>>
>>> On Mon, Mar 7, 2016 at 4:37 AM, Tharindu Dharmarathna <
>>> tharin...@wso2.com> wrote:
>>>
>>>> Hi Lasantha,
>>>>
>>>> Up to now we have been doing the following to get the geo
>>>> location from the stated dump.
>>>>
>>>> 1. We added two columns filled with the long values of the lower and upper
>>>> network IP addresses. We then get the geoname_id of the row whose two long
>>>> values the long value of the given IP falls between. Hope you get the
>>>> idea of our approach. Is there any way to do a bitwise operation in
>>>> order to get the network_cidr value?
>>>>
>>>
>> Can't we do it by keeping the network IP and the subnet as two columns
>> and the geoname_id as the third? Say for example, if 192.168.0.0/20 is
>> the cidr, for IPv4 routing what is usually done is we get the IP as int,
>> then do a bitwise AND with the subnet mask (e.g. if subnet mask is 20, that
>> would mean 20 bits with value 1 and remaining 12 bits of value 0, i.e.
>> 11111111 11111111 11110000 00000000) and check whether that returns the
>> network IP.
>>
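
The bitwise check described above can be sketched as below. This is a minimal illustration for IPv4 only; the helper names (`ip_to_int`, `prefix_to_mask`, `in_network`) are hypothetical, and only the standard-library `ipaddress` module is used for parsing.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def prefix_to_mask(prefix_len: int) -> int:
    """Build a 32-bit mask with prefix_len leading 1 bits (20 -> 0xFFFFF000)."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def in_network(ip: str, network_ip: str, prefix_len: int) -> bool:
    """True if ip AND mask equals the stored network IP."""
    mask = prefix_to_mask(prefix_len)
    return (ip_to_int(ip) & mask) == ip_to_int(network_ip)


print(in_network("192.168.0.1", "192.168.0.0", 20))
print(in_network("192.168.16.1", "192.168.0.0", 20))
```

With the network IP and prefix length stored per row, a match needs one AND and one comparison per candidate network rather than a range scan.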
>> You might find more info here [1]. I think there should be libraries that
>> wrap this operation. But if performance is a concern and we need to keep
>> the cache search implementation very lean, we can implement it ourselves.
>>
>> WDYT?
>>
>> [1]
>> http://stackoverflow.com/questions/4209760/validate-an-ip-address-with-mask
>>
>> Thanks,
>> Lasantha
>>
>>
>>>> Thanks
>>>> Tharindu
>>>>
>>>> On Mon, Mar 7, 2016 at 12:05 AM, Lasantha Fernando <lasan...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I think what Sachith suggests also makes sense. But am also rooting
>>>>> for the in-memory cache implementation suggested by Sanjeewa with
>>>>> ip-netmask approach.
>>>>>
>>>>> Please find my comments inline.
>>>>>
>>>>> On 5 March 2016 at 23:50, Sachith Withana <sach...@wso2.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> From what I understand/was told, this happens once a day (or
>>>>>> relatively infrequently), and you wanna avoid searching through all the
>>>>>> geo data per ip (since you are grouping the requests by IP).
>>>>>>
>>>>>> IF that's the case, it would be better to use a separate DB table to
>>>>>> cache these data (IP, geoID, etc.) with the IP being the primary key
>>>>>> (which would improve the lookup time), and even though there will be cache
>>>>>> misses, it would eventually reduce the ratio (#cacheMisses/#Hits).
>>>>>>
>>>>>> Having a DB cache would be better since you do want to persist these
>>>>>> data to be used over time.
>>>>>>
>>>>>> BTW in a cache miss, if we can figure out a way to limit the search
>>>>>> range on the original table or at least stop the search once a match is
>>>>>> found, it would greatly improve the cache miss time as well.
>>>>>>
>>>>>> That's my two cents.
>>>>>>
>>>>>> Cheers,
>>>>>> Sachith
>>>>>>
>>>>>> On Sun, Mar 6, 2016 at 8:24 AM, Janaka Ranabahu <jan...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sanjeewa,
>>>>>>>
>>>>>>> On Sun, Mar 6, 2016 at 7:25 AM, Sanjeewa Malalgoda <
>>>>>>> sanje...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Implementing cache is better than having another table mapping IMO.
>>>>>>>> What if we query the database and keep the IP range and network name in memory?
>>>>>>>> Then we may do a quick search on the network name, and based on that the
>>>>>>>> rest can be loaded some other way.
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>> We thought of having an in-memory cache but we faced several issues
>>>>>>> along the way. Let me explain the situation as it is now.
>>>>>>>
>>>>>>> The Max-Mind DB has the IP addresses with the IP and the netmask.
>>>>>>> Ex: 192.168.0.0/20
>>>>>>>
>>>>>>> The calculation of the IP address range would be like the following.
>>>>>>>
>>>>>>> Address:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>>>>>> Netmask:   255.255.240.0 = 20    11111111.11111111.1111 0000.00000000
>>>>>>> Wildcard:  0.0.15.255            00000000.00000000.0000 1111.11111111
>>>>>>> =>
>>>>>>> Network:   192.168.0.0/20        11000000.10101000.0000 0000.00000000 (Class C)
>>>>>>> Broadcast: 192.168.15.255        11000000.10101000.0000 1111.11111111
>>>>>>> HostMin:   192.168.0.1           11000000.10101000.0000 0000.00000001
>>>>>>> HostMax:   192.168.15.254        11000000.10101000.0000 1111.11111110
>>>>>>> Hosts/Net: 4094                  (Private Internet, http://www.ietf.org/rfc/rfc1918.txt)
>>>>>>>
>>>>>>>
>>>>>>> Therefore what we are currently doing is to calculate the start and
>>>>>>> end IP for all the values in the max-mind DB and alter the tables with
>>>>>>> those values initially (this is a one-time thing). When the Spark script
>>>>>>> executes, we check whether the given IP is between any of the start and
>>>>>>> end ranges in the tables. That is the reason why it is taking a long
>>>>>>> time to fetch results for a given IP.
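
The one-time start/end calculation and the range check described above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the rows are an in-memory list standing in for the altered table, and the `geoname_id` value is made up for illustration.

```python
import ipaddress


def cidr_to_range(cidr: str):
    """Return (start, end) of a CIDR block as integers (network and broadcast)."""
    net = ipaddress.ip_network(cidr, strict=True)
    return int(net.network_address), int(net.broadcast_address)


def find_geoname_id(ip: str, rows):
    """Linear scan over (start, end, geoname_id) rows, like the range query."""
    value = int(ipaddress.ip_address(ip))
    for start, end, geoname_id in rows:
        if start <= value <= end:
            return geoname_id
    return None


start, end = cidr_to_range("192.168.0.0/20")
rows = [(start, end, 1)]  # geoname_id 1 is illustrative sample data
print(start, end)
print(find_geoname_id("192.168.15.254", rows))
```

The scan touches every row until a match is found, which is consistent with the long fetch time reported for the range query.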
>>>>>>>
>>>>>>> As a solution for this, we discussed what Tharindu has mentioned.
>>>>>>> 1. Have an in-memory caching mechanism.
>>>>>>> 2. Have a DB-based caching mechanism.
>>>>>>>
>>>>>>> The only point that we have to highlight is the fact that in both
>>>>>>> the above mechanisms we need to cache the IP address (not the ip-netmask
>>>>>>> as it was in the max-mind db) against the Geo location.
>>>>>>>
>>>>>>> Ex:-
>>>>>>> For 192.168.0.1       - Colombo, Sri Lanka
>>>>>>> For 192.168.15.254 - Colombo, Sri Lanka
>>>>>>>
>>>>>>> So as per the above example, if there are requests from all
>>>>>>> the possible 4094 addresses we will be caching each IP with the Geo
>>>>>>> location (since introducing range queries in a cache is not a good
>>>>>>> practice).
>>>>>>>
>>>>>>
>>>>> Since we are implementing a custom cache, won't we be doing a bitwise
>>>>> operation for the lookup with netmask and network IP? So basically, we
>>>>> would keep the network IP and the netmask in cache and simply do a bitwise
>>>>> AND to determine whether it is a match or not, right? Am thinking such an
>>>>> operation would not incur much of a performance hit and it won't be as
>>>>> prohibitive as a normal range query in a cache. If that is the case, I
>>>>> think we can go with the approach suggested by Sanjeewa.
>>>>>
>>>>> WDYT?
>>>>>
>>>>>
>>>>>>> Please find my comments about both the approaches.
>>>>>>>
>>>>>>> 1. Having an in-memory cache would speed things up, but based on the
>>>>>>> IPs in the data set, there could be a number of entries for IPs in the
>>>>>>> same range. One problem with this approach is that, if there is a server
>>>>>>> restart, the initial script execution would take a lot of time. Also,
>>>>>>> in certain scenarios (a high number of different IPs) the cache would
>>>>>>> not have a significant effect on script execution performance.
>>>>>>>
>>>>>>> 2. Having a DB based cache would persist the data even on a restart
>>>>>>> and the data fetching query would be searching for a specific value
>>>>>>> (not a range query as against the max-mind DB). But the downside is that
>>>>>>> for a cache miss there would be a minimum of 3 DB queries (one for the
>>>>>>> cache table lookup, one for the max-mind db lookup and one for the
>>>>>>> cache persistence).
>>>>>>>
>>>>>>> That is why we have initiated this thread to finalize the caching
>>>>>>> approach we should take.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Janaka
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> sanjeewa.
>>>>>>>>
>>>>>>>
>>>>> Thanks,
>>>>> Lasantha
>>>>>
>>>>>
>>>>>>
>>>>>>>> On Fri, Mar 4, 2016 at 3:12 PM, Tharindu Dharmarathna <
>>>>>>>> tharin...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We are going to implement a client IP based Geo-location graph in
>>>>>>>>> API Manager Analytics. When we went through the ways of doing it in [1],
>>>>>>>>> we selected [2] as the most suitable.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Overview of max-mind's DB.*
>>>>>>>>>
>>>>>>>>> As shown in the structure of the db (attached as an image), they have two
>>>>>>>>> tables which together give the location.
>>>>>>>>>
>>>>>>>>> We find the geoname_id according to the network and get the Country and
>>>>>>>>> City from the locations table.
>>>>>>>>>
>>>>>>>>> *Limitations*
>>>>>>>>>
>>>>>>>>> With their database dump we couldn't directly look up the ip in
>>>>>>>>> those tables. We need to check whether the given ip is between the
>>>>>>>>> network min and max ip. This query takes a long time (10 seconds on
>>>>>>>>> indexed data). If we do this directly from the Spark script, each and
>>>>>>>>> every ip in the summary table (even if the same ip appears in two rows)
>>>>>>>>> will be queried against the tables. Therefore this will have a
>>>>>>>>> performance impact on this graph.
>>>>>>>>>
>>>>>>>>> *Solution*
>>>>>>>>>
>>>>>>>>> 1. Implement an LRU cache of ip address vs location.
>>>>>>>>>
>>>>>>>>> This will need to be implemented in a custom UDF in Spark. If the ip
>>>>>>>>> queried from Spark is available in the cache, the location is returned
>>>>>>>>> from it; if it is not, it is retrieved from the DB and put into the cache.
>>>>>>>>>
>>>>>>>>> 2. Persist in a Table
>>>>>>>>>
>>>>>>>>> Use ip as the primary key with Country and City as other columns, and
>>>>>>>>> retrieve data from that table.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please let us know the most suitable way of doing this.
>>>>>>>>>
>>>>>>>>> [1] - Implementing Geographical based Analytics in API Manager
>>>>>>>>> mail thread.
>>>>>>>>>
>>>>>>>>> [2] - http://dev.maxmind.com/geoip/geoip2/geolite2/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Thanks*
>>>>>>>>>
>>>>>>>>> *Tharindu Dharmarathna*
>>>>>>>>> Associate Software Engineer
>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>
>>>>>>>>> mobile: *+94779109091*
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Sanjeewa Malalgoda*
>>>>>>>> WSO2 Inc.
>>>>>>>> Mobile : +94713068779
>>>>>>>>
>>>>>>>> blog: http://sanjeewamalalgoda.blogspot.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Janaka Ranabahu*
>>>>>>> Associate Technical Lead, WSO2 Inc.
>>>>>>> http://wso2.com
>>>>>>>
>>>>>>>
>>>>>>> E-mail: jan...@wso2.com
>>>>>>> M: +94 718370861
>>>>>>>
>>>>>>> Lean . Enterprise . Middleware
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sachith Withana
>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>> E-mail: sachith AT wso2.com
>>>>>> M: +94715518127
>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> Architecture@wso2.org
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Lasantha Fernando*
>>>>> Senior Software Engineer - Data Technologies Team
>>>>> WSO2 Inc. http://wso2.com
>>>>>
>>>>> email: lasan...@wso2.com
>>>>> mobile: (+94) 71 5247551
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tharindu Dharmarathna*
>>>> Associate Software Engineer
>>>> WSO2 Inc.; http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> mobile: *+94779109091*
>>>>
>>>>
>>>
>>
>>
>> --
>> *Lasantha Fernando*
>> Senior Software Engineer - Data Technologies Team
>> WSO2 Inc. http://wso2.com
>>
>> email: lasan...@wso2.com
>> mobile: (+94) 71 5247551
>>
>>
>
>
> --
>
> *Tharindu Dharmarathna*
> Associate Software Engineer
> WSO2 Inc.; http://wso2.com
> lean.enterprise.middleware
>
> mobile: *+94779109091*
>



-- 

*Tharindu Dharmarathna*
Associate Software Engineer
WSO2 Inc.; http://wso2.com
lean.enterprise.middleware

mobile: *+94779109091*