Hi All, After Going though above discussion , We had implemented the Plug-gable User Define Extension point. From this configuration We can write our own implementation which can used to get the Country and State of the Given IP.
*Caching Implementation* We define two level of caching as below. When IP address checked from the *UDF* , First It check on Cache to get the Location Information. If it is not in cache It I'll check on another database which contain IP to Location Direct Mapping as *Sajith* Mentioned. If it is there it will return and cache that location. If location not in that database , IP will check against the *MAXMIND* database. and store the location on cache and the above table. Thanks Tharindu On Tue, Mar 8, 2016 at 2:34 PM, Tharindu Dharmarathna <tharin...@wso2.com> wrote: > Hi All, > > We have come across following ways to do the above task after the Initial > POC. > > 1. Using File type database which given by max-mind (.mmdb) and use there > database readers. > > From this approach we got lesser value to get the location from the above > using JAX-RS service which is used to wrap the above database. This JAX-RS > implementation is by default used the max-mind's Cache implementation which > can find from [1] . > > *Limitations* > > > - Hosting of the Jax-RS app in another server. > - # of http calls will high. > > > 2. Call query server as above thread and cached the location with ip. > > Here you can find the execution time for a single query which get for each > method. > > > *Method 1 : 4.5 seconds* > > *Method 2: 4.76 seconds* > > > Thanks > Tharindu > > > On Tue, Mar 8, 2016 at 8:29 AM, Lasantha Fernando <lasan...@wso2.com> > wrote: > >> Hi Tharindu, >> >> On 7 March 2016 at 21:10, Sajith Ravindra <saji...@wso2.com> wrote: >> >>> >>> 2. Having a DB based cache would persist the data even on a restart and >>>> the data fetching query would be searching for an specific value(not a >>>> range query as against the max-mind DB). But the downside is that for a >>>> cache miss there would be minimum 3 DB queries (one for the cache table >>>> lookup and one for the max-mind db lookup and one for the >>>> cache persistence). >>>> >>> >>> In order to avoid expensive cache misses we may eagerly populate the DB >>> table cache. i.e. When there's a cache miss we do the lookup in max-mind db >>> and then add multiple entries for multiple IPs of that netwokrk_cid to the >>> Cache DB table instead of only for that particular IP. That way we reduce >>> the chance of cache miss being very expensive, as we increase the chance of >>> it being found on the first DB lookup. >>> >>> We might need to do some evaluation to determine how much entries that >>> we are going to add to the DB cache for IP belongs to a particular >>> netwokrk_cid. For an example if requests from a certain netwokrk_cidr is >>> frequent we may want to add more entries with compared to a less frequent >>> netwokrk_cidr. >>> >>> The downside is the DB cache tend to be more big. >>> >>> Thanks >>> *,Sajith Ravindra* >>> Senior Software Engineer >>> WSO2 Inc.; http://wso2.com >>> lean.enterprise.middleware >>> >>> mobile: +94 77 2273550 >>> blog: http://sajithr.blogspot.com/ >>> <http://lk.linkedin.com/pub/shani-ranasinghe/34/111/ab> >>> >>> On Mon, Mar 7, 2016 at 4:37 AM, Tharindu Dharmarathna < >>> tharin...@wso2.com> wrote: >>> >>>> Hi Lasantha, >>>> >>>> Upto now we are doing the following way in order to get the geo >>>> location from the stated dump. >>>> >>>> 1. two columns added filled with long value of lower and upper value >>>> of network ip addresses. Then get the geoname_id with respect to the long >>>> value for the given ip which between this above long values. Hope you will >>>> got this idea on our approach. Is there any way to do bit wise operation in >>>> order to get the network_cidr value ? . >>>> >>> >> Can't we do it by keeping the network IP and the subnet as two columns >> and the geoname_id as the third. Say for example, if 192.168.0.0/20 is >> the cidr, for IPv4 routing what is usually done is we get the IP as int, >> then do a bitwise AND with the subnet mask (e.g. if subnet mask is 20, that >> would mean 20 bits with value 1 and remaining 12 bits of value 0, i.e. >> 11111111 11111111 11110000 00000) and check whether that returns the >> network IP. >> >> You might find more info here [1]. I think there should be libraries that >> wrap this operation. But if performance is a concern and we need to keep >> the cache search implementation very lean, we can implement it ourselves. >> >> WDYT? >> >> [1] >> http://stackoverflow.com/questions/4209760/validate-an-ip-address-with-mask >> >> Thanks, >> Lasantha >> >> >>>> Thanks >>>> Tharindu >>>> >>>> On Mon, Mar 7, 2016 at 12:05 AM, Lasantha Fernando <lasan...@wso2.com> >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I think what Sachith suggests also makes sense. But am also rooting >>>>> for the in-memory cache implementation suggested by Sanjeewa with >>>>> ip-netmask approach. >>>>> >>>>> Please find my comments inline. >>>>> >>>>> On 5 March 2016 at 23:50, Sachith Withana <sach...@wso2.com> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> From what I understand/was told, this happens once a day ( or >>>>>> relatively infrequently), and you wanna avoid searching through all the >>>>>> geo >>>>>> data per ip ( since you are grouping the requests by IP). >>>>>> >>>>>> IF that's the case, it would be better to use a separate DB table to >>>>>> cache these data ( IP, geoID ..etc) with the IP being the primary key ( >>>>>> which would improve the lookup time), and even though there will be cache >>>>>> misses, it would eventually reduce the (#cacheMisses/ Hits). >>>>>> >>>>>> Having a DB cache would be better since you do want to persist these >>>>>> data to be used over time. >>>>>> >>>>>> BTW in a cache miss, if we can figure out a way to limit the search >>>>>> range on the original table or at least stop the search once a match is >>>>>> found, it would greatly improve the cache miss time as well. >>>>>> >>>>>> That's my two cents. >>>>>> >>>>>> Cheers, >>>>>> Sachith >>>>>> >>>>>> On Sun, Mar 6, 2016 at 8:24 AM, Janaka Ranabahu <jan...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Sanjeewa, >>>>>>> >>>>>>> On Sun, Mar 6, 2016 at 7:25 AM, Sanjeewa Malalgoda < >>>>>>> sanje...@wso2.com> wrote: >>>>>>> >>>>>>>> Implementing cache is better than having another table mapping IMO. >>>>>>>> What if we query database and keep IP range and network name in memory. >>>>>>>> Then we may do quick search on network name and then based on that >>>>>>>> rest can load some other way. >>>>>>>> WDYT? >>>>>>>> >>>>>>> We thought of having an in memory cache but we faced several issues >>>>>>> along the way. Let me explain the situation as it is per now. >>>>>>> >>>>>>> The Max-Mind DB has the IP addresses with the IP and the netmask. >>>>>>> Ex: 192.168.0.0/20 >>>>>>> >>>>>>> The calculation of the IP address range would be like the following. >>>>>>> >>>>>>> Address: 192.168.0.1 11000000.10101000.0000 0000.00000001 >>>>>>> Netmask: 255.255.240.0 = 20 11111111.11111111.1111 0000.00000000 >>>>>>> Wildcard: 0.0.15.255 00000000.00000000.0000 1111.11111111 >>>>>>> =>Network: 192.168.0.0/20 11000000.10101000.0000 0000.00000000 >>>>>>> (Class C) >>>>>>> Broadcast: 192.168.15.255 11000000.10101000.0000 1111.11111111 >>>>>>> HostMin: 192.168.0.1 11000000.10101000.0000 0000.00000001 >>>>>>> HostMax: 192.168.15.254 11000000.10101000.0000 1111.11111110 >>>>>>> Hosts/Net: 4094 (Private Internet >>>>>>> <http://www.ietf.org/rfc/rfc1918.txt>) >>>>>>> >>>>>>> >>>>>>> Therefore what we are currently doing is to calculate the start and >>>>>>> end IP for all the values in the max-mind DB and alter the tables with >>>>>>> those values initially(this is a one time thing that will happen). When >>>>>>> the >>>>>>> Spark script executes, we check whether the given IP is between any of >>>>>>> the >>>>>>> start and end ranges in the tables. That is the reason why it is taking >>>>>>> a >>>>>>> long time to fetch results for a given IP. >>>>>>> >>>>>>> As a solution for this, we discussed what Tharindu has mentioned. >>>>>>> 1. Have a in memory caching mechanism. >>>>>>> 2. Have a DB based caching mechanism. >>>>>>> >>>>>>> The only point that we have to highlight is the fact that in both >>>>>>> the above mechanisms we need to cache the IP address(not the ip-netmask >>>>>>> as >>>>>>> it was in the max-mind db) against the Geo location. >>>>>>> >>>>>>> Ex:- >>>>>>> For 192.168.0.1 - Colombo, Sri Lanka >>>>>>> For 192.168.15.254 - Colombo, Sri Lanka >>>>>>> >>>>>>> So as per the above example I took, if there are requests form all >>>>>>> the possible 4094 address we will be caching each IP with the Geo >>>>>>> location(since introducing range queries in a cache is not a good >>>>>>> practice). >>>>>>> >>>>>> >>>>> Since we are implementing a custom cache, won't we be doing a bitwise >>>>> operation for the lookup with netmask and network IP? So basically, we >>>>> would keep the network IP and the netmask in cache and simply do a bitwise >>>>> AND to determine whether it is a match or not, right? Am thinking such an >>>>> operation would not incur much of a performance hit and it won't be as >>>>> prohibitive as a normal range query in a cache. If that is the case, I >>>>> think we can go with the approach suggested by Sanjeewa. >>>>> >>>>> WDYT? >>>>> >>>>> >>>>>>> Please find my comments about both the approaches. >>>>>>> >>>>>>> 1. Having an in-memory cache would speedup things but based on the >>>>>>> IPs in the data set, there could be number of entries for IPs in the >>>>>>> same >>>>>>> range. One problem with this approach is that, if there is a server >>>>>>> restart, the initial script execution would take a lots of time. Also >>>>>>> based >>>>>>> on certain scenarios(high number of different IPs) the cache would not >>>>>>> have >>>>>>> a significant effect on script execution performance. >>>>>>> >>>>>>> 2. Having a DB based cache would persist the data even on a restart >>>>>>> and the data fetching query would be searching for an specific >>>>>>> value(not a >>>>>>> range query as against the max-mind DB). But the downside is that for a >>>>>>> cache miss there would be minimum 3 DB queries (one for the cache table >>>>>>> lookup and one for the max-mind db lookup and one for the >>>>>>> cache persistence). >>>>>>> >>>>>>> That is why we have initiated this thread to finalize the caching >>>>>>> approach we should take. >>>>>>> >>>>>>> Thanks, >>>>>>> Janaka >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> sanjeewa. >>>>>>>> >>>>>>> >>>>> Thanks, >>>>> Lasantha >>>>> >>>>> >>>>>> >>>>>>>> On Fri, Mar 4, 2016 at 3:12 PM, Tharindu Dharmarathna < >>>>>>>> tharin...@wso2.com> wrote: >>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> We are going to implement Client IP based Geo-location Graph in >>>>>>>>> API Manager Analytics. When we go through the ways of doing in [1] , >>>>>>>>> we >>>>>>>>> selected [2] as the most suitable way to do. >>>>>>>>> >>>>>>>>> >>>>>>>>> *Overview of max-mind's DB.* >>>>>>>>> >>>>>>>>> As the structure of the db (attached in image), They have two >>>>>>>>> tables which incorporate to get the location. >>>>>>>>> >>>>>>>>> Find geoname_id according to network and get Country,City from >>>>>>>>> locations table. >>>>>>>>> >>>>>>>>> *Limitations* >>>>>>>>> >>>>>>>>> As their database dump we couldn't directly process the ip from >>>>>>>>> those tables. We need to check the given ip is in between the network >>>>>>>>> min >>>>>>>>> and max ip. This query get some long time (10 seconds in indexed >>>>>>>>> data). If >>>>>>>>> we directly do this from spark script for each and every ip which in >>>>>>>>> summary table (regardless if ip is same from two row data) will query >>>>>>>>> from >>>>>>>>> the tables. Therefore this will incur the performance impact on this >>>>>>>>> graph. >>>>>>>>> >>>>>>>>> *Solution* >>>>>>>>> >>>>>>>>> 1. Implement LRU cache against ip address vs location. >>>>>>>>> >>>>>>>>> This will need to implement on custom UDF in Spark. If ip querying >>>>>>>>> from spark available in cache it will give the location from it , IF >>>>>>>>> it is >>>>>>>>> not It will retrieve from DB and put into the cache. >>>>>>>>> >>>>>>>>> 2. Persist in a Table >>>>>>>>> >>>>>>>>> ip as the primary key and Country and city as other columns and >>>>>>>>> retrieve data from that table. >>>>>>>>> >>>>>>>>> >>>>>>>>> Please feel free to give us the most suitable way of doing this >>>>>>>>> solution?. >>>>>>>>> >>>>>>>>> [1] - Implementing Geographical based Analytics in API Manager >>>>>>>>> mail thread. >>>>>>>>> >>>>>>>>> [2] - http://dev.maxmind.com/geoip/geoip2/geolite2/ >>>>>>>>> >>>>>>>>> >>>>>>>>> *Thanks* >>>>>>>>> >>>>>>>>> *Tharindu Dharmarathna* >>>>>>>>> Associate Software Engineer >>>>>>>>> WSO2 Inc.; http://wso2.com >>>>>>>>> lean.enterprise.middleware >>>>>>>>> >>>>>>>>> mobile: *+94779109091 <%2B94779109091>* >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> *Sanjeewa Malalgoda* >>>>>>>> WSO2 Inc. >>>>>>>> Mobile : +94713068779 >>>>>>>> >>>>>>>> <http://sanjeewamalalgoda.blogspot.com/>blog >>>>>>>> :http://sanjeewamalalgoda.blogspot.com/ >>>>>>>> <http://sanjeewamalalgoda.blogspot.com/> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> *Janaka Ranabahu* >>>>>>> Associate Technical Lead, WSO2 Inc. >>>>>>> http://wso2.com >>>>>>> >>>>>>> >>>>>>> *E-mail: jan...@wso2.com <http://wso2.com>**M: **+94 718370861 >>>>>>> <%2B94%20718370861>* >>>>>>> >>>>>>> Lean . Enterprise . Middleware >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sachith Withana >>>>>> Software Engineer; WSO2 Inc.; http://wso2.com >>>>>> E-mail: sachith AT wso2.com >>>>>> M: +94715518127 >>>>>> Linked-In: <http://goog_416592669> >>>>>> https://lk.linkedin.com/in/sachithwithana >>>>>> >>>>>> _______________________________________________ >>>>>> Architecture mailing list >>>>>> Architecture@wso2.org >>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> *Lasantha Fernando* >>>>> Senior Software Engineer - Data Technologies Team >>>>> WSO2 Inc. http://wso2.com >>>>> >>>>> email: lasan...@wso2.com >>>>> mobile: (+94) 71 5247551 >>>>> >>>>> _______________________________________________ >>>>> Architecture mailing list >>>>> Architecture@wso2.org >>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> *Tharindu Dharmarathna*Associate Software Engineer >>>> WSO2 Inc.; http://wso2.com >>>> lean.enterprise.middleware >>>> >>>> mobile: *+94779109091 <%2B94779109091>* >>>> >>>> _______________________________________________ >>>> Architecture mailing list >>>> Architecture@wso2.org >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>> >> >> >> -- >> *Lasantha Fernando* >> Senior Software Engineer - Data Technologies Team >> WSO2 Inc. http://wso2.com >> >> email: lasan...@wso2.com >> mobile: (+94) 71 5247551 >> >> _______________________________________________ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > > *Tharindu Dharmarathna*Associate Software Engineer > WSO2 Inc.; http://wso2.com > lean.enterprise.middleware > > mobile: *+94779109091 <%2B94779109091>* > -- *Tharindu Dharmarathna*Associate Software Engineer WSO2 Inc.; http://wso2.com lean.enterprise.middleware mobile: *+94779109091*
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture