Thanks Jeremy, I just wrote up my own little analysis (below) while you were responding. I'll look for the kd-tree work; if I see discussion (and am stupid enough to heap more work onto my plate) then I might get involved.
Oops, didn't send... Cheers, Thomas ----- So another glance tells me there is a list of maps (one element for each index method) and it uses Data.Map under the hood. So I have O(m lg n) where m is the number of index methods and n is the number of elements. Space wise, I think Data.Map takes up 6 words per Bin constructor (1 for the constructor, 1 for the 'Size' and one for the pointer indirection for each additional field), so the space is "6 * n * m * w" where w is the word size. This means indexing by 5 methods for 1M entries takes about 256MB, assuming 28B per entry that's 28MB of data + 256 indexing ~ 282MB needed. Indexing my imaginary 1B points by user,date,lat,lon is 6 * 2^30 * 4 * 8 - or about 192 GB of indexing + 28GB of data for 220GB total. Obviously I shouldn't be talking about keeping a live data set of 28GB in memory let alone indexing it all, but I was just curious about the ratio (220MB for 1M points, which is just one heavy user). On Sat, 2010-10-02 at 14:09 -0500, Jeremy Shaw wrote: > In the current version of IxSet, the performance of querying on just > the Lon would be essentially the same as if you just had a "Data.Map > Lon Point". But the queries on the second index are current not so > great. There is work in progress to rewrite the internals of IxSet to > be based on a kd-tree, in which case your query should be pretty > efficient. > > So, that answer is pretty vague :) I am in the process of wrapping up > happstack 0.6 which has focused on fixing some performance issues with > happstack-server, and refactoring the code so that user API and > internals are more clearly separated and better documented. > > happstack 0.7 is all about happstack-state. A key aspect will be > nailing down some solid performance benchmarks instead of vague hand > waving :) > > The numbers you give are certainly within the scope of what we would > like 0.7 to be able to handle. Also, I should note that > happstack-state and happstack-ixset are independent from each other. > You can easily use something other than IxSet to store your points and > still use happstack-state. > > - jeremy > > On Fri, Oct 1, 2010 at 1:53 PM, Thomas M. DuBuisson > <thomas.dubuis...@gmail.com> wrote: > >> That is pretty close to how it would look using happstack-state. Here > >> is a complete, runnable example which defines the types, a query, > >> creates/initializes the database, performs the query, and prints the > >> results. > > [snip] > > > > How is data stored in Happstack.State? I see the "Component" instance > > uses "fromList" from happstack-ixset but can't find any information on > > the algorithm used or its efficiency (computationally or wrt space). > > > > If making this more concrete helps then here is a possible use: > > > > == GPS Points == > > I have a GPS logger that logs every 10 seconds when I jog. Jogging for > > an hour a day for the past 180 days has resulted in 64k points. > > Pretending I hosted a site for joggers (and all points were in the same > > DB) I could easily result in a billion points (< 20K users). Would > > happstack-ixset code in the form "points @< (Lon -120) @> (Lon -125) @> > > (Lat 45) @< (Lat 50)" perform reasonably? > > > > Cheers, > > Thomas > > > > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe