Jean-Claude Wippler wrote:

Heiko Hees wrote:

considering i have two views x,y both with a property z, which is sorted in both views,
i would excpect joins or intersections on this property to be very fast. unfortunally in reality, the intersection of two sorted views with 1.000.000 integers takes several long seconds.


MK does not know it's sorted and re-sorts both views when joining.

I did a small test using python and got the time down a little bit. Here is what I did essentially (I'm sure that C++ would be faster)


create a mapping object key=int value=index, you could also use (a bitvector now that I think about it, it will save some memory).

create a temporary
determine the smaller of the two views
scan the z column of the smaller object, add integers to the mapping
scan the z column of the larger object, if the integer is in the mapping, append the row to the temporary view.


1,000,000 integers took 5 seconds on a 2.0Ghz machine but the memory overhead was about 10M.

Metakit is very fast for scanning through columns, so what I might do is just scan through both columns in parallel taking advantage of their sorted nature. In psuedo-code here

while list1 and list2:
         z1 = list1.next()
         z2 = list2.next()
         while z1 > z2:  list1.next()
         while z2 > z1: list2.next()
         if z1 == z2:
             add to view
             look ahead into the list1 and list2 columns
             to grab more matches is possible...

The tricky part is the look ahead to make sure that you get all matches, but this code should really fly.


_______________________________________________
metakit mailing list  -  [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit

Reply via email to