Re: CoreData async fetch request

Ben Trumbull Thu, 08 Oct 2009 22:27:14 -0700


On Oct 8, 2009, at 8:28 PM, David Melgar wrote:

I read a little on ICU and now understand that sqlite by defaultdoes not handle case insensitive unicode.Is there an easy way to make sqlite use ICU on the Mac, or do I haveto build it myself with ICU enabled?

Probably the easiest thing to do (besides just using Core Data) isregister custom functions with the SQLite API that go throughCFString. The SQLite APIs for this are fairly straight forward. TheICU dylib on Mac OS X is not public API. There are a variety ofreferences to building ICU or otherwise addressing this issue on theweb. Should you do that, you can register custom functions leveragingICU with SQLite as one would using CFString. I'd strongly recommendagainst building and statically linking SQLite yourself.

Based on the derived property example, it seems that I would need toduplicate any text fields I commonly search on where I want tosupport case insensitive queries.

Yes, for data sets of this size. For smaller data sets, in the 10^2or 10^3 range, you might decide to skip this and accept "slower"queries since they'll probably be around 5ms - 20ms.

How does spotlight manage to index so much text and respond quickly?Are those queries can insensitive?

Spotlight supports insensitive queries. Spotlight also duplicatesmuch of the material for its index. The importing process is notfree. You're certainly welcome to put all your data into Spotlightand test its query performance for your purposes. Spotlight isfundamentally a full text search index. It's optimized heavily forprefix and word searching across very large numbers of documents.It's very good for its intended usage pattern, but that pattern is notthe same as the usage of relational databases. The technologies arecomplementary.

I've done a bunch more performance tests. I don't understand some ofthe differences.I hadn't realized that coredata in the data model has a flag toindex a field. When I did that for Coredata, it dramatically sped itup. Although adding an index to my sqlite test slowed it down.

Your test results are very odd, in that they are still 50x to 100xslower than I would expect. These tests have not yet applied thederived property optimization I suggested, correct ?

As literally noted here, none of those queries are eligible for anindex, so I assume your actual code is doing something slightlydifferent, or the timing difference are due to configuration issues.Hot I/O and cold I/O will be very different. For example:


search, without an index (cold I/O)
add an index (page everything into the UBC cache)
search, with an index, but a query that doesn't use the index (hot I/O)

The difference between hot and cold I/O for query performance will behuge. As much as 100x. You can use /usr/bin/purge to forceeverything to be cold I/O. Not necessary representative of real worldperformance, as the whole disk cache will be flushed, includingmaterial relevant to the system that would normally exist when you runyour app.

Coredata, without an index, predicate "name like foo", 20 databases400k records each. 2m17s.Coredata with an index, predicate "name like foo", 20 databases 1milrecords each, 25 seconds.SQLite no index, sql "where name like foo%bar", 20 databases 1milrecords each 11 seconds.SQLite index, sql "where name like foo%bar", 20 databases 1 milrecords each 30 seconds.

Why the 20 database files ? These queries will be faster with asingle database file and its unified index.

In any event, the derived property optimization should make all theprefix/equality queries subsecond. On my laptop, for 20 millionrecords, that should be in the 500ms ballpark (hot I/O)

As you said, with an index, and testing for equality, both coredataand sqlite queries responded in <1 sec. SQLite without an index was9 seconds for same query.

That sounds about right. It'll be faster if you use a single databasewith 20 million rows and a unified index table.

I can only presume SQLite slowed down with an index because thedatabase file was physically much larger and took longer to readfrom disk.

I suspect its a flaw in the testing methodology, actually. Do you runthe queries multiple times, throw out the best & worst and average therest ? Are you running other apps at the same time you do performancetests ?

It is possible for the presence of an index to negatively impact a non-indexed query, but that would be a few %, not 3x slower An indexedquery would read many fewer pages from disk than a non-indexed query,regardless of the file size. O(lg(N)) instead of O(N) and at 20million that's a big difference. There are other possible issues inplay, but you said the result set is small, so they seem unlikely.


- Ben

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: CoreData async fetch request

Reply via email to