On Oct 5, 2009, at 7:00 PM, enki1...@gmail.com wrote:

I am doing a simple query search for a text string pattern (i.e. 'SELF like foo') on ~10 million small records stored persistently using SQLite. This is a performance test to make sure I get reasonable performance from my database engine before I commit too much code to it.

Well, @"self like 'foo'" is a different problem than @"self like '*foo*'". LIKE queries require Unicode compliant regex and are intrinsically expensive. If you do not have a wildcard, you are better off use an == query. The DerivedProperty ADC example shows how to transform the text to make it much faster to search.

If you do need to use a wildcard, you'll really want to stick to a single one, either prefix matching or suffix matching. The DerivedProperty example shows prefix matching. It's possible to structure this to use a binary index and make the query extremely fast even for millions of records. There is a huge difference in computational complexity: prefix matching can use an index, and therefore can run in O(lg(N)).
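Continuing the sketch above, a prefix match against the normalized attribute keeps the query eligible for the index:

// BEGINSWITH on the indexed, normalized attribute becomes an index-friendly
// range comparison in SQL; CONTAINS (or a leading wildcard) would force a scan.
[request setPredicate:[NSPredicate predicateWithFormat:@"normalizedText BEGINSWITH %@",
                                                       normalized]];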

*foo* (contains) searches are slow and cannot use an index. You really want to avoid these. Even Spotlight does not do arbitrary substring matching; compare the results for "help" with "elp" in Spotlight. If you want word matching, you can use Spotlight or SearchKit to build a supplemental full-text (FTS) index.
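If word matching is what you actually need, a supplemental SearchKit index is one option. A rough, hypothetical sketch; the index path and the record:// URL are just made-up document identifiers you would map back to your own records:

#import <CoreServices/CoreServices.h>

// Build an on-disk inverted index (word -> documents).
NSURL *indexURL = [NSURL fileURLWithPath:@"/tmp/records.skindex"];
SKIndexRef index = SKIndexCreateWithURL((CFURLRef)indexURL, NULL,
                                        kSKIndexInverted, NULL);

// Add the searchable text of one record; the URL is only a stable identifier.
SKDocumentRef doc = SKDocumentCreateWithURL((CFURLRef)[NSURL URLWithString:@"record://42"]);
SKIndexAddDocumentWithText(index, doc, CFSTR("the quick brown fox"), true);
CFRelease(doc);
SKIndexFlush(index);

// Word-level search; returns document IDs and relevance scores.
SKSearchRef search = SKSearchCreate(index, CFSTR("quick"), kSKSearchOptionDefault);
SKDocumentID found[10];
CFIndex count = 0;
SKSearchFindMatches(search, 10, found, NULL, 1.0, &count);
CFRelease(search);
CFRelease(index);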

The query is taking over 3 minutes with a small result set. This is on a new 13" MacBook Pro with 4 GB of memory.

... a full table scan executing a regex on each of 10 million rows on a 5400 rpm drive ? Well, for doing all that, 3 minutes sounds pretty fast.

Just as a reference point, if you grab the objectIDs from the result set, and execute an IN query selecting those objects, how long does it take ? 50ms ? 100ms ?
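Something along these lines, where objectIDs is collected from the previous result (e.g. via valueForKey:@"objectID") and "Record" / moc are placeholders:

// Hypothetical timing check: re-fetch the same objects by objectID.
NSFetchRequest *byID = [[[NSFetchRequest alloc] init] autorelease];
[byID setEntity:[NSEntityDescription entityForName:@"Record"
                            inManagedObjectContext:moc]];
[byID setPredicate:[NSPredicate predicateWithFormat:@"self IN %@", objectIDs]];

NSDate *start = [NSDate date];
NSArray *objects = [moc executeFetchRequest:byID error:NULL];
NSLog(@"IN fetch of %lu objects: %.0f ms",
      (unsigned long)[objects count], -[start timeIntervalSinceNow] * 1000.0);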

The query is taking too long for a user to sit and wait for it. Is there a way to speed it up? Can indexing be applied to it?

I had thought that if I could display results as they are found, that might be reasonable. In my tests, if I use setFetchBatchSize and setFetchOffset to restart it, it ends up repeating the query each time, taking that many times longer to get a result. Not reasonable. It does not seem to resume the query where it left off, as a database cursor would.

You can use -com.apple.CoreData.SQLDebug 1 to see the SQL we pass to the database. This behavior has nothing to do with Core Data, though; it is simply how offset queries work. I realize it's not what you expected, which is why I recommended using -setFetchBatchSize: instead.

My impression is that my usage scenario is not an appropriate use of core data.

Core Data is just passing the query off to the database. I'm not sure why you think going to the database directly will do anything about the 179.9 of those 180.0 seconds that are spent evaluating the query inside the database itself.

I was planning to try SQLite directly. Would it be more appropriate?

You can try it directly, but it won't have any meaningful effect on your performance results, except that SQLite's built-in LIKE operator doesn't support Unicode. It'll be a tiny bit faster for that, but still the same order of magnitude. And then either you'll have to integrate ICU support, as Core Data does, and it'll be exactly the same, or you'll be stuck with ASCII.

Regardless, you'll need to make your searches eligible for an index. The DerivedProperty example shows how to do that.

- Ben


Thanks

On Oct 5, 2009 7:14pm, Ben Trumbull <trumb...@apple.com> wrote:
> Is there a way to do an asynchronous fetch request against Core data
> returning partial results?
>
> That depends on whether it's the query part that's expensive (e.g. WHERE clause with complex text searching and table scans) or simply the quantity of the row data that's your problem. For the latter, you can just use -setFetchBatchSize: and be done.
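> For the row-data case, a minimal sketch (the "Record" entity and the moc variable are placeholders, not from your code):
>
> NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
> [request setEntity:[NSEntityDescription entityForName:@"Record"
>                                inManagedObjectContext:moc]];
> // Rows are pulled from the store in chunks of 100 as the array is walked,
> // instead of all 10 million up front.
> [request setFetchBatchSize:100];
> NSArray *results = [moc executeFetchRequest:request error:NULL];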
>
>
> You can use a separate MOC on a background thread to perform asynchronous work, and then pass results over to the main thread to display to the user. However, unless your search terms are very expensive, it's usually easier and faster to use -setFetchBatchSize: synchronously. For well indexed queries, it can handle a million or two rows per second. I'm also not sure why you'd subject your users to that kind of experience: it's common to use fetch limits, count requests, and only show the top N results. What's your user going to do with a hundred thousand results anyway ?
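> A sketch of the limit/count approach, reusing the request from the sketch above ('title' is just a placeholder attribute):
>
> // Count everything first for an "N of M" display, then fetch only the top 50.
> NSUInteger total = [moc countForFetchRequest:request error:NULL];
> [request setSortDescriptors:[NSArray arrayWithObject:
>     [[[NSSortDescriptor alloc] initWithKey:@"title" ascending:YES] autorelease]]];
> [request setFetchLimit:50];
> NSArray *topMatches = [moc executeFetchRequest:request error:NULL];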
>
>
> If you need to attack the computational expense of your query terms, that's more complicated. Obviously it would be best to optimize the queries and ensure they are using an index. But if that's not enough, you can execute the queries in a background MOC, fetching objectIDs + row data (put in the row cache), and then have the other MOC materialize the objects by ID from the row cache. There's a BackgroundFetching example in /Developer/Examples/CoreData that shows how to do this. Returning partial results incrementally would require some creativity on your part to subdivide the query into several. Since most expensive queries are text searches, it's usually possible to subdivide the result set naturally, say by the first letter of 'title', similar to the thumb bar index on the side of the Contacts app on the iPhone.
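> A condensed sketch of that hand-off; the BackgroundFetching example is the authoritative version, and backgroundMOC, mainMOC, and predicate here are placeholders:
>
> // On the background thread, with a MOC that shares the same coordinator:
> NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
> [request setEntity:[NSEntityDescription entityForName:@"Record"
>                                inManagedObjectContext:backgroundMOC]];
> [request setPredicate:predicate];
> [request setResultType:NSManagedObjectIDResultType];
> [request setIncludesPropertyValues:YES];  // row data goes into the coordinator's row cache
> NSArray *ids = [backgroundMOC executeFetchRequest:request error:NULL];
>
> // Back on the main thread, materialize cheaply from the warmed row cache:
> NSMutableArray *objects = [NSMutableArray arrayWithCapacity:[ids count]];
> for (NSManagedObjectID *oid in ids) {
>     [objects addObject:[mainMOC objectWithID:oid]];
> }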
>
>
> There's also a DerivedProperty example on ADC for optimizing text queries.
>
>
> Obviously, Apple's own Spotlight could not use something like
> Coredata, since it heavily relies on returning asynchronous partial
> results.
>
> Which is neither here nor there. Most Cocoa applications wouldn't want Spotlight to be the sole persistence back end for their data. The latency of putting all your data in a full text index instead of a relational database or keyed archive would be pretty absurd. Now, if you're writing an app that's primarily structured around full text searching, you might instead prefer to focus on putting your data in Spotlight via small files and using the Spotlight APIs. But Spotlight is not suitable for apps interested in an OOP view of their data.
>
>
> Frankly, this is the second application in which I've attempted to use Core Data, only to find it come up surprisingly short. The first time the issue was Core Data not being thread safe.
>
> Core Data can be used efficiently with multiple threads. It might help to think of each MOC as a separate writeable view. If you'd like to know more, you can search the archives for my posts.
>
>
> What is the target market for Core Data? What sort of application is ideal for its use? What size data store? Right now it escapes me.
>
>
> Cocoa and Cocoa Touch applications, particularly those done in an MVC style with an OO perspective on their data. Some people also use it as a persistent cache for data stored in another canonical format, such as XML files. On the Mac side, we've had customers with 3+ million row (multi-GB) databases, and on the embedded side, roughly 400,000 rows (100s of MB). However, it does take some care and feeding to handle data sets like that, and most developers find it straightforward up to about 10% of those numbers.
>
>
>
> It sounds like you're having performance issues. What kinds of queries are you trying to accomplish ? How much data are you working with ? How have you modeled your primary entities?
>
>
> You can fetch back just NSManagedObjectIDs, and set -setIncludesPropertyValues: to NO, to effectively create your own cursors if you prefer.
>
>
> - Ben

