OK, I downloaded your project. I agree with Jerry that there's a memory leak. 
Actually it's worse than that: you aren't keeping a reference to the article 
when you create it, so you can never set its parent. That means

        [ DDArticle newArticleWithID: messageid context:ctx ];

should be 

        article = [ DDArticle newArticleWithID: messageid context:ctx ];
        [ article release ];

I got the test to run in 30 seconds, which isn't too bad given that just 
looping over the articles takes about 7 seconds by itself. Here's your 
problem: you never save your work, so every article you add accumulates in 
memory. Yes, the SQL store has an index on it, and yes, Core Data is issuing 
the correct SELECT, but there's nothing in the store. So as well as looking 
in the store, Core Data also has to scan every one of the objects still 
waiting to be persisted. Even though it uses the index on the SQL side, it 
evidently doesn't use the index hint to build an in-memory map for finding 
the pending objects which match a predicate. So your adds get slower and 
slower and slower: each time, Core Data does one SQL lookup in an 
always-empty database, which finds 0 objects in 0.0005 of a second, then 
scans an ever-growing set of pending objects one by one. Since your IDs are 
unique there's never a match, so it scans the whole set every time. If you 
log it you'll see each iteration getting slower. 
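If you want numbers, here's a rough sketch of timing the loop (messageIDs 
here is just a stand-in for however you iterate your input):

```objc
// Rough sketch: log the elapsed time every 1000 inserts to watch the
// per-iteration cost grow. messageIDs, DDArticle and ctx stand in for
// the names in your project.
NSUInteger i = 0;
for( NSString *messageid in messageIDs )
{
        NSDate *start = [ NSDate date ];
        [ DDArticle newArticleWithID:messageid context:ctx ];
        if( ++i % 1000 == 0 )
                NSLog( @"insert %lu took %f s",
                       (unsigned long)i, -[ start timeIntervalSinceNow ] );
}
```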

So I tried adding [ archive save ] to make it commit and was surprised to 
find that nothing changed, until I realized that [ archive save ] saves the 
wrong context. In fact your example code never saves anything to the DB at all! 

Adding this inside your add loop

        if( [ [ ctx insertedObjects ] count ] > 100 )
                [ ctx save:nil ];

means the working set never grows beyond 100, which limits the amount of 
in-memory lookup. Once the objects are in the DB, the SQL lookup piece is 
blisteringly quick, so your check for existing objects runs in nearly 
constant time. 100 is a parameter you can tweak: you could save every single 
time, but that probably has overhead; if you make it much larger than 100 
you pay the save overhead less often but have to scan more in-memory 
objects. It's a compromise. 
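Put together, the shape of the insert loop would be something like this (a 
sketch; DDArticle and messageIDs are from your project, the rest is standard 
Core Data):

```objc
for( NSString *messageid in messageIDs )
{
        // create-or-fetch as before; keep the reference so the parent
        // can be set and the object released
        DDArticle *article = [ DDArticle newArticleWithID:messageid context:ctx ];
        [ article release ];

        // flush to the SQL store so the set of pending in-memory
        // objects Core Data has to scan stays small
        if( [ [ ctx insertedObjects ] count ] > 100 )
        {
                NSError *error = nil;
                if( ![ ctx save:&error ] )
                        NSLog( @"save failed: %@", error );
        }
}
[ ctx save:nil ];       // pick up the last partial batch
```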

1000 checks-and-inserts a second seems .. about OK to me, and if you make 
sure to save the context regularly, you should be able to keep that rate up 
even as the database grows. 
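By the way, you can script the index check from earlier in the thread rather 
than use the interactive sqlite3 prompt. A sketch (the scratch-database part 
is just so the example is self-contained; against a real store you'd run the 
SELECT on your own Foo.sqlite):

```shell
# Build a scratch store just to demonstrate the query.
rm -f /tmp/scratch.sqlite
sqlite3 /tmp/scratch.sqlite "CREATE TABLE ZMCARTICLE (ZMESSAGEID TEXT);"
sqlite3 /tmp/scratch.sqlite \
  "CREATE INDEX ZMCARTICLE_ZMESSAGEID_INDEX ON ZMCARTICLE (ZMESSAGEID);"
# sqlite_master holds the schema; this lists every index in the store.
sqlite3 /tmp/scratch.sqlite \
  "SELECT name FROM sqlite_master WHERE type='index';"
# prints ZMCARTICLE_ZMESSAGEID_INDEX
```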

On 14-Feb-2010, at 5:51 AM, daniele malcom wrote:

> Hi Roland, in fact the indices do exist (for the DDArticle entity):
> Enter SQL statements terminated with a ";"
> sqlite> .tables
> ZDDARTICLE    Z_METADATA    Z_PRIMARYKEY
> sqlite> .indices ZDDARTICLE
> ZDDARTICLE_ZMESSAGEID_INDEX
> ZDDARTICLE_ZPARENT_INDEX
> 
> On my MacBook Pro, inserting 30k articles took about 2-3 minutes.
> I've uploaded a test project:
> http://dl.dropbox.com/u/103260/CoreDataTreeTest.zip
> I really don't know why it should take this long, but according to
> Instruments the big cost is obviously the fetch searching for id and
> parent.
> 
> On Sat, Feb 13, 2010 at 2:53 PM, Roland King <r...@rols.org> wrote:
>> 
>> .. oh and one other thing: there's a Core Data Instruments tool in Xcode. 
>> Well, there is for OS X; there isn't for iPhone OS, which I develop for, 
>> and that may be why I never saw it before. You could try that.
>> 
>> On 13-Feb-2010, at 9:36 PM, Roland King wrote:
>> 
>>> OK, I don't see anything wrong with the predicate code, but I'm no Core 
>>> Data expert.
>>> 
>>> I'll make one totally challengeable statement. Assume that Core Data uses 
>>> SQLite in a rational way to store objects (e.g. not storing everything as 
>>> opaque blobs), for instance one table per entity where each column of the 
>>> table is an attribute; that evaluating the predicate does what you would 
>>> expect it to, i.e. uses SQL to do as much of the heavy lifting on a fetch 
>>> request as possible; that the column is indexed in the table; and that 
>>> SQLite is using the index. Then taking multiple minutes to find one row 
>>> out of 20,000 just doesn't make any sense; it should take seconds at most.
>>> 
>>> I believe core data does use table-per-entity. I think that partly because 
>>> the documentation hints at it, partly because it makes sense and partly 
>>> because I looked at the implementation of one data model that I have.
>>> 
>>> I can't see the point of making indexes if the predicate code doesn't 
>>> generate SQL which uses them, but it's possible. It's possible that Core 
>>> Data loads all the entity rows, inspects their attributes by hand and 
>>> filters them in code, but this is Apple, not Microsoft.
>>> 
>>> So that leaves "column isn't indexed" as the most likely. But you've 
>>> checked the 'indexed' box. Here's another wild-assed guess: does Core Data 
>>> only create a store when you have no current store? It certainly checks 
>>> whether the store is compatible with the model, but as the indexed 
>>> property is just a hint anyway, such a store is compatible, just 
>>> non-optimal. So it's possible that if you created the store with the 
>>> property defined as not-indexed and only checked that box later, without 
>>> regenerating the whole store, the index was never added. Did you do that, 
>>> just check it later? Have you regenerated a complete new store since, or 
>>> are you using a store you've been populating for a while?
>>> 
>>> Here's a particularly ugly idea; purists please stop reading now. We can 
>>> look at the store and see if it has an index on that property. First open 
>>> a terminal window and go to the path where your store is. I'm assuming you 
>>> have sqlite3 installed like I do; it came with the OS as far as I know.
>>> 
>>> Your store should be called something.sqlite, let's say it's Foo. Type
>>> 
>>>       sqlite3 Foo.sqlite
>>> 
>>> and that should open the store and give you a prompt. First you want to 
>>> find the tables in the store, so type
>>> 
>>>       .tables
>>> 
>>> as far as I can see they are called Z<YOUR ENTITY NAME>, so for you I'd 
>>> expect to see one of the tables called ZMCARTICLE. If there is one, you can 
>>> find out what indices are on it
>>> 
>>>       .indices ZMCARTICLE
>>> 
>>> I believe again the indices are called Z<YOUR ENTITY NAME>_Z<YOUR ATTRIBUTE 
>>> NAME>_INDEX, so you'd expect to find ZMCARTICLE_ZMESSAGEID_INDEX in that 
>>> list. If you don't have it, the store wasn't created with that index. If 
>>> none of those tables exist at all, my rudimentary reverse engineering of 
>>> the whole coredata thing is flawed (or I'm using some entirely different 
>>> version from you).
>>> 
>>> If the tables and indices exist, including the one on ZMESSAGEID, I'm out 
>>> of ideas unless someone knows of a way to put coredata into a form of debug 
>>> mode and see the SQL generated to figure out if it's doing anything smart.
>>> 
>>> If none of the above works, or it does work but you don't have the index, 
>>> you have a couple of options. The right one is to delete your whole 
>>> message store, run your app to make a brand-new one, and see if that adds 
>>> the indexed property with an index. Depending on how you've populated the 
>>> store, that might be a real pain; perhaps you can force a migration or 
>>> something. The other, really stupid, idea would be to just add the index 
>>> yourself and hope that doesn't break everything, which is entirely 
>>> possible, at which point you delete the store and start over. You would do 
>>> that by running
>>> 
>>>       CREATE INDEX ZMCARTICLE_ZMESSAGEID_INDEX ON ZMCARTICLE (ZMESSAGEID);
>>> 
>>> Here's another useful thing I just came across, I would certainly run this 
>>> to see if the SQL being executed makes sense.
>>> 
>>> 
>>> With Mac OS X version 10.4.3 and later, you can use the user default 
>>> com.apple.CoreData.SQLDebug to log to stderr the actual SQL sent to SQLite. 
>>> (Note that user default names are case sensitive.) For example, you can 
>>> pass the following as an argument to the application:
>>> 
>>> -com.apple.CoreData.SQLDebug 1
>>> 
>>> Higher levels of debug numbers produce more information, although using 
>>> higher numbers is likely to be of diminishing utility.
>>> 
>>> 
>>> 
>>> I'd love to hear about any other ways people have to debug Core Data. I 
>>> sort of trust that Apple has done a good job with it, and for it to break 
>>> down performance-wise looking for one row in 20,000 with a certain 
>>> attribute doesn't make sense to me. If you really can't get it to work, 
>>> I'd write a short project which inserts 20,000 simple objects into a 
>>> store, and another which opens the store and looks for one by attribute 
>>> the way you have. If it takes multiple minutes, I'd send it to Apple as a 
>>> bug.

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
