Is Core Data appropriate to my task?

2009-09-10 Thread Gwynne Raskind
I have an application that manages two kinds of data: A singular file  
that contains a large amount of rarely changed (but not invariant)  
data, and documents that contain one root object's worth of  
information that connects to the singular data set in a very large  
number of places; the documents are in fact little more than a chosen  
list of references into the singular data set, plus some small bits of  
independent data. The documents are modified quite often.


Originally I thought Core Data must surely be ideal for this sort of  
application; the object graph management alone should have been very  
helpful, to say nothing of the management of stores and the ability to
integrate with bindings and NSDocument. I got as far as reading all  
the basic (and some of the not so basic) Core Data documentation and  
began wondering whether or not my data would fit into its model after  
all. For example, in the large singular data set, there are a large  
number of what SQL would call "lookup tables", data that's used only
to avoid duplicating a set of constant values that are used elsewhere.


To use the Employee/Department example from the Core Data docs,  
sometime in the future an Employee might have a planetOfOrigin  
attribute. Assuming one limits oneself to the restriction of the
speed of light (so, not TOO far in the future), the resulting Planet  
entity would only ever have a small number of possible values. Such an  
attribute might be better modeled in the Employee entity by something  
like SQL's ENUM or SET types. If the set of possible values is "Earth"
and "Not Earth", a Boolean might make more sense. If the set of
possible values is "Earth", "Mars", "Venus", etc., an ENUM would be a
reasonably obvious choice; after all, how often does the solar system  
gain or lose a planet (cough Pluto cough)? With such a small data set,  
a lookup table would only be the obvious choice if the set of possible  
values was expected to change with any frequency. But Core Data has no  
support for such a thing; I would either have to write a custom  
property type or model it by creating the Planet entity and giving it  
a relationship from Employee.
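
The two SQL-side options contrasted above can be sketched roughly as follows. This is only an illustration at the SQLite level, written with Python's standard sqlite3 module rather than Core Data. The table and column names (employee_enum, employee_lookup, planet_of_origin, planet_id) are hypothetical, and since SQLite has no ENUM type, a CHECK constraint stands in for one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")

# Option 1: ENUM-like column. SQLite has no ENUM type, so a CHECK
# constraint stands in for it; the allowed values live in the schema.
db.execute("""
    CREATE TABLE employee_enum (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        planet_of_origin TEXT NOT NULL
            CHECK (planet_of_origin IN ('Earth', 'Mars', 'Venus'))
    )
""")

# Option 2: lookup table. The allowed values are rows, so adding a
# planet is a data change rather than a schema change.
db.execute("CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
db.execute("""
    CREATE TABLE employee_lookup (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        planet_id INTEGER NOT NULL REFERENCES planet(id)
    )
""")
db.executemany("INSERT INTO planet (name) VALUES (?)",
               [("Earth",), ("Mars",), ("Venus",)])


def insert_is_accepted(sql, params):
    """Return True if the INSERT succeeds, False if a constraint rejects it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


enum_ok = insert_is_accepted(
    "INSERT INTO employee_enum (name, planet_of_origin) VALUES (?, ?)",
    ("Ann", "Mars"))
enum_bad = insert_is_accepted(
    "INSERT INTO employee_enum (name, planet_of_origin) VALUES (?, ?)",
    ("Bob", "Pluto"))
lookup_bad = insert_is_accepted(
    "INSERT INTO employee_lookup (name, planet_id) VALUES (?, ?)",
    ("Cy", 99))
```

In both variants the database rejects an unknown planet. The difference is where the list of valid planets lives: in the schema for the first, in ordinary rows for the second.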


Let's pretend the lookup table *was* the obvious choice for some  
reason; the speed of light barrier has been broken and now there's a  
whole mess of planets. So in Core Data parlance, the Employee entity  
has a one-to-one relationship to the Planet entity. The inverse  
relationship from Planet to Employee, "all employees from this planet",
is technically feasible, even easy, to model, but it's almost  
certainly a waste of time and effort. But the Core Data documentation  
offers a long list of very dire warnings about one-way relationships  
between entities.


Worse, the list of possible Planets certainly doesn't belong in the  
same document file that holds a single Employee's data; you'd end up  
duplicating that data across every single Employee. So the list of  
Planets would instead be in a global store. But oops, Core Data can't  
model cross-store relationships, so you use a fetched property, which  
is one-way. Inverse relationship problem solved, unless you actually  
had a use for that relationship. But fetched properties need a fetch  
request, and what do you put in the predicate? Now you need some kind  
of identifier in the Employee for the fetch to operate on, and now you  
have two fields (the planetOfOriginName string for the predicate and  
planetOfOrigin as the fetched property) to model a single  
relationship. How to maintain referential integrity? And what if you  
DID want the inverse relationship - do you model another fetched  
property in the other direction? What's the predicate there,  
"planetOfOriginName LIKE [c] $FETCH_SOURCE.name"? Now your Planet
entity has intimate knowledge of the structure of your Employee  
entity; that can't be good.


It seems to me that Core Data really is intended to deal with lists of  
root objects, i.e. the entire list of Employees in one store, rather  
than one Employee per store. The Core Data documentation mentions  
attaching multiple stores to a persistent store coordinator, but I  
can't make any sense of how interrelationships between the stores are  
handled.


Is Core Data really more appropriate to my dataset than an SQLite  
database and a simple Employee object that fetches from that database?  
If so, I'd appreciate some help in understanding how.


(Let me take this opportunity to say that for all the warnings that  
Core Data is not and never has been a database, almost every concept I  
see in it makes me think "O/R mapper for SQLite".)


-- Gwynne

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:

Re: Is Core Data appropriate to my task?

2009-09-10 Thread Gwynne Raskind

On Sep 10, 2009, at 3:21 PM, Erik Buck wrote:
Yes.  Use Core Data.  Your application is exactly what Core Data is
intended to support.


Create a planet entity.
Create a one to many relationship so that each employee has one  
planet, but each planet has an unlimited number of employees.


This is exactly what lookup tables in sql produce.  There is no  
need for fancy fetched properties.  There is no problem with having  
planet entity instances in the same store with employee entity  
instances.  It is a good design that makes your data stores self  
sufficient.  There will only be one instance of the planet entity  
for each planet that you define.  Right now, you would never have  
more than 8 or 9 planet entity instances no matter how many employee  
instances you have.


You could also just have a planet of origin string property in  
each Employee entity.  The property could default to Earth.  There  
is no need for a custom Enum type when strings work perfectly  
well.  You can even validate the strings whenever they change to  
restrict the set of valid strings.  Constant strings will tend to  
have the same pointer, so you won't even have the cost of separate  
string copies for each Employee instance.
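
The "string property with validation" suggestion can be sketched in a framework-neutral way as below. This is plain Python, not Core Data; in Core Data the check would more plausibly live in a key-value validation method on the managed object subclass. The Employee class, the VALID_PLANETS set and the "Earth" default are assumptions made for the illustration.

```python
VALID_PLANETS = frozenset({"Earth", "Mars", "Venus"})  # hypothetical allowed set


class Employee:
    """Toy model object whose planet is a validated string, defaulting to Earth."""

    def __init__(self, name, planet_of_origin="Earth"):
        self.name = name
        self.planet_of_origin = planet_of_origin  # goes through the setter below

    @property
    def planet_of_origin(self):
        return self._planet_of_origin

    @planet_of_origin.setter
    def planet_of_origin(self, value):
        # Validate on every change, so an invalid string never gets stored.
        if value not in VALID_PLANETS:
            raise ValueError(f"unknown planet: {value!r}")
        self._planet_of_origin = value
```

Assigning a planet outside the allowed set raises ValueError and leaves the previous value in place.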



I don't see this as being equivalent at all.

Extending the example, let's say the company with these Employees has  
as its directors several discriminating unfair people, and thus an  
Employee from any given Planet gets a salary adjustment based on that  
Planet. The obvious place for this data is the Planets table, or in  
Core Data's case, the Planet entity. A salaryAdj column (attribute)  
is added to the Planets table (Planet entity) and filled in with the  
(in)appropriate numbers.


Now suddenly the company is taken over by far more benevolent and  
considerate people, whose only flaw is that they don't want to break a  
system that works by removing an entire column from a database table  
(a schema change is much more difficult than a data update, after  
all), so they just UPDATE Planets SET salaryAdj=0.


So someone loads up an Employee whose Planet instances are in the same  
store with that Employee, and the old salary adjustment is still  
sitting there in the saved data. I sense unhappy Employees in this  
company's future. If only the coder who wrote the payroll system had  
put the Planet data in some global store where changes to it would  
propagate correctly to all Employees.
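
The "global store" arrangement wished for here can be sketched at the SQLite level as follows: a minimal illustration using Python's sqlite3 module and ATTACH DATABASE, not Core Data's multiple-store support. The file names, the employee and planet tables, and the salary_adj column (standing in for salaryAdj) are hypothetical.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
global_path = os.path.join(workdir, "global.sqlite")

# The shared, rarely changed data set lives in its own file.
g = sqlite3.connect(global_path)
g.execute("CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT, salary_adj REAL)")
g.executemany("INSERT INTO planet VALUES (?, ?, ?)",
              [(1, "Earth", 0.0), (2, "Mars", -0.25)])
g.commit()


def open_document(path):
    """Open one per-employee document and attach the shared data set to it."""
    doc = sqlite3.connect(path)
    doc.execute("ATTACH DATABASE ? AS shared", (global_path,))
    doc.execute("CREATE TABLE IF NOT EXISTS employee "
                "(name TEXT, base_salary REAL, planet_id INTEGER)")
    return doc


def effective_salary(doc):
    """Join the document's employee row against the shared planet table."""
    rows = doc.execute("""
        SELECT e.base_salary * (1 + p.salary_adj)
        FROM employee AS e JOIN shared.planet AS p ON p.id = e.planet_id
    """).fetchall()
    return rows[0][0]


doc = open_document(os.path.join(workdir, "marvin.sqlite"))
doc.execute("INSERT INTO employee VALUES (?, ?, ?)", ("Marvin", 1000.0, 2))
doc.commit()
before = effective_salary(doc)

# The new owners zero the adjustment once, in the shared file only.
g.execute("UPDATE planet SET salary_adj = 0")
g.commit()
after = effective_salary(doc)
```

Because the document stores only planet_id, the single UPDATE against the shared file is visible to every document the next time it is read. Nothing has to be rewritten per employee.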


Does Core Data still solve the problem? Is there some reason that  
using Core Data for everything would be better than storing the global  
rarely-updated data in a real database and using Core Data only for  
the Employee entity, which is the only part which really talks to the  
UI anyway? (Something tells me the key point is right there...) For  
that matter, if Core Data is only managing one entity, what's the use  
of Core Data at all? With all the data being referential between the  
database and the entity, just define a simple NSObject subclass which  
contains a few instance variables and implements NSCoding.


-- Gwynne



Re: Is Core Data appropriate to my task?

2009-09-10 Thread Ben Trumbull

Gwynne,


I have an application that manages two kinds of data: A singular file
that contains a large amount of rarely changed (but not invariant)
data, and documents that contain one root object's worth of
information that connects to the singular data set in a very large
number of places; the documents are in fact little more than a chosen
list of references into the singular data set, plus some small bits of
independent data. The documents are modified quite often.

Originally I thought Core Data must surely be ideal for this sort of
application; the object graph management alone should have been very
helpful, to say nothing of the management of stores and the ability to
integrate with bindings and NSDocument. I got as far as reading all
the basic (and some of the not so basic) Core Data documentation and
began wondering whether or not my data would fit into its model after
all. For example, in the large singular data set, there are a large
number of what SQL would call lookup tables, data that's used only
to avoid duplicating a set of constant values that are used elsewhere.

To use the Employee/Department example from the Core Data docs,
sometime in the future an Employee might have a planetOfOrigin
attribute. Assuming one limits oneself to the restriction of the
speed of light (so, not TOO far in the future), the resulting Planet
entity would only ever have a small number of possible values. Such an
attribute might be better modeled in the Employee entity by something
like SQL's ENUM or SET types. If the set of possible values is Earth
and Not Earth, a Boolean might make more sense. If the set of
possible values is Earth, Mars, Venus, etc., an ENUM would be a
reasonably obvious choice; after all, how often does the solar system
gain or lose a planet (cough Pluto cough)? With such a small data set,
a lookup table would only be the obvious choice if the set of possible
values was expected to change with any frequency. But Core Data has no
support for such a thing; I would either have to write a custom
property type or model it by creating the Planet entity and giving it
a relationship from Employee.


Correct.  You can write a custom NSValueTransformer with the  
Transformable property type to implement an ENUM, or normalize the  
data into a separate table as a formally modeled entity.  Which is  
better depends on how big the data values are, how many of them there  
are, and how frequently they change.


Is that really so bad ?  The alternative is to do ALL the work yourself.


Let's pretend the lookup table *was* the obvious choice for some
reason; the speed of light barrier has been broken and now there's a
whole mess of planets. So in Core Data parlance, the Employee entity
has a one-to-one relationship to the Planet entity.


A lonely planet.  That's either going to be one-to-many or a
no-inverse to-one.



The inverse
relationship from Planet to Employee, all employees from this planet
is technically feasible, even easy, to model, but it's almost
certainly a waste of time and effort. But the Core Data documentation
offers a long list of very dire warnings about one-way relationships
between entities.


Yes, and for most situations those warnings are there for very good  
reasons.  But if there were no reasons for such relationships, then it  
wouldn't be a warning, it simply wouldn't exist.



Worse, the list of possible Planets certainly doesn't belong in the
same document file that holds a single Employee's data; you'd end up
duplicating that data across every single Employee. So the list of
Planets would instead be in a global store.


There are lots of ways to model that, but, yes, this would be the most  
natural.



But oops, Core Data can't
model cross-store relationships, so you use a fetched property, which
is one-way.


You could use a fetched property, or handle this in code by storing a  
URI for the destination object in a different store, and fetching the  
matching objectID either lazily or in -awakeFromFetch.  We've
generally recommended using a custom accessor method for this instead  
of fetched properties.
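
The "store a URI, resolve it in a custom accessor" idea can be sketched in a language-neutral way as below. This is plain Python, not Core Data: GlobalStore, object_for_uri and the "planet://mars" URI are hypothetical stand-ins. In Core Data the stored string would presumably come from the managed object ID's URI representation and be resolved through the persistent store coordinator.

```python
class GlobalStore:
    """Stand-in for the shared store; maps URI strings to objects."""

    def __init__(self):
        self._objects = {}
        self.lookups = 0  # counts resolutions, to make the caching visible

    def add(self, uri, obj):
        self._objects[uri] = obj

    def object_for_uri(self, uri):
        self.lookups += 1
        return self._objects.get(uri)  # None if the target no longer exists


class Employee:
    """Document-side object: persists only the URI, never the Planet itself."""

    def __init__(self, name, planet_uri, store):
        self.name = name
        self.planet_uri = planet_uri  # the only persisted part of the link
        self._store = store
        self._planet = None           # transient cache, not persisted

    @property
    def planet_of_origin(self):
        # Custom accessor: resolve on first use, then reuse the cached object.
        if self._planet is None and self.planet_uri is not None:
            self._planet = self._store.object_for_uri(self.planet_uri)
        return self._planet
```

The lazy variant pays for the lookup only when the relationship is actually used. Resolving eagerly when the object is loaded would move the same call into the loading hook instead.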



Inverse relationship problem solved, unless you actually
had a use for that relationship. But fetched properties need a fetch
request, and what do you put in the predicate? Now you need some kind
of identifier in the Employee for the fetch to operate on,


Yes, but this isn't any different than the problem would be without  
Core Data for managing values in two different databases.


and now you have two fields (the planetOfOriginName string for the
predicate and planetOfOrigin as the fetched property) to model a single
relationship. How to maintain referential integrity?


Again, no different than the problem would be without Core Data.  This  
is why the modeling tool recommends using inverse relationships.   
Maintaining the integrity by oneself is tedious and error prone.


And what if you DID want the inverse relationship - do you model  
another fetched

Re: Is Core Data appropriate to my task?

2009-09-10 Thread Gwynne Raskind
Before anything else, let me say thank you for a clear, concise, and  
very helpful set of answers to my questions; I was expecting rather  
more of a struggle for understanding :).


On Sep 10, 2009, at 5:04 PM, Ben Trumbull wrote:

support for such a thing; I would either have to write a custom
property type or model it by creating the Planet entity and giving it
a relationship from Employee.
Correct.  You can write a custom NSValueTransformer with the  
Transformable property type to implement an ENUM, or normalize the  
data into a separate table as a formally modeled entity.  Which is  
better depends on how big the data values are, how many of them  
there are, and how frequently they change.


Is that really so bad ?  The alternative is to do ALL the work  
yourself.


Mostly I just wanted to be certain that there was nothing obvious I  
was missing :).



The inverse
relationship from Planet to Employee, all employees from this planet
is technically feasible, even easy, to model, but it's almost
certainly a waste of time and effort. But the Core Data documentation
offers a long list of very dire warnings about one-way relationships
between entities.
Yes, and for most situations those warnings are there for very good  
reasons.  But if there were no reasons for such relationships, then  
it wouldn't be a warning, it simply wouldn't exist.


The manual isn't at all clear about this, but if I understand  
correctly, you're basically saying, Though it is almost always  
technically possible to model an inverse to any relationship, there  
are sometimes circumstances in which it is correct not to do so. Is  
that accurate, and if so, should I file a documentation bug requesting  
clarification on that and the circumstances in which it's true?



Worse, the list of possible Planets certainly doesn't belong in the
same document file that holds a single Employee's data; you'd end up
duplicating that data across every single Employee. So the list of
Planets would instead be in a global store.
There are lots of ways to model that, but, yes, this would be the  
most natural.


I can't think of any others offhand, but I haven't worked with this  
sort of data before; could you give some examples?



But oops, Core Data can't
model cross-store relationships, so you use a fetched property, which
is one-way.
You could use a fetched property, or handle this in code by storing  
a URI for the destination object in a different store, and fetching  
the matching objectID either lazily or in -awakeFromFetch.  We've
generally recommended using a custom accessor method for this  
instead of fetched properties.


Is there any particular reason for that recommendation? The  
documentation explicitly recommends fetched properties for cross-store  
relationships (one instance of several is in the Core Data Programming  
Guide, Relationships and Fetched Properties chapter, Fetched  
Properties section, first paragraph, where it says  In general,  
fetched properties are best suited to modeling cross-store  
relationships...)


Also, whether you build this hand-made cross-store relationship in
code or with fetched properties, you should prefer numeric keys to
text strings for your joins.  Creating a de facto join through a LIKE
query is pretty crazy.  That's a case-insensitive, locale-aware,
Unicode regex there.  String operations are much more expensive than
integer comparisons.  At the very least, use == for your string compares.
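
The difference between the three kinds of join can be seen in a small SQLite experiment, sketched here with Python's sqlite3 module. This is not NSPredicate: SQLite's LIKE is a simpler operator than the LIKE[c] discussed above, but it shares the relevant behaviour (pattern matching, and case-insensitive for ASCII by default). The tables, the duplicate planet_id/planet_name columns and the sample rows are all made up for the illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
db.execute("CREATE TABLE employee (name TEXT, planet_id INTEGER, planet_name TEXT)")
db.executemany("INSERT INTO planet VALUES (?, ?)",
               [(1, "Mars"), (2, "Mercury"), (3, "mars")])
# Marvin's string key happens to contain '%', which LIKE treats as a wildcard.
db.executemany("INSERT INTO employee VALUES (?, ?, ?)",
               [("Ann", 1, "Mars"), ("Marvin", 2, "M%")])


def planets_for(employee, join_condition):
    """Return the planet names that the given join condition pairs with an employee."""
    rows = db.execute(
        "SELECT p.name FROM employee AS e JOIN planet AS p ON " + join_condition +
        " WHERE e.name = ? ORDER BY p.id", (employee,)).fetchall()
    return [name for (name,) in rows]


by_like = planets_for("Ann", "p.name LIKE e.planet_name")    # pattern match
by_equals = planets_for("Ann", "p.name = e.planet_name")     # exact string compare
by_integer = planets_for("Ann", "p.id = e.planet_id")        # integer key
wildcard = planets_for("Marvin", "p.name LIKE e.planet_name")
```

Here the LIKE join pairs Ann with both "Mars" and "mars", and pairs Marvin with every planet starting with M, while the equality and integer joins each return exactly one row for Ann.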


Don't worry, I already had the experience of having to work with a  
codebase that used string keys as its only cross-table links in mSQL.  
Eventually we had to recreate the whole system from scratch.


It seems to me that Core Data really is intended to deal with lists of
root objects, i.e. the entire list of Employees in one store, rather
than one Employee per store.
One document per Employee is a bit unusual.  But it's feasible if  
that's your requirement.


Employee was just the example I yanked out of the Core Data docs :). A  
better analogy would be the Picture example. If you use Core Data  
entities to store the various elements of a vector graphic, you would  
certainly want to be able to store one graphic per document.



(Let me take this opportunity to say that for all the warnings that
Core Data is not and never has been a database, almost every concept I
see in it makes me think "O/R mapper for SQLite".)
Core Data is an O/R mapping framework, among other things.  But O/R  
frameworks are not SQL databases.  Modeling your data in any O/R  
framework as if you were writing SQL directly is inefficient and  
mistaken.


Saying that Core Data is a database is like saying your compiler is  
an assembler.  Well, the compiler suite uses an assembler, sure, and  
they both output object code in the end, but that does not mean the  
best way to use your compiler is to write in assembly.



Nonetheless, Core Data does manage the data stored on disk as well as  
the representation of 

Re: Is Core Data appropriate to my task?

2009-09-10 Thread Ben Trumbull

I don't see this as being equivalent at all.

Extending the example, let's say the company with these Employees has
as its directors several discriminating unfair people, and thus an
Employee from any given Planet gets a salary adjustment based on that
Planet. The obvious place for this data is the Planets table, or in
Core Data's case, the Planet entity. A salaryAdj column (attribute)
is added to the Planets table (Planet entity) and filled in with the
(in)appropriate numbers.

Now suddenly the company is taken over by far more benevolent and
considerate people, whose only flaw is that they don't want to break a
system that works by removing an entire column from a database table
(a schema change is much more difficult than a data update, after
all), so they just UPDATE Planets SET salaryAdj=0.


Now you're conflating other issues.  This is why I recommend not  
treating O/R systems as perfectly equivalent to databases.  They're  
not.  On Snow Leopard and iPhone OS, you can make modest alterations to
the Core Data schema easily.  Just keep a copy of the old model, and
pass the 2 keys to the options dictionary when you add the store to
the PSC to leverage lightweight migration.  Core Data will infer the
appropriate schema changes and adjust the schema in place (alter table
style).


http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/CoreDataVersioning/Articles/vmLightweight.html#//apple_ref/doc/uid/TP40008426-SW1 
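
What an in-place, "alter table style" change means at the SQLite level can be sketched as below. This is not Core Data's migration API, and it does not show the options dictionary (the two keys referred to above are presumably NSMigratePersistentStoresAutomaticallyOption and NSInferMappingModelAutomaticallyOption). It only illustrates adding an attribute to an existing table without rewriting the saved rows. The planet table and salary_adj column are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Version 1 of the schema, with existing saved data.
db.execute("CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.executemany("INSERT INTO planet (name) VALUES (?)", [("Earth",), ("Mars",)])


def column_names(table):
    """List a table's column names using SQLite's table_info pragma."""
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]


# Version 2 adds one attribute. The table is altered in place; existing
# rows are kept and pick up the declared default.
if "salary_adj" not in column_names("planet"):
    db.execute("ALTER TABLE planet ADD COLUMN salary_adj REAL NOT NULL DEFAULT 0")

rows = db.execute("SELECT name, salary_adj FROM planet ORDER BY id").fetchall()
```

The existence check makes the step safe to run again against a store that has already been upgraded.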



If you do make some unusual and radical modifications (split tables  
into multiple new tables, compose old tables into a single new table,  
etc), you can use the full mapping model migration.  While it won't  
perform as well as lightweight migration, at least you'll have tools
support in handling the schema migration.



So someone loads up an Employee whose Planet instances are in the same
store with that Employee, and the old salary adjustment is still
sitting there in the saved data. I sense unhappy Employees in this
company's future. If only the coder who wrote the payroll system had
put the Planet data in some global store where changes to it would
propagate correctly to all Employees.


If this is important, then you can use multiple persistent stores.

I suspect Erik's point, though, is many apps don't have a significant  
issue with a small amount of duplication in the individual documents.   
Disk space is usually cheap.  And being completely self contained has  
its advantages (perhaps not relevant to you, but still existent).   
Global mutation is a double-edged sword.  What if your documents
are loaded in a newer version of the app, but have some implicit data  
dependency on the older global data ?  That can get messy.



Does Core Data still solve the problem? Is there some reason that
using Core Data for everything would be better than storing the global
rarely-updated data in a real database and using Core Data only for
the Employee entity, which is the only part which really talks to the
UI anyway? (Something tells me the key point is right there...) For
that matter, if Core Data is only managing one entity, what's the use
of Core Data at all? With all the data being referential between the
database and the entity, just define a simple NSObject subclass which
contains a few instance variables and implements NSCoding.



Then why not use Core Data for the database and for the entity  
implement a simple NSObject subclass with a few instance
variables ... ?


Although, it seems a little silly to not use Core Data for the simple  
part when you'll get persistence, change tracking and Cocoa Bindings  
integration for free.  Most people find NOT maintaining backward  
compatible initWithCoder methods in perpetuity quite refreshing.  I  
know one developer seriously considering rewriting their iPhone app  
for no other reason than to use Core Data's lightweight migration and
never hand roll another database schema upgrade again.


Here's an excerpt from a post regarding when to use Core Data on the  
iPhone:


I suppose I could tell you how great an addition to Cocoa it is, or  
how much TLC its performance tuning gets.  But what I've seen our most  
sophisticated clients decide is that it saves them from writing a lot  
of code.  The model code with Core Data is usually 50% to 70% smaller  
as measured by lines of code.  Why reinvent that ?


App developers don't get paid to write database code.  Can you learn  
SQL ?  Sure.  Do your customers care ?  No.


App developers get paid for novel functionality that addresses a real  
customer need with good UI.


- Ben

Here's a more traditional reply:

- Full KVC, KVO support out of box
- Relationship maintenance (inverses, delete propagation)
- Change tracking
- Sophisticated SQL compilation
  - NSPredicate objects instead of SQL
  - NSPredicate support for correlated subqueries, basic functions,
    and other advanced SQL
  - Proper Unicode, locale-aware searching, sorting, regex
   

Re: Is Core Data appropriate to my task?

2009-09-10 Thread Ben Trumbull

Before anything else, let me say thank you for a clear, concise, and
very helpful set of answers to my questions; I was expecting rather
more of a struggle for understanding :).


my pleasure.


On Sep 10, 2009, at 5:04 PM, Ben Trumbull wrote:

The inverse
relationship from Planet to Employee, all employees from this planet
is technically feasible, even easy, to model, but it's almost
certainly a waste of time and effort. But the Core Data documentation
offers a long list of very dire warnings about one-way relationships
between entities.

Yes, and for most situations those warnings are there for very good
reasons.  But if there were no reasons for such relationships, then
it wouldn't be a warning, it simply wouldn't exist.


The manual isn't at all clear about this, but if I understand
correctly, you're basically saying, Though it is almost always
technically possible to model an inverse to any relationship, there
are sometimes circumstances in which it is correct not to do so. Is
that accurate, and if so, should I file a documentation bug requesting
clarification on that and the circumstances in which it's true?


Sure.  It's a challenge to document some of this material in a way  
that steers most developers down the typically optimal path while  
still keeping advanced options open.  We have many developers with  
little or no database experience, and we want to encourage them to use  
inverses until they have a compelling reason not to.  The  
documentation was more open about no inverse relationships in 10.4,  
and we learned the hard way that was less than ideal.  The modeling  
tool now issues warnings for this due to the frequency and severity of  
bugs from developers incorrectly and over eagerly using no inverse  
relationships.



But oops, Core Data can't
model cross-store relationships, so you use a fetched property, which
is one-way.

You could use a fetched property, or handle this in code by storing
a URI for the destination object in a different store, and fetching
the matching objectID either lazily or in -awakeFromFetch.  We've
generally recommended using a custom accessor method for this
instead of fetched properties.


Is there any particular reason for that recommendation? The
documentation explicitly recommends fetched properties for cross-store
relationships (one instance of several is in the Core Data Programming
Guide, Relationships and Fetched Properties chapter, Fetched
Properties section, first paragraph, where it says  In general,
fetched properties are best suited to modeling cross-store
relationships...)


First, custom accessor methods and -awakeFromFetch offer a vast amount  
of flexibility, and can be easier to tune for performance.  Fetched  
properties are a fine alternative.  But I like to also reinforce the  
understanding that not all your custom behavior needs to be  
encapsulated in your Core Data schema.  You have full Objective-C  
objects and very powerful runtime support.  Use it liberally.



(Let me take this opportunity to say that for all the warnings that
Core Data is not and never has been a database, almost every concept I
see in it makes me think "O/R mapper for SQLite".)

Core Data is an O/R mapping framework, among other things.  But O/R
frameworks are not SQL databases.  Modeling your data in any O/R
framework as if you were writing SQL directly is inefficient and
mistaken.

Saying that Core Data is a database is like saying your compiler is
an assembler.  Well, the compiler suite uses an assembler, sure, and
they both output object code in the end, but that does not mean the
best way to use your compiler is to write in assembly.



Nonetheless, Core Data does manage the data stored on disk as well as
the representation of that data in memory; I don't see a tremendous
difference between that and what SQLite does, other than Core Data
providing a much more effective organization of and means of access to that
data.


Core Data implements a lot of functionality on top of SQLite.  From an  
API perspective, that it uses SQLite at all is an implementation detail.


In any event, O/R systems present an OO view of your data, and have  
their own idioms closer to OOP.  They are providing an abstraction  
layer and perform transformations on both your queries and result  
sets.  Relational databases can support that, but in every O/R system,  
the ideal way of using the system is somewhat different from how one  
would write SQL directly against the database.


- Ben

