Is Core Data appropriate to my task?
I have an application that manages two kinds of data: a singular file that contains a large amount of rarely changed (but not invariant) data, and documents that each contain one root object's worth of information, which connects to the singular data set in a very large number of places. The documents are in fact little more than a chosen list of references into the singular data set, plus some small bits of independent data. The documents are modified quite often.

Originally I thought Core Data must surely be ideal for this sort of application; the object graph management alone should have been very helpful, to say nothing of the management of stores and the ability to integrate with bindings and NSDocument. I got as far as reading all the basic (and some of the not so basic) Core Data documentation and began wondering whether or not my data would fit into its model after all.

For example, in the large singular data set, there are a large number of what SQL would call lookup tables: data that's used only to avoid duplicating a set of constant values that are used elsewhere. To use the Employee/Department example from the Core Data docs, sometime in the future an Employee might have a planetOfOrigin attribute. Assuming one limits oneself to the speed of light (so, not TOO far in the future), the resulting Planet entity would only ever have a small number of possible values. Such an attribute might be better modeled in the Employee entity by something like SQL's ENUM or SET types. If the set of possible values is Earth and Not Earth, a Boolean might make more sense. If the set of possible values is Earth, Mars, Venus, etc., an ENUM would be a reasonably obvious choice; after all, how often does the solar system gain or lose a planet (cough Pluto cough)? With such a small data set, a lookup table would only be the obvious choice if the set of possible values was expected to change with any frequency.
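To make the ENUM-versus-lookup-table trade-off concrete, here is a minimal sketch in plain Python (not Core Data API; `Planet` and `Employee` are hypothetical names). The closed set of values is stored inline as an integer code instead of as a relationship to a separate table, and the "Earth / Not Earth" case collapses to a Boolean.

```python
from dataclasses import dataclass
from enum import IntEnum

# Hypothetical names; plain Python, not Core Data API.


class Planet(IntEnum):
    """A closed set of values stored as small integers, much like SQL's ENUM."""
    MERCURY = 1
    VENUS = 2
    EARTH = 3
    MARS = 4


@dataclass
class Employee:
    name: str
    # The value lives inline in the Employee record: no lookup table, no relationship.
    planet_of_origin: Planet = Planet.EARTH

    @property
    def is_earthling(self) -> bool:
        # The "Earth / Not Earth" case collapses to a Boolean.
        return self.planet_of_origin is Planet.EARTH


ann = Employee("Ann", Planet.MARS)
stored = int(ann.planet_of_origin)   # what would be persisted: 4
restored = Planet(stored)            # an unknown code would raise ValueError
```

Adding a planet here means changing code and re-checking every stored code, which is why a lookup table only wins once the set of values changes with any frequency.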
But Core Data has no support for such a thing; I would either have to write a custom property type or model it by creating the Planet entity and giving it a relationship from Employee.

Let's pretend the lookup table *was* the obvious choice for some reason; the speed of light barrier has been broken and now there's a whole mess of planets. So in Core Data parlance, the Employee entity has a one-to-one relationship to the Planet entity. The inverse relationship from Planet to Employee, "all employees from this planet", is technically feasible, even easy, to model, but it's almost certainly a waste of time and effort. But the Core Data documentation offers a long list of very dire warnings about one-way relationships between entities.

Worse, the list of possible Planets certainly doesn't belong in the same document file that holds a single Employee's data; you'd end up duplicating that data across every single Employee. So the list of Planets would instead be in a global store. But oops, Core Data can't model cross-store relationships, so you use a fetched property, which is one-way. Inverse relationship problem solved, unless you actually had a use for that relationship.

But fetched properties need a fetch request, and what do you put in the predicate? Now you need some kind of identifier in the Employee for the fetch to operate on, and now you have two fields (the planetOfOriginName string for the predicate and planetOfOrigin as the fetched property) to model a single relationship. How do you maintain referential integrity? And what if you DID want the inverse relationship: do you model another fetched property in the other direction? What's the predicate there, planetOfOriginName LIKE[c] $FETCH_SOURCE.name? Now your Planet entity has intimate knowledge of the structure of your Employee entity; that can't be good.

It seems to me that Core Data really is intended to deal with lists of root objects, i.e. the entire list of Employees in one store, rather than one Employee per store. The Core Data documentation mentions attaching multiple stores to a persistent store coordinator, but I can't make any sense of how interrelationships between the stores are handled.

Is Core Data really more appropriate to my dataset than an SQLite database and a simple Employee object that fetches from that database? If so, I'd appreciate some help in understanding how.

(Let me take this opportunity to say that for all the warnings that Core Data is not and never has been a database, almost every concept I see in it makes me think "O/R mapper for SQLite".)

--
Gwynne

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list. Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
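The two-fields-for-one-relationship arrangement described above can be sketched in plain Python. This is a hypothetical illustration, not how Core Data evaluates fetched properties: a loop with a case-insensitive comparison stands in for the fetch request, and the names are made up. It shows why referential integrity becomes the application's problem once the link is just a stored name.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical names; a loop stands in for Core Data's fetch request.


@dataclass
class Planet:
    name: str
    salary_adj: float


@dataclass
class Employee:
    # Field 1: the identifier the fetch request matches on.
    planet_of_origin_name: str

    # The "fetched property" half: re-evaluated against the global store.
    def planet_of_origin(self, global_planets: List[Planet]) -> Optional[Planet]:
        # Loose stand-in for the predicate  name LIKE[c] $FETCH_SOURCE.planetOfOriginName
        # (case-insensitive equality only; wildcards are ignored here).
        wanted = self.planet_of_origin_name.casefold()
        for planet in global_planets:
            if planet.name.casefold() == wanted:
                return planet
        return None


global_planets = [Planet("Earth", 0.0), Planet("Mars", 0.1)]
emp = Employee("mars")
found = emp.planet_of_origin(global_planets)   # resolves to the Mars object

# Renaming the planet in the global store silently orphans every document that
# still stores the old name; nothing enforces referential integrity.
global_planets[1].name = "Barsoom"
orphaned = emp.planet_of_origin(global_planets)  # None
```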
Re: Is Core Data appropriate to my task?
On Sep 10, 2009, at 3:21 PM, Erik Buck wrote:

> Yes. Use Core Data. Your application is exactly what Core Data is intended to support. Create a planet entity. Create a one-to-many relationship so that each employee has one planet, but each planet has an unlimited number of employees. This is exactly what lookup tables in SQL produce. There is no need for fancy fetched properties. There is no problem with having planet entity instances in the same store with employee entity instances. It is a good design that makes your data stores self-sufficient. There will only be one instance of the planet entity for each planet that you define. Right now, you would never have more than 8 or 9 planet entity instances no matter how many employee instances you have.
>
> You could also just have a planet of origin string property in each Employee entity. The property could default to Earth. There is no need for a custom Enum type when strings work perfectly well. You can even validate the strings whenever they change to restrict the set of valid strings. Constant strings will tend to have the same pointer, so you won't even have the cost of separate string copies for each Employee instance.

I don't see this as being equivalent at all. Extending the example, let's say the company with these Employees has as its directors several discriminating, unfair people, and thus an Employee from any given Planet gets a salary adjustment based on that Planet. The obvious place for this data is the Planets table, or in Core Data's case, the Planet entity. A salaryAdj column (attribute) is added to the Planets table (Planet entity) and filled in with the (in)appropriate numbers.

Now suddenly the company is taken over by far more benevolent and considerate people, whose only flaw is that they don't want to break a system that works by removing an entire column from a database table (a schema change is much more difficult than a data update, after all), so they just UPDATE Planets SET salaryAdj=0.
So someone loads up an Employee whose Planet instances are in the same store with that Employee, and the old salary adjustment is still sitting there in the saved data. I sense unhappy Employees in this company's future. If only the coder who wrote the payroll system had put the Planet data in some global store where changes to it would propagate correctly to all Employees.

Does Core Data still solve the problem? Is there some reason that using Core Data for everything would be better than storing the global rarely-updated data in a real database and using Core Data only for the Employee entity, which is the only part which really talks to the UI anyway? (Something tells me the key point is right there...) For that matter, if Core Data is only managing one entity, what's the use of Core Data at all? With all the data being referential between the database and the entity, just define a simple NSObject subclass which contains a few instance variables and implements NSCoding.

--
Gwynne
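The salaryAdj scenario above reduces to a few lines of plain Python (hypothetical names, no Core Data involved). A document that embeds its own copy of the Planet data keeps the old value after the global update, while a document that only stores a key into a global store sees the change.

```python
import copy
from dataclasses import dataclass

# Hypothetical names; plain Python illustrating copy-vs-reference, not Core Data.


@dataclass
class Planet:
    name: str
    salary_adj: float


# The single global store of rarely changed data.
global_planets = {"Mars": Planet("Mars", -0.15)}


@dataclass
class SelfContainedDocument:
    # Option A: the document embeds its own copy of the Planet data.
    planet: Planet


@dataclass
class ReferencingDocument:
    # Option B: the document stores only a key into the global store.
    planet_name: str

    def planet(self) -> Planet:
        return global_planets[self.planet_name]


doc_a = SelfContainedDocument(copy.deepcopy(global_planets["Mars"]))
doc_b = ReferencingDocument("Mars")

# The new management zeroes the adjustment in the global store.
global_planets["Mars"].salary_adj = 0.0

stale = doc_a.planet.salary_adj    # still -0.15: the copy never saw the update
fresh = doc_b.planet().salary_adj  # 0.0: the update propagates
```

The flip side is that the referencing document now depends on whatever the global store happens to contain when it is opened.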
re: Is Core Data appropriate to my task?
Gwynne,
> But Core Data has no support for such a thing; I would either have to write a custom property type or model it by creating the Planet entity and giving it a relationship from Employee.

Correct. You can write a custom NSValueTransformer with the Transformable property type to implement an ENUM, or normalize the data into a separate table as a formally modeled entity. Which is better depends on how big the data values are, how many of them there are, and how frequently they change.

Is that really so bad? The alternative is to do ALL the work yourself.

> Let's pretend the lookup table *was* the obvious choice for some reason; the speed of light barrier has been broken and now there's a whole mess of planets. So in Core Data parlance, the Employee entity has a one-to-one relationship to the Planet entity.

A lonely planet. That's either going to be one-to-many or a no-inverse to-one.

> The inverse relationship from Planet to Employee, "all employees from this planet", is technically feasible, even easy, to model, but it's almost certainly a waste of time and effort. But the Core Data documentation offers a long list of very dire warnings about one-way relationships between entities.

Yes, and for most situations those warnings are there for very good reasons. But if there were no reasons for such relationships, then it wouldn't be a warning; it simply wouldn't exist.

> Worse, the list of possible Planets certainly doesn't belong in the same document file that holds a single Employee's data; you'd end up duplicating that data across every single Employee. So the list of Planets would instead be in a global store.

There are lots of ways to model that, but, yes, this would be the most natural.

> But oops, Core Data can't model cross-store relationships, so you use a fetched property, which is one-way.
You could use a fetched property, or handle this in code by storing a URI for the destination object in a different store, and fetching the matching objectID either lazily or in -awakeFromFetch. We've generally recommended using a custom accessor method for this instead of fetched properties.

> Inverse relationship problem solved, unless you actually had a use for that relationship. But fetched properties need a fetch request, and what do you put in the predicate? Now you need some kind of identifier in the Employee for the fetch to operate on,

Yes, but this isn't any different than the problem would be without Core Data for managing values in two different databases.

> and now you have two fields (the planetOfOriginName string for the predicate and planetOfOrigin as the fetched property) to model a single relationship. How to maintain referential integrity?

Again, no different than the problem would be without Core Data. This is why the modeling tool recommends using inverse relationships. Maintaining the integrity by oneself is tedious and error prone.

> And what if you DID want the inverse relationship: do you model another fetched
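The custom-accessor approach recommended above (store a URI for the object in the other store, resolve it lazily) can be sketched in plain Python. Everything here is hypothetical: `GlobalStore.object_for_uri` stands in for looking up an object ID and fetching it from the other store, and the URI strings are made up. The setter rewrites the stored URI, so the persisted key and the in-memory object cannot drift apart.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-ins; not Core Data API.


@dataclass
class Planet:
    uri: str
    name: str


class GlobalStore:
    """Stand-in for the shared store; object_for_uri is a hypothetical lookup."""

    def __init__(self, planets) -> None:
        self._by_uri: Dict[str, Planet] = {p.uri: p for p in planets}
        self.lookups = 0  # counts resolutions, to show the cache working

    def object_for_uri(self, uri: str) -> Optional[Planet]:
        self.lookups += 1
        return self._by_uri.get(uri)


@dataclass
class Employee:
    store: GlobalStore
    planet_uri: Optional[str] = None  # the only thing persisted in the document
    _planet: Optional[Planet] = field(default=None, repr=False)  # transient cache

    @property
    def planet(self) -> Optional[Planet]:
        # Custom accessor: resolve the URI lazily, on first access only.
        if self._planet is None and self.planet_uri is not None:
            self._planet = self.store.object_for_uri(self.planet_uri)
        return self._planet

    @planet.setter
    def planet(self, value: Optional[Planet]) -> None:
        # Setting the object also rewrites the stored URI, so the two never disagree.
        self._planet = value
        self.planet_uri = value.uri if value is not None else None


store = GlobalStore([
    Planet("x-example://global/Planet/p1", "Earth"),
    Planet("x-example://global/Planet/p2", "Mars"),
])
emp = Employee(store, planet_uri="x-example://global/Planet/p2")
first = emp.planet    # triggers the one lookup
second = emp.planet   # served from the cache
lookups_after_reads = store.lookups

earth = store.object_for_uri("x-example://global/Planet/p1")
emp.planet = earth    # setter keeps planet_uri in sync
```

Resolving eagerly in -awakeFromFetch would amount to doing the lookup once when the Employee is loaded rather than on first access.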
Re: Is Core Data appropriate to my task?
Before anything else, let me say thank you for a clear, concise, and very helpful set of answers to my questions; I was expecting rather more of a struggle for understanding :).

On Sep 10, 2009, at 5:04 PM, Ben Trumbull wrote:

>> support for such a thing; I would either have to write a custom property type or model it by creating the Planet entity and giving it a relationship from Employee.
>
> Correct. You can write a custom NSValueTransformer with the Transformable property type to implement an ENUM, or normalize the data into a separate table as a formally modeled entity. Which is better depends on how big the data values are, how many of them there are, and how frequently they change.
>
> Is that really so bad? The alternative is to do ALL the work yourself.

Mostly I just wanted to be certain that there was nothing obvious I was missing :).

>> The inverse relationship from Planet to Employee, "all employees from this planet", is technically feasible, even easy, to model, but it's almost certainly a waste of time and effort. But the Core Data documentation offers a long list of very dire warnings about one-way relationships between entities.
>
> Yes, and for most situations those warnings are there for very good reasons. But if there were no reasons for such relationships, then it wouldn't be a warning; it simply wouldn't exist.

The manual isn't at all clear about this, but if I understand correctly, you're basically saying, "Though it is almost always technically possible to model an inverse to any relationship, there are sometimes circumstances in which it is correct not to do so." Is that accurate, and if so, should I file a documentation bug requesting clarification on that and the circumstances in which it's true?

>> Worse, the list of possible Planets certainly doesn't belong in the same document file that holds a single Employee's data; you'd end up duplicating that data across every single Employee. So the list of Planets would instead be in a global store.
> There are lots of ways to model that, but, yes, this would be the most natural.

I can't think of any others offhand, but I haven't worked with this sort of data before; could you give some examples?

>> But oops, Core Data can't model cross-store relationships, so you use a fetched property, which is one-way.
>
> You could use a fetched property, or handle this in code by storing a URI for the destination object in a different store, and fetching the matching objectID either lazily or in -awakeFromFetch. We've generally recommended using a custom accessor method for this instead of fetched properties.

Is there any particular reason for that recommendation? The documentation explicitly recommends fetched properties for cross-store relationships (one instance of several is in the Core Data Programming Guide, "Relationships and Fetched Properties" chapter, "Fetched Properties" section, first paragraph, where it says "In general, fetched properties are best suited to modeling cross-store relationships...").

> Also, if you build this hand-made cross-store relationship, whether in code or with fetched properties, you should prefer numeric keys to text strings for your joins. Creating a de facto join through a LIKE query is pretty crazy. That's a case-insensitive, locale-aware, Unicode regex there. String operations are much more expensive than integer comparisons. At the very least, use == for your string compares.

Don't worry, I already had the experience of having to work with a codebase that used string keys as its only cross-table links in mSQL. Eventually we had to recreate the whole system from scratch.

>> It seems to me that Core Data really is intended to deal with lists of root objects, i.e. the entire list of Employees in one store, rather than one Employee per store.
>
> One document per Employee is a bit unusual. But it's feasible if that's your requirement.

Employee was just the example I yanked out of the Core Data docs :). A better analogy would be the Picture example.
If you use Core Data entities to store the various elements of a vector graphic, you would certainly want to be able to store one graphic per document.

>> (Let me take this opportunity to say that for all the warnings that Core Data is not and never has been a database, almost every concept I see in it makes me think "O/R mapper for SQLite".)
>
> Core Data is an O/R mapping framework, among other things. But O/R frameworks are not SQL databases. Modeling your data in any O/R framework as if you were writing SQL directly is inefficient and mistaken. Saying that Core Data is a database is like saying your compiler is an assembler. Well, the compiler suite uses an assembler, sure, and they both output object code in the end, but that does not mean the best way to use your compiler is to write in assembly.

Nonetheless, Core Data does manage the data stored on disk as well as the representation of
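The advice quoted in this message, to prefer numeric keys over string matching for a hand-made cross-store join, can be illustrated with a small plain-Python sketch (hypothetical data). `fold` is only a rough stand-in for the work a case-insensitive, Unicode-aware match has to do on every comparison; it is not Core Data's implementation. The integer join is a single hash lookup, while the string join has to normalize every candidate, and a plain comparison of unnormalized strings can miss visually identical names.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical data; plain Python, not Core Data's predicate engine.


def fold(s: str) -> str:
    # Rough approximation of what a case-insensitive, Unicode-aware match must do
    # to each string before comparing it.
    return unicodedata.normalize("NFC", s).casefold()


planets_by_id: Dict[int, str] = {1: "Earth", 2: "Mars", 3: "Caf\u00e9 Prime"}


def join_by_id(planet_id: int) -> Optional[str]:
    # Numeric key: one hash lookup, one integer comparison.
    return planets_by_id.get(planet_id)


def join_by_name(name: str) -> Optional[str]:
    # String key: every candidate must be normalized, folded and compared.
    wanted = fold(name)
    for candidate in planets_by_id.values():
        if fold(candidate) == wanted:
            return candidate
    return None


# Same name, but with 'e' followed by COMBINING ACUTE ACCENT, in lower case.
decomposed = "cafe\u0301 prime"
```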
Re: Is Core Data appropriate to my task?
> I don't see this as being equivalent at all. Extending the example, let's say the company with these Employees has as its directors several discriminating, unfair people, and thus an Employee from any given Planet gets a salary adjustment based on that Planet. The obvious place for this data is the Planets table, or in Core Data's case, the Planet entity. A salaryAdj column (attribute) is added to the Planets table (Planet entity) and filled in with the (in)appropriate numbers. Now suddenly the company is taken over by far more benevolent and considerate people, whose only flaw is that they don't want to break a system that works by removing an entire column from a database table (a schema change is much more difficult than a data update, after all), so they just UPDATE Planets SET salaryAdj=0.

Now you're conflating other issues. This is why I recommend not treating O/R systems as perfectly equivalent to databases. They're not.

On Snow Leopard and iPhone OS, you can make modest alterations to the Core Data schema easily. Just keep a copy of the old model, and pass the two lightweight migration keys (NSMigratePersistentStoresAutomaticallyOption and NSInferMappingModelAutomaticallyOption) in the options dictionary when you add the store to the persistent store coordinator (PSC). Core Data will infer the appropriate schema changes and adjust the schema in place (ALTER TABLE style).

http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/CoreDataVersioning/Articles/vmLightweight.html#//apple_ref/doc/uid/TP40008426-SW1

If you do make some unusual and radical modifications (split tables into multiple new tables, compose old tables into a single new table, etc.), you can use the full mapping model migration. While it won't perform as well as lightweight migration, at least you'll have tools support in handling the schema migration.

> So someone loads up an Employee whose Planet instances are in the same store with that Employee, and the old salary adjustment is still sitting there in the saved data. I sense unhappy Employees in this company's future.
> If only the coder who wrote the payroll system had put the Planet data in some global store where changes to it would propagate correctly to all Employees.

If this is important, then you can use multiple persistent stores. I suspect Erik's point, though, is that many apps don't have a significant issue with a small amount of duplication in the individual documents. Disk space is usually cheap. And being completely self-contained has its advantages (perhaps not relevant to you, but still existent). Global mutation is a double-edged sword. What if your documents are loaded in a newer version of the app, but have some implicit data dependency on the older global data? That can get messy.

> Does Core Data still solve the problem? Is there some reason that using Core Data for everything would be better than storing the global rarely-updated data in a real database and using Core Data only for the Employee entity, which is the only part which really talks to the UI anyway? (Something tells me the key point is right there...) For that matter, if Core Data is only managing one entity, what's the use of Core Data at all? With all the data being referential between the database and the entity, just define a simple NSObject subclass which contains a few instance variables and implements NSCoding.

Then why not use Core Data for the database, and for the entity implement a simple NSObject subclass with a few instance variables...?

Although, it seems a little silly not to use Core Data for the simple part when you'll get persistence, change tracking and Cocoa Bindings integration for free. Most people find NOT maintaining backward-compatible initWithCoder: methods in perpetuity quite refreshing. I know one developer seriously considering rewriting their iPhone app for no other reason than to use Core Data's lightweight migration and never hand-roll another database schema upgrade again.
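To see what "maintaining backward-compatible initWithCoder: methods in perpetuity" means in practice, here is a hypothetical plain-Python stand-in for a hand-rolled decoder after three format changes. The archive layout, version numbers and `PLANET_IDS` table are all invented for illustration. This is the kind of code that schema migration is meant to spare you; it is not how Core Data migration works internally.

```python
from typing import Any, Dict

# Hypothetical archive formats; a stand-in for a hand-written -initWithCoder:.

PLANET_IDS = {"Earth": 3, "Mars": 4}  # hypothetical table used only for upgrades


def decode_employee(archive: Dict[str, Any]) -> Dict[str, Any]:
    """Every format that has ever shipped needs a branch here, forever."""
    version = archive.get("version", 1)
    if version == 1:
        # v1: planet stored by name, no salary adjustment yet.
        return {"name": archive["name"],
                "planet_id": PLANET_IDS[archive["planet"]],
                "salary_adj": 0.0}
    if version == 2:
        # v2: added salary_adj, planet still stored by name.
        return {"name": archive["name"],
                "planet_id": PLANET_IDS[archive["planet"]],
                "salary_adj": archive["salary_adj"]}
    if version == 3:
        # v3 (current): planet stored by numeric id.
        return {"name": archive["name"],
                "planet_id": archive["planet_id"],
                "salary_adj": archive["salary_adj"]}
    raise ValueError(f"archive version {version} is newer than this app understands")
```

Each new format adds another branch, and none of the old ones can ever be deleted while old documents might still be opened.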
Here's an excerpt from a post regarding when to use Core Data on the iPhone:

> I suppose I could tell you how great an addition to Cocoa it is, or how much TLC its performance tuning gets. But what I've seen our most sophisticated clients decide is that it saves them from writing a lot of code. The model code with Core Data is usually 50% to 70% smaller as measured by lines of code. Why reinvent that?
>
> App developers don't get paid to write database code. Can you learn SQL? Sure. Do your customers care? No. App developers get paid for novel functionality that addresses a real customer need with good UI.
>
> - Ben

Here's a more traditional reply:

- Full KVC, KVO support out of the box
- Relationship maintenance (inverses, delete propagation)
- Change tracking
- Sophisticated SQL compilation
- NSPredicate objects instead of SQL
- NSPredicate support for correlated subqueries, basic functions, and other advanced SQL
- Proper Unicode, locale-aware searching, sorting, regex
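The "Relationship maintenance (inverses, delete propagation)" item is easiest to appreciate by writing it by hand. Below is a hypothetical plain-Python sketch of keeping a to-one/to-many inverse pair consistent, plus a hand-written nullify-style delete rule. Core Data does the equivalent bookkeeping for modeled relationships; this is only an illustration of the work involved, not its implementation.

```python
from typing import Optional, Set

# Hypothetical classes; hand-rolled inverse maintenance, not Core Data.


class Planet:
    def __init__(self, name: str) -> None:
        self.name = name
        self.employees: Set["Employee"] = set()  # to-many inverse


class Employee:
    def __init__(self, name: str) -> None:
        self.name = name
        self._planet: Optional[Planet] = None  # to-one side

    @property
    def planet(self) -> Optional[Planet]:
        return self._planet

    @planet.setter
    def planet(self, new: Optional[Planet]) -> None:
        # Keep both sides in step by hand: leave the old inverse, join the new one.
        if self._planet is not None:
            self._planet.employees.discard(self)
        self._planet = new
        if new is not None:
            new.employees.add(self)


def delete_planet(planet: Planet) -> None:
    # Hand-written "nullify" delete rule: clear every employee's reference.
    for employee in list(planet.employees):
        employee.planet = None


earth, mars = Planet("Earth"), Planet("Mars")
ann = Employee("Ann")
ann.planet = earth
ann.planet = mars               # moves Ann: earth loses her, mars gains her
on_mars_before_delete = ann in mars.employees
delete_planet(mars)             # nullify: Ann's planet becomes None
```

Forgetting either half of the setter, or the delete rule, leaves the graph silently inconsistent, which is the class of bug the modeling tool's inverse-relationship warnings are aimed at.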
Re: Is Core Data appropriate to my task?
> Before anything else, let me say thank you for a clear, concise, and very helpful set of answers to my questions; I was expecting rather more of a struggle for understanding :).

My pleasure.

> On Sep 10, 2009, at 5:04 PM, Ben Trumbull wrote:
>
>>> The inverse relationship from Planet to Employee, "all employees from this planet", is technically feasible, even easy, to model, but it's almost certainly a waste of time and effort. But the Core Data documentation offers a long list of very dire warnings about one-way relationships between entities.
>>
>> Yes, and for most situations those warnings are there for very good reasons. But if there were no reasons for such relationships, then it wouldn't be a warning; it simply wouldn't exist.
>
> The manual isn't at all clear about this, but if I understand correctly, you're basically saying, "Though it is almost always technically possible to model an inverse to any relationship, there are sometimes circumstances in which it is correct not to do so." Is that accurate, and if so, should I file a documentation bug requesting clarification on that and the circumstances in which it's true?

Sure. It's a challenge to document some of this material in a way that steers most developers down the typically optimal path while still keeping advanced options open. We have many developers with little or no database experience, and we want to encourage them to use inverses until they have a compelling reason not to. The documentation was more open about no-inverse relationships in 10.4, and we learned the hard way that was less than ideal. The modeling tool now issues warnings for this due to the frequency and severity of bugs from developers incorrectly and over-eagerly using no-inverse relationships.

>>> But oops, Core Data can't model cross-store relationships, so you use a fetched property, which is one-way.
>> You could use a fetched property, or handle this in code by storing a URI for the destination object in a different store, and fetching the matching objectID either lazily or in -awakeFromFetch. We've generally recommended using a custom accessor method for this instead of fetched properties.
>
> Is there any particular reason for that recommendation? The documentation explicitly recommends fetched properties for cross-store relationships (one instance of several is in the Core Data Programming Guide, "Relationships and Fetched Properties" chapter, "Fetched Properties" section, first paragraph, where it says "In general, fetched properties are best suited to modeling cross-store relationships...").

First, custom accessor methods and -awakeFromFetch offer a vast amount of flexibility, and can be easier to tune for performance. Fetched properties are a fine alternative. But I like to also reinforce the understanding that not all your custom behavior needs to be encapsulated in your Core Data schema. You have full Objective-C objects and very powerful runtime support. Use it liberally.

>>> (Let me take this opportunity to say that for all the warnings that Core Data is not and never has been a database, almost every concept I see in it makes me think "O/R mapper for SQLite".)
>>
>> Core Data is an O/R mapping framework, among other things. But O/R frameworks are not SQL databases. Modeling your data in any O/R framework as if you were writing SQL directly is inefficient and mistaken. Saying that Core Data is a database is like saying your compiler is an assembler. Well, the compiler suite uses an assembler, sure, and they both output object code in the end, but that does not mean the best way to use your compiler is to write in assembly.
> Nonetheless, Core Data does manage the data stored on disk as well as the representation of that data in memory; I don't see a tremendous difference between that and what SQLite does, other than Core Data providing a much more effective organization of and means of access to that data.

Core Data implements a lot of functionality on top of SQLite. From an API perspective, that it uses SQLite at all is an implementation detail. In any event, O/R systems present an OO view of your data, and have their own idioms closer to OOP. They provide an abstraction layer and perform transformations on both your queries and result sets. Relational databases can support that, but in every O/R system, the ideal way of using the system is somewhat different from how one would write SQL directly against the database.

- Ben
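The point about O/R idioms differing from hand-written SQL can be illustrated with a hypothetical Python sketch: the same question asked SQL-style through the standard `sqlite3` module, and O/R-style by walking an in-memory object graph. The object half is plain Python, not Core Data, but it shows the shape of code an O/R layer encourages. The schema and data are made up.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical schema and data; the object half is plain Python, not Core Data.

# SQL idiom: ask the database for the answer with an explicit join.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT, salary_adj REAL);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           planet_id INTEGER REFERENCES planet(id));
    INSERT INTO planet VALUES (1, 'Earth', 0.0), (2, 'Mars', 0.25);
    INSERT INTO employee VALUES (1, 'Ann', 2);
""")
(sql_adj,) = db.execute(
    "SELECT p.salary_adj FROM employee e JOIN planet p ON p.id = e.planet_id"
    " WHERE e.name = ?", ("Ann",)).fetchone()
db.close()


# Object idiom: get the object once, then follow the relationship.
@dataclass
class Planet:
    name: str
    salary_adj: float


@dataclass
class Employee:
    name: str
    planet: Planet


ann = Employee("Ann", Planet("Mars", 0.25))
object_adj = ann.planet.salary_adj
```

Both produce the same number; the difference is whether the join is something you write or something the relationship gives you.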