[DataMapper] Some thoughts on DataMapper's Property API

Anthony Williams Mon, 01 Feb 2010 09:20:12 -0800

Hi all,

I've been thinking a little about defining properties in DataMapper,
particularly relating to how they work within repository blocks. I've
posted some ideas to my site: 
http://antw.me/thoughts/datamapper-property-api.html


If you prefer to read them here instead (sans links and syntax
highlighting), I've included the full message below.

I'd love to get some feedback; let me know what you think...

Anthony.

TRANSCRIPT:

In my own dm-core fork on Github I've recently been experimenting with
ways to trim down both Resource and Model, extracting specific
functionality out to separate classes and modules. I've started by
relieving Resource of the need to take care of attributes, creating an
AttributeSet class which holds all of a Resource's attributes, tracks
when they've been updated (marking them as dirty), and lazy-loads
attributes when needed.
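
To make the three responsibilities concrete, here is a minimal sketch in plain Ruby with no dm-core dependency. It is not the class from the fork; the method names (`get`, `set`, `dirty?`, `clean!`) and the loader block are assumptions made purely for illustration.

```ruby
# Hypothetical sketch; not the AttributeSet from the dm-core fork.
class AttributeSet
  # names:  every attribute name the owning resource knows about
  # loader: a block that fetches the value of a not-yet-loaded attribute
  def initialize(names, &loader)
    @names  = names
    @loader = loader
    @values = {}   # attribute name => loaded or assigned value
    @dirty  = {}   # attribute name => true when changed since the last save
  end

  # Reads an attribute, asking the loader for it on first access only.
  def get(name)
    check!(name)
    @values[name] = @loader.call(name) unless @values.key?(name)
    @values[name]
  end

  # Writes an attribute; marks it dirty if it was unloaded or has changed.
  def set(name, value)
    check!(name)
    @dirty[name] = true if !@values.key?(name) || @values[name] != value
    @values[name] = value
  end

  # With a name: is that attribute dirty? Without: is anything dirty?
  def dirty?(name = nil)
    name ? @dirty.key?(name) : !@dirty.empty?
  end

  def dirty_attributes
    @values.select { |name, _| @dirty.key?(name) }
  end

  # Intended to be called after a successful save.
  def clean!
    @dirty.clear
  end

  private

  def check!(name)
    raise ArgumentError, "unknown attribute: #{name}" unless @names.include?(name)
  end
end

stored = { :id => 1, :name => 'Michael Scarn' }
attributes = AttributeSet.new([:id, :name]) { |name| stored[name] }

attributes.get(:name)                      # first access calls the loader
attributes.set(:name, 'Samuel L. Chang')   # marks :name as dirty
attributes.clean!                          # e.g. after the resource is saved
```

Note that nothing in this sketch knows about repositories: the list of names is fixed when the set is built.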

Although not yet pushed to Github, my latest commits pass the full
dm-core spec suite against SQLite3 and PostgreSQL, but have 3-4 failures
with the InMemory and Yaml adapters. I eventually tracked this down to
an example where a Property is defined on a Model within the context
of a specific repository. For example:

    require 'dm-core'

    DataMapper.setup(:default, 'in_memory://localhost/one')
    DataMapper.setup(:second,  'in_memory://localhost/two')

    class Person
      include DataMapper::Resource

      property :id,   Serial
      property :name, String

      repository(:second) do
        property :external_id, Integer
      end
    end

This creates a Person model with two attributes: `id` and `name`, and
a third `external_id` attribute which applies only when using the
model within the `:second` repository context:

    DataMapper.repository(:second) do
      Person.create(:name => 'Michael Scarn', :external_id => 1)
    end

I then realised my AttributeSet implementation didn't account for the
properties of a Model changing depending on the current repository
context. It then led to me thinking a little more about the purpose--
and usefulness--of being able to define models in this way.

I'd like to--perhaps a little presumptuously--suggest that this
functionality isn't as nice as it first seems, provide an alternative
means for achieving the same result, and elaborate on how I think such
repository blocks should work.

### Inconsistent instance API

Allowing a user to wrap properties in a repository block results in a
model changing its behaviour depending on external state (the current
repository context). At one moment the resource has an `external_id`
attribute, and in the next the attribute seems to disappear.

    DataMapper.repository(:second) do
      Person.new(:external_id => 1)
    end
    # => #<Person @id=nil @name=nil>

Wait... where did the `external_id` attribute go? In fact the
attribute was set, it just doesn't appear since `Person#inspect` was
called outside of the repository block...

    DataMapper.repository(:second) do
      puts Person.new(:external_id => 1).inspect
    end
    # => #<Person @id=nil @name=nil @external_id=1>

Aha! There it is. Trying to set the `external_id` attribute outside of
the `:second` repository context will also (rightly) fail.
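
The effect can be reproduced with a toy model in plain Ruby, without dm-core. This is not how DataMapper implements repository contexts; `RepositoryContext` and `ToyPerson` are hypothetical names. The only point is that the visible property list is computed from whichever repository is current at call time.

```ruby
# Toy model only: plain Ruby, no dm-core. All names here are hypothetical.
module RepositoryContext
  @stack = [:default]

  class << self
    # The repository that is "current" right now.
    def current
      @stack.last
    end

    # Runs the block with +name+ as the current repository.
    def with(name)
      @stack.push(name)
      yield
    ensure
      @stack.pop
    end
  end
end

class ToyPerson
  # Properties declared per repository; the :default ones apply everywhere.
  PROPERTIES = {
    :default => [:id, :name],
    :second  => [:external_id]
  }

  # The property list depends on external state: the current context.
  def self.properties
    current = RepositoryContext.current
    extra   = current == :default ? [] : PROPERTIES.fetch(current, [])
    PROPERTIES[:default] + extra
  end

  def initialize(attributes = {})
    @attributes = {}
    attributes.each do |name, value|
      unless self.class.properties.include?(name)
        raise ArgumentError, "no property #{name} in #{RepositoryContext.current}"
      end
      @attributes[name] = value
    end
  end

  # Only shows the properties visible in the *current* context.
  def inspect
    shown = self.class.properties.map { |name| "@#{name}=#{@attributes[name].inspect}" }
    "#<ToyPerson #{shown.join(' ')}>"
  end
end

person = RepositoryContext.with(:second) { ToyPerson.new(:external_id => 1) }

puts person.inspect
# => #<ToyPerson @id=nil @name=nil>

puts RepositoryContext.with(:second) { person.inspect }
# => #<ToyPerson @id=nil @name=nil @external_id=1>
```

The same object answers differently depending on where the call happens to be made.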

### Ambiguity as to where a resource is saved

DataMapper's repository context allows you to save any Resource to any
defined repository (provided they support the same features).

    person = Person.new(:name => 'Samuel L. Chang')

    # Now that I have my resource, I can save it to wherever I
    # want... By default, the resource will be saved in the
    # :default repository
    person.save

    # Alternatively, I can specify a different repository...
    DataMapper.repository(:second) do
      person.save
    end

While this is an interesting feature, I'm struggling to come up with a
reason why you'd _want_ to do this. To me it just introduces
ambiguity:

> Erm, where did I save that person instance? I'm sure it's around here
> somewhere... Where are you little person instance? Peekaboo!
>
> <cite>Me, a year later.</cite>

In reality, so long as you're explicit about wrapping parts of your
application in the correct repository blocks, this is not a problem.
But wherever you have to be explicit there is the possibility that
someone will forget; forgetting _just once_ might be enough to cause
obscure bugs.

If the `external_id` attribute was set to disallow nil, the second
call to `person.save` in the above example would fail, since no value
was set. (In fact, the above example would fail anyway, since the
first call to `person.save` would mark the resource as clean, thus the
second call would do nothing.)


## A better way?

I'm of the belief that each model should be associated with one--and
only one--repository. This would be the `:default` repository, except
where a user explicitly declares otherwise when setting up their
model. A `Person` would be associated with the default repository
_always_, regardless of the current repository context. In the example
below, the person would be persisted to the default repository even
though it's wrapped in another repo.

    person = Person.new(:name => 'Michael Scarn')

    DataMapper.repository(:second) do
      person.save
    end

DataMapper could provide a method for changing the default repository:

    class Person
      include DataMapper::Resource

      # Tells DM that the Person model should be persisted
      # to the :second repository.
      set_repository :second

      property :id,   Serial
      property :name, String
    end

By doing this, users would never need to worry about repository
context outside of their models, making their domain objects much more
straightforward.

As far as I'm concerned, `Person` and `repo(:second) { Person }` are
two different models, with different interfaces, different properties,
and are stored in different repositories. The second Person should
probably be represented as another model, distinct from the first.

Since DataMapper doesn't conflate class inheritance with Single Table
Inheritance, we could use inheritance to achieve the same effect as
the current API:

    class Person
      include DataMapper::Resource

      property :id,   Serial
      property :name, String
    end

    # Inherits properties from Person, but adds its own
    # custom properties, and persists to another repo.
    class HRPerson < Person
      set_repository :second
      property :external_id, Integer
    end
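
A rough sketch of how such a class-level declaration could behave under inheritance, in plain Ruby. `set_repository` is the method proposed in this post and does not exist in dm-core; the `RepositoryBound` module and the `repository_name` and `properties` readers are likewise hypothetical.

```ruby
# Hypothetical sketch of the proposed API; dm-core has no set_repository.
module RepositoryBound
  def self.included(model)
    model.extend(ClassMethods)
  end

  module ClassMethods
    # Binds the model to a single repository.
    def set_repository(name)
      @repository_name = name
    end

    # The model's own setting, else its parent's, else :default.
    def repository_name
      return @repository_name if @repository_name
      superclass.respond_to?(:repository_name) ? superclass.repository_name : :default
    end

    # Declares a property on this model.
    def property(name, type)
      own_properties[name] = type
    end

    # Parent properties first, then the ones declared on this model.
    def properties
      inherited = superclass.respond_to?(:properties) ? superclass.properties : {}
      inherited.merge(own_properties)
    end

    private

    def own_properties
      @own_properties ||= {}
    end
  end
end

class Person
  include RepositoryBound

  property :id,   Integer
  property :name, String
end

class HRPerson < Person
  set_repository :second
  property :external_id, Integer
end

Person.repository_name     # :default
HRPerson.repository_name   # :second
HRPerson.properties.keys   # [:id, :name, :external_id]
```

Each class has a fixed property list and a fixed repository, so neither depends on the context the caller happens to be in.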

### Problems with this approach...

`Model#copy` would break. Well... it wouldn't just break. The entire
concept of copying resources across repositories would become
redundant.

## An alternative meaning for repository blocks

By doing away with the current meaning of repository blocks within
model instances, we free up the API to do something I think is much
more interesting: models which persist _across_ multiple repositories.

Let's take a (slightly contrived) example...

    DataMapper.setup(:default,         'yaml://localhost/main')
    DataMapper.setup(:human_resources, 'yaml://localhost/hr')

    class Employee
      include DataMapper::Resource

      property :id,       Serial
      property :name,     String
      property :username, String
      property :password, String

      repository(:human_resources) do
        property :salary, Integer
        property :pay_on, Date
      end
    end

Our employee model has six properties: `name`, `username`, and
`password` will be persisted to the default repository, while `salary`
and `pay_on` will be persisted to the human resources repository.
`id`, since it is a key, is used in _both_.


Let's create an employee...

    Employee.create(
      :name     => 'Michael Scarn',
      :username => 'mscarn',
      :password => '12345',
      :salary   => 2000,
      :pay_on   => Date.today
    )

Here's what would happen "under the hood":

1. We assume that the key is generated by the model's default
repository. In the absence of a `set_repository` statement, DataMapper
assumes `:default`.

2. DataMapper then saves the resource to the default repository. In
this example it persists the name, username, and password, and returns
the ID which was generated.

3. It then proceeds to persist the salary and pay_on attributes to the
human resources repository with the ID returned by the default repo.
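
The three steps above can be sketched in plain Ruby, with in-memory hashes standing in for real repositories. `ToyRepository` and `SplitSaver` are hypothetical names, and nothing here reflects existing DataMapper internals.

```ruby
# Hypothetical sketch: in-memory hashes stand in for real repositories.
class ToyRepository
  attr_reader :records

  def initialize
    @records = {}   # id => attribute hash
    @next_id = 0
  end

  # Only the model's default repository is asked to hand out keys.
  def generate_key
    @next_id += 1
  end

  def write(id, attributes)
    @records[id] = attributes.merge(:id => id)
  end
end

class SplitSaver
  # repositories: repository name => ToyRepository
  # default:      name of the repository that generates the key
  # layout:       repository name => property names stored there
  def initialize(repositories, default, layout)
    @repositories = repositories
    @default      = default
    @layout       = layout
  end

  def create(attributes)
    # Steps 1 and 2: save to the default repository and get the ID back.
    id = @repositories[@default].generate_key
    @repositories[@default].write(id, pick(attributes, @layout[@default]))

    # Step 3: save the remaining groups elsewhere under the same ID.
    @layout.each do |name, property_names|
      next if name == @default
      @repositories[name].write(id, pick(attributes, property_names))
    end

    id
  end

  private

  def pick(attributes, names)
    attributes.select { |name, _| names.include?(name) }
  end
end

repositories = {
  :default         => ToyRepository.new,
  :human_resources => ToyRepository.new
}
layout = {
  :default         => [:name, :username, :password],
  :human_resources => [:salary, :pay_on]
}

saver = SplitSaver.new(repositories, :default, layout)
id = saver.create({ :name => 'Michael Scarn', :username => 'mscarn',
                    :password => '12345', :salary => 2000,
                    :pay_on => '2010-02-01' })
```

This sketch makes no attempt to handle a failure in step 3, which would leave a record in the default repository with no counterpart elsewhere. A real implementation would need to decide what to do in that case.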

Our storage ends up looking a little like this:

    # default/employees.yaml
    - id: 95143
      name: "Michael Scarn"
      username: "mscarn"
      password: "12345"

    # hr/employees.yaml
    - id: 95143
      salary: 2000
      pay_on: 2010-02-01

### Lazy loading from multiple repositories

Loading a resource without specifying which fields you want to load
would work in a way similar to lazy loading.

    employee = Employee.get(95143)

This loads the Employee with `id`, `name`, `username`, and `password`
from the default repository. Calling `employee.salary` would load all of
the attributes which belong to the human resources repository.

    employee = DataMapper.repository(:human_resources) do
      Employee.get(95143)
    end

    # ... or ...
    employee = Employee.get(95143, :repository => :human_resources)

This loads the Employee with `id`, `salary`, and `pay_on` from the human
resources repository. Calling `employee.name` would load all of the
attributes which belong to the default repository.
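
A minimal sketch of this per-repository lazy loading, in plain Ruby, with hashes standing in for the two repositories from the employee example. `LazyEmployee`, its `get` signature, and `loaded_from` are hypothetical and exist only to show which repository is read when.

```ruby
# Hypothetical sketch: plain hashes stand in for the two repositories.
class LazyEmployee
  # Which repository owns which non-key properties.
  PROPERTIES = {
    :default         => [:name, :username, :password],
    :human_resources => [:salary, :pay_on]
  }

  attr_reader :id, :loaded_from

  # stores: repository name => { id => attribute hash }
  # Loads only the property group owned by +repository+.
  def self.get(stores, id, repository = :default)
    return nil unless stores[repository].key?(id)
    new(stores, id).tap { |employee| employee.send(:load_group, repository) }
  end

  def initialize(stores, id)
    @stores      = stores
    @id          = id
    @attributes  = {}
    @loaded_from = []   # repositories already read, in order
  end

  # One reader per property; each loads its whole group on first use.
  PROPERTIES.each do |repository, names|
    names.each do |name|
      define_method(name) do
        load_group(repository) unless @loaded_from.include?(repository)
        @attributes[name]
      end
    end
  end

  private

  def load_group(repository)
    row = @stores[repository][@id] || {}
    PROPERTIES[repository].each { |name| @attributes[name] = row[name] }
    @loaded_from << repository
  end
end

stores = {
  :default         => { 95143 => { :name => 'Michael Scarn', :username => 'mscarn',
                                   :password => '12345' } },
  :human_resources => { 95143 => { :salary => 2000, :pay_on => '2010-02-01' } }
}

employee = LazyEmployee.get(stores, 95143)
employee.name     # already loaded from :default
employee.salary   # first access reads the whole :human_resources group
```

Loading a whole group at once, rather than one attribute at a time, keeps this to at most one read per repository per resource.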

### Finishing up

I think this behaviour has a lot of potential: In many web
applications developers have made the compromise of denormalising data
in order to improve performance. DataMapper could instead provide an
API to store these denormalised "cache" attributes in a fast key/value
store.

    class Journey
      include DataMapper::Resource

      property :id,       Serial
      property :start_at, String
      property :end_at,   String

      repository(:redis) do
        property :really_expensive_computation, String
      end
    end
