Having briefly looked at the code, here are my thoughts on how to decouple
this stuff.

Here is my personal wishlist from the next ORM, whatever it may be. I tend
to view things very bottom up, so these are all implementation oriented, not
so much sugar oriented.

Many of these things are already being designed/implemented by
mst/ribasushi/fr3w etc, so swing over to #dbix-class and discuss with them.

I'm sure Fey::ORM has several of these points covered, though I haven't
actually used it yet.

It looks like your sketch has not yet addressed most of these issues, apart
from the first few, so hopefully this will at least be helpful to think
about.





1. the task of mapping between n-tuples and objects is useful in and of
itself.

This would allow easily plugging this as an inflator type for DBIC, building
a new ORM, CSV -> Moose mappings, etc etc.

It would be cool if this was a separate metaclass trait that the ORM relied
on, that is available independently.

If the metaclass provided an API like:

    my $f = $metaclass->tuple_factory( %column_spec );

    $f->new_from_list(@columns);
    $f->new_from_hash(\%hash);

    # and maybe even (running fetchrow_arrayref or whatever, with
    # support for fetchall_arrayref with max rows for performance,
    # proper use of bindings, etc.):
    $f->new_from_sth($sth);

In this case %column_spec is *not* the definition of the column schema
(that's found in attributes and traits); it holds the parameters that define
how the new_from_* methods will be invoked (i.e. the order of the columns,
or something).

Similarly, if that could map data into tuples, that would also be cool.
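To make the factory idea concrete, here's a tiny pure-Perl sketch (no Moose,
and all names are made up, not an actual API proposal) of what
new_from_list/new_from_hash could boil down to:

```perl
use strict;
use warnings;

# Hypothetical tuple factory: %spec carries invocation parameters
# (here just the column order), not the column schema itself.
sub tuple_factory {
    my (%spec) = @_;
    my @order = @{ $spec{columns} };

    return {
        # positional row -> hash, using the declared column order
        new_from_list => sub {
            my %tuple;
            @tuple{@order} = @_;
            return \%tuple;
        },
        # named row -> hash, keeping only the declared columns
        new_from_hash => sub {
            my ($hash) = @_;
            return { map { $_ => $hash->{$_} } @order };
        },
    };
}

my $f = tuple_factory( columns => [qw(id name age)] );

my $t = $f->{new_from_list}->( 1, "alice", 30 );
my $h = $f->{new_from_hash}->( { id => 2, name => "bob", age => 40, extra => "dropped" } );
```

A real version would hand back blessed objects via the metaclass rather than
plain hashes, but the division of labor is the same.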




2. The next logical step is handling relationships using this subsystem.

There are two distinct ways in which this should be possible. The first is
being able to group columns from the column builder and delegate them to
attributes as groups. This is useful for selects with joins that return all
the data at once.
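A toy illustration of that grouping step (the row, group names and helper
are all invented for the example):

```perl
use strict;
use warnings;

# Given one joined row and a grouping spec, slice the columns into
# per-relation hashes that could then be delegated to attributes.
sub group_columns {
    my ( $row, %groups ) = @_;    # %groups maps group name => [column names]
    my %out;
    for my $group ( keys %groups ) {
        $out{$group} = { map { $_ => $row->{$_} } @{ $groups{$group} } };
    }
    return \%out;
}

# One row as it might come back from "SELECT ... FROM author JOIN book ..."
my $row = {
    author_id   => 1,
    author_name => "Joe",
    book_id     => 7,
    book_title  => "ORMs",
};

my $grouped = group_columns(
    $row,
    author => [qw(author_id author_name)],
    book   => [qw(book_id book_title)],
);
```

Each group would then be fed to the tuple factory of the class it belongs
to, instead of one flat inflation pass.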




The second is a meta trait that invokes another fetch from the database;
that one is more trivial, being a type of filter on the meta attr.

3. a mapping between standard Moose type constraints and standard DBI column
types would be handy for all the projects.

This would be helpful as a MooseX::Types::DBI, completely separate from any ORM.
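The core of such a module is just a table; something like this (the SQL type
names here are the standard ones, the values are stock Moose type constraint
names, and the fallback choice is of course debatable):

```perl
use strict;
use warnings;

# Sketch of the mapping a MooseX::Types::DBI might start from.
my %sql_to_moose = (
    INTEGER  => 'Int',
    SMALLINT => 'Int',
    NUMERIC  => 'Num',
    FLOAT    => 'Num',
    VARCHAR  => 'Str',
    CHAR     => 'Str',
    BOOLEAN  => 'Bool',
);

# Unknown types fall back to 'Str', the least restrictive choice.
sub moose_type_for { $sql_to_moose{ uc $_[0] } || 'Str' }
```

The real thing would presumably key off DBI's SQL_* type constants and
export actual type constraint objects, but the shape is the same.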




4. a mapping between table space and class space

I.e. decouple the notion of the structural mapping of the tables from that
of the schema.

This would allow easier mixing of custom inflation types, while still making
relational querying possible, because the two separate concepts are detached
from one another.

What I mean by this is that metadata on related tables, and how they are
related (via FK information embedded in attributes), is not necessarily what
determines the way the data is constructed; it'd only be a handy
introspection API for getting the information necessary to create join
conditions.

Bonus points if you can do adhoc mappings easily, for instance defining
classes that are built from virtual views (just a select statement over
several tables, not a real database view).




5. DBIx::Class::ResultSet style combinatorial stuff

...but without SQL::Abstract as a requirement.


i.e. this would be the notion of a result set with abstract conditions that
know how to combine using boolean logic: you'd have a sub-object for a
WHERE clause, and it would know to do some sort of boolean AND when
combined with another WHERE clause, creating a compound one.

This would allow still using SQL::Abstract as a DWIM search facility, but
also allow doing things like specifying dummy objects, Fey::SQL, etc.
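Just to pin down what I mean by combinable condition objects, here's a
stripped-down pure-Perl sketch (the class name and methods are invented,
and a real one would carry an AST rather than SQL strings):

```perl
use strict;
use warnings;

package Where::Clause;    # hypothetical, for illustration only

# A condition is just its SQL fragment plus its bind values.
sub new {
    my ( $class, $sql, @bind ) = @_;
    return bless { sql => $sql, bind => \@bind }, $class;
}

# Combining two conditions yields a new compound condition;
# neither original is mutated, so both remain reusable.
sub and_with {
    my ( $self, $other ) = @_;
    return Where::Clause->new(
        "($self->{sql}) AND ($other->{sql})",
        @{ $self->{bind} }, @{ $other->{bind} },
    );
}

sub or_with {
    my ( $self, $other ) = @_;
    return Where::Clause->new(
        "($self->{sql}) OR ($other->{sql})",
        @{ $self->{bind} }, @{ $other->{bind} },
    );
}

package main;

my $age    = Where::Clause->new( "age > ?",    20 );
my $gender = Where::Clause->new( "gender = ?", "male" );
my $both   = $age->and_with($gender);
```

The important property is that conditions compose without knowing anything
about iteration or inflation, so SQL::Abstract, Fey::SQL or dummy objects
could all produce them.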

I'd also like to be able to construct these not just from other ones, but
manually. In DBIC, one of the major usage pains is that many of the
representational objects are also the only way to construct more objects,
so you're forced to do things like:

    $dir->search( {}, { result_class => ... } );

in order to choose a non-default inflator. But really, I think the solution
here would simply be to allow creating resultset-style objects without the
notion of iteration/result gathering built in.

This means that a DBIC style high level resultset would actually be an
object with several delegates: the combinable query representation, an SQL
compiler that knows how to create sths, a result inflator, and an iterator.
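In sketch form (with an in-memory iterator standing in for a real $sth, and
every name made up), that decomposition is just:

```perl
use strict;
use warnings;

# A "resultset" that is nothing but a bundle of delegates.  Each piece
# is swappable on its own, so choosing a different inflator does not
# have to go through search() options.
sub make_resultset {
    my (%delegates) = @_;
    return {
        query   => $delegates{query},      # combinable query representation
        compile => $delegates{compile},    # query -> SQL + bind values
        inflate => $delegates{inflate},    # tuple -> object (or hash, or ...)
        iterate => $delegates{iterate},    # pulls tuples from somewhere
    };
}

# Toy instance: two rows in memory instead of a statement handle.
my @rows = ( [ 1, "alice" ], [ 2, "bob" ] );

my $rs = make_resultset(
    query   => { from => 'users' },
    compile => sub { ( "SELECT id, name FROM users", () ) },
    inflate => sub { my ($row) = @_; { id => $row->[0], name => $row->[1] } },
    iterate => sub { shift @rows },
);

# Driving the delegates by hand; a real resultset would wrap this loop.
my @objects;
while ( my $row = $rs->{iterate}->() ) {
    push @objects, $rs->{inflate}->($row);
}
```

Swapping in a HashInflator-style inflator, or an iterator that pages
through fetchall_arrayref, then touches exactly one delegate.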

DBIx::Class certainly allows some very flexible constructs to be used, that
are surprising for an ORM (HashInflator, ResultSetColumn), but intuitively I
believe these things should live at a lower level than the ORM, as a
facility that it can use, instead of a layer on top of it that uses hooks.

I'd like to see not only subsearches, like ->or or ->search (or in your
case ->filter) for implied AND, but also the ability to take two unrelated
resultsets and intersect/union them using SQL UNION/INTERSECT or simple
boolean logic (i.e. just ->and together the WHERE clauses of the two
objects).

This also relates to things like taking a resultset and fetching a single
object out of it. ->first, ->last, pagination, random selection and other
operations like that should be another layer of abstract operations, so that
you can take a random item out of a data page, or advance to an item
relative to ->first/->last (i.e. if you take ->first, it's handy to know
whether you can go to ->next or ->previous in order to generate widgets). I
don't know of an ORM that lets you say ->first without immediately getting
the result object, so the abstract computation that goes into the LIMIT
clause is essentially not reusable, since it's directly coupled to the
query execution.
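The kind of decoupling I mean is roughly this (all names invented): keep
the "which row am I pointing at" computation as plain data, so ->next and
->previous are just arithmetic on it, and only turn it into a LIMIT clause
at execution time.

```perl
use strict;
use warnings;

# An unexecuted position: just an offset and a window size.
sub make_position {
    my (%p) = @_;
    return { offset => $p{offset} // 0, limit => $p{limit} // 1 };
}

# ->next / ->previous as pure computations on the position.
sub next_position     { my ($pos) = @_; { %$pos, offset => $pos->{offset} + $pos->{limit} } }
sub previous_position { my ($pos) = @_; { %$pos, offset => $pos->{offset} - $pos->{limit} } }

# Only here does the abstract position become concrete SQL.
sub limit_clause { my ($pos) = @_; "LIMIT $pos->{limit} OFFSET $pos->{offset}" }

my $first  = make_position( limit => 1 );    # ->first, not yet executed
my $second = next_position($first);          # ->next relative to it
```

Since $first is never coupled to a query execution, the same position
objects can drive pagination widgets without fetching anything.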




6. Good support for SELECT DISTINCT over a primary key

This is in direct opposition to relational algebra, FWIW, but oftentimes
people really want it.

Basically, the notion that a single primary key maps to a single refaddr()
for a given query is very convenient when you have has_many relationships.

This is not easy to do in any ORM I've used yet.

This makes it possible to treat resultsets as sets of objects, not sets of
tuples, in principle, regardless of the join conditions.

I.e. if you query a table and want to intersect it with some link table,
but also get the data related to those objects from a has_many
relationship, this gets messy in relational algebra. If you do it in one
query and group by the first table's key, then data from the second
relationship is discarded; if you don't, then data from the first table is
duplicated.

It would be nice if there were an ORM with a more automated facility to do
the right thing, FSVO "right": i.e. keep track of the previously inflated
object in a lookbehind window and return it when the columns for that
relation are identical within a single query, or intelligently perform
multiple queries, without the user needing to care.
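The lookbehind trick itself is simple; here's a toy version over joined
user/purchase rows (the row layout is made up, and a real one would
compare the full set of key columns, not a single id):

```perl
use strict;
use warnings;

# Inflate joined rows while keeping one object per primary key, by
# remembering the previously inflated object in a one-row lookbehind.
sub inflate_distinct {
    my (@rows) = @_;    # each row: [ user_id, user_name, purchase ]
    my ( @users, $last );
    for my $row (@rows) {
        my ( $id, $name, $purchase ) = @$row;
        if ( $last && $last->{id} == $id ) {
            # same pk as the previous row: reuse the object and just
            # accumulate the has_many side
            push @{ $last->{purchases} }, $purchase;
        }
        else {
            $last = { id => $id, name => $name, purchases => [$purchase] };
            push @users, $last;
        }
    }
    return @users;
}

# Three tuples, but only two distinct users.
my @users = inflate_distinct(
    [ 1, "alice", "book"  ],
    [ 1, "alice", "lamp"  ],
    [ 2, "bob",   "chair" ],
);
```

This relies on the rows arriving ordered by the primary key, which the
query layer would have to guarantee (or fall back to a full identity map).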




7. Good support for querying *tuples*

The class hierarchy is useful for a direct mapping, but most of the use of
relational stuff is to do interesting queries and aggregations.

By far the biggest weakness IMHO in most ORMs is the inability to create
adhoc things. HashInflator and ResultSetColumn address this for DBIC, but
it's still very easy to get confused when you query for e.g. the sum of a
column... complex things in the WHERE clause are handled well by ORMs, but
complex things in the SELECT ... FROM bit are not really coherent without
explicitly saying what you mean.




8. LINQ style anonymous class generation

This is strongly related to the previous item.

For example, suppose you want to generate a summary about some data: you
could define a class with various columns for that data, i.e. age => Int,
gender => enum([qw(male female)]), etc.

This metaclass could then be used to construct a summary metaclass on the
fly (hopefully properly cached and easy to pregenerate to avoid runtime
costs) based on some operations. For example, querying the average, min and
max ages based on the gender would create a similar looking object, but
where 'age' was interpolated into three columns.

This gets much more useful when you have things like data ranges; e.g.
getting the average number of purchases for users aged 20-30 should
probably use some sort of number set object (i.e. Set::Infinite), etc.
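The column-spec derivation underneath this could look something like the
following (a pure-Perl toy; the helper name and naming scheme are
invented, and the result would seed an anonymous class via something like
Moose::Meta::Class rather than stay a plain list):

```perl
use strict;
use warnings;

# Derive a summary column spec from a base one: aggregating 'age' with
# avg/min/max turns one column into three, keyed alongside the group-by
# columns.
sub summary_columns {
    my (%args) = @_;
    my @out = @{ $args{group_by} };
    for my $col ( sort keys %{ $args{aggregate} } ) {
        push @out, map {"${_}_${col}"} @{ $args{aggregate}{$col} };
    }
    return \@out;
}

my $cols = summary_columns(
    group_by  => ['gender'],
    aggregate => { age => [qw(avg min max)] },
);
# e.g. [ 'gender', 'avg_age', 'min_age', 'max_age' ]
```

Caching would then hang off the (base class, operations) pair so repeated
queries reuse the generated metaclass.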

9. Good *clean* hooks and separation of concerns.

DBIx::Class is a clear winner here.

I've been able to abuse DBIC in so many ways, and so have countless others.

Without a clean cloneable schema that is decoupled from the database handle
things like KiokuDB integration, SQL::Translator stuff,
DBIx::Class::Journaled etc would be way way harder.

It's still far from perfect, but that was IMHO DBIC's biggest win over
Class::DBI, and the number one reason I personally have little hope for
Fey::ORM (i.e. keeping the database handle as class data is extremely
limiting for doing things like that).

For example, one thing that's a little tricky to do in DBIC right now is to
get something similar to KiokuDB's live object tracking (which is necessary
for handling cyclic object graphs) because the resultset is treated as a
sort of factory, creating an object for every tuple, instead of something a
little more loosely defined.

Without doing this, cyclic references lead to $obj->child->parent having a
different refaddr than $obj. That's acceptable, but oftentimes a big
limitation, one that requires creating a second layer of objects on top of
the database objects just to handle the mapping from relational to graphy.

With the KiokuDB integration I do that by simply discarding tuples that are
irrelevant at inflation time, but by this stage a lot of work has been done
for nothing.



Lastly, have a look at Data::Stream::Bulk, see if it interests you. For an
ORM it probably doesn't have much of a benefit, but it's useful for getting
results out of DBI statement handles iteratively with very low overhead.


Anyway, I hope this rant is actually useful for someone ;-)
