--- On Sun, 6/8/08, Matt S Trout <[EMAIL PROTECTED]> wrote:
> From: Matt S Trout <[EMAIL PROTECTED]>
> Subject: Re: [Dbix-class] Replication Branch, Status and Call for Comments
> To: "DBIx::Class user and developer list" <[email protected]>
> Date: Sunday, June 8, 2008, 2:04 PM
> On Sat, Jun 07, 2008 at 08:10:16AM -0700, John Napiorkowski
> wrote:
> > Hi,
> >
> > So this is a call for comments on the outstanding
> replication_redux branch, which is scheduled to be merged
> back to trunk shortly. This email is a summary of the
> features, differences from the existing version, and any
> issues or questions I am punting to the group.
>
> No Moose please.
>
> DBIx::Class uses two space indent, not four.
>
> reload_row has to go - Storage never, -ever- sees a row
> object because that
> implentation is designed to be pluggable (and there may not
> even -be- a
> row object).
>
> In any case, if I loaded something from a slave I might not
> want
> discard_changes to go to the master - you need to find some
> other way of
> specifying this. The only use case I can honestly see for
> this is when
> you've -just- inserted the row, in which case the
> find() would happen inside
> the transaction and should go to the master anyway.
I fixed the indenting issue.
I changed the reload_row method. Now this method is localized to the row
object itself. However, several issues exist for me regarding the
DBIC::PK->discard_changes method. I strongly feel the way this method is
implemented is leading to incorrect use. It seems a lot of people are taking
advantage of the fact it actually calls ->find to get whatever the storage has
for the matching primary keys in order to do stuff like reloading a row after
insert to get any field changes introduced by triggers or similar things. To
me, a method called 'discard_changes' merely means to revert a row state to
whatever it was before any changes were introduced. I wasn't around when this
method was first cooked up, so maybe there is some reason for doing it this
way, but to me the fact that the current implementation is doing a find is
merely a side effect, not the primary function. If we are going to rely on
this behavior that means we could never optimize this
function to do something like restore the fields from a cached version, etc.
So I feel if we need both behavior types there needs to be two methods. So
this is what I did:
1) In DBIC::Row I created a new method called "get_from_storage" which performs
the single task of returning a new row object that matches the current in
memory row object using ->find.
2) I set DBIC:PK->discard_changes to use 'get_from_storage' so that it
continues to work as is does now, including the questionable side effect. This
frees us to start having a discussion about the best way to clarify the meaning
and effect of these methods.
3) In order to properly support Replication, I make sure the ->find that
->get_from_storage issues is wrapped in a transaction, since Replication forces
all queries inside a transaction to go to the Master. This way for all the
current users of ->discard_changes that are expecting it to reliably return the
current storage value continue to get what they want.
I hear what you are saying about letting the user choose whether
->discard_changes uses the master or is allowed to query a replicated storage.
Certainly for performance reasons you want to limit the number of times when
select queries are forced to the Master storage. However I really believe that
you shouldn't have to write your code so differently in order to get
replication working. Also, I think it's wrong for components like Versioning,
Partitioning, etc. to be required to explicitly check if the storage is
replicated or not. Down that path insanity lies. Stuff you write for a single
database would ideally run perfectly when you need replication. That's why
we overload transactions in replication to force all queries to the master. If
you are using ->discard_changes as a way to 'validate' your Row object against
the database, then it is always true that you want the master version, since
replicated databases will always be a little behind.
Same thing goes for the new ->get_from_storage method on the Row object. So
for the short term I think the approach I've written is the way to go. Because
this common use requires it and the other uses merely introduce a performance
penalty of more queries to the Master. If, in the future we can address the
issue of how people are using ->discard_changes for the side effect maybe this
can change.
For the Moose issue, I have to say this mistake occurred due to a
miscommunication between us. For some reason I thought you gave the go ahead
for Moose in this project. One of my coworkers here at the job also had the
same feeling, so I know there must be some good reason I thought that. However
you didn't give that go ahead. The problem now is that converting this from
Moose is not trivial. Coding in Moose is a very different style, so I'd pretty
much have to toss out a lot of code and rewrite (and retest) from scratch.
Since I did this work as part of my paying job, and since we have this running
in production right now there is no chance I am going to have the time alloted
to do this. Not only is it my time, but also the time of several others on my
team that spent a lot of hours testing the current code. At this point we've
invested dozens of man/hours into writing and testing replication in order to
make it stable. So if you would prefer I
can remove replicated storage from DBIC and make it into a stand alone set of
components. Then, when we are working on the Moose-DBIC port we can
reintegrate. I'm sure when you start that port I can give some hours to
helping.
Then I can remove the Replication stuff from my replication_redux branch, and
complete the updates to the test suite so that we can run the full test suite
against alternative databases, such as mysql, postgresql, etc. As I mentioned
before, I started working on this because to properly test the replication code
I needed to test it against a real replicating database, so I had to fix up the
test suite enough to run the few tests related to replication. Since it was
mentioned to me that having the test suite fixed in this way is greatly
desirable I am happy to finish that job. I am about 75% of the way there.
That way also the work I did on this branch would be immediately useful to the
community.
Since the last email about replication status I also made quite a few changes
to how the replicated storage splits master and replicants, which seems to have
fix a serious deadlocking issue we experienced with this in production. All
tests pass and give the same output as trunk.
Sincerely,
John Napiorkowski
_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/[EMAIL PROTECTED]