Re: [Dbix-class] Replication Branch, Status and Call for Comments

John Napiorkowski Thu, 19 Jun 2008 09:10:29 -0700

--- On Sun, 6/8/08, Matt S Trout <[EMAIL PROTECTED]> wrote:

> From: Matt S Trout <[EMAIL PROTECTED]>
> Subject: Re: [Dbix-class] Replication Branch, Status and Call for Comments
> To: "DBIx::Class user and developer list" <[email protected]>
> Date: Sunday, June 8, 2008, 2:04 PM
> On Sat, Jun 07, 2008 at 08:10:16AM -0700, John Napiorkowski
> wrote:
> > Hi,
> > 
> > So this is a call for comments on the outstanding
> replication_redux branch, which is scheduled to be merged
> back to trunk shortly.  This email is a summary of the
> features, differences from the existing version, and any
> issues or questions I am punting to the group.
> 
> No Moose please.
> 
> DBIx::Class uses two space indent, not four.
> 
> reload_row has to go - Storage never, -ever- sees a row
> object because that
> implentation is designed to be pluggable (and there may not
> even -be- a
> row object).
> 
> In any case, if I loaded something from a slave I might not
> want
> discard_changes to go to the master - you need to find some
> other way of
> specifying this. The only use case I can honestly see for
> this is when
> you've -just- inserted the row, in which case the
> find() would happen inside
> the transaction and should go to the master anyway.


I fixed the indenting issue.

I changed the reload_row method.  Now this method is localized to the row 
object itself.  However, several issues exist for me regarding the 
DBIC::PK->discard_changes method.  I strongly feel the way this method is 
implemented is leading to incorrect use.  It seems a lot of people are taking 
advantage of the fact it actually calls ->find to get whatever the storage has 
for the matching primary keys in order to do stuff like reloading a row after 
insert to get any field changes introduced by triggers or similar things.  To 
me, a method called 'discard_changes' merely means to revert a row state to 
whatever it was before any changes were introduced.  I wasn't around when this 
method was first cooked up, so maybe there is some reason for doing it this 
way, but to me the fact that the current implementation is doing a find is 
merely a side effect, not the primary function.  If we are going to rely on 
this behavior that means we could never optimize this
 function to do something like restore the fields from a cached version, etc.

So I feel if we need both behavior types there needs to be two methods.  So 
this is what I did:

1) In DBIC::Row I created a new method called "get_from_storage" which performs 
the single task of returning a new row object that matches the current in 
memory row object using ->find.  

2) I set DBIC:PK->discard_changes to use 'get_from_storage' so that it 
continues to work as is does now, including the questionable side effect.  This 
frees us to start having a discussion about the best way to clarify the meaning 
and effect of these methods.

3) In order to properly support Replication, I make sure the ->find that 
->get_from_storage issues is wrapped in a transaction, since Replication forces 
all queries inside a transaction to go to the Master.  This way for all the 
current users of ->discard_changes that are expecting it to reliably return the 
current storage value continue to get what they want.

I hear what you are saying about letting the user choose whether 
->discard_changes uses the master or is allowed to query a replicated storage.  
Certainly for performance reasons you want to limit the number of times when 
select queries are forced to the Master storage.  However I really believe that 
you shouldn't have to write your code so differently in order to get 
replication working.  Also, I think it's wrong for components like Versioning, 
Partitioning, etc. to be required to explicitly check if the storage is 
replicated or not.  Down that path insanity lies.  Stuff you write for a single 
database would ideally run perfectly when you need   replication.  That's why 
we overload transactions in replication to force all queries to the master.  If 
you are using ->discard_changes as a way to 'validate' your Row object against 
the database, then it is always true that you want the master version, since 
replicated databases will always be a little behind.
  Same thing goes for the new ->get_from_storage method on the Row object.  So 
for the short term I think the approach I've written is the way to go.  Because 
this common use requires it and the other uses merely introduce a performance 
penalty of more queries to the Master.  If, in the future we can address the 
issue of how people are using ->discard_changes for the side effect maybe this 
can change.

For the Moose issue, I have to say this mistake occurred due to a 
miscommunication between us.  For some reason I thought you gave the go ahead 
for Moose in this project.  One of my coworkers here at the job also had the 
same feeling, so I know there must be some good reason I thought that.  However 
you didn't give that go ahead.  The problem now is that converting this from 
Moose is not trivial.  Coding in Moose is a very different style, so I'd pretty 
much have to toss out a lot of code and rewrite (and retest) from scratch.  
Since I did this work as part of my paying job, and since we have this running 
in production right now there is no chance I am going to have the time alloted 
to do this.  Not only is it my time, but also the time of several others on my 
team that spent a lot of hours testing the current code.  At this point we've 
invested dozens of man/hours into writing and testing replication in order to 
make it stable.  So if you would prefer I
 can remove replicated storage from DBIC and make it into a stand alone set of 
components.  Then, when we are working on the Moose-DBIC port we can 
reintegrate.  I'm sure when you start that port I can give some hours to 
helping.
  
Then I can remove the Replication stuff from my replication_redux branch, and 
complete the updates to the test suite so that we can run the full test suite 
against alternative databases, such as mysql, postgresql, etc.  As I mentioned 
before, I started working on this because to properly test the replication code 
I needed to test it against a real replicating database, so I had to fix up the 
test suite enough to run the few tests related to replication.  Since it was 
mentioned to me that having the test suite fixed in this way is greatly 
desirable I am happy to finish that job.  I am about 75% of the way there.  
That way also the work I did on this branch would be immediately useful to the 
community.

Since the last email about replication status I also made quite a few changes 
to how the replicated storage splits master and replicants, which seems to have 
fix a serious deadlocking issue we experienced with this in production.  All 
tests pass and give the same output as trunk.

Sincerely,
John Napiorkowski



      

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/[EMAIL PROTECTED]

Re: [Dbix-class] Replication Branch, Status and Call for Comments

Reply via email to