Re: How to Handle Tightly Coupled Database Tables?

2009-03-04 Thread bcreeve

On Mar 4, 2:05 pm, AD7six  wrote:

> > When I mentioned aMySQL viewto solve the problem of calculated
> > columns, you replied:
> > > This is called de-normalization, and is a very common thing to want to
> > > do, for many reasons, including but not limited to performance
> > > improvements and reducing code complexity.
> > For what it's worth, creating aMySQL Viewto join the two and include
> > calculated columns is not de-normalization.
>
> You're quoting out of context/sequence and possibly missing the point.

You're right.  My mistake.  However, my point is still valid as I
believe Nate originally incorrectly considered calculating pseudo-
columns in the model
s afterFind method "de-normalization".

> In what way would using a view (if you choose) or using
> counterCache fields for each of your agregate fields, or a stock left
> join, group by and agregate function, NOT solve your question?

Everything you mention is sufficient, except using counterCache.  I
was trying to get a sanity check to make sure I wasn't bypassing some
already existing piece of Cake.  Since it seems the community is
divided when it comes to using views/functions in the database or
using a custom query (really the same thing) I wanted to see if there
really is a Cake way to do what I want.

De-normalizing would certainly allow me to search and sort as I
described.  I know counterCache can be used to populate a
voicemails.transcription_count column, but I don't know that I can use
it to populate something like
voicemails.transcribed_transcription_count or a boolean
voicemails.has_blank_transcriptions, can I?  Maybe that is the purpose
of counterScope, which does not appear to be documented at
book.cakephp.org.  I'll check into Cake's core and unit tests to see
if it would do the trick.

De-normalizing also does not support a calculated column that depends
on a point in time.  For instance if a transcription can be locked for
15 minutes, and that 15 minute window passes, a
voicemails.locked_transcription_count column is not instantly (or
ever) updated.  The best you could do is set up a cron job to manage
these, which is not clean.  A view with a calculated column (like
DATEDIFF with SYSDATE) handles all of this by trading off a bit of
read-time performance.  I think this trade off very much outweighs the
cost of development complexity to manage de-normalizing data.

Thanks all for the help.  But given my situation, and the lack of
another clear solution that will solve every challenge I've noted, I
think creating a view is going to be the best option.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to 
cake-php+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: How to Handle Tightly Coupled Database Tables?

2009-03-04 Thread AD7six



On Mar 4, 6:06 pm, bcreeve  wrote:
> Hey Nate.  Thanks for the feedback but I'm not sure it provided much
> help for me.  Hopefully it will provide some insight for some others.
>
> By "tightly coupled" I simply meant that I pretty much always want to
> look at data from both tables together.  I didn't realize the term was
> conflicting with an existing notion of coupling.
>
> In hindsight I suppose the contrived example is insufficient to
> explain reality.  I have Voicemail hasMany Transcription (and
> Transcription belongsTo Voicemail).
>
> When I mentioned aMySQL viewto solve the problem of calculated
> columns, you replied:
>
> > This is called de-normalization, and is a very common thing to want to
> > do, for many reasons, including but not limited to performance
> > improvements and reducing code complexity.
>
> For what it's worth, creating aMySQL Viewto join the two and include
> calculated columns is not de-normalization.

You're quoting out of context/sequence and possibly missing the point.

>  Each table is still as
> normalized as it is without the view.  Using a view is really just a
> valid choice to move some core business logic to the database for
> performance and complexity reasons.

indeedy. To all intents and purposes a db-view is just a table to
cake.


> >You can see an example
> > implemented in the CakePHP core in the form of counter caches for
> > hasMany/belongsTo 
> > relationships:http://book.cakephp.org/view/490/counterCache-Cache-your-count
>
> I currently use counter cache where appropriate, however AFAIK I can't
> have an auto-magic counter cache while specifying child conditions.
> For instance, at the Voicemail level I need to know how many
> Transcriptions are transcribed and how many are untranscribed.  Also,
> the the voicemail level, I want a boolean column as a flag to tell me
> if any transcriptions are transcribed but blank.

I don't follow your example but if you did de-normalize (which means
literally creating a field for each of your counts) you can use
counterCache and counterScope to update the values as appropriate. You
could also do it directly with group by and agregate functions or if
you use a view, no need for countercache but you'll probably find that
given views in mysql are typlically un-optimized that only
superficially solves your problem.

> In my Oracle and MS SQL experience, no question this stuff would be
> best calculated a view or pipeline function.  Cake seems to want
> things done in Cake, the Cake way.  I buy into this only as long as it
> is possible or practical to do so within the framework.  So I started
> adding these calculated columns in the afterFind method

That's typically a mistake.

> of the
> Voicemail model.  Now they are always available when I find a
> Voicemail (these were referred to as pseudo columns or pseudo fields
> in other articles or posts).  Problem is, these columns end up being a
> separate SELECT COUNT statement to the DB, which is not going to scale
> well in terms of performance.
>
> > > For my needs, the above has several limitations:
> > > 1. I cannot query against any Child columns or these calculated
> > > columns.
>
> > This seems like an artificial constraint.  Why not query the Child
> > object directly?
>
> The context is not at the child level.  I'm trying to search and sort
> Voicemails (in the Voicemail index) based on the status of its
> collective child Transcriptions.  In my opinion, it is not appropriate
> to be querying about Voicemails at the Transcriptions level.

That's the problem with contrived examples.

> Also,
> it's not trivial or appropriate to be asking about information about
> sibling Transcriptions that are related by voicemail_id at the
> Transcription level.  In other words, I don't want to ask the
> Transcriptions if all Transcriptions for a particular Voicemail are
> adequately transcribed.  I should be able to ask a Voicemail about its
> children Transcriptions.
>
> > > 2. I cannot sort on said columns.
>
> > See above.
>
> Same as #1 above.

It looks like your ignoring the answer that's already been given (de-
normalize).

>
> > > 3. Cake gives each Child get its own SELECT via hasMany, and each
> > > pseudo column for each parent gets its own SELECT, which likely won't
> > > scale with respect to performance.
>
> > If you query the Child object directly, Parent will be attached via a
> > LEFT JOIN, and everything will happen in one query.  Also, I'm not
> > sure what you mean about "pseudo" columns.  Why not just actually
> > create the columns you mentioned above, and use them to cache count
> > values?
>
> See #1
>
> > > 4. Lastly, I need to paginate filtered and/or sorted results, which
> > > just seems to add another layer of complexity.
>
> > See the solution I mentioned immediately above.  The default core
> > pagination would apply perfectly in that scenario.
>
> Thanks for the feedback but I'm not sure we were on the same page
> initially.  Does my response giv

Re: How to Handle Tightly Coupled Database Tables?

2009-03-04 Thread bcreeve

On Mar 4, 1:01 am, Miles J  wrote:
> Have you tried paginating/searching from the actual HABTM model? In
> this case your model would be ChildParents.

Miles:  Thanks, but I have a hasMany/belongsTo relationship, not
HABTM, so there is no relationship table.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to 
cake-php+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: How to Handle Tightly Coupled Database Tables?

2009-03-04 Thread bcreeve

Hey Nate.  Thanks for the feedback but I'm not sure it provided much
help for me.  Hopefully it will provide some insight for some others.

By "tightly coupled" I simply meant that I pretty much always want to
look at data from both tables together.  I didn't realize the term was
conflicting with an existing notion of coupling.

In hindsight I suppose the contrived example is insufficient to
explain reality.  I have Voicemail hasMany Transcription (and
Transcription belongsTo Voicemail).

When I mentioned a MySQL view to solve the problem of calculated
columns, you replied:

> This is called de-normalization, and is a very common thing to want to
> do, for many reasons, including but not limited to performance
> improvements and reducing code complexity.

For what it's worth, creating a MySQL View to join the two and include
calculated columns is not de-normalization.  Each table is still as
normalized as it is without the view.  Using a view is really just a
valid choice to move some core business logic to the database for
performance and complexity reasons.

>You can see an example
> implemented in the CakePHP core in the form of counter caches for
> hasMany/belongsTo 
> relationships:http://book.cakephp.org/view/490/counterCache-Cache-your-count

I currently use counter cache where appropriate, however AFAIK I can't
have an auto-magic counter cache while specifying child conditions.
For instance, at the Voicemail level I need to know how many
Transcriptions are transcribed and how many are untranscribed.  Also,
the the voicemail level, I want a boolean column as a flag to tell me
if any transcriptions are transcribed but blank.

In my Oracle and MS SQL experience, no question this stuff would be
best calculated a view or pipeline function.  Cake seems to want
things done in Cake, the Cake way.  I buy into this only as long as it
is possible or practical to do so within the framework.  So I started
adding these calculated columns in the afterFind method of the
Voicemail model.  Now they are always available when I find a
Voicemail (these were referred to as pseudo columns or pseudo fields
in other articles or posts).  Problem is, these columns end up being a
separate SELECT COUNT statement to the DB, which is not going to scale
well in terms of performance.

> > For my needs, the above has several limitations:
> > 1. I cannot query against any Child columns or these calculated
> > columns.
>
> This seems like an artificial constraint.  Why not query the Child
> object directly?

The context is not at the child level.  I'm trying to search and sort
Voicemails (in the Voicemail index) based on the status of its
collective child Transcriptions.  In my opinion, it is not appropriate
to be querying about Voicemails at the Transcriptions level.  Also,
it's not trivial or appropriate to be asking about information about
sibling Transcriptions that are related by voicemail_id at the
Transcription level.  In other words, I don't want to ask the
Transcriptions if all Transcriptions for a particular Voicemail are
adequately transcribed.  I should be able to ask a Voicemail about its
children Transcriptions.

> > 2. I cannot sort on said columns.
>
> See above.

Same as #1 above.

> > 3. Cake gives each Child get its own SELECT via hasMany, and each
> > pseudo column for each parent gets its own SELECT, which likely won't
> > scale with respect to performance.
>
> If you query the Child object directly, Parent will be attached via a
> LEFT JOIN, and everything will happen in one query.  Also, I'm not
> sure what you mean about "pseudo" columns.  Why not just actually
> create the columns you mentioned above, and use them to cache count
> values?

See #1

> > 4. Lastly, I need to paginate filtered and/or sorted results, which
> > just seems to add another layer of complexity.
>
> See the solution I mentioned immediately above.  The default core
> pagination would apply perfectly in that scenario.


Thanks for the feedback but I'm not sure we were on the same page
initially.  Does my response give you any more clarification?

brian
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to 
cake-php+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: How to Handle Tightly Coupled Database Tables?

2009-03-03 Thread Miles J

Have you tried paginating/searching from the actual HABTM model? In
this case your model would be ChildParents.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to 
cake-php+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: How to Handle Tightly Coupled Database Tables?

2009-03-03 Thread Nate

Hi Brian,

I hope I can help.

On Mar 3, 2:53 pm, bcreeve  wrote:
> Most often the way Cake handles relationships is very efficient for
> getting new development off the ground.  However, I recently have hit
> a wall when it comes to handling two or more tables that are tightly
> coupled.  By "tightly coupled" I mean the data from multiple tables
> makes most sense when looked at in conjunction.

Based on your description, I think you misunderstand the concept of
coupling.  "Low coupling refers to a relationship in which one module
interacts with another module through a stable interface and does not
need to be concerned with the other module's internal
implementation" (from http://en.wikipedia.org/wiki/Coupling_(computer_science)
).  Particularly in terms of record relationships in third normal
form, the concept of coupling almost doesn't even apply.

> In the interest of simplicity, let's say I have Parent that hasMany
> Child (and Child belongs to Parent).  I often want to view, search and
> sort using anything like the parent's last name, number of children,
> number of male children, are any children under 18, etc.

In my experience, I'd say you're better off explaining your actual
problem, rather than a contrived one.  Your explanation would lead
someone to reasonably conclude that you're talking about a model (i.e.
Person) joined on itself in a hasMany/belongsTo relationship, however,
the way you're approaching the explanation seems to contradict this.
Your example is ambiguous.

> It's conceivable that everywhere I consider a Parent, I also want to
> have access to all the data I mentioned in the paragraph above.  So, I
> add some pseudo-columns in the Parent model called child_count,
> male_child_count, and has_children_under_18.  Great, now I have this
> data available every time I ask for Parent(s).

This is called de-normalization, and is a very common thing to want to
do, for many reasons, including but not limited to performance
improvements and reducing code complexity.  You can see an example
implemented in the CakePHP core in the form of counter caches for
hasMany/belongsTo relationships: 
http://book.cakephp.org/view/490/counterCache-Cache-your-count

> For my needs, the above has several limitations:
> 1. I cannot query against any Child columns or these calculated
> columns.

This seems like an artificial constraint.  Why not query the Child
object directly?

> 2. I cannot sort on said columns.

See above.

> 3. Cake gives each Child get its own SELECT via hasMany, and each
> pseudo column for each parent gets its own SELECT, which likely won't
> scale with respect to performance.

If you query the Child object directly, Parent will be attached via a
LEFT JOIN, and everything will happen in one query.  Also, I'm not
sure what you mean about "pseudo" columns.  Why not just actually
create the columns you mentioned above, and use them to cache count
values?

> 4. Lastly, I need to paginate filtered and/or sorted results, which
> just seems to add another layer of complexity.

See the solution I mentioned immediately above.  The default core
pagination would apply perfectly in that scenario.

> Basically, I am wondering if there is a native Cake 1.2 way of
> handling this.  If this is a case where the framework doesn't fit the
> need, I would like a sanity check to make sure the following is the
> most Cake-like way to do it.

I think I have explained an appropriate solution above.  Please post a
follow-up with any relevant clarifying detail if this will not work
for your situation.

> I am considering writing a MySQL view called something like
> parents_children.  It will join parents with children and calculate/
> expose the necessary columns.  Then I will bake a model for this MySQL
> view and make my ParentsController use the freshly baked ParentChild
> model.  My index can programatically build WHERE and ORDER BY clauses
> and use $this->ParentChild->query(...).  All my read operations can
> then use $this->ParentChild->find(...).  From what I understand, as
> long as I alias my MySQL view columns like Model.column, I should
> still get a nice associative array like $parentChild['Parent']
> ['last_name'] from query and find, right?

As long as you properly alias all the columns in the view (search this
list for examples), yes, it will work the same as normal Cake query
results.

> And then for pagination, I should be able to do something like
> described inhttp://book.cakephp.org/view/249/Custom-Query-Pagination,
> or no?

Sure.

Hope that helps.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to 
cake-php+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



How to Handle Tightly Coupled Database Tables?

2009-03-03 Thread bcreeve

Most often the way Cake handles relationships is very efficient for
getting new development off the ground.  However, I recently have hit
a wall when it comes to handling two or more tables that are tightly
coupled.  By "tightly coupled" I mean the data from multiple tables
makes most sense when looked at in conjunction.

In the interest of simplicity, let's say I have Parent that hasMany
Child (and Child belongs to Parent).  I often want to view, search and
sort using anything like the parent's last name, number of children,
number of male children, are any children under 18, etc.

It's conceivable that everywhere I consider a Parent, I also want to
have access to all the data I mentioned in the paragraph above.  So, I
add some pseudo-columns in the Parent model called child_count,
male_child_count, and has_children_under_18.  Great, now I have this
data available every time I ask for Parent(s).

For my needs, the above has several limitations:
1. I cannot query against any Child columns or these calculated
columns.
2. I cannot sort on said columns.
3. Cake gives each Child get its own SELECT via hasMany, and each
pseudo column for each parent gets its own SELECT, which likely won't
scale with respect to performance.
4. Lastly, I need to paginate filtered and/or sorted results, which
just seems to add another layer of complexity.

Basically, I am wondering if there is a native Cake 1.2 way of
handling this.  If this is a case where the framework doesn't fit the
need, I would like a sanity check to make sure the following is the
most Cake-like way to do it.

I am considering writing a MySQL view called something like
parents_children.  It will join parents with children and calculate/
expose the necessary columns.  Then I will bake a model for this MySQL
view and make my ParentsController use the freshly baked ParentChild
model.  My index can programatically build WHERE and ORDER BY clauses
and use $this->ParentChild->query(...).  All my read operations can
then use $this->ParentChild->find(...).  From what I understand, as
long as I alias my MySQL view columns like Model.column, I should
still get a nice associative array like $parentChild['Parent']
['last_name'] from query and find, right?

And then for pagination, I should be able to do something like
described in http://book.cakephp.org/view/249/Custom-Query-Pagination,
or no?

Again, just looking for a sanity check or if there are any other
suggestions from people who have handled similar problems in Cake.

Thanks a lot!

brian

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to 
cake-php+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---