Re: RE: [others] Re: Views with functions and performance issues

2006-09-22 Thread Dan Buettner

The hit with a join on indexed columns is negligible.  Relational
databases live for joins - they eat them for breakfast!  Seriously, as
long as it's indexed in both tables, it'll be super-speedy.

Dan

On 9/22/06, Christopher Brooks <[EMAIL PROTECTED]> wrote:

Hi, thanks for the comments,

> If I'm understanding right - the view contains an additional
> column that is an MD5 hash of some or all of the data in the
> base table, right?

Close.  It's got all of the data in the base table except for the colum
that's being hashed - we show the hashed version of that instead.  Like a
password table, where the base table has the plain text passwords and
usernames, and the view only shows the hashed passwords and usernames
(though it's not for passwords).

> set up second table, with two columns.  One is id number to
> correspond to id number in original table, second is to hold
> Md5 hash.  Createn index on the MD5 column for sure, possibly
> a two column index.  I'm not sure whether it would be better
> to create it as (id, md5_col) or (md5_col, id); I'd test it
> each way I guess.
>
> Set up insert/update/delete triggers on the first table to
> add/update/delete records in the second table with ID and the
> MD5 hash.
>
> Then re-create your view to show you all the columns from the
> first table plus the MD5 column from the second table,
> joining on the id column.

This sounds good, and this way I don't have to change the first table much
at all (as it's a replicated table and my luck with replication only lately
seems to have gotten good).  What kind of performance hit will I be taking
because of the join in the view (e.g. every query to the view is going to
have to have to do the join, yes?).  Or is this neglegable as long as the
hash and pointer of the second table are both indexed...

Chris




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: [others] Re: Views with functions and performance issues

2006-09-22 Thread Christopher Brooks
Hi, thanks for the comments,

> If I'm understanding right - the view contains an additional 
> column that is an MD5 hash of some or all of the data in the 
> base table, right?

Close.  It's got all of the data in the base table except for the colum
that's being hashed - we show the hashed version of that instead.  Like a
password table, where the base table has the plain text passwords and
usernames, and the view only shows the hashed passwords and usernames
(though it's not for passwords).

> set up second table, with two columns.  One is id number to 
> correspond to id number in original table, second is to hold 
> Md5 hash.  Createn index on the MD5 column for sure, possibly 
> a two column index.  I'm not sure whether it would be better 
> to create it as (id, md5_col) or (md5_col, id); I'd test it 
> each way I guess.
> 
> Set up insert/update/delete triggers on the first table to 
> add/update/delete records in the second table with ID and the 
> MD5 hash.
> 
> Then re-create your view to show you all the columns from the 
> first table plus the MD5 column from the second table, 
> joining on the id column.

This sounds good, and this way I don't have to change the first table much
at all (as it's a replicated table and my luck with replication only lately
seems to have gotten good).  What kind of performance hit will I be taking
because of the join in the view (e.g. every query to the view is going to
have to have to do the join, yes?).  Or is this neglegable as long as the
hash and pointer of the second table are both indexed...

Chris


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Views with functions and performance issues

2006-09-22 Thread Dan Buettner

If I'm understanding right - the view contains an additional column
that is an MD5 hash of some or all of the data in the base table,
right?

Yes, I would expect that to be very very slow.  When selecting, your
database engine has tro calculate 700K MD5 hashes.  Slow.  When
selecting a subset it has to do that plus what amounts to a table scan
of those 700K MD5 hashes.  Very slow.

Wonder if you could do this:

set up second table, with two columns.  One is id number to correspond
to id number in original table, second is to hold Md5 hash.  Createn
index on the MD5 column for sure, possibly a two column index.  I'm
not sure whether it would be better to create it as (id, md5_col) or
(md5_col, id); I'd test it each way I guess.

Set up insert/update/delete triggers on the first table to
add/update/delete records in the second table with ID and the MD5
hash.

Then re-create your view to show you all the columns from the first
table plus the MD5 column from the second table, joining on the id
column.

I think this should end up being pretty fast since the hashes will
only be calculated when the data changes, not all at once for every
select, and an index will speed things up tremendously when selecting
a subset.

HTH,
Dan

On 9/22/06, Christopher Brooks <[EMAIL PROTECTED]> wrote:

Hi,

I've got a view of a base table that is 100% identical to that base table
except for one column, which is a projection of the base table after its MD5
hashed.  The table is largish (~700,000 rows) and is growing quickly.
Queries on the base table are nice and fast, but on the hashed view are
painfully slow - in particular when restricting the queries on the view
based on the hashed colum.

I assume this is because there is no index for the hashed colum.  Is there a
good way I can deal with this, without changing the base table?

TIA, running MySQL 5 on windows, can change to a later build of MySQL (as
long as I can replicate from 5) if there is some funky stuff in there that
will do what I need...

Chris

--
Christopher A. Brooks
Research Officer, ARIES Group
University of Saskatchewan

Email: [EMAIL PROTECTED]
Mail: Christopher Brooks
  Department of Computer Science
  University of Saskatchewan
  57 Campus Drive
  Saskatoon, Saskatchewan, S7N 5A9
  Canada


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Views with functions and performance issues

2006-09-22 Thread Christopher Brooks
Hi,

I've got a view of a base table that is 100% identical to that base table
except for one column, which is a projection of the base table after its MD5
hashed.  The table is largish (~700,000 rows) and is growing quickly.
Queries on the base table are nice and fast, but on the hashed view are
painfully slow - in particular when restricting the queries on the view
based on the hashed colum.

I assume this is because there is no index for the hashed colum.  Is there a
good way I can deal with this, without changing the base table?

TIA, running MySQL 5 on windows, can change to a later build of MySQL (as
long as I can replicate from 5) if there is some funky stuff in there that
will do what I need...

Chris

--
Christopher A. Brooks
Research Officer, ARIES Group
University of Saskatchewan 
 
Email: [EMAIL PROTECTED]
Mail: Christopher Brooks
  Department of Computer Science
  University of Saskatchewan
  57 Campus Drive
  Saskatoon, Saskatchewan, S7N 5A9
  Canada  


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]