Re: Appropriate Design

2008-09-17 Thread Kevin Hunter
At 5:46am -0400 on Wed, 17 Sep 2008, Jeffrey Santos wrote:
 I'm developing an application
 that will require information from various sources.  Since what 
 might be considered required information about those sources will 
 vary (occasionally dramatically), I'm unsure as to the correct 
 design for the database tables.  In other words I'm left with (I 
 believe) two choices:
 
 1.   Create a source table that will contain, among other things, 
 every possible field for a source type and sort out what is 
 considered required information at the application level.

 If possible, maybe a general conventional wisdom statement 
 would greatly help my education on these matters!

This is a common pitfall for noobs to data modeling.  The temptation is
either to try to think of everything up front, which is almost always an
impossible task, or to model only a minimal case to jump-start the
coding process and then bolt on model components later as necessary.

The second pitfall is keeping model logic in the application.  This
violates two similar principles: Single-Point-of-Authority and
Don't-Repeat-Yourself.

You are using a database so that it can maintain the structure and
integrity of your data.  There is absolutely no other compelling reason
to use a DB.  (If you didn't need integrity and structure, you'd use a
simpler and mucho faster flat file.)  Let the DB do its job and be the
single-point-of-authority.  The application should certainly do things
the right way, putting data where it needs to go, but it should not be
the application's /responsibility/ to keep data integrity and structure.
 If you rely on your application to maintain your structure, you presume
that your programmers are perfect and will think of every little detail.
 (They're not and they won't.  I promise.)

As a general technology, databases receive a lot of work so that
application developers don't have to sweat the little details.  Like
making sure that every user in a table has a last name.  Like making
sure the user typed a 0 instead of an O.  Like defining constraints so
that developers don't have to make sure an account has enough money to
make a withdrawal.  All they need to know is that it didn't go through.
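
For instance, a minimal sketch of the last-name and balance rules
(hypothetical names; note that MySQL of this vintage parses but ignores
CHECK constraints, so the balance rule here leans on an UNSIGNED column
plus strict SQL mode):

CREATE TABLE account (
    id        INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
    last_name VARCHAR(64)   NOT NULL,      -- no user without a last name
    balance   DECIMAL(10,2) UNSIGNED NOT NULL DEFAULT 0.00
) ENGINE=InnoDB;
-- in strict mode, an UPDATE that would push balance below zero errors out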

The other principle of DRY also holds true.  At the point you have your
application try to maintain data constraints, you will inevitably have
lots of repeated or similar code to maintain similar data cases.  That's
a different kind of nightmare.  Similarly, with the DB, it's silly to
define multiple columns for similar data.  That's spreadsheet think.
Like col_toyota, col_ford, col_chevy, col_lexus, col_buick.  No.  Make a
single column as a foreign key to another table.
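
A minimal sketch of that idea (hypothetical names):

CREATE TABLE make (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(32)  NOT NULL UNIQUE        -- 'Toyota', 'Ford', ...
) ENGINE=InnoDB;

CREATE TABLE car (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    make_id INT UNSIGNED NOT NULL,            -- one column, not one per make
    FOREIGN KEY (make_id) REFERENCES make (id)
) ENGINE=InnoDB;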

Some keywords to use with Google:

normalize
normalization
foreign keys
foreign key constraints
innodb

Wikipedia is often a good starting point.

Once you've learned the concept, I can't stress enough that you should
normalize, normalize, normalize.

 2.   Create a few source tables detailing the required information about
 each source type.  The only way I can think of doing this is something along
 the lines of:

 and then pull only the appropriate data from the database at query time
 using JOINs.

This is closer to a better approach.  That said, be more specific about
the problem you're trying to solve and folks may be better able to
provide useful advice.

Kevin




Re: Normalization vs. Performance

2008-08-28 Thread Kevin Hunter
At 2:11pm -0400 on Tue, 26 Aug 2008, John Smith wrote:
 So how bad is this? The mentioned query will be the query which is used
 the most in my application (yes, it is going to be a forum).
 Should I break normalization and save the date of the root in each node row?

My recommendation is no.  Normalization is a Good Thing, and you'll be
hard-pressed to convince me otherwise.  In the long run, you (or
following programmers) will thank yourself if you maintain a normalized
core data model.  Seriously.

But speed and performance are the here-and-now issues while DB
technology catches up to demands.  Have you looked at temporary tables
or materialized views?  These might help you in this endeavor.
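
MySQL has no built-in materialized views, but a summary table that you
refresh periodically gets you much of the way there.  A minimal sketch,
assuming a hypothetical node table with root_id and created columns:

CREATE TABLE node_root_date (      -- cached copy, *outside* the core model
    node_id   INT UNSIGNED NOT NULL PRIMARY KEY,
    root_date DATETIME     NOT NULL
) ENGINE=InnoDB;

-- refresh from the normalized core, e.g. from cron
REPLACE INTO node_root_date (node_id, root_date)
    SELECT n.id, r.created
      FROM node AS n
      JOIN node AS r ON r.id = n.root_id;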

Kevin




Re: Adding index to Memory table LOSES ROWS!

2008-07-27 Thread Kevin Hunter
At 12:42p -0400 on Sat, 26 Jul 2008, mos wrote:
 At 09:14 AM 7/26/2008, you wrote:
 The reproducible part is very important, because without it, it's
 suspect to be just your individual case, as with a bug in *your*
 application code, your hardware, or generally something in your setup.
 
 Well, I thought it might be my code too. That's why when the memory
 table was built (without the index), I went to a SqlMgr and counted the
 rows. None of the rows were missing. Then I did an Alter Table and added
 one index to the memory table, and sure enough when it was finished 75%
 of the rows were missing and no error was reported. Only the rows for
 index values A to E were in the table so the indexing lost rows.

That suggests to me a couple of things, both bugs with MySQL:

- an out of memory error - MySQL should *not* fail, but tell you it
  can't complete and return you to a known state.  An RDBMS should
  *never* lose data.  Ever.

- a piece of data in one of the rows being processed that MySQL doesn't
  like, and that therefore gives unexpected results.  This is definitely
  a bug, as it should not happen to begin with, and an RDBMS should
  *never* lose data.  Ever.

Summary: I don't know what's up and have not encountered this.  But if
you can, create a small test case that can reproduce the error.  Then
file a bug report at http://bugs.mysql.com/ .  Loss of data is
absolutely a bug, and a critical one.

A quick (< 3 min) perusal of the currently open bugs did not return any
meaningful results.

Kevin




Re: Adding index to Memory table LOSES ROWS!

2008-07-26 Thread Kevin Hunter
At 5:52p -0400 on Fri, 25 Jul 2008, mos wrote:
[Adding index to memory table silently loses data]

First thing, what version are you using?

Second thing, although it would still be broken, did you read the MySQL
docs to make sure that the Memory table type supports the index *type*
you attempted to add?

 1) Why does adding an index lose the rows?
 2) Why wasn't an error reported?

[ squint ]

My first thought is that MySQL and the Memory table type are used by
*lotsa* folks, with success, so I find it difficult to believe that this
wouldn't have been caught well before now.

Are you positive this problem isn't hardware related?  A few bad memory
locations could easily explain this.

If you're certain this is a bug in MySQL, creating a small,
*reproducible* test case and reporting this through the proper channels
(http://bugs.mysql.com/) will go much further to fixing the problem.
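
A skeleton for such a test case might look like this (you'd substitute
the data, row count, and index type that actually trigger the loss):

CREATE TABLE t_mem (
    id  INT NOT NULL,
    val VARCHAR(32)
) ENGINE=MEMORY;

-- ... load the data that exhibits the problem ...

SELECT COUNT(*) FROM t_mem;                 -- note the row count

ALTER TABLE t_mem ADD INDEX idx_val (val);

SELECT COUNT(*) FROM t_mem;                 -- if this differs, that's the bug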

The reproducible part is very important, because without it, it's
suspect to be just your individual case, as with a bug in *your*
application code, your hardware, or generally something in your setup.

Kevin




Re: Looking for a Mysql Guru/DBA

2008-05-22 Thread Kevin Hunter
At 1:43p -0400 on Wed, 21 May 2008, bruce wrote:
 The basic goal of the project is to be able to track the sites that I'm
 visiting via a Firefox extension. I want to be able to implement something
 like the breadcrumbs extension, but I want to be able to go a lot further.

If this is a one-per-person kind of thing (i.e. only a single Firefox
installation will use a single DB instance at any time), MySQL may be
overkill.  You may want to look towards something smaller and embedded,
like SQLite[1].  It'll mean much less overhead, both in installation for
users and in memory for your extension.  In fact, Firefox 3.0 already
includes SQLite for the smart url bar they've got going on.

 If you're interested, and you're reasonably good at mysql, and devising
 database structures/schema then let's talk!

You will still want to talk to someone about getting the right schema in
place, however.
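
Either way, a first-cut schema sketch (hypothetical names; SQLite
flavor, though the shape would be the same in MySQL; note that SQLite of
this era parses REFERENCES but does not enforce it):

CREATE TABLE site (
    id  INTEGER PRIMARY KEY,        -- SQLite rowid alias
    url TEXT NOT NULL UNIQUE
);

CREATE TABLE visit (
    id         INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES site (id),
    visited_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);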

Kevin

[1] http://www.sqlite.org/




Re: Table Structure

2008-05-16 Thread Kevin Hunter
At 2:49a -0400 on Fri, 16 May 2008, Ananda Kumar wrote:
 it goes without saying

Eh, not to those who don't know.  Hence the list and question.  :-D

Krishna, the short answer is that it depends on your data, and the
queries against it that you run.  Test/benchmark on your own DB and data
to see what solves your issue.

 1. primary key and unique key would do unique scan which is 
 fastest of all scan

Sort of.

First, let's get terminology correct: keys, indexes.

Terminology: keys are nothing more than data.  Keys are used to identify
rows for manipulation, either directly by a SQL statement/request, or
indirectly through the DB SQL parser.  A key is not necessarily unique.
 For example

SELECT name FROM person WHERE height = 70;

is a request for the names of people who are 5'10" tall.  The
identifying key in this statement, then, is height.  All people who are
70 inches tall are keyed for selection.

A special case of a key is a primary key.  This tells the DB that the
data in this column(s) uniquely identifies each row.  Assuming that 'id'
was created as a PRIMARY KEY,

SELECT name FROM person WHERE id = 7;

will return exactly 1 or 0 rows because id is guaranteed to be unique by
the DB.  Further, a primary key is also implicitly guaranteed to be not
null.

A unique key is a bit of a misnomer, and I'm sorry the MySQL
documentation perpetuates it.  It is more accurate to think of the
column as being constrained such that every value in that column(s) is
unique.  Thus, unique is a /property/ of a column, not its defining
characteristic.

Terminology: indexes are an /implementation/ detail of the DB, usually
for speed and to help maintain data integrity.  When you designate a
column (or columns) as the primary key for the rows of a table, MySQL
will automatically create a unique index on that column.  This way, for
example, when you insert data into the table, it won't have to scan the
entire table to make sure that the new data is unique.  It can just walk
a couple of levels in the index to ensure uniqueness.
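
You can see this for yourself; a quick sketch:

CREATE TABLE person (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(64)  NOT NULL
);

SHOW INDEX FROM person;    -- lists an index named PRIMARY, Non_unique = 0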

For speed however, it depends on what the DB is asked, and how the index
was defined.  MySQL is somewhat limited in that you are pretty much
stuck with the one index type it offers: the b-tree.  But that's
implementation specific.  Oracle, for example, has other index types
(hash, bitmap).  Modulo the internal specifics of the MySQL
implementation, the rough idea of a unique not null b-tree index (e.g. a
primary key column) is that the DB ostensibly need only walk O(lg(n))
steps to find a key.  A hash index, then, would presumably be the
fastest at O(1).

 2 Index would do range scan, which would be comparatively slower.

Not exactly.  This, again, depends on implementation and the data
against which queries are run.  An index is an index.  Remember, a
primary key is implemented by the MySQL engine as a unique not null
b-tree index.  In fact, trying to get low-cardinality data from a table
via indexes could even be faster than the primary key.  Consider a table
with 1,000,001 rows of favorite colors of people.  Let's say 1 million
people like the color blue, but only 1 person likes the color puce.
An index for that single person would be a huge win.  Depending on the
DB (read: implementation) one could make this particular style of skewed
data even faster.  Postgres, for example, has partial-indexes.
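
In sketch form (hypothetical names):

CREATE TABLE favorite (
    person_id INT UNSIGNED NOT NULL PRIMARY KEY,
    color     VARCHAR(16)  NOT NULL
);
CREATE INDEX idx_color ON favorite (color);

SELECT person_id FROM favorite WHERE color = 'puce';  -- index: huge win
SELECT person_id FROM favorite WHERE color = 'blue';  -- index: likely a loss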

As I said at the beginning of this email, I suggest the OP do some
personal investigation and tests, because understanding indexes /and
your own data/ is not as simple as a checklist.

Kevin

P.S. I highly suggest you Google some of the concepts about which I
wrote.  Wikipedia is a fairly good start.  Also, don't be afraid to read
other DBs' docs.  I've learned a great deal from comparing different
DBs, documentation sources, and interacting with multiple communities.




Re: Table Structure

2008-05-16 Thread Kevin Hunter
At 5:32a -0400 on Fri, 16 May 2008, Ananda Kumar wrote:
 2 Index would do range scan, which would be comparatively slower.

 Not exactly.  This, again, depends on implementation and the
 data against which queries are run.  An index is an index.
 Remember, a primary key is implemented by the MySQL engine as
 a unique not null b-tree index.  In fact, trying to get
 low-cardinality data from a table via indexes could even be
 faster than the primary key.  Consider a table with 1,000,001
 rows of favorite colors of people.  Let's say 1 million
 people like the color blue, but only 1 person likes the
 color puce.  An index for that single person would be a huge
 win.  Depending on the DB (read: implementation) one could make
 this particular style of skewed data even faster.  Postgres,
 for example, has partial-indexes.
 
 As stated by you above: here puce is a clear case of just one
 record being read, which is very much similar to reading a record
 with a PRIMARY KEY, which is supposed to be fast for HIGH
 CARDINALITY.  If there were 1/4 million people who liked puce, then
 the index would do a RANGE SCAN and would be slow.

Eh, again, it's hard to say.  In my contrived example, puce is a clear
case of an index win via the fact that it's unique, but it's *not* like
the primary key, because every other key isn't unique.  A primary key
enforces high cardinality, so selecting individual rows is ostensibly
fast, but now let's select a range of rows.  What if we made ten people
like the color red and SELECTed them?  How about 2,000 green?  100,000
yellow?  An index perusal would still weed out >= 900,000 rows, so is it
a win?  Try it and find out.  I don't know the answer.
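
EXPLAIN is the tool for that experiment.  A sketch, reusing the
hypothetical favorite-colors table from my last message:

EXPLAIN SELECT person_id FROM favorite WHERE color = 'red';     -- 10 rows
EXPLAIN SELECT person_id FROM favorite WHERE color = 'yellow';  -- 100,000

-- Compare the type, key, and rows columns of the output, and watch for
-- where the planner abandons the index (type goes from 'ref' to 'ALL').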

Getting data from the index isn't free.  It /can/ be a win when it helps
remove a statistically strong portion of the data, dependent on disk
speed, cpu speed, and data cardinality.  Sometimes, depending on what
rows and data it takes to satisfy a query, it's actually better to
ignore the indexes and scan the table directly.

The point is that it isn't as simple as snap-your-fingers,-you're-done,
which I gather we both understand.  I suggest again that the OP do some
tests on /personal data sets/ and see what best answers his needs.

Besides, who likes puce anyway?  ;-)

Kevin




Re: Comma's in data?

2008-03-08 Thread Kevin Hunter
At 12:23a -0500 on Sat, 08 Mar 2008, obed wrote:
 Use \ to escape.
 
 mysql> insert into table1 values ('algo\,otra');

As an aside, I'm curious why the parser doesn't understand that the
comma is part of the string in this context?  It's already between two
single quotes ... ?  Shouldn't that be enough?

Kevin




Re: basic style shema question

2008-01-18 Thread Kevin Hunter

At 11:44a -0500 on 18 Jan 2008, Alex K wrote:

To summarize one table vs. many tables with one to one relations?


As per usual, it depends on your needs.  For most flexibility, and to 
give the DB the best chance of choosing a good plan for whatever 
requests I might make in the future, I generally tend towards 
normalization, not denormalization.


The question is: what do you want to do with the schema?

Kevin




Re: basic style shema question

2008-01-18 Thread Kevin Hunter
Hmm.  If we're talking pure DB theory, then the whole point is to apply 
the DRY principle as much as possible.  At the point you have multiple 
copies of the same data, unless your programmers are perfect (and they 
aren't, I promise), you *will* have stale data.  Better to have only one 
place to update things.


Some other questions/thoughts that come to mind:
- Will it ever be possible for a company to have more than one contact
  email address?
- Do you /really/ want to store the user's password in your DB?  Look
  towards salts and SHA/MD5 sums.  (A sketch follows below.)
- If you have more than a few people with the same company, are you at
  all worried about disk space?
- Disk contention is often the bottleneck if your DB can't fit entirely
  in main memory.  Having fewer places to update means better disk
  access.
- Usage patterns, as you described, are also a concern.  Logging in is
  an issue as is accessing other data.  What's the ratio of
  currently-logging-in-users to other data requests?  No need to pull
  the entire data row if you're not going to use most of the disk pages.
- Will you have mainly INSERT queries, mainly SELECT, UPDATE,
  DELETE, a combination?  DB's and engines are better suited to
  different types of workloads.
- Are you worried about integrity of your data?  How many foreign
  key constraints will you want?

These all tie in together, and generally raise the question of *your* 
usage patterns.  If this is a project of any import, I can almost 
guarantee that what you think will happen will not align with what 
actually happens.  With that in mind, having the agility of multiple 
tables with correct data (read: use foreign key constraints) will likely 
behoove you.


You might want to take a look at some articles on normalization and 
schema design.  Wikipedia is a good starting place.
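
On the password point above, a minimal sketch of the salt-plus-digest
idea (hypothetical names; SHA1() is built into MySQL, though hashing in
the application works just as well):

CREATE TABLE account (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email   VARCHAR(128) NOT NULL UNIQUE,
    salt    CHAR(8)      NOT NULL,     -- random, generated per account
    pw_sha1 CHAR(40)     NOT NULL      -- SHA1( CONCAT(salt, password) )
) ENGINE=InnoDB;

-- login check; the cleartext password is never stored.  The ?s are
-- bound by your application:
SELECT id FROM account
 WHERE email = ? AND pw_sha1 = SHA1( CONCAT(salt, ?) );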


Kevin

At 12:30p -0500 on 18 Jan 2008, Alex K wrote:

Well the basic information, company description and personalized
options will be selected many times (whenever a user submits a query).
It will basically be shown on the result page of the search engine.

The user's login / password is used to log in; then the user may
update the company basic information, description and personalized
options.  These updates may happen sporadically, though.  Once every 3
minutes these fields are selected again in order to update the search
engine index.





Re: MySql at 100% CPU

2008-01-17 Thread Kevin Hunter

At 3:25a -0500 on 17 Jan 2008, Ben Clewett wrote:
I need to know why MySql does this.  If any person knows how I can 
identify the problem, please let me know!


I think this package helped another person out on this list a couple 
days ago:


http://debaday.debian.net/2007/12/26/mytop-a-top-clone-for-mysql/

HTH,

Kevin




Re: Prepared SQL statements - Faster performance?

2008-01-17 Thread Kevin Hunter
I can't speak to the exact internals of MySQL, but in database practice 
one *generally* uses prepared statements for two reasons:


1. Security
2. Speed

If your system is at all exposed to the outside world *and you at all 
value your data*, your biggest concern should /absolutely/ be security. 
 By using prepared statements, you reduce the number of places to 
secure by letting the database do your escaping for you (the database 
developers know better than you what needs escaping).  It's actually 
better than that, but that's a simple explanation.


If you're not exposed to the Internet at large, and speed is your 
concern, prepared statements can give a speed improvement, but not 
always.  The reason for the alleged speed improvement is that generally 
one prepares the statement once and then amortizes the cost of 
preparation over more than one execution.  Juxtapose with reparsing and 
executing for every set of arguments, where the plans are largely the 
same: the parsing phase is duplicate work.


But preparation is not always a win.  Say I have a table of stored 
materials.  Picking the perfect plan is highly dependent on what data 
I have, and what data I want.  Take this query as an example:

PREPARE by_mass FROM 'SELECT material FROM stock WHERE mass = ?';

Presumably, just having an index on 'mass' will make things faster, 
right?  That assumes I have an evenly distributed set of data.  What if 
5 billion items in my warehouse are 5kg, and 3 items are 10kg?  If I 
plug in 5kg, my indexes are useless, and are in fact a loss to use.  If 
I plug in 10kg, my indexes are a huge gain.  Without knowing beforehand 
what data I'll need, the planner will likely settle for the best 
general-purpose plan, which may be a poor one for any specific value.
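
Concretely, with the server-side syntax (a sketch; @m is a user
variable):

SET @m = 5;
EXECUTE by_mass USING @m;     -- 5 billion rows match: the index is a loss
SET @m = 10;
EXECUTE by_mass USING @m;     -- 3 rows match: the index is a huge win
DEALLOCATE PREPARE by_mass;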


To answer what you can expect from planning: YMMV.  I have had code 
bases improve from 90 minutes to 5 minutes on the same hardware.  I have 
also had code bases show decreased performance by implementing planning.


As the previous poster said, the only way you'll know what *your* speed 
gain/loss will be, is to do it and find out.  You may be surprised.


Kevin

At 10:56p -0500 on 15 Jan 2008, Moon's Father wrote:

To know the exact speed improvement, you have to run a test yourself
on your own hardware.

On Jan 15, 2008 11:39 PM, mos [EMAIL PROTECTED] wrote:


At 11:25 AM 1/14/2008, Jay Pipes wrote:

Are you using the PREPARE STATEMENT server-side syntax or an emulated
prepared statement like in PDO?

-jay

Jay,
 Currently I'm not using prepared statements at all. Before I switch,
I wanted to know how much of a speed improvement I can expect, if any.






Re: MySql at 100% CPU

2008-01-17 Thread Kevin Hunter

At 12:08p -0500 on 17 Jan 2008, Ben Clewett wrote:
The package you show gives the same output as the 'SHOW PROCESSLIST' 
output, which I included in my previous email :)


Heh, that'll teach me to actually /use/ the package before recommending 
it. :-P  (I just found out about it when that article came out, and 
haven't yet had a chance to try it out.)


Good luck with your problem!

Kevin




Re: Performance problem - MySQL at 99.9% CPU

2008-01-08 Thread Kevin Hunter

At 6:47a -0500  on 08 Jan 2008, Gunnar R. wrote:

Concerning slow queries, it seems there's a couple of different queries
being logged.


I haven't tried it yet, but this recently went by on debaday.debian.net:

mytop: a top clone for MySQL

http://debaday.debian.net/2007/12/26/mytop-a-top-clone-for-mysql/

Kevin




Re: Performance problem - MySQL at 99.9% CPU

2008-01-08 Thread Kevin Hunter

At 3:51p -0500 on 08 Jan 2008, Gunnar R. wrote:
That tool tells me 100% of the data is read from memory, not a byte from 
disk... would there still be any point in getting more memory?



Any suggestions to where to go from here?


I dunno.  My hunch is that you could do some query optimization.  Have 
you checked the index usage on your queries?  Although MySQL's internal 
planner is pretty weak for anything more complex than simple statements, 
there are some excellent tools available to help you tune your queries.


http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html

A quick Google with the keywords "mysql index hints query profiler" 
seemed to provide a good jumping off point.
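
The profiler itself is easy to try (MySQL 5.0.37 or later, if memory
serves):

SET profiling = 1;
-- ... run the suspect query here ...
SHOW PROFILES;                 -- recent statements with their durations
SHOW PROFILE FOR QUERY 1;      -- per-stage timing for statement number 1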


HTH,

Kevin




Re: Select mit Subselect Problem

2007-12-17 Thread Kevin Hunter
At 7:45a -0500 on 17 Dec 2007, Marcus Füg wrote:
 [The question, translated from German:]
 Table Sales (s) with loads of point-of-sale data:
 ArtikelNr, Kassenbon, HändlerID, Datum

 Now, for a given purchased article, everything bought along with it
 should be listed as well, i.e. what was purchased together with that
 article.

 So far I have done this with PHP, but at 5 million records in the
 sales data PHP gives up.  Hence my question: can I do this more
 efficiently with MySQL?

 [snip]

 Then the results are summed up.

 Can this be done more efficiently?

1. This list is largely an English list, so you may have better luck
asking your question in English.  Alternatively, you might ask your
question on this list: http://lists.mysql.com/mysql-de

2. How can you use MySQL more efficiently in this scenario?  It depends
on exactly what you're trying to do.  If you can describe the problem
you're solving, rather than your current approach, we may be better able
to help.

That said, I'm guessing you're looking for GROUP BY and ORDER BY.  Take
a gander at the MySQL docs for the version that you are using.  A
starting point:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions-and-modifiers.html
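
For the bought-together question specifically, a self-join sketch
(column names taken from the post; article 4711 is a made-up example):

SELECT s2.ArtikelNr, COUNT(*) AS together
  FROM Sales AS s1
  JOIN Sales AS s2
    ON s2.Kassenbon = s1.Kassenbon
   AND s2.ArtikelNr <> s1.ArtikelNr
 WHERE s1.ArtikelNr = 4711
 GROUP BY s2.ArtikelNr
 ORDER BY together DESC;

At 5 million rows, an index on (ArtikelNr, Kassenbon) will likely make
or break this query.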

Kevin




Re: MySQL Magazine - Issue 1 available NOW!!!!

2007-06-11 Thread Kevin Hunter

At 5:53p -0400 on 11 Jun 2007, Gordan Bobic wrote:
I don't understand this never-ending fascination with re-inventing  
a square wheel for an application for which the standard round type  
has already been kindly provided since year dot.


I imagine the reason for this never-ending fascination ... is along  
the lines of education.  To the uninitiated, the idea of binding  
variables seems weird and unintuitive.  What?!  I don't want them to  
be bound!  They're variables!  They're supposed to change.  Okay, so  
it's a slight exaggeration, but I think the point is still good.  A  
lot of folks pick up MySQL by fiddling around in their spare time.   
Unless someone (among my friends, it's usually me) enlightens them to  
better ways of doing things, and reasons for doing something in a  
more abstract, not-always-immediately-intuitive way, folks just don't  
know any better.  In that sense, this very discussion is /exactly/  
what the magazine article should be creating.


For those who don't know, binding is a process that does two things:

1. Lets the database pre-parse a query

	This can give enormous boosts in speed because a large amount of the 
time involved in getting information from the database is spent in 
parsing and planning a query.  If you're planning to run the same 
query many times over with only one or two different parameters, why 
not parse and plan it once, and then amortize the cost over /all/ the 
executions?


2. Lets the database do any escaping necessary

	The database designers ostensibly know best what characters need to  
be escaped, so let them do it.  Besides, no sense in having multiple  
definitions around, or reinventing a wheel, or wasting your time  
doing something mundane.
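
At the SQL level, the dance looks like this (a sketch; the comments
table is hypothetical):

PREPARE ins FROM 'INSERT INTO comments (author, body) VALUES (?, ?)';
SET @a = 'alice';
SET @b = 'any text, quotes and all';
EXECUTE ins USING @a, @b;    -- the ? slots take data only; they are
DEALLOCATE PREPARE ins;      -- never re-parsed as SQL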


For a client-side (PHP) example, I'll just point to the PHP site:

http://us.php.net/manual/en/function.mysqli-prepare.php

Kevin




Re: Table compression with write (append) support

2007-05-28 Thread Kevin Hunter
At 12:31a -0400 on 28 May 2007, Dan Nelson wrote:
 In the last episode (May 27), Yves Goergen said:
 I'm thinking about using a MySQL table to store an Apache access log
 and do statistics on it. Currently all access log files are stored as
 files and compressed by day. Older log files are compressed by month,
 with bzip2. This gives a very good compression ratio, since there's a
 lot of repetition in those files. If I store all that in a regular
 table, it would be several gigabytes large. So I'm looking for a way
 to compress the database table but still be able to append new rows.
 As the nature of a log file, it is not required to alter previous
 data. It could only be useful to delete older rows. Do you know
 something for that?
 
 You want the ARCHIVE storage engine.
 
 http://dev.mysql.com/doc/refman/5.0/en/archive-storage-engine.html

Huh.  This is the first I've heard of the archive engine.  Cool!

However, I'm curious how the compression offered by OPTIMIZE TABLE and
the zlib library would compare to normalization of the log schema.  In
particular, I imagine a lot of the HTTP requests would be the same, so
you could create a table to store the requested URLs, and then have a
second table with the timestamp and a foreign key relationship into the
first.  Depending on how wide the original rows are and how often
they're requested, I imagine you could get quite a savings.  Anything
else that's repeated as well?  IPs?  Return codes?
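
Sketched out (hypothetical names):

CREATE TABLE request_url (
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL UNIQUE
) ENGINE=InnoDB;

CREATE TABLE access (
    hit_at TIMESTAMP    NOT NULL,
    url_id INT UNSIGNED NOT NULL,         -- 4 bytes replaces a long string
    FOREIGN KEY (url_id) REFERENCES request_url (id)
) ENGINE=InnoDB;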

Would be curious about the results if you were able to implement both.

Kevin




Re: Table compression with write (append) support

2007-05-28 Thread Kevin Hunter
At 5:45a -0400 on 28 May 2007, Yves Goergen wrote:
 On 28.05.2007 09:06 CE(S)T, Kevin Hunter wrote:

 In particular, I imagine a lot of the HTTP requests would be the
 same, so you could create a table to store the requested URLs, and
 then have a second table with the timestamp and foreign key
 relationship into the first.
 
 Interesting idea. Inserting would be more work to find the already
 present dictionary rows.

My guess is not /that/ much work: since you should only have a known and
relatively small set in this dictionary, it'd basically be cached the
whole time.  But, that's my guess.  Haven't tried it.  Practice and
theory . . .

 Also, URLs sometimes contain things like
 session IDs. They're probably not of interest for my use but it's not
 always easy to detect them for removal.

Really?  Why wouldn't it be easy to detect them?  You presumably know
what variable you're looking for in the URL string, and applying a
simple regex search-and-replace . . . ?

 IP addresses (IPv4) and especially return codes (which can be mapped to
 a 1-byte value) are probably not worth the reference. Data size values
 should be too distributed for this.

Well, presumably, you'd normalize that part of the table.  That is,
rather than include multiple foreign keys in your data rows, you'd
create a Cartesian product of the return codes with the dictionary
table.  You'd have a slightly more bloated dictionary, but depending on
the number of requests the site(s) get(s), the aggregation would more
than make up for it.

 I could also parse user agent
 strings for easier evaluation, but this takes away the possibility of
 adding support for newer browsers at a later time. (Well, I could
 update the database from the original access log files when I've
 updated the UA parser.)
 

Same thought.  If you've only a known set of UA strings, you could
normalize them with the dictionary table as well.

 How large is a row reference? 4 bytes?

I don't know; I'm fairly new to MySQL.  I suppose it'd also depend on
the type of index.  Anyone more knowledgeable wanna pipe up?

Well.  Whatever method works for your needs, cool.  I'm going to check
out both MYISAMPACK and ARCHIVE.  I was unaware of those.  Thanks list!

Kevin




Re: IS NULL Question

2007-05-01 Thread Kevin Hunter

At 9:06p -0400 on 30 Apr 2007 John Kebbel wrote:

I'm having problems understanding NULL. I grasp what a NULL value is,


A NULL value is rather an oxymoron.  It'd be more accurate to say  
that NULL means absence of a value.



but I can't get NULL to perform correctly. For instance, if I do a
Select statement, I see that I have columns with a NULL value.

select first, last, suffix from persons LIMIT 5;
+-----------+----------+--------+
| first     | last     | suffix |
+-----------+----------+--------+
| Benjamin  | Page     | NULL   |
| Jonathan  | Watson   | NULL   |
| Jose      | Thorson  | NULL   |
| Alejandro | Nickels  | NULL   |
| Griselda  | Richards | NULL   |
+-----------+----------+--------+
5 rows in set (0.01 sec)

Logically, it seems that a Select statement should find these five plus
any other NULL values in the suffix column.  However, such a select
statement returns an empty set.

mysql> select first, last, suffix from persons where suffix IS NULL;
Empty set (0.00 sec)

Does anyone see what I'm doing wrong? (Thanks in advance for any  
help.)


Caveat emptor: I haven't tested this in about a year.

Are you perchance using a table type of MyISAM?  I seem to recall that
MyISAM has a hard time actually representing NULL internally.
[ ... Thinks for a minute ... ]  I remember something about spaces;
like, I think I did

INSERT INTO myTable VALUES ( val1, val2, NULL );

and was only able to get the tuples back when I did

SELECT * FROM myTable WHERE col3 = ' ';  -- a space character

If this is the case, you might consider using a different table type,  
such as InnoDB.


HTH,

Kevin




Re: IS NULL Question

2007-05-01 Thread Kevin Hunter

Caveat emptor: I haven't tested this in about a year.
Are you perchance using a table type of MyISAM?  I seem to recall
that MyISAM has a hard time actually representing NULL internally.
[ ... Thinks for a minute ... ]  I remember something about spaces;
like, I think I did

INSERT INTO myTable VALUES ( val1, val2, NULL );

and was only able to get the tuples back when I did

SELECT * FROM myTable WHERE col3 = ' ';  -- a space character

If this is the case, you might consider using a different table
type, such as InnoDB.

HTH,
Kevin


MyISAM has no problem representing NULL.


My mistake.  Obviously old data.

Kevin




Re: about limit

2007-03-20 Thread Kevin Hunter

On 20 Mar 2007 at 12:50p -0400, nikos wrote:
Does anybody know how to select not the LIMIT 20 records but the
rest of them?

MySQL version is 4.1.21-standard and I can't make a VIEW.


In Venn-speak, you are asking for Circle A less (minus, or subtract)  
Circle B.  In SQL, this would translate to MINUS.  However, last I  
checked, MySQL does not support (or has not yet implemented) the  
MINUS operator.  This Google finding might prove useful to you in  
this endeavor (scroll to the last entry on the page):


http://www.terminally-incoherent.com/blog/category/mysql/
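
Alternatively, the MySQL manual documents a trick for LIMIT itself:
pass an absurdly large second argument to mean "through the end of the
result".  A sketch, assuming an id ordering:

SELECT * FROM mytable
 ORDER BY id
 LIMIT 20, 18446744073709551615;    -- skip 20 rows, return all the rest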

Kevin




Re: binary into blob

2007-03-07 Thread Kevin Hunter

On 07 Mar 2007 at 3:57p -0500, Alexander Lind wrote:

imagine a large system where pdf-files are accessed by clients a lot.
say 1 pdf file is accessed per second on average.

also say that your database is on a machine separate from the 
webserver(s) (as is common).


do you really think it's a good idea to pump the pdf data from the db 
each time it needs to be accessed?


it may seem easier to do this as you then would not have to replicate 
the physical files across the webserver machines, but i would not say 
that the slight increase of ease is worth the added load to the db and 
the network.
following your reasoning, you'd also store all images in the db so that 
you don't have to replicate these either?


what is unsafe about having a pdf file on a webserver (not in a place 
where the httpd can access it of course)?


alec

[EMAIL PROTECTED] wrote:

I have to disagree with most, I would store the entire file in the
database, metadata and all.  Better security, if you have a backend
database, it's much harder to get the data than pdf's sitting in a
directory on the webserver.  Plus if you ever want to scale to a
multi-webserver environment, the db storage works great.


On Wed, 7 Mar 2007, Ed wrote:

 

On Wednesday 07 March 2007 21:26, Alexander Lind wrote:
   

I would put the pdf as a regular file on the hd, and store the path to
it in the db.
Meta data could be things like the size of the pdf, author, owner,
number of pages etc.

Storing binary data from PDFs or images or any other common binary
format is generally not a good idea.

Alec
  
Thank you all for your help.  I get the gist of things... don't store
the binary, store the path to it and details of it.

Thank you all for your quick response.
 -Ed


Grrr.  All you lazy top-posters!  ;)  It seems to me that a case can be 
legitimately made for both methods of handling BLOBs.  On the one hand, 
where speed and/or efficiency (on many different levels) are the top 
priorities, it'd be good to keep the DB as trim as possible.  On the 
other hand, if security is the top priority, you'd better keep all your 
data encrypted and in the DB.  And as I wrote this, colbey explains this 
well in his email of 4:06p.


It seems to me that the point is to use the correct tool for the job at 
hand.  Computers have many tools, so no sense in turning everything into 
a proverbial nail head.
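
For reference, the store-the-path approach suggested up-thread might
look like this (hypothetical names):

CREATE TABLE document (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    path   VARCHAR(255) NOT NULL,   -- filesystem location, outside the docroot
    author VARCHAR(64),
    pages  INT UNSIGNED,
    bytes  INT UNSIGNED
) ENGINE=InnoDB;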


Kevin
