OLAP Proposal for MySQL

2003-08-25 Thread Philip Stoev
Hi all,

Please tell me if any of this makes sense. Any pointers to relevant
projects/articles will be much appreciated.

Philip Stoev
http://www.stoev.org/pivot/manifest.htm

===

OLAP PROPOSAL FOR MYSQL

The goal is to create an OLAP engine coupled with a presentation layer that
will be easy enough for normal people to use, with no MDX experience
required. While it is probably a fact that Wal-Mart has 70 GB of data, this
does not mean that everyone has such data sets, so the goal is reasonable
performance for reasonably sized datasets. Most people do not join 30 tables
together either. Also, while Wal-Mart presumably engages in extra-complex
calculations to determine its business strategy, most people are content to
know "How much did I sell yesterday?"

I. OLAP ENGINE AND CACHING

The OLAP engine takes a standard SQL query with a GROUP BY clause and
aggregate functions, executes it, and saves the entire resulting dataset in
the cache. A cache index entry is then created, recording the source tables,
the GROUP BY columns, the aggregate functions and the WHERE conditions that
were used.
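Below is a minimal sketch of this population step. Python's built-in sqlite3 module stands in for the MySQL server. The sketch assumes the query arrives already split into its parts (the hypothetical `AggregateQuery` class; no SQL parsing is shown). The `cache_index` schema and the hash-based table name are illustrative choices, not part of the proposal.

```python
import hashlib
import json
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateQuery:
    """Hypothetical pre-parsed form of a GROUP BY query (no SQL parser here)."""
    table: str
    group_by: tuple
    aggregates: tuple          # SQL expressions, e.g. ("COUNT(*)", "SUM(amount)")
    where: str = ""            # raw condition text; "" means no WHERE clause

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.group_by + self.aggregates)} FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql + f" GROUP BY {', '.join(self.group_by)}"


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and make sure the (assumed) cache index table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache_index (cache_table TEXT PRIMARY KEY,"
        " source_table TEXT, group_by TEXT, aggregates TEXT, where_clause TEXT)"
    )
    return conn


def populate_cache(conn: sqlite3.Connection, q: AggregateQuery) -> str:
    """Run q once, keep the whole result in a new table, and index that table."""
    name = "cache_" + hashlib.sha1(q.to_sql().encode()).hexdigest()[:10]
    conn.execute(f'CREATE TABLE "{name}" AS {q.to_sql()}')
    conn.execute(
        "INSERT INTO cache_index (cache_table, source_table, group_by, aggregates,"
        " where_clause) VALUES (?, ?, ?, ?, ?)",
        (name, q.table, json.dumps(q.group_by), json.dumps(q.aggregates), q.where),
    )
    return name
```

The index stores the query's parts rather than its text. This lets a later lookup compare GROUP BY columns and aggregates individually instead of requiring an identical query string.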

Upon execution of further queries, the OLAP engine checks whether the cache
holds a dataset that can be used to answer the query immediately. A cached
dataset can be reused when the new query differs from the cached one only in
the following ways (alone or in combination):

1. The query's GROUP BY columns are equal to, or a subset of, those of the cached query.
So, a query like:
SELECT salesman, state, SUM(sales) FROM company.sales GROUP BY
salesman, state
provides the answer for
SELECT salesman, SUM(sales) FROM company.sales GROUP BY salesman

2. The query's WHERE clause is equal to, or more restrictive than, the WHERE
clause of a cached query, and any additional conditions use only columns that
were in the cached query's GROUP BY.
A query like:
SELECT date, salesman, SUM(sales) FROM company.sales
WHERE date > '2003-01-01' GROUP BY date, salesman
provides the answer for:
SELECT date, salesman, SUM(sales) FROM company.sales
WHERE date > '2003-01-01' AND date < '2003-06-01' GROUP BY date, salesman
A human will not necessarily phrase the WHERE clause this way; however, a
graphical pivot tool may be explicitly designed to generate such a query
(repeating the cached condition and appending the new one) when drilling
down, so that a cache hit is scored.

3. The query's source tables are equal to, or a subset of, the cached query's
source tables.
So, the query:
SELECT salesman, gender, SUM(sales) FROM company.sales INNER JOIN salesman
USING (salesman_id) GROUP BY salesman, gender
or even something very complex with 10 joined tables, can be used to answer:
SELECT salesman, SUM(sales) FROM company.sales GROUP BY salesman
or a query joining, say, 5 of those 10 tables. This holds only if the extra
joins neither drop nor duplicate rows, e.g. when every sales row matches
exactly one salesman row.

4. The query's aggregate functions are equal to, or a subset of, the cached
query's. Some aggregate functions, like COUNT(DISTINCT), cannot be answered
from the cache, and others require special care: COUNT(*) must be rolled up
with SUM, and AVG(value) must be translated to SUM(value)/COUNT(value).
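The four rules can be combined into a single hit test. The sketch below rests on simplifying assumptions:

- A WHERE clause is an AND of simple (column, operator, literal) conditions.
- "More restrictive" is detected only syntactically: the new query must contain every cached condition verbatim, plus extra ones. So date > '2003-02-01' would not be recognized as narrower than date > '2003-01-01'.
- Rule 3 is only safe when the extra joins neither drop nor duplicate rows, and this code does not check that.

`QueryShape`, `REAGGREGATE` and the function names are hypothetical.

```python
from dataclasses import dataclass

# A condition is (column, operator, literal); a WHERE clause is an AND of these.
# An aggregate is (function, argument), e.g. ("SUM", "sales") or ("COUNT", "*").
REAGGREGATE = {"SUM", "COUNT", "MIN", "MAX"}   # can be rolled up from a cached column


@dataclass(frozen=True)
class QueryShape:
    tables: frozenset
    group_by: frozenset
    aggregates: frozenset
    where: frozenset = frozenset()


def aggregates_available(wanted, cached) -> bool:
    """Rule 4: every wanted aggregate must be derivable from cached columns."""
    for func, arg in wanted:
        if func == "AVG":                      # AVG(x) becomes SUM(x) / COUNT(x)
            if ("SUM", arg) not in cached or ("COUNT", arg) not in cached:
                return False
        elif func not in REAGGREGATE or (func, arg) not in cached:
            return False                       # e.g. COUNT(DISTINCT x) is never reusable
    return True


def can_answer(query: QueryShape, cached: QueryShape) -> bool:
    """True if the cached result can answer the query under rules 1 to 4."""
    extra = query.where - cached.where         # conditions the query adds
    return (
        query.group_by <= cached.group_by                          # rule 1
        and cached.where <= query.where                            # rule 2: same or stricter
        and all(col in cached.group_by for col, _, _ in extra)     # rule 2: filterable
        and query.tables <= cached.tables                          # rule 3 (unchecked caveat)
        and aggregates_available(query.aggregates, cached.aggregates)  # rule 4
    )
```

In practice the engine would evaluate `can_answer` against each index entry and pick, for example, the smallest matching cached table.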

The benefit of such a cache implementation is that it is data-independent.
You do not have to describe your data before executing your queries. It
also does not require a custom cache structure or a custom cache index: a
few ordinary tables can hold the cache index, and these can themselves be
queried with SQL to determine a hit.
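Here is a sketch of querying the index with SQL itself, with sqlite3 standing in for MySQL. The two-table schema (`cache_index`, `cache_columns`) and the helper names are assumptions. The lookup handles only exact WHERE matches. It searches for a cached table that contains every needed group-by and aggregate column, and `GROUP BY ... HAVING COUNT(DISTINCT ...)` requires all of them to be present.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cache_index   (cache_table TEXT PRIMARY KEY, source_table TEXT, where_clause TEXT);
CREATE TABLE cache_columns (cache_table TEXT, column_name TEXT);
"""


def register(conn, cache_table, source_table, where_clause, columns):
    """Record a cached result table and every column (group-by or aggregate) it holds."""
    conn.execute("INSERT INTO cache_index VALUES (?, ?, ?)",
                 (cache_table, source_table, where_clause))
    conn.executemany("INSERT INTO cache_columns VALUES (?, ?)",
                     [(cache_table, c) for c in columns])


def find_hit(conn, source_table, where_clause, needed_columns):
    """Return a cached table with all needed columns for the same source and WHERE, or None."""
    needed = sorted(set(needed_columns))
    marks = ", ".join("?" for _ in needed)
    row = conn.execute(
        f"""SELECT i.cache_table
              FROM cache_index AS i
              JOIN cache_columns AS c ON c.cache_table = i.cache_table
             WHERE i.source_table = ? AND i.where_clause = ?
               AND c.column_name IN ({marks})
             GROUP BY i.cache_table
            HAVING COUNT(DISTINCT c.column_name) = ?
             ORDER BY i.cache_table
             LIMIT 1""",
        (source_table, where_clause, *needed, len(needed)),
    ).fetchone()
    return row[0] if row else None
```

For instance, after `register(conn, "c1", "sales", "", ["salesman", "state", "COUNT(*)"])`, the call `find_hit(conn, "sales", "", ["state", "COUNT(*)"])` finds `c1`.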

If an interactive pivoting tool is executing those queries, the cache should
(hopefully) soon fill with entries that allow most, if not all, of the
queries resulting from interactive browsing to be served from the cache.
Additionally, the tool can pre-fetch relevant data by drilling down a bit
further than the user has requested, resulting in a cache hit when the user
does drill deeper. Also, the tool does not have to cache data in order to
sort it on its own, since queries that differ only in their ORDER BY clause
can be served from the same cached table. A further enhancement would be the
ability to answer a query by combining more than one cached table.

Example:

A. No cache hit, so we just populate the cache
Initial query:
SELECT salesman, state, COUNT(*) FROM sales GROUP BY salesman, state
The server does:
CREATE TABLE `1234567` SELECT salesman, state, COUNT(*) FROM sales
GROUP BY salesman, state
SELECT * FROM `1234567`

B. A cache hit
Next query:
SELECT state, COUNT(*) FROM sales GROUP BY state
The server does:
SELECT state, SUM(`COUNT(*)`) AS `COUNT(*)` FROM `1234567` GROUP BY state
[`COUNT(*)` is the column name that CREATE TABLE ... SELECT gives to the
COUNT(*) expression in table `1234567`; the table name must be quoted
because it consists only of digits]
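The example can be replayed end to end with Python's sqlite3 module. SQLite accepts MySQL-style backtick quoting, and it uses CREATE TABLE ... AS SELECT where MySQL uses CREATE TABLE ... SELECT. The sample rows are made up. The point is that rolling the cached counts up with SUM gives the same answer as running the second query directly against `sales`.

```python
import sqlite3


def replay_example():
    """Replay steps A and B; return (answer served from cache, answer computed directly)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (salesman TEXT, state TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("ann", "CA", 10.0), ("ann", "CA", 5.0), ("ann", "NY", 7.0),
         ("bob", "NY", 3.0), ("bob", "TX", 8.0), ("cid", "TX", 1.0), ("cid", "TX", 4.0)],
    )

    # A. Cache miss: materialise the full grouped result under a generated name.
    conn.execute(
        "CREATE TABLE `1234567` AS "
        "SELECT salesman, state, COUNT(*) AS `COUNT(*)` FROM sales GROUP BY salesman, state"
    )

    # B. Cache hit: coarser GROUP BY, so the cached counts are rolled up with SUM.
    from_cache = conn.execute(
        "SELECT state, SUM(`COUNT(*)`) AS `COUNT(*)` FROM `1234567` GROUP BY state ORDER BY state"
    ).fetchall()
    direct = conn.execute(
        "SELECT state, COUNT(*) FROM sales GROUP BY state ORDER BY state"
    ).fetchall()
    conn.close()
    return from_cache, direct


from_cache, direct = replay_example()
print(from_cache)  # [('CA', 2), ('NY', 2), ('TX', 3)]
```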

II. DATA DESCRIPTION AND MANIPULATION

1. In my humble opinion, people do not think in MDX. Instead, they think in
terms of GROUP BY. So, for most uses, it should be sufficient to let the
user construct his own GROUP BY clause and specify the aggregate functions
he is interested in, rather than asking him to create a cube, an axis, a
view, a measure, and so on.
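On the pivot-tool side, the user's choices can be turned into such a GROUP BY query mechanically. Below is a sketch with a hypothetical `build_pivot_query` helper. Identifiers are checked against a conservative pattern and quoted with backticks, MySQL-style. Only the listed aggregate functions are allowed.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ALLOWED = {"SUM", "COUNT", "MIN", "MAX", "AVG"}


def quote(name: str) -> str:
    """Backtick-quote a plain identifier, rejecting anything unusual."""
    if not IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_pivot_query(table, dimensions, measures):
    """dimensions: columns to GROUP BY; measures: (function, column) pairs.
    COUNT may take "*"; everything else must be a plain column name."""
    select = [quote(d) for d in dimensions]
    for func, col in measures:
        func = func.upper()
        if func not in ALLOWED:
            raise ValueError(f"unsupported aggregate: {func}")
        arg = "*" if (func == "COUNT" and col == "*") else quote(col)
        select.append(f"{func}({arg})")
    sql = f"SELECT {', '.join(select)} FROM {quote(table)}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(quote(d) for d in dimensions)
    return sql


print(build_pivot_query("sales", ["salesman", "state"], [("sum", "sales"), ("COUNT", "*")]))
# SELECT `salesman`, `state`, SUM(`sales`), COUNT(*) FROM `sales` GROUP BY `salesman`, `state`
```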

2. People also think in terms of everyday phrases, like "last 7 days" or
"all Mondays". A pre-compiled dictionary of such phrases will be immensely
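A sketch of what such a phrase dictionary could look like. It assumes a column named `date`, MySQL's DAYOFWEEK() (1 = Sunday, so Monday is 2), and one possible reading of "last 7 days" (the seven days ending today). The phrases and the `phrase_to_where` helper are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical dictionary: phrase -> function(today) -> MySQL WHERE condition.
PHRASES = {
    "yesterday": lambda today: f"`date` = '{today - timedelta(days=1)}'",
    # Assumed reading: the seven days ending today, today included.
    "last 7 days": lambda today: (
        f"`date` > '{today - timedelta(days=7)}' AND `date` <= '{today}'"
    ),
    # MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday.
    "all mondays": lambda today: "DAYOFWEEK(`date`) = 2",
}


def phrase_to_where(phrase: str, today: date) -> str:
    """Translate an everyday phrase into a WHERE condition, relative to today."""
    key = " ".join(phrase.lower().split())   # normalise case and spacing
    if key not in PHRASES:
        raise ValueError(f"unknown phrase: {phrase!r}")
    return PHRASES[key](today)


print(phrase_to_where("Last 7 days", date(2003, 8, 25)))
# `date` > '2003-08-18' AND `date` <= '2003-08-25'
```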

3.22.32-log Hang when unable to create log files

2001-11-30 Thread Philip Stoev

If the MySQL server is unable to create the mysql.log log file, the server
hangs in an indeterminate state. No children are forked, and clients can
connect but are not serviced at all.

I know an error is written to the log (or to STDERR when mysqld is run with
STDERR logging); however, in my humble opinion it would be more appropriate
for the server to exit immediately so that the various monitoring facilities
can detect the problem. Right now the mysql TCP port remains open and the
parent process continues to run, which may fool monitoring software.
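A monitoring check that only looks at the open TCP port is fooled in exactly this way. A MySQL server sends its handshake packet first, right after accepting a connection, so a slightly stronger probe waits for those first bytes. The sketch assumes that the hung server sends nothing to new clients. That fits "not serviced at all" but is not verified here. The function name and the default timeout are arbitrary.

```python
import socket


def mysql_greets(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """True if the peer sends at least one byte (the server greeting) in time.

    An open port alone is not enough: a hung server may accept connections
    without ever sending its handshake packet.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return len(sock.recv(4)) > 0    # b"" means the peer closed the connection
    except OSError:                         # refused, unreachable or timed out
        return False
```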

This was observed on a 3.22.32-log under Debian 2.2 (Linux server
2.2.18pre21 #1 Sat Nov 18 18:47:15 EST 2000 i686 unknown)

Sorry if this is known or has been fixed. I did not have the opportunity to
test or follow appropriate procedures.

Philip





-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)
