Re: Datawarehousing

Gelu Wed, 13 Mar 2002 06:16:11 -0800

"DIVIDE ET EMPERA"

Hi,


OK.... you wish to discuss about CONCEPTUAL point of view.

You "said": Partitioning data and Parallelism.
We begin to talk about this principle, but i feel,.... something is missing
here.Ohhh...yeaaa... is "calling" DISTRIBUTION DATA.And if we talk about how
or what tasks/process we need(complementary, further from what offered any
RDBMS
engine) to distribute data in our HETEROGENEUS/HOMOGENEOUS SYSTEM DATABASE
we must add in our discussion and:
-HOW ORGANIZE DATA FLOW?
-HOW ORGANIZE WORK FLOW DEPEND ON THE DATA DISPOSITION?
-AVOIDING "POINTS" WHERE BOYCE-CODD RULES ARE COMPROMISE?
-HOW WE CREATE FACILITIES FOR ONLINE TRASACTION PROCESSING ?
-HOW WE PROCESSING MULTIDIMENSIONAL DATA?
-HOW WE CREATE A MULTI-TIER APPLICATION?
-HOW WE CREATE A DRIVER DATABASE?
-HOW WE CREATE A SEMAPHORE SYSTEM?
-HOW ESTABLISH A LEVEL PRIORITY OF PROCESSES?
....
Now... we can talk about partitioning(serialize,recursive...) and
parallelism interdependence.We
cannot make any dissociation.When we think what table must "split"
(especially when we have mirroring database) on
whatever
database on whatever host machine, we must already know in what "block" of
our database system network we wish to "balance" some processes.Also we must
think of what "ways offering" for users to read/write and processing data
properly.
This mean :
    a- create a database with "rules" about how must work the entire
system.(Configuration Database)
    b- create a Server Application who manage User requirment and interact
with DataBases(that level is between User and DataBase).This is 3 tier
application.
   -This server must doing at least:
        -run an administrative process and maintain tables from our
databases on whatever host machine depend on configuration.
        -User access "gate" management.(In configuration database you can
"put"  parameter MAX_USERS_ACCESS_ALLOW_PER_DATABASE=10.000.In this case
this task can check what database are less then 10.000 and "guide" the User
to the
available database for working).
        -User requirment interpretor
        -SQL generator.
        ...etc.

In this case we can "write" our own function MERGE(Apropos ... and mySQL
"know"
about MERGE beginning with Version 3.23.25).

ie.
   We use two type of table "work table" and "historical table".Why? Because
we wish to prevent working with very huge tables(tens or few hundred of
GB/table.Is possible to reach terabytes if we talk about database(s)
size)especially when we need OLTP(in my opinion, mySQL is a "power" tools
for OLTP but only in this conditions).
    So... we create the "fact" table and let's say in configuration database
exist the parameter RENAME_TABLE_WHEN_THE_MONTH_CHANGE_AND_CREATE_OTHER=TRUE
    The "Administrative"(i make reference to "3 tier Server Application")
process check if this table is in conformity with
parameter aforementioned
and if  has, we must do something in this point :
1. retrieve summary data from this table and "put on apart" in a
maintenance database in a table who name is "table_summary_description".This
data can be :MAX(sales),SUM(sales),MIN(sales)...(what we considered can be
criteria searching)etc., but most important is the new suffix of the new
name of the table who should be
create(concat("fact","_","april2001")=fact_april2001) and the
MIN(ID),MAX(ID).Now User can ask :"In what month i have "thin"
sales?".Server Application invoke :"User requirement interpretor"and after
this, "SQL
generator" create the SQL(we are subscribed in the mysql mailing list;i
use the mysql syntax):
 "select suffix_table_name,MIN(sales) from table_summary_description group
by sales limit 1;"
Application Server take the result and is "send" to the user.
    On the others cases is when User ask:"Give me sales from '05/05/2001'"
or "Give me sales from '05/05/2001' to 06/05/2001".Scenario is the same, but
further, the first result is "taking" by "SQL generator" and make another
SQL:
 "select * from fact_april2001 where ID='05/05/2001'"
    or  :
    ... if we using MERGE this "fact_merge" must specify in maintenance
database.(I use my own MERGE).
    In mySQL it's very simple to create and it work fine(i tested under
Linux):
        "create table fact_merge(id date,sales numeric(10,2))
union=(fact_april2001,fact_may2001);"
        "flush tables;"
        ...E VOILA...
        ...if you wish to add other table we can use:
        "alter table fact_merge
union(fact_april2001,fact_may2001,fact_june2001);
        "flush tables;"
        and also if we wish to "exclude" tables we can use:
         "alter table fact_merge
union(fact_april2001,fact_june2001);
        "flush tables;"
       ...and work very good if we don't insert/delete data from this
tables(like merge i mean).If we wish to delete some data we can create a
FLAG field to mark if  row there is VALID or NOT.(I prefer this way because
i "see" OS "reaction".The OS "eat" a lot of resources when must working with
the physical files.And i think is normal - I/O operations.Once/week we can
stop a  "block" of database and delete from it the information who are
marked like NOT VALID)

2.rename the fact table to fact_april2001.
3.create other table fact with default ID=(fact_april2001.MAX(ID) if is the
same day --for MERGE reason--.

This what i think and what i doing at least 1 years and a half and results
are very good(...for me).I use heterogeneus database with : mySQL ver
3.23.41,INFORMIX 7.x and ORACLE 9.0.1(not IAS).
I doing my best to be explicit.If you find some confusion in my sentences, i
apologize for my poor english.Also if you find this e-mail too boring (or
stupid)...forget it ....else ....

regards.

----- Original Message -----
From: Oscar Colino <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, March 08, 2002 2:29 PM
Subject: Re: Datawarehousing


> Hi,
> regarding the size, I would be more interested on the space in Gbytes
>
> I am use to work with Oracle,Teradata, DB2 ... and they all handle
> partitioning in different ways, when it comes to a big datawarehouse ( and
I
> am talking about Terabytes), there are two key features that need to be
> handle by the RDBMS system:
> 1. Paritioning of data
> 2. Parallelism (being able to split the workload of a query between
several
> processes , all accessing at the same time different blocks of data of the
> same table located in the same hard-disk or not), this feature can make a
> full table scan of a terabyte talbe to be procesed pretty quick),
otherwise
> it would not be realistic
>
> Regarding partitioning below there is an example of what I mean (this is
> basically the way Oracle works in 8.1.x)
>
> extracted from a previous email:
>
> ... if I use the MERGE thing, will the sql parser be able to access only
the
> right partition(table)
>
> ie:
>
> Imagine we partition by month a fact table containing data for two months
so
> we end up with 2 tables :
> fact_april and fact_may
> and then we merge them into on fact table called fact_merge
>
>   this is the squema design:
>     lu_month(month_id)
>     lu_day(day_id,month_id)
>     fact_may2001(day_id,sales)
>     fact_april2001(day_id,sales)
>     fact_merge (day_id,sales)  // this one is the result of mergin
fact_may
> and fact_april
>
>   now if I run the following query:
>
>    select sum(sales) from fact_merge where day_id='05/05/2001'
>
>   or
>
>    select sum(sales) from fact_merge where day_id between '05/05/2001' and
> '06/05/2001'
>
>   or
>
>    select sum(sales) from fact_merge where day_id in (select day_id from
> lu_month where month_id=200105)
>
>
>
>   will this queries be redirected by the MySql parser to the  fact_may2001
> table or will MySql scan the full fact_merge
>   table (basically all partitions)
>
>
>   This is the way most of the RDBMS systems supporting partitioning work,
> otherwise you would have to set the partitioning
>   logic on the application side.
>
>
>   thanks.
>
>
> >From: "Gelu" <[EMAIL PROTECTED]>
> >To: "Oscar Colino" <[EMAIL PROTECTED]>
> >Subject: Re: Datawarehousing
> >Date: Fri, 8 Mar 2002 15:48:16 +0200
> >
> >Hi,
> >
> > > what is the size of the warehouse?
> >The size of the Finish Goods Warehouse are : 40 types and manage cca.
> >300.000 pieces.
> >
> > > how do you handle partitioning logic on the database?
> >...IT'S MADNESS.
> >From what point of view are you interested ?
> >
> >
> >
> >----- Original Message -----
> >From: Oscar Colino <[EMAIL PROTECTED]>
> >To: <[EMAIL PROTECTED]>
> >Cc: <[EMAIL PROTECTED]>
> >Sent: Friday, March 08, 2002 12:37 PM
> >Subject: Re: Datawarehousing
> >
> >
> > >
> > > what is the size of the warehouse?
> > > how do you handle partitioning logic on the database?
> > >
> > >
> > > >From: "Gelu" <[EMAIL PROTECTED]>
> > > >To: "Oscar Colino" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> > > >Subject: Re: Datawarehousing
> > > >Date: Fri, 8 Mar 2002 14:04:29 +0200
> > > >
> > > >Yes....
> > > >----- Original Message -----
> > > >From: Oscar Colino <[EMAIL PROTECTED]>
> > > >To: <[EMAIL PROTECTED]>
> > > >Sent: Friday, March 08, 2002 11:52 AM
> > > >Subject: Datawarehousing
> > > >
> > > >
> > > > > Anybody out there using Mysql as a Datawarehouse?
> > > > >
> > > > > _________________________________________________________________
> > > > > Get your FREE download of MSN Explorer at
> > > >http://explorer.msn.com/intl.asp.
> > > > >
> > > > >
> > > > >
> >---------------------------------------------------------------------
> > > > > Before posting, please check:
> > > > >    http://www.mysql.com/manual.php   (the manual)
> > > > >    http://lists.mysql.com/           (the list archive)
> > > > >
> > > > > To request this thread, e-mail
<[EMAIL PROTECTED]>
> > > > > To unsubscribe, e-mail
> > > ><[EMAIL PROTECTED]>
> > > > > Trouble unsubscribing? Try:
> >http://lists.mysql.com/php/unsubscribe.php
> > > > >
> > > > >
> > > >
> > > >
> > > >---------------------------------------------------------------------
> > > >Before posting, please check:
> > > >    http://www.mysql.com/manual.php   (the manual)
> > > >    http://lists.mysql.com/           (the list archive)
> > > >
> > > >To request this thread, e-mail <[EMAIL PROTECTED]>
> > > >To unsubscribe, e-mail
> > > ><[EMAIL PROTECTED]>
> > > >Trouble unsubscribing? Try:
http://lists.mysql.com/php/unsubscribe.php
> > > >
> > >
> > >
> > > _________________________________________________________________
> > > Send and receive Hotmail on your mobile device: http://mobile.msn.com
> > >
> > >
> > > ---------------------------------------------------------------------
> > > Before posting, please check:
> > >    http://www.mysql.com/manual.php   (the manual)
> > >    http://lists.mysql.com/           (the list archive)
> > >
> > > To request this thread, e-mail <[EMAIL PROTECTED]>
> > > To unsubscribe, e-mail
> ><[EMAIL PROTECTED]>
> > > Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> > >
> > >
> >
>
>
> _________________________________________________________________
> Send and receive Hotmail on your mobile device: http://mobile.msn.com
>
>








---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: Datawarehousing

Reply via email to