Re: [GENERAL] designing time dimension for star schema

2014-02-15 Thread Mark Wong
On Mon, Feb 10, 2014 at 8:45 AM, Mark Wong  wrote:
> Hello everybody,
>
> I was wondering if anyone had any experiences they can share when
> designing the time dimension for a star schema and the like.  I'm
> curious about how well it would work to use a timestamp for the
> attribute key, as opposed to a surrogate key, and populating the time
> dimension with triggers on insert to the fact tables.  This is
> something that would have data streaming in (as opposed to bulk
> loading) and I think we want time granularity to the minute.

Hello everybody,

I did a simple experiment and just wanted to share; hopefully it
wasn't too simple.  On a 72GB 15K rpm 2.5" drive, I measured how long
it took to insert (committing after each insert) 100,000 rows of
bigints, of timestamps with time zone, and of timestamps with time
zone with the insert trigger.  The timestamps and bigints by
themselves each took ~10 minutes to insert 100,000 rows, and adding
the trigger increased that to roughly 11 minutes.
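As a rough sanity check on those numbers (my arithmetic, not part of the original post): committing after every insert makes each row wait on a flush to disk, so on a 15K rpm spindle the commit rate is plausibly bounded by rotational latency.

```python
# Back-of-the-envelope check on the reported insert rate.
# All figures below are taken from the post; the interpretation is mine.
SECONDS = 10 * 60        # ~10 minutes for the bigint/timestamp runs
ROWS = 100_000

per_insert_ms = SECONDS / ROWS * 1000
print(f"{per_insert_ms:.1f} ms per committed insert")   # 6.0 ms

# A 15,000 rpm drive completes one revolution in:
rev_ms = 60 / 15_000 * 1000
print(f"{rev_ms:.1f} ms per revolution")                # 4.0 ms
```

About 6 ms per commit against a ~4 ms rotational period is in the right ballpark for one synchronous WAL flush per transaction, which suggests the experiment was commit-bound rather than trigger-bound.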

Regards,
Mark


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] designing time dimension for star schema

2014-02-10 Thread Mark Wong
On Mon, Feb 10, 2014 at 9:20 AM, CS DBA  wrote:
> I've done a lot of DSS architecture. A couple of thoughts:
>
> - in most cases the ETL process figures out the time id's as part of the
> preparation and then does bulk loads into the fact tables
>   I would be very concerned about performance of a trigger that
> fired for every row on the fact table
>
>   you mention you want to do data streaming instead of bulk loads,
> can you elaborate?

We have processes inserting data from log files as they are written.

> - When querying a star schema one of the performance features is the
> fact that all joins to the dimension tables are performed via a numeric
> key, such as:
> "select * from fact, time_dim, geo_dim
>where fact.time_id = time_dim.time_id..."
>
> In the case of this being a timestamp I suspect the performance would
> take a hit, depending on the size of your fact table and the
> scope/volume of your DSS queries this could easily be a show stopper
> based on the assumption that the database can do a numeric binary search
> much faster than a timestamp search

I guess I was hoping the extra 4 bytes of a timestamp, compared to a
4-byte integer, wouldn't be too significant (a timestamp with time
zone and a bigint are both 8 bytes), and I hadn't considered that
postgres might do a binary search faster on an integer type than on a
timestamp, even with 1 billion rows.  Maybe that's wishful thinking,
and a regular 4-byte integer would be good enough.  I would estimate a
query would touch on the order of 1 million rows at a time.
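One point in favor of the 4-byte key (my arithmetic, not from the thread): at minute granularity, the key space is small enough that a plain integer never comes close to overflowing.

```python
# How many minute-grain dimension keys exist, and how long would a
# 4-byte signed integer key last?  (Ignoring leap years for a rough bound.)
minutes_per_year = 365 * 24 * 60
print(minutes_per_year)              # 525600 minute-grain rows per year

int4_max = 2**31 - 1                 # max value of a signed 4-byte integer
years_covered = int4_max // minutes_per_year
print(years_covered)                 # ~4085 years of minute keys
```

So key-space exhaustion is a non-issue either way; the decision comes down to index width and comparison cost, not range.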

Regards,
Mark




[GENERAL] designing time dimension for star schema

2014-02-10 Thread Mark Wong
Hello everybody,

I was wondering if anyone had any experiences they can share when
designing the time dimension for a star schema and the like.  I'm
curious about how well it would work to use a timestamp for the
attribute key, as opposed to a surrogate key, and populating the time
dimension with triggers on insert to the fact tables.  This is
something that would have data streaming in (as opposed to bulk
loading) and I think we want time granularity to the minute.

A simplified example:

-- Time dimension
CREATE TABLE time (
    datetime TIMESTAMP WITH TIME ZONE NOT NULL,
    day_of_week SMALLINT NOT NULL
);
CREATE UNIQUE INDEX ON time (datetime);

-- Fact
CREATE TABLE fact (
    datetime TIMESTAMP WITH TIME ZONE NOT NULL,
    FOREIGN KEY (datetime) REFERENCES time (datetime)
);

-- Function to populate the time dimension
CREATE OR REPLACE FUNCTION decompose_timestamp() RETURNS TRIGGER AS $$
BEGIN
    -- Truncate the incoming fact timestamp to minute granularity.
    NEW.datetime = date_trunc('minutes', NEW.datetime);
    INSERT INTO time (datetime, day_of_week)
    VALUES (NEW.datetime, date_part('dow', NEW.datetime));
    RETURN NEW;
EXCEPTION
    WHEN unique_violation THEN
        -- Do nothing if the timestamp already exists in the dimension table.
        RETURN NEW;
END; $$
LANGUAGE plpgsql;

CREATE TRIGGER populate_time BEFORE INSERT
ON fact FOR EACH ROW
EXECUTE PROCEDURE decompose_timestamp();
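For anyone following along without a database handy, the trigger's behavior can be sketched in plain Python (a hand-rolled illustration I'm adding, not the PL/pgSQL itself): truncate each incoming timestamp to the minute, and add a dimension row only when that minute hasn't been seen before.

```python
from datetime import datetime

time_dim = {}   # datetime -> day_of_week; stands in for the "time" table
fact = []       # truncated timestamps; stands in for the "fact" table

def insert_fact(ts: datetime) -> None:
    """Mimic the BEFORE INSERT trigger: truncate, then populate the dimension."""
    ts = ts.replace(second=0, microsecond=0)   # like date_trunc('minutes', ...)
    # isoweekday() gives Mon=1..Sun=7; date_part('dow') uses Sun=0..Sat=6
    dow = ts.isoweekday() % 7
    time_dim.setdefault(ts, dow)               # swallow the "unique_violation"
    fact.append(ts)

insert_fact(datetime(2014, 2, 10, 8, 45, 30))
insert_fact(datetime(2014, 2, 10, 8, 45, 59))  # same minute: no new dim row
print(len(fact), len(time_dim))                # 2 1
```

Two facts in the same minute share one dimension row, which is exactly the deduplication the `unique_violation` handler provides.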

Regards,
Mark




[GENERAL] Fwd: PDXPUG Day at OSCON 2010

2010-05-25 Thread Mark Wong
It was recommended to me to forward this to -general.

Regards,
Mark

-- Forwarded message --
From: Mark Wong 
Date: Tue, May 18, 2010 at 6:57 AM
Subject: PDXPUG Day at OSCON 2010
To: pgsql-annou...@postgresql.org


Thanks to the generosity of O'Reilly, we will be having a full day of
free PostgreSQL sessions on Sunday, July 18 at the Oregon Convention
Center.  Location details and schedule information can be found on the
wiki at:

http://wiki.postgresql.org/wiki/PDXPUGDay2010

We will ask for a $30 donation towards PostgreSQL at the conference,
but no one will be turned away. Sign up here:

https://spreadsheets.google.com/viewform?hl=en&formkey=dDVBRnJGWVlZRkdycFdXbXVuYTNiU2c6MQ

Please submit your talk proposal here:

http://spreadsheets.google.com/viewform?hl=en&formkey=dHBFMGFIWmxJUzhRM3R6dXVlWWxYQ1E6MQ.

Proposals will be decided upon by June 7th, and the wiki will be
updated accordingly.



Re: [GENERAL] [pgsql-advocacy] OLS BOF for linux & postgresql

2005-01-28 Thread Mark Wong
On Fri, Jan 28, 2005 at 09:10:07PM -0500, Robert Bernier wrote:
> On January 28, 2005 05:06 pm, Mark Wong wrote:
> > I wanted to bounce the idea of a BOF at the Linux Symposium in
> > Ottawa and see if anyone would like to attend.  The deadline for
> > proposals is Feb 1st, sort of short notice...  I thought the discussion
> > could revolve around these two topics:
> >
> > Linux features that PostgreSQL should take advantage of.
> > Where PostgreSQL stresses Linux now.
> >
> >
> 
> Isn't the Linux symposium mostly esoteric kernel stuff?

Perhaps some. ;)  You can see the past proceedings here:
http://www.linuxsymposium.org/2005/proceedings.php

Last year there were papers about async I/O, speeding up the page
cache, and scaling Linux.

Mark

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[GENERAL] OLS BOF for linux & postgresql

2005-01-28 Thread Mark Wong
I'm told these lists were more appropriate for this kind of
announcement than hackers, so here it is.

- Forwarded message from Mark Wong <[EMAIL PROTECTED]> -

Date: Tue, 25 Jan 2005 16:59:16 -0800
To: pgsql-hackers@postgresql.org
From: Mark Wong <[EMAIL PROTECTED]>
Subject: [HACKERS] OLS BOF for linux & postgresql

I wanted to bounce the idea of a BOF at the Linux Symposium in
Ottawa and see if anyone would like to attend.  The deadline for
proposals is Feb 1st, sort of short notice...  I thought the discussion
could revolve around these two topics:

Linux features that PostgreSQL should take advantage of.
Where PostgreSQL stresses Linux now.


Thoughts, ideas, etc. are welcome.

For more information on the Linux Symposium:
http://www.linuxsymposium.org/2005/

Mark

- End forwarded message -



Re: [GENERAL] [HACKERS] OSDL DBT-2 for PostgreSQL

2003-08-01 Thread Mark Wong
On Fri, Aug 01, 2003 at 05:05:18PM -0700, Josh Berkus wrote:
> Mark,
> 
> > I've just got our DBT-2 workload (TPC-C derivative) working with
> > PostgreSQL using C stored functions and libpq.  I'd love to get some
> > feedback.
> 
> I'm confused.   Jenny Zhang just announced OSDL-DBT3 for Postgres; is this a 
> different test or does one of you have the name wrong?

Yeah, this is a different test.  DBT-3 is based on TPC-H, and DBT-2 is
based on TPC-C.

Mark
