On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <les...@lsces.co.uk> wrote:

> Tommy Pham wrote:
>
>> I wonder ... The real question is what's the purpose of the DB?  Is it for
>> OLAP
>> or OLTP? ;)
>> As for dealing with DB having millions of rows, you're crossing over into
>> DBA area.
>>
>
> Many of my customers have coming up on 20 years of data available. There
> has been a debate on transferring historic data to a separate database, but
> having it available is not causing a problem, except for some counts and
> larger search actions, and being able to see how long a client has been
> visiting is often useful. Statistical analysis is always done on a separate
> machine, with a replicated copy of the data, so as not to affect the active
> users ...
>
>
What kind of counts/filters?  What kind of RAID subsystem is the storage?
What's the total size of the DB?  Up to 20 years of data should be in the
peta range.  In that peta range, if you're not having performance issue and
not using either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
application and database design in details. :)


> That said, we are now using the much more detailed LLPG address data rather
> than simple postcode, and that has added another order of magnitude to data
> that is being searched live ...
>
>
> --
> Lester Caine - G8HFL
> -----------------------------
> Contact - 
> http://lsces.co.uk/wiki/?page=**contact<http://lsces.co.uk/wiki/?page=contact>
> L.S.Caine Electronic Services - http://lsces.co.uk
> EnquirySolve - http://enquirysolve.com/
> Model Engineers Digital Workshop - http://medw.co.uk//
> Firebird - 
> http://www.firebirdsql.org/**index.php<http://www.firebirdsql.org/index.php>
>
>
Since Jason didn't disclose sufficient information, I suggested that RAID
storage based on the worst case scenario being this:  as a business analyst,
I'd want to do drill-downs, filters, counts of a the following for an
up-coming marketing campaign:

* county(ies)
* city(ies)
* zip code(s)
* address type (business and/or residential)
* of business (commercial, industrial/manufacturing, etc)
* of residential ( homes - single/townhouses - or apartments/condos )

The filters and counts will any combinations of the above, ie: 5 random zip
codes within the given state that lists all other criteria and break down.
As Jason mention having 89million rows for a given state, how long would it
take to run the drill-downs if the DB isn't sitting on a fast storage
medium?  That 89 million is the most likely the average count in the USA.
For California and New York, the number can double that easily.  That's only
the basic filtering.  What of a specific business industry such as
landscaping?  What of the filtering by yearly income and/or real estate
value?  BTW, as a business analyst, I don't want to wait hours for the info
to update every time I change a criteria/filter to get the counts before I
look into a few random individual records from the results.

As I've mentioned, something of this magnitude is better to leave it to the
DBA and work together with that DBA.  Either hire/contract one or become one
:)


Regards,
Tommy

Reply via email to