I've dealt with fairly large sets, but not as static as yours.  If your only
keys for searching are planet and date, then a Perl hash lookup will be
faster overall, since a DB lookup involves connecting to the database and
doing the standard prepare/execute/fetch, which (for a single lookup) can
cost as much as the lookup itself.  The actual retrieval of the record in
the database is probably as fast as or faster than Perl (especially after
the initial lookup primes the caches), provided you have indexed the columns
on the table properly.
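
For reference, here is roughly what that per-lookup round trip looks like
with DBI (a minimal sketch -- the planet_data table name and the credentials
are placeholders I made up, not from your setup):

use DBI;

# A standalone lookup pays for every one of these steps; under
# mod_perl you would at least cache the connection (Apache::DBI)
# and reuse the prepared statement handle.
my $dbh = DBI->connect('dbi:Pg:dbname=planets', 'user', 'pass',
                       { RaiseError => 1 });
my $sth = $dbh->prepare(
    'SELECT distance FROM planet_data WHERE planet = ? AND date = ?');
$sth->execute('mercury', '1900-01-01');
my ($distance) = $sth->fetchrow_array;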

If you are planning to do lots of lookups on this dataset, preloading it
into a Perl hash would definitely be the better approach.  If you are doing
only a few lookups over a given period, it may not be worth taking up lots
of memory for no reason, and sticking with the DB lookup would probably be
best.

For the Perl hash, I would key it on the combination of planet and date,
something like:

my %Planets = (

        jupiter => {
                # note the [ ] array refs -- plain parens would flatten
                # the list into the enclosing hash
                "1900-01-01" => [ "5h 39m 18s", "+22° 4.0'",
                                  28.922, -15.128, -164.799, "set" ],
                "1900-01-02" => [ "5h 39m 18s", "+22° 4.0'",
                                  28.922, -15.128, -164.799, "set" ],
        },

        neptune => {
                "1900-01-01" => [ "5h 39m 18s", "+22° 4.0'",
                                  28.922, -15.128, -164.799, "set" ],
                "1900-01-02" => [ "5h 39m 18s", "+22° 4.0'",
                                  28.922, -15.128, -164.799, "set" ],
        },
);
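
If you go that route, you could build the hash once at Apache startup (in
startup.pl) so every child shares it copy-on-write.  A rough sketch,
assuming the same hypothetical planet_data table as above:

use DBI;

my %Planets;
my $dbh = DBI->connect('dbi:Pg:dbname=planets', 'user', 'pass',
                       { RaiseError => 1 });
my $sth = $dbh->prepare(
    'SELECT planet, date, right_ascension, declination,
            distance, altitude, azimuth, visibility
       FROM planet_data');
$sth->execute;
while (my ($planet, $date, @cols) = $sth->fetchrow_array) {
    $Planets{$planet}{$date} = [ @cols ];
}

# later, in a handler -- distance is the third column (index 2):
my $distance = $Planets{mercury}{'1900-01-01'}[2];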

You could also combine the planet and date into a single string for the
hash key, like "jupiter1900-01-01", but I'm not really sure this buys you
any performance.  It might even be slightly slower, since it's working on
one much larger hash rather than a two-dimensional hash.  It might be
interesting to benchmark on a dataset of your size to see what really
happens (see the sketch below).  As for DB_File, it would probably land
somewhere between the Perl hash approach and the standard SQL database
interface.
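
The standard Benchmark module makes that comparison easy.  A quick sketch
with synthetic placeholder data (the row counts and values are made up,
just to get the two hash shapes right):

use Benchmark qw(cmpthese);

# Build both shapes: a two-level hash and a flat combined-key hash.
my (%nested, %flat);
for my $planet (qw(jupiter mars mercury moon neptune)) {
    for my $n (1 .. 160_000) {              # ~800,000 rows total
        my $date = "1900-01-$n";            # placeholder date strings
        $nested{$planet}{$date} = [ 28.922, -15.128 ];
        $flat{"$planet$date"}   = [ 28.922, -15.128 ];
    }
}

# Run each lookup style for at least 3 CPU seconds and compare rates.
cmpthese(-3, {
    nested => sub { my $v = $nested{jupiter}{'1900-01-42'}[0] },
    flat   => sub { my $v = $flat{'jupiter1900-01-42'}[0] },
});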

dale

----- Original Message ----- 
From: "simran" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 28, 2003 9:29 PM
Subject: Large Data Set In Mod_Perl


> Hi All,
>
> For one of the websites i have developed (/am developing), i have a
> dataset that i must refer to for some of the dynamic pages.
>
> The data is planetary data that is pretty much in spreadsheet format,
> aka, i have just under 800,000 "rows" of data. I don't do any complex
> searches or functions on the data. I simply need to look at certain
> columns at certain times.
>
> sample data set:
>
>  planet  |    date    | right_ascension | declination | distance | altitude | azimuth  | visibility
> ---------+------------+-----------------+-------------+----------+----------+----------+------------
>  jupiter | 1900-01-01 | 15h 57m 7s      | -19° 37.2'  |    6.108 |   10.199 |   39.263 | up
>  mars    | 1900-01-01 | 19h 2m 20s      | -23° 36.7'  |    2.401 |   14.764 |    -4.65 | up
>  mercury | 1900-01-01 | 17h 15m 16s     | -21° 59.7'  |    1.151 |   14.041 |   20.846 | up
>  moon    | 1900-01-01 | 18h 41m 17s     | -21° 21.8'  |     58.2 |   17.136 |    0.343 | transit
>  neptune | 1900-01-01 | 5h 39m 18s      | +22° 4.0'   |   28.922 |  -15.128 | -164.799 | set
>
>
> I need to be able to say:
>
> * Lookup the _distance_ for the planet _mercury_ on the date _1900-01-01_
>
> Currently i do this using a postgres database, however, my question is,
> is there a quicker way to do this in mod_perl - would a DB_File or some
> other structure be better?
>
> I would be interested in knowing if others have dealt with large data
> sets as above and what solutions they have used.
>
> A DB is quick, but is there something one can use in mod_perl that would
> be quicker? perhaps something such as copying the whole 800,000 rows to
> memory (as a hash?) on apache startup?
>
> simran.
>
>
