I signed up for the Netflix Prize (www.netflixprize.com), downloaded the
data, and imported it into PostgreSQL. Here is the table I created:

        Table "public.ratings"
 Column |  Type   | Modifiers
--------+---------+-----------
 item   | integer |
 client | integer |
 rating | integer |
 rdate  | text    |
Indexes:
    "ratings_client" btree (client)
    "ratings_item" btree (item)
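For anyone reproducing this, here is a sketch of the schema and one
plausible load path. The CSV filename is made up, and the raw Netflix
download is actually thousands of per-movie text files that have to be
flattened into this shape first:

    CREATE TABLE ratings (
        item   integer,
        client integer,
        rating integer,
        rdate  text
    );

    -- COPY is far faster than row-at-a-time INSERTs for a bulk load.
    -- '/tmp/ratings.csv' is a hypothetical preprocessed file.
    COPY ratings FROM '/tmp/ratings.csv' WITH CSV;

    -- Building the indexes after the load is one pass each instead of
    -- 100 million incremental insertions.
    CREATE INDEX ratings_client ON ratings (client);
    CREATE INDEX ratings_item ON ratings (item);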
[EMAIL PROTECTED]:~/netflix$ time psql netflix -c "select count(*) from ratings"
   count
-----------
 100480507
(1 row)

real    2m6.270s
user    0m0.004s
sys     0m0.005s

The one thing I notice is that it is REAL slow. It is, granted, 100 million
records, but I don't think PostgreSQL is usually this slow. I'm going to
check some other machines to see whether there is a problem with my test
machine or whether something is weird about PostgreSQL and large numbers of
rows. I tried to cluster the data along a particular index but had to
cancel it after 3 hours. I'm using 8.1.4.

The "rdate" field looks something like "2005-09-06". So the raw data is 22
bytes per row (three 4-byte integers plus a 10-character date string), and
with the string rounded up to 12 bytes that's about 24 bytes of data per
row. What is the overhead per column? Per row? Is there any advantage to
using "varchar(10)" over "text"?
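Rather than guessing at the overhead, it can be measured directly;
pg_relation_size() and pg_total_relation_size() are in core as of 8.1.
Note that each row also carries a tuple header (roughly 28 bytes in this
release) plus a 4-byte item pointer and alignment padding, so the ~24
bytes of actual data is the smaller part of the footprint:

    -- Heap size divided by the row count gives the true bytes per row,
    -- headers and padding included (integer division is fine here).
    SELECT pg_relation_size('ratings') / 100480507 AS bytes_per_row;

    -- The same table including both btree indexes.
    SELECT pg_total_relation_size('ratings') AS total_bytes;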
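As for the two-minute count(*): PostgreSQL has to walk the entire heap to
count rows, because MVCC visibility information is stored only in the
tuples themselves, so no index can short-circuit it. If an estimate is
good enough, the statistics gathered by VACUUM/ANALYZE are instant,
though only as fresh as the last run:

    -- The planner's row estimate for the table; approximate but
    -- immediate, no table scan involved.
    SELECT reltuples FROM pg_class WHERE relname = 'ratings';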
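And for the record, the cluster attempt was of this form (the index name
below is only an example, not necessarily the one I used). On 8.1,
CLUSTER reads the whole heap in index order, which is effectively random
I/O over the entire table, while holding an exclusive lock, so hours on
100 million unsorted rows is not surprising:

    -- 8.1 syntax is CLUSTER indexname ON tablename; ratings_item is
    -- illustrative.  This rewrites the entire table.
    CLUSTER ratings_item ON ratings;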