Re: [GENERAL] Working with huge amount of data.

Mario Lopez Mon, 11 Feb 2008 08:28:28 -0800

Erik,

Thanks for your answers. This is actually a workable solution, because my data does not get updated very frequently (every 24 hours). The problem is that I would like a more advanced version of this; there must be something I can do. I am going to try what Hubert Despez explained in his articles.


Thanks :)

On Feb 11, 2008, at 9:37 AM, Mario Lopez wrote:
Hi guys :-), I am working on a personal project in which I am trying to make sense of a huge (at least for me) amount of data. I have approximately 150 million rows of unique words (they are not exactly words; that is just to explain the situation).
The table I am inserting this into is quite simple, something like this:
CREATE TABLE "public"."names" (
"id" SERIAL,
"name" VARCHAR(255)
) WITHOUT OIDS;
It is a requirement that I can search on the varchar with queries that look like this:
SELECT * FROM names WHERE name LIKE 'keyword%'
Or
SELECT * FROM names WHERE name LIKE '%keyword%'
I optimized the first type of query by making a partition for every letter that a name can begin with:
CREATE TABLE "public"."names_a" (
CONSTRAINT "names_a_check" CHECK (("name")::text ~~ 'a%'::text)
) INHERITS ("public"."names")
WITHOUT OIDS;
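As a rough illustration of why the first query type is the easy one: a pattern such as 'keyword%' selects one contiguous range in sorted order, so it can be answered with a binary search followed by a short walk. The sketch below shows this on an in-memory list. The function name prefix_search is hypothetical, and the code is not taken from the thread or from PostgreSQL internals.

```python
import bisect


def prefix_search(sorted_names, prefix):
    """Return every entry of sorted_names that starts with prefix.

    Assumes sorted_names is sorted in plain code-point order, which is
    the order Python's sorted() produces for str values.
    """
    # Binary search for the first entry that is >= prefix.
    i = bisect.bisect_left(sorted_names, prefix)
    matches = []
    # All entries sharing the prefix sit next to each other from here on.
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches
```

The cost is one binary search plus the number of matching rows, independent of how many rows do not match.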
The problem arises with the second type of query, where no partitioning is possible and the search keywords are not known in advance. I have tried making indexes on the letter the name ends with, and indexes on whether it contains a given letter, but none of them work: the planner only does sequential scans over the table.
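One general technique for '%keyword%' style searches over arbitrary strings is an inverted index on three-character substrings (trigrams); PostgreSQL's pg_trgm contrib module is built around trigrams as well. The sketch below is a minimal in-memory version, assuming the data fits in RAM, and every name in it (trigrams, build_index, substring_search) is hypothetical. It is not something proposed in the thread.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(names):
    """Map each trigram to the set of row positions whose name contains it."""
    index = defaultdict(set)
    for row_id, name in enumerate(names):
        for gram in trigrams(name):
            index[gram].add(row_id)
    return index


def substring_search(names, index, keyword):
    """Return sorted row positions whose name contains keyword."""
    grams = trigrams(keyword)
    if not grams:
        # Keywords shorter than 3 characters have no trigrams,
        # so fall back to checking every row.
        candidates = range(len(names))
    else:
        # A matching row must contain every trigram of the keyword.
        postings = [index.get(gram, set()) for gram in grams]
        candidates = set.intersection(*postings)
    # Trigram overlap is necessary but not sufficient, so recheck each row.
    return sorted(i for i in candidates if keyword in names[i])
```

The intersection only narrows the candidate set; the final substring check is what guarantees correct results.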
For the moment the quickest scan I have been able to make is with grep!! Surprisingly enough, grep takes an average of 20 seconds to search a whole 2 GB plain text file with one name per line, while PostgreSQL takes about 50 seconds on the first type of query and up to two minutes on the second type, which is completely unacceptable for an online search engine that has to serve a user querying this information.
How do the big search engines, let's say Google, manage this? I am amazed at how quickly they search this amount of information. Is there any approach I could take? I am open minded, so anything is acceptable, not necessarily only PostgreSQL-based solutions (although I would prefer one). By the way, textual search in PostgreSQL is ruled out because what I am looking at are not names that can be decomposed into lexemes; let's say that this varchar is composed of random garbage.
Actually, a friend of mine did exactly what you've tried: grep. He had a cron job that would update the txt file from the table's data every five minutes, and then his app would shell out to run those kinds of queries. Of course, with a setup like that your results can be a little out of date (by up to the period between runs of the cron job) but, if you can deal with that, it's a pretty simple solution that doesn't take too much setup.
Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
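The export-and-scan setup Erik describes above can be sketched roughly as follows. This is a stand-in under stated assumptions: export_names and scan_file are hypothetical names, the list passed to export_names stands in for rows read from the names table, and the plain Python scan plays the role that grep plays in the thread.

```python
import os
import tempfile


def export_names(names, path):
    """Write one name per line, as the periodic job would do from the table."""
    with open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")


def scan_file(path, keyword):
    """Return every line containing keyword (fixed-string match)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if keyword in line]


if __name__ == "__main__":
    # Hypothetical sample data in a temporary directory.
    workdir = tempfile.mkdtemp()
    dump_path = os.path.join(workdir, "names.txt")
    export_names(["foobar", "barbaz", "qux"], dump_path)
    print(scan_file(dump_path, "bar"))
```

Results are only as fresh as the last export, which matches the trade-off described in the message.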




---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly



