John Major wrote:
> Hello-
>
> #I am a biologist, and work with large datasets (tables with millions of
> rows are common).
> #These datasets often can be simplified as features with a name, and a
> start and end position (ie: a range along a number line. GeneX is on
> some chromosome from position 10->40)
>
> I store these features in tables that generally have the form:
>
> SIMPLE_TABLE:
> FeatureID(PrimaryKey) -- FeatureName(varchar) --
> FeatureChromosomeName(varchar) -- StartPosition(int) -- EndPosition(int)
>
> My problem is, I often need to execute searches of tables like these
> which find "All features within a range". Ie:
>
>   select FeatureID from SIMPLE_TABLE
>   where FeatureChromosomeName like 'chrX'
>   and StartPosition > 1000500 and EndPosition < 2000000;
>
> This kind of query is VERY slow, and I've tried tinkering with indexes
> to speed it up, but with little success.
> Indexes on Chromosome help a little, but it I can't think of a way to
> avoid full table scans for each of the position range queries.
>
> Any advice on how I might be able to improve this situation would be
> very helpful.
Basic question: what PostgreSQL version are you running, and what indexes
do you have now? Can you post the EXPLAIN output for the slow query?

Something like:

    CREATE INDEX index_name ON SIMPLE_TABLE
        ( FeatureChromosomeName varchar_pattern_ops, StartPosition, EndPosition );

The varchar_pattern_ops operator class is the key part: it lets LIKE use
the index, provided of course the pattern is LIKE 'something%' and not
LIKE '%something'.

>
> Thanks!
> John

Weslee
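P.S. One more thing that may be worth trying, offered as a rough sketch
rather than a tested recipe: the pattern in the quoted query ('chrX') has
no wildcard in it, so LIKE matches the same rows there as plain equality,
and an ordinary btree index with the default operator class can serve an
= comparison. The index name below is made up for illustration; the table
and column names are the ones from the original post.

```sql
-- Hypothetical index name; table and columns as described in the original post.
CREATE INDEX simple_table_chr_start_end_idx
    ON SIMPLE_TABLE (FeatureChromosomeName, StartPosition, EndPosition);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE SIMPLE_TABLE;

-- Same search as in the original post, with = in place of LIKE.
-- EXPLAIN ANALYZE runs the query and reports the plan actually used.
EXPLAIN ANALYZE
SELECT FeatureID
  FROM SIMPLE_TABLE
 WHERE FeatureChromosomeName = 'chrX'
   AND StartPosition > 1000500
   AND EndPosition < 2000000;
```

Comparing the EXPLAIN ANALYZE output before and after creating the index
should show whether the planner switches from a sequential scan to an
index scan. How much it helps will depend on how selective the chromosome
and start-position conditions are in your data.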