John Major wrote:
> Hello-
> 
> #I am a biologist, and work with large datasets (tables with millions of
> rows are common).
> #These datasets often can be simplified as features with a name, and a
> start and end position (ie:  a range along a number line.  GeneX is on
> some chromosome from position 10->40)
> 
> I store  these features in tables that generally have the form:
> 
> SIMPLE_TABLE:
> FeatureID(PrimaryKey) -- FeatureName(varchar) --
> FeatureChromosomeName(varchar) -- StartPosition(int) -- EndPosition(int)
> 
> My problem is, I often need to execute searches of tables like these
> which find "All features within a range". Ie:  select FeatureID from
> SIMPLE_TABLE where FeatureChromosomeName like 'chrX' and StartPosition >
> 1000500 and EndPosition < 2000000;
> 
> This kind of query is VERY slow, and I've tried tinkering with indexes
> to speed it up, but with little success.
> Indexes on Chromosome help a little, but it I can't think of a way to
> avoid full table scans for each of the position range queries.
> 
> Any advice on how I might be able to improve this situation would be
> very helpful.

Basic question - What version, and what indexes do you have?

Have an EXPLAIN?

Something like -

CREATE INDEX index_name ON SIMPLE_TABLE ( FeatureChromosomeName
varchar_pattern_ops, StartPosition, EndPosition );

The varchar_pattern_ops being the "key" so LIKE can use an index.
Provided of course its LIKE 'something%' and not LIKE '%something'


> 
> Thanks!
> John
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
> 
>                http://www.postgresql.org/about/donate
> 


Weslee

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to