[SQL] cartesian product

2007-02-19 Thread Salman Tahir

Hi,

I have a query regarding an SQL statement I'm trying to execute. I
have the following table:

sequence
-+
AK
AKCMK
CMKA

I execute the following statement (Cartesian product):

SELECT p1.sequence as sequence1, p2.sequence as sequence2
FROM potential_pairs p1, potential_pairs p2

which gives me:

sequence1 | sequence2
+--
AK   | AK
 AK   | AKCMK
 AK   | CMKA
 AKCMK| AK
 AKCMK| AKCMK
 AKCMK| CMKA
 CMKA  | AK
 CMKA  | AKCMK
 CMKA  | CMKA
(9 rows)

I want to eliminate duplicates and by duplicate I mean a tuple such as
{AK, CMKA} should be regarded as the same as {CMKA, AK}. So I would
like the following result:

sequence1 | sequence2
+--
AK   | AK
 AK   | AKCMK
 AK   | CMKA
 AKCMK| AKCMK
 AKCMK| CMKA
 CMKA  | CMKA

Any help would be appreciated.

- Salman

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[SQL] Organization of tables

2007-06-14 Thread Salman Tahir

Hi,

I have a question regarding the organization of a table I want to
create in my database: I have the following setup:

Table Fragments (simplified example):
Primary key = {mass}

Approach (1)
fragment   |   mass (of fragment)   |   peptide
---++--
A |  x  | Peptide1, Peptide2
Q |  y  | Peptide1
K |  z  | Peptide 2, Peptide3

The idea here is that a peptide can be composed of many fragments e.g.
Peptide 2 is made up of fragments A and K; Peptide1 is made up of A
and Q and so on.

My idea is to create an index on the mass column and be able to
retrieve all Peptides that contain a certain fragment mass e.g

SELECT peptide
FROM Fragments
WHERE mass = x;

Should give me: Peptide1, Peptide2

The alternative way I have thought of to organize this table is to
have something as follows:
Approach (2)
Primary Key = {fragment, mass, peptide}
fragment   |   mass (of fragment)   |   peptide
---++--
A |  x  | Peptide1
A |  x  | Peptide2
Q |  y  | Peptide1
K |  z  | Peptide 2
K |  z  | Peptide 3

If I consider 2500 unique fragments then, using approach (1), table
Fragments will hold 2,500 tuples. If I consider the same number of
fragments then table Fragments using approach 2 holds 15,000 tuples.

I have considered using approach (1) whereby I would have less tuples
to search but if I wanted to access the peptides they belong to I
would retrieve the list of corresponding peptides e.g the string
"Peptide1, Peptide2" and process it in my program. However this seems
like a hack around the way a database table should be organised. The
problem increases further when I have to scale up and consider more
unique fragments (>2500).

Any help on how best to structure such data would be mostly appreciated.

- Salman Tahir

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq