[sqlite] Is there a way to inner join on named intervals?

2010-08-13 Thread Peng Yu
Hi,

Suppose that I have a table A, each row represents a interval. For
example, the first row represents an interval [1,10) with a name a.
The first and second rows are considered overlapping because the
interval [1,10) and interval [5,15) intersect and both rows have the
same name a.

name left right   tag
-
a  1 10   tag1
a  5 15   tag2
a21 30   tag3
b  3 12   tag4
b15 25   tag5
b19 30   tag6

I want to inner join the above table and the following table B
based on the named interval overlapping.

name left right   attr
-
a  3   7   attr1
a  8 12   attr2
a16 18   attr3
a25 35   attr4
b31 32   attr5

The result is the following. In each row, the named interval from A
overlaps the named interval from B. I don't see there is an easy way
to do this in sqlite3. I could use an external program (such as python
sqlite package) to enumerate all the named interval from table A and
search for overlapping named intervals in table B, but this operation
has a complexity of M log (N), where M is the length of table A and N
is the length of table B. If some sort of inner join could be used,
the complexity should be reduced to log(M+N). I'm wondering if there
something that can help do this kind of named interval inner join
easily.

A.name A.left A.right A.tag B.name B.left B.right B.attr

a  1 10 tag1a  3   7attr1
a  1 10 tag1a  8 12attr2
a  5 15 tag2a  3   7attr1
a  5 15 tag2a  8 12attr2
a21 30 tag3a16 18attr3

-- 
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to inner join on named intervals?

2010-08-13 Thread Jim Morris
Did you try something like(pseudo code):

select * from A inner join B on A.name = B.name AND ( B.left 
between(A.left,A.right) OR B.right between(A.left,A.right) )

On 8/13/2010 8:07 AM, Peng Yu wrote:
 Hi,

 Suppose that I have a table A, each row represents a interval. For
 example, the first row represents an interval [1,10) with a name a.
 The first and second rows are considered overlapping because the
 interval [1,10) and interval [5,15) intersect and both rows have the
 same name a.

 name left right   tag
 -
 a  1 10   tag1
 a  5 15   tag2
 a21 30   tag3
 b  3 12   tag4
 b15 25   tag5
 b19 30   tag6

 I want to inner join the above table and the following table B
 based on the named interval overlapping.

 name left right   attr
 -
 a  3   7   attr1
 a  8 12   attr2
 a16 18   attr3
 a25 35   attr4
 b31 32   attr5

 The result is the following. In each row, the named interval from A
 overlaps the named interval from B. I don't see there is an easy way
 to do this in sqlite3. I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N). I'm wondering if there
 something that can help do this kind of named interval inner join
 easily.

 A.name A.left A.right A.tag B.name B.left B.right B.attr
 
 a  1 10 tag1a  3   7attr1
 a  1 10 tag1a  8 12attr2
 a  5 15 tag2a  3   7attr1
 a  5 15 tag2a  8 12attr2
 a21 30 tag3a16 18attr3


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to inner join on named intervals?

2010-08-13 Thread Simon Davies
On 13 August 2010 16:07, Peng Yu pengyu...@gmail.com wrote:
 Hi,

 Suppose that I have a table A, each row represents a interval. For
 example, the first row represents an interval [1,10) with a name a.
 The first and second rows are considered overlapping because the
 interval [1,10) and interval [5,15) intersect and both rows have the
 same name a.

 name left right   tag
 -
 a          1     10   tag1
 a          5     15   tag2
 a        21     30   tag3
 b          3     12   tag4
 b        15     25   tag5
 b        19     30   tag6

 I want to inner join the above table and the following table B
 based on the named interval overlapping.

 name left right   attr
 -
 a          3       7   attr1
 a          8     12   attr2
 a        16     18   attr3
 a        25     35   attr4
 b        31     32   attr5

 The result is the following. In each row, the named interval from A
 overlaps the named interval from B. I don't see there is an easy way
 to do this in sqlite3. I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N). I'm wondering if there
 something that can help do this kind of named interval inner join
 easily.

 A.name A.left A.right A.tag B.name B.left B.right B.attr
 
 a                  1     10     tag1        a          3       7    attr1
 a                  1     10     tag1        a          8     12    attr2
 a                  5     15     tag2        a          3       7    attr1
 a                  5     15     tag2        a          8     12    attr2
 a                21     30     tag3        a        16     18    attr3

Last line does not overlap.

Assuming that is an oversight, then

select * from A inner join B on A.name=B.name and A.left=B.right and
A.right =B.left;

seems to do what you want.


 --
 Regards,
 Peng

Regards,
Simon
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to inner join on named intervals?

2010-08-13 Thread Pavel Ivanov
I don't understand where do you see a problem but it looks like this
join will do what you want:

select * from A, B
where A.name = B.name
and A.left  B.right
and A.right  B.left

 I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N).

How did you come to this conclusion? Any inner join will execute with
complexity either M log(N) or N log(M) anyway. The only benefit is
that SQLite can decide which way to join depending on relative size of
tables A and B. And constant in operation complexity is a bit smaller
with C code that in python code...


Pavel

On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote:
 Hi,

 Suppose that I have a table A, each row represents a interval. For
 example, the first row represents an interval [1,10) with a name a.
 The first and second rows are considered overlapping because the
 interval [1,10) and interval [5,15) intersect and both rows have the
 same name a.

 name left right   tag
 -
 a          1     10   tag1
 a          5     15   tag2
 a        21     30   tag3
 b          3     12   tag4
 b        15     25   tag5
 b        19     30   tag6

 I want to inner join the above table and the following table B
 based on the named interval overlapping.

 name left right   attr
 -
 a          3       7   attr1
 a          8     12   attr2
 a        16     18   attr3
 a        25     35   attr4
 b        31     32   attr5

 The result is the following. In each row, the named interval from A
 overlaps the named interval from B. I don't see there is an easy way
 to do this in sqlite3. I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N). I'm wondering if there
 something that can help do this kind of named interval inner join
 easily.

 A.name A.left A.right A.tag B.name B.left B.right B.attr
 
 a                  1     10     tag1        a          3       7    attr1
 a                  1     10     tag1        a          8     12    attr2
 a                  5     15     tag2        a          3       7    attr1
 a                  5     15     tag2        a          8     12    attr2
 a                21     30     tag3        a        16     18    attr3

 --
 Regards,
 Peng
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to inner join on named intervals?

2010-08-13 Thread Peng Yu
On Fri, Aug 13, 2010 at 11:32 AM, Pavel Ivanov paiva...@gmail.com wrote:
 I don't understand where do you see a problem but it looks like this
 join will do what you want:

 select * from A, B
 where A.name = B.name
 and A.left  B.right
 and A.right  B.left

 I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N).

 How did you come to this conclusion? Any inner join will execute with
 complexity either M log(N) or N log(M) anyway. The only benefit is
 that SQLite can decide which way to join depending on relative size of
 tables A and B. And constant in operation complexity is a bit smaller
 with C code that in python code...

Thank all the people that replied my email. I don't really understand
how inner join work. Sorry for the confusion.

The comparison that I actually need is the following one (logically
correct, but I'm not sure if it is the fastest comparison in term of
performance). My next question is what index I should create on table
A and B to speed up such an inner join? Are indexes on each of the
first three column of each table enough? What is the best comparison
(along with the appropriate indexes) in term of the performance?

create table A (name text, left integer, right integer, tag text);
create table B (name text, left integer, right integer, attr text);
insert into A values('a', 1, 10, 'tag1');
insert into A values('a', 5, 15, 'tag2');
insert into A values('a', 21, 30, 'tag3');
insert into A values('b', 3, 12, 'tag4');
insert into A values('b', 15, 25, 'tag5');
insert into A values('b', 19, 30, 'tag6');

insert into B values('a', 3, 7, 'attr1');
insert into B values('a', 8, 12, 'attr2');
insert into B values('a', 16, 18, 'attr3');
insert into B values('a', 25, 35, 'attr4');
insert into B values('b', 31, 32, 'attr5');

select * from A inner join B on A.name=B.name AND max(A.left, B.left)
 min(A.right, B.right);

nameleftright   tag nameleft
 right   attr
--  --  --  --  --  --
 --  --
a   1   10  tag1a   3
 7   attr1
a   1   10  tag1a   8
 12  attr2
a   5   15  tag2a   3
 7   attr1
a   5   15  tag2a   8
 12  attr2
a   21  30  tag3a   25
 35  attr4


 On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote:
 Hi,

 Suppose that I have a table A, each row represents a interval. For
 example, the first row represents an interval [1,10) with a name a.
 The first and second rows are considered overlapping because the
 interval [1,10) and interval [5,15) intersect and both rows have the
 same name a.

 name left right   tag
 -
 a          1     10   tag1
 a          5     15   tag2
 a        21     30   tag3
 b          3     12   tag4
 b        15     25   tag5
 b        19     30   tag6

 I want to inner join the above table and the following table B
 based on the named interval overlapping.

 name left right   attr
 -
 a          3       7   attr1
 a          8     12   attr2
 a        16     18   attr3
 a        25     35   attr4
 b        31     32   attr5

 The result is the following. In each row, the named interval from A
 overlaps the named interval from B. I don't see there is an easy way
 to do this in sqlite3. I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N). I'm wondering if there
 something that can help do this kind of named interval inner join
 easily.

 A.name A.left A.right A.tag B.name B.left B.right B.attr
 
 a                  1     10     tag1        a          3       7    attr1
 a                  1     10     tag1        a          8     12    attr2
 a                  5     15     tag2        a          3       7    attr1
 a                  5     15     tag2        a          8     12    attr2
 a                21     30     tag3        a        16     18    attr3

 --
 Regards,
 Peng
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 

Re: [sqlite] Is there a way to inner join on named intervals?

2010-08-13 Thread Pavel Ivanov
 My next question is what index I should create on table
 A and B to speed up such an inner join? Are indexes on each of the
 first three column of each table enough? What is the best comparison
 (along with the appropriate indexes) in term of the performance?

Indexes on each field never help. SQLite uses one index per table
per query only.
For your query you should have index on name field in bigger table
(either A or B). If you write query as I did (A.left  B.right and
A.right  B.left) then depending on data distribution either (name,
left) or (name, right) index could help better than just (name). But
cases when those indices would help are very specific, so in general
your only option is index on name only.


Pavel

On Fri, Aug 13, 2010 at 12:48 PM, Peng Yu pengyu...@gmail.com wrote:
 On Fri, Aug 13, 2010 at 11:32 AM, Pavel Ivanov paiva...@gmail.com wrote:
 I don't understand where do you see a problem but it looks like this
 join will do what you want:

 select * from A, B
 where A.name = B.name
 and A.left  B.right
 and A.right  B.left

 I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could be used,
 the complexity should be reduced to log(M+N).

 How did you come to this conclusion? Any inner join will execute with
 complexity either M log(N) or N log(M) anyway. The only benefit is
 that SQLite can decide which way to join depending on relative size of
 tables A and B. And constant in operation complexity is a bit smaller
 with C code that in python code...

 Thank all the people that replied my email. I don't really understand
 how inner join work. Sorry for the confusion.

 The comparison that I actually need is the following one (logically
 correct, but I'm not sure if it is the fastest comparison in term of
 performance). My next question is what index I should create on table
 A and B to speed up such an inner join? Are indexes on each of the
 first three column of each table enough? What is the best comparison
 (along with the appropriate indexes) in term of the performance?

 create table A (name text, left integer, right integer, tag text);
 create table B (name text, left integer, right integer, attr text);
 insert into A values('a', 1, 10, 'tag1');
 insert into A values('a', 5, 15, 'tag2');
 insert into A values('a', 21, 30, 'tag3');
 insert into A values('b', 3, 12, 'tag4');
 insert into A values('b', 15, 25, 'tag5');
 insert into A values('b', 19, 30, 'tag6');

 insert into B values('a', 3, 7, 'attr1');
 insert into B values('a', 8, 12, 'attr2');
 insert into B values('a', 16, 18, 'attr3');
 insert into B values('a', 25, 35, 'attr4');
 insert into B values('b', 31, 32, 'attr5');

 select * from A inner join B on A.name=B.name AND max(A.left, B.left)
  min(A.right, B.right);

 name        left        right       tag         name        left
  right       attr
 --  --  --  --  --  --
  --  --
 a           1           10          tag1        a           3
  7           attr1
 a           1           10          tag1        a           8
  12          attr2
 a           5           15          tag2        a           3
  7           attr1
 a           5           15          tag2        a           8
  12          attr2
 a           21          30          tag3        a           25
  35          attr4


 On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote:
 Hi,

 Suppose that I have a table A, each row represents a interval. For
 example, the first row represents an interval [1,10) with a name a.
 The first and second rows are considered overlapping because the
 interval [1,10) and interval [5,15) intersect and both rows have the
 same name a.

 name left right   tag
 -
 a          1     10   tag1
 a          5     15   tag2
 a        21     30   tag3
 b          3     12   tag4
 b        15     25   tag5
 b        19     30   tag6

 I want to inner join the above table and the following table B
 based on the named interval overlapping.

 name left right   attr
 -
 a          3       7   attr1
 a          8     12   attr2
 a        16     18   attr3
 a        25     35   attr4
 b        31     32   attr5

 The result is the following. In each row, the named interval from A
 overlaps the named interval from B. I don't see there is an easy way
 to do this in sqlite3. I could use an external program (such as python
 sqlite package) to enumerate all the named interval from table A and
 search for overlapping named intervals in table B, but this operation
 has a complexity of M log (N), where M is the length of table A and N
 is the length of table B. If some sort of inner join could