[sqlite] Is there a way to inner join on named intervals?
Hi, Suppose that I have a table A, each row represents a interval. For example, the first row represents an interval [1,10) with a name a. The first and second rows are considered overlapping because the interval [1,10) and interval [5,15) intersect and both rows have the same name a. name left right tag - a 1 10 tag1 a 5 15 tag2 a21 30 tag3 b 3 12 tag4 b15 25 tag5 b19 30 tag6 I want to inner join the above table and the following table B based on the named interval overlapping. name left right attr - a 3 7 attr1 a 8 12 attr2 a16 18 attr3 a25 35 attr4 b31 32 attr5 The result is the following. In each row, the named interval from A overlaps the named interval from B. I don't see there is an easy way to do this in sqlite3. I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). I'm wondering if there something that can help do this kind of named interval inner join easily. A.name A.left A.right A.tag B.name B.left B.right B.attr a 1 10 tag1a 3 7attr1 a 1 10 tag1a 8 12attr2 a 5 15 tag2a 3 7attr1 a 5 15 tag2a 8 12attr2 a21 30 tag3a16 18attr3 -- Regards, Peng ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to inner join on named intervals?
Did you try something like(pseudo code): select * from A inner join B on A.name = B.name AND ( B.left between(A.left,A.right) OR B.right between(A.left,A.right) ) On 8/13/2010 8:07 AM, Peng Yu wrote: Hi, Suppose that I have a table A, each row represents a interval. For example, the first row represents an interval [1,10) with a name a. The first and second rows are considered overlapping because the interval [1,10) and interval [5,15) intersect and both rows have the same name a. name left right tag - a 1 10 tag1 a 5 15 tag2 a21 30 tag3 b 3 12 tag4 b15 25 tag5 b19 30 tag6 I want to inner join the above table and the following table B based on the named interval overlapping. name left right attr - a 3 7 attr1 a 8 12 attr2 a16 18 attr3 a25 35 attr4 b31 32 attr5 The result is the following. In each row, the named interval from A overlaps the named interval from B. I don't see there is an easy way to do this in sqlite3. I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). I'm wondering if there something that can help do this kind of named interval inner join easily. A.name A.left A.right A.tag B.name B.left B.right B.attr a 1 10 tag1a 3 7attr1 a 1 10 tag1a 8 12attr2 a 5 15 tag2a 3 7attr1 a 5 15 tag2a 8 12attr2 a21 30 tag3a16 18attr3 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to inner join on named intervals?
On 13 August 2010 16:07, Peng Yu pengyu...@gmail.com wrote: Hi, Suppose that I have a table A, each row represents a interval. For example, the first row represents an interval [1,10) with a name a. The first and second rows are considered overlapping because the interval [1,10) and interval [5,15) intersect and both rows have the same name a. name left right tag - a 1 10 tag1 a 5 15 tag2 a 21 30 tag3 b 3 12 tag4 b 15 25 tag5 b 19 30 tag6 I want to inner join the above table and the following table B based on the named interval overlapping. name left right attr - a 3 7 attr1 a 8 12 attr2 a 16 18 attr3 a 25 35 attr4 b 31 32 attr5 The result is the following. In each row, the named interval from A overlaps the named interval from B. I don't see there is an easy way to do this in sqlite3. I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). I'm wondering if there something that can help do this kind of named interval inner join easily. A.name A.left A.right A.tag B.name B.left B.right B.attr a 1 10 tag1 a 3 7 attr1 a 1 10 tag1 a 8 12 attr2 a 5 15 tag2 a 3 7 attr1 a 5 15 tag2 a 8 12 attr2 a 21 30 tag3 a 16 18 attr3 Last line does not overlap. Assuming that is an oversight, then select * from A inner join B on A.name=B.name and A.left=B.right and A.right =B.left; seems to do what you want. -- Regards, Peng Regards, Simon ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to inner join on named intervals?
I don't understand where do you see a problem but it looks like this join will do what you want: select * from A, B where A.name = B.name and A.left B.right and A.right B.left I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). How did you come to this conclusion? Any inner join will execute with complexity either M log(N) or N log(M) anyway. The only benefit is that SQLite can decide which way to join depending on relative size of tables A and B. And constant in operation complexity is a bit smaller with C code that in python code... Pavel On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote: Hi, Suppose that I have a table A, each row represents a interval. For example, the first row represents an interval [1,10) with a name a. The first and second rows are considered overlapping because the interval [1,10) and interval [5,15) intersect and both rows have the same name a. name left right tag - a 1 10 tag1 a 5 15 tag2 a 21 30 tag3 b 3 12 tag4 b 15 25 tag5 b 19 30 tag6 I want to inner join the above table and the following table B based on the named interval overlapping. name left right attr - a 3 7 attr1 a 8 12 attr2 a 16 18 attr3 a 25 35 attr4 b 31 32 attr5 The result is the following. In each row, the named interval from A overlaps the named interval from B. I don't see there is an easy way to do this in sqlite3. I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). I'm wondering if there something that can help do this kind of named interval inner join easily. A.name A.left A.right A.tag B.name B.left B.right B.attr a 1 10 tag1 a 3 7 attr1 a 1 10 tag1 a 8 12 attr2 a 5 15 tag2 a 3 7 attr1 a 5 15 tag2 a 8 12 attr2 a 21 30 tag3 a 16 18 attr3 -- Regards, Peng ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Is there a way to inner join on named intervals?
On Fri, Aug 13, 2010 at 11:32 AM, Pavel Ivanov paiva...@gmail.com wrote: I don't understand where do you see a problem but it looks like this join will do what you want: select * from A, B where A.name = B.name and A.left B.right and A.right B.left I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). How did you come to this conclusion? Any inner join will execute with complexity either M log(N) or N log(M) anyway. The only benefit is that SQLite can decide which way to join depending on relative size of tables A and B. And constant in operation complexity is a bit smaller with C code that in python code... Thank all the people that replied my email. I don't really understand how inner join work. Sorry for the confusion. The comparison that I actually need is the following one (logically correct, but I'm not sure if it is the fastest comparison in term of performance). My next question is what index I should create on table A and B to speed up such an inner join? Are indexes on each of the first three column of each table enough? What is the best comparison (along with the appropriate indexes) in term of the performance? create table A (name text, left integer, right integer, tag text); create table B (name text, left integer, right integer, attr text); insert into A values('a', 1, 10, 'tag1'); insert into A values('a', 5, 15, 'tag2'); insert into A values('a', 21, 30, 'tag3'); insert into A values('b', 3, 12, 'tag4'); insert into A values('b', 15, 25, 'tag5'); insert into A values('b', 19, 30, 'tag6'); insert into B values('a', 3, 7, 'attr1'); insert into B values('a', 8, 12, 'attr2'); insert into B values('a', 16, 18, 'attr3'); insert into B values('a', 25, 35, 'attr4'); insert into B values('b', 31, 32, 'attr5'); select * from A inner join B on A.name=B.name AND max(A.left, B.left) min(A.right, B.right); nameleftright tag nameleft right attr -- -- -- -- -- -- -- -- a 1 10 tag1a 3 7 attr1 a 1 10 tag1a 8 12 attr2 a 5 15 tag2a 3 7 attr1 a 5 15 tag2a 8 12 attr2 a 21 30 tag3a 25 35 attr4 On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote: Hi, Suppose that I have a table A, each row represents a interval. For example, the first row represents an interval [1,10) with a name a. The first and second rows are considered overlapping because the interval [1,10) and interval [5,15) intersect and both rows have the same name a. name left right tag - a 1 10 tag1 a 5 15 tag2 a 21 30 tag3 b 3 12 tag4 b 15 25 tag5 b 19 30 tag6 I want to inner join the above table and the following table B based on the named interval overlapping. name left right attr - a 3 7 attr1 a 8 12 attr2 a 16 18 attr3 a 25 35 attr4 b 31 32 attr5 The result is the following. In each row, the named interval from A overlaps the named interval from B. I don't see there is an easy way to do this in sqlite3. I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). I'm wondering if there something that can help do this kind of named interval inner join easily. A.name A.left A.right A.tag B.name B.left B.right B.attr a 1 10 tag1 a 3 7 attr1 a 1 10 tag1 a 8 12 attr2 a 5 15 tag2 a 3 7 attr1 a 5 15 tag2 a 8 12 attr2 a 21 30 tag3 a 16 18 attr3 -- Regards, Peng ___ sqlite-users mailing list sqlite-users@sqlite.org
Re: [sqlite] Is there a way to inner join on named intervals?
My next question is what index I should create on table A and B to speed up such an inner join? Are indexes on each of the first three column of each table enough? What is the best comparison (along with the appropriate indexes) in term of the performance? Indexes on each field never help. SQLite uses one index per table per query only. For your query you should have index on name field in bigger table (either A or B). If you write query as I did (A.left B.right and A.right B.left) then depending on data distribution either (name, left) or (name, right) index could help better than just (name). But cases when those indices would help are very specific, so in general your only option is index on name only. Pavel On Fri, Aug 13, 2010 at 12:48 PM, Peng Yu pengyu...@gmail.com wrote: On Fri, Aug 13, 2010 at 11:32 AM, Pavel Ivanov paiva...@gmail.com wrote: I don't understand where do you see a problem but it looks like this join will do what you want: select * from A, B where A.name = B.name and A.left B.right and A.right B.left I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could be used, the complexity should be reduced to log(M+N). How did you come to this conclusion? Any inner join will execute with complexity either M log(N) or N log(M) anyway. The only benefit is that SQLite can decide which way to join depending on relative size of tables A and B. And constant in operation complexity is a bit smaller with C code that in python code... Thank all the people that replied my email. I don't really understand how inner join work. Sorry for the confusion. The comparison that I actually need is the following one (logically correct, but I'm not sure if it is the fastest comparison in term of performance). My next question is what index I should create on table A and B to speed up such an inner join? Are indexes on each of the first three column of each table enough? What is the best comparison (along with the appropriate indexes) in term of the performance? create table A (name text, left integer, right integer, tag text); create table B (name text, left integer, right integer, attr text); insert into A values('a', 1, 10, 'tag1'); insert into A values('a', 5, 15, 'tag2'); insert into A values('a', 21, 30, 'tag3'); insert into A values('b', 3, 12, 'tag4'); insert into A values('b', 15, 25, 'tag5'); insert into A values('b', 19, 30, 'tag6'); insert into B values('a', 3, 7, 'attr1'); insert into B values('a', 8, 12, 'attr2'); insert into B values('a', 16, 18, 'attr3'); insert into B values('a', 25, 35, 'attr4'); insert into B values('b', 31, 32, 'attr5'); select * from A inner join B on A.name=B.name AND max(A.left, B.left) min(A.right, B.right); name left right tag name left right attr -- -- -- -- -- -- -- -- a 1 10 tag1 a 3 7 attr1 a 1 10 tag1 a 8 12 attr2 a 5 15 tag2 a 3 7 attr1 a 5 15 tag2 a 8 12 attr2 a 21 30 tag3 a 25 35 attr4 On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu pengyu...@gmail.com wrote: Hi, Suppose that I have a table A, each row represents a interval. For example, the first row represents an interval [1,10) with a name a. The first and second rows are considered overlapping because the interval [1,10) and interval [5,15) intersect and both rows have the same name a. name left right tag - a 1 10 tag1 a 5 15 tag2 a 21 30 tag3 b 3 12 tag4 b 15 25 tag5 b 19 30 tag6 I want to inner join the above table and the following table B based on the named interval overlapping. name left right attr - a 3 7 attr1 a 8 12 attr2 a 16 18 attr3 a 25 35 attr4 b 31 32 attr5 The result is the following. In each row, the named interval from A overlaps the named interval from B. I don't see there is an easy way to do this in sqlite3. I could use an external program (such as python sqlite package) to enumerate all the named interval from table A and search for overlapping named intervals in table B, but this operation has a complexity of M log (N), where M is the length of table A and N is the length of table B. If some sort of inner join could