Re: [sqlite] SQLite Use of Indexes
From: "James K. Lowden" <jklow...@schemamania.org> To: sqlite-users@sqlite.org Subject: Re: [sqlite] SQLite Use of Indexes Message-ID: <20130712161038.b8b4df84.jklow...@schemamania.org> Content-Type: text/plain; charset=US-ASCII On Mon, 8 Jul 2013 15:32:21 -0400 "peter korinis" <kori...@earthlink.net> wrote: / thanks James, peter > . a CLAIMS table = 43M rows with indices on claim_no and > stateCounty code; and > . a LINE table = 85M rows with indices on claim_no and HCPCS > (a 5 char text code) > . Have run ANALYZE > . Using Win7 > Want to count occurrences of given HCPCS codes for given states. SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD FROM claims , LINE WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO -- ^^ == is not SQL right, a little python slipped in / AND HCPSCD IN ( '78451' , '78452' , '78453' , '78454' , '78469' , '78472' , '78473' , '78481' , '78483' , '78494' , '78496' , '78499' ) AND state_cd IN ('21', '45') GROUP BY STATE_CD >From a design perpective, you might want to think about what the IN clause is all about. Whatever they have in common probably belongs in a table, so you'd be able to say SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD FROM claims, LINE, HPs WHERE CLAIMS.CLAIM_NO = LINE.CLAIM_NO AND HCPSCD = HPs.code AND HPs.category = 'foo' AND state_cd IN ('21', '45') GROUP BY STATE_CD That suggests HCPSCD as the column to index; I suppose you have a lot more different HCPSCDs there are >7000 HCPSCDs ; I'm interested in ~100 in 12 groups / than STATE_CDs. If the HPs table is indexed by (category, code), then the join is on two ordered sets instead of an extended OR. /// would joining 3 tables give better response ?? // If HCPSCD is actually an integer, you may save space and time by declaring it as such. /// HCPSCD is NOT an integer // HTH. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite Use of Indexes
On Mon, 8 Jul 2013 15:32:21 -0400 "peter korinis"wrote: > . a CLAIMS table = 43M rows with indices on claim_no and > stateCounty code; and > > . a LINE table = 85M rows with indices on claim_no and HCPCS > (a 5 char text code) > > . Have run ANALYZE > > . Using Win7 > > > > Want to count occurrences of given HCPCS codes for given states. SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD FROM claims , LINE WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO --^^ == is not SQL AND HCPSCD IN ( '78451' , '78452' , '78453' , '78454' , '78469' , '78472' , '78473' , '78481' , '78483' , '78494' , '78496' , '78499' ) AND state_cd IN ('21', '45') GROUP BY STATE_CD >From a design perpective, you might want to think about what the IN clause is all about. Whatever they have in common probably belongs in a table, so you'd be able to say SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD FROM claims, LINE, HPs WHERE CLAIMS.CLAIM_NO = LINE.CLAIM_NO AND HCPSCD = HPs.code AND HPs.category = 'foo' AND state_cd IN ('21', '45') GROUP BY STATE_CD That suggests HCPSCD as the column to index; I suppose you have a lot more different HCPSCDs than STATE_CDs. If the HPs table is indexed by (category, code), then the join is on two ordered sets instead of an extended OR. If HCPSCD is actually an integer, you may save space and time by declaring it as such. HTH. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite Use of Indexes
On 9 Jul 2013, at 6:06pm, peter korinis <kori...@earthlink.net> wrote: > So, to implement your suggestion of crafting better indices, here’s my > approach: > 1. First action is joining the 2 tables on claim_no. {Therefore > claim_no should be first row in index for both tables} > 2. Find state and county in claims table > 3. Find HCPSCD in line table > So my 2 combined indices would be: >CREATE INDEX idx_Claim_State_Cnty ON CLAIMS (CLAIM_NO, > STATE_CD, CNTY_CD) >CREATE INDEX idx_Line_hcpscd ON LINE (CLAIM_NO, HCPSCD) Those would be good indexes. You can see how good by using EXPLAIN QUERY PLAN for your SELECT. Or better still, actually try them out and time the results. Might be interesting to then reverse the order of the columns in each index and try those. See if they’re better or worse. Or create lots of indexes, then use EXPLAIN QUERY PLAN and find out which indexes SQLite decided to use, then delete the others. I am deliberately not giving you an absolute answer because you have a good large set of data for testing and you obviously understand the idea now. You’ll learn more by trying out several alternatives yourself. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite Use of Indexes
Thank you Simon for responding to my questions. Your phonebook (FName/LName) analogy clearly explained why 2 indices per table per select won't work. Let me provide a bit more info and a possible attempt to implement your suggestions for better indices. My 'bread and butter' query counts the frequency of HCPSCD occurrences for each county of interest in each state of interest. [Occasionally I want counts for entire state.] I generally have 6 groups of 1~15 related HCPSCD codes and want counts from 1-12 counties of a state. CLAIMS table columns: claim_no, state_cd, cnty_cd are columns of interest; 40 other columns rarely used. LINE table columns: claim_no, hcpscd, plus 25 more columns (these are repeating fields per claim) [FYI: DB contains 44M claim numbers with 1~12 HCPSCD codes per claim, several thousand HCPSCD codes of which I am interested in 44 in 6 groups, 50 states with 9~125 counties per state of which I am usually interested in ~10.] I currently do a query like this . SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD, CNTY_CD FROM CLAIMS , LINE WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO AND HCPSCD IN ('78451','78452','78453','78454','78469','78472','78473','78481','78483','78 494','78496','78499') -- this is first group of hcpscd codes always queried AND STATE_CD = '21' AND CNTY_CD IN ('220', '345', '570') GROUP BY CNTY_CD UNION {repeat above SELECT with second group of hcpscd codes for same state and county . and so on 4 more times} So, to implement your suggestion of crafting better indices, here's my approach: 1. First action is joining the 2 tables on claim_no. {Therefore claim_no should be first row in index for both tables} 2. Find state and county in claims table 3. Find HCPSCD in line table So my 2 combined indices would be: CREATE INDEX idx_Claim_State_Cnty ON CLAIMS (CLAIM_NO, STATE_CD, CNTY_CD) CREATE INDEX idx_Line_hcpscd ON LINE (CLAIM_NO, HCPSCD) Is this what you were suggesting? Will these indices produce improved performance from the single column indices I was using? Thanks so much. Peter === > 1. Can SQLite use > 1 index per table per SELECT? Seems like using > both indices for each table would be better than the single index per table > as chosen by the plan. No, you?re right: one index per table, per SELECT. Pretend you have a phone book with two indexes: surname and first name. You have to look up someone called Miri Kallas. You can use either of the indexes to look up one name but once you?ve done that the other index is useless to you. You have made up some indexes which are useful but not the most useful. Drop those indexes and try to figure out one compound index on each table which would be best for the SELECT you asked about. Remember that this CREATE INDEX it1 ON t1 (c1) CREATE INDEX it2 ON t1 (c2) does not do the same thing as CREATE INDEX it12 ON t1 (c1, c2) Can?t do it for you because I can?t tell which of your columns are from which table. > 2. Is using something like "SELECT . FROM LINE INDEXED BY claim_no > AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2 > INDEXED BY this way? If you have to tell SQLite which index to use you?re doing it wrong. Make up a good index and SQLite will decide to use it. Especially if you?ve done ANALYZE. > 3. Is there a better way to write this query, for example, would > rearranging the order of AND criteria in WHERE clause improve performance? The query optimizer is meant to do all that for you. However, I suspect that you may understand your query better if you get rid of some of your 'IN' clauses. If you imagine doing 24 (= 2 * 12) different SELECTs, one for each State and HCPSCD, what would your SELECT look like then ? With good indexes it should be possible to make each of those SELECTs execute ridiculously fast. Once you?ve figured out how to do that and make it run fast, /then/ you might want to recombine the query for each State, though perhaps not put both States in the same query. > 4. How do I interpret the 'order' and 'from' in the query plan > results? It?s showing you what each of your indexes is being used for. And what is shows is the neither index is being used for both selecting records and arranging the order of results. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite Use of Indexes
On 8 Jul 2013, at 8:32pm, peter koriniswrote: > 1. Can SQLite use > 1 index per table per SELECT? Seems like using > both indices for each table would be better than the single index per table > as chosen by the plan. No, you’re right: one index per table, per SELECT. Pretend you have a phone book with two indexes: surname and first name. You have to look up someone called Miri Kallas. You can use either of the indexes to look up one name but once you’ve done that the other index is useless to you. You have made up some indexes which are useful but not the most useful. Drop those indexes and try to figure out one compound index on each table which would be best for the SELECT you asked about. Remember that this CREATE INDEX it1 ON t1 (c1) CREATE INDEX it2 ON t1 (c2) does not do the same thing as CREATE INDEX it12 ON t1 (c1, c2) Can’t do it for you because I can’t tell which of your columns are from which table. > 2. Is using something like "SELECT . FROM LINE INDEXED BY claim_no > AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2 > INDEXED BY this way? If you have to tell SQLite which index to use you’re doing it wrong. Make up a good index and SQLite will decide to use it. Especially if you’ve done ANALYZE. > 3. Is there a better way to write this query, for example, would > rearranging the order of AND criteria in WHERE clause improve performance? The query optimizer is meant to do all that for you. However, I suspect that you may understand your query better if you get rid of some of your 'IN' clauses. If you imagine doing 24 (= 2 * 12) different SELECTs, one for each State and HCPSCD, what would your SELECT look like then ? With good indexes it should be possible to make each of those SELECTs execute ridiculously fast. Once you’ve figured out how to do that and make it run fast, /then/ you might want to recombine the query for each State, though perhaps not put both States in the same query. > 4. How do I interpret the 'order' and 'from' in the query plan > results? It’s showing you what each of your indexes is being used for. And what is shows is the neither index is being used for both selecting records and arranging the order of results. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] SQLite Use of Indexes
I have a large DB and want to improve query performance. Context: the DB contains 2 tables representing nationwide claim data: . a CLAIMS table = 43M rows with indices on claim_no and stateCounty code; and . a LINE table = 85M rows with indices on claim_no and HCPCS (a 5 char text code) . Have run ANALYZE . Using Win7 Want to count occurrences of given HCPCS codes for given states. Here is the SQL query I use: SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD FROM claims , LINE WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO AND HCPSCD IN ('78451','78452','78453','78454','78469','78472','78473','78481','78483','78 494','78496','78499') AND state_cd IN ('21', '45') GROUP BY STATE_CD ; Query Plan results Order From 0 1 TABLE LINE WITH INDEX idx_line_hcpscd 1 0 TABLE claims WITH INDEX idx_claims_claim_no QUESTIONS: 1. Can SQLite use > 1 index per table per SELECT? Seems like using both indices for each table would be better than the single index per table as chosen by the plan. 2. Is using something like "SELECT . FROM LINE INDEXED BY claim_no AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2 INDEXED BY this way? 3. Is there a better way to write this query, for example, would rearranging the order of AND criteria in WHERE clause improve performance? 4. How do I interpret the 'order' and 'from' in the query plan results? Thanks much for your help, peter ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users