[sqlite] Implicit Indices on Subqueries used as lookups or joins ....???
Hi

I have continual issues with subquery performance when subqueries are used for joins; it crops up all the time in my daily work. If you create a derived table using a subquery and use it in a join, SQLite performance is abysmal. However, if you make a temp table from said subquery and index that temp table on the join keys, it goes at incredible speed. One example went from a query that ran for over 2 hours without completing (I killed it) to running in under 10 seconds with the temp-table pattern. I have to repeat this temp-table pattern for almost any data analysis I do, because SQLite subquery performance with joins is so bad.

To recreate the problem, simply create two subqueries which each produce, say, 100,000 records with composite integer keys, and join them. E.g. given Table1 (Key1, Key2, Key3, Value) and Table2 (Key1, Key2, Key3, Value):

select *
from (select Key1, Key2, sum(Value) as Value
      from Table1 group by Key1, Key2) t1
join (select Key1, Key2, sum(Value) as Value
      from Table2 group by Key1, Key2) t2
  on (t1.Key1 = t2.Key1 and t1.Key2 = t2.Key2)

Make sure t1, and most especially t2, have large volumes of records to highlight the problem; 100,000 each does the job. 2 hours versus 10 seconds on my hardware.

Can SQLite be altered to automatically create an index on subqueries used as joins or lookups, on the key fields used in the join or lookup? This would, in my experience and opinion, make SQLite so much more effective. The cost in time of creating said indices is usually less than 1 second on my hardware and examples, and saves hours!

Thanks,

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
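The temp-table workaround described above can be sketched with Python's stdlib sqlite3 module. The data is synthetic and the index names are made up for illustration; note also that later versions of SQLite gained automatic transient indexes, which address exactly this case.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (Key1 INTEGER, Key2 INTEGER, Key3 INTEGER, Value REAL);
    CREATE TABLE Table2 (Key1 INTEGER, Key2 INTEGER, Key3 INTEGER, Value REAL);
""")
# Synthetic data: 10,000 rows collapsing to 100 distinct (Key1, Key2) pairs.
rows = [(i % 100, i % 50, i, 1.0) for i in range(10_000)]
con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", rows)
con.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)", rows)

# The workaround: materialise each subquery as a temp table, index the
# join keys, then join the temp tables instead of the derived tables.
con.executescript("""
    CREATE TEMP TABLE t1 AS
        SELECT Key1, Key2, SUM(Value) AS Value FROM Table1 GROUP BY Key1, Key2;
    CREATE TEMP TABLE t2 AS
        SELECT Key1, Key2, SUM(Value) AS Value FROM Table2 GROUP BY Key1, Key2;
    CREATE INDEX t1_keys ON t1 (Key1, Key2);
    CREATE INDEX t2_keys ON t2 (Key1, Key2);
""")
result = con.execute("""
    SELECT t1.Key1, t1.Key2, t1.Value, t2.Value
    FROM t1 JOIN t2 ON t1.Key1 = t2.Key1 AND t1.Key2 = t2.Key2
""").fetchall()
print(len(result))
```

The temp tables go away when the connection closes, so no explicit cleanup is needed.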
Re: [sqlite] Implicit Indices on Subqueries used as lookups or joins ....???
I still consider it a workaround for ad-hoc queries. Programmatically I can of course use it easily, but when analysing data one runs many ad-hoc queries which change minute by minute. Having to create a temp table for each change, and give it a new name each time, is a real pain.

Further, given that platforms like Oracle don't appear to suffer from this problem, I assume (as dangerous as that is) that they actually do create implicit indices. It is a nice-to-have, I agree, but it's a big-win nice-to-have! The ability to build indices on temp tables already exists in SQLite; surely it can't be too hard to apply this when building temp tables from subqueries, since the join criteria must already have been parsed to be able to join the tables.

Thanks for the reply,
S

On Tue, Dec 2, 2008 at 3:57 PM, P Kishor [EMAIL PROTECTED] wrote:

On 12/2/08, Da Martian [EMAIL PROTECTED] wrote:
[quoted text snipped]

I have experienced the same, and my solution is exactly as noted above... programmatically create temp tables with appropriate indexes, and then query with those temp tables. No need to even drop the temp tables, as they go away when the connection is dropped. Works like a charm, so there has been really no need to want core SQLite to do the same for me, but I guess it might be nice.

--
Puneet Kishor
Re: [sqlite] Performance Problems with joining and subqueries
Hi

I do apologise; in my haste I copied bits of a larger query. The queries are correct, they just had extra bits around the edges. I will post the corrections below.

Regarding using CASE: I didn't know SQLite supported this. In fact, looking at the core functions on the web, there isn't any mention of a CASE statement, nor under aggregate functions. Knowing this, I can restructure as you have suggested. My first attempt at running the query as provided showed 10+ minutes, performance similar to the two subqueries joined together, meaning the temp-table option still wins hands down. So I still have the question of why a temp table is quicker than a subquery (derived table).

For this query, an index on (Customer, PrincipalContractNo, SASContract, BusinessArea, ProductGroup, Site), or any prefix of this list, might help.

Regarding this: I did create this index, as mentioned in the earlier post, but it performed worse than the PK index. With the PK index the query takes around 1 minute; after creating the above index it balloons to 10+ minutes, which I found strange, as in theory an index matching the GROUP BY should help collect the records together in the correct order.

Thanks again for your help, and sorry about the shoddy copy-paste job.
Corrected subqueries:

1st subquery:

SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo,
       SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site,
       rr.period_id as period,
       sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM ROT_DIM_CONTRACT_LINE cl
join ROT_FACT_REV_ROLLOUT rr
  on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine)
where rr.period_id > 200809
group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
         cl.ProductGroup, cl.Site, rr.period_id

2nd subquery:

SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo,
       SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site,
       sum(cl.NCRAmt) as NCRAmt
FROM ROT_DIM_CONTRACT_LINE cl
join ROT_FACT_REV_ROLLOUT rr
  on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine)
where rr.period_id <= 200809
group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
         cl.ProductGroup, cl.Site

On Thu, Oct 23, 2008 at 4:52 PM, Igor Tandetnik [EMAIL PROTECTED] wrote:

Da Martian [EMAIL PROTECTED] wrote:

I have two tables as follows:

CREATE TABLE ROT_DIM_CONTRACT_LINE (
  ContractNo Integer, ContractLine Integer, StartDate Integer, EndDate Integer,
  NCRAmt Float, Item Integer, BusinessArea Text, ProductGroup Text,
  ProductStyle Text, Site Text, Customer Integer, PrincipalContractNo Text,
  SASContract Text, ContractMonths Float, StartMonths Float, FullMonths Float,
  EndMonths Float, MonthlyAmortAmt Float, FirstAmortAmt Float, LastAmort Float,
  BalancingAmortAmt Float, RolloutToContractDiff Float,
  PRIMARY KEY (ContractNo ASC, ContractLine ASC))

CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1
  ON ROT_DIM_CONTRACT_LINE (ContractNo ASC, ContractLine ASC)

You don't need this index. The PRIMARY KEY clause has already created the same index implicitly.
CREATE TABLE ROT_FACT_REV_ROLLOUT (
  Period_ID Integer, ContractNo Integer, ContractLine Integer,
  Revenue_Amount Float,
  PRIMARY KEY (Period_ID ASC, ContractNo ASC, ContractLine ASC))

CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1
  ON ROT_FACT_REV_ROLLOUT (Period_ID ASC, ContractNo ASC, ContractLine ASC)

Same here. Drop the index.

ROT_DIM_CONTRACT_LINE has 131,747 records. ROT_FACT_REV_ROLLOUT has 3,971,369 records.

The process I am doing is twofold: 1) joining the data for a complete list, and 2) splitting the data by date (Period_ID) into two categories based on a threshold date. So if my threshold is Sep 2008 (200809), I want all records after Sep 2008 to be displayed AND I want a total column for all records prior to Sep 2008. No CASE statements in SQLite, so two subqueries does the job:

> SEP 2008:

select * from (SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo,
       SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site,
       rr.period_id as period,
       sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM ROT_DIM_CONTRACT_LINE cl
join ROT_FACT_REV_ROLLOUT rr
  on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine)
where rr.period_id > 200809
group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
         cl.ProductGroup, cl.Site, rr.period_id

You have fewer closing parens here than you have opening ones. Why are you doing select from select? Why an extra level of indirection? For this query, an index on (Customer, PrincipalContractNo, SASContract, BusinessArea, ProductGroup, Site), or any prefix of this list, might help.

<= SEP 2008:

SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo
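Igor's suggestion to replace the pair of subqueries with CASE can be sketched as conditional aggregation in a single pass. The table and data below are made up; only the 200809 threshold comes from the thread.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rev (Customer INTEGER, period_id INTEGER, amount REAL)")
con.executemany("INSERT INTO rev VALUES (?, ?, ?)",
                [(1, 200807, 10.0), (1, 200810, 20.0), (2, 200811, 30.0)])

# One scan of the table: amounts up to the 200809 threshold feed a
# "prior total" column, later amounts feed a "future" column.
rows = con.execute("""
    SELECT Customer,
           SUM(CASE WHEN period_id <= 200809 THEN amount ELSE 0.0 END) AS prior_total,
           SUM(CASE WHEN period_id >  200809 THEN amount ELSE 0.0 END) AS future_total
    FROM rev
    GROUP BY Customer
""").fetchall()
print(rows)
```

This avoids the join of two grouped derived tables entirely, which is why Igor recommends it over the two-subquery form.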
[sqlite] Performance Problems with joining and subqueries
Hi

I have two queries. Each returns a full result set in around 1 minute when run on its own. Make each of these queries a subquery and join them with an inner join, and the combined query takes more than 10 minutes; I cancelled it after 10 minutes. Does anyone have any ideas on how to optimise this type of process in SQLite?

Indices only make things worse. I have read through most of the threads on this, and it still seems unclear when and why the indices chosen make things worse. ANALYZE always seems to make things worse, and so does adding indices. I have two tables: one wide with few records, and one thin with around 4 million records. With only the UNIQUE PK indices, the queries complete in around 1 minute. With an index that satisfies the GROUP BY, a query runs for over 10 minutes and still doesn't complete; ANALYZE makes sure it uses the GROUP BY index as well. To get the best performance I seem to have to drop all indices and clear the sqlite_stat1 table.

So I am at a loss as to how to take two 1-minute queries and join them as subqueries in an efficient manner. If I make the first subquery into a temp table and then join the temp table to the second query, it completes in 1.5 minutes. I would expect the same performance from subqueries.

Thanks
Re: [sqlite] Performance Problems with joining and subqueries
) order by fut.Customer, fut.PrincipalContractNo, fut.SASContractNo, fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period

However, if I change the first query to be a temp table:

drop table fut;

create temp table fut as
SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo,
       SASContract as SASContractNo, cl.BusinessArea as BusinessArea,
       cl.ProductGroup as ProductGroup, cl.Site as Site, rr.period_id as period,
       sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM ROT_DIM_CONTRACT_LINE cl
join ROT_FACT_REV_ROLLOUT rr
  on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine)
where rr.period_id > 200809
group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
         cl.ProductGroup, Site, rr.period_id;

I now get around 2 minutes for the temp table combined with the second subquery:

select fut.Customer as Customer, fut.PrincipalContractNo as PrincipalContractNo,
       fut.SASContractNo as SASContractNo, fut.BusinessArea as BusinessArea,
       fut.ProductGroup as ProductGroup, fut.Site as Site,
       ncritd.NCRAmt as MCR_ITD, fut.Period as Period, fut.RevenueAmt as RevenueAmt
from fut fut
join (SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo,
             SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site,
             sum(cl.NCRAmt) as NCRAmt
      FROM ROT_DIM_CONTRACT_LINE cl
      join ROT_FACT_REV_ROLLOUT rr
        on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine)
      where rr.period_id <= 200809
      group by cl.Customer, PrincipalContractNo, SASContract, BusinessArea,
               ProductGroup, Site) ncritd
  on (fut.Customer = ncritd.Customer
      and fut.PrincipalContractNo = ncritd.PrincipalContractNo
      and fut.SASContractNo = ncritd.SASContractNo
      and fut.BusinessArea = ncritd.BusinessArea
      and fut.ProductGroup = ncritd.ProductGroup
      and fut.Site = ncritd.Site)
order by fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
         fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period;

So subqueries appear far more inefficient than temp tables.

Thanks,

On Thu, Oct 23, 2008 at 1:05 PM, Igor Tandetnik [EMAIL PROTECTED] wrote:

Da Martian [EMAIL PROTECTED] wrote:

Does anyone have any ideas on how to optimise this type of process in SQLite?

What type of process? Show your tables and your query.

Igor Tandetnik
[sqlite] Tip: Performance over large datasets
Hi

I suspect most of you know this, but since it helped me I decided to post it. I have around 5 million records in an SQLite file of about 400MB. Running joins over this was taking hours, and the disk went mental. Tweaking the following two pragmas solved the problem, though. Hours to seconds!

PRAGMA temp_store = 2;
PRAGMA cache_size = 10;

If this wasn't already so fast, I would have just loaded the whole thing into a :memory: db.

Regards
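As a sketch of setting these pragmas from code, using Python's stdlib sqlite3 module. The cache_size value of 100000 pages below is illustrative only (the figure in the post looks truncated); both pragmas apply per connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# temp_store = 2 forces temporary tables and indices into memory
# rather than onto disk, which is what stops the disk thrashing.
con.execute("PRAGMA temp_store = 2")

# cache_size is measured in pages; 100000 is a hypothetical value
# chosen for illustration, not the figure from the post.
con.execute("PRAGMA cache_size = 100000")

# Reading a pragma back confirms the setting took effect.
temp_store = con.execute("PRAGMA temp_store").fetchone()[0]
cache_size = con.execute("PRAGMA cache_size").fetchone()[0]
print(temp_store, cache_size)
```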
Re: [sqlite] Re: Unicode Help
It looks fine to me. To help check it, one thing you can try is writing the result of FieldAsString directly to a file as raw bytes, then opening that file in Notepad with the encoding set to Unicode. E.g. something logically equivalent to:

size := Length(field) * 2;
SetLength(buffer, size);
System.Move(field^, buffer^, size);
file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but hopefully you see what I have in mind. If the result looks good in Notepad, then you know this layer is fine, so the problem must be closer to the display layer.

Hi

How do you set Notepad to Encoding = Unicode? I can't see an option for that.
Re: [sqlite] Re: Unicode Help
EUREKA! OK, I got it working now. It turns out my source was UTF-8 encoded, so even when I used the UTF-16 functions it wasn't coming out right. I am now doing a conversion in Delphi from UTF-8 to UTF-16 and using all the UTF-16 SQLite functions as recommended.

Thanks a million for all your help; it was your suggestions which led me to the solution. Much appreciated. Have a good weekend.

S

On 12/8/06, Trevor Talbot [EMAIL PROTECTED] wrote:

On 12/7/06, Da Martian [EMAIL PROTECTED] wrote:
[quoted text snipped]

I've never used Delphi, but I did sleep at a Holiday Inn last night... It looks fine to me. To help check it, one thing you can try is writing the result of FieldAsString directly to a file as raw bytes, then opening that in Notepad with the encoding set to Unicode. If the result looks good in Notepad, then you know this layer is fine, so the problem must be closer to the display layer.

- To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Unicode Help
I think standard functions for conversions would be very helpful. I am still having issues trying to get my characters standardized; I spent all of yesterday playing with ideas but I'm still in the dark. Part of my problem is I don't have a clue what my source data is encoded as. Does anyone know of a tool which can try to guess the encoding?

Basically it's a custom Java bean written by someone else. It takes reports from a third-party system and turns them into XML using a string buffer; they just append everything to a string buffer. The code which actually adds this to the output (the key piece) I can't actually see at this point. My best guess based on research is that Java usually uses UTF-16. But if that is so, it should work: if I add the text using the *16 prepare and then retrieve it using the *16 column_text, I still get the two separate characters instead of the umlaut. So I can only assume that somehow my source isn't UTF-16, or I am converting it somewhere in the middle. This is possible since I am using Delphi and it has some implicit conversions, but I think I have got that under control. The problem is if I copy my source and paste it into Notepad, say, it shows correctly because Notepad then does its own stuff, and if I save from Notepad and read that back it works fine. *sigh*

So my questions are:

1) Any tools to determine encoding based on the data?

2) When using the non-16 version of prepare: if I add text which is in UTF-16, what happens? With the 16 version: if I add UTF-16 text, what happens? If I add UTF-8 text, what happens? If I add ASCII text, what happens?

Thanks,
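The "two separate characters instead of the umlaut" symptom can arise in two different ways, both sketchable in Python: UTF-8 bytes decoded under a one-byte codepage (mojibake), or a decomposed (NFD) string that really does store two code points. The sample text is taken from the thread; the Latin-1 mislabel is an assumed failure mode for illustration.

```python
import unicodedata

# Case 1: UTF-8 bytes decoded with a one-byte codepage (mojibake).
s = "Städt. Klinikum"
utf8_bytes = s.encode("utf-8")          # "ä" becomes the two bytes C3 A4...
mangled = utf8_bytes.decode("latin-1")  # ...which show up as two characters
print(mangled)
# Reversing the mislabel recovers the original text:
assert mangled.encode("latin-1").decode("utf-8") == s

# Case 2: a decomposed (NFD) string stores "a" plus a combining
# diaeresis -- also "two separate characters instead of the umlaut".
nfd = unicodedata.normalize("NFD", "ä")
print(len(nfd))
assert unicodedata.normalize("NFC", nfd) == "ä"
```

Dumping the raw bytes at each stage of the pipeline (as Trevor suggests elsewhere in the thread) is how you tell which case you are in.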
Re: [sqlite] Re: Unicode Help
I am still having issues trying to get my characters standardized. I spent all of yesterday playing with ideas but I'm still in the dark.

Whatever you were doing the first time was fine:

I have been having that very thought! So if I look at a name with umlauts in the database via sqlite3.exe I get:

Städt. Klinikum Neunkirchen gGmbH
   -- an a with two dots on top

That text was properly encoded as UTF-8. The ONLY issue with that line is that the sqlite shell under Windows is incapable of displaying Unicode, so you need to retrieve the data from sqlite using a tool that is. The actual storage of it is perfect.

I know the console sqlite3 won't show it. The Delphi control I am using does Unicode.

If I add the text using the *16 prepare and then retrieve it using the *16 column_text, I still get the two separate characters instead of the umlaut. So I can only assume that somehow my source isn't UTF-16, or I am converting it somewhere in the middle. This is possible since I am using Delphi and it has some implicit conversions, but I think I have got that under control.

AFAIK Delphi has no built-in Unicode support at all; you will need to find third-party support for everything, from processing to display controls. It is likely you are ending up with UTF-8 data at some point in the pipeline, and whatever you're doing to process it does not understand UTF-8.

Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully Unicode enabled (I believe). I use WideStrings throughout the entire pipeline, from the XML I receive, into SQLite via prepare16, and back out through column_text16 into the virtual tree.
Well, that's true. The SQLite APIs are mapped to return PWideChar, which is then copied via System.Move into a WideString as follows:

Show code :)

* DLL *

function sqlite3_open(filename: PWideChar; var db: pointer): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_open16';
function sqlite3_close(db: pointer): integer;
  cdecl; external 'sqlite3.dll';
function sqlite3_exec(db: pointer; sql: PWideChar; callback: pointer;
    userdata: PWideChar; var errmsg: PWideChar): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_exec16';
procedure sqlite3_free(ptr: PWideChar);
  cdecl; external 'sqlite3.dll';
function sqlite3_prepare(db: pointer; sql: PWideChar; nBytes: integer;
    var stmt: pointer; var ztail: PWideChar): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_prepare16';
function sqlite3_column_bytes(stmt: pointer; col: integer): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_column_bytes16';
function sqlite3_column_text(stmt: pointer; col: integer): PWideChar;
  cdecl; external 'sqlite3.dll' name 'sqlite3_column_text16';

* CLASS *

function TSqliteQueryResults.FieldAsString(i: integer): WideString;
var
  size: integer;
begin
  size := FieldSize(i);
  SetLength(result, size div 2);
  System.Move(sqlite3_column_text(Fstmt, i)^, PWideChar(result)^, size);
end;

function TSqliteQueryResults.FieldSize(i: integer): integer;
begin
  result := sqlite3_column_bytes(Fstmt, i);
end;

* END *

Previously (before my language headaches :-) ) I was doing the above without the APIs ending in 16, and everything was string and PChar in that layer. The layer that used this class has always used WideString. I realise you're probably not Delphi pros, but if you do spot something stupid I'm doing, I'd appreciate any help you can offer.

Thanks,
Re: [sqlite] Unicode Help
Hi

You seem really confused about the whole encoding issue.

Yes, definitely confused. I had always hoped Unicode would simplify the world, but my experience has shown no such luck :-) Code pages haunted my past and encodings haunt my future :-)

OK, that does answer one of my questions, I think. If I passed something not in UTF-8 to SQLite, would it return it exactly the same way I passed it in? From your statement of chaos below, I assume it won't if that data somehow violates UTF-8. So I need to get it to UTF-8 or UTF-16 before I insert.

Thanks for the information.
[sqlite] Unicode Help
Hi

I have a system up and working using sqlite3, but I think I am having Unicode issues and I am not sure how to go about coding the solution. I was hoping someone could share the approach needed. Here is my situation: I have German characters with umlauts which I would like to get back out of SQLite. An example is an a with two little dots on top. I have been using the non-16 versions, but to my mind that's OK; I just want whatever I put in to come back out again. The fact that it's Unicode shouldn't make a difference to SQLite: a 2-byte Unicode character would just be 2 normal chars to SQLite. At least, that was my assumption.

So if I look at a name with umlauts in the database via sqlite3.exe I get:

Städt. Klinikum Neunkirchen gGmbH
   -- an a with two dots on top

Now I expected that when this was put back into a Unicode field it would be OK, but it doesn't seem to work. So I tried the *16 versions, but now the field size returned by sqlite3_column_bytes16 always seems to be larger than the string I get back, resulting in junk characters on the end. So I get the umlauts in my application, but all this other junk as well.

Any ideas?
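For reference, a round-trip with consistently encoded text does come back intact, sketched here with Python's stdlib sqlite3 wrapper (which always hands SQLite well-formed UTF-8); the sample value is the one from the post.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT)")
name_in = "Städt. Klinikum Neunkirchen gGmbH"
con.execute("INSERT INTO t VALUES (?)", (name_in,))

# Because the wrapper encodes and decodes consistently, the umlaut
# survives; the junk described in the post appears when the bytes
# handed to SQLite don't match the encoding the chosen API variant
# (UTF-8 vs UTF-16) expects.
(name_out,) = con.execute("SELECT name FROM t").fetchone()
print(name_out)
```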
Re: [sqlite] Powered-by Images?
Hi

I still can't seem to locate any, but I used the main icon from the homepage and it looks pretty good on my about box.

Thanks,

On 11/29/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Da Martian [EMAIL PROTECTED] wrote:

Hi, are there any powered-by images for SQLite3?

A search for 'powered by sqlite' on Google Images turned up two candidates for me. Both seem adequate, though neither is great. You want to make a new and better one and contribute it to the project?

--
D. Richard Hipp [EMAIL PROTECTED]
[sqlite] Powered-by Images?
Hi

Are there any powered-by images for SQLite3? Google seems at a loss, which is unusual.

S
Re: [sqlite] Re: Regarding sqlite3_exec
Hi

Thanks for the response. The main reason is that my record count could be anywhere from a few thousand to a million. Even at the lower end, the interface can seem sluggish if you read every record before displaying anything. As you mention, and as has been discussed above, doing stuff in the background is a good way to go, but more complex. As a general rule of coding I put as few unnecessary threads into a phase-1 program as I can, because the complexity goes up hugely: threads can be complex to use, coordinate, interrupt, etc., and the chance of bugs goes up dramatically. So I tend to leave that for a phase 2 (bells and whistles), and only when there isn't a simpler way.

This thread has covered just about all the approaches I can think of :-) Thanks for the responses.

S

On 10/27/06, Isaac Raway [EMAIL PROTECTED] wrote:

Why don't you design the table with a unique row ID, stored in an integer field, then fetch a list of those ID numbers? For 5000 rows, assuming you store them in your application as 4-byte longs, that's about 19K of memory. Counting that result as you receive it isn't that difficult. If it takes a long time (it probably won't) you can do it in another thread and update the interface as appropriate. I'm not seeing a downside here.

Isaac

On 10/26/06, Da Martian [EMAIL PROTECTED] wrote:
[quoted text snipped]

--
Isaac Raway
Entia non sunt multiplicanda praeter necessitatem.
http://blueapples.org - blog
http://stonenotes.com - personal knowledge management
Re: [sqlite] Re: Regarding sqlite3_exec
No there isn't, but RDBMSs are a generalised data-retrieval mechanism, and as such they suffer from that generality. Don't get me wrong: RDBMSs are appropriate for 95% of all the data requirements I have had to deal with, I would never dream of trying to write one from scratch, and I can't imagine a world without them. However, certain applications (weather data, genome data, large indices like Google's) require something designed specifically for the purpose. If you customise data retrieval (in particular your sorting, indices, and access paths) you can leave an RDBMS in the dust in terms of performance. Everything I have read about Google suggests they do exactly this. Although I must point out that I don't actually know anything about Google with any certainty, just what has leaked out over the years on the rumour mill. But in designing my own Google-like indices (on a smaller scale, of course) and some specialised weather stuff, it was necessary to throw away the RDBMS and do it myself.

For a Google-like query, for instance, they know they will get a list of one or more words, and they know they will only ever search through the index of words; they don't have other data types, records, or tables. Why go through all the hassle of compiling SQL, and all that generic overhead, when your application will only ever do one thing? You can just make an API like search(wordlist): ResultSet. You immediately save yourself complexity and processing time. Then, for large indices, you know your data set, so instead of a standard B-tree you could use a more appropriate data structure, possibly with skip lists, etc.

As for performing a database search twice: this whole thread has shown that sometimes you have to :-)

S

On 10/25/06, John Stanton [EMAIL PROTECTED] wrote:

There is no magic in data retrieval. Google uses the same physical laws as us ordinary mortals. I see no reason to ever perform a database search twice.
Re: [sqlite] Re: Regarding sqlite3_exec
What prevents you from doing the same yourself? Just collect the data in a memory buffer as you get it, before processing it.

Nothing but efficiency, as discussed. Basically, as Mr Cote has said, it's either a count(*) query or O(n) step calls. The former means two statements have to be run, which isn't great if the statement is heavy. The latter is the best option available because, as Mr Cote points out, step isn't that heavy. The idea behind my prompting this discussion was to get the best of both worlds, i.e. the efficiency of the count(*) query without the need to execute two queries. At the end of the day it's not serious, as many workarounds and solutions are available.

That doesn't seem to make sense: after all, the count can only be returned *after* all the rows have been collected. By then you know the count yourself anyway.

But to return all the rows just to count them requires N calls to step. If the data set is large you only want to return a subset to start with, so you wouldn't know the count. And if you don't know the count, you can't update GUI-type things, etc.

S
Re: [sqlite] Regarding sqlite3_exec
Hi

Thanks for your response. In the end it's not important, as you point out, since many options are available. I guess I allowed myself to indulge in an idealistic fantasy for a moment :-)

S

On 10/24/06, Dennis Cote [EMAIL PROTECTED] wrote:

Da Martian wrote:

Hi

I understand the problem in theory; you are saying that SQLite uses an iterative algorithm which returns data as it finds it, is this correct? It appears that DBs like Oracle etc. get the whole result set into temporary memory/space and then return the query (at least for unions this appears true), although this is just an assumption based on observation.

It seems to me that the DB (i.e. SQLite) could determine the number of results far more efficiently than a client application. The reason is that the client application has to call step N times, plus a column extract N x (number of columns) times, while the DB could just move through the result set and count the rows without doing any processing on the data being counted. Perhaps this could be done as a separate API, like preparewithcount(), which returns the count as well. With careful design, most of the work needed to prepare the statement could avoid being repeated, as would happen with a count(*) query. This is just an idea, and I am speaking simply as a developer, but one who has not looked at the SQLite implementation at all.

Yes, SQLite iterates and returns each result row as it is located. SQLite also has a legacy sqlite3_get_table API that will return the entire result set in a table in RAM. It can fail if there is not enough memory to hold the result set, though. Your idea can (almost) be implemented in your application like this:
int prepare_and_count(sqlite3* db, const char* sql, int len,
                      sqlite3_stmt** s, const char** tail, int* count)
{
    int rc = sqlite3_prepare(db, sql, len, s, tail);
    *count = 0;
    if (rc == SQLITE_OK) {
        while (sqlite3_step(*s) == SQLITE_ROW)
            ++(*count);
        sqlite3_reset(*s);
    }
    return rc;
}

This will avoid the need to prepare two queries by using the same query twice: once to count the result rows, and a second time to collect the results. It does require N extra calls to sqlite3_step (which are very low overhead compared to the execution of a step). The extra calls to step are eliminated if you use a select count(*) query instead. With a count query SQLite will scan through the table as quickly as possible and count the results without stopping and returning to the caller after each row. But this requires a second query to be prepared. When you look at the big picture though, optimizing the count query isn't likely worth the effort. The count is usually only needed to implement GUI controls like scrollbars. The time it takes to collect the results and present them in the GUI will dominate the time it takes to prepare and execute a second count query unless the result set is very large. With large results the execution time of the count query dominates, and the overhead time to prepare the query becomes insignificant. It really doesn't take that long to prepare a count query. HTH Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Regarding sqlite3_exec
Yes, but Google doesn't use an RDBMS; it's all proprietary to support their speed and huge volumes. It's anyone's guess (except Google themselves) what exactly they do, and rumours abound, but I have done many apps which require custom data handling to achieve some end that doesn't fit with RDBM systems. But yes, paging using LIMIT and OFFSET is also a solution. Again, not as efficient though, because of all the repeated queries :-) On 10/25/06, Martin Jenkins [EMAIL PROTECTED] wrote: Da Martian wrote: But to return all the rows just to count them requires N calls to step. If the data set is large you only want to return a subset to start with, so you wouldn't know the count. If you don't know the count, you can't update GUI type things etc. I haven't been following this thread closely, but isn't this exactly the problem that Google solves by returning "Results 1 - 10 of about 9,940,000 for sqlite. (0.11 seconds)" for a query with a very large result set? If Google can't do it with all the resources they have... Martin
Re: [sqlite] Re: Regarding sqlite3_exec
It's quite simple:

1) If there are too many results, the user will have to wait a long time before they see anything, because we will be buffering away results. The application will appear sluggish and the user will get bored. I can point you to dozens of studies which show the correlation between response time and productivity where humans are concerned.

2) Often users will find what they want in the first 50 results. This means you would have wasted a lot of time bringing back data you don't need. However, they won't always find what they want in the first 50, so the option for more must be there.

So why not use web-like paging, I hear you say? Because the query is heavy: re-running it with a different limit and offset each time still means re-running it. One of the solutions (there are many, none ideal) is to have a growing scrollbar, i.e. it grows each time you fetch a batch of results. But this, like most of the solutions, looks a little tacky to a user (me being one of them). Personally I hate it when a scrollbar keeps growing as you reach the bottom. The few other approaches have been mentioned in the previous post in this thread. Your extremely simplistic view on this is a result of never dealing in voluminous data and result sets with quick-running queries. Once you put volumes into your thinking cap you will begin to see why you don't just read everything into memory for the hell of it. Think about it. On 10/25/06, A. Pagaltzis [EMAIL PROTECTED] wrote: I still fail to understand the problem. Either you want the data from the query, or you don't. If you want it, then you run the query once and buffer the results in memory before you process them, so you know the count before you start processing. Or you don't want the data, then you use a COUNT(*) query. In either case, it is only one query you need to run.
Re: [sqlite] Re: Regarding sqlite3_exec
Your extremely simplistic view on this is a result of never dealing in voluminous data and result sets with quick-running queries. Once you put volumes into your thinking cap you will begin to see why you don't just read everything into memory for the hell of it. Just to complete the above thought: if I wanted everything in memory I would not have any need for SQLite. Standard data structures like BSTs, hash tables, etc. would be far more efficient for in-memory use.
Re: [sqlite] Re: Regarding sqlite3_exec
It was not meant as an insult; however, you did set the tone with your post (ala: "Either you want the data from the query, or you don't."). I merely responded in kind. If you live in glass houses, don't throw stones, and all that. I mean, it's not hard to see that loading 20 million records into memory isn't the most efficient approach to showing a list box on the screen. Thanks for your vote of confidence in my intelligence. Clearly, you are smart enough to figure out a solution without assistance. Nevermind, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: [sqlite] Re: Regarding sqlite3_exec
Hi. All these options are good, and the discussion was interesting. I merely wanted to see what people's thoughts were on SQLite providing this. There are many ways to skin a cat, as they say. Since this is probably drifting off topic, I suggest we discontinue the thread. Thanks for your candor; discussing is what it's all about. Stephen On 10/25/06, A. Pagaltzis [EMAIL PROTECTED] wrote: * Da Martian [EMAIL PROTECTED] [2006-10-25 15:05]: It was not meant as an insult; however, you did set the tone with your post (ala: "Either you want the data from the query, or you don't."). I merely responded in kind. If you live in glass houses, don't throw stones, and all that. I mean, it's not hard to see that loading 20 million records into memory isn't the most efficient approach to showing a list box on the screen. I suggested that after you said that Oracle collects results in memory before returning them; you seemed to hint that this wouldn't be a problem, in which case whether you do it yourself or the database does it for you doesn't make a difference. Solutions that come to mind are a) to populate the UI from an idle callback, where the scrollbar would simply keep growing independently of any user interaction until all the results are fetched; or, if that's unacceptable, b) run a separate COUNT(*) query, since preparing an extra query is cheap, but using COUNT(*) tells SQLite that it can forgo a lot of processing, which makes the up-front counting query quicker. There are other options as well. A lot depends on your circumstances. F.ex. paged queries can be made cheaper by selecting results into a temporary table so that you can re-retrieve them with a much cheaper query. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
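Pagaltzis's last suggestion — run the heavy query once into a temporary table, then serve the count and each page from that cached copy with cheap queries — can be sketched like this (Python's stdlib sqlite3 for brevity; the `orders`/`page_cache` names and the filter are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
con.executemany("INSERT INTO orders (amount) VALUES (?)",
                [(float(i),) for i in range(1000)])

# Run the (notionally expensive) query exactly once, into a temp table.
con.execute("CREATE TEMP TABLE page_cache AS "
            "SELECT id, amount FROM orders WHERE amount >= 100 ORDER BY id")

# The total and every subsequent page come from the cheap cached copy,
# so LIMIT/OFFSET paging no longer re-runs the heavy query.
(total,) = con.execute("SELECT count(*) FROM page_cache").fetchone()
page2 = con.execute(
    "SELECT id FROM page_cache ORDER BY id LIMIT 20 OFFSET 20").fetchall()
```

The temp table lives only for the connection's lifetime, which suits a GUI session holding one open connection.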
Re: [sqlite] Re: Regarding sqlite3_exec
Indeed. Thanks to all for the responses. Currently I use the separate count(*), but I think I will change to the prepare, step(n), reset option, only because the query is a little slow. Ultimately SQLite is brilliant: hands down the best embeddable and standalone DB I have yet to encounter. Thanks to all involved in it. S On 10/25/06, Nuno Lucas [EMAIL PROTECTED] wrote: On 10/25/06, Dennis Cote [EMAIL PROTECTED] wrote: Nuno Lucas wrote: There is another alternative if you don't mind the overhead of having an automatic row count (which SQLite avoids by design): having a trigger that updates the table row count on each insert/delete. Nuno, this technique only works if you never use conditions on your queries. If your query returns a subset of the rows in a table, this carefully maintained count of all the rows in the table is useless. Sure, but I wasn't trying to solve the general issue. The only solution for that is to run the query to full length, whatever the SQL engine is (even if they hide it from the user/programmer). The trigger(s) could be elaborated for specific queries, of course, and that would solve the GUI issue for fixed queries, but the generic case will never have a solution other than the full scan. Anyway, I have never used this solution; I'm just trying to show it can be solved for the specific cases of most GUIs, if the trigger overhead is not a problem. In my modest opinion, if one really needs an accurate scrollbar, one should show the rows by pages, with buttons to go to the next/prev page (and the scrollbar will be correct per page). There is no need to know the total rows in the view (unless we know the table doesn't grow so much that a select count(*), or a full select into memory, adds much to the window rendering, which is most times). Regards, ~Nuno Lucas Dennis Cote.
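Nuno's trigger idea can be sketched as follows (Python's stdlib sqlite3; the `items`/`row_counts` names are hypothetical). As Dennis notes, this only maintains the *total* row count of a table, not the count of an arbitrary filtered query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER);
INSERT INTO row_counts VALUES ('items', 0);

-- Keep the counter in sync on every insert and delete.
CREATE TRIGGER items_ins AFTER INSERT ON items BEGIN
    UPDATE row_counts SET n = n + 1 WHERE tbl = 'items';
END;
CREATE TRIGGER items_del AFTER DELETE ON items BEGIN
    UPDATE row_counts SET n = n - 1 WHERE tbl = 'items';
END;
""")

con.executemany("INSERT INTO items (name) VALUES (?)",
                [("a",), ("b",), ("c",)])
con.execute("DELETE FROM items WHERE name = 'a'")

# O(1) lookup instead of a full-table count(*).
(n,) = con.execute("SELECT n FROM row_counts WHERE tbl = 'items'").fetchone()
```

The trade-off is exactly the one discussed: every write now carries the extra trigger cost.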
Re: [sqlite] Regarding sqlite3_exec
Hi. "The optimal way is that you prepare the statement, fetch and count the results with sqlite3_step." How would I fetch and count the results via sqlite3_step? Do you mean fetch all the records first? What if my result set is huge and I would only like to show the first few records, but still know how many there are? For example, let's say I run a SQL statement (a very heavy statement consisting of joins and subqueries). It returns 5000 rows. For speed I don't want to retrieve 5000 rows; I want to set up a list which shows that there are 5000 rows on the scroll bar, but only retrieves, say, the first 20 for display. Is this possible? I know I can do a select count(*) from (SQL), but it's a heavy query, and then I would be running it twice? Any solution to this? S On 10/6/06, He Shiming [EMAIL PROTECTED] wrote: Hi List, if I use sqlite3_exec to query a database, how can I know that the results in the database are finished? For example, if I am expecting 10 results in some for loop and actually there are only five results, how can I get a notification or return value that the results are complete, or is there any way I can get SQLITE_DONE through sqlite3_exec? What return value will I get if I query an empty table? Thanks and Regards, Vivek R SQLite doesn't provide a get-number-of-rows function for the result set. It is mentioned in the documentation that sqlite3_exec is actually a wrapper for sqlite3_prepare and sqlite3_step. In my opinion, sqlite3_exec should only be used when the result of the query isn't that important, for instance a pragma query. For the record, sqlite3_exec does provide a callback function in which you can count and get the number of rows in a result set. The optimal way is that you prepare the statement, then fetch and count the results with sqlite3_step. Another thing I noticed from your question is that you might not want to expect 10 results. It's not very wise to design a hard loop such as for (i = 0; i < 10; i++) when it comes to a database query result set.
A better way would be to use an array to store the result set in a way you understand, and process it later. Then you'll have for (i = 0; i < (array.size() < 10 ? array.size() : 10); i++). Best regards, He Shiming
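He Shiming's point about not hard-coding the loop bound can also be illustrated with Python's stdlib sqlite3 (a sketch; the thread itself concerns the C API): ask the cursor for *up to* N rows and let it report exhaustion, rather than assuming N rows exist.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
# Only 5 rows exist, though the caller will ask for 10.
con.executemany("INSERT INTO t (x) VALUES (?)", [(i,) for i in range(5)])

cur = con.execute("SELECT x FROM t ORDER BY x")
rows = cur.fetchmany(10)       # requests up to 10, gets the 5 that exist
done = cur.fetchone() is None  # the cursor itself signals exhaustion
```

This mirrors the C-API pattern of calling sqlite3_step until it stops returning SQLITE_ROW, instead of looping a fixed number of times.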
Re: [sqlite] Regarding sqlite3_exec
Hi, I understand the problem in theory, and you are saying that SQLite uses an iterative algorithm which returns data as it finds it, is this correct? It appears that DBs like Oracle etc. get the whole result set into temporary memory/space and then return the query (at least for unions this appears true), although this is just based on assumptions drawn from observations. It seems to me that the DB (i.e. SQLite) can determine the number of results far more efficiently than a client application. The reason is that the client application has to call step N times plus a column extract (N x no. of columns), while the DB could just move through the result set and count the rows without ever doing any processing on the data being counted. Perhaps this could be done as a separate API, like preparewithcount(), which returns the count as well. With careful design most of the work needed to prepare the statement etc. could avoid being repeated, as would happen with a count(*) query. This is just an idea, and I am speaking simply as a developer, but one who has not looked at the SQLite implementation at all. Thanks for your response; it was very informative, helpful and pertinent. S On 10/24/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Da Martian [EMAIL PROTECTED] wrote: Hi. "The optimal way is that you prepare the statement, fetch and count the results with sqlite3_step." How would I fetch and count the results via sqlite3_step? Do you mean fetch all the records first? What if my result set is huge and I would only like to show the first few records, but still know how many there are? For example, let's say I run a SQL statement (a very heavy statement consisting of joins and subqueries). It returns 5000 rows. For speed I don't want to retrieve 5000 rows; I want to set up a list which shows that there are 5000 rows on the scroll bar, but only retrieves, say, the first 20 for display. Is this possible? No, it is not possible.
In the general case, where there are user-defined functions in the query, returning the number of rows in the result set is equivalent to the halting problem. See http://en.wikipedia.com/wiki/Halting_problem Even in the absence of the artificial constructs that make the problem theoretically undecidable, the problem is still very hard. I am not aware of a general solution other than to run the query to completion and count the rows. I suspect that I could write a proof that no solution exists that is faster than running the query to completion, though I have never taken the time to actually write that proof out. You might be able to find special cases where you can predict the size of the result set without actually computing it. But such techniques would only work for very narrowly defined queries over tables with very narrowly defined data constraints. -- D. Richard Hipp [EMAIL PROTECTED]
[sqlite] Query Optimisation Question?
Hi, I have found that using an IN clause with a subquery can be twice as fast as a straight join. Can anyone explain the logic of this to me? I am curious to understand it so I can optimise other queries for better performance. I have included the queries below. OT_TARGETS has 20 rows for regionid = 1; OT_PRODUCTS has 201 rows for regionid = 1.

select distinct RegionID, ProductID, ProductName, ProductShortName, ProductRank
from OT_PRODUCTS p
where RegionID = 1
  and ProductID in (select distinct productid from ot_targets where regionid = 1)
order by ProductRank, ProductName, ProductID;

2-3 seconds slower than above:

select distinct t.RegionID, t.ProductID, p.ProductName, p.ProductShortName, p.ProductRank
from OT_TARGETS t, OT_PRODUCTS p
where t.ProductID = p.ProductID
  and t.RegionID = p.RegionID
  and t.RegionID = 1
order by p.ProductRank, p.ProductName, p.ProductID;

Thanks,
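One hedged way to investigate the difference is to confirm the two forms are equivalent on toy data and then compare their plans: in the sqlite3 shell, prefixing each query with EXPLAIN QUERY PLAN shows whether the IN form or the join form drives a full scan. A minimal sketch (Python's stdlib sqlite3; the tiny tables below are invented stand-ins for OT_PRODUCTS/OT_TARGETS):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ot_products (RegionID INT, ProductID INT, ProductName TEXT);
CREATE TABLE ot_targets  (RegionID INT, ProductID INT);
INSERT INTO ot_products VALUES (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c');
INSERT INTO ot_targets  VALUES (1, 2), (1, 3), (1, 3);
""")

# IN-subquery form: the subquery result is deduplicated once, then probed.
in_rows = con.execute("""
    SELECT ProductID FROM ot_products
    WHERE RegionID = 1
      AND ProductID IN (SELECT ProductID FROM ot_targets WHERE RegionID = 1)
    ORDER BY ProductID""").fetchall()

# Join form: duplicate target rows are joined first, DISTINCT applied after.
join_rows = con.execute("""
    SELECT DISTINCT p.ProductID
    FROM ot_targets t JOIN ot_products p
      ON t.ProductID = p.ProductID AND t.RegionID = p.RegionID
    WHERE t.RegionID = 1
    ORDER BY p.ProductID""").fetchall()

assert in_rows == join_rows
```

A plausible reading, consistent with the numbers above: the IN form reduces OT_TARGETS to a small distinct set before touching OT_PRODUCTS, while the join form joins every duplicate target row and only then applies DISTINCT over the wider row.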
[sqlite] count(*) slow
Hi, I have 3 million rows in a table which takes up about 3.1GB on disk, and count(*) on it is slow. I have run ANALYZE, but apart from creating the stats table it does nothing. Any reason why this is? Can it be improved?