Re: [sqlite] Implicit Indices on Subqueries used as "lookups" or joins ....???

2008-12-02 Thread Da Martian
I still consider it a workaround for ad hoc queries. Programmatically I can
of course use it easily, but when analysing data one runs many ad hoc
queries that change minute by minute. Having to create a temp table for
each change, and give it a new name each time, is a real pain.

Further, given that platforms like Oracle don't appear to suffer from this
problem, I assume (dangerous as that is) that they actually do create
implicit indices.

It is a nice-to-have, I agree, but it's a big-win nice-to-have! The ability
to build indices on temp tables already exists in SQLite; surely it can't
be too hard to apply this when materialising temp tables from subqueries,
as the join criteria must already have been parsed in order to join the
tables. A sketch of the manual pattern follows.
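For reference, a minimal sketch of the manual workaround under discussion,
using the Table1/Table2 example from the original post quoted below:

create temp table t1 as
  select Key1, Key2, sum(Value) as Value from Table1 group by Key1, Key2;
create temp table t2 as
  select Key1, Key2, sum(Value) as Value from Table2 group by Key1, Key2;

-- indexing the join keys is the step SQLite could, in principle, do
-- automatically when it materialises a subquery
create index idx_t1_keys on t1 (Key1, Key2);
create index idx_t2_keys on t2 (Key1, Key2);

select * from t1 join t2 on (t1.Key1 = t2.Key1 and t1.Key2 = t2.Key2);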

Thanks for the reply,

S


On Tue, Dec 2, 2008 at 3:57 PM, P Kishor <[EMAIL PROTECTED]> wrote:

> On 12/2/08, Da Martian <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> >  I have continuous issues with subquery performance when subqueries are
> >  used for joins. It crops up all the time in my daily work.
> >
> >  If you create a derived table using a subquery and use it in a join,
> >  SQLite performance is abysmal. However, if you make a temp table from
> >  said subquery and index this temp table on the join keys, it goes at
> >  incredible speed.
> >
> >  Examples include a query which had still not completed after 2 hours,
> >  when I killed it, versus running in under 10 seconds using the temp
> >  table pattern.
> >
> >  This temp table pattern has to be repeated for almost any data
> >  analysis I do, as SQLite subquery performance with joins is so bad.
> >
> >  To recreate the problem, simply create two subqueries which each
> >  produce, say, 100 000 records with composite integer keys, and join
> >  them.
> >
> >  e.g.
> >
> >  Table1 (Key1, Key2, Key3, Value)
> >  Table2 (Key1, Key2, Key3, Value)
> >
> >  select *
> >  from
> >    (select Key1, Key2, sum(Value) as Value from Table1 group by Key1,
> >     Key2) t1 join
> >    (select Key1, Key2, sum(Value) as Value from Table2 group by Key1,
> >     Key2) t2 on
> >      (t1.Key1 = t2.Key1 and
> >       t1.Key2 = t2.Key2)
> >
> >  Make sure T1 and most especially T2 have large volumes of records to
> >  highlight the problem; e.g. 100 000 each does the job. Over 2 hours
> >  versus 10 seconds on my hardware.
> >
> >  Can SQLite be altered to automatically create an index on subqueries
> >  used as joins or lookups, for the key fields used in the join or
> >  lookup? This would, in my experience and opinion, make SQLite so much
> >  more effective. The cost in time of creating said indices is usually
> >  less than 1 second on my hardware and examples, and saves hours!
> >
> >
>
> I have experienced the same, and my solution is exactly as noted
> above... programmatically create temp tables with appropriate indexes,
> and then query with those temp tables. No need to even drop the temp
> tables as they go away when the connection is dropped.
>
> Works like a charm, so there has really been no need to have core SQLite
> do the same for me, but I guess it might be nice.
>
>
> --
> Puneet Kishor


[sqlite] Implicit Indices on Subqueries used as "lookups" or joins ....???

2008-12-02 Thread Da Martian
Hi

I have continuous issues with subquery performance when subqueries are used
for joins. It crops up all the time in my daily work.

If you create a derived table using a subquery and use it in a join, SQLite
performance is abysmal. However, if you make a temp table from said subquery
and index this temp table on the join keys, it goes at incredible speed.

Examples include a query which had still not completed after 2 hours, when I
killed it, versus running in under 10 seconds using the temp table pattern.

This temp table pattern has to be repeated for almost any data analysis I
do, as SQLite subquery performance with joins is so bad.

To recreate the problem, simply create two subqueries which each produce,
say, 100 000 records with composite integer keys, and join them.

e.g.

Table1 (Key1, Key2, Key3, Value)
Table2 (Key1, Key2, Key3, Value)

select *
from
  (select Key1, Key2, sum(Value) as Value from Table1 group by Key1,
   Key2) t1 join
  (select Key1, Key2, sum(Value) as Value from Table2 group by Key1,
   Key2) t2 on
    (t1.Key1 = t2.Key1 and
     t1.Key2 = t2.Key2)

Make sure T1 and most especially T2 have large volumes of records to
highlight the problem; e.g. 100 000 each does the job. Over 2 hours versus
10 seconds on my hardware.


Can SQLite be altered to automatically create an index on subqueries used
as joins or lookups, for the key fields used in the join or lookup? This
would, in my experience and opinion, make SQLite so much more effective.
The cost in time of creating said indices is usually less than 1 second on
my hardware and examples, and saves hours!

Thanks,


Re: [sqlite] Performance Problems with joining and subqueries

2008-10-24 Thread Da Martian
Hi

I do apologise; in my haste I copied bits of a larger query. The queries
are correct, they just have extra bits around the edges. I will post the
corrections below.

Regarding using CASE: I didn't know SQLite supported this. In fact, looking
at the core functions on the web, there isn't any mention of a CASE
expression, nor under the aggregate functions. Knowing this, I can
restructure as you have suggested. My first attempt at running the query as
provided showed 10+ minutes, performance similar to the two subqueries
joined together, meaning the temp table option still wins hands down. (A
sketch of the CASE form follows below.)
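For reference, a rough sketch of the kind of CASE-based conditional
aggregation Igor suggested (column names taken from this thread; this
simplified single-pass form illustrates the idea but does not reproduce the
per-period breakdown of the full query):

select cl.Customer, cl.PrincipalContractNo, SASContract as SASContractNo,
       cl.BusinessArea, cl.ProductGroup, cl.Site,
       sum(case when rr.period_id <= 200809 then cl.NCRAmt else 0 end)
         as NCRAmt_ITD,
       sum(case when rr.period_id > 200809 then rr.revenue_amount else 0 end)
         as RevenueAmt
from
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
    (cl.ContractNo = rr.ContractNo and
     cl.ContractLine = rr.ContractLine)
group by
  cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
  cl.ProductGroup, cl.Site;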

So I still have the question of why a temp table is quicker than a subquery
(derived table).

> For this query, an index on (Customer, PrincipalContractNo, SASContract,
> BusinessArea, ProductGroup, Site), or any prefix of this list, might
> help.

Regarding this, I did create this index, as mentioned in the previous post;
however, it performed worse than the PK index. The PK-index query takes
around 1 minute. Creating the above index, it balloons to over 10+ minutes,
which I found strange, as in theory an index matching the GROUP BY should
help collect the records together in the correct order.


Thanks again for your help, and sorry about the shoddy "copy paste" job.

Corrected subqueries:

1st subquery:
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
       PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
       cl.ProductGroup, cl.Site, rr.period_id as period,
       sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
    (cl.ContractNo = rr.ContractNo and
     cl.ContractLine = rr.ContractLine)
where
  rr.period_id > 200809
group by
  cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
  cl.ProductGroup, cl.Site, rr.period_id

2nd subquery:
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
       PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
       cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
    (cl.ContractNo = rr.ContractNo and
     cl.ContractLine = rr.ContractLine)
where
  rr.period_id <= 200809
group by
  cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
  cl.ProductGroup, cl.Site

On Thu, Oct 23, 2008 at 4:52 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote:

> Da Martian <[EMAIL PROTECTED]> wrote:
> > I have two tables as follows:
> >
> > CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine
> > Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item
> > Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site
> > Text,Customer Integer,PrincipalContractNo Text,SASContract
> > Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths
> > Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort
> > Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY
> > (ContractNo ASC,ContractLine ASC))
> > CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON
> > ROT_DIM_CONTRACT_LINE (ContractNo ASC,ContractLine ASC)
>
> You don't need this index. PRIMARY KEY clause has already created the
> same index implicitly.
>
> > CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo
> > Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY
> > (Period_ID
> > ASC,ContractNo ASC,ContractLine ASC))
> > CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT
> > (Period_ID ASC,ContractNo ASC,ContractLine ASC)
>
> Same here. Drop the index.
>
> > ROT_DIM_CONTRACT_LINE has 131,747 records
> > ROT_FACT_REV_ROLLOUT has 3,971,369 records
> >
> > The process I am doing is two fold:
> > 1) Joining the data for complete list
> > 2) Splitting the data by date (Period_ID) into two categories based
> >    on a threshold date.
> >so if my threshold is Sep 2008 (200809) I want all records after
> > Sep 2008 to be displayed AND
> >I want a total column for all records prior to Sep 2008.
>
> > No case statements in SQLite, so two subqueries do the job:
> >
> >> SEP 2008
> > select * from
> > (SELECT cl.Customer as Customer, cl.PrincipalContractNo as
> > PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
> > cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as
> > NCRAmt, sum(rr.revenue_amount) as RevenueAmt
> > FROM
> >  ROT_DIM_CONTRACT_LINE cl join
> >  ROT_FACT_REV_ROLLOUT rr on
> >(cl.ContractNo = rr.ContractNo and
> > cl.ContractLine = rr.ContractLine)
> > where
> >rr.period_id > 200809
> > group by
> >   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
> >   cl.ProductGroup, cl.Site, rr.period_id

Re: [sqlite] Performance Problems with joining and subqueries

2008-10-23 Thread Da Martian
   fut.ProductGroup = ncritd.ProductGroup and
   fut.Site = ncritd.Site)
order by
   fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period


However, if I change the first query to be a temp table:
drop table fut
create temp table fut as
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea as
BusinessArea, cl.ProductGroup as ProductGroup, cl.Site as Site, rr.period_id
as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
 cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, Site, rr.period_id;

I now get around 2 minutes for the temp table combined with the second
subquery:

select fut.Customer as Customer, fut.PrincipalContractNo as
PrincipalContractNo, fut.SASContractNo as SASContractNo, fut.BusinessArea as
BusinessArea, fut.ProductGroup as ProductGroup, fut.Site as Site,
ncritd.NCRAmt as MCR_ITD, fut.Period as Period, fut.RevenueAmt as
RevenueAmt
from
 fut fut join
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
   (cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
   rr.period_id <= 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, BusinessArea,
ProductGroup, Site) ncritd on
   (fut.Customer = ncritd.Customer and
fut.PrincipalContractNo = ncritd.PrincipalContractNo and
fut.SASContractNo = ncritd.SASContractNo and
fut.BusinessArea = ncritd.BusinessArea and
fut.ProductGroup = ncritd.ProductGroup and
fut.Site = ncritd.Site)
order by
   fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period ;


So subqueries appear far more inefficient than temp tables.
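For what it's worth, indexing the temp table on the join keys (the pattern
from the December 2008 thread above) typically helps further; a sketch
using the column names from this query:

create index idx_fut_keys on fut
  (Customer, PrincipalContractNo, SASContractNo, BusinessArea,
   ProductGroup, Site);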

Thanks,





On Thu, Oct 23, 2008 at 1:05 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote:

> "Da Martian" <[EMAIL PROTECTED]> wrote
> in message
> news:[EMAIL PROTECTED]<[EMAIL PROTECTED]>
> > Does anyone have any ideas on how to optimise this type of process in
> > SQLite?
>
> What type of process? Show your tables and your query.
>
> Igor Tandetnik
>
>
>


[sqlite] Performance Problems with joining and subqueries

2008-10-23 Thread Da Martian
Hi

I have two queries. Each returns a full result set in around 1 minute when
run on its own. Making each of these queries a subquery and joining them
with an inner join takes more than 10 minutes; I cancelled it at that
point.

Does anyone have any ideas on how to optimise this type of process in
SQLite?

Indices only make things worse. I have read through most of the threads on
this, and it still seems unclear when and why the chosen indices perform
worse.

ANALYZE always seems to make things worse, as does adding indices.

I have two tables: one wide with few records, and one thin with around 4
million records. With only the UNIQUE PK indices, the queries usually
complete in around 1 minute.

With an index that satisfies the GROUP BY, it takes over 10 minutes and
still doesn't complete. ANALYZE ensures it uses the GROUP BY index as well.
To get the best performance I seem to have to drop all the extra indices
and clear the sqlite_stat1 table.


So I am at a loss as to how to take two 1-minute queries and join them in
an efficient manner as subqueries.

If I make the first subquery into a temp table and then join the temp table
to the second query, it completes in 1.5 minutes.

I would expect the same performance from subqueries.

Thanks,


[sqlite] Tip: Performance over large datasets

2008-10-22 Thread Da Martian
Hi

I suspect most of you know this, but since it helped me I decided to post
it. I have around 5 million records in an SQLite file of about 400 MB.
Running joins over this was taking hours, and the disk went mental!
Tweaking the following two pragmas solved the problem: hours to seconds!

PRAGMA temp_store = 2;
PRAGMA cache_size = 10;

If this weren't already so fast, I would have just loaded the whole thing
into a :memory: db.
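For context, a minimal sketch of applying such pragmas once per connection,
before the heavy queries run (the cache size here is an illustrative value,
not the poster's setting; in SQLite 3 of this era cache_size is a page
count and temp_store = 2 means MEMORY):

PRAGMA temp_store = 2;       -- keep temporary tables and indices in memory
PRAGMA cache_size = 100000;  -- illustrative: allow a large page cache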

Regards


Re: [sqlite] Re: Unicode Help

2006-12-08 Thread Da Martian

EUREKA!

OK, I got it working now. It turns out my source was UTF-8 encoded, so even
when I used the UTF-16 functions it wasn't coming out right. I am now doing
a conversion in Delphi from UTF-8 to UTF-16 and using all the UTF-16 sqlite
functions, as recommended.

Thanks a million for all your help; it was your suggestions that led me to
the solution.

Much appreciated.

Have a good weekend.

S


On 12/8/06, Trevor Talbot <[EMAIL PROTECTED]> wrote:


On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:

> Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's
> fully Unicode enabled (I believe). I use WideStrings throughout the
> entire pipeline, from the XML I receive, into SQLite via prepare16, and
> back out through column_text16 into the virtual tree. Well, that's true
> except that the SQL APIs are mapped to return PWideChar, which is then
> copied via System.Move into a WideString as follows:

[ DLL interfaces ]

> Previously (before my language headaches :-) ) I was doing the above
> without the APIs ending in 16, and everything was string and PChar in
> the above layer. The layer that used this class has always had
> "WideString".
>
> I realise you're probably not Delphi pros, but if you do spot something
> stupid I am doing I would appreciate any help you can offer.

I've never used Delphi, "but I did sleep at a Holiday Inn last night"...

It looks fine to me.  To help check it, one thing you can try is
writing the result of FieldAsString directly to a file as raw bytes,
then in notepad open that with "encoding" set to "Unicode".  E.g.
something logically equivalent to:

  size := Length(field) * 2;
  SetLength(buffer, size );
  System.Move(field^, buffer^, size);
  file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but
hopefully you see what I have in mind.  If the result looks good in
notepad, then you know this layer is fine, so the problem must be
closer to the display layer.






Re: [sqlite] Re: Unicode Help

2006-12-08 Thread Da Martian



It looks fine to me.  To help check it, one thing you can try is
writing the result of FieldAsString directly to a file as raw bytes,
then in notepad open that with "encoding" set to "Unicode".  E.g.
something logically equivalent to:

  size := Length(field) * 2;
  SetLength(buffer, size );
  System.Move(field^, buffer^, size);
  file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but
hopefully you see what I have in mind.  If the result looks good in
notepad, then you know this layer is fine, so the problem must be
closer to the display layer.



Hi

How do you set Notepad to Encoding = Unicode? I can't see an option for
that.


Re: [sqlite] Unicode Help

2006-12-07 Thread Da Martian

Hi


You seem really confused about the whole encoding issue.


Yes, definitely confused. I had always hoped Unicode would simplify the
world, but my experience has shown no such luck :-) Code pages haunted my
past and encodings haunt my future :-)

OK, that does answer one of my questions, I think. If I passed something
not in UTF-8 to sqlite, would it return it exactly the same way I passed it
in? From your statement of chaos below, I assume it won't if that data
somehow violates UTF-8. So I need to get it to UTF-8 or UTF-16 before I
insert.

Thanks for the information.


Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Da Martian

> > I am still having issues trying to get my characters standardised. I
> > spent all of yesterday playing with ideas but am still in the dark.
>
> Whatever you were doing the first time was fine:

I have been having that very thought!


> > So if I look at a name with umlauts in the database via sqlite3.exe I
> > get:
> >
> > Städt. Klinikum Neunkirchen gGmbH
> >   --
> >   |
> >   an "a" with two dots on top
>
> That text was properly encoded as UTF-8.  The ONLY issue with that
> line is that the sqlite shell under Windows is incapable of displaying
> Unicode, so you need to retrieve the data from sqlite using a tool
> that is.  The actual storage of it is perfect.



I know the console sqlite3 won't show it. The Delphi control I am using
does Unicode.



> > If I add the text using the *16 prepare and then retrieve it using the
> > *16 column_text, I still get the two separate characters instead of
> > the umlaut thingie. So I can only assume that somehow my source isn't
> > UTF-16, or I am converting it somewhere in the middle. This is
> > possible since I am using Delphi and it has some implicit conversions,
> > but I think I have got that under control.
>
> AFAIK Delphi has no built-in Unicode support at all; you will need to
> find third-party support for everything, from processing to display
> controls. It is likely you are ending up with UTF-8 data at some point
> in the pipeline, and whatever you're doing to process it does not
> understand UTF-8.



Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully
Unicode enabled (I believe). I use WideStrings throughout the entire
pipeline, from the XML I receive, into SQLite via prepare16, and back out
through column_text16 into the virtual tree. Well, that's true except that
the SQL APIs are mapped to return PWideChar, which is then copied via
System.Move into a WideString as follows:


> Show code :)


* DLL *
{ these map the UTF-16 entry points of sqlite3.dll }
function  sqlite3_open(filename: PWideChar; var db: pointer): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_open16';
function  sqlite3_close(db: pointer): integer;
  cdecl; external 'sqlite3.dll';
{ caution: the standard SQLite C API exports no sqlite3_exec16 }
function  sqlite3_exec(db: pointer; sql: PWideChar; callback: pointer;
  userdata: PWideChar; var errmsg: PWideChar): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_exec16';
procedure sqlite3_free(ptr: PWideChar);
  cdecl; external 'sqlite3.dll';
function  sqlite3_prepare(db: pointer; sql: PWideChar; nBytes: integer;
  var stmt: pointer; var ztail: PWideChar): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_prepare16';
function  sqlite3_column_bytes(stmt: pointer; col: integer): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_column_bytes16';
function  sqlite3_column_text(stmt: pointer; col: integer): PWideChar;
  cdecl; external 'sqlite3.dll' name 'sqlite3_column_text16';

*CLASS*
function TSqliteQueryResults.FieldAsString(i: integer): WideString;
var
  size: integer;
  temp: PWideChar;
begin
  { fetch the UTF-16 text first, then its byte length: per the SQLite docs
    sqlite3_column_bytes16 is only guaranteed to describe the UTF-16 form
    after a UTF-16 text fetch }
  temp := sqlite3_column_text(Fstmt, i);
  size := FieldSize(i);
  SetLength(result, size div 2);
  System.Move(temp^, PWideChar(result)^, size);
end;

function TSqliteQueryResults.FieldSize(i: integer): integer;
begin
  result := sqlite3_column_bytes(Fstmt, i);
end;

* END *

Previously (before my language headaches :-) ) I was doing the above
without the APIs ending in 16, and everything was string and PChar in the
above layer. The layer that used this class has always had "WideString".

I realise you're probably not Delphi pros, but if you do spot something
stupid I am doing I would appreciate any help you can offer.

Thanks,


Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Da Martian

I think standard functions for conversions would be very helpful.

I am still having issues trying to get my characters standardised. I spent
all of yesterday playing with ideas but am still in the dark.

Part of my problem is I don't have a clue what my source data is encoded
as. Does anyone know of a tool which can try to guess the encoding?
Basically it's a custom Java bean written by someone else. It takes reports
from a third-party system and turns them into XML using a string buffer;
they just append everything to a string buffer. The code which actually
adds this to the output (the key piece) I can't actually see at this point.
So my best guess, based on research, is that Java usually uses UTF-16
internally. But if this is so, it should work.

If I add the text using the *16 prepare and then retrieve it using the *16
column_text, I still get the two separate characters instead of the umlaut
thingie. So I can only assume that somehow my source isn't UTF-16, or I am
converting it somewhere in the middle. This is possible since I am using
Delphi and it has some implicit conversions, but I think I have got that
under control.

The problem is, if I copy my source and paste it into Notepad, say, it
shows correctly because Notepad then does its own conversion, and if I save
from Notepad and read that back, it works fine. *Sigh*.

So my questions are:

1) Are there any tools to determine the encoding from the data?
2) When using the non-16 version of prepare:
     if I add text which is in UTF-16, what happens?
   With the 16 versions:
     if I add UTF-16 text, what happens?
     if I add UTF-8 text, what happens?
     if I add ASCII text, what happens?

Thanks,


Re: [sqlite] Powered-by Images?

2006-12-05 Thread Da Martian

Hi

I still can't seem to locate any, but I used the main icon from the
homepage and it looks pretty good in my about box.

Thanks,

On 11/29/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


"Da Martian" <[EMAIL PROTECTED]> wrote:
> Hi
>
> Are there any powered by images for SQLite3?
>

A search for 'powered by sqlite' on google images
turned up two candidates for me.  Both seem adequate,
though neither is great.  You want to make a new
and better one and contribute it to the project?

--
D. Richard Hipp  <[EMAIL PROTECTED]>







[sqlite] Unicode Help

2006-12-05 Thread Da Martian

Hi

I have a system up and working using sqlite3, but I think I am having
Unicode issues and I am not sure how I should go about coding the solution.
I was hoping someone could share the approach needed. Here is my situation:

I have German characters with umlauts which I would like to get back out of
sqlite. An example is an "a" with two little dots on top.

I have been using the non-"16" versions, but in my mind that's OK; I just
want whatever I put in back out again. The fact that it's Unicode shouldn't
make a difference to sqlite: a 2-byte Unicode character, say, will just be
2 normal chars to sqlite. At least this was my assumption.

So if I look at a name with umlauts in the database via sqlite3.exe I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an "a" with two dots on top

Now I expected that when this was put back into a Unicode field it would be
OK, but it doesn't seem to work.

So I tried the *16 versions, but now the field size returned by
"sqlite3_column_bytes16" always seems to be larger than the string I get
back, resulting in junk characters on the end. So I get the umlauts in my
application, but all this other junk as well.

Any ideas ?


[sqlite] Powered-by Images?

2006-11-29 Thread Da Martian

Hi

Are there any powered by images for SQLite3?

Google seems at a loss, which is unusual.

S


Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-28 Thread Da Martian

Hi

Thanks for the response. The main reason is that my record count could be
anywhere from a few thousand to a million. But even at the lower end of
that range the interface can seem sluggish if you read every record before
displaying anything.

As you mention, and as has been discussed above, doing stuff in the
background is a good way to go, but more complex. As a general rule of
coding I put as few unnecessary threads into a "phase 1" program as I can,
because the complexity goes up hugely; threads can be complex to use,
co-ordinate, interrupt etc., and the chance of bugs goes up dramatically.
So I tend to leave that for a "phase 2", bells-and-whistles phase, and only
when there isn't a simpler way.

This thread has covered just about all the approaches I can think of :-)
Thanks for the responses.

S


On 10/27/06, Isaac Raway <[EMAIL PROTECTED]> wrote:


Why don't you design the table with a unique row ID, stored in an
integer field, then fetch a list of those ID numbers?

For 5000 rows, assuming you store them in your application as 4-byte
longs, that's about 19 KB of memory.

Counting that result as you receive it isn't that difficult. If it
takes a long time (it probably won't) you can do it in another thread
and update the interface as appropriate.

I'm not seeing a downside here.
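A minimal sketch of the ID-list approach Isaac describes (table and column
names are hypothetical):

-- fetch only the ids up front; small enough to hold in the application
select id from items where region = 1 order by name;

-- then fetch just the rows for the visible page, binding the on-screen ids
select * from items where id in (?, ?, ?);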

Isaac

On 10/26/06, Da Martian <[EMAIL PROTECTED]> wrote:
> No there isn't, but RDBM systems are a generalised data retrieval
> mechanism. As such they suffer from that generality.
>
> Don't get me wrong, RDBM systems are appropriate for 95% of all data
> requirements I have had to deal with, and I would never dream of trying
> to write one from scratch, nor can I imagine a world without them.
>
> However, certain applications (weather data, genome data, large indices
> (like Google)) require using something designed specifically for that
> purpose. If you customise data retrieval (and in particular your
> sorting/indices/access paths) you can leave RDBMSs in the dust in terms
> of performance. All I have read about Google suggests they do exactly
> this. Although I must point out, I don't actually know anything about
> Google with any certainty, just what has "leaked" out over the years on
> the rumour mill. But designing my own "Google"-like indices (on a
> smaller scale of course) and some specialised weather stuff, it was
> necessary to throw away the RDBMS and do it yourself. For a Google
> query, for instance, they know they will get a list of 1 or more words.
> They also know they will only ever search through the index of words.
> They don't have other data types, records or tables. Why go through all
> the hassle of compiling SQL, and that generic overhead, when your
> application will only ever do one thing? You can just make an API like
> this: "search(wordlist): Resultset."
>
> You immediately save yourself complexity and processing time. Then for
> large indices you will know your data set, so instead of using a
> standard BTree you would use a more appropriate data structure, possibly
> with skip lists etc.
>
> As for performing a database search twice, this whole thread has shown
> that sometimes you have to :-)
>
> S
>
> On 10/25/06, John Stanton <[EMAIL PROTECTED]> wrote:
> >
> > There is no magic in data retrieval.  Google use the same physical
> > laws as us ordinary mortals.
> >
> > I see no reason to ever perform a database search twice.
> >
>
>


--
Isaac Raway
Entia non sunt multiplicanda praeter necessitatem.

http://blueapples.org - blog
http://stonenotes.com - personal knowledge management






Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-26 Thread Da Martian

No there isn't, but RDBM systems are a generalised data retrieval
mechanism. As such they suffer from that generality.

Don't get me wrong, RDBM systems are appropriate for 95% of all data
requirements I have had to deal with, and I would never dream of trying to
write one from scratch, nor can I imagine a world without them.

However, certain applications (weather data, genome data, large indices
(like Google)) require using something designed specifically for that
purpose. If you customise data retrieval (and in particular your
sorting/indices/access paths) you can leave RDBMSs in the dust in terms of
performance. All I have read about Google suggests they do exactly this.
Although I must point out, I don't actually know anything about Google with
any certainty, just what has "leaked" out over the years on the rumour
mill. But designing my own "Google"-like indices (on a smaller scale of
course) and some specialised weather stuff, it was necessary to throw away
the RDBMS and do it yourself. For a Google query, for instance, they know
they will get a list of 1 or more words. They also know they will only ever
search through the index of words. They don't have other data types,
records or tables. Why go through all the hassle of compiling SQL, and that
generic overhead, when your application will only ever do one thing? You
can just make an API like this: "search(wordlist): Resultset."

You immediately save yourself complexity and processing time. Then for
large indices you will know your data set, so instead of using a standard
BTree you would use a more appropriate data structure, possibly with skip
lists etc.

As for performing a database search twice, this whole thread has shown that
sometimes you have to :-)

S

On 10/25/06, John Stanton <[EMAIL PROTECTED]> wrote:


There is no magic in data retrieval.  Google use the same physical laws
as us ordinary mortals.

I see no reason to ever perform a database search twice.



Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

Indeed,

Thanks to all for the responses. Currently I use the separate count(*), but
I think I will change to the "prepare, step(n), reset" option, only because
the query is a little slow.

Ultimately sqlite is brilliant: hands down the best embeddable and
standalone db I have yet to encounter. Thanks to all involved in it.

S


On 10/25/06, Nuno Lucas <[EMAIL PROTECTED]> wrote:


On 10/25/06, Dennis Cote <[EMAIL PROTECTED]> wrote:
> Nuno Lucas wrote:
> >
> > There is another alternative if you don't mind to have the overhead of
> > having an automatic row count (which sqlite avoids by design). It's by
> > having a trigger that will update the table row count on each
> > insert/delete.
> >
> Nuno,
>
> This technique only works if you never use conditions on your queries.
> If your query returns a subset of the rows in a table this carefully
> maintained count of all the rows in the table is useless.

Sure, but I wasn't trying to solve the general issue. The only
solution for that is to run the query to full length, whatever the SQL
engine is (even if they hide it from the user/programmer).

The trigger(s) could be elaborated for specific queries, of course,
and that would solve the GUI issue for fixed queries, but the generic
case will never have a solution other than the full scan.

Anyway, I never used this solution, just trying to show it can be
solved for the specific cases of most GUI's, if the trigger overhead
is not a problem.
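A minimal sketch of the trigger-maintained count Nuno describes (table
names are hypothetical):

create table row_count (n integer);
insert into row_count values (0);

create trigger items_ins after insert on items
begin
  update row_count set n = n + 1;
end;

create trigger items_del after delete on items
begin
  update row_count set n = n - 1;
end;

-- reading the count is now a constant-time lookup
select n from row_count;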

In my modest opinion, if one really needs an accurate scrollbar, one should
show the rows in pages, with buttons to go to the next/prev page (and the
scrollbar will be correct per page). There is no need to know the total
rows in the view (unless we know the table doesn't grow so much that a
"select count(*)", or a full select into memory, adds much to the window
rendering, which is true most of the time).


Regards,
~Nuno Lucas


>
> Dennis Cote.






Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

Hi

All these options are good, and the discussion was interesting. I merely
wanted to see what people's thoughts were on SQLite providing this. There
are many ways to skin a cat, as they say. Since this is probably drifting
off topic, I suggest we discontinue the thread.

Thanks for your candour; discussing is what it's all about.

Stephen


On 10/25/06, A. Pagaltzis <[EMAIL PROTECTED]> wrote:


* Da Martian <[EMAIL PROTECTED]> [2006-10-25 15:05]:
> It was not meant as an insult; however, you did set the tone
> with your post (à la: Either you want the data from the query,
> or you don't). I merely responded in kind. If you live in glass
> houses don't throw stones and all that. I mean, it's not hard
> to see that loading 20 million records into memory isn't the
> most efficient approach to showing a list box on the screen.

I suggested that after you said that Oracle collects results in
memory before returning them; you seemed to hint that this
wouldn't be a problem, in which case whether you do it yourself
or the database does it for you doesn't make a difference.

Solutions that come to mind are  a) to populate the UI from an
idle callback, where the scrollbar would simply keep growing
independently of any user interaction until all the results are
fetched; or if that's unacceptable,  b) run a separate COUNT(*)
query, since preparing an extra query is cheap, but using COUNT(*)
tells SQLite that it can forgo a lot of processing, which makes
the up-front counting query quicker.

There are other options as well. A lot depends on your
circumstances. F.ex. paged queries can be made cheaper by
selecting results into a temporary table so that you can
re-retrieve them with a much cheaper query.
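A sketch of that temporary-table idea (names hypothetical):

-- run the heavy query once, caching the full result set
create temp table cached as
  select c.Customer, sum(t.Amount) as Total
  from Customers c join Transactions t on t.Customer = c.Customer
  group by c.Customer;

-- each page is then a cheap scan over the cached rows
select * from cached limit 50 offset 0;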

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>






Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

It was not meant as an insult; however, you did set the tone with your post
(à la: Either you want the data from the query, or you don't). I merely
responded in kind. If you live in glass houses don't throw stones and all
that. I mean, it's not hard to see that loading 20 million records into
memory isn't the most efficient approach to showing a list box on the
screen.


Thanks for your vote of confidence in my intelligence. Clearly,
you are smart enough to figure out a solution without assistance.

Nevermind,
--
Aristotle Pagaltzis // 






Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

> Your extremely simplistic view on this is a result of never dealing in
> voluminous data and result sets and quick-running queries. Once you put
> volumes into your thinking cap you will begin to see why you don't just
> read everything into memory for the hell of it.

Just to complete the above thought: if I wanted everything in memory I
would not have any need for sqlite. Standard data structures like BSTs,
hashes, etc. will be far more efficient for in-memory use.






Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

It's quite simple:

1) If there are too many results, the user will have to wait a long time
before they see anything, because we will be buffering away results. The
application will appear sluggish and the user will get bored. I can point
you to dozens of studies which show the correlation between response time
and productivity where humans are concerned.
2) Often users will find what they want in the first 50 results. This means
you would have wasted a lot of time bringing back data you don't need.
However, they won't always find what they want in the first 50, so the
option for more must be there. So why not use "web"-like paging, I hear you
say (a sketch follows below)? Well, because the query is heavy: re-running
it each time with a different LIMIT and OFFSET still requires re-running
it. One of the solutions (there are many, none ideal) is to have a growing
scroll bar, i.e. one that grows each time you fetch a batch of results. But
this, like most of the solutions, looks a little tardy to a user (me being
one of them). Personally I hate it when a scroll bar keeps growing as you
reach the bottom.
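For reference, a minimal sketch of the "web"-like paging being discussed;
each page re-runs the same heavy query (the view name is hypothetical):

select * from heavy_results order by rank limit 50 offset 0;   -- first page
select * from heavy_results order by rank limit 50 offset 50;  -- next page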

The few other approaches have been mentioned in the previous posts in this
thread.

Your extremely simplistic view on this is a result of never dealing in
voluminous data and result sets and quick-running queries. Once you put
volumes into your thinking cap you will begin to see why you don't just
read everything into memory for the hell of it.

Think about it.


On 10/25/06, A. Pagaltzis <[EMAIL PROTECTED]> wrote:


I still fail to understand the problem. Either you want the data
from the query, or you don't. If you want it, then you run the
query once and buffer the results in memory before you process
them, so you know the count before you start processing. Or you
don't want the data, then you use a COUNT(*) query. In either
case, it is only one query you need to run.




Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

Yes, but Google doesn't use an RDBMS; it's all proprietary, to support
their speed and huge volumes. It's anyone's guess (except Google
themselves) what exactly they do, and rumours abound, but I have done many
apps which require custom data handling to achieve some end that doesn't
fit with RDBM systems.

But yes, paging using LIMIT and OFFSET is also a solution. Again, not as
efficient though, because of all the repeated queries :-)


On 10/25/06, Martin Jenkins <[EMAIL PROTECTED]> wrote:


Da Martian wrote:

> But to return all the rows just to count them requires N calls to
> step. If the data set is large you only want to return a subset to
> start with. So you wouldn't know the count. If you don't know the
> count, you can't update GUI type things etc.

I haven't been following this thread closely, but isn't this exactly the
problem that Google "solves" by returning

  "Results 1 - 10 of about 9,940,000 for sqlite. (0.11 seconds)"

for a query with a very large result set? If Google can't do it with all
the resources they have...

Martin






Re: [sqlite] Regarding sqlite3_exec

2006-10-25 Thread Da Martian

Hi

Thanks for your response.

In the end it's not important, as you point out, since many options are
available. I guess I allowed myself to indulge in an "idealistic" fantasy
for a moment :-)

S

On 10/24/06, Dennis Cote <[EMAIL PROTECTED]> wrote:


Da Martian wrote:
> Hi
>
> I understand the problem in theory, and you are saying that sqlite uses
> an iterative algorithm which returns data as it finds it; is this
> correct?
>
> It appears that DBs like Oracle etc. get the whole result set into
> temporary memory/space and then return the query (at least for unions
> this appears true), although this is just based on assumptions drawn
> from observations.
>
> It seems to me that the DB (i.e. sqlite) can determine the number of
> results far more efficiently than a client application. The reason is
> that the client application has to call step n times, plus a column
> extract (n x number of columns), while the db could just move through
> the result set and count the rows without ever doing any processing on
> the data being counted. Perhaps this could be done as a separate API,
> like preparewithcount(), which returns the count as well. With careful
> design, most of the work needed to prepare the statement etc. could
> avoid being repeated, as would happen with a count(*) query.
>
> This is just an idea, and I am speaking simply as a developer, but one
> who has not looked at the sqlite implementation at all.
>
>
Yes, sqlite iterates and returns each result row as it is located.

SQLite also has a legacy sqlite3_get_table API that will return the entire
result set in a table in RAM. It can fail if there is not enough memory to
hold the result set, though.

Your idea can (almost) be implemented in your application like this.

int prepare_and_count(sqlite3* db, const char* sql, int len,
                      sqlite3_stmt** s, const char** tail, int* count)
{
    int rc = sqlite3_prepare(db, sql, len, s, tail);

    *count = 0;
    if (rc == SQLITE_OK) {
        /* run the statement to completion once, counting the rows */
        while (sqlite3_step(*s) == SQLITE_ROW)
            ++(*count);
        /* rewind so the caller can step through the rows a second time */
        sqlite3_reset(*s);
    }

    return rc;
}

This will avoid the need to prepare two queries by using the same query
twice, once to count the result rows and a second time to collect the
results. It does require N extra calls to sqlite3_step (which are very
low overhead compared to the execution of a step).

The extra calls to step are eliminated if you use a "select count(*)"
query instead. With a count query SQLite will scan through the
table as quickly as possible and count the results without stopping and
returning to the caller after each row. But this requires a second query
to be prepared.

When you look at the big picture though, optimizing the count query
isn't likely worth the effort. The count is usually only needed to
implement GUI controls like scrollbars. The time it takes to collect the
results and present them in the GUI will dominate the time it takes to
prepare and execute a second count query unless the result set is very
large. With large results the execution time of the count query
dominates, and the overhead time to prepare the query becomes
insignificant. It really doesn't take that long to prepare a count query.

HTH
Dennis Cote








Re: [sqlite] Re: Regarding sqlite3_exec

2006-10-25 Thread Da Martian

> What prevents you from doing the same yourself? Just collect the
> data in a memory buffer as you get it, before processing it.

Nothing but efficiency, as discussed. Basically, as Mr Cote has said, it's
either a count(*) query or O(n) step calls. The former means two statements
have to be run; if this is a heavy statement, that's not great. The latter
is the best option available because, as Mr Cote points out, step isn't
that heavy. The idea behind my prompting this discussion was to get the
best of both worlds, i.e. the efficiency of a count(*) query without the
need to execute two queries. At the end of the day it's not serious, as
many workarounds/solutions are available.


> That doesn't seem to make sense – after all, the count can only
> be returned *after* all the rows have been collected. By then you
> know the count yourself anyway.

But to return all the rows just to count them requires N calls to step. If
the data set is large you only want to return a subset to start with, so
you wouldn't know the count. If you don't know the count, you can't update
GUI type things etc.

S


Re: [sqlite] Regarding sqlite3_exec

2006-10-24 Thread Da Martian

Hi

I understand the problem in theory, and you are saying that sqlite uses an
iterative algorithm which returns data as it finds it; is this correct?

It appears that DBs like Oracle etc. get the whole result set into
temporary memory/space and then return the query (at least for unions this
appears true), although this is just based on assumptions drawn from
observations.

It seems to me that the DB (i.e. sqlite) can determine the number of
results far more efficiently than a client application. The reason is that
the client application has to call step n times, plus a column extract
(n x number of columns), while the db could just move through the result
set and count the rows without ever doing any processing on the data being
counted. Perhaps this could be done as a separate API, like
preparewithcount(), which returns the count as well. With careful design,
most of the work needed to prepare the statement etc. could avoid being
repeated, as would happen with a count(*) query.

This is just an idea, and I am speaking simply as a developer, but one who
has not looked at the sqlite implementation at all.

Thanks for your response; it was very informative, helpful and poignant.

S


On 10/24/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


"Da Martian" <[EMAIL PROTECTED]> wrote:
> Hi
>
>
> > The optimal way is that you prepare the statement, fetch and
> > count the results with sqlite3_step.
>
> How would I "fetch and count" the results via sqlite3_step?
>
> Do you mean fetch all the records first? What if my result set is huge,
> and I would only like to show the first few records but still know how
> many there are?
>
> For example, let's say I run a SQL statement (it's a very heavy
> statement consisting of joins and subqueries). It returns 5000 rows.
> For speed I don't want to retrieve 5000 rows; I want to set up a list
> which shows that there are 5000 rows on the scroll bar, but only
> retrieves the first say 20 for display.
>
> Is this possible?

No, it is not possible.  In the general case where there are
user-defined functions in the query, returning the number of
rows in the result set is equivalent to the halting problem.
See

   http://en.wikipedia.com/wiki/Halting_problem

Even in the absence of the artificial constructs that make the
problem theoretically undecidable, the problem is still very
hard.  I am not aware of a general solution other than to
run the query to completion and count the rows.  I suspect
that I can write a proof that no solution exists that is
faster than running the query to completion, though I have
never taken the time to actually write that proof out.

You might be able to find special cases where you can predict
the size of the result set without actually computing the
result set.  But such techniques would only work for very
narrowly defined queries over tables with very narrowly
defined data constraints.

--
D. Richard Hipp  <[EMAIL PROTECTED]>







Re: [sqlite] Regarding sqlite3_exec

2006-10-24 Thread Da Martian

Hi



The optimal way is that you prepare the statement, fetch and
count the results with sqlite3_step.


How would I "fetch and count" the results via sqlite3_step?

Do you mean fetch all the records first? What if my result set is huge, and
I would only like to show the first few records but still know how many
there are?

For example, let's say I run a SQL statement (it's a very heavy statement
consisting of joins and subqueries). It returns 5000 rows. For speed I
don't want to retrieve 5000 rows; I want to set up a list which shows that
there are 5000 rows on the scroll bar, but only retrieves the first say 20
for display.

Is this possible? I know I can do a "select count(*) from (SQL)", but it's
a heavy query and then I would be running it twice.

Any solution to this?

S




On 10/6/06, He Shiming <[EMAIL PROTECTED]> wrote:


> Hi List,
> If I use sqlite3_exec to query a database, how can I know that the
> results from the database are finished? For example, if I am expecting
> 10 results in some for loop and actually there are only five results,
> how can I get a notification or return value that the results are
> complete? Is there any way I can get SQLITE_DONE through sqlite3_exec?
> What return value will I get if I query an empty table?
>
> Thanks and Regards,
> Vivek R
>

SQLite doesn't provide a "get number of rows" function for the result set.
It is mentioned in the documentation that sqlite3_exec is actually a
wrapper for sqlite3_prepare and sqlite3_step. In my opinion, sqlite3_exec
should only be used when the result of the query isn't that important, for
instance a pragma query. For the record, sqlite3_exec does provide a
callback function in which you can count the number of rows in a result
set. The optimal way is to prepare the statement, then fetch and count the
results with sqlite3_step.

Another thing I noticed from your question is that you might not want to
"expect 10 results". It's not very wise to design a hard loop such as
for(i=0;i<10;i++) when it comes to a database query result set. A better
way would be to use an array to store the result set in a form you
understand, and process it later. Then you'll have for(i=0;i<count;i++)
instead.

[sqlite] Query Optimisation Question?

2006-09-16 Thread Da Martian

Hi

I have found that using an IN clause with a subquery can be twice as fast
as a straight join. Can anyone explain the logic of this to me? I am
curious to understand it so I can optimise other queries for better
performance.

I have included the queries below:

OT_TARGETS has 20 rows for regionid = 1
OT_PRODUCTS has 201 rows for regionid = 1

select distinct RegionID, ProductID, ProductName, ProductShortName,
  ProductRank
from
  OT_PRODUCTS p
where
  RegionID = 1 and
  ProductID in (select distinct productid from ot_targets
                where regionid = 1)
order by
  ProductRank,
  ProductName,
  ProductID;

2-3 seconds slower than above:

select distinct t.RegionID, t.ProductID, p.ProductName, p.ProductShortName,
  p.ProductRank
from
  OT_TARGETS t,
  OT_PRODUCTS p
where
  t.ProductID = p.ProductID and
  t.RegionID = p.RegionID and
  t.RegionID = 1
order by
  p.ProductRank,
  p.ProductName,
  p.ProductID;
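For comparison, an EXISTS form of the same query is sometimes worth timing
as well; a sketch against the same tables:

select distinct RegionID, ProductID, ProductName, ProductShortName,
  ProductRank
from
  OT_PRODUCTS p
where
  RegionID = 1 and
  exists (select 1 from ot_targets t
          where t.regionid = p.RegionID and t.productid = p.ProductID)
order by
  ProductRank,
  ProductName,
  ProductID;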


Thanks,


[sqlite] Index Optimisation

2006-09-16 Thread Da Martian

Hi

I have a very simple database with three tables. I require joining the
three tables together.

I am trying to optmise a query which joins all three and sorts the
results. I cannot tell if sqlite uses my indeces or not. Is there a way
to tell what indexes are used in a query ? I join by the first PK, but
the sort is by additonal columns.

How can I go about optimising this query? Any ideas? I just cant see
how to tell if changes are helping espeicially with disk caching etc...

My tables are like this, and the query follows:

Products:
  Region: PK Integer
  Product: PK Integer
  Rank: TEXT
  Name: TEXT

Customers:
  Region: PK Integer
  Customer: PK Integer
  Rank: TEXT
  Name: TEXT

Targets:
  Region: PK Integer
  Customer: PK Integer
  Product: PK Integer

Select *
from
  Targets t,
  Customers c,
  Products p
where
  t.Region = c.Region and
  t.Customer = c.Customer and
  t.Region = p.Region and
  t.Product = p.Product and
  t.Region = 1
order by
  c.Rank,
  c.Name,
  c.Customer,
  p.Rank,
  p.Name,
  p.Product
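One way to see which indices SQLite chooses is to prefix the query with
EXPLAIN QUERY PLAN, assuming a SQLite 3 build that supports it; it prints
the table order and any index used for each table:

EXPLAIN QUERY PLAN
Select *
from
  Targets t,
  Customers c,
  Products p
where
  t.Region = c.Region and
  t.Customer = c.Customer and
  t.Region = p.Region and
  t.Product = p.Product and
  t.Region = 1;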
Thanks,


[sqlite] count(*) slow

2005-09-15 Thread Da Martian
Hi

I have 3 million rows in a table which takes up about 3.1 GB on disk. The
count(*) is slow.

I have run ANALYZE, but apart from creating the stats table it does
nothing.

Any reason why this is? Can it be improved?