[firebird-support] Re: In memory sorting consumes alot of space

2013-05-18 Thread haris_p_sw


--- In firebird-support@yahoogroups.com, Alexandre Benson Smith iblist@... 
wrote:


  Here is my query which is a join between three tables. prdInCatTab holds 
  the keys to product and prdCategory and it's a one-to-many relationship:
 
  SELECT FIRST 10 SKIP 500 *
  FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
  JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
  WHERE prdCategory.catShowFlag = '1'
  AND product.prdHideFlag != '1'
  AND prdInCatTab.PINCPrimary = '1'
  ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC;
 
  And here is the plan:
 
  PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), 
  PRODUCT INDEX (PK_PRODUCT)))
 
  Well, my actual question could be: Why does Firebird consume such a large 
  amount of sorting space? It is almost impossible for a web server which has 
  over 100 connections to allocate such amounts of memory space. As I wrote 
  in my first post here are my numbers:
 
  1. Table product's size is between 30 and 40 MB. The other two have small 
  size.
 
  2. TempCacheLimit = 536870912 with Classic Super Server if you want to have 
  an acceptable run(4-5 secs first run on my laptop, around 2 secs all next 
  runs).
 
  3. Firebird server does actually need from 200 to 450 MB for each run.
 
  Although my production servers are much faster than my laptop, I replaced 
  the above query with another that uses a subquery as a virtual table. It 
  turned out very fast and not memory consuming:
 
  SELECT * FROM product
   JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
   JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
   JOIN
  (
  SELECT FIRST 10 SKIP 500 product.prdId as bprdid
  FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
  JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
  WHERE catShowFlag = '1'
  AND product.prdHideFlag != '1'
  AND prdInCatTab.PINCPrimary = '1'
  ORDER BY PINCRank DESC, product.nameUp ASC
  ) AS b ON product.prdid = b.bprdid
  AND prdInCatTab.PINCPrimary = '1'
 
  and the plan:
  PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX 
  (PK_PRDCATEGORY), B PRODUCT INDEX (PK_PRODUCT))), JOIN (PRDINCATTAB INDEX 
  (FK_PINC_PRODUCT), PRODUCT INDEX (PK_PRODUCT))), PRDCATEGORY INDEX 
  (PK_PRDCATEGORY))
 
  But you can't always substitute your queries easily, can you?
  And it still bothers me as a Firebird user how postgres does it so well.
 
  Regards,
  Haris
 
 
 
 
 
 
 There are blobs in any of the tables ? Since you are using select *, any 
 blobs will be returned. IIRC there is a memory consumition above the 
 expected when blobs are used in sorting (group by, distinct, etc.) I 
 can't recall the exact details and in wich FB version are affected. 
 Perhaps some one could put more information about it. Something 
 regarding the memory be freed just when the transaction is commited or 
 something like that.
 
 But, could you try to remove * and put the just the necessary fields ?
 
 If not, could you put the table strucure of the involved tables ? 
 Perhaps you have large varchar fields that leads to a huge memory usage.
 
 see you !


Hi Set,
Thanks for your reply.

Yes, there are blobs and quite large varchar fields(160). I have already tested 
that if I select only some columns then the query execution time is 
dramatically reduced. My opinion is that it doesn't matter which columns you 
select, what matters is total size of the returned dataset. Actually, that's 
why my solution with the subquery works fast. Because it does the sorting on a 
dataset with very few char and integer columns. Maybe the optimizer could 
follow a similar plan.

Regards,
Haris 



[firebird-support] Re: In memory sorting consumes alot of space

2013-05-18 Thread haris_p_sw
 
 I don't know myself, perhaps postgres applies a simple compression to the 
 intermediate sort files to minimize disk space usage.
 
 The fact that your query create a fully exploded result set of all columns in 
 all of the query tables doesn't help -- it seems very unnecessary -- it only 
 increases the size of the temp sort file.
 
 One possible performance optimization (depending on the selectivity of the 
 prdInCatTab.PINCPrimary column) would be to add an index on this column.  
 This will not reduce the size of the temp sort file, but could improve 
 overall query performance.
 
 
 Sean



Hi Sean,
Thanks for your reply.

Yes. I know that select * doesn't help and I agree with you that the key here 
is the total size of the dataset to be sorted.

I created the index you 've suggested but the optimizer doesn't use it.

From these discussions (correct me if I'm wrong) I understand that there is a 
design issue. I can accept it and try to avoid hitting on the wall from now on 
by writing queries that sort small sized datasets if NATURAL PLAN is 
unavoidable. But I also hope that these discussions will guide Firebird 
developers to have a look on such issues.

Regards,
Haris



[firebird-support] Re: In memory sorting consumes alot of space

2013-05-17 Thread haris_p_sw


--- In firebird-support@yahoogroups.com, Svein Erling Tysv�r 
svein.erling.tysvaer@... wrote:

 Hi,
 
 I 'd like to share my recent testing experiences with you hoping to help in 
 Firebird development.
 
 I 've been using Firebird (classic server  70 connections for web serving) 
 for 13 years now.
 
 Recently, I decided to test certain suspicious select queries more 
 carefully. Here are my conclusions 
 and I hope you prove me wrong:
 
 1. If the query plan uses indexing, everything works OK and very fast.
 
 2. If the query plan goes the NATURAL way then sorting happens which is 
 totally expected. However, 
 if the tuning variable TempCacheLimit has a value that is not big enough for 
 the dataset the sorting 
 process happens on hard disk and is extremely slow. it depends on the 
 machine and the dataset but it 
 has to be a very powerful machine to manage it under 10 secs. On my new 
 64-bit laptop, Firebird 2.5.2
 for a dataset around 30-40 MB I got 35-40 seconds.
 The NATURAL plan was unavoidable since there was a need of ordering by two 
 columns from two different 
 tables.
 I started tuning TempCacheLimit, LockMemSize, DefaultDbCachePages and 
 FileSystemCacheThreshold. I 
 discovered that caching in memory does not speed up things so I gave 
 FileSystemCacheThreshold a big 
 value. I managed to get an execution time around 2 seconds(4 for the first 
 run) when TempCacheLimit 
 was: 536870912(classic super server. With plain classic server you need more 
 memory).
 
 I know that the above value is only a limit. The problem is that during 
 query execution I noticed that 
 200 up to 480 MB were actually occupied(depending on the run).
 
 I don't know if things are better with Super Server. It is not an option for 
 me because I am using 
 FreeBSD.
 
 As a conclusion, I was disappointed when I tested postgres 9.2 with the same 
 query. It finishes under 
 a second without using extra memory. It is obvious that postgres also does 
 extra work for my query but
 in a much more efficient way.
 
 Why Firebird sorts datasets loading whole rows into memory when the plan is 
 NATURAL? Couldn't it sort 
 only primary keys for example? There might be something I 'm missing...
 
 Hi Haris!
 
 I've no clue about your actual question, but am curious as to what your query 
 looks like and what size of tables and result sets we're talking about. Going 
 NATURAL on the first table in a plan is quite normal and usually not too time 
 consuming unless we talk about large tables (though I've no experience with 
 Classic...), but NATURAL on other tables in the plan can be quite ugly 
 performancewise. And Firebird can (or at least could) be very slow if you 
 issue the wrong kind of query against it (e.g. ... WHERE somefield IN 
 (SELECT DISTINCT AnotherField ... wasn't recommended).

 Set


Hi Set,
Thanks for your reply. 

Here is my query which is a join between three tables. prdInCatTab holds the 
keys to product and prdCategory and it's a one-to-many relationship:

SELECT FIRST 10 SKIP 500 *
FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId 
JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
WHERE prdCategory.catShowFlag = '1'
AND product.prdHideFlag != '1'
AND prdInCatTab.PINCPrimary = '1'
ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC;

And here is the plan:

PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), 
PRODUCT INDEX (PK_PRODUCT)))

Well, my actual question could be: Why does Firebird consume such a large 
amount of sorting space? It is almost impossible for a web server which has 
over 100 connections to allocate such amounts of memory space. As I wrote in my 
first post here are my numbers:

1. Table product's size is between 30 and 40 MB. The other two have small size.

2. TempCacheLimit = 536870912 with Classic Super Server if you want to have an 
acceptable run(4-5 secs first run on my laptop, around 2 secs all next runs). 

3. Firebird server does actually need from 200 to 450 MB for each run.   

Although my production servers are much faster than my laptop, I replaced the 
above query with another that uses a subquery as a virtual table. It turned out 
very fast and not memory consuming:

SELECT * FROM product 
JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId 
JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId 
JOIN
(
SELECT FIRST 10 SKIP 500 product.prdId as bprdid
FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId 
JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
WHERE catShowFlag = '1'
AND product.prdHideFlag != '1'
AND prdInCatTab.PINCPrimary = '1'
ORDER BY PINCRank DESC, product.nameUp ASC
) AS b ON product.prdid = b.bprdid
AND prdInCatTab.PINCPrimary = '1'

and the plan:
PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX 
(PK_PRDCATEGORY), B PRODUCT INDEX (PK_PRODUCT))), JOIN (PRDINCATTAB INDEX 
(FK_PINC_PRODUCT), PRODUCT 

[firebird-support] Re: In memory sorting consumes alot of space

2013-05-17 Thread haris_p_sw


--- In firebird-support@yahoogroups.com, Ann Harrison aharrison@... wrote:

 On Thu, May 16, 2013 at 12:35 PM, haris_p_sw haris_p_sw@... wrote:
 
 
 
  Why Firebird sorts datasets loading whole rows into memory when the plan
  is NATURAL? Couldn't it sort only primary keys for example? There might be
  something I 'm missing...
 
 
 Firebird actually doesn't sort whole rows, it sorts only the rows requested
 in the query - so select * is your enemy here as well. If it were to sort
 carrying only the columns in the ORDER BY clause  and the RDB$DB_KEY, it
 would then have to make random reads for each row in the result set.
  Random reads all over the database are even more expensive than the block
 reads used to retrieve sorted subsets of the query.
 
 Allocate as much memory as feasible for sorts and use a fast disk (or disk
 equivalent) for temp space.
 
 Good luck,
 
 
 Ann
 
 
 [Non-text portions of this message have been removed]



Hi Ann,
Thanks for your reply.

I just posted an answer to Set:
http://tech.groups.yahoo.com/group/firebird-support/message/121760
You can see there that I have solved this specific problem. I agree with what 
you 're saying(I don't have a fraction of your knowledge on the subject but it 
seems very logical). I just can't understand how postgres does it so well. 

Regards,
Haris.



RE: [firebird-support] Re: In memory sorting consumes alot of space

2013-05-17 Thread Leyne, Sean


 Hi Ann,
 Thanks for your reply.
 
 I just posted an answer to Set:
 http://tech.groups.yahoo.com/group/firebird-support/message/121760
 You can see there that I have solved this specific problem. I agree with what
 you 're saying(I don't have a fraction of your knowledge on the subject but it
 seems very logical).

 I just can't understand how postgres does it so well.

I don't know myself, perhaps postgres applies a simple compression to the 
intermediate sort files to minimize disk space usage.

The fact that your query create a fully exploded result set of all columns in 
all of the query tables doesn't help -- it seems very unnecessary -- it only 
increases the size of the temp sort file.

One possible performance optimization (depending on the selectivity of the 
prdInCatTab.PINCPrimary column) would be to add an index on this column.  This 
will not reduce the size of the temp sort file, but could improve overall query 
performance.


Sean



Re: [firebird-support] Re: In memory sorting consumes alot of space

2013-05-17 Thread Alexandre Benson Smith
Em 17/5/2013 15:33, haris_p_sw escreveu:

 --- In firebird-support@yahoogroups.com, Svein Erling Tysv�r 
 svein.erling.tysvaer@... wrote:
 Hi,

 I 'd like to share my recent testing experiences with you hoping to help in 
 Firebird development.

 I 've been using Firebird (classic server  70 connections for web serving) 
 for 13 years now.

 Recently, I decided to test certain suspicious select queries more 
 carefully. Here are my conclusions
 and I hope you prove me wrong:

 1. If the query plan uses indexing, everything works OK and very fast.

 2. If the query plan goes the NATURAL way then sorting happens which is 
 totally expected. However,
 if the tuning variable TempCacheLimit has a value that is not big enough 
 for the dataset the sorting
 process happens on hard disk and is extremely slow. it depends on the 
 machine and the dataset but it
 has to be a very powerful machine to manage it under 10 secs. On my new 
 64-bit laptop, Firebird 2.5.2
 for a dataset around 30-40 MB I got 35-40 seconds.
 The NATURAL plan was unavoidable since there was a need of ordering by two 
 columns from two different
 tables.
 I started tuning TempCacheLimit, LockMemSize, DefaultDbCachePages and 
 FileSystemCacheThreshold. I
 discovered that caching in memory does not speed up things so I gave 
 FileSystemCacheThreshold a big
 value. I managed to get an execution time around 2 seconds(4 for the first 
 run) when TempCacheLimit
 was: 536870912(classic super server. With plain classic server you need 
 more memory).

 I know that the above value is only a limit. The problem is that during 
 query execution I noticed that
 200 up to 480 MB were actually occupied(depending on the run).

 I don't know if things are better with Super Server. It is not an option 
 for me because I am using
 FreeBSD.

 As a conclusion, I was disappointed when I tested postgres 9.2 with the 
 same query. It finishes under
 a second without using extra memory. It is obvious that postgres also does 
 extra work for my query but
 in a much more efficient way.

 Why Firebird sorts datasets loading whole rows into memory when the plan is 
 NATURAL? Couldn't it sort
 only primary keys for example? There might be something I 'm missing...
 Hi Haris!

 I've no clue about your actual question, but am curious as to what your 
 query looks like and what size of tables and result sets we're talking 
 about. Going NATURAL on the first table in a plan is quite normal and 
 usually not too time consuming unless we talk about large tables (though 
 I've no experience with Classic...), but NATURAL on other tables in the plan 
 can be quite ugly performancewise. And Firebird can (or at least could) be 
 very slow if you issue the wrong kind of query against it (e.g. ... WHERE 
 somefield IN (SELECT DISTINCT AnotherField ... wasn't recommended).
 
 Set

 Hi Set,
 Thanks for your reply.

 Here is my query which is a join between three tables. prdInCatTab holds the 
 keys to product and prdCategory and it's a one-to-many relationship:

 SELECT FIRST 10 SKIP 500 *
 FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
 JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
 WHERE prdCategory.catShowFlag = '1'
 AND product.prdHideFlag != '1'
 AND prdInCatTab.PINCPrimary = '1'
 ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC;

 And here is the plan:

 PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), 
 PRODUCT INDEX (PK_PRODUCT)))

 Well, my actual question could be: Why does Firebird consume such a large 
 amount of sorting space? It is almost impossible for a web server which has 
 over 100 connections to allocate such amounts of memory space. As I wrote in 
 my first post here are my numbers:

 1. Table product's size is between 30 and 40 MB. The other two have small 
 size.

 2. TempCacheLimit = 536870912 with Classic Super Server if you want to have 
 an acceptable run(4-5 secs first run on my laptop, around 2 secs all next 
 runs).

 3. Firebird server does actually need from 200 to 450 MB for each run.

 Although my production servers are much faster than my laptop, I replaced the 
 above query with another that uses a subquery as a virtual table. It turned 
 out very fast and not memory consuming:

 SELECT * FROM product
  JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
  JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
  JOIN
 (
 SELECT FIRST 10 SKIP 500 product.prdId as bprdid
 FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
 JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
 WHERE catShowFlag = '1'
 AND product.prdHideFlag != '1'
 AND prdInCatTab.PINCPrimary = '1'
 ORDER BY PINCRank DESC, product.nameUp ASC
 ) AS b ON product.prdid = b.bprdid
 AND prdInCatTab.PINCPrimary = '1'

 and the plan:
 PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX 
 (PK_PRDCATEGORY), B PRODUCT INDEX