[firebird-support] Re: In memory sorting consumes alot of space
--- In firebird-support@yahoogroups.com, Alexandre Benson Smith iblist@... wrote: Here is my query which is a join between three tables. prdInCatTab holds the keys to product and prdCategory and it's a one-to-many relationship: SELECT FIRST 10 SKIP 500 * FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId WHERE prdCategory.catShowFlag = '1' AND product.prdHideFlag != '1' AND prdInCatTab.PINCPrimary = '1' ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC; And here is the plan: PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), PRODUCT INDEX (PK_PRODUCT))) Well, my actual question could be: Why does Firebird consume such a large amount of sorting space? It is almost impossible for a web server which has over 100 connections to allocate such amounts of memory space. As I wrote in my first post here are my numbers: 1. Table product's size is between 30 and 40 MB. The other two have small size. 2. TempCacheLimit = 536870912 with Classic Super Server if you want to have an acceptable run(4-5 secs first run on my laptop, around 2 secs all next runs). 3. Firebird server does actually need from 200 to 450 MB for each run. Although my production servers are much faster than my laptop, I replaced the above query with another that uses a subquery as a virtual table. It turned out very fast and not memory consuming: SELECT * FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId JOIN ( SELECT FIRST 10 SKIP 500 product.prdId as bprdid FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId WHERE catShowFlag = '1' AND product.prdHideFlag != '1' AND prdInCatTab.PINCPrimary = '1' ORDER BY PINCRank DESC, product.nameUp ASC ) AS b ON product.prdid = b.bprdid AND prdInCatTab.PINCPrimary = '1' and the plan: PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX (PK_PRDCATEGORY), B PRODUCT INDEX (PK_PRODUCT))), JOIN (PRDINCATTAB INDEX (FK_PINC_PRODUCT), PRODUCT INDEX (PK_PRODUCT))), PRDCATEGORY INDEX (PK_PRDCATEGORY)) But you can't always substitute your queries easily, can you? And it still bothers me as a Firebird user how postgres does it so well. Regards, Haris There are blobs in any of the tables ? Since you are using select *, any blobs will be returned. IIRC there is a memory consumition above the expected when blobs are used in sorting (group by, distinct, etc.) I can't recall the exact details and in wich FB version are affected. Perhaps some one could put more information about it. Something regarding the memory be freed just when the transaction is commited or something like that. But, could you try to remove * and put the just the necessary fields ? If not, could you put the table strucure of the involved tables ? Perhaps you have large varchar fields that leads to a huge memory usage. see you ! Hi Set, Thanks for your reply. Yes, there are blobs and quite large varchar fields(160). I have already tested that if I select only some columns then the query execution time is dramatically reduced. My opinion is that it doesn't matter which columns you select, what matters is total size of the returned dataset. Actually, that's why my solution with the subquery works fast. Because it does the sorting on a dataset with very few char and integer columns. Maybe the optimizer could follow a similar plan. Regards, Haris
[firebird-support] Re: In memory sorting consumes alot of space
I don't know myself, perhaps postgres applies a simple compression to the intermediate sort files to minimize disk space usage. The fact that your query create a fully exploded result set of all columns in all of the query tables doesn't help -- it seems very unnecessary -- it only increases the size of the temp sort file. One possible performance optimization (depending on the selectivity of the prdInCatTab.PINCPrimary column) would be to add an index on this column. This will not reduce the size of the temp sort file, but could improve overall query performance. Sean Hi Sean, Thanks for your reply. Yes. I know that select * doesn't help and I agree with you that the key here is the total size of the dataset to be sorted. I created the index you 've suggested but the optimizer doesn't use it. From these discussions (correct me if I'm wrong) I understand that there is a design issue. I can accept it and try to avoid hitting on the wall from now on by writing queries that sort small sized datasets if NATURAL PLAN is unavoidable. But I also hope that these discussions will guide Firebird developers to have a look on such issues. Regards, Haris
[firebird-support] Re: In memory sorting consumes alot of space
--- In firebird-support@yahoogroups.com, Svein Erling Tysv�r svein.erling.tysvaer@... wrote: Hi, I 'd like to share my recent testing experiences with you hoping to help in Firebird development. I 've been using Firebird (classic server 70 connections for web serving) for 13 years now. Recently, I decided to test certain suspicious select queries more carefully. Here are my conclusions and I hope you prove me wrong: 1. If the query plan uses indexing, everything works OK and very fast. 2. If the query plan goes the NATURAL way then sorting happens which is totally expected. However, if the tuning variable TempCacheLimit has a value that is not big enough for the dataset the sorting process happens on hard disk and is extremely slow. it depends on the machine and the dataset but it has to be a very powerful machine to manage it under 10 secs. On my new 64-bit laptop, Firebird 2.5.2 for a dataset around 30-40 MB I got 35-40 seconds. The NATURAL plan was unavoidable since there was a need of ordering by two columns from two different tables. I started tuning TempCacheLimit, LockMemSize, DefaultDbCachePages and FileSystemCacheThreshold. I discovered that caching in memory does not speed up things so I gave FileSystemCacheThreshold a big value. I managed to get an execution time around 2 seconds(4 for the first run) when TempCacheLimit was: 536870912(classic super server. With plain classic server you need more memory). I know that the above value is only a limit. The problem is that during query execution I noticed that 200 up to 480 MB were actually occupied(depending on the run). I don't know if things are better with Super Server. It is not an option for me because I am using FreeBSD. As a conclusion, I was disappointed when I tested postgres 9.2 with the same query. It finishes under a second without using extra memory. It is obvious that postgres also does extra work for my query but in a much more efficient way. Why Firebird sorts datasets loading whole rows into memory when the plan is NATURAL? Couldn't it sort only primary keys for example? There might be something I 'm missing... Hi Haris! I've no clue about your actual question, but am curious as to what your query looks like and what size of tables and result sets we're talking about. Going NATURAL on the first table in a plan is quite normal and usually not too time consuming unless we talk about large tables (though I've no experience with Classic...), but NATURAL on other tables in the plan can be quite ugly performancewise. And Firebird can (or at least could) be very slow if you issue the wrong kind of query against it (e.g. ... WHERE somefield IN (SELECT DISTINCT AnotherField ... wasn't recommended). Set Hi Set, Thanks for your reply. Here is my query which is a join between three tables. prdInCatTab holds the keys to product and prdCategory and it's a one-to-many relationship: SELECT FIRST 10 SKIP 500 * FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId WHERE prdCategory.catShowFlag = '1' AND product.prdHideFlag != '1' AND prdInCatTab.PINCPrimary = '1' ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC; And here is the plan: PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), PRODUCT INDEX (PK_PRODUCT))) Well, my actual question could be: Why does Firebird consume such a large amount of sorting space? It is almost impossible for a web server which has over 100 connections to allocate such amounts of memory space. As I wrote in my first post here are my numbers: 1. Table product's size is between 30 and 40 MB. The other two have small size. 2. TempCacheLimit = 536870912 with Classic Super Server if you want to have an acceptable run(4-5 secs first run on my laptop, around 2 secs all next runs). 3. Firebird server does actually need from 200 to 450 MB for each run. Although my production servers are much faster than my laptop, I replaced the above query with another that uses a subquery as a virtual table. It turned out very fast and not memory consuming: SELECT * FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId JOIN ( SELECT FIRST 10 SKIP 500 product.prdId as bprdid FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId WHERE catShowFlag = '1' AND product.prdHideFlag != '1' AND prdInCatTab.PINCPrimary = '1' ORDER BY PINCRank DESC, product.nameUp ASC ) AS b ON product.prdid = b.bprdid AND prdInCatTab.PINCPrimary = '1' and the plan: PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX (PK_PRDCATEGORY), B PRODUCT INDEX (PK_PRODUCT))), JOIN (PRDINCATTAB INDEX (FK_PINC_PRODUCT), PRODUCT
[firebird-support] Re: In memory sorting consumes alot of space
--- In firebird-support@yahoogroups.com, Ann Harrison aharrison@... wrote: On Thu, May 16, 2013 at 12:35 PM, haris_p_sw haris_p_sw@... wrote: Why Firebird sorts datasets loading whole rows into memory when the plan is NATURAL? Couldn't it sort only primary keys for example? There might be something I 'm missing... Firebird actually doesn't sort whole rows, it sorts only the rows requested in the query - so select * is your enemy here as well. If it were to sort carrying only the columns in the ORDER BY clause and the RDB$DB_KEY, it would then have to make random reads for each row in the result set. Random reads all over the database are even more expensive than the block reads used to retrieve sorted subsets of the query. Allocate as much memory as feasible for sorts and use a fast disk (or disk equivalent) for temp space. Good luck, Ann [Non-text portions of this message have been removed] Hi Ann, Thanks for your reply. I just posted an answer to Set: http://tech.groups.yahoo.com/group/firebird-support/message/121760 You can see there that I have solved this specific problem. I agree with what you 're saying(I don't have a fraction of your knowledge on the subject but it seems very logical). I just can't understand how postgres does it so well. Regards, Haris.
RE: [firebird-support] Re: In memory sorting consumes alot of space
Hi Ann, Thanks for your reply. I just posted an answer to Set: http://tech.groups.yahoo.com/group/firebird-support/message/121760 You can see there that I have solved this specific problem. I agree with what you 're saying(I don't have a fraction of your knowledge on the subject but it seems very logical). I just can't understand how postgres does it so well. I don't know myself, perhaps postgres applies a simple compression to the intermediate sort files to minimize disk space usage. The fact that your query create a fully exploded result set of all columns in all of the query tables doesn't help -- it seems very unnecessary -- it only increases the size of the temp sort file. One possible performance optimization (depending on the selectivity of the prdInCatTab.PINCPrimary column) would be to add an index on this column. This will not reduce the size of the temp sort file, but could improve overall query performance. Sean
Re: [firebird-support] Re: In memory sorting consumes alot of space
Em 17/5/2013 15:33, haris_p_sw escreveu: --- In firebird-support@yahoogroups.com, Svein Erling Tysv�r svein.erling.tysvaer@... wrote: Hi, I 'd like to share my recent testing experiences with you hoping to help in Firebird development. I 've been using Firebird (classic server 70 connections for web serving) for 13 years now. Recently, I decided to test certain suspicious select queries more carefully. Here are my conclusions and I hope you prove me wrong: 1. If the query plan uses indexing, everything works OK and very fast. 2. If the query plan goes the NATURAL way then sorting happens which is totally expected. However, if the tuning variable TempCacheLimit has a value that is not big enough for the dataset the sorting process happens on hard disk and is extremely slow. it depends on the machine and the dataset but it has to be a very powerful machine to manage it under 10 secs. On my new 64-bit laptop, Firebird 2.5.2 for a dataset around 30-40 MB I got 35-40 seconds. The NATURAL plan was unavoidable since there was a need of ordering by two columns from two different tables. I started tuning TempCacheLimit, LockMemSize, DefaultDbCachePages and FileSystemCacheThreshold. I discovered that caching in memory does not speed up things so I gave FileSystemCacheThreshold a big value. I managed to get an execution time around 2 seconds(4 for the first run) when TempCacheLimit was: 536870912(classic super server. With plain classic server you need more memory). I know that the above value is only a limit. The problem is that during query execution I noticed that 200 up to 480 MB were actually occupied(depending on the run). I don't know if things are better with Super Server. It is not an option for me because I am using FreeBSD. As a conclusion, I was disappointed when I tested postgres 9.2 with the same query. It finishes under a second without using extra memory. It is obvious that postgres also does extra work for my query but in a much more efficient way. Why Firebird sorts datasets loading whole rows into memory when the plan is NATURAL? Couldn't it sort only primary keys for example? There might be something I 'm missing... Hi Haris! I've no clue about your actual question, but am curious as to what your query looks like and what size of tables and result sets we're talking about. Going NATURAL on the first table in a plan is quite normal and usually not too time consuming unless we talk about large tables (though I've no experience with Classic...), but NATURAL on other tables in the plan can be quite ugly performancewise. And Firebird can (or at least could) be very slow if you issue the wrong kind of query against it (e.g. ... WHERE somefield IN (SELECT DISTINCT AnotherField ... wasn't recommended). Set Hi Set, Thanks for your reply. Here is my query which is a join between three tables. prdInCatTab holds the keys to product and prdCategory and it's a one-to-many relationship: SELECT FIRST 10 SKIP 500 * FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId WHERE prdCategory.catShowFlag = '1' AND product.prdHideFlag != '1' AND prdInCatTab.PINCPrimary = '1' ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC; And here is the plan: PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), PRODUCT INDEX (PK_PRODUCT))) Well, my actual question could be: Why does Firebird consume such a large amount of sorting space? It is almost impossible for a web server which has over 100 connections to allocate such amounts of memory space. As I wrote in my first post here are my numbers: 1. Table product's size is between 30 and 40 MB. The other two have small size. 2. TempCacheLimit = 536870912 with Classic Super Server if you want to have an acceptable run(4-5 secs first run on my laptop, around 2 secs all next runs). 3. Firebird server does actually need from 200 to 450 MB for each run. Although my production servers are much faster than my laptop, I replaced the above query with another that uses a subquery as a virtual table. It turned out very fast and not memory consuming: SELECT * FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId JOIN ( SELECT FIRST 10 SKIP 500 product.prdId as bprdid FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId WHERE catShowFlag = '1' AND product.prdHideFlag != '1' AND prdInCatTab.PINCPrimary = '1' ORDER BY PINCRank DESC, product.nameUp ASC ) AS b ON product.prdid = b.bprdid AND prdInCatTab.PINCPrimary = '1' and the plan: PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX (PK_PRDCATEGORY), B PRODUCT INDEX