Re: [GENERAL] Index optimization ?
Bo Lorentsen wrote: Greg Stark wrote: If Postgres used an index it would call odd(), which would return 1 because it's the first time, and then Postgres would go look up the rows where col is 1 and return all of them. That's a very different behaviour from if the index isn't used. If all the records have col=1 then you're getting all of the records instead of half of them. If col=0 then you're getting none of them instead of half of them. This is the differance between volatile and stable functions, but not the answer to why an index lookuo are per query and not per row, is it not ? Because the _whole_ _point_ of an index is to find matching rows _without_ scanning the whole table. IF you have to look at every row anyway, then just might as well to an sequential scan. The performance boost an index gives you comes from that very fact that you can avoid looking most of the rows. You only have to traverse the index tree from the root to the node(s) you are looking for. If you have an unique index, this means you have to traverse (roughly) log2(n) nodes, if your table has n rows. (If your table has 2^32, or about 4 billion rows, you just need to do 32(!) comparisons when walking the index tree, but whould need to do at worst 2^32 if you sequentially scanned the table. _But_ the downside is that you skipped rows. You just didn't look at them _ever_. You don't even _know_ how many rows you skipped (Thats why count(*) on a hughe table is slow). So you _can't_ use an index for a where-condition that has to inspect every row. But, since the SQL-spec requires you to call volatile functions _for_every_row_, you _can't_ use an index in your case. You can, howevery, accelerate something like where f in (1,2,3,4). You just scan the index 4 times, each time for a different value. Of course, if the number of values becomes larger and larger, there is a point where it's more efficient to do a sequential scan _once_, instead of a few tousand index scans (depends on the number of rows in the table). The postgres optimizer tries to estimate this, and will switch to an seq-scan, if it would have to do too many index lookups. Your example (the one with currval()) would translate to a IN-clause with about as many entries in the IN-list is you have rows in the table (Because the function has to be called _for_ _every_ _row_). Clearly, it's not feasable to use an index scan here - it would be slower than a seq scan. greetings, florian pflug smime.p7s Description: S/MIME Cryptographic Signature
Re: [GENERAL] Index optimization ?
On Wed, Jan 19, 2005 at 02:15:42AM +0100, Florian G. Pflug wrote: You can, howevery, accelerate something like where f in (1,2,3,4). You just scan the index 4 times, each time for a different value. Of course, if the number of values becomes larger and larger, there is a point where it's more efficient to do a sequential scan _once_, instead of a few tousand index scans (depends on the number of rows in the table). The postgres optimizer tries to estimate this, and will switch to an seq-scan, if it would have to do too many index lookups. Are PostgreSQL Btree indexes setup as a linked-list so you can scan forwards and backwards in them? If so, is the IN processor smart enough to collapse ranges of values into a single index scan (ie, IN(1,2,3,4,8,9,10) would best be done as an index scan starting at 1 and stoping at 4 and a second scan starting at 8 and stopping at 10). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
On Tue, Jan 18, 2005 at 07:33:51PM -0600, Jim C. Nasby wrote: On Wed, Jan 19, 2005 at 02:15:42AM +0100, Florian G. Pflug wrote: You can, howevery, accelerate something like where f in (1,2,3,4). You just scan the index 4 times, each time for a different value. Of course, if the number of values becomes larger and larger, there is a point where it's more efficient to do a sequential scan _once_, instead of a few tousand index scans (depends on the number of rows in the table). The postgres optimizer tries to estimate this, and will switch to an seq-scan, if it would have to do too many index lookups. Are PostgreSQL Btree indexes setup as a linked-list so you can scan forwards and backwards in them? Yes, they are. If so, is the IN processor smart enough to collapse ranges of values into a single index scan No, it isn't AFAIK. (ie, IN(1,2,3,4,8,9,10) would best be done as an index scan starting at 1 and stoping at 4 and a second scan starting at 8 and stopping at 10). That assumes the optimizer knows that the domain can contain integer values ... seems a complex and infructuous analysis to do in general. Maybe the optimizer could collapse that to a single index scan from 1 to 10 and then apply a filter to extract only the values actually mentioned. Not sure how workable that is. -- Alvaro Herrera ([EMAIL PROTECTED]) La web junta la gente porque no importa que clase de mutante sexual seas, tienes millones de posibles parejas. Pon buscar gente que tengan sexo con ciervos incendiándose, y el computador dirá especifique el tipo de ciervo (Jason Alexander) ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Index optimization ?
On Tue, Jan 18, 2005 at 11:03:22PM -0300, Alvaro Herrera wrote: On Tue, Jan 18, 2005 at 07:33:51PM -0600, Jim C. Nasby wrote: On Wed, Jan 19, 2005 at 02:15:42AM +0100, Florian G. Pflug wrote: You can, howevery, accelerate something like where f in (1,2,3,4). You just scan the index 4 times, each time for a different value. Of course, if the number of values becomes larger and larger, there is a point where it's more efficient to do a sequential scan _once_, instead of a few tousand index scans (depends on the number of rows in the table). The postgres optimizer tries to estimate this, and will switch to an seq-scan, if it would have to do too many index lookups. Are PostgreSQL Btree indexes setup as a linked-list so you can scan forwards and backwards in them? Yes, they are. If so, is the IN processor smart enough to collapse ranges of values into a single index scan No, it isn't AFAIK. (ie, IN(1,2,3,4,8,9,10) would best be done as an index scan starting at 1 and stoping at 4 and a second scan starting at 8 and stopping at 10). That assumes the optimizer knows that the domain can contain integer values ... seems a complex and infructuous analysis to do in general. Maybe the optimizer could collapse that to a single index scan from 1 to 10 and then apply a filter to extract only the values actually mentioned. Not sure how workable that is. It seems like it should just be a SMOC (simple matter of code), though I suspect better index statistics might be needed first. Given good enough statistics, it should be easy to ascertain how many index tuples you can expect between two values (even if it's not an int column, in this case). The optimizer should also be able to know what the cost is to scan down the index structure, so it should then be easy to figure out which plan is better. Speaking of index statistics, has anyone looked at ways to determine how many tuples exist between two different values? For example, Sybase used to keep statistics in 256 buckets. It would define the range of values that each bucket covered, and how many tuples in the table fell into that bucket. It's a very elegant way to deal with tables that have skewed numbers, as well as clusters of different numbers. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Index optimization ?
Florian G. Pflug wrote: Because the _whole_ _point_ of an index is to find matching rows _without_ scanning the whole table. IF you have to look at every row anyway, then just might as well to an sequential scan. I am sorry it took me this long to understand this, but I think I got it now thanks to all Your (all in this thread) efforts to make me :-) What confuses me was the difference between what controls a query and what filters a row in a selected data set, I just could not see the difference, but that have changed :-) I think the problem was I just could not see the general principles from the special case where I thought I had the use for this index lookup. Thanks to anyone for there effort to make me understand, hope I will be a faster learner next time :-) /BL ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
Greg Stark wrote: I understand that, I just can't see why an index lookup can't be used on per row basis. Well, how would that work? Well, good point, the per row is a set of data selected as a product of the static part os the query (non volatile parts), the only thing you can do with this dataset is seq scan this, and use the volatile functions on each row. I just could not see this ... until now :-) If your query is only volatile (like a currval) then PG will select all the table as a dataset and use the volatile expression on each, and that I did not understand, as it semmed more logical to use the index, but I learned :-) /BL ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Index optimization ?
Tom Lane wrote: No, you'd still end up with a seqscan, because this WHERE clause offers no chance of matching an index, and we don't do anything special with stable functions beyond trying to match them to index conditions. So, the executer uses the (first) value to find the index to use for ALL rows, and if this value change on each row, this can't be optimized and a seq scan is initiated. Is this not a problem for joins ? But consider something like SELECT * FROM mytable WHERE keycol = int(random() * 1000); where keycol is indexed and contains integers 0..1000; let's say each such value appears ten times. With a seqscan implementation (which I consider is what SQL defines the semantics to be) random() would be recomputed at each row and there would be about a 1/1000 chance of selecting each row. This would demand a new index lookup for each row, right ? You might get more or less than exactly ten result rows, and they'd almost certainly contain different values of keycol. This much i do understand :-) Now if random() were marked stable (and of course both multiply and int() are immutable), then the planner would consider an indexscan on keycol to be a valid optimization. But that would produce distinguishably different results, because random() would be evaluated only once: you would always get exactly ten rows and they'd always all have the same keycol value. I know why random (and currval) is not stabel, but I just don't understand why a variable righthand result in seq scan, and not an index scan, even when the data types match an index type. To me it sounds like an index lookup is a one time a query (not per row) thing, but I don't understand why. This can be because, this is the way it turned up, but there is more possibly an aspect of SQL that I don't know too much about. An index can basically implement conditions like WHERE indexedcol = constant --- it takes the constant value and searches the index for matches. (Btrees can also do things like WHERE indexedcol = constant, but let's just think about equality to keep things simple.) :-) We can deal with a nonconstant righthand side, so long as it's okay to evaluate the value just once before the index starts to do its thing. That assumption is what STABLE is all about. So righthand value can't evaluate per row, and the value type of the righthand expression can't be used as a index match. I just hoped for the executer to work like this : find indexedcol indexs evaluate the righthand expression, and find its type (not value) match the righthand value type and match it on index types (is both sides integer) if index is found use this together with the per row righthand value or just use seq scan (I don't understand why, this works if indexes don't) This is what I thought PG was doing :-) Hope, I did not miss any important points. /BL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Index optimization ?
Florian G. Pflug wrote: Lets say, you have an query select * from table where field = function(). Now, according to the sql-spec, you would have to scan each row in table, call the function functio(), and compare the result. If the result of the call to function() matches the value in field, the you return the row, otherwise you don't return it. This far I follow :-) An index is a tree, where each node has two subnodes/children. Each node representents a row of your table (or, rather, references a row - it contails only the value of the field you indexed). Additionally, the value of the field of the left child (and the value of the field of its children, and so on) is always guaranteed to be smaller-or-equal to the value of field of the node itself, and the value of the field of the right child is always guaranteed to be greater-or-equal to the value of the field of the node. What you say, is based on the result of the evaluation, the executer will optimize or performe the index lookup, if the righthand is a constant, but it will perform a seq scan if the value is not known on the first query (volatile) ? To me this sound like the indexes can't be performed per row, but only per query ? Or, PG is not type aware when maching indexes ? Now, when doing an index lookup, you have to know which value to look for (lets say you look for 3). Then you look at the top node, and compare the value you are looking for to the value of the node. In our case, the node has a smaller value then the one we are looking for. Because we know that the left child of the toplevel node will have an even smaller value, we don't need to look at the left child at all. We just check the right child, and there we find our record with field=3. But why can't we evaluate righthand and use this new row value (could still be 3, but not 'foobar' :-)) as a key to the index ? _BUT_ this only works, because we knew for which value to look before we started searching. If we the value we look for is constantly changing, our index lookup would return bogus results. So, if the value is unknown _at_the_beginning_ of the search, you can't use the index, because the power of an index search comes from the idea to skip a whole subtree (in our case the left one), because we _know_ it can't contain the value we are looking for. But the value is unknow yes ... but the type of the value is not, this will not change from row to row only the value do this. Functions marked immutable or stable is _guaranteed_ to never change their value during a select statement, or at least not in an unpredictable way. Thats why you can use return values of immutable or stable functions for an index search. I understand that, I just can't see why an index lookup can't be used on per row basis. Lets say, you are doing the following query select * from table where field1 = currval('seq') and field2 = nextval('seq') Now, the value of currval('seq') changes with every row visited. In your example, the value of currval is actually stable - but postgres has no way to know this. To use an index scan for your query, postgres would need to know, that only a call to nextval can change the value of currval - but this, of course, is a quite difficult dependency for a database to track. But why can't the integer index for field1 be used, with the currval result, even if it changes ? Are the index lookup only performed ones (asked this before ) per query ? /BL ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Index optimization ?
Greg Stark wrote: If Postgres used an index it would call odd(), which would return 1 because it's the first time, and then Postgres would go look up the rows where col is 1 and return all of them. That's a very different behaviour from if the index isn't used. If all the records have col=1 then you're getting all of the records instead of half of them. If col=0 then you're getting none of them instead of half of them. This is the differance between volatile and stable functions, but not the answer to why an index lookuo are per query and not per row, is it not ? /BL ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Index optimization ?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Okay, let's look at this a different way. When you look at a volatile function or variable, let's say CURRENT_TIMESTAMP (which returns the current date and time as of the beginning of the transaction), you see a function or variable whose value changes unpredictably between calls/reads. Now let's look at that value being inserted into a table. Say table X has column Y. We insert a few rows into X, and use CURRENT_TIMESTAMP as the value for Y. So we have something like this (example only): SELECT Y FROM X; Y - - 1 4 6 9 14 Later, we want to find all of the rows we just inserted. We can't very well do this using Y, since: SELECT * FROM X WHERE Y = CURRENT_TIMESTAMP; When CURRENT_TIMESTAMP is something like, say, 44, will not find any of the same rows. Now expand on this to look at an index on a volatile function. When the row inserts/updates take place, the function has a specific value which it returns. Now we can index this return value, but when we go to search the results, we evaluate the function and get a *DIFFERENT VALUE* -- thus, our search of the index reveals a *different* set of rows than the one we were hoping to find, since the value it is looking for (the result of the function call) is not the same as it was when we built the index. Why would you *want* to do this? It is roughly equivalent to building an index on random values, to return a random set of rows -- and this is just as. if not more efficiently done without the index. Again, looking at the ODD function, consider the above where column Z becomes odd(): Y Z - 11 40 61 90 14 1 Now run SELECT Y, Z FROM X WHERE Z = ODD() to get: Y Z - 40 90 Run the same query again to get: Y Z - 11 61 14 1 You see that the results will be inconsistent, since ODD() is volatile and its result can change at any time (at least as far as PostgreSQL is aware of it). On Jan 17, 2005, at 10:30 AM, Bo Lorentsen wrote: Tom Lane wrote: No, you'd still end up with a seqscan, because this WHERE clause offers no chance of matching an index, and we don't do anything special with stable functions beyond trying to match them to index conditions. So, the executer uses the (first) value to find the index to use for ALL rows, and if this value change on each row, this can't be optimized and a seq scan is initiated. Is this not a problem for joins ? But consider something like SELECT * FROM mytable WHERE keycol = int(random() * 1000); where keycol is indexed and contains integers 0..1000; let's say each such value appears ten times. With a seqscan implementation (which I consider is what SQL defines the semantics to be) random() would be recomputed at each row and there would be about a 1/1000 chance of selecting each row. This would demand a new index lookup for each row, right ? You might get more or less than exactly ten result rows, and they'd almost certainly contain different values of keycol. This much i do understand :-) Now if random() were marked stable (and of course both multiply and int() are immutable), then the planner would consider an indexscan on keycol to be a valid optimization. But that would produce distinguishably different results, because random() would be evaluated only once: you would always get exactly ten rows and they'd always all have the same keycol value. I know why random (and currval) is not stabel, but I just don't understand why a variable righthand result in seq scan, and not an index scan, even when the data types match an index type. To me it sounds like an index lookup is a one time a query (not per row) thing, but I don't understand why. This can be because, this is the way it turned up, but there is more possibly an aspect of SQL that I don't know too much about. An index can basically implement conditions like WHERE indexedcol = constant --- it takes the constant value and searches the index for matches. (Btrees can also do things like WHERE indexedcol = constant, but let's just think about equality to keep things simple.) :-) We can deal with a nonconstant righthand side, so long as it's okay to evaluate the value just once before the index starts to do its thing. That assumption is what STABLE is all about. So righthand value can't evaluate per row, and the value type of the righthand expression can't be used as a index match. I just hoped for the executer to work like this : find indexedcol indexs evaluate the righthand expression, and find its type (not value) match the righthand value type and match it on index types (is both sides integer) if index is found use this together with the per row righthand value or just use seq scan (I don't understand why, this works if indexes don't) This is what I thought PG was doing :-) Hope, I did not miss any important points. /BL
Re: [GENERAL] Index optimization ?
Bo Lorentsen [EMAIL PROTECTED] writes: To me it sounds like an index lookup is a one time a query (not per row) thing, but I don't understand why. I can't explain it any more clearly than Florian did: http://archives.postgresql.org/pgsql-general/2005-01/msg00769.php regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] Index optimization ?
Bo Lorentsen [EMAIL PROTECTED] writes: I understand that, I just can't see why an index lookup can't be used on per row basis. Well, how would that work? -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Index optimization ?
Tom Lane wrote: http://developer.postgresql.org/docs/postgres/xfunc-volatility.html Ok, thanks I see why there is these three differant function types, but I don't quite understand why the value from a volatile function, can't be used as a index key. Is this because there is no return type garanti, for the voilatile function too ? Will the only possible way to fix this be to make a volatile function with a return type (I know this is not possible now, but in theory) ? /BL ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Index optimization ?
On Jan 17, 2005, at 0:25, Bo Lorentsen wrote: Tom Lane wrote: http://developer.postgresql.org/docs/postgres/xfunc-volatility.html Ok, thanks I see why there is these three differant function types, but I don't quite understand why the value from a volatile function, can't be used as a index key. Is this because there is no return type garanti, for the voilatile function too ? I don't believe it has necessarily anything to do with the return type, but rather the return value. An index only works if you know what the value is, and the return value for a volatile function is not guaranteed to be the same for given parameters. Here's a contrived (and untestsd) example, but one I think makes it clear: CREATE FUNCTION plus_random ( INTEGER ) RETURNS INTEGER LANGUAGE SQL AS ' SELECT round( $1 + random() * 100 ); '; One could conceivably attempt to make a functional index using plus_random(), but the result it gives every time is indeterminant. How would you be able to usefully search for values in an index that is based on this function? Would it make sense do to do so? Does this help? (And if I'm completely off base, someone please let me know :) Michael Glaesemann grzm myrealbox com ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Index optimization ?
Michael Glaesemann wrote: I don't believe it has necessarily anything to do with the return type, but rather the return value. An index only works if you know what the value is, and the return value for a volatile function is not guaranteed to be the same for given parameters. Here's a contrived (and untestsd) example, but one I think makes it clear: CREATE FUNCTION plus_random ( INTEGER ) RETURNS INTEGER LANGUAGE SQL AS ' SELECT round( $1 + random() * 100 ); '; One could conceivably attempt to make a functional index using plus_random(), but the result it gives every time is indeterminant. How would you be able to usefully search for values in an index that is based on this function? Would it make sense do to do so? What you say is that PG can't see the difference between this plus_random and the currval, right. But if I have a select (a quite strange one), like this : SELECT * FROM test_table WHERE id = plus_random( test_col ); I don't understand the problem. The function always return an integer as specified in the function decl. so why not use the PK index for search, instead of using seq scan ? The value is totally unpredictable but it is still an integer and the pk index is still useful regarding performance ! I know there is something I don't understand, so I just have to ask :-) Does this help? (And if I'm completely off base, someone please let me know :) No this time I think missed the point, I understand the volatility of functions, so the planer know what to expect from the function, regarding side effect, but I still don't understand why this influences the choice of valid indexes. /BL ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Index optimization ?
On Sun, 2005-01-16 at 16:25 +0100, Bo Lorentsen wrote: [about a volatile function in a where clause not generating index scan] Will the only possible way to fix this be to make a volatile function with a return type (I know this is not possible now, but in theory) ? this has nothing to do with the return type. a volatile function is a function that is not garanteed to return the same value given same input parameters, (such as currval()). when a volatile function is used thus: SELECT * FROM mytable WHERE col=myvolatilefunc(); the planner must call the function once per table row, and assume possibly different return values each time, so an indexscan will not improve timings. on the other hand, if the function is labeled STABLE, the planner can assume that the same value will alway be returned, so only one call to it can be made, and an indexscan might be found the most effective. hope this helps gnari ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Index optimization ?
Ragnar Hafstað wrote: this has nothing to do with the return type. a volatile function is a function that is not garanteed to return the same value given same input parameters, (such as currval()). when a volatile function is used thus: SELECT * FROM mytable WHERE col=myvolatilefunc(); the planner must call the function once per table row, and assume possibly different return values each time, so an indexscan will not improve timings. Why not use the index scan for every row, is this a limit in the planner ? I think there is something in the planner I don't understand :-) on the other hand, if the function is labeled STABLE, the planner can assume that the same value will alway be returned, so only one call to it can be made, and an indexscan might be found the most effective. The two other function types are not interesting, but I don't understand the planners use of index optimization. /BL ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Index optimization ?
On Sun, Jan 16, 2005 at 05:30:22PM +0100, Bo Lorentsen wrote: One could conceivably attempt to make a functional index using plus_random(), but the result it gives every time is indeterminant. How would you be able to usefully search for values in an index that is based on this function? Would it make sense do to do so? What you say is that PG can't see the difference between this plus_random and the currval, right. But if I have a select (a quite strange one), like this : SELECT * FROM test_table WHERE id = plus_random( test_col ); I don't understand the problem. The function always return an integer as specified in the function decl. so why not use the PK index for search, instead of using seq scan ? The value is totally unpredictable but it is still an integer and the pk index is still useful regarding performance ! No, it depends on your interpretation of the query. Note, I'm not up with the SQL standard so maybe it doesn't work like this, but this is what I think the problem is. The above query can be interpreted as: for each row in test_table, compare id against plus_random( test_col ). Now, in theory the plus_random function needs to be evaluated for every row, each time giving a different value, thus it may or may not match id. You can see that with that interpretation an index on id doesn't help. If you interpret the query so plus_random is evaluted only once, then an index will help. If test_col is a column of the table then there is no way an index can help you. Hope this helps, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgp6DEl6JKvoi.pgp Description: PGP signature
Re: [GENERAL] Index optimization ?
On Sun, 2005-01-16 at 17:45 +0100, Bo Lorentsen wrote: Ragnar Hafstað wrote: when a volatile function is used thus: SELECT * FROM mytable WHERE col=myvolatilefunc(); the planner must call the function once per table row, and assume possibly different return values each time, so an indexscan will not improve timings. Why not use the index scan for every row, is this a limit in the planner ? I think there is something in the planner I don't understand :-) the planner will just use the plan it estimates will be fastest. because of how indexscans work in postgresql, in this case it would be slower than a tablescan (assuming the function really is volatile) gnari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Index optimization ?
# kleptog@svana.org / 2005-01-16 17:48:08 +0100: On Sun, Jan 16, 2005 at 05:30:22PM +0100, Bo Lorentsen wrote: One could conceivably attempt to make a functional index using plus_random(), but the result it gives every time is indeterminant. How would you be able to usefully search for values in an index that is based on this function? Would it make sense do to do so? What you say is that PG can't see the difference between this plus_random and the currval, right. But if I have a select (a quite strange one), like this : SELECT * FROM test_table WHERE id = plus_random( test_col ); I don't understand the problem. The function always return an integer as specified in the function decl. so why not use the PK index for search, instead of using seq scan ? The value is totally unpredictable but it is still an integer and the pk index is still useful regarding performance ! No, it depends on your interpretation of the query. Note, I'm not up with the SQL standard so maybe it doesn't work like this, but this is what I think the problem is. The above query can be interpreted as: for each row in test_table, compare id against plus_random( test_col ). That's what happens if you declare the function VOLATILE. Make it STABLE, and the function call will be evaluated only once for the whole table scan. That's just what Tom Lane suggested in his post. -- If you cc me or remove the list(s) completely I'll most likely ignore your message.see http://www.eyrie.org./~eagle/faqs/questions.html ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Index optimization ?
Ragnar =?ISO-8859-1?Q?Hafsta=F0?= [EMAIL PROTECTED] writes: On Sun, 2005-01-16 at 17:45 +0100, Bo Lorentsen wrote: Why not use the index scan for every row, is this a limit in the planner ? I think there is something in the planner I don't understand :-) the planner will just use the plan it estimates will be fastest. because of how indexscans work in postgresql, in this case it would be slower than a tablescan (assuming the function really is volatile) It has nothing to do with speed, it has to do with giving the correct answer. We define correct answer as being the result you would get from a naive interpretation of the SQL semantics --- that is, for every row in the FROM table, actually execute the WHERE clause, and return the rows where it produces TRUE. As an example, a query like SELECT * FROM mytable WHERE random() 0.1; should produce a random sampling of about one-tenth of the rows in mytable. If we evaluated random() only once in this query, we would get either all or none of the rows, clearly not the right answer. An indexscan is a legal optimization only if the function(s) in the WHERE clause are all STABLE or better. This is because the index access code will only evaluate the righthand side of the indexcol = something clause once, and then will use that value to descend the btree and select matching index entries. We must be certain that this gives the same result we would get from a seqscan. The definition of STABLE that PostgreSQL uses was crafted specifically to capture the property that a function is safe to use in an indexscan qualification ... regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Index optimization ?
Martijn van Oosterhout wrote: No, it depends on your interpretation of the query. Note, I'm not up with the SQL standard so maybe it doesn't work like this, but this is what I think the problem is. I just try to learn, so that is ok :-) Tom gave me a solution that works, so now I struggle to understand why. The above query can be interpreted as: for each row in test_table, compare id against plus_random( test_col ). Now, in theory the plus_random function needs to be evaluated for every row, each time giving a different value, thus it may or may not match id. But if you take a look at a function, it has a return type. So currval always returns a BIGINT no matter what kind of parameters are given, that is a part of the declaration, as far as I can see. Why are this type info not used to match an index, as the type is the same no matter what row we are in, or no matter its parameter value (or context). The value change, but not the type. The type is used to find a matching index is it not ? Am I misunderstanding you ? You can see that with that interpretation an index on id doesn't help. No, I think this is the problem, I don't see :-) The function promise to return a certain type, and type can be used to find the prober index (if any). If you interpret the query so plus_random is evaluted only once, then an index will help. If test_col is a column of the table then there is no way an index can help you. If and only if the function returns a different value TYPE, otherwise it can use the same index but with different values, of the same type alias use index scan. But again, I am sure there is something I have misunderstud :-) Thanks for trying :-) /BL ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Index optimization ?
Tom Lane wrote: It has nothing to do with speed, it has to do with giving the correct answer. We define correct answer as being the result you would get from a naive interpretation of the SQL semantics --- that is, for every row in the FROM table, actually execute the WHERE clause, and return the rows where it produces TRUE. As an example, a query like SELECT * FROM mytable WHERE random() 0.1; should produce a random sampling of about one-tenth of the rows in mytable. Nice explaination ... If we evaluated random() only once in this query, we would get either all or none of the rows, clearly not the right answer. So if the random function was stable, you either get all or none, as et gets executed only ones ? An indexscan is a legal optimization only if the function(s) in the WHERE clause are all STABLE or better. This is because the index access code will only evaluate the righthand side of the indexcol = something clause once, and then will use that value to descend the btree and select matching index entries. We must be certain that this gives the same result we would get from a seqscan. Now this sounds like a blink of the problem that I don't understand :-) When you say it evaluate right side ones, what kind of information are you (the executer) then getting, and how is the index match then performed. Is all the where clause expression marked as volatile at this level, just to be sure ? Well maybe the real question is how does the executer match an index, or am I off track ? /BL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Index optimization ?
On Sun, 2005-01-16 at 14:11 -0500, Tom Lane wrote: Ragnar =?ISO-8859-1?Q?Hafsta=F0?= [EMAIL PROTECTED] writes: On Sun, 2005-01-16 at 17:45 +0100, Bo Lorentsen wrote: Why not use the index scan for every row, is this a limit in the planner ? I think there is something in the planner I don't understand :-) the planner will just use the plan it estimates will be fastest. because of how indexscans work in postgresql, in this case it would be slower than a tablescan (assuming the function really is volatile) It has nothing to do with speed, it has to do with giving the correct answer. We define correct answer as being the result you would get from a naive interpretation of the SQL semantics --- that is, for every row in the FROM table, actually execute the WHERE clause, and return the rows where it produces TRUE. I should not have used the word 'indexscan'. I just meant that it would be less effective to use an index to look up each result of the volatile function than using a tablescan. It was clear that the function would have to be called for each row, but the OP was asking (I think) why the index was not used. gnari ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Index optimization ?
Bo Lorentsen wrote: So if the random function was stable, you either get all or none, as et gets executed only ones ? An indexscan is a legal optimization only if the function(s) in the WHERE clause are all STABLE or better. This is because the index access code will only evaluate the righthand side of the indexcol = something clause once, and then will use that value to descend the btree and select matching index entries. We must be certain that this gives the same result we would get from a seqscan. Now this sounds like a blink of the problem that I don't understand :-) When you say it evaluate right side ones, what kind of information are you (the executer) then getting, and how is the index match then performed. Is all the where clause expression marked as volatile at this level, just to be sure ? Lets say, you have an query select * from table where field = function(). Now, according to the sql-spec, you would have to scan each row in table, call the function functio(), and compare the result. If the result of the call to function() matches the value in field, the you return the row, otherwise you don't return it. An index is a tree, where each node has two subnodes/children. Each node representents a row of your table (or, rather, references a row - it contails only the value of the field you indexed). Additionally, the value of the field of the left child (and the value of the field of its children, and so on) is always guaranteed to be smaller-or-equal to the value of field of the node itself, and the value of the field of the right child is always guaranteed to be greater-or-equal to the value of the field of the node. So, if you have three records in the table table, like this: f1 -- 1 2 3 Then your index looks the following: 2 / \ 1 3 Now, when doing an index lookup, you have to know which value to look for (lets say you look for 3). Then you look at the top node, and compare the value you are looking for to the value of the node. In our case, the node has a smaller value then the one we are looking for. Because we know that the left child of the toplevel node will have an even smaller value, we don't need to look at the left child at all. We just check the right child, and there we find our record with field=3. _BUT_ this only works, because we knew for which value to look before we started searching. If we the value we look for is constantly changing, our index lookup would return bogus results. So, if the value is unknown _at_the_beginning_ of the search, you can't use the index, because the power of an index search comes from the idea to skip a whole subtree (in our case the left one), because we _know_ it can't contain the value we are looking for. Functions marked immutable or stable is _guaranteed_ to never change their value during a select statement, or at least not in an unpredictable way. Thats why you can use return values of immutable or stable functions for an index search. Well maybe the real question is how does the executer match an index, or am I off track ? Lets say, you are doing the following query select * from table where field1 = currval('seq') and field2 = nextval('seq') Now, the value of currval('seq') changes with every row visited. In your example, the value of currval is actually stable - but postgres has no way to know this. To use an index scan for your query, postgres would need to know, that only a call to nextval can change the value of currval - but this, of course, is a quite difficult dependency for a database to track. greetings, Florian Pflug smime.p7s Description: S/MIME Cryptographic Signature
Re: [GENERAL] Index optimization ?
Bo Lorentsen [EMAIL PROTECTED] writes: Tom Lane wrote: SELECT * FROM mytable WHERE random() 0.1; If we evaluated random() only once in this query, we would get either all or none of the rows, clearly not the right answer. So if the random function was stable, you either get all or none, as et gets executed only ones ? No, you'd still end up with a seqscan, because this WHERE clause offers no chance of matching an index, and we don't do anything special with stable functions beyond trying to match them to index conditions. But consider something like SELECT * FROM mytable WHERE keycol = int(random() * 1000); where keycol is indexed and contains integers 0..1000; let's say each such value appears ten times. With a seqscan implementation (which I consider is what SQL defines the semantics to be) random() would be recomputed at each row and there would be about a 1/1000 chance of selecting each row. You might get more or less than exactly ten result rows, and they'd almost certainly contain different values of keycol. Now if random() were marked stable (and of course both multiply and int() are immutable), then the planner would consider an indexscan on keycol to be a valid optimization. But that would produce distinguishably different results, because random() would be evaluated only once: you would always get exactly ten rows and they'd always all have the same keycol value. An indexscan is a legal optimization only if the function(s) in the WHERE clause are all STABLE or better. This is because the index access code will only evaluate the righthand side of the indexcol = something clause once, and then will use that value to descend the btree and select matching index entries. We must be certain that this gives the same result we would get from a seqscan. Now this sounds like a blink of the problem that I don't understand :-) When you say it evaluate right side ones, what kind of information are you (the executer) then getting, and how is the index match then performed. An index can basically implement conditions like WHERE indexedcol = constant --- it takes the constant value and searches the index for matches. (Btrees can also do things like WHERE indexedcol = constant, but let's just think about equality to keep things simple.) We can deal with a nonconstant righthand side, so long as it's okay to evaluate the value just once before the index starts to do its thing. That assumption is what STABLE is all about. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
Florian G. Pflug [EMAIL PROTECTED] writes: Lets say, you have an query select * from table where field = function(). Maybe this would be clearer with a more egregious example of volatility. Say you had a function odd() that returns 1 and 0 alternating. That is, it returns 1 the first time it's called, 0 the second time it's called, then 1, then 0, etc. If you did select * from tab where col = odd() you would expect to get half of the rows where col=0 or col=1. Of course since the order is unpredictable there's no way to know which ones but you should still be pretty sure it'll be half of the rows. If Postgres used an index it would call odd(), which would return 1 because it's the first time, and then Postgres would go look up the rows where col is 1 and return all of them. That's a very different behaviour from if the index isn't used. If all the records have col=1 then you're getting all of the records instead of half of them. If col=0 then you're getting none of them instead of half of them. -- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[GENERAL] Index optimization ?
Hi ... In my quest to get rid of the oid dependency, i have made some new low level code with the help from many nice people from this community (thanks for that), but I still have one somewhat big problem. I am running PG 7.4.6, btw. I have a sale table that have a BIGSERIAL as primary key, but explain keeps telling me (and so does the performance) that it will perform a seq scan of my table when performing this statement (not using the pkey index) : select * from sale where id = currval( 'sale_id_seq' ); I tried this too : select * from sale where id = currval( 'sale_id_seq' )::bigint; But this still did not work (still using seq scan) :-( At last I did a : explain select * from sale where id = 42::bigint; Just to make sure, and this made the proper optimizations (used the pkey index). What have I done wrong, or do PG still have some casting problems in the optimizer ? /BL ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
Bo Lorentsen [EMAIL PROTECTED] writes: select * from sale where id = currval( 'sale_id_seq' ); This is not legally optimizable into an indexscan, because currval() is a volatile function. (It's easy to construct cases where its value actually does change from row to row --- just use a nextval() as well.) You can fake it out in a couple of ways --- the recommended method is to wrap currval in a user-defined function that is misleadingly marked stable. I think it still works to just put the call in a sub-select: select * from sale where id = (select currval( 'sale_id_seq' )); but I take no responsibility if future improvements in the planner break that trick. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
On Sat, Jan 15, 2005 at 07:03:43PM +0100, Bo Lorentsen wrote: select * from sale where id = currval( 'sale_id_seq' )::bigint; But this still did not work (still using seq scan) :-( currval() is volatile. See Function Volatility Categories in the Extending SQL chapter of the documentation and search the list archives for past discussion of currval()'s volatility. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Index optimization ?
On Sat, Jan 15, 2005 at 01:27:49PM -0500, Tom Lane wrote: Bo Lorentsen [EMAIL PROTECTED] writes: select * from sale where id = currval( 'sale_id_seq' ); This is not legally optimizable into an indexscan, because currval() is a volatile function. (It's easy to construct cases where its value actually does change from row to row --- just use a nextval() as well.) You can fake it out in a couple of ways --- the recommended method is to wrap currval in a user-defined function that is misleadingly marked stable. I think it still works to just put the call in a sub-select: select * from sale where id = (select currval( 'sale_id_seq' )); but I take no responsibility if future improvements in the planner break that trick. Would it make sense to have a version of currval that will only return one value in a statement/transaction? So the first time it's called it remembers what currval for that sequence is and always returns the same value? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] Index optimization ?
On Sat, Jan 15, 2005 at 03:11:22PM -0600, Jim C. Nasby wrote: Would it make sense to have a version of currval that will only return one value in a statement/transaction? So the first time it's called it remembers what currval for that sequence is and always returns the same value? What would nextval() do in that case? Return the nextval on the first call, and act like currval() from then until the end of the transaction? -- Alvaro Herrera ([EMAIL PROTECTED]) Porque francamente, si para saber manejarse a uno mismo hubiera que rendir examen... ¿Quién es el machito que tendría carnet? (Mafalda) ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Index optimization ?
Tom Lane wrote: This is not legally optimizable into an indexscan, because currval() is a volatile function. (It's easy to construct cases where its value actually does change from row to row --- just use a nextval() as well.) I am not sure what you mean by a volatile function, and how this affect the returned types, I guess this demands some more low level PG knowledge to understand. You can fake it out in a couple of ways --- the recommended method is to wrap currval in a user-defined function that is misleadingly marked stable. I think it still works to just put the call in a sub-select: select * from sale where id = (select currval( 'sale_id_seq' )); but I take no responsibility if future improvements in the planner break that trick. The select trick works just fine, but I don't understand why :-( Do you have any idea to how I may learn more about function types, or is this a read the source, luke thing (I am not sure I have time for that right now) ? Well, thanks anyway ... this just work so nice, and I a looking forward to fill up this database with plenty of data, and still being able to sleep :-) /BL ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Index optimization ?
Michael Fuhr wrote: currval() is volatile. See Function Volatility Categories in the Extending SQL chapter of the documentation and search the list archives for past discussion of currval()'s volatility. Hmm, I can't find that chapter in the 7.4 manual, or am I looking the wrong place ? I really like to understand this ! /BL ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Index optimization ?
Bo Lorentsen [EMAIL PROTECTED] writes: Do you have any idea to how I may learn more about function types, or is this a read the source, luke thing (I am not sure I have time for that right now) ? http://developer.postgresql.org/docs/postgres/xfunc-volatility.html regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
On Sat, Jan 15, 2005 at 11:28:08PM +0100, Bo Lorentsen wrote: Michael Fuhr wrote: currval() is volatile. See Function Volatility Categories in the Extending SQL chapter of the documentation and search the list archives for past discussion of currval()'s volatility. Hmm, I can't find that chapter in the 7.4 manual, or am I looking the wrong place ? I really like to understand this ! Sorry...apparently that section is in the development documentation but not in the 7.x doc. Tom Lane posted the link but here it is again: http://developer.postgresql.org/docs/postgres/xfunc-volatility.html -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Index optimization ?
On Sat, Jan 15, 2005 at 06:34:11PM -0300, Alvaro Herrera wrote: On Sat, Jan 15, 2005 at 03:11:22PM -0600, Jim C. Nasby wrote: Would it make sense to have a version of currval that will only return one value in a statement/transaction? So the first time it's called it remembers what currval for that sequence is and always returns the same value? What would nextval() do in that case? Return the nextval on the first call, and act like currval() from then until the end of the transaction? I'm not sure which would be best. It could either do what you suggest, or it could operate as normal, or it could possibly throw an error if run a second time. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster