Re: [HACKERS] Cluster wide option to control symbol case folding
Robert Haas [mailto:robertmh...@gmail.com] wrote: >> Where you get into trouble there is that you might run CREATE EXTENSION >> from that session Yes. I can see this problem. And, while I can imagine resolving it with context belonging to the extension, separate from the current session's context, any resolution gets to be pretty complex. Probably complex enough that the resolution is worse than living with the problem as part of the cost of the feature. Which means, it remains one of the arguments against it. >> Again, I'm not trying to rain down fire and brimstone >> on your idea here and I clearly see the utility of it. I do not feel this at all (though, since this is e-mail, it is helpful that you state it explicitly). And, I have not felt like this was the case at any point in these discussions. I have consistently received thoughtful and remarkably good responses containing solid points. While I did not recognize all the impacts, and I probably still do not, I never thought this was a trivial issue with no arguments against it (not least, that what I was asking for is not standard compliant). Every modal behavior of any sort in any software adds pain. At the very least it increases the regression testing burden. And, this is a mode with pretty fundamental impact. It has to be worth a lot to somebody to be worth having. And, of course, it also has to work. Ian Lewis (www.mstarlabs.com) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
Tom Lane [mailto:t...@sss.pgh.pa.us] wrote: >> 2. If the folding mode is chosen through a GUC variable, which >> is certainly what people would expect, then it turns out that >> it breaks client libraries/applications *anyway*, because an >> installation-wide setting could impose itself on a client that >> hadn't asked for it. I know that some variables can only be configured at a wide scope, and not a narrow one. Is there no way to restrict a GUC variable's configuration scope to session and finer, but force a fixed value at global scope? If it is possible to restrict global configuration, that at least protects the general purpose administrative tools to a significant degree. >> And for libraries, that isn't a great solution because then they're incompatible with applications that wanted another setting. Good point. Libraries continue to have problems even with session level configuration if they are to operate in the context of an application that reconfigures its session case folding for its own purposes. But, that seems like a problem that is much more likely to affect developers of new systems rather than general users or administrators of existing database systems. If so, it is more of a forward looking problem than a legacy problem in the sense that the person who encounters it is likely to be in a position to do something about it. This makes it much less critical to get every library in the world updated to support all case folding modes than would be the case for general administrative tools like pgAdmin. Depending on the nature of the library, a developer would have the option of using multiple sessions or, perhaps, if it were possible, modifying the folding configuration when using the library. Anyhow, as you say, libraries clearly continue to have issues even with restricted scope on case folding configuration. And the session level idea really helps nothing unless the global default session configuration is fixed. Ian Lewis (www.mstarlabs.com) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
Robert Haas [mailto:robertmh...@gmail.com] wrote: > The issue is, rather, that every extension written for > PostgreSQL, whether in or out of core, needs to handle this issue and > every general-purpose client tool (pgAdmin, etc.) needs to be aware of > it. I can see the accuracy of all of the points you make here. And, I definitely had not thought through the side effects on support tools and third party libraries of implementing such modal behavior on the server when I originally asked my question. I did not even understand the ramifications of upper case folding on the server until Tom pointed out the earlier conversations on the subject (in my defense, I was not confused enough to think I had thought through all the effects of a fundamental change to language recognition based on writing one e-mail message). A fully case sensitive mode, leaving the server catalogs all in lower case, which is what we would really like to have for our use, still looks pretty easy to implement on the server. And, it would at least behave consistently with the lower case folding mode if one quoted all identifiers, unlike a case preserving, case insensitive mode. One idea, which would likely be harder to implement on the server, but that would have less impact on third party tools and libraries, would be to configure case folding on a session basis. There would have to be some means to configure a session for the case folding your application wants to see. And, the general default would have to be the current PostgreSQL behavior so that an application that was designed for current behavior would never see a change. While not quite obvious to me how one would implement this for all client environments, it would make such a feature more useful if it included a means to make the configuration outside of the scope of an application itself so that one could give an application over which one has no control the behavior it expects. That is, provide a means to configure a specific application's session default behavior on the client. But, provide no means to configure the server's general default behavior so that the server itself is never modal with respect to case folding. Only the client session is modal. It is pretty easy to see the pain of adding symbol case folding modes. On the other hand, there is no way to know exactly the gain (or loss) in adoption to providing alternate case folding. So, you have one fact (the pain) and one speculation (the gain). I can see that makes deciding whether this is a good or bad idea for the project not at all easy. Anyhow, I appreciate the time you, and others, have taken to explain your thinking and the impacts of adding modal case folding to the server. Ian Lewis (www.mstarlabs.com) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
From: Robert Haas [mailto:robertmh...@gmail.com] wrote: > I'm not sure there's any way to split the baby here: tool authors will obviously prefer that PostgreSQL's behavior in this area be invariable, while people trying to develop portable database applications will prefer configurability. > As far as I can see, this is a zero sum game that is bound to have one winner and one loser. Tom is clearly right that such modes make life harder in a fundamental way for anyone writing only against PostgreSQL. And, excepting the upper case folding option, which is of no interest at all to me personally - I do not care which case folding messes up my symbol declarations, it would move PostgreSQL away from the standard rather than closer to it (against that, however, PostgreSQL has many features that are not part of the standard, including its existing lower case folding). If he is also right that addition of such an option would deteriorate into a situation where more people think PostgreSQL is broken, rather than fewer people thinking that, as I think would be the case, I have no strong argument for why PostgreSQL - as a project - should support such modal behavior. Personally, I believe such an option would increase, not decrease the number of people who could relatively easily use PostgreSQL. If that is right it is a strong argument for such a modal behavior in spite of the obvious real pain. And, from what I can see, many, maybe most, general purpose tool authors target many backends. So, they already have to deal with some signficiant degree of variation in case folding behavior. So, I do not really see this as a zero sum game. It is a question of whether such an option would grow the user base. If not, it is clearly a bad idea for the project. But, if it would grow the user base sufficiently, then, yes, there is pain for those who write general purpose tools aimed only at PostgreSQL. But, such tools gain from a wider adoption of PostgreSQL. Ian Lewis (www.mstarlabs.com) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark wrote: > But the problem with configurable quoting rules is a bit different. > Imagine your application later decides to depend on PostGIS. So you load the PostGIS extension and perhaps also some useful functions you found on Stack Overflow for solving some GIS problem you have. Those extensions will create objects and then work with those objects and may use CamelCase for clarity -- in > fact I think PostGIS functions are documented as CamelCase. The PostGIS extensions might not work on your system with different case rules if they haven't been 100% consistent with their camelCasing, and the functions from StackOverflow would be even less likely to work. Well, in the case of StackOverflow suggestions, I cannot remember a time when I did not have to rewrite whatever suggestions I have found and used. That is not to say that StackOverflow is not useful. It is incredibly useful at times. But, the suggestions are usually fragments showing how to do something, not solutions. And, most such suggestions are small. And, so, relatively easy to understand and patch as needed. Many such suggestions are not very useful verbatim anyhow. They are useful exactly because they allow you to understand something that you were unable to glean from the documentation. Certainly, making symbol usage consistent is not a hard patch on a small fragment of code that probably needs help anyhow to bring it to production grade. I would not consider this a strong argument against having modal symbol recognition. Your point about PostGIS, and other full or partial solutions for a complex problem, is a more serious issue. I do not have a strong answer to this point. However, at the least a CamelCase case defect in a tool is a pretty easy problem to locate and submit as a patch. (I understand that your point is not just about PostGIS, but for PostGIS itself I have read in a few places that they quote everything already. I do not know whether that is true or not as I have never even looked at the tool. However, if it is true they quote everything, then they already have their CamelCase exactly right everywhere. If they did not the symbol lookup would fail against current PostgreSQL. Any tool that quotes everything should work the same way against any mode as long as all modes are case sensitive. It might be ugly, but at least it should always work no matter what the back end case translation.) In our own code, I actually would prefer that we were forced to always use the same case everywhere we refer to a symbol. And a case sensitive behavior would enforce that at testing. I do not want this because I want to be able to define symbols that differ only in case. I want it so that every symbol reference is exactly visually like every other symbol reference to the same object. Even though the effect is small, I think such consistency makes it easier to read code. Even in C we almost never use the ability to overload on case alone except in a few rare - and localized - cases where the code is actually clearer with such a notation. For example, in a mathematical implementation, using a notation where something like t acts as an index and T defines the range of t the difference in case is very clear. Perhaps more importantly, this use of overload on case is consistent with conventional mathematical notation (which, in my opinion is very good where it belongs). This is not true when dealing with TheLongSymbolWithMixedCase vs. TheLongSymbolWithMixedcase. The human brain cannot see that difference easily, while it can see the difference between t and T very easily, and it can see the relationship between the two symbols more easily than it can see the relationship between t and tmax, say. Still, we almost never have such code running on a database server. Anyhow, you have a good point about third party libraries and tools that integrate with PostgreSQL. However, I for one would be willing to live with and address that kind of issue as needed. If the behavior were controlled at database create time, which, from the articles Tom linked, seems to be the general consensus as the right time for such a choice given the current implementation, then one would at least have the option of having databases with different case rules within a cluster. Since each session can only connect to one database, this is not a solution to every such situation, but it would address at least some such cases. Ian Lewis (www.mstarlabs.com) PS. To anyone who might know the answer: My Reply All to this group does not seem to join to the original thread. All I am doing is Reply All from Outlook. Is there something else I need to do to allow my responses to join the original thread? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
Tom Lane [mailto:t...@sss.pgh.pa.us] wrote: > Quite aside from the impact on the server (which would be extensive), it would break every nontrivial application, and force them all to try to deal with each possible folding behavior. I have read through the various threads related to this issue that you supplied. And, it looks quite clear that a change is unlikely. Maybe it is even a bad idea, though personally I think it could prove worth the pain that you obviously anticipate. However, before fully dropping this issue, I do have one comment I would like to make (just for the record) on your statement that a change in the current behavior would break most non-trivial applications. I assume you are talking about general purpose tools that attempt to interact with any database in any configuration. Obviously, a purpose built tool, such as our own internal database applications, would be designed only for the behavior of the databases it is intended to work against. I have, over the past few months, tried quite a large number (10, maybe a few more than that) of general purpose tools against PostgreSQL looking for replacements for some of the tools we use against our old database server. And, almost none of them work well if I quote identifiers on the server. Almost all work perfectly well if I accept that all my symbols will be converted to lower case and do not quote the identifiers. Most fail - often not even in corner cases - when the catalog (properly) returns a mixed case symbol like WeeklySales. Very few of the tools I have tried seem to know to quote such a symbol to preserve the case when generating a query to send back to the server. Most of the tools I have tested work through ODBC, though a few connect directly to PostgreSQL. Both types of tools have exhibited similar issues. Maybe the ODBC connection is relevant, since it appears that Microsoft SQL Server does what our current server does and preserves case, at least in some modes. A good fraction of the tools I have tried, push their compatibility with SQL Server. So, the current behavior already breaks many tools unless one accepts that all symbols on the server are lower case. At root, based on reading the threads you provided, this probably indicates defects in the tools, rather than a problem with PostgreSQL. My reading of the standard text quoted in various places is that any mixed case identifier returned from the catalog has to be quoted to match in a query (whether you fold to lower or upper case). But, I can easily imagine a good number of people deciding they want mixed case on the server, and so quoting their identifiers. And, then deciding PostgreSQL is defective, rather than deciding their favorite administration or query tool is defective. Almost all of the tools I tried worked fine when I had all lower case symbols on the server. Based on observing the generated SQL, most of the tools that failed for me when I had mixed case symbols on the server would work against a case preserving mode in PostgreSQL. The tools generally pass through the catalog reported symbols without manipulation. I fully understand your concern to keep the number of modal behaviors to a minimum. And, the upper case folding looks to have a lot of side effects, and so is a whole lot harder to implement than I expected. Preserving case, which is what we actually want, by contrast still looks pretty easy, though it is clearly not standard compliant. I do not have any more to add on the subject. I just wanted to make this note that a good fraction of third party tools already fail unless one never quotes identifiers on PostgreSQL. Thank you for taking the time to reply to my original inquiry. Ian Lewis www.mstarlabs.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
On Saturday, December 24, 2016 10:49 PM Tom Lane [mailto:t...@sss.pgh.pa.us] wrote: > No. This has been looked into repeatedly in the past, and we simply don't want to deal with it. Fair enough. We will not pursue the issue then. That is why I asked. Ian Lewis (www.mstarlabs.com) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
On December 24, 2016 9:52 PM Craig Ringer [mailto:craig.rin...@2ndquadrant.com] wrote: > Personally I can see such an option being ok as an initdb-time setting or at CREATE DATABASE time. Case folding can know the current db from global context. > > It'd have to be fast though. Very fast. That seems reasonable. In fact, it is nicer to configure at the database level, rather than at the cluster level. I did not know it was possible to tell the database from global context. Since a connection only allows access to a single database, it makes sense that could be possible. A check of a single global configuration variable to switch between one of three handler paths would be very small compared with the character-by-character checks currently performed by downcase_identifier(), though, of course, the extra check would not be free. However, it would likely be faster to setup up the processing to call through a global pointer to one of three handler functions. On most Intel processors, at least, that extra pointer indirection costs little to nothing. The pointer could be set up during database connect (I do not know what I am talking about here, but there must be such a process somewhere). Presumably, the handler pointer would have to go into the global database descriptor whatever that is. Or, if you allow use of global objects for storing information about database scope run-time configuration, it could just be a function pointer stored with the handlers. The database initialization processing could call a setup function when it runs to select the correct handling for its configuration. The default would be the current downcase_identifier() handling. Does this seem like an approach that would meet your "Very fast" requirement? Ian Lewis (www.mstarlabs.com)
[HACKERS] Cluster wide option to control symbol case folding
Is there any chance that the PostgreSQL developers would accept a new cluster wide configuration option to control how the system handles symbol case folding? Currently PostgreSQL folds all un-quoted symbols to lower case. We would like to add a global configuration option with a name like symbol_casefold with settings to allow folding symbols to lc - lower case (default) uc - upper case none - no case folding USE CASE 1 The option we actually want for our own use is "none". Currently we have several large bodies of code that work with an SQL Anywhere backend. This server preserves case and - by a configuration option - performs case insensitive lookup on all symbols. We are moving to PostgreSQL for internal applications. We are also considering using PostgreSQL for the storage backend in a product. We use camel case for symbol names in all of our application and backend code (largely written in C++, Object Pascal, and Python). In a language like Pascal that ignores case, we still maintain consistent use of case in symbol names for the sake of human readers of the code. Where we make a connection from application code to a storage backend we would like to use exactly the same symbol name - including case - for the symbol in the backend as we use in the corresponding symbol in application code. For example, if we intend to read a value into a variable called FirstName we would like the corresponding field in the database to also be FirstName. The main reason we want this exact match is so that a human reader sees exactly the same thing in the two places. This makes it easier to see the connection between the two bodies of code. However, in places we also automate the connection between client symbols and server symbols, and in such a case it is useful, though certainly not necessary, to have an exact match. While we do not do so at present, where we use a database purely through code, we can likely add double quotes around all symbols in our generated SQL, which means we can have an exact match as we want under PostgreSQL as it currently behaves. However, in many cases we also access the same database backend through third party tools and ad hoc queries. If we double quote all of our symbols, which is what we have tried to do in our initial tests, then all such tools and manually written queries must also double quote all symbols. We use several tools and libraries that do not appear to have any way to properly quote symbols when they are obtained automatically from the schema. So, for example, if our reporting tool retrieves a field called FirstName from the database schema, it happily uses that name in its internally generated SQL. And, on PostgreSQL, of course this fails because FirstName in script is firstname at the server, which is not defined if we have double quoted the field name in the table definition. Since we do not have the code, we cannot work around this in any easy way. For ad hoc queries we can double quote all symbol references. But, this makes the queries noticeably harder for a person to read. And, I do not believe this is just a matter of "getting used to it". The quotes clutter the script, and that clutter makes the script fundamentally harder to understand. Maybe this is small, but anything that makes comprehension harder is a bad thing. Queries can be hard enough to understand without extra syntactic clutter. So, these considerations leave us the option of never quoting symbols when using PostgreSQL. This works everywhere we have tried it. But, it is pretty unattractive from the point of view of looking at the symbol names in the backend if they are to match exactly the symbol names we use in code. Using a different naming convention than we use everywhere else in our code (underscore separated all lower case symbol names, say) is not appealing either. We are very consistent in our symbol name handling in our code, and breaking our conventions in some relatively large section of our code is very unattractive. In addition, in many places our tools and code use schema supplied field names to form column titles in a table or in a caption on an edit box (etc.). The mixed case names are much nicer for this purpose than the folded names. For our purposes, these are our arguments for wanting control of how the server folds case. USE CASE 2 Even though we have no use for it, I have included the option "uc" because, in trying to determine whether PostgreSQL could support our desired behavior, I found a fairly large number of people who are coming from a different backend, such as Oracle (from what I see on the internet - no personal experience), that case fold similarly to PostgreSQL. However, instead of folding to lower case, it appears a number of other database servers fold to upper case. This leaves people who are moving from these other database systems with problems in their own code that they have to patch up to be able to make the port succes