[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-415420332

What does the SQL standard have to say about this? MySQL is not the most standards-compliant db out there. Maybe the problem here is that string literals are of `CHAR(N)` type, while we do not support `CHAR` anywhere else? Maybe our literals should also have a type of `VARCHAR(N)`? That would also solve this problem.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-415443589

At least according to the draft of the 2008 SQL standard, this PR violates it: paragraph 9.3, section 3) iii) 3) says that the result type in this case should be `CHAR(N)`, where N is the maximum of the input lengths. If any of the inputs is `VARCHAR(x)`, the result should also be `VARCHAR`.
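The rule from the standard can be sketched in a few lines; this is an illustration of the result-type computation described above, not Flink or Calcite code, and the `('CHAR'|'VARCHAR', length)` tuple encoding is made up for the example:

```python
# Sketch of the SQL:2008 rule (paragraph 9.3) for combining character
# types, e.g. in a UNION: the result is CHAR(N) only when every input
# is CHAR, with N the maximum input length; if any input is VARCHAR,
# the result is VARCHAR(N).

def union_result_type(*branches):
    """branches: ('CHAR'|'VARCHAR', length) pairs, one per UNION branch."""
    max_len = max(length for _, length in branches)
    all_char = all(kind == 'CHAR' for kind, _ in branches)
    return ('CHAR' if all_char else 'VARCHAR', max_len)
```

Under this rule a union of `CHAR(3)` and `CHAR(5)` stays `CHAR(5)`, which is exactly where this PR (which would yield `VARCHAR`) diverges from the standard.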
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-415699060

@hequn8128 may I ask what is the use case that you are trying to solve? I'm asking because I have mixed feelings about this:

1. as it is, it breaks the standard, which is bad
2. adding a configuration option `shouldConvertRaggedUnionTypesToVarying`, which by default would be `true`, is equally bad
3. making it `false` by default leaves us with an option whose effect is very hard to explain to the users. I would guess almost nobody would use it
4. having more configuration raises the complexity of the system - more features that we need to maintain, support, refactor, test etc. This can quickly become a nightmare.
5. it seems like a very cosmetic change. In most cases it doesn't matter, since the result is either being printed or being stored in an external system that has its own schema. As long as the column in this external system is defined as `VARCHAR(x)`, this issue doesn't matter. Isn't the scope of this limited purely to unit tests, where we need to compare trailing spaces?
6. if we agree to introduce this change, should we also agree the next time someone reports that he/she would like to have things like:
   - `123.456` literal being double or decimal?
   - `'abc'` literal being `CHAR(3)` or `VARCHAR(3)`?
   - `'A ' || 'B '` returning `'AB'` or `'A B '`?
   - `NULL == 0` being true, false or null?
   - `SUBSTRING('abc' FROM -2 FOR 4)` yielding `'a'` or `'bc'`?

and the list goes on and on. We definitely can not provide a switch for all of such quirks. Having one switch like `mysql-compatibility-mode: true` would make more sense, however it's impossible to implement/maintain for different reasons. That's why I don't like changing the default behaviour, and I see more problems than positives in introducing this as a configurable option.

However, I see one option here that would solve this problem. Since Flink does not support `CHAR(x)` anywhere besides string literals, and our `CHAR(x)` support seems to be broken (for example `'A ' || 'B '` should return `'A B '`), maybe we should treat all string literals as varchars?
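The `'A ' || 'B '` point can be made concrete with a toy model; this is a sketch of the `CHAR(n)` value semantics being described (values always carry their padding), with a hypothetical `char()` helper, not Flink's implementation:

```python
# Model a CHAR(n) value: it is always exactly n characters, padded with
# spaces. Concatenation of two CHAR values is plain concatenation of
# the padded strings, so the padding of the left operand is preserved.

def char(value, n):
    """Pad (or truncate) a string to exactly n characters, like CHAR(n)."""
    return value.ljust(n)[:n]

def char_concat(a, b):
    # Standard CHAR || CHAR: concatenate the fixed-length values as-is.
    return a + b
```

With literals typed as `CHAR(2)`, `char_concat(char('A', 2), char('B', 2))` gives `'A B '`, which is the standard-compliant result the comment says Flink currently gets wrong.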
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-415789873

@hequn8128 I'm pretty sure that what you have just pointed out is a bug in Flink SQL. I'm not 100% sure (maybe 95%), but when comparing different types like `CHAR(3)` with `CHAR(5)` or `VARCHAR(2)`, all of the operands should first be converted to the same type (adding/ignoring padding when necessary) and only then compared. In other words, `'A ' == 'A'` should return true, and this PR is not a proper fix to this problem. It doesn't fix, for example:

```
SELECT * FROM country WHERE country_name = 'GERMANY '
```

This further convinces me that:

> and our CHAR(x) support seems to be broken (for example 'A ' || 'B ' should return 'A B ')

Probably we should either convert string literals to `VARCHAR` or properly implement support for `CHAR`.
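The comparison rule being appealed to is the SQL "pad space" semantics for character comparisons; a minimal sketch of it (an illustration, not Flink's actual code path):

```python
# PAD SPACE comparison: before comparing two character values of
# different lengths, the shorter one is padded with spaces to the
# length of the longer one. Under this rule 'A ' = 'A' is true, and
# country_name = 'GERMANY ' matches a CHAR column holding 'GERMANY'.

def pad_space_equals(a, b):
    n = max(len(a), len(b))
    return a.ljust(n) == b.ljust(n)
```

A fix at the comparison level (rather than at the union's result type) is what would make the `WHERE country_name = 'GERMANY '` query behave as expected.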
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-416196736

@hequn8128, as I wrote before, we cannot support hundreds of switches modifying the semantics of SQL, especially not in such minor ways. This PR also does not fix your original issue - broken `CHAR(x)` support in Flink. An output column of type `CHAR(x)` should be properly handled by each connector. For example, if Flink is processing some `CHAR(x)` column which is then being written to some sink/connector, it is this sink's responsibility to correctly handle padding - whether that means trimming it or not depends on the underlying column's type. Again, changing the type of `case when` only partially hides the problem.

> I think we should not add trailing spaces in result of case when, just like most DBMSs do.

Furthermore, please correct me if I'm wrong, but this sentence seems not quite correct:

MySQL http://sqlfiddle.com/#!9/ce6d86/1
1. string literals seem to be of `VARCHAR` type
2. `CONCAT` on chars is broken
3. indeed, `case when` returns `VARCHAR`

Oracle http://sqlfiddle.com/#!4/ce6d86/1
1. again, string literals seem to be `VARCHAR`
2. the type of `case when` is calculated dynamically, depending on which branch was selected, but looks like `CHAR(x)`

PostgreSQL http://sqlfiddle.com/#!17/ce6d8/1
1. again, string literals seem to be `VARCHAR`
2. I don't understand the type of `case when`. Some magic is happening. Without `concat` it looks like `VARCHAR`; with `concat` it looks like `CHAR`.

MSSQL http://sqlfiddle.com/#!18/ce6d8/2
1. again, string literals seem to be `VARCHAR`
2. oO... I'm not even attempting to interpret those results... probably `concat` is seriously broken

PS: if you look at those results, you should understand why I'm objecting to providing a configuration switch (or more like a hundred of them) in order to follow what "other DBMSs do". This is simply impossible...
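The "each connector handles padding" point can be sketched as follows; the function and the tuple-based type model are hypothetical illustrations of the responsibility split being proposed, not any actual Flink connector API:

```python
# A sink that knows its target column type decides what to do with
# CHAR padding itself: a CHAR(n) target keeps fixed-length padding,
# while a variable-length target has no use for trailing pad spaces.

def prepare_for_sink(value, column_type):
    kind, n = column_type              # e.g. ('CHAR', 8) or ('VARCHAR', 8)
    if kind == 'CHAR':
        return value.ljust(n)[:n]      # enforce the fixed length
    return value.rstrip(' ')           # drop pad spaces for VARCHAR targets
```

Under this split, the engine never needs a `shouldConvertRaggedUnionTypesToVarying`-style switch; padding policy lives at the sink boundary where the external schema is known.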
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-416559091

Yes, sorry for my mistake there, I realised it too late and was in the process of fixing it, but it took me quite some time to find a way to determine the actual data type returned from `case`. In your examples, you are missing one important issue: what the type of a string literal is. Check out those examples:

PostgreSQL http://sqlfiddle.com/#!17/c6522/1
- string literals are of `UNKNOWN` type, which behaves like `VARCHAR`
- `case when` on `CHAR` arguments returns `CHAR` type

MSSQL http://sqlfiddle.com/#!18/50a92/1/0 (click the `browser` schema button)
- string literals are `VARCHAR`
- the `case when` return type strictly follows the standard - `CHAR(8)`

Oracle http://sqlfiddle.com/#!4/50a92/1/0 (click the `browser` schema button)
- string literals are `CHAR`
- the `case when` return type is `VARCHAR`

This is still hardly conclusive enough to motivate the change for me. I'm not sure why you are rejecting my proposal to change the default string literal type to `VARCHAR`. It would fix all of our `CHAR` problems (comparisons, connector support, functions support, etc.) for now, and `case when` would return `VARCHAR` automatically. Even if in the future we wanted to provide a configuration switch `string-literals-type: char/varchar`, such a configuration option would be cleaner and much easier to understand compared to `shouldConvertRaggedUnionTypesToVarying`.
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR URL: https://github.com/apache/flink/pull/6519#issuecomment-416647706

Good point about the comparisons. I meant that after changing literal types to `VARCHAR` (and accepting that this breaks the SQL standard), our comparisons/functions (like `concat`) would be more SQL-standard compliant, without the need to fix them. Saying that our literals are `VARCHAR` and being consistent in handling them is better than providing broken `CHAR`. It should also be cheaper and far easier for us compared to finding and fixing all the places that do not handle `CHAR` as they should. As a side note, I generally consider `CHAR` a pretty useless type, especially since with Unicode it is no longer fixed-length.

Changing the type of string literals would be easy in Calcite (`org.apache.calcite.rex.RexBuilder#makePreciseStringLiteral`). On our side it might be a little more tricky. I might be wrong here, but it might be enough to provide a `RexShuttle` that does a `CHAR` to `VARCHAR` type rewrite and apply it via `org.apache.calcite.rel.AbstractRelNode#accept(org.apache.calcite.rex.RexShuttle)` as yet another "zero" step. Which also doesn't sound overly complicated.

With either better `CHAR` support or `VARCHAR` literals, `shouldConvertRaggedUnionTypesToVarying` loses its purpose (and becomes a cosmetic change), doesn't it? Since I consider our `CHAR` handling a serious problem, I would like it to be fixed (in one way or the other) before the 1.7.0 release. If we can fix this properly, would you be fine with dropping this PR? If we run out of time, we could reevaluate this closer to feature freeze.
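The proposed rewrite step can be illustrated with a toy expression tree; the `Literal`/`Call` classes below are hypothetical stand-ins for Calcite's `RexNode` hierarchy, sketching what a `RexShuttle`-style `CHAR` to `VARCHAR` pass would do, not actual Calcite or Flink code:

```python
# Walk an expression tree and retype every CHAR(n) string literal as
# VARCHAR(n), leaving everything else untouched - the "zero" step
# described above.

from dataclasses import dataclass

@dataclass
class Literal:
    value: str
    type: tuple          # ('CHAR', n) or ('VARCHAR', n)

@dataclass
class Call:
    op: str
    operands: list

def rewrite_char_literals(node):
    if isinstance(node, Literal) and node.type[0] == 'CHAR':
        return Literal(node.value, ('VARCHAR', node.type[1]))
    if isinstance(node, Call):
        return Call(node.op, [rewrite_char_literals(o) for o in node.operands])
    return node
```

Because the pass only touches literals, column references and non-character literals pass through unchanged, which is what keeps the change small compared to fixing full `CHAR` support.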