[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-23 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-415420332
 
 
   What does the SQL standard have to say about this? MySQL is not the most 
standard compliant db out there.
   
   Maybe the problem here is that string literals are of `CHAR(N)` type, while 
we do not support `CHAR` anywhere else? Maybe our literals should also have a 
type of `VARCHAR(N)`? That would also solve this problem.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-23 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-415443589
 
 
   At least according to draft of 2008 SQL Standard, this PR violates it 
(paragraph 9.3, section 3) iii) 3) says that the result type in this case 
should be `CHAR(N)` where N is the maximum of the inputs. If any of the inputs 
is `VARCHAR(x)`, the result should be also `VARCHAR`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-24 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-415699060
 
 
   @hequn8128 may I ask what is the use case that you are trying to solve? I'm 
asking because I have mixed feelings with this:
   
   1. as it is, it brakes standard, which is bad
   2. adding configuration option `shouldConvertRaggedUnionTypesToVarying`, 
which by default would be `true` is equally bad
   3. making it by default `false` leaves us with an option that's very hard to 
explain to the users what does it to. I would guess almost nobody would use it
   4. having more configuration rises the complexity of the system - more 
features that we need to maintain, support, refactor, test etc. This can 
quickly become a nightmare.
   5. it seems like very cosmetic change. In most cases it doesn't matter, 
since either the result is being printed or being stored in an external system 
that has it's own schema. As long as in this external system column is defined 
as `VARCHAR(x)` this issue doesn't matter. Isn't the scope of this limited 
purely to unit tests, where we need to compare to spaces?
   6. if we agree to introduce this change, should we also agree when next time 
someone reports that he/she would like to have things like:
   
   - `123.456` literal being double or decimal?
   - `'abc'` literal being `CHAR(3)` or `VARCHAR(3)`?
   - `'A  ' || 'B  '` returning `'AB'` or `'A  B  '`
   - 'NULL == 0` being true, false or null?
   - `SUBSTRING('abc' FROM -2 FOR 4)` should yield 'a' or 'bc'?
   
   and the list goes on and on. We definitely can not provide a switch for all 
of such quirks. Having one switch like `mysql-compatibility-mode: true` would 
have more sense, however it's impossible to implement/maintain for different 
reasons.
   
   That's why I don't like changing the default behaviour and I see more 
problems then positives by introducing this as a configurable option. 
   
   However I see one issue here that would solve this problem. Since Flink does 
not support `CHAR(x)` anywhere besides string literals, and our `CHAR(x)` 
support seems to be broken (for example `'A  ' || 'B  '` should return `'A  B  
'`), maybe we should treat all string literals as varchars?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-24 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-415789873
 
 
   @hequn8128 I'm pretty sure that what you have just pointed out is bug in 
Flink SQL. I'm not 100% sure (maybe 95%), but when comparing different types 
like `CHAR(3)` with `CHAR(5)` or `VARCHAR(2)`, all of the operands should be 
first converted to same type (adding/ignoring padding when necessary) and only 
then compared.
   
   In other words `'A  ' == 'A'` should return true, and this PR is not a 
proper fix to this problem. It doesn't fix for example:
   
   ```
   SELECT * FROM country WHERE country_name = 'GERMANY  '
   ```
   
   This further convinces me that:
   > and our CHAR(x) support seems to be broken (for example 'A ' || 'B ' 
should return 'A B ')
   
   Probably either we should convert string literals to `VARCHAR` or properly 
implement support for `CHAR`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-27 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-416196736
 
 
   @hequn8128, as I wrote before, we can not support hundreds of switches 
modifying semantic of SQL, especially not in so minor ways. This PR also do not 
fix your original issue - broken `CHAR(x)` support in Flink.
   
   Output column type of `CHAR(x)` should be properly handled by each 
connector. For example if Flink is processing some `CHAR(X)` column which is 
then being written to some sink/connector, it's this Sink's responsibility to 
correctly handle padding - whether that means trimming it or not depends on the 
underlying column's type. Again, changing type of `case when` only partially 
hides the problem.
   
   >  I think we should not add trailing spaces in result of case when, just 
like most DBMSs do.
   
   Further more, please correct me if I'm wrong, but this sentence seems not 
quite correct:
   
   MySQL http://sqlfiddle.com/#!9/ce6d86/1
   
   1. string literals seems to be of `VARCHAR` type.
   2. `CONCAT` on chars is broken
   3. indeed `case when` returns `VARCHAR`
   
   Oracle http://sqlfiddle.com/#!4/ce6d86/1
   
   1. again, string literals seems to be `VARCHAR`
   2. type of `case when` calculated dynamically, depending which branch was 
selected, but looks like `CHAR(x)`
   
   PostgreSQL http://sqlfiddle.com/#!17/ce6d8/1
   
   1. again, string literals seems to be `VARCHAR`
   2. I don't understand the type of `case when`. Some magic is happening. 
Without `concat` it looks like `VARCHAR` with `concat` it looks like `CHAR`.
   
   MSSQL http://sqlfiddle.com/#!18/ce6d8/2
   
   1. again, string literals seems to be `VARCHAR`
   2. oO... I'm not even attempting to interpret those results... probably 
`concat` is seriously broken
   
   PS, if you look at those results, you should understand why I'm objecting 
into providing a configuration switch (or more like a hundred of them) in order 
to follow what "other DBMSs do". This is simply impossible...
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-28 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-416559091
 
 
   Yes, sorry for my mistake there, I realised about it too late and was in the 
process of fixing it, but it took me quite some time to find a way to determine 
the actual data type returned from `case`. 
   
   In your examples, you are missing one important issue: what's the type of 
string literal. Check out those examples:
   
   PostgreSQL
   http://sqlfiddle.com/#!17/c6522/1
   
   - string literals are of `UNKNOWN` type, that behaves like `VARCHAR`
   - `case when` on `CHAR` arguments returns `CHAR` type.
   
   MSSQL:
   http://sqlfiddle.com/#!18/50a92/1/0
   (click `browser` schema button)
   
   - string literals are `VARCHAR`
   - `case when` return type strictly follows the standard - `CHAR(8)`
   
   Oracle:
   http://sqlfiddle.com/#!4/50a92/1/0
   (click `browser` schema button)
   
   - string literals are `CHAR`
   - `case when` return type `VARCHAR`
   
   This is still hardly conclusive to motivate the change for me.
   
   I'm not sure why are you rejecting my proposal to change default string 
literal type to `VARCHAR`. It would fix all of our `CHAR` problems 
(comparisons, connector support, functions support, etc)  for now and `case 
when` would return `VARCHAR` automatically. Even if in the future we would want 
to provide a configuration switch `string-literals-type: char/varchar`, such 
configuration option would be cleaner and much easier to understand compared to 
`shouldConvertRaggedUnionTypesToVarying`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of CHAR columns of different lengths should be VARCHAR

2018-08-28 Thread GitBox
pnowojski commented on issue #6519: [FLINK-9559] [table] The type of a union of 
CHAR columns of different lengths should be VARCHAR
URL: https://github.com/apache/flink/pull/6519#issuecomment-416647706
 
 
   Good point with the comparisons. I meant that after changing literal types 
to `VARCHAR` (and accepting that this brakes SQL standard) our 
comparisons/functions (like `concat`) would be more SQL standard compliant, 
without the need to fix them. Saying that our literals are `VARCHAR` and we are 
being consistent with handling them is better then providing broken `CHAR`. 
Also it should be a cheaper and way easier for us compared to finding all 
places that do not handle `CHAR` as they should and fixing them. As a side 
note, I generally consider `CHAR` as a pretty useless type, especially since 
unicode it's no longer fixed length.
   
   Changing type of string literal would be easy in calcite 
(`org.apache.calcite.rex.RexBuilder#makePreciseStringLiteral`).
   
   On our side it's might a little bit be more tricky. I might be wrong here, 
but it might be enough to provide `RexShuttle` that do `CHAR` to `VARCHAR` type 
rewrite and apply it to 
`org.apache.calcite.rel.AbstractRelNode#accept(org.apache.calcite.rex.RexShuttle)`
 as a yet another "zero" step. Which also doesn't sound overly complicated.
   
   `shouldConvertRaggedUnionTypesToVarying` both with better `CHAR` support or 
`VARCHAR` literals looses it's purpose (and becomes cosmetic change), or 
doesn't it?
   
   Since I consider this our `CHAR` handling to be a serious problem, I would 
like it to be fixed (in one way or the other) before 1.7.0 release. If we could 
fix this properly, would you be fine with dropping this PR? If we would be 
running out of time, we could reevaluate this closer to feature freeze.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services