[ https://issues.apache.org/jira/browse/CALCITE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776753#comment-17776753 ]
Julian Hyde commented on CALCITE-6001:
--------------------------------------

Can we be sure to test on characters that are (1) 7-bit ASCII, (2) 8-bit ASCII, (3) UTF-8, (4) non-UTF-8? (Maybe category 4 is empty... are there any Unicode characters that cannot be expressed in UTF-8?) And each test should call out which category it is testing. This will be valuable because databases will inevitably have different levels of support.

> Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of
> non-ISO-8859-1 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6001
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6001
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Tanner Clary
>            Assignee: Tanner Clary
>            Priority: Major
>              Labels: pull-request-available
>
> Many dialects supported by Calcite encode their strings using a default
> charset (most commonly UTF-8 or ISO-8859-1). For example, BigQuery uses
> [UTF-8|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type].
> I am proposing to add a dialect property to be referenced when converting
> string literals so that the current dialect's default is used unless
> otherwise specified.
> Presently, if no charset is specified when converting to RexLiterals
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rex/RexBuilder.java#L1618],
> the CalciteSystemProperty {{DEFAULT_CHARSET}} is used
> ([docs|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/config/CalciteSystemProperty.java#L300]),
> which is set to ISO-8859-1.
> This means that when converting a query like:
> {{select 'ק' as result;}}
> you will get the following error: {{Failed to encode 'ק' in character
> set 'ISO-8859-1'}}.
> This failure is unexpected if you are using BigQuery conformance (or any
> dialect whose default is UTF-8).
> Of course an alternative solution would be to just change the Calcite default
> to UTF-8 which supports encoding any UNICODE character while ISO-8859-1 can
> only encode the first 256, but I imagine there are reasons against this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
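
For reference, the four test categories suggested in the comment could be probed with the JDK's own charset encoders. This is only a minimal sketch, not Calcite code: the class name `CharsetProbe`, the method names, and the category labels are hypothetical, and "8-bit ASCII" is assumed here to mean ISO-8859-1. It also suggests an answer to the open question: every Unicode code point has a UTF-8 encoding, so category 4 should contain only malformed Java strings such as an unpaired surrogate.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Calcite) that classifies test strings by charset. */
public class CharsetProbe {

  /** Returns true if every character of s can be encoded in the given charset. */
  public static boolean canEncode(String s, Charset charset) {
    // A fresh encoder is created per call because CharsetEncoder is stateful.
    return charset.newEncoder().canEncode(s);
  }

  /** Returns the narrowest of the four categories that can represent s. */
  public static String category(String s) {
    if (canEncode(s, StandardCharsets.US_ASCII)) {
      return "7-bit ASCII";
    }
    if (canEncode(s, StandardCharsets.ISO_8859_1)) {
      return "8-bit (ISO-8859-1)";
    }
    if (canEncode(s, StandardCharsets.UTF_8)) {
      return "UTF-8 only";
    }
    return "not encodable in UTF-8";
  }

  public static void main(String[] args) {
    // Unicode escapes keep this source file independent of its own encoding:
    // U+00E9 is e-acute, U+05E7 is the Hebrew letter from the issue description,
    // and U+D800 on its own is an unpaired high surrogate.
    String[] samples = {"a", "\u00e9", "\u05e7", "\uD800"};
    for (String s : samples) {
      System.out.println(category(s));
    }
  }
}
```

The Hebrew letter from the issue description lands in the third category, which matches the reported failure under the ISO-8859-1 default and its absence under UTF-8.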