[ https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliot West updated HIVE-12274: ------------------------------- Description: h2. Overview This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing JSON data that has a large nested schema. The struct definition is truncated when inserted into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length. Given that the purpose of these columns is to hold very loosely defined configuration values it seems rather limiting to impose such a relatively low length bound. One can imagine that valid use cases will arise where reasonable parameter/property values exceed the current limit. h2. Context These limitations were in by the patch attributed to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the reason. However, nowadays the limit can be increased because: * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. (source) * Postgres supports a max of 1GB for {{character}} datatype. (source) * MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. (source) * SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)" with the same limitation as MySQL being 65535 bytes for the entire row. (source) * Derby's {{varchar}} can be upto 32672 bytes. (source) h2. Proposal Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist for all targeted database platforms: * MySQL: {{mediumtext}} * Postgres: {{text}} * Oracle: {{CLOB}} * Derby: {{LONG VARCHAR}} I'd suggest that the candidates for type change are: * {{COLUMNS_V2.TYPE_NAME}} * {{TABLE_PARAMS.PARAM_VALUE}} * {{SERDE_PARAMS.PARAM_VALUE}} * {{SD_PARAMS.PARAM_VALUE}} Finally, will this limitation persist in the work resulting from HIVE-9452? was: This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing JSON data that has a large nested schema. The struct definition is truncated when inserted into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length. Given that the purpose of these columns is to hold very loosely defined configuration values it seems rather limiting to impose such a relatively low length bound. One can imagine that valid use cases will arise where reasonable parameter/property values exceed the current limit. Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist for all targeted database platforms: * MySQL: {{mediumtext}} * Postgres: {{text}} * Oracle: {{CLOB}} * Derby: {{LONG VARCHAR}} I'd suggest that the candidates for type change are: * {{COLUMNS_V2.TYPE_NAME}} * {{TABLE_PARAMS.PARAM_VALUE}} * {{SERDE_PARAMS.PARAM_VALUE}} * {{SD_PARAMS.PARAM_VALUE}} Finally, will this limitation persist in the work resulting from HIVE-9452? > Increase width of columns used for general configuration in the metastore. > -------------------------------------------------------------------------- > > Key: HIVE-12274 > URL: https://issues.apache.org/jira/browse/HIVE-12274 > Project: Hive > Issue Type: Improvement > Components: Metastore > Affects Versions: 2.0.0 > Reporter: Elliot West > Labels: metastore > Attachments: HIVE-12274.example.ddl.hql > > > h2. Overview > This issue is very similar in principle to HIVE-1364. We are hitting a limit > when processing JSON data that has a large nested schema. The struct > definition is truncated when inserted into the metastore database column > {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length. > Given that the purpose of these columns is to hold very loosely defined > configuration values it seems rather limiting to impose such a relatively low > length bound. One can imagine that valid use cases will arise where > reasonable parameter/property values exceed the current limit. > h2. Context > These limitations were in by the patch attributed to HIVE-1364 which mentions > the _"max length on Oracle 9i/10g/11g"_ as the reason. However, nowadays the > limit can be increased because: > * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the > configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. (source) > * Postgres supports a max of 1GB for {{character}} datatype. (source) > * MySQL can support upto 65535 bytes for the entire row. So long as the > {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. > (source) > * SQL Server's {{varchar}} max length is 8000 and can go beyond using > "varchar(max)" with the same limitation as MySQL being 65535 bytes for the > entire row. (source) > * Derby's {{varchar}} can be upto 32672 bytes. (source) > h2. Proposal > Can these columns not use CLOB-like types as for example as used by > {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents > exist for all targeted database platforms: > * MySQL: {{mediumtext}} > * Postgres: {{text}} > * Oracle: {{CLOB}} > * Derby: {{LONG VARCHAR}} > I'd suggest that the candidates for type change are: > * {{COLUMNS_V2.TYPE_NAME}} > * {{TABLE_PARAMS.PARAM_VALUE}} > * {{SERDE_PARAMS.PARAM_VALUE}} > * {{SD_PARAMS.PARAM_VALUE}} > Finally, will this limitation persist in the work resulting from HIVE-9452? -- This message was sent by Atlassian JIRA (v6.3.4#6332)