[jira] [Commented] (HIVE-15670) column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
[ https://issues.apache.org/jira/browse/HIVE-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225476#comment-16225476 ]

Sergey Shelukhin commented on HIVE-15670:
-----------------------------------------

Yeah I suggested having bitmask as one of the possible solutions. Let me modify the description :)

> column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
> -----------------------------------------------------------
>
>                 Key: HIVE-15670
>                 URL: https://issues.apache.org/jira/browse/HIVE-15670
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> The JSON can be too big with many columns (see setColumnStatsState method).
> We can make JSON more compact by only storing the list of columns with true
> values. Or we can even store a bitmask in a dedicated column, and adjust it
> when altering table (rare enough). Or we can just change the VALUE column to
> text blob (might be a painful change wrt upgrade scripts, and supporting all
> the DBs' varied blob implementations, esp. in directsql).
> Storing denormalized flags in a separate table will probably be slow,
> comparatively.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
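The encodings proposed in the issue description above can be compared with a small sketch. This is not Hive code: the class `ColumnStatsFlags` and all of its methods are hypothetical, the verbose JSON shape is an assumption rather than the exact string Hive writes, and column names are assumed to need no JSON escaping. It only illustrates how a true-only list or a bitmask keyed by column position could shrink the value that has to fit in PARTITION_PARAMS.VALUE.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; none of these names exist in Hive.
final class ColumnStatsFlags {

    // Verbose encoding (assumed shape): one JSON entry per column, true or false.
    // Column names are assumed to need no JSON escaping.
    static String toFullJson(Map<String, Boolean> flags) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Boolean> e : flags.entrySet()) {
            json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
        }
        return json.toString();
    }

    // Compact encoding: a JSON array naming only the columns whose flag is true.
    static String toTrueList(Map<String, Boolean> flags) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, Boolean> e : flags.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) {
                json.add("\"" + e.getKey() + "\"");
            }
        }
        return json.toString();
    }

    // Bitmask encoding: bit i is set when the i-th column in columnOrder is flagged true.
    // The mask is only meaningful together with that column order, so it would have
    // to be adjusted whenever the table's columns change.
    static String toBitmask(List<String> columnOrder, Map<String, Boolean> flags) {
        BitSet bits = new BitSet(columnOrder.size());
        for (int i = 0; i < columnOrder.size(); i++) {
            if (Boolean.TRUE.equals(flags.get(columnOrder.get(i)))) {
                bits.set(i);
            }
        }
        return Base64.getEncoder().encodeToString(bits.toByteArray());
    }

    // Decodes a mask produced by toBitmask back into a BitSet.
    static BitSet fromBitmask(String encoded) {
        return BitSet.valueOf(Base64.getDecoder().decode(encoded));
    }

    public static void main(String[] args) {
        // Made-up table with 200 columns, every second one flagged as accurate.
        List<String> columns = new ArrayList<>();
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (int i = 0; i < 200; i++) {
            String name = "col_" + i;
            columns.add(name);
            flags.put(name, i % 2 == 0);
        }
        System.out.println("full JSON length: " + toFullJson(flags).length());
        System.out.println("true-only list length: " + toTrueList(flags).length());
        System.out.println("bitmask length: " + toBitmask(columns, flags).length());
    }
}
```

The bitmask is the smallest of the three but depends on a stable column order, which matches the caveat in the description that it must be adjusted when the table is altered. The true-only list stays self-describing but still grows with the number of flagged columns.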
[jira] [Commented] (HIVE-15670) column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
[ https://issues.apache.org/jira/browse/HIVE-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223239#comment-16223239 ]

Alexander Behm commented on HIVE-15670:
---------------------------------------

Thanks for the response. From the perspective of a client issuing an RPC to alter column stats, it seems like a rather questionable side-effect to also alter the table metadata with a pretty big payload. Instead of "fixing" this issue by changing the database schema, could we instead remove the JSON string altogether?

I'm definitely not familiar with the implementation details, just trying to provide a perspective from a Metastore client that is not Hive.
[jira] [Commented] (HIVE-15670) column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
[ https://issues.apache.org/jira/browse/HIVE-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217948#comment-16217948 ]

Sergey Shelukhin commented on HIVE-15670:
-----------------------------------------

Beats me... the current implementation is as such.
[jira] [Commented] (HIVE-15670) column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
[ https://issues.apache.org/jira/browse/HIVE-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217912#comment-16217912 ]

Alexander Behm commented on HIVE-15670:
---------------------------------------

May I ask what's the purpose of storing this JSON in the tableproperties? Seems pretty expensive to me. If you want to keep track of the accuracy of column stats, why not populate a "last updated" timestamp in the appropriate column statistic?
[jira] [Commented] (HIVE-15670) column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
[ https://issues.apache.org/jira/browse/HIVE-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831055#comment-15831055 ]

Sergey Shelukhin commented on HIVE-15670:
-----------------------------------------

[~prasanth_j] [~pxiong] fyi