[
https://issues.apache.org/jira/browse/HIVE-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HIVE-5304:
-----------------------------------
Description:
[removed old description]
Hive JDOQL filter pushdown and direct SQL may end up pushing StringCol op
'SomeString' to underlying SQL datastore. However, the datastore may handle
these differently based on the encoding and collation used for the columns of
the database.
So, query results can change depending on the underlying store for the
metastore, and OS used
I am assuming that byte-order sort if the correct way to order things.
Our MySQL script specifies _bin collation, and Postgres, as far as I see,
defaults to "C"; both of those are byte-order collations.
However, MySQL by default doesn't use _bin collation, so if database is
auto-created, the order of things is going to change.
Derby also uses the non-byte-order by
drop_partitions_filter.q illustrates this problem. In byte order collation
(proper way) USA is sorted before Uganda
was:
Hive uses JDOQL filters to optimize partition retrieval; recently direct SQL
was added to optimize it further. Both of these methods may end up pushing
StringCol op 'SomeString' to underlying SQL datastore. Many paths also pushes
order by-s, although these are not as problematic.
The problem is that different datastores handle string compares differently.
While testing on Postgres, I see that results in different things, from
innocent like order changes in "show partitions", to more serious like
{code}
alter table ptestfilter drop partition (c>='US', d<='2')
{code}
in drop_partitions_filter.q - in Derby, with which the .q.out file was
generated, it drops "c=Uganda/d=2"; this also passes on MySQL (I ran tests with
autocreated db); on Postgres with a db from the script it doesn't.
Looks like we need to enforce collation in partition names and
part_key_values-es; both in the create scripts, as well as during autocreate
(via package.jdo?)
EDIT:
also affected are - show indexes. So all names need to be taken care of
Then; describe_comment_nonascii.q fails against MySQL on autocreated db due to
problems with commends.
> Hive results can depend on metastore's underlying datastore, if autocreate is
> used
> ----------------------------------------------------------------------------------
>
> Key: HIVE-5304
> URL: https://issues.apache.org/jira/browse/HIVE-5304
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Sergey Shelukhin
>
> [removed old description]
> Hive JDOQL filter pushdown and direct SQL may end up pushing StringCol op
> 'SomeString' to underlying SQL datastore. However, the datastore may handle
> these differently based on the encoding and collation used for the columns of
> the database.
> So, query results can change depending on the underlying store for the
> metastore, and OS used
> I am assuming that byte-order sort if the correct way to order things.
> Our MySQL script specifies _bin collation, and Postgres, as far as I see,
> defaults to "C"; both of those are byte-order collations.
> However, MySQL by default doesn't use _bin collation, so if database is
> auto-created, the order of things is going to change.
> Derby also uses the non-byte-order by
> drop_partitions_filter.q illustrates this problem. In byte order collation
> (proper way) USA is sorted before Uganda
--
This message was sent by Atlassian JIRA
(v6.1#6144)