[
https://issues.apache.org/jira/browse/HIVE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666713#comment-13666713
]
John Omernik commented on HIVE-4070:
------------------------------------
They may be a bit scary, but from what perspective? The programming of the
system or the users/operators of the data warehouse that need consistency in
results? The scary thing about the LIKE operator and it not being able to be
controlled in it's behavior is ensuring that assumptions by users don't lead to
false negative hits. I know personally, I came from MySQl/MSSQL and I had that
happened to me until I figured out the difference. It isn't well documented, I
can't control it, it bit me. Ok, no problem, how do I fix it now in my
environment, most of my users are from that same background, so now I have to
do training to change this? How effective will that be? Will people forget?
Will I get false negatives?
A "mlike" may be an option, however, I am not sure; it still requires the user
to remember there is a change, is mlike better than other work arounds? Not if
the user forgets to use it and in the case of like, the end result is false
negatives(user forgets, no results, moves on not realizing they are missing
something).
If a global option isn't available, could like be case insensitive and an olike
function (oracle like) be added? I will say that while false positives are a
pain, at least you can look at the results and say "wait, why isn't this case
sensitive?" and you know something is different than what you are used to (i.e.
Oracle, hence olike). You see and then understand there is a discrepancy. With
mlike and case sensitivity being the default, you don't have any indicator,
because unexpected case sensitivity produces false negatives, unexpected case
insensitivity produces false positives. In analytics false positives are far
preferred than false negatives.
Just brainstorming here, what if we implemented mlike AND olike (my/ms SQL like
and oracle like) (or cslike and cilike (case sensitive and case insensitive))
Then we have the like function just be a link to mlike or olike. Since Hive
started out with case sensitivity, we could have like link to olike, but it can
be changed to mlike in the hive-site. The point being, absence of
specification, default to normal (LIKE = OLIKE Case sensitive)
hive.like.insensitive = true make it MLIKE. Simple, doesn't change the default
hive setting/behavior (out of the box), gives the option of global setting, and
from a programing standpoint we are doing an if test on what LIKE links to, and
we'd have olike and mlike available as UDFs when a user wanted to do the
opposite of what the default setting is.
> Like operator in Hive is case sensitive while in MySQL (and most likely other
> DBs) it's case insensitive
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-4070
> URL: https://issues.apache.org/jira/browse/HIVE-4070
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 0.10.0
> Reporter: Mark Grover
> Assignee: Mark Grover
> Priority: Trivial
>
> Hive's like operator seems to be case sensitive.
> See
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L164
> However, MySQL's like operator is case insensitive. I don't have other DB's
> (like PostgreSQL) installed and handy but I am guessing their LIKE is case
> insensitive as well.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira