Thejas M Nair created HIVE-6140:
-----------------------------------
Summary: trim udf is very slow
Key: HIVE-6140
URL: https://issues.apache.org/jira/browse/HIVE-6140
Project: Hive
Issue Type: Bug
Components: UDF
Reporter: Thejas M Nair
Paraphrasing what was reported by [~cartershanklin]:
I used the attached Perl script to generate 500 million two-character strings,
each of which contained a space. I loaded the data using:
create table letters (l string);
load data local inpath '/home/sandbox/data.csv' overwrite into table letters;
Then I ran this SQL script:
select count(l) from letters where l = 'l ';
select count(l) from letters where trim(l) = 'l';
First query (direct comparison) = 170 seconds
Second query (with trim) = 514 seconds, roughly 3x slower
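
The attached Perl script is not reproduced in this message. For anyone who wants to build a similar data set without it, the following is a minimal Python sketch of a comparable generator. It rests on assumptions that the report does not confirm: each row is one lowercase letter plus one space, the space may be leading or trailing, and rows are written one per line. The names make_rows and write_rows are hypothetical.

```python
import random
import string


def make_rows(n, seed=0):
    """Yield n two-character strings, each one lowercase letter plus one space."""
    rng = random.Random(seed)
    for _ in range(n):
        letter = rng.choice(string.ascii_lowercase)
        # Assumption: the space is placed before or after the letter at random.
        if rng.random() < 0.5:
            yield letter + " "
        else:
            yield " " + letter


def write_rows(path, n, seed=0):
    """Write n generated rows to path, one row per line."""
    with open(path, "w") as f:
        for row in make_rows(n, seed):
            f.write(row + "\n")
```

Calling write_rows("data.csv", 1000) produces a small file for a quick check. The row count would need to be raised to 500 million to match the scale described in the report.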