[ https://issues.apache.org/jira/browse/HIVE-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863101#comment-13863101 ]
Eric Hanson commented on HIVE-6140:
-----------------------------------
This may not be relevant for you, but if you can store the table as ORC, you
can enable vectorized execution and benefit from the vectorized implementation
of TRIM, which should be much faster. See
org.apache.hadoop.hive.ql.exec.vector.expressions.StringTrim.
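
As a rough sketch of what that could look like for the table in the report:
the table name letters_orc is hypothetical, and this assumes a Hive build
that includes vectorized execution. Whether it is on by default depends on
the version, so the setting is given explicitly here.

```sql
-- Hypothetical ORC copy of the `letters` table from the report;
-- vectorized execution reads ORC-backed tables.
create table letters_orc (l string) stored as orc;
insert overwrite table letters_orc select l from letters;

-- Turn on vectorized execution for this session.
set hive.vectorized.execution.enabled = true;

-- Same predicate as the slow query, now against the ORC table.
select count(l) from letters_orc where trim(l) = 'l';
```

Timing this against the original text-format query would show whether the
vectorized TRIM helps for this data; no numbers are claimed here.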
> trim udf is very slow
> ---------------------
>
> Key: HIVE-6140
> URL: https://issues.apache.org/jira/browse/HIVE-6140
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Reporter: Thejas M Nair
> Assignee: Anandha L Ranganathan
> Attachments: temp.pl
>
>
> Paraphrasing what was reported by [~cartershanklin] -
> I used the attached Perl script to generate 500 million two-character strings
> which always included a space. I loaded it using:
> create table letters (l string);
> load data local inpath '/home/sandbox/data.csv' overwrite into table letters;
> Then I ran this SQL script:
> select count(l) from letters where l = 'l ';
> select count(l) from letters where trim(l) = 'l';
> First query = 170 seconds
> Second query = 514 seconds