[jira] [Commented] (HIVE-6140) trim udf is very slow

Eric Hanson (JIRA) Mon, 06 Jan 2014 08:54:30 -0800

    [ https://issues.apache.org/jira/browse/HIVE-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863101#comment-13863101 ]


Eric Hanson commented on HIVE-6140:
-----------------------------------

This may not be relevant for you, but if you can use ORC then you can enable 
vectorized execution, and benefit from the vectorized implementation of TRIM, 
which should be much faster. See 
org.apache.hadoop.hive.ql.exec.vector.expressions.StringTrim.
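
A rough sketch of what that could look like for the table in the report, 
assuming a Hive build that includes vectorized execution. The table name 
letters_orc is hypothetical, and I have not timed this against the reported 
data set:

```sql
-- Assumption: letters_orc is a made-up name for an ORC copy of the
-- reporter's text-format table `letters`.
create table letters_orc (l string) stored as orc;

-- Copy the existing rows into the ORC-backed table.
insert overwrite table letters_orc select l from letters;

-- Vectorized execution is off by default; enable it for this session.
set hive.vectorized.execution.enabled = true;

-- Same predicate as the slow query in the report, now against ORC data.
select count(l) from letters_orc where trim(l) = 'l';
```

Running EXPLAIN on the last query should indicate whether the plan was 
actually vectorized. If it was not, the query falls back to the row-mode TRIM 
UDF and no speedup should be expected.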

> trim udf is very slow
> ---------------------
>
>                 Key: HIVE-6140
>                 URL: https://issues.apache.org/jira/browse/HIVE-6140
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Thejas M Nair
>            Assignee: Anandha L Ranganathan
>         Attachments: temp.pl
>
>
> Paraphrasing what was reported by [~cartershanklin] -
> I used the attached Perl script to generate 500 million two-character strings 
> which always included a space. I loaded it using:
> create table letters (l string); 
> load data local inpath '/home/sandbox/data.csv' overwrite into table letters;
> Then I ran this SQL script:
> select count(l) from letters where l = 'l ';
> select count(l) from letters where trim(l) = 'l';
> First query = 170 seconds
> Second query  = 514 seconds



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
