[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916274#action_12916274
 ] 

Siying Dong commented on HIVE-1638:
-----------------------------------

Vladimir, it is a small test cluster of around 8 machines, each with 8 cores. 
The query latency is not hours. The input split size determines the number of 
mappers and therefore influences query latency. The split size is configurable 
through some parameters. With the default size on our production cluster, 
those sample queries usually take minutes to finish.
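
To illustrate how split size relates to mapper count, here is a rough sketch. 
It assumes one mapper per input split and ignores file and block boundaries, 
so real counts will differ. The class name, method name, and byte sizes are 
hypothetical and not taken from the test setup described here.

```java
public class SplitEstimate {
    // Rough estimate only: assumes one mapper per split and ignores
    // file and block boundaries (an assumption, not Hive's actual logic).
    public static long estimateMappers(long inputBytes, long splitBytes) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be positive");
        }
        if (inputBytes <= 0) {
            return 0;
        }
        // Ceiling division: a partial trailing split still needs a mapper.
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long inputBytes = 64L * 1024 * 1024 * 1024;  // hypothetical 64 GB input
        long smallSplit = 256L * 1024 * 1024;        // hypothetical 256 MB split
        long largeSplit = 4L * 1024 * 1024 * 1024;   // hypothetical 4 GB split

        // Smaller splits give more, shorter-running mappers.
        System.out.println(estimateMappers(inputBytes, smallSplit)); // 256
        // Larger splits give fewer, longer-running mappers.
        System.out.println(estimateMappers(inputBytes, largeSplit)); // 16
    }
}
```

With a larger split size, each mapper processes more data and runs longer, 
which is how the per-mapper runtime in these tests was tuned.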

In the queries above, I set the split size so that each mapper usually takes 
10-20 minutes, which is roughly the ideal size for non-interactive queries. 
It doesn't represent any production query.

> convert commonly used udfs to generic udfs
> ------------------------------------------
>
>                 Key: HIVE-1638
>                 URL: https://issues.apache.org/jira/browse/HIVE-1638
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Siying Dong
>         Attachments: HIVE-1638.1.patch
>
>
> Copying a mail from Joy:
> i did a little bit of profiling of a simple hive group by query today. i was 
> surprised to see that one of the most expensive functions was in converting 
> the equals udf (i had some simple string filters) to generic udfs. 
> (primitiveobjectinspectorconverter.textconverter)
> am i correct in thinking that the fix is to simply port some of the most 
> popular udfs (string equality/comparison etc.) to generic udfs?
