[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916284#action_12916284
 ] 

Siying Dong commented on HIVE-1638:
---

Ning, which object are you referring to? The DeferredObject parameters passed 
in seem to be different objects, and calling get() returns an 
org.apache.hadoop.io.Text, which already seems to be too late.

> convert commonly used udfs to generic udfs
> --
>
> Key: HIVE-1638
> URL: https://issues.apache.org/jira/browse/HIVE-1638
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Siying Dong
> Attachments: HIVE-1638.1.patch
>
>
> Copying a mail from Joy:
> i did a little bit of profiling of a simple hive group by query today. i was 
> surprised to see that one of the most expensive functions was in converting 
> the equals udf (i had some simple string filters) to generic udfs 
> (primitiveobjectinspectorconverter.textconverter).
> am i correct in thinking that the fix is to simply port some of the most 
> popular udfs (string equality/comparison etc.) to generic udfs?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916278#action_12916278
 ] 

Siying Dong commented on HIVE-1638:
---

Sorry. It's around 30 machines, each with 8 cores.




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916274#action_12916274
 ] 

Siying Dong commented on HIVE-1638:
---

Vladimir, it is a small test cluster with around 8 machines, each of which has 
8 cores. The query latency is not hours. The input split size determines the 
number of mappers and therefore influences query latency. The split size is 
configurable through some parameters. With the default size in our production 
cluster, those sample queries usually take minutes to finish.

For the queries above, I set the split size so that each mapper usually takes 
between 10 and 20 minutes, which is roughly the ideal size for non-interactive 
queries. It doesn't represent any production query.




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916256#action_12916256
 ] 

Ning Zhang commented on HIVE-1638:
--

Siying, great work!

Also, can you add an optimization for the case where a parameter is a constant 
(e.g., the 2nd parameter of f_c='5015')? The objectInspector doesn't know 
whether the input parameter is a constant or not, but I think if you check in 
evaluate() whether the parameter is the same *object* between the 1st and 2nd 
row, you can conclude that the parameter is a constant. This can save a lot of 
object construction.
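
The identity check suggested above could be sketched roughly as follows. This 
is only an illustration in plain Java, not Hive code: the class 
IdentityCachedConverter and its methods are hypothetical names, and 
StringBuilder stands in for whatever raw argument object evaluate() receives. 
It also assumes the caller never mutates and reuses a single object for 
changing values; if it does, reference equality no longer implies a constant.

```java
import java.util.function.Function;

/**
 * Hypothetical helper: re-runs the conversion only when the raw argument is a
 * different object from the one seen on the previous row.
 */
final class IdentityCachedConverter<A, B> {
    private final Function<A, B> converter;
    private A lastRaw;        // raw argument object seen on the previous row
    private B lastConverted;  // converted form of lastRaw
    private boolean primed;   // false until the first row has been seen
    private int conversions;  // how many real conversions were performed

    IdentityCachedConverter(Function<A, B> converter) {
        this.converter = converter;
    }

    B convert(A raw) {
        // Reference comparison (==), not equals(): this is the "same object" test.
        if (primed && raw == lastRaw) {
            return lastConverted;
        }
        lastRaw = raw;
        lastConverted = converter.apply(raw);
        primed = true;
        conversions++;
        return lastConverted;
    }

    int conversions() {
        return conversions;
    }

    public static void main(String[] args) {
        IdentityCachedConverter<StringBuilder, String> conv =
                new IdentityCachedConverter<>(StringBuilder::toString);
        StringBuilder constant = new StringBuilder("5015"); // same object on every row
        for (int row = 0; row < 3; row++) {
            conv.convert(constant);
        }
        System.out.println(conv.conversions()); // prints 1
    }
}
```

Whether this helps in practice depends on whether the same object really is 
passed for every row, which is the question raised in the reply to this 
comment.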




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916254#action_12916254
 ] 

Vladimir Rodionov commented on HIVE-1638:
-

OK, thanks for the clarification. A couple more questions: 

1. How large is your cluster?
2. Is a latency of hours typical for a simple query in Hive?




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916244#action_12916244
 ] 

Siying Dong commented on HIVE-1638:
---

I should have made it clearer.

"CPU Cycle (MapRed Framework)" is the "CPU_MILLISECONDS" value reported in the 
"Map-Reduce Framework" section of the job page.

"Total CPU Time (hmon)" is not the query execution time. It is the average time 
the query occupies the cluster. It aggregates the resources this job uses on 
each machine, normalized by the total resources of the cluster (for this metric 
specifically, that should be the number of cores). It also includes the 
reducers' costs. Its trend should be very similar to the first one; it is just 
another source.






[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916231#action_12916231
 ] 

Vladimir Rodionov commented on HIVE-1638:
-

Correct me if I am wrong: the new queries take 23, 42, 93, and 79 seconds? What 
is the wall-clock execution time for these queries?




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916208#action_12916208
 ] 

Namit Jain commented on HIVE-1638:
--

great results - I will review the patch




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916058#action_12916058
 ] 

Siying Dong commented on HIVE-1638:
---

Forgot to say: the performance improvement doesn't seem to come from where 
Joydeep expected. Most of the improvement seems to come from not converting the 
second parameter when the return value can be determined from the first 
parameter alone, which we can't do in old UDF functions behind the UDFBridge 
wrapper.
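
To illustrate the difference described above, here is a minimal sketch in plain 
Java rather than actual Hive code. Supplier<String> is used as a stand-in for a 
deferred argument whose get() performs the conversion, and LazyEqualsSketch, 
eagerEquals and lazyEquals are hypothetical names. The null check is only an 
assumed example of a case where the first parameter alone decides the result.

```java
import java.util.function.Supplier;

final class LazyEqualsSketch {
    /** Eager style: both arguments are converted before any decision is made. */
    static Boolean eagerEquals(Supplier<String> a, Supplier<String> b) {
        String left = a.get();
        String right = b.get(); // conversion cost is paid even when left is null
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    /** Deferred style: the second argument is converted only if it is needed. */
    static Boolean lazyEquals(Supplier<String> a, Supplier<String> b) {
        String left = a.get();
        if (left == null) {
            return null; // result is already known; b.get() is never called
        }
        String right = b.get();
        if (right == null) {
            return null;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        int[] conversions = {0}; // counts how often the second argument is converted
        Supplier<String> nullColumn = () -> null;
        Supplier<String> second = () -> {
            conversions[0]++;
            return "5015";
        };

        lazyEquals(nullColumn, second);
        System.out.println(conversions[0]); // prints 0

        eagerEquals(nullColumn, second);
        System.out.println(conversions[0]); // prints 1
    }
}
```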
