[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916284#action_12916284 ]

Siying Dong commented on HIVE-1638:
-----------------------------------

Ning, which object are you referring to? The DeferredObject parameters passed in seem to be different ones, and calling get() on them returns an org.apache.hadoop.io.Text, at which point it already seems to be too late.

> convert commonly used udfs to generic udfs
> ------------------------------------------
>
>                 Key: HIVE-1638
>                 URL: https://issues.apache.org/jira/browse/HIVE-1638
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Siying Dong
>         Attachments: HIVE-1638.1.patch
>
>
> Copying a mail from Joy:
> i did a little bit of profiling of a simple hive group by query today. i was
> surprised to see that one of the most expensive functions were in converting
> the equals udf (i had some simple string filters) to generic udfs.
> (primitiveobjectinspectorconverter.textconverter)
> am i correct in thinking that the fix is to simply port some of the most
> popular udfs (string equality/comparison etc.) to generic udfs?
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916278#action_12916278 ]

Siying Dong commented on HIVE-1638:
-----------------------------------

Sorry, correction: the cluster has around 30 machines, each with 8 cores.
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916274#action_12916274 ]

Siying Dong commented on HIVE-1638:
-----------------------------------

Vladimir, it is a small test cluster with around 8 machines, each with 8 cores.

The query latency is not hours. The input split size determines the number of mappers and influences query latency, and it is configurable through some parameters. With the default size in our production cluster, those sample queries usually take minutes to finish. For the queries above, I set the split size so that each mapper usually takes between 10 and 20 minutes, which is roughly the ideal size for non-interactive queries. It doesn't represent any production query.
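To make the split-size remark concrete, here is a rough sketch of how the number of mappers scales with the split size. It assumes the simplest possible model, one mapper per split and splits computed over the total input size; the real computation depends on the input format and works per file, so treat this only as an estimate. The class and method names are hypothetical and do not come from Hive or Hadoop.

```java
// Hypothetical helper, not part of Hive or Hadoop: estimates the mapper count
// under the simplifying assumption of one mapper per fixed-size input split.
final class SplitEstimate {
    private SplitEstimate() {}

    // Returns ceil(totalInputBytes / splitSizeBytes).
    static long estimateMappers(long totalInputBytes, long splitSizeBytes) {
        if (totalInputBytes < 0 || splitSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (totalInputBytes + splitSizeBytes - 1) / splitSizeBytes;
    }
}
```

Under this model, a larger split size means fewer mappers, each running longer, which is the knob described above for getting mappers into the 10-20 minute range.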
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916256#action_12916256 ]

Ning Zhang commented on HIVE-1638:
----------------------------------

Siying, great work! Could you also add an optimization for the case where a parameter is a constant (e.g., the 2nd parameter in f_c='5015')? The objectInspector doesn't carry the information of whether an input parameter is constant or not, but I think if you check in evaluate() whether the parameter is the same *object* on the 1st and 2nd rows, you can conclude that the parameter is a constant. This can save a lot of object construction.
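A minimal sketch of the identity check suggested above, written without any Hive classes. The class name, the counter, and the use of toString() as the "conversion" are all hypothetical stand-ins. The sketch only shows the caching idea: if the raw argument is the very same object as on the previous row, reuse the previously converted value. It is only safe under the assumption that the framework does not hand the same mutable object to every row with different contents; the thread does not confirm that assumption.

```java
// Hypothetical stand-in for the per-argument state a generic UDF might keep.
// It caches the converted value while the raw argument stays the same object.
final class ConstantArgCache {
    private Object lastRaw;        // raw argument object seen on the previous row
    private String lastConverted;  // conversion result cached for that object
    int conversions = 0;           // number of real conversions, for illustration

    String convert(Object raw) {
        // Reference comparison (==), not equals(): same object as last row?
        if (raw != null && raw == lastRaw) {
            return lastConverted;
        }
        conversions++;
        lastRaw = raw;
        lastConverted = (raw == null) ? null : raw.toString(); // stand-in conversion
        return lastConverted;
    }
}
```

With a constant argument passed as the same object on every row, only the first call converts; a new object with equal contents still triggers a fresh conversion, since the check is by identity.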
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916254#action_12916254 ]

Vladimir Rodionov commented on HIVE-1638:
-----------------------------------------

OK, thanks for the clarification. A couple more questions:
1. How large is your cluster?
2. Is a latency of hours typical for a simple query in Hive?
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916244#action_12916244 ]

Siying Dong commented on HIVE-1638:
-----------------------------------

I should have made it clearer.

"CPU Cycle (MapRed Framework)" is the "CPU_MILLISECONDS" value reported in the "Map-Reduce Framework" section of the job page.

"Total CPU Time (hmon)" is not the query execution time. It is the average time the query takes on the cluster: it aggregates the resource usage of this job on each machine and normalizes it by the total resource of the cluster (for this metric specifically, that should be the number of cores). It also includes the reducers' costs. Its trend should be very similar to the first one; it is just another source.
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916231#action_12916231 ]

Vladimir Rodionov commented on HIVE-1638:
-----------------------------------------

Correct me if I am wrong: do the new queries take 23, 42, 93, and 79 seconds? What is the wall-clock execution time for these queries?
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916208#action_12916208 ]

Namit Jain commented on HIVE-1638:
----------------------------------

great results - I will review the patch
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916058#action_12916058 ]

Siying Dong commented on HIVE-1638:
-----------------------------------

Forgot to say: the performance improvement doesn't seem to come from where Joydeep expected. Most of the improvement seems to come from not converting the second parameter when the return value can be determined from the first parameter alone, which we can't do in old UDF functions with the UDFBridge wrapper.
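The difference described above can be sketched without any Hive classes. In the sketch below, java.util.function.Supplier stands in for a deferred argument, toString() stands in for the expensive conversion, and the case "first parameter is null, so the result is null" is assumed as the situation where the first parameter alone decides the result. All names are hypothetical; this is an illustration of the idea, not the code from the patch.

```java
import java.util.function.Supplier;

// Hypothetical illustration: eager (bridge-style) vs. deferred argument handling.
final class LazyEquals {
    static int conversions = 0; // counts stand-in conversions, for illustration

    // Stand-in for an expensive conversion of a raw argument.
    static String convert(Object raw) {
        conversions++;
        return (raw == null) ? null : raw.toString();
    }

    // Eager style: both arguments are converted before the function logic runs.
    static Boolean eagerEquals(Object a, Object b) {
        String left = convert(a);
        String right = convert(b);
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    // Deferred style: the second argument is fetched and converted only if needed.
    static Boolean lazyEquals(Supplier<Object> a, Supplier<Object> b) {
        Object rawLeft = a.get();
        if (rawLeft == null) {
            return null; // result decided by the first argument alone
        }
        String left = convert(rawLeft);
        Object rawRight = b.get();
        if (rawRight == null) {
            return null;
        }
        return left.equals(convert(rawRight));
    }
}
```

For a row whose first argument is null, eagerEquals still pays for two conversions, while lazyEquals pays for none and never touches the second argument.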