[jira] [Commented] (HIVE-2891) TextConverter for UDF's is inefficient if the input object is already Text or Lazy
[ https://issues.apache.org/jira/browse/HIVE-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547860#comment-13547860 ]

Hudson commented on HIVE-2891:
------------------------------

Integrated in Hive-trunk-hadoop2 #54 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/54/])
HIVE-2891: TextConverter for UDF's is inefficient if the input object is already Text or Lazy (Cliff Engle via Ashutosh Chauhan) (Revision 1306096)

Result = ABORTED
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1306096
Files :
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java

TextConverter for UDF's is inefficient if the input object is already Text or Lazy
----------------------------------------------------------------------------------

Key: HIVE-2891
URL: https://issues.apache.org/jira/browse/HIVE-2891
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 0.7.0, 0.7.1, 0.8.1
Reporter: Cliff Engle
Assignee: Cliff Engle
Priority: Minor
Fix For: 0.9.0
Attachments: HIVE-2891.1.patch.txt, HIVE-2891.2.patch.txt

The TextConverter in PrimitiveObjectInspectorConverter.java is very inefficient if the input object is already Text or Lazy. Because it calls getPrimitiveJavaObject, each Text is decoded into a String and then re-encoded into Text. The solution is to check whether preferWritable() is true and, if so, call getPrimitiveWritable(input) so the bytes are used directly.

To test performance, I ran the Grep query from https://issues.apache.org/jira/browse/HIVE-396 on a cluster of 3 EC2 large nodes (2 slaves, 1 master) on 6GB of data. The job used 21 map tasks. With the current 0.8.1 version, it took 81 seconds; after patching, it took 66 seconds, a reduction of 15 seconds (roughly 18.5%). I will attach a patch and test cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
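The change described in HIVE-2891 can be illustrated with a small, self-contained Java sketch. It is not the committed patch: Utf8Text, StringInspector, TextBackedInspector, JavaStringInspector and TextConverter are hypothetical, simplified stand-ins for Hadoop's Text and Hive's string object inspectors and converter, written so the example runs without Hadoop or Hive on the classpath. Only the idea of branching on preferWritable() and taking the writable value directly, instead of round-tripping through getPrimitiveJavaObject, comes from the issue. A decode counter shows that the fast path never materializes a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for Hadoop's Text: a mutable holder of UTF-8 bytes.
final class Utf8Text {
  private byte[] bytes = new byte[0];
  private int length = 0;

  // Copy another value's raw UTF-8 bytes; no String is created.
  void set(Utf8Text other) {
    byte[] copy = Arrays.copyOf(other.bytes, other.length);
    bytes = copy;
    length = copy.length;
  }

  // Encode a Java String into UTF-8 bytes.
  void set(String s) {
    bytes = s.getBytes(StandardCharsets.UTF_8);
    length = bytes.length;
  }

  @Override
  public String toString() {
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }
}

// Hypothetical, simplified view of a string object inspector.
interface StringInspector {
  boolean preferWritable();
  String getPrimitiveJavaObject(Object o);
  Utf8Text getPrimitiveWritable(Object o);
}

// Inspector for values already stored as UTF-8 bytes (like Text or a lazy string).
final class TextBackedInspector implements StringInspector {
  int decodes = 0; // counts String materializations, for illustration only

  public boolean preferWritable() { return true; }

  public String getPrimitiveJavaObject(Object o) {
    decodes++;
    return o.toString(); // decodes UTF-8 bytes into a String
  }

  public Utf8Text getPrimitiveWritable(Object o) { return (Utf8Text) o; }
}

// Inspector for values stored as java.lang.String.
final class JavaStringInspector implements StringInspector {
  public boolean preferWritable() { return false; }

  public String getPrimitiveJavaObject(Object o) { return (String) o; }

  public Utf8Text getPrimitiveWritable(Object o) {
    Utf8Text t = new Utf8Text();
    t.set((String) o);
    return t;
  }
}

// Sketch of a converter that avoids the String round trip when possible.
final class TextConverter {
  private final StringInspector inputOI;
  private final Utf8Text out = new Utf8Text(); // reused output object

  TextConverter(StringInspector inputOI) { this.inputOI = inputOI; }

  Utf8Text convert(Object input) {
    if (input == null) {
      return null;
    }
    if (inputOI.preferWritable()) {
      // Fast path: copy the UTF-8 bytes directly.
      out.set(inputOI.getPrimitiveWritable(input));
    } else {
      // Slow path: obtain a String, then encode it to UTF-8.
      out.set(inputOI.getPrimitiveJavaObject(input));
    }
    return out;
  }
}

public class TextConverterDemo {
  public static void main(String[] args) {
    TextBackedInspector oi = new TextBackedInspector();
    Utf8Text src = new Utf8Text();
    src.set("caf\u00e9");
    Utf8Text converted = new TextConverter(oi).convert(src);
    System.out.println(converted + " decodes=" + oi.decodes);
  }
}
```

The sketch reuses one output object per converter, a common pattern in writable-based code to avoid a per-row allocation; a caller that keeps a result across calls would need to copy it first.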
[jira] [Commented] (HIVE-2891) TextConverter for UDF's is inefficient if the input object is already Text or Lazy
[ https://issues.apache.org/jira/browse/HIVE-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240300#comment-13240300 ]

Hudson commented on HIVE-2891:
------------------------------

Integrated in Hive-trunk-h0.21 #1336 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1336/])
HIVE-2891: TextConverter for UDF's is inefficient if the input object is already Text or Lazy (Cliff Engle via Ashutosh Chauhan) (Revision 1306096)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1306096
Files :
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java