[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-664:
--------------------------------
    Component/s: (was: Query Processor)
                 UDF
         Labels: optimization (was: )

> optimize UDF split
> ------------------
>
>                 Key: HIVE-664
>                 URL: https://issues.apache.org/jira/browse/HIVE-664
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Namit Jain
>              Labels: optimization
>
> Min Zhou added a comment - 21/Jul/09 07:34 AM
> It's very useful for us.
> Some comments:
> 1. Can you implement it directly with Text? Avoiding string decoding and
> encoding would be faster. Of course, that trick may lead to another problem,
> as String.split uses a regular expression for splitting.
> 2. getDisplayString() always returns a string in lowercase.
>
> Namit Jain added a comment - 21/Jul/09 09:22 AM
> Committed. Thanks Emil
>
> Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM
> There are some easy (compromise) ways to optimize split:
> 1. Check if the regex argument actually contains some "regex specific
> characters" and, if it doesn't, do a straightforward split without converting
> to strings.
> 2. Assume some default value for the second argument (for example,
> split(str) to be equivalent to split(str, ' ')) and optimize for this value.
> 3. Have two separate split functions: one that does regex and one that
> splits around plain text.
> I think that 1 is a good choice and can be done rather quickly.
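
Option 1 from Emil's comment (check the delimiter for regex metacharacters and, when there are none, split the raw bytes directly) might look roughly like the sketch below. It is an illustration only, not the patch committed for this issue. The class SplitSketch and all of its method names are hypothetical, and plain byte arrays stand in for the backing buffer of a Hadoop Text (what Text.getBytes() and Text.getLength() would supply). It also assumes that trailing empty fields should be kept, as String.split(regex, -1) does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive.
class SplitSketch {
    // Characters with special meaning in java.util.regex patterns.
    private static final String REGEX_META = "\\^$.|?*+()[]{}";

    // True if the delimiter can be matched literally, byte for byte.
    static boolean isPlainDelimiter(String delim) {
        if (delim.isEmpty()) {
            return false; // an empty regex splits between characters; leave it to the regex path
        }
        for (int i = 0; i < delim.length(); i++) {
            if (REGEX_META.indexOf(delim.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Splits the first `length` bytes of `data` around a literal delimiter.
    // Both arguments are assumed to be valid UTF-8, so a match can only
    // start on a character boundary.
    static List<byte[]> splitBytes(byte[] data, int length, byte[] delim) {
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i + delim.length <= length) {
            if (matchesAt(data, i, delim)) {
                parts.add(Arrays.copyOfRange(data, start, i));
                i += delim.length;
                start = i;
            } else {
                i++;
            }
        }
        parts.add(Arrays.copyOfRange(data, start, length)); // last field, possibly empty
        return parts;
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] delim) {
        for (int j = 0; j < delim.length; j++) {
            if (data[offset + j] != delim[j]) {
                return false;
            }
        }
        return true;
    }

    // String-based entry point used only to compare the two paths.
    static List<String> split(String input, String delim) {
        if (!isPlainDelimiter(delim)) {
            return Arrays.asList(input.split(delim, -1)); // regex fallback
        }
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (byte[] part : splitBytes(data, data.length, delim.getBytes(StandardCharsets.UTF_8))) {
            out.add(new String(part, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

The split(String, String) wrapper exists only so the fast path and the regex fallback can be compared on the same input. A real UDF would call splitBytes on the Text buffer directly, since avoiding the String round trip is the point of the optimization Min Zhou asked for.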
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira