[jira] [Commented] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585084#comment-13585084 ]

Phabricator commented on HIVE-948:
----------------------------------

navis has commented on the revision "HIVE-948 [jira] more query plan optimization rules".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java:93 ?? The method was added by HIVE-1750, which was committed by you.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java:80 Will be fixed.

REVISION DETAIL
  https://reviews.facebook.net/D8463

To: JIRA, ashutoshc, navis
Cc: njain

more query plan optimization rules
----------------------------------

Key: HIVE-948
URL: https://issues.apache.org/jira/browse/HIVE-948
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch, HIVE-948.testresult_only.txt

Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input, etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. It seems they are low-hanging fruit that we should pick first.

BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase, since some redundant operators are added to facilitate physical plan generation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585085#comment-13585085 ]

Navis commented on HIVE-948:
----------------------------

I fixed the auto_smb_mapjoin_14 failure earlier, but that fix is missing from the current source code. Sorry. Running the tests now.

more query plan optimization rules
----------------------------------

Key: HIVE-948
URL: https://issues.apache.org/jira/browse/HIVE-948
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch, HIVE-948.testresult_only.txt

Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input, etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. It seems they are low-hanging fruit that we should pick first.

BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase, since some redundant operators are added to facilitate physical plan generation.
[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive
[ https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585109#comment-13585109 ]

Hudson commented on HIVE-3672:
------------------------------

Integrated in Hive-trunk-hadoop2 #135 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/135/])
HIVE-3672 Support altering partition column type in Hive (Jingwei Lu via namit) (Revision 1449109)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1449109
Files :
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java
* /hive/trunk/ql/src/test/queries/clientnegative/alter_partition_coltype_2columns.q
* /hive/trunk/ql/src/test/queries/clientnegative/alter_partition_coltype_invalidcolname.q
* /hive/trunk/ql/src/test/queries/clientnegative/alter_partition_coltype_invalidtype.q
* /hive/trunk/ql/src/test/queries/clientpositive/alter_partition_coltype.q
* /hive/trunk/ql/src/test/results/clientnegative/alter_partition_coltype_2columns.q.out
* /hive/trunk/ql/src/test/results/clientnegative/alter_partition_coltype_invalidcolname.q.out
* /hive/trunk/ql/src/test/results/clientnegative/alter_partition_coltype_invalidtype.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out

Support altering partition column type in Hive
----------------------------------------------

Key: HIVE-3672
URL: https://issues.apache.org/jira/browse/HIVE-3672
Project: Hive
Issue Type: Improvement
Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
Labels: features
Fix For: 0.11.0
Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt, HIVE-3672.8.patch.txt, HIVE-3672.9.patch.txt
Original Estimate: 72h
Remaining Estimate: 72h

Currently, Hive does not allow altering partition column types. As we've discouraged users from using non-string partition column types, this presents a problem for users who want to change their partition columns to be strings: they have to rename their table, create a new table, and copy all the data over.

To support this via the CLI, add a command like
ALTER TABLE table_name PARTITION COLUMN (column_name new type);
[jira] [Commented] (HIVE-4035) Column Pruner for PTF Op
[ https://issues.apache.org/jira/browse/HIVE-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585156#comment-13585156 ]

Ashutosh Chauhan commented on HIVE-4035:
----------------------------------------

+1

Column Pruner for PTF Op
------------------------

Key: HIVE-4035
URL: https://issues.apache.org/jira/browse/HIVE-4035
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Prajakta Kalmegh
Attachments: HIVE-4035.1.patch.txt, HIVE-4035.2.patch.txt

for a PTFOp for Windowing; should prune columns based on its children. Virtual Columns should only be carried forward if needed.
[jira] [Resolved] (HIVE-4035) Column Pruner for PTF Op
[ https://issues.apache.org/jira/browse/HIVE-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-4035.
-----------------------------------

Resolution: Fixed

Committed to branch. Thanks, Prajakta!

Column Pruner for PTF Op
------------------------

Key: HIVE-4035
URL: https://issues.apache.org/jira/browse/HIVE-4035
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Prajakta Kalmegh
Attachments: HIVE-4035.1.patch.txt, HIVE-4035.2.patch.txt

for a PTFOp for Windowing; should prune columns based on its children. Virtual Columns should only be carried forward if needed.
[jira] [Resolved] (HIVE-4036) remove use of FunctionRegistry during PTF Op initialization
[ https://issues.apache.org/jira/browse/HIVE-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-4036.
-----------------------------------

Resolution: Fixed

Committed to trunk. Thanks, Harish!

remove use of FunctionRegistry during PTF Op initialization
-----------------------------------------------------------

Key: HIVE-4036
URL: https://issues.apache.org/jira/browse/HIVE-4036
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
Attachments: HIVE-4036.1.patch.txt

current way of initializing WindowFnDefs breaks down for dynamic UDAFs
[jira] [Commented] (HIVE-4044) Add URL type
[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585177#comment-13585177 ]

Ashutosh Chauhan commented on HIVE-4044:
----------------------------------------

URL is an unusual type to add to a query processing engine. Can you spec out the motivation for adding this type (e.g. you can always use the string type for URLs)? I am assuming from your description above that it might improve storage efficiency through a better encoding of URLs. But I see the following comment in LazyBinaryURL:

/**
 * The serialization of LazyBinaryURL is the same as the binary representation
 * of the underlying string
 */

and URLWritable also has:

{code}
@Override
public void write(DataOutput out) throws IOException {
  if (url != null) {
    byte[] bytes = url.toString().getBytes();
    WritableUtils.writeVInt(out, bytes.length);
    out.write(bytes);
  } else {
    WritableUtils.writeVInt(out, 0);
  }
}
{code}

So it seems you are storing URLs as strings anyway, both for the intermediate data of MR and for the output of the query. I don't see how this results in better storage efficiency.

Add URL type
------------

Key: HIVE-4044
URL: https://issues.apache.org/jira/browse/HIVE-4044
Project: Hive
Issue Type: Improvement
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch

Having a separate type for URLs would enable improvements in storage efficiency based on breaking up a URL into its components. The new type will be named URL and made a non-reserved keyword (see HIVE-701).
[jira] [Commented] (HIVE-4042) ignore mapjoin hint
[ https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585195#comment-13585195 ]

Ashutosh Chauhan commented on HIVE-4042:
----------------------------------------

What's the point of adding *yet another config and defaulting it to false*? As I see it, the whole point of this patch is not to fail production queries that contain hints. With the default value being false, queries will still fail. I don't see any merit in this config at all. Why can't we *always* ignore the map-join hint? For the case Kevin brought up, you have already added logging and rely on users to rewrite their queries using that logging info.

ignore mapjoin hint
-------------------

Key: HIVE-4042
URL: https://issues.apache.org/jira/browse/HIVE-4042
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.4042.1.patch, hive.4042.2.patch, hive.4042.3.patch

After HIVE-3784, in a production environment, it can become difficult to deploy since a lot of production queries can break.
Apache Jenkins job repeatedly failing
Hi,

Whilst looking at another Apache project I noticed that the Jenkins job Hive-trunk-hadoop2 [1] has been failing for several weeks, sometimes taking 13 hours to do so.

This doesn't seem to be a good use of the very congested Jenkins servers. Presumably the Hive developers are getting little value from a build that always fails.

Is this being looked at? If there are no plans to fix this job soon then maybe it should be disabled.

Thanks,
Phil

[1] https://builds.apache.org/job/Hive-trunk-hadoop2/
Re: HIVE-4053 | Review request
Krishna,
Can you please post a patch on the JIRA and post a review on reviewboard? You should also consider adding some unit tests. If you need help with any of this, please let us know.

I will post this on JIRA as well for completeness.

Mark

On Fri, Feb 22, 2013 at 9:48 PM, Krishna research...@gmail.com wrote:

Hi,

I've implemented the 'Refined Soundex' algorithm using a GenericUDF and would like to share it for review by experts, as I'm a newbie.

Change Details:
A new Java class is created: GenericUDFRefinedSoundex.java
Add an entry to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class);

Both files are attached to the email.

I'm planning to implement other phonetic algorithms and submit them all as a single patch. I understand there are many other steps that I need to finish before a patch is ready, but for now, if you could review the attached code and provide feedback, that would be great.

Here are the details of the Refined Soundex algorithm:
First letter is stored
Subsequent letters are replaced by numbers as defined below-
* B, P = 1
* F, V = 2
* C, K, S = 3
* G, J = 4
* Q, X, Z = 5
* D, T = 6
* L = 7
* M, N = 8
* R = 9
* Other letters = 0
Consecutive letters belonging to the same group are replaced by one letter

Example:
SELECT soundex_ref('Carren') FROM src LIMIT 1;
C30908

Thanks,
Krishna
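For readers following the thread, the mapping Krishna describes can be sketched in a few lines of Java. This is not the attached GenericUDFRefinedSoundex.java (which is not reproduced in this thread); the class name RefinedSoundexSketch and the method encode are hypothetical. The sketch assumes, based on the 'Carren' example, that the first letter is stored and then also encoded like every other letter, and that non-letter characters are simply skipped.

```java
// A minimal sketch of the Refined Soundex mapping quoted above.
// RefinedSoundexSketch and encode() are hypothetical names, not Hive APIs.
final class RefinedSoundexSketch {

    // Code for each letter A..Z, built from the table in the message:
    // B,P=1  F,V=2  C,K,S=3  G,J=4  Q,X,Z=5  D,T=6  L=7  M,N=8  R=9, others=0.
    //                                   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    private static final String CODES = "01360240043788015936020505";

    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        char last = '\0'; // code emitted for the previous letter, if any
        for (char raw : input.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore anything that is not A-Z
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is stored as-is
            }
            if (code != last) {
                out.append(code); // collapse runs of letters in the same group
            }
            last = code;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Carren")); // prints C30908, matching the example
    }
}
```

Tracing 'Carren': C is stored and coded 3, a gives 0, the two r's collapse into a single 9, e gives 0, and n gives 8, which yields C30908.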
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585226#comment-13585226 ]

Mark Grover commented on HIVE-4053:
-----------------------------------

Can you please post a patch on the JIRA and post a review on reviewboard? You should also consider adding some unit tests. If you need help with any of this, please let us know.

Add support for phonetic algorithms in Hive
-------------------------------------------

Key: HIVE-4053
URL: https://issues.apache.org/jira/browse/HIVE-4053
Project: Hive
Issue Type: New Feature
Components: UDF
Reporter: Krishna
Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java

Following phonetic algorithms should be considered, which are very useful in search:
Soundex
Refined Soundex
Daitch–Mokotoff Soundex
Metaphone and Double Metaphone
New York State Identification and Intelligence System (NYSIIS)
Caverphone
[jira] [Updated] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-948:
----------------------------

Attachment: HIVE-948.D8463.5.patch

navis updated the revision "HIVE-948 [jira] more query plan optimization rules".

1. Fixed the case where the parent SEL has more than two child operators
2. Avoid merging SEL-SEL if any expression of the child references a parent column which is the result of a function

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8463

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8463?vs=28257&id=28527#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java
  ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q
  ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out

To: JIRA, ashutoshc, navis
Cc: njain

more query plan optimization rules
----------------------------------

Key: HIVE-948
URL: https://issues.apache.org/jira/browse/HIVE-948
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch, HIVE-948.D8463.5.patch, HIVE-948.testresult_only.txt

Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input, etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. It seems they are low-hanging fruit that we should pick first.

BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase, since some redundant operators are added to facilitate physical plan generation.
[jira] [Commented] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585269#comment-13585269 ]

Navis commented on HIVE-948:
----------------------------

While generating the result file for udf_reflect2, I realized that it's better not to merge two SEL operators if the child SEL references, twice or more, a column of the parent SEL which is the result of a function.

For example,

select reflect2(ts, "getYear"), reflect2(ts, "getMonth") from (select cast(key as timestamp) as ts from tbl) a;

more query plan optimization rules
----------------------------------

Key: HIVE-948
URL: https://issues.apache.org/jira/browse/HIVE-948
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch, HIVE-948.D8463.5.patch, HIVE-948.testresult_only.txt

Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input, etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. It seems they are low-hanging fruit that we should pick first.

BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase, since some redundant operators are added to facilitate physical plan generation.
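The condition in Navis's comment can be illustrated with a toy model. Presumably the concern is that merging would inline the parent's function call into every reference, so it would run once per reference instead of once per row. The sketch below is not the logic in NonBlockingOpDeDupProc; SelMergeSketch and canMerge are hypothetical names, and the plan is reduced to a list of parent column names referenced by the child SEL plus the set of parent columns that are function results. It models the "twice or more" wording of the comment rather than the stricter wording of the revision summary.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Hive's optimizer classes.
final class SelMergeSketch {

    /**
     * @param childRefs       parent column names referenced by the child SEL,
     *                        one entry per reference (duplicates allowed)
     * @param functionColumns parent SEL output columns computed by a function
     * @return false if any function-valued parent column is referenced more
     *         than once, i.e. merging would duplicate the function call
     */
    static boolean canMerge(List<String> childRefs, Set<String> functionColumns) {
        Map<String, Integer> counts = new HashMap<>();
        for (String col : childRefs) {
            if (!functionColumns.contains(col)) {
                continue; // plain column references are cheap to repeat
            }
            if (counts.merge(col, 1, Integer::sum) > 1) {
                return false; // second reference to a function result
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Child: reflect2(ts, "getYear"), reflect2(ts, "getMonth")
        // Parent: cast(key as timestamp) as ts
        System.out.println(canMerge(Arrays.asList("ts", "ts"),
                Collections.singleton("ts"))); // prints false
        // A single reference to ts would still be merged.
        System.out.println(canMerge(Arrays.asList("ts"),
                Collections.singleton("ts"))); // prints true
    }
}
```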
[jira] [Created] (HIVE-4068) Size of aggregation buffer which uses non-primitive type is not estimated correctly
Navis created HIVE-4068:
------------------------

Summary: Size of aggregation buffer which uses non-primitive type is not estimated correctly
Key: HIVE-4068
URL: https://issues.apache.org/jira/browse/HIVE-4068
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor

Currently, Hive assumes that an aggregation buffer which holds a map occupies just 256 bytes (fixed). If the buffer is actually bigger than that, an OutOfMemoryError can be thrown (especially for 1k buffer).

Workaround: set hive.map.aggr.hash.percentmemory to a value smaller than the default (0.5)
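To see why an underestimate matters and why the workaround helps, here is a rough back-of-the-envelope model in Java. It is a simplified sketch, not the code in GroupByOperator: it assumes the hash table is flushed once the number of entries times the estimated entry size reaches heap size times hive.map.aggr.hash.percentmemory. The 256-byte estimate and the 0.5 default come from the issue text; the 1 GiB heap, the 1024-byte real entry size, the 0.125 setting, and all class and method names are assumptions chosen for illustration.

```java
// Simplified model with hypothetical names; the real flush logic may differ.
final class AggrBufferEstimateSketch {

    // Number of hash-table entries held before a flush is triggered.
    static long entriesBeforeFlush(long heapBytes, double percentMemory,
                                   long estimatedEntryBytes) {
        return (long) (heapBytes * percentMemory) / estimatedEntryBytes;
    }

    // Memory those entries really occupy if each one is actualEntryBytes big.
    static long actualBytes(long entries, long actualEntryBytes) {
        return entries * actualEntryBytes;
    }

    public static void main(String[] args) {
        long heap = 1L << 30;   // assumed 1 GiB task heap
        long estimated = 256;   // fixed estimate from the issue text
        long actual = 1024;     // assumed real size of one buffer

        long atDefault = entriesBeforeFlush(heap, 0.5, estimated);
        // 2097152 entries * 1024 bytes = 2 GiB, twice the whole heap.
        System.out.println(actualBytes(atDefault, actual));

        long atLower = entriesBeforeFlush(heap, 0.125, estimated);
        // 524288 entries * 1024 bytes = 0.5 GiB, which fits in the heap.
        System.out.println(actualBytes(atLower, actual));
    }
}
```

Under these assumptions, lowering the percentage compensates for the factor-of-four underestimate, which is the effect the workaround relies on.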
[jira] [Updated] (HIVE-4068) Size of aggregation buffer which uses non-primitive type is not estimated correctly
[ https://issues.apache.org/jira/browse/HIVE-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4068:
-----------------------------

Attachment: HIVE-4068.D8859.1.patch

navis requested code review of "HIVE-4068 [jira] Size of aggregation buffer which uses non-primitive type is not estimated correctly".

Reviewers: JIRA

HIVE-4068 Size of aggregation buffer which uses non-primitive type is not estimated correctly

Currently, Hive assumes that an aggregation buffer which holds a map occupies just 256 bytes (fixed). If the buffer is actually bigger than that, an OutOfMemoryError can be thrown (especially for 1k buffer).

Workaround: set hive.map.aggr.hash.percentmemory to a value smaller than the default (0.5)

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8859

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/21519/

To: JIRA, navis

Size of aggregation buffer which uses non-primitive type is not estimated correctly
-----------------------------------------------------------------------------------

Key: HIVE-4068
URL: https://issues.apache.org/jira/browse/HIVE-4068
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-4068.D8859.1.patch

Currently, Hive assumes that an aggregation buffer which holds a map occupies just 256 bytes (fixed). If the buffer is actually bigger than that, an OutOfMemoryError can be thrown (especially for 1k buffer).

Workaround: set hive.map.aggr.hash.percentmemory to a value smaller than the default (0.5)