[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794263#action_12794263 ]
Edward Capriolo commented on HIVE-417:
--------------------------------------

I am currently benchmarking an 11-node Hive cluster against a 16 TB MySQL system (4x quad core, 32 GB RAM, MySQL 5.1 with partitioning). Hive destroys MySQL on any query like the following (date_id is my partition column):

{noformat}
set mapred.map.tasks=34;
set mapred.reduce.tasks=11;
FROM pageviews
insert overwrite directory '/user/ecapriolo/hivetest4'
select sitename_id, user_id, count(user_id)
WHERE date_id=20091250
group by sitename_id,user_id

12098855 Rows loaded to /user/ecapriolo/hivetest4
OK
Time taken: 185.528 seconds
{noformat}

The same query can take over 3000 seconds on MySQL, because these large summary queries are always written to a temp table and the writes then bottleneck your read queries.

However, if MySQL has an index (and the index is in memory, which is hard in a warehouse) on some other value in the where clause, like:

{noformat}
select sitename_id, user_id, count(user_id)
WHERE date_id=20091250 and sitename_id=400
group by sitename_id,user_id
{noformat}

MySQL gets a relative performance speed-up, while Hive ends up scanning the entire table.

I agree with dhruba,
>>This sounds really awesome! Make hadoop-hive suitable for things other than
>>brute force table-scans!

If we had indexes helping to stop some brute force scans, that would open up other doors to what Hive could do.

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.