[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file
[ https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838763#action_12838763 ] Zheng Shao commented on HIVE-1197: -- Can you explain what "a mapper spans a file" means? create a new input format where a mapper spans a file - Key: HIVE-1197 URL: https://issues.apache.org/jira/browse/HIVE-1197 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Siying Dong Fix For: 0.6.0 This will be needed for Sort merge joins. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1202) Unknown exception : null while join
[ https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mafish updated HIVE-1202: - Attachment: HIVE-1202.branch-0.4.1.patch The attachment is the patch for this bug. It limits Hive to performing pruning only when the current query block contains only one table. The fix is based on my own understanding, and I'm not sure it matches the original author's intent. The author of the pruner is Yongqiang, right? Please comment on it and evaluate it. Unknown exception : null while join - Key: HIVE-1202 URL: https://issues.apache.org/jira/browse/HIVE-1202 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.4.1 Environment: hive-0.4.1 hadoop 0.19.1 Reporter: Mafish Fix For: 0.4.1 Attachments: HIVE-1202.branch-0.4.1.patch Hive throws Unknown exception : null with query: select * from ( select name from classes ) a join classes b where a.name b.number After tracing the code, I found that this bug occurs under the following conditions: 1. It is a join operation. 2. At least one of the join sources is a physical table (the right side in the above case). 3. There is a WHERE clause whose condition(s) include columns from both sides of the join (a.name and b.number in this case).
[jira] Created: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.5.0, 0.4.1, 0.4.0 Reporter: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 To fix this, the cause simply needs to be passed as a second parameter to the IOException constructor. Patches for 0.4, 0.5 and trunk are available.
[jira] Updated: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
[ https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Klimontovich updated HIVE-1203: Attachment: 0.4.patch 0.5.patch HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0, 0.4.1, 0.5.0 Reporter: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 Attachments: 0.4.patch, 0.5.patch To fix this, the cause simply needs to be passed as a second parameter to the IOException constructor. Patches for 0.4, 0.5 and trunk are available.
[jira] Updated: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
[ https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Klimontovich updated HIVE-1203: Attachment: trunk.patch HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0, 0.4.1, 0.5.0 Reporter: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 Attachments: 0.4.patch, 0.5.patch, trunk.patch To fix this, the cause simply needs to be passed as a second parameter to the IOException constructor. Patches for 0.4, 0.5 and trunk are available.
[jira] Commented: (HIVE-1193) ensure sorting properties for a table
[ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838914#action_12838914 ] Edward Capriolo commented on HIVE-1193: --- Also, how can the optimizer take advantage of this? If we know the data is sorted, we could do some aggressive pruning (if we know offsets) and short-circuiting for some WHERE conditions. ensure sorting properties for a table - Key: HIVE-1193 URL: https://issues.apache.org/jira/browse/HIVE-1193 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.6.0 Attachments: hive.1193.1.patch If a table is sorted and data is being inserted into it, we currently don't make sure that the data is sorted. Enforcing this might be useful for some downstream operations. It cannot be made the default due to backward compatibility, but an option can be added for it.
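Editor's illustration of the two ideas in the comment above, pruning by offset and short-circuiting a WHERE condition on sorted data. This is a minimal sketch under stated assumptions, not Hive optimizer code: the class `SortedScan` and both method names are hypothetical, and the keys are assumed to be integers held in memory in ascending order.

```java
final class SortedScan {
    /**
     * Pruning by offset: returns the index of the first key >= target,
     * found by binary search. Rows before that index can be skipped.
     */
    static int firstOffsetAtLeast(int[] sortedKeys, int target) {
        int lo = 0;
        int hi = sortedKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Short-circuiting: counts rows satisfying key <= bound and stops at the
     * first key above the bound, since no later key can satisfy the condition.
     */
    static int countAtMost(int[] sortedKeys, int bound) {
        int count = 0;
        for (int key : sortedKeys) {
            if (key > bound) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 5, 6, 9, 16, 22, 30};
        System.out.println(firstOffsetAtLeast(keys, 9)); // 4
        System.out.println(countAtMost(keys, 6));        // 4
    }
}
```

Both shortcuts are only valid if the data really is sorted, which is why enforcement matters before an optimizer could rely on the declared sort order.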
[jira] Commented: (HIVE-1202) Unknown exception : null while join
[ https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838946#action_12838946 ] He Yongqiang commented on HIVE-1202: Actually, Hive does not support this kind of join. It only supports equi-joins. Please try something like this: select a.name, b.* from classes a join classes b on a.name = b.number where a.name b.number Unknown exception : null while join - Key: HIVE-1202 URL: https://issues.apache.org/jira/browse/HIVE-1202 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.4.1 Environment: hive-0.4.1 hadoop 0.19.1 Reporter: Mafish Fix For: 0.4.1 Attachments: HIVE-1202.branch-0.4.1.patch Hive throws Unknown exception : null with query: select * from ( select name from classes ) a join classes b where a.name b.number After tracing the code, I found that this bug occurs under the following conditions: 1. It is a join operation. 2. At least one of the join sources is a physical table (the right side in the above case). 3. There is a WHERE clause whose condition(s) include columns from both sides of the join (a.name and b.number in this case).
[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
[ https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838953#action_12838953 ] He Yongqiang commented on HIVE-1203: Vladimir, in this case, do we really need a stack trace? The failure is mostly caused by ClassNotFound etc. when creating an input format instance from the class name. Is there some error that can only be found from the stack trace? HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0, 0.4.1, 0.5.0 Reporter: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 Attachments: 0.4.patch, 0.5.patch, trunk.patch To fix this, the cause simply needs to be passed as a second parameter to the IOException constructor. Patches for 0.4, 0.5 and trunk are available.
[jira] Commented: (HIVE-1193) ensure sorting properties for a table
[ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838958#action_12838958 ] He Yongqiang commented on HIVE-1193: @Zheng, 1. How do we make sure that the data is bucketed / sorted? By adding an additional map-reduce job? Yes. 2. What if the user already specified CLUSTER BY key in his query? As in 1, a new job will be added to redistribute the data. If the user specifies a CLUSTER BY column different from the table's sort and bucket properties, maybe we should let it fail. But right now that CLUSTER BY is actually ignored. 3. Do we disable merging of small files when we do this? Yes. We should disable it when enforceBucketing or enforceSorting is enabled. ensure sorting properties for a table - Key: HIVE-1193 URL: https://issues.apache.org/jira/browse/HIVE-1193 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.6.0 Attachments: hive.1193.1.patch If a table is sorted and data is being inserted into it, we currently don't make sure that the data is sorted. Enforcing this might be useful for some downstream operations. It cannot be made the default due to backward compatibility, but an option can be added for it.
[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
[ https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838960#action_12838960 ] Vladimir Klimontovich commented on HIVE-1203: - I think so. An exception could be thrown not only from Class.forName, but also from Class.newInstance. Actually, that was my case: due to incorrect settings, the constructor of my InputFormat was throwing an exception (and this exception was being swallowed). Also, it's a general rule in most Java projects to add the original exception as the cause. More information in the stack trace never hurts :) (Although I'm not sure that Hive should follow this rule.) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0, 0.4.1, 0.5.0 Reporter: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 Attachments: 0.4.patch, 0.5.patch, trunk.patch To fix this, the cause simply needs to be passed as a second parameter to the IOException constructor. Patches for 0.4, 0.5 and trunk are available.
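Editor's illustration of the fix discussed in this thread. This is a minimal sketch, not the actual HiveInputFormat code or the attached patches: the class `CauseDemo`, the method `instantiate`, and the nested `BrokenFormat` class are hypothetical names. It only shows that passing the caught exception as the second argument of the standard `IOException(String, Throwable)` constructor keeps the original failure reachable through `getCause()`. The sketch uses `getDeclaredConstructor().newInstance()` rather than `Class.newInstance`, so a constructor failure arrives wrapped in an `InvocationTargetException`.

```java
import java.io.IOException;

final class CauseDemo {
    /** Hypothetical stand-in for an InputFormat whose constructor fails on bad settings. */
    static final class BrokenFormat {
        public BrokenFormat() {
            throw new IllegalStateException("incorrect settings");
        }
    }

    /** Creates an instance by class name; any failure is wrapped without losing its cause. */
    static Object instantiate(String className) throws IOException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            // The second argument preserves the original exception as the cause.
            throw new IOException("Cannot create an instance of " + className, e);
        }
    }

    public static void main(String[] args) {
        try {
            instantiate(BrokenFormat.class.getName());
        } catch (IOException e) {
            // The printed trace now contains "Caused by:" sections for the real failure.
            e.printStackTrace();
        }
    }
}
```

With only the message argument, the trace would end at the IOException, and a failing constructor could not be told apart from a missing class.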
[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file
[ https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838959#action_12838959 ] Namit Jain commented on HIVE-1197: -- Currently, the split that a mapper processes is determined by a variety of parameters, including the dfs block size, min split size etc. It might be useful to have an option for when the user wants a mapper to scan 1 file. This will be especially useful for sort-merge join. If the data is partitioned into various buckets, and each bucket is sorted, the sort-merge join can join the different buckets together. For example, consider the following scenario: table T1: sorted and bucketed by column 'key' into 1000 buckets table T2: sorted and bucketed by column 'key' into 1000 buckets and the query: select * from T1 join T2 on key mapjoin. Instead of joining the table T1 with T2, the 1000 buckets can be joined with each other individually. Since the data is sorted on the join key, sort-merge join can be used. Say the buckets are named: b0001, b0002 .. b1000 Say table T1 is the big table, and the buckets from T2 are being read as part of the mapper which is spawned to process T1. Under the current approach, it will be very difficult to perform outer joins. For example, suppose bucket b1 for T1 contains: 1 2 5 6 9 16 22 30 and the corresponding bucket for T2 contains: 2 4 8 If there are 2 mappers for bucket b1 for T1, processing 4 records each ((1,2,5,6) and (9,16,22,30) respectively), it will be very difficult to perform an outer join. The mapper will need to peek into the previous record and the next record respectively. Moreover, it will be very difficult to ensure that the result also has 1000 buckets. Another map-reduce job will be needed for that. This can be easily solved if we are guaranteed that the whole bucket (or the file corresponding to the bucket) will be processed by a single mapper.
create a new input format where a mapper spans a file - Key: HIVE-1197 URL: https://issues.apache.org/jira/browse/HIVE-1197 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Siying Dong Fix For: 0.6.0 This will be needed for Sort merge joins.
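Editor's illustration of the bucket example from the comment above. This is a minimal sketch, not Hive's join operator: the class `BucketMergeJoin` and the method `leftOuterJoin` are hypothetical, keys are assumed to be integers that are unique within each bucket, and both buckets are assumed to fit in arrays. It shows that when one mapper sees the whole sorted bucket, a left outer join is a single forward pass over both inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BucketMergeJoin {
    /**
     * Left outer merge join of two buckets sorted ascending on the join key.
     * Each result row is {leftKey, rightKey}; rightKey is null when unmatched.
     */
    static List<Integer[]> leftOuterJoin(int[] left, int[] right) {
        List<Integer[]> out = new ArrayList<>();
        int j = 0;
        for (int key : left) {
            // Advance past right-side keys smaller than the current left key.
            while (j < right.length && right[j] < key) {
                j++;
            }
            boolean matched = j < right.length && right[j] == key;
            out.add(new Integer[] {key, matched ? right[j] : null});
        }
        return out;
    }

    public static void main(String[] args) {
        int[] t1Bucket = {1, 2, 5, 6, 9, 16, 22, 30};
        int[] t2Bucket = {2, 4, 8};
        for (Integer[] row : leftOuterJoin(t1Bucket, t2Bucket)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

If the T1 bucket were instead split between two mappers, each would have to position itself in the T2 bucket independently, which is the difficulty the comment describes.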
[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
[ https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838968#action_12838968 ] He Yongqiang commented on HIVE-1203: Thanks for the explanation. Yes, there is no harm in doing that. I will test and commit it. HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0, 0.4.1, 0.5.0 Reporter: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 Attachments: 0.4.patch, 0.5.patch, trunk.patch To fix this, the cause simply needs to be passed as a second parameter to the IOException constructor. Patches for 0.4, 0.5 and trunk are available.
[jira] Commented: (HIVE-1193) ensure sorting properties for a table
[ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839004#action_12839004 ] Namit Jain commented on HIVE-1193: -- There are 2 different jiras: one for ensuring the bucketing properties and one for ensuring the sorted properties. Currently, even though tables are declared as sorted and bucketed at table creation, these properties are not enforced. It is up to the user to make sure the data is bucketed/sorted appropriately while loading. Since it is not enforced, the optimizer cannot take advantage of that because it doesn't know whether the data is actually sorted. There was a jira previously which took advantage of the fact that the data is sorted when processing group by. This is controlled by configurable parameters. Going forward, we want to use them for joining, specifically for sort merge joins. @Edward, currently we are not doing skipping based on sorting properties. Currently, we create an additional map-reduce job for bucketing/sorting. Even if there is a cluster by, and the data is already bucketed/sorted by the correct key, we don't use that. There will be another map-reduce job. This can be optimized in future. Merging of map-only jobs is disabled, but the same should be done for map-reduce jobs as well. I will file a follow-up jira on that. ensure sorting properties for a table - Key: HIVE-1193 URL: https://issues.apache.org/jira/browse/HIVE-1193 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.6.0 Attachments: hive.1193.1.patch If a table is sorted and data is being inserted into it, we currently don't make sure that the data is sorted. Enforcing this might be useful for some downstream operations. It cannot be made the default due to backward compatibility, but an option can be added for it.
Hive User Group Meeting 3/18/2010 7pm at Facebook
Hi all,
We are going to hold the second Hive User Group Meeting at 7PM on 3/18/2010 Thursday.
The agenda will be:
* Hive Tutorial: 20 min
* Hive User Case Study: 20 min
* New Features and API: 25 min
  JDBC/ODBC and CTAS UDF/UDAF/UDTF Create View/HBaseInputFormat Hive Join Strategy SerDe
The audience is beginner to intermediate Hive users/developers.
*** The details are here: http://www.facebook.com/event.php?eid=319237846974 ***
*** Please RSVP so we can schedule logistics accordingly. ***
--
Yours, Zheng
[jira] Commented: (HIVE-801) row-wise IN would be useful
[ https://issues.apache.org/jira/browse/HIVE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839052#action_12839052 ] Adam Kramer commented on HIVE-801: -- Also note that the true utility of this is syntax like WHERE a.foo IN (b.*) ...for instances where b has many many columns and it is messy to articulate them. I'm thinking about a current table I have with 800 columns...is there a limit on the character-wise length of a query? row-wise IN would be useful --- Key: HIVE-801 URL: https://issues.apache.org/jira/browse/HIVE-801 Project: Hadoop Hive Issue Type: New Feature Reporter: Adam Kramer SELECT * FROM tablename t WHERE IN(12345,key1,key2,key3); ...IN would operate on a given row, and return True when the first argument equaled at least one of the other arguments. So here IN would return true if 12345=key1 OR 12345=key2 OR 12345=key3 (but wouldn't test the latter two if the first matched). This would also help with https://issues.apache.org/jira/browse/HIVE-783, if IN were implemented in a manner that allows it to be used in an ON clause.
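Editor's illustration of the semantics requested in the issue description above. This is a minimal sketch, not a Hive UDF: the class `RowWiseIn` and the method `in` are hypothetical, and the column values are assumed to be non-null integers, so SQL NULL handling is deliberately left out. The first argument is compared with each remaining argument in order, and evaluation stops at the first match.

```java
final class RowWiseIn {
    /**
     * Returns true when needle equals at least one candidate.
     * Candidates are tested left to right and the loop exits on the first match.
     */
    static boolean in(long needle, long... candidates) {
        for (long candidate : candidates) {
            if (needle == candidate) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One row with key1 = 7, key2 = 12345, key3 = 99.
        System.out.println(in(12345, 7, 12345, 99)); // true
        // One row where no key column equals 12345.
        System.out.println(in(12345, 7, 8, 99));     // false
    }
}
```

A form such as `IN (b.*)` would correspond to passing every column of the row as a candidate, which is why the varargs shape is used here.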
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Boulon updated HIVE-259: --- Attachment: (was: Percentile.xlsx) Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles.
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Boulon updated HIVE-259: --- Attachment: Percentile.xlsx Percentiles that match included test case Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles.
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Boulon updated HIVE-259: --- Attachment: HIVE-259-3.patch - use Double instead of Integer for percentile so we can ask for the 99.999th percentile - checkstyle fixes, except for the State object - new test case Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles.
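Editor's illustration of why a Double percentile argument matters. This is a minimal sketch, not the code in HIVE-259-3.patch: the thread does not state which percentile definition the patch uses, so linear interpolation between the two nearest ranks is assumed here as one common choice, and the class `PercentileSketch` and the method `percentile` are hypothetical. With a `double` argument a caller can request fractional percentiles such as 99.999, which an Integer argument cannot express.

```java
import java.util.Arrays;

final class PercentileSketch {
    /**
     * Returns the p-th percentile (0 <= p <= 100) of the values, using linear
     * interpolation between the two nearest ranks of the sorted data.
     */
    static double percentile(long[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        if (p < 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        // Fractional position of the requested percentile in the sorted array.
        double pos = (p / 100.0) * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double frac = pos - lower;
        return sorted[lower] + frac * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        long[] values = {5, 1, 4, 2, 3};
        System.out.println(percentile(values, 25.0)); // 2.0
        System.out.println(percentile(values, 50.0)); // 3.0
        System.out.println(percentile(values, 75.0)); // 4.0
    }
}
```

Copying and sorting the input keeps the sketch simple; an aggregate function running over large tables would need a more economical state representation.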
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Boulon updated HIVE-259: --- Status: Open (was: Patch Available) Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles.
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Boulon updated HIVE-259: --- Status: Patch Available (was: Open) HIVE-259-3.patch Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles.
[jira] Created: (HIVE-1204) typedbytes: writing to stderr kills the mapper
typedbytes: writing to stderr kills the mapper -- Key: HIVE-1204 URL: https://issues.apache.org/jira/browse/HIVE-1204 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.6.0
[jira] Updated: (HIVE-1204) typedbytes: writing to stderr kills the mapper
[ https://issues.apache.org/jira/browse/HIVE-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1204: - Attachment: hive.1204.1.patch typedbytes: writing to stderr kills the mapper -- Key: HIVE-1204 URL: https://issues.apache.org/jira/browse/HIVE-1204 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.6.0 Attachments: hive.1204.1.patch
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-259: Attachment: HIVE-259.4.patch This one fixes all checkstyle errors, and uses *Writable classes to avoid creating new objects as much as possible. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.4.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles.