[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908638#action_12908638 ] Amareshwari Sriramadasu commented on HIVE-1633: --- Here is full exception trace:
{noformat}
java.io.IOException: cannot find dir = hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile in partToPartitionInfo: [xxx..., xxx..., xxx..., ... hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1, hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
	at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:792)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1021)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:792)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:766)
	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
	at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:900)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:770)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:647)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:353)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}
> CombineHiveInputFormat fails with "cannot find dir for emptyFile" > - > > Key: HIVE-1633 > URL: https://issues.apache.org/jira/browse/HIVE-1633 > Project: Hadoop Hive > Issue Type: Bug > Components: Clients > Reporter: Amareshwari Sriramadasu > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908639#action_12908639 ] Ning Zhang commented on HIVE-1629: -- Good question John. I think this patch doesn't affect bucketing, which is implemented using ObjectInspectorUtils.hashCode(). Actually the hash function used there for Double is the same as the one provided in this patch. But I'll double check with Zheng/Namit tomorrow. > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Fix For: 0.7.0 > > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of DoubleWritable class of Hive. > It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
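[Editor's aside] For readers following the hashing discussion above, here is a minimal sketch of the standard bit-pattern hash for doubles, the same scheme as java.lang.Double.hashCode(). This is my illustration of the technique, not the contents of the attached HIVE-1629.patch; the class and method names are hypothetical. A hashCode() that ignores most of a double's bits sends many distinct keys to one bucket, which is what degrades a HashMap into a linked list.

```java
public class DoubleHashSketch {
    // Hash a double via its IEEE 754 bit pattern, folding the high
    // 32 bits into the low 32 bits so the whole value contributes.
    static int hashDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // Matches the java.lang.Double.hashCode() contract.
        System.out.println(hashDouble(1.0) == Double.valueOf(1.0).hashCode()); // prints "true"
    }
}
```

The XOR fold matters: for whole numbers such as 1.0 and 2.0 the low 32 bits of the IEEE 754 representation are all zero, so a naive (int) cast of the bits would collapse them into a single bucket.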
[jira] Created: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
CombineHiveInputFormat fails with "cannot find dir for emptyFile" - Key: HIVE-1633 URL: https://issues.apache.org/jira/browse/HIVE-1633 Project: Hadoop Hive Issue Type: Bug Components: Clients Reporter: Amareshwari Sriramadasu -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1629. -- Fix Version/s: 0.7.0 Resolution: Fixed Committed. Thanks Vaibhav! > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Fix For: 0.7.0 > > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of DoubleWritable class of Hive. > It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908622#action_12908622 ] John Sichi commented on HIVE-1629: -- Ning, does this change introduce incompatibility with any persistent storage (e.g. bucketing)? > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of DoubleWritable class of Hive. > It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
[ https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1622: - Attachment: HIVE-1622_0.17.patch oops, forgot the patch hadoop 0.17 logs. > Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true > --- > > Key: HIVE-1622 > URL: https://issues.apache.org/jira/browse/HIVE-1622 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch > > > Currently map-only merge (using CombineHiveInputFormat) is only enabled for > merging files generated by mappers. It should be used for files generated at > readers as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908599#action_12908599 ] Ning Zhang commented on HIVE-1629: -- +1 Will commit if tests pass. > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of DoubleWritable class of Hive. > It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1629: Assignee: Vaibhav Aggarwal > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of DoubleWritable class of Hive. > It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Question regarding region scans in HBase integration
I see. My changes are starting out super-simple, addressing only the case of an equality predicate and a simple key. Once I get those committed, we can talk about how to add support for compound keys and range predicates, which is where your code could come in. JVS On Sep 11, 2010, at 7:26 PM, Daniel Einspanjer wrote: > Okay, that getSplits part is specifically where my code was involved. > > My use case was one of salted rowkeys. We are storing documents that have a > guid as the id and the creation date of the document is important for > scanning. When we tested having a rowkey format of > <creation date>+<guid>, the RegionServer hotspots became problematic, so > we decided to salt the rowkey by using the first digit of the guid: > <hexchar>+<creation date>+<guid>. This gives us nice distribution of > inserts throughout the cluster, but of course, it makes scanning a contiguous > date range much more complicated. > > The code I have allows us to write an MR job that takes a list of prefixes (e.g. > the hexchar) and a list of ranges (e.g. the desired timestamps) and constructs > a master Scan object that contains any configuration such as filters or cache > settings, and a series of Scan objects that constitute the Cartesian product > of the ranges. Then, it passes those in to a custom getSplits that ensures > only the needed regions participate in the Map. > > If this sounds like it might be useful, I'll work on getting it cleaned up > and posted somewhere so you can review it and maybe glean it for ideas. If > you are already past that point then I apologize for not checking into this > sooner. :) > > -Daniel > > On 9/11/10 7:09 PM, John Sichi wrote: >> Hi Daniel, >> >> I'm almost done with this for HIVE-1226; the remaining step I need to finish >> is to get the filter passed down during getSplits, since the HBase getSplits >> implementation takes care of figuring out which regions contain the row in >> question. 
>> >> JVS >> >> On Sep 11, 2010, at 7:00 PM, Daniel Einspanjer wrote: >> >>> I was trying to spend a little time this weekend catching up with the >>> current state of HBase integration for Hive. One thing that I haven't seen >>> mentioned is how exactly Hive scans an HBase table during a SELECT. >>> >>> Does Hive have logic that allows it to intelligently scan only the >>> participating regions during a SELECT query that uses the rowkey? If not, >>> I recently wrote some code that allows a MapReduce job to effectively >>> select the regions based on a list of start/end rowkey ranges. If this >>> might be useful to the Hive integration, I could create a Jira and take a >>> look at trying to set up a patch. >>> >>> Daniel Einspanjer >>> Metrics Architect >>> Mozilla Corporation
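[Editor's aside] The salted-scan approach described in the thread above can be sketched as follows. This is a hypothetical illustration, not Daniel's actual code; the class and method names are my own. The core idea is the Cartesian product of salt prefixes and key ranges, yielding one start/stop rowkey pair per scan, which a custom getSplits could then restrict to the regions each pair intersects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SaltedScanRanges {
    // Build the Cartesian product of salt prefixes and key ranges.
    // Each returned element is a {startRow, stopRow} pair describing
    // one contiguous scan over salted rowkeys.
    static List<String[]> buildScanRanges(List<String> saltPrefixes,
                                          List<String[]> keyRanges) {
        List<String[]> scans = new ArrayList<>();
        for (String salt : saltPrefixes) {
            for (String[] range : keyRanges) {
                scans.add(new String[] { salt + range[0], salt + range[1] });
            }
        }
        return scans;
    }

    public static void main(String[] args) {
        // Two hex-char salts x one date range -> two scans, one per salt.
        List<String[]> scans = buildScanRanges(
                Arrays.asList("0", "1"),
                Collections.singletonList(new String[] { "20100901", "20100912" }));
        for (String[] s : scans) {
            System.out.println(s[0] + " .. " + s[1]);
        }
    }
}
```

In an actual HBase integration each pair would become a Scan with the pair as its start and stop rows, copying filters and cache settings from the master Scan before being handed to getSplits.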
[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)
[ https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908592#action_12908592 ] John Sichi commented on HIVE-1364: -- See HIVE-1632 for an ftype use case. > Increase the maximum length of SERDEPROPERTIES values (currently 767 > characters) > > > Key: HIVE-1364 > URL: https://issues.apache.org/jira/browse/HIVE-1364 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.5.0 >Reporter: Carl Steinbach >Assignee: Carl Steinbach > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch > > > The value component of a SERDEPROPERTIES key/value pair is currently limited > to a maximum length of 767 characters. I believe that the motivation for > limiting the length to > 767 characters is that this value is the maximum allowed length of an index in > a MySQL database running on the InnoDB engine: > http://bugs.mysql.com/bug.php?id=13315 > * The Metastore OR mapping currently limits many fields (including > SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite > the fact that these fields are not indexed. > * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535. > * We can expect many users to hit the 767 character limit on > SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping > serdeproperty to map a table that has many columns. > I propose increasing the maximum allowed length of > SERDEPROPERTIES.PARAM_VALUE to 8192. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1632) Column length not sufficient for large STRUCT definitions
Column length not sufficient for large STRUCT definitions - Key: HIVE-1632 URL: https://issues.apache.org/jira/browse/HIVE-1632 Project: Hadoop Hive Issue Type: Bug Components: Metastore Affects Versions: 0.5.0 Reporter: Wolfgang Nagele Priority: Trivial Can be reproduced by adding the following table: {code}hive> CREATE TABLE test (big struct);{code} Error: {noformat}FAILED: Error in metadata: javax.jdo.JDODataStoreException: Add request failed : INSERT INTO COLUMNS (SD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) NestedThrowables: java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'struct
[jira] Resolved: (HIVE-1630) bug in NO_DROP
[ https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-1630. -- Hadoop Flags: [Reviewed] Resolution: Fixed Committed. Thanks Siying > bug in NO_DROP > -- > > Key: HIVE-1630 > URL: https://issues.apache.org/jira/browse/HIVE-1630 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Siying Dong > Fix For: 0.7.0 > > Attachments: HIVE-1630.2.patch > > > If the table is marked NO_DROP, we should still be able to drop old > partitions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1630) bug in NO_DROP
[ https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908483#action_12908483 ] Namit Jain commented on HIVE-1630: -- +1 will commit if the tests pass > bug in NO_DROP > -- > > Key: HIVE-1630 > URL: https://issues.apache.org/jira/browse/HIVE-1630 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Siying Dong > Fix For: 0.7.0 > > Attachments: HIVE-1630.2.patch > > > If the table is marked NO_DROP, we should still be able to drop old > partitions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1539) Concurrent metastore threading problem
[ https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908472#action_12908472 ] Bennie Schut commented on HIVE-1539: The patch got committed in datanucleus 2.2.0.m2, so we could consider moving from 2.0 to 2.2. The temporary fix of loading your own classloader also worked nicely. > Concurrent metastore threading problem > --- > > Key: HIVE-1539 > URL: https://issues.apache.org/jira/browse/HIVE-1539 > Project: Hadoop Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.7.0 >Reporter: Bennie Schut >Assignee: Bennie Schut > Attachments: ClassLoaderResolver.patch, thread_dump_hanging.txt > > > When running hive as a service and running a high number of queries > concurrently I end up with multiple threads running at 100% cpu without any > progress. > Looking at these threads I notice this thread(484e): > at > org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598) > But on a different thread(63a2): > at > org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.