[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-12 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908638#action_12908638
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

Here is full exception trace:
{noformat}
java.io.IOException: cannot find dir =
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile
in partToPartitionInfo:
[xxx..., xxx..., xxx..., ...
 hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1,
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]
at
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:792)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1021)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:792)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:766)
at 
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
at 
org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:770)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:647)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}
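
In the trace above, the lookup key is .../-mr-10002/1/emptyFile, while its parent directory .../-mr-10002/1 appears among the keys of partToPartitionInfo. As a rough illustration of what an ancestor-walking lookup over such a map looks like, here is a minimal sketch using plain strings. It is not Hive's HiveFileFormatUtils code; the class name, method name, and example paths are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PathLookupSketch {
    /**
     * Returns the value mapped to {@code path} or to its nearest mapped
     * ancestor, or null when no ancestor is present in the map.
     */
    static String findNearestAncestor(Map<String, String> dirToPartition, String path) {
        String current = path;
        while (true) {
            String hit = dirToPartition.get(current);
            if (hit != null) {
                return hit;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                return null; // no parent left to try
            }
            current = current.substring(0, slash); // drop the last path component
        }
    }

    public static void main(String[] args) {
        Map<String, String> dirToPartition = new HashMap<>();
        // Hypothetical directories standing in for the elided ones in the trace.
        dirToPartition.put("hdfs://example/tmp/-mr-10002/1", "partition-1");
        dirToPartition.put("hdfs://example/tmp/-mr-10002/2", "partition-2");

        String file = "hdfs://example/tmp/-mr-10002/1/emptyFile";
        System.out.println(findNearestAncestor(dirToPartition, file));
    }
}
```

With exact string keys, a walk like this only succeeds when the stored key and the file's parent are spelled identically (scheme, authority, trailing slash). Whether such a mismatch is the cause here is not established in this thread.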


> CombineHiveInputFormat fails with "cannot find dir for emptyFile"
> -
>
> Key: HIVE-1633
> URL: https://issues.apache.org/jira/browse/HIVE-1633
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Reporter: Amareshwari Sriramadasu
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908639#action_12908639
 ] 

Ning Zhang commented on HIVE-1629:
--

Good question, John. I think this patch doesn't affect bucketing, which is 
implemented using ObjectInspectorUtils.hashCode(). Actually the hash function 
used there for Double is the same as the one provided in this patch. But I'll 
double check with Zheng/Namit tomorrow. 
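
The patch itself is not shown in this thread. As a hedged illustration of how a hashCode() on a double can make a HashMap degrade into a linked list, the sketch below compares a hash that truncates the IEEE 754 bit pattern to its low 32 bits with one that folds the high and low halves together, as java.lang.Double.hashCode() does. That the unpatched DoubleWritable used the truncating form is an assumption here, and the class and method names are hypothetical.

```java
final class DoubleHashSketch {
    /** Truncating hash: keeps only the low 32 bits of the bit pattern. */
    static int truncatingHash(double value) {
        return (int) Double.doubleToLongBits(value);
    }

    /** Folding hash: XORs the high and low 32-bit halves of the bit pattern. */
    static int foldingHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // Small whole numbers have an all-zero low word, so the truncating
        // hash sends every one of them to the same bucket.
        for (double d : new double[] {1.0, 2.0, 3.0}) {
            System.out.println(d + " truncating=" + truncatingHash(d)
                    + " folding=" + foldingHash(d));
        }
    }
}
```

For 1.0, 2.0, and 3.0 the truncating variant returns 0 each time, while the folding variant yields distinct values, which is the kind of difference the issue description refers to.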

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Fix For: 0.7.0
>
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Created: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-12 Thread Amareshwari Sriramadasu (JIRA)
CombineHiveInputFormat fails with "cannot find dir for emptyFile"
-

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu







[jira] Resolved: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1629.
--

Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Vaibhav!

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Fix For: 0.7.0
>
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908622#action_12908622
 ] 

John Sichi commented on HIVE-1629:
--

Ning, does this change introduce incompatibility with any persistent storage 
(e.g. bucketing)?


> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1622:
-

Attachment: HIVE-1622_0.17.patch

Oops, forgot the patch for the Hadoop 0.17 logs. 

> Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
> ---
>
> Key: HIVE-1622
> URL: https://issues.apache.org/jira/browse/HIVE-1622
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch
>
>
> Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
> merging files generated by mappers. It should be used for files generated at 
> reducers as well.




[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908599#action_12908599
 ] 

Ning Zhang commented on HIVE-1629:
--

+1 Will commit if tests pass.

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Assigned: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1629:


Assignee: Vaibhav Aggarwal

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




Re: Question regarding region scans in HBase integration

2010-09-12 Thread John Sichi
I see.  My changes are starting out super-simple, addressing only the case of 
an equality predicate and a simple key.  Once I get those committed, we can 
talk about how to add support for compound keys and range predicates, which is 
where your code could come in.

JVS

On Sep 11, 2010, at 7:26 PM, Daniel Einspanjer wrote:

> Okay, that getSplits part is specifically where my code was involved.
> 
> My use case was one of salted rowkeys.  We are storing documents that have a 
> guid as the id and the creation date of the document is important for 
> scanning.  When we tested having a rowkey format of 
> +, the RegionServer hotspots became problematic, so 
> we decided to salt the rowkey by using the first digit of the guid: 
> ++.  This gives us nice distribution of 
> inserts throughout the cluster, but of course, it makes scanning a contiguous 
> date range much more complicated.
> 
> The code I have allows us to write a MR that takes a list of prefixes (e.g. 
> the hexchar) and a list of ranges (e.g. the desired timestamps) and construct 
> a master Scan object that contains any configuration such as filters or cache 
> settings, and a series of Scan objects that constitute the Cartesian product 
> of the ranges.  Then, it passes those in to a custom getSplits that ensures 
> only the needed regions participate in the Map.
> 
> If this sounds like it might be useful, I'll work on getting it cleaned up 
> and posted somewhere so you can review it and maybe glean it for ideas.  If 
> you are already past that point then I apologize for not checking into this 
> sooner. :)
> 
> -Daniel
> 
> On 9/11/10 7:09 PM, John Sichi wrote:
>> Hi Daniel,
>> 
>> I'm almost done with this for HIVE-1226; the remaining step I need to finish 
>> is to get the filter passed down during getSplits, since the HBase getSplits 
>> implementation takes care of figuring out which regions contain the row in 
>> question.
>> 
>> JVS
>> 
>> On Sep 11, 2010, at 7:00 PM, Daniel Einspanjer wrote:
>> 
>>> I was trying to spend a little time this weekend catching up with the 
>>> current state of HBase integration for Hive.  One thing that I haven't seen 
>>> mentioned is how exactly Hive scans an HBase table during a SELECT.
>>> 
>>> Does Hive have logic that allows it to intelligently scan only the 
>>> participating regions during a SELECT query that uses the rowkey?  If not, 
>>> I recently wrote some code that allows a MapReduce job to effectively 
>>> select the regions based on a list of start/end rowkey ranges.  If this 
>>> might be useful to the Hive integration, I could create a Jira and take a 
>>> look at trying to set up a patch.
>>> 
>>> Daniel Einspanjer
>>> Metrics Architect
>>> Mozilla Corporation
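
The prefix-and-range expansion Daniel describes above (one scan per combination of salt prefix and date range) can be sketched roughly as follows. The exact rowkey layout is not recoverable from the message as archived, so this assumes a one-character salt followed by a fixed-width date string, joined by plain concatenation. The class, field, and method names are hypothetical, and no HBase classes are used; each resulting range would seed one Scan in a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SaltedScanRangesSketch {
    /** A half-open row-key interval [startRow, stopRow). */
    static final class KeyRange {
        final String startRow;
        final String stopRow;

        KeyRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    /** Builds one key range per (salt prefix, date range) pair. */
    static List<KeyRange> expand(List<String> prefixes, List<String[]> dateRanges) {
        List<KeyRange> ranges = new ArrayList<>();
        for (String prefix : prefixes) {
            for (String[] dateRange : dateRanges) {
                // dateRange[0] is the inclusive start, dateRange[1] the exclusive stop.
                ranges.add(new KeyRange(prefix + dateRange[0], prefix + dateRange[1]));
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("0", "1", "a", "f"); // subset of hex salts
        List<String[]> dateRanges = Arrays.asList(
                new String[] {"20100901", "20100908"},
                new String[] {"20100910", "20100912"});
        for (KeyRange range : expand(prefixes, dateRanges)) {
            System.out.println(range);
        }
    }
}
```

A custom getSplits could then keep only the regions whose key span overlaps at least one of these ranges, which is the behaviour the message attributes to Daniel's code.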



[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908592#action_12908592
 ] 

John Sichi commented on HIVE-1364:
--

See HIVE-1632 for an ftype use case.


> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.
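
To give a feel for how quickly an hbase.columns.mapping value reaches the 767-character limit discussed above, here is a small sketch that builds a mapping string for a hypothetical table and measures its length. The column family "cf" and the qualifier names "colNNN" are made up for illustration; real mappings with longer names would hit the limit with fewer columns.

```java
import java.util.Locale;

final class ColumnsMappingLengthSketch {
    /** The PARAM_VALUE limit quoted in the issue description. */
    static final int PARAM_VALUE_LIMIT = 767;

    /** Builds a comma-separated mapping such as ":key,cf:col000,cf:col001". */
    static String buildMapping(int columnCount) {
        StringBuilder mapping = new StringBuilder(":key");
        for (int i = 0; i < columnCount; i++) {
            // Each entry adds 10 characters: a comma plus "cf:colNNN".
            mapping.append(",cf:col").append(String.format(Locale.ROOT, "%03d", i));
        }
        return mapping.toString();
    }

    public static void main(String[] args) {
        for (int columns : new int[] {50, 76, 77, 100}) {
            int length = buildMapping(columns).length();
            System.out.println(columns + " columns -> " + length + " characters, over limit: "
                    + (length > PARAM_VALUE_LIMIT));
        }
    }
}
```

With these short names the mapping is 4 + 10 * n characters long, so it crosses 767 at 77 columns; the proposed 8192 limit would leave room for several hundred such columns.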




[jira] Created: (HIVE-1632) Column length not sufficient for large STRUCT definitions

2010-09-12 Thread Wolfgang Nagele (JIRA)
Column length not sufficient for large STRUCT definitions
-

 Key: HIVE-1632
 URL: https://issues.apache.org/jira/browse/HIVE-1632
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0
Reporter: Wolfgang Nagele
Priority: Trivial


Can be reproduced by adding the following table:
{code}hive> CREATE TABLE test (big struct);{code}

Error:
{noformat}FAILED: Error in metadata: javax.jdo.JDODataStoreException: Add 
request failed : INSERT INTO COLUMNS 
(SD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) 
NestedThrowables:
java.sql.SQLDataException: A truncation error was encountered trying to shrink 
VARCHAR 'struct

[jira] Resolved: (HIVE-1630) bug in NO_DROP

2010-09-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1630.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Siying

> bug in NO_DROP
> --
>
> Key: HIVE-1630
> URL: https://issues.apache.org/jira/browse/HIVE-1630
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1630.2.patch
>
>
> If the table is marked NO_DROP, we should still be able to drop old 
> partitions.




[jira] Commented: (HIVE-1630) bug in NO_DROP

2010-09-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908483#action_12908483
 ] 

Namit Jain commented on HIVE-1630:
--

+1

will commit if the tests pass

> bug in NO_DROP
> --
>
> Key: HIVE-1630
> URL: https://issues.apache.org/jira/browse/HIVE-1630
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1630.2.patch
>
>
> If the table is marked NO_DROP, we should still be able to drop old 
> partitions.




[jira] Commented: (HIVE-1539) Concurrent metastore threading problem

2010-09-12 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908472#action_12908472
 ] 

Bennie Schut commented on HIVE-1539:


The patch was committed in DataNucleus 2.2.0.m2.
We could consider moving from 2.0 to 2.2.
The temporary fix of loading your own classloader also worked nicely.


> Concurrent metastore threading problem 
> ---
>
> Key: HIVE-1539
> URL: https://issues.apache.org/jira/browse/HIVE-1539
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Attachments: ClassLoaderResolver.patch, thread_dump_hanging.txt
>
>
> When running hive as a service and running a high number of queries 
> concurrently I end up with multiple threads running at 100% cpu without any 
> progress.
> Looking at these threads I notice this thread(484e):
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598)
> But on a different thread(63a2):
> at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)
