RE: wrong number of records loaded to a table is returned by Hive

2010-10-01 Thread Steven Wong
Based on my cursory code inspection, the non-final row count is set when
ExecDriver.progress calls ss.getHiveHistory().setCounters(...) inside the while
loop. We need to add the same call after the while loop (after the last
updateCounters call at the end) so that the final row count is recorded.
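
For illustration, here is a minimal sketch of the proposed change,
paraphrasing the shape of ExecDriver.progress in Java. The method and variable
names follow the description above (rj is the RunningJob being polled, ss the
SessionState), not necessarily the exact Hive source:

  // Sketch only: poll the running job, refreshing counters as we go.
  while (!rj.isComplete()) {
    Thread.sleep(1000);                 // poll interval
    updateCounters(rj);                 // refresh cached counters
    // records a NON-final row count while the job is still running
    ss.getHiveHistory().setCounters(queryId, rj.getCounters());
  }
  updateCounters(rj);                   // last refresh after completion
  // proposed fix: publish the FINAL counters to the Hive history as well
  ss.getHiveHistory().setCounters(queryId, rj.getCounters());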


From: gaurav jain [mailto:jainy_gau...@yahoo.com]
Sent: Friday, October 01, 2010 1:17 PM
To: hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Subject: Re: wrong number of records loaded to a table is returned by Hive

One more data point:

in Hive History:

org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:
 26002996

in JT:
org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:
 MAP: 0, REDUCE: 31,208,099, TOTAL: 31,208,099



From: gaurav jain 
To: hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Sent: Fri, October 1, 2010 12:07:14 PM
Subject: Re: wrong number of records loaded to a table is returned by Hive
Hi Ning,

I also see the same behavior. Below is some data for your reference.

This behavior is observed for large values.

I believe Hive is recording non-final values at the end of the insert query:
since Hive reads the counters from the Hive history file, it may be printing
non-final values.

Relevant functions I looked at:

org.apache.hadoop.hive.ql.Driver.execute()
  SessionState.get().getHiveHistory().printRowCount(queryId);

org.apache.hadoop.hive.ql.history.HiveHistory.printRowCount(String)
  This function reads ROWS_INSERTED="~26002996" from the Hive history file.


Regards,
Gaurav Jain

--

Hive Query Output
26002996 Rows loaded to 

Hive Select Output after insert
31,208,099

From JobTracker UI:

                       MAP         REDUCE       TOTAL
Map input records      31,208,099  0            31,208,099
Map output records     31,208,099  0            31,208,099
Reduce input records   0           31,208,099   31,208,099


From Hive History File:

TaskEnd ROWS_INSERTED="~26002996"
TASK_RET_CODE="0"
TASK_HADOOP_PROGRESS="2010-10-01 18:37:39,548 Stage-1 map = 100%,  reduce = 100%"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS=
   Job Counters .Launched reduce tasks:36,
   Job Counters .Rack-local map tasks:50
   Job Counters .Launched map tasks:97
   Job Counters .Data-local map tasks:47
   ...
   ...
   Map-Reduce Framework.Map input records:31208099
   Map-Reduce Framework.Reduce output records:0
   Map-Reduce Framework.Spilled Records:88206972
   Map-Reduce Framework.Map output records:31208099
   Map-Reduce Framework.Reduce input records:28636162
TASK_ID="Stage-1" QUERY_ID="hadoop_20101001183131"
TASK_HADOOP_ID="job_201008201925_149454" TIME="1285958308044"


---



From: Ning Zhang 
To: "" 
Sent: Fri, October 1, 2010 10:45:53 AM
Subject: Re: wrong number of records loaded to a table is returned by Hive

Ping, this is a known issue. The number reported at the end of INSERT OVERWRITE
is obtained by means of Hadoop counters, which are not very reliable and are
subject to inaccuracy due to failed tasks and speculative execution.

If you are using the latest trunk, you may want to try the feature that
automatically gathers statistics during INSERT OVERWRITE TABLE. You need to
set up MySQL or HBase for partial stats publishing/aggregation. You can find
the design doc at http://wiki.apache.org/hadoop/Hive/StatsDev.

Note that stats gathering is still at an experimental stage, so please feel
free to report bugs/suggestions here or to hive-dev@hadoop.apache.org.

On Oct 1, 2010, at 10:30 AM, Ping Zhu wrote:


I have had this issue on different versions of Hadoop/Hive. The version I am
using now is Hadoop 0.20.2 with Hive 0.7; the version I once used was Hadoop
0.20.0 with Hive 0.5.

Ping
On Fri, Oct 1, 2010 at 10:23 AM, Ping Zhu <p...@sharethis.com> wrote:
Hi,

  I ran a simple Hive query inserting data into a target table from a source
table. The number of records loaded to the target table (say, number A), which
is returned by running this query, differs from the number (say, number B)
returned by running the query "select count(1) from target". I checked the
number of rows in the target table's HDFS files by running the command "hadoop
fs -cat /root/hive/metastore_db/ptarget/* | wc -l"; the number returned is
number B. I believe number B is the actual number of rows in the target table.

  I had this issue intermittently. Any comments?

  Thank you very much.

  Ping






[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917145#action_12917145
 ] 

Otis Gospodnetic commented on HIVE-1611:


+1 for getting the patch in now and adjusting it to work with the new skin or 
whatever ends up changing as Hive infra changes while converting to TLP.

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources,
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was
> already added to the site's skin (common for all Hadoop-related projects), so
> this issue is about enabling it for Hive. The ultimate goal is to use it at
> all Hadoop sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1427) Provide metastore schema migration scripts (0.5 -> 0.6)

2010-10-01 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1427:
-

Status: Patch Available  (was: Open)

> Provide metastore schema migration scripts (0.5 -> 0.6)
> ---
>
> Key: HIVE-1427
> URL: https://issues.apache.org/jira/browse/HIVE-1427
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1427.1.patch.txt
>
>
> At a minimum this ticket covers packaging up example MySQL migration scripts 
> (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
> do with them in the release notes.
> This is also probably a good point at which to decide and clearly state which 
> Metastore DBs we officially support in production, e.g. do we need to provide 
> migration scripts for Derby?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1427) Provide metastore schema migration scripts (0.5 -> 0.6)

2010-10-01 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917140#action_12917140
 ] 

HBase Review Board commented on HIVE-1427:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/931/
---

Review request for Hive Developers, namit jain, John Sichi, and Paul Yang.


Summary
---

This patch provides metastore schema upgrade scripts for MySQL- and Derby-based
metastores.
Note that this patch includes the schema changes proposed in HIVE-1364.


This addresses bug HIVE-1427.
http://issues.apache.org/jira/browse/HIVE-1427


Diffs
-

  metastore/scripts/upgrade/derby/README PRE-CREATION 
  metastore/scripts/upgrade/derby/upgrade-0.6.0.derby.sql PRE-CREATION 
  metastore/scripts/upgrade/mysql/README PRE-CREATION 
  metastore/scripts/upgrade/mysql/upgrade-0.6.0.mysql.sql PRE-CREATION 

Diff: http://review.cloudera.org/r/931/diff


Testing
---


Thanks,

Carl




> Provide metastore schema migration scripts (0.5 -> 0.6)
> ---
>
> Key: HIVE-1427
> URL: https://issues.apache.org/jira/browse/HIVE-1427
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1427.1.patch.txt
>
>
> At a minimum this ticket covers packaging up example MySQL migration scripts 
> (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
> do with them in the release notes.
> This is also probably a good point at which to decide and clearly state which 
> Metastore DBs we officially support in production, e.g. do we need to provide 
> migration scripts for Derby?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: HIVE-1427: Provide metastore schema migration scripts (0.5 -> 0.6)

2010-10-01 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/931/
---

Review request for Hive Developers, namit jain, John Sichi, and Paul Yang.


Summary
---

This patch provides metastore schema upgrade scripts for MySQL- and Derby-based
metastores.
Note that this patch includes the schema changes proposed in HIVE-1364.


This addresses bug HIVE-1427.
http://issues.apache.org/jira/browse/HIVE-1427


Diffs
-

  metastore/scripts/upgrade/derby/README PRE-CREATION 
  metastore/scripts/upgrade/derby/upgrade-0.6.0.derby.sql PRE-CREATION 
  metastore/scripts/upgrade/mysql/README PRE-CREATION 
  metastore/scripts/upgrade/mysql/upgrade-0.6.0.mysql.sql PRE-CREATION 

Diff: http://review.cloudera.org/r/931/diff


Testing
---


Thanks,

Carl



[jira] Updated: (HIVE-1427) Provide metastore schema migration scripts (0.5 -> 0.6)

2010-10-01 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1427:
-

Attachment: HIVE-1427.1.patch.txt

HIVE-1427.1.patch.txt:
* Upgrade scripts for derby and mysql.
* Includes all schema changes between 0.5.0 and branch-0.6, along with proposed 
changes in HIVE-1364.

I'm in the process of running upgrade tests on Derby and MySQL.

> Provide metastore schema migration scripts (0.5 -> 0.6)
> ---
>
> Key: HIVE-1427
> URL: https://issues.apache.org/jira/browse/HIVE-1427
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1427.1.patch.txt
>
>
> At a minimum this ticket covers packaging up example MySQL migration scripts 
> (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
> do with them in the release notes.
> This is also probably a good point at which to decide and clearly state which 
> Metastore DBs we officially support in production, e.g. do we need to provide 
> migration scripts for Derby?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1658) Fix describe [extended] column formatting

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1658:
---

Attachment: HIVE-1658-PrelimPatch.patch

Preliminary patch based on the above-mentioned approach - this one felt
easier. Comments welcome.

The code needs to be reorganized and cleaned, but I wanted to upload the patch
before I sign off for the day. I will proceed with test cases once the
approach is confirmed.

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-1658-PrelimPatch.patch
>
>
> When displaying the column schema, the formatting should be
> name<tab>type<tab>comment
> to be in line with the previous formatting style, for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: mapreduce against hive raw data ?

2010-10-01 Thread Edward Capriolo
On Fri, Oct 1, 2010 at 3:08 PM, Jinsong Hu  wrote:
> Hi, There:
>  I wonder if it is possible to run map-reduce against hive's raw data.
> hive supports hql, but sometimes I want to run map-reduce to do more
> sophisticated processing than simple hql can handle. In that case, I need
> to run my own custom map-reduce job against hive's raw data.
>  I wonder if that is possible. The key issue is where to find those files
> and how to deserialize them.
> Can anybody point me to the right location to find the API?
>
> Jimmy.
>

Jimmy,

The files are typically found in /user/hive/warehouse/

By default they would be TextFiles delimited with ^A (\u0001). But depending
on how you defined the table, they could be in a different format: other
delimiters, SequenceFiles, and so on.
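
For illustration, here is a minimal sketch of a custom mapper over such a
table, assuming the default TextFile layout; the class name and the choice of
emitting the first column are illustrative, not taken from this thread:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class HiveRawMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text firstCol = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      // Hive's default field delimiter for TextFile tables is ^A (\u0001).
      String[] cols = value.toString().split("\u0001", -1);
      firstCol.set(cols.length > 0 ? cols[0] : "");
      ctx.write(firstCol, ONE);
    }
  }

The job's input path would then point at the table's directory under
/user/hive/warehouse/.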


[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917021#action_12917021
 ] 

Alex Baranau commented on HIVE-1611:


I see.

As for reflecting project structure changes - we'll adjust the configuration
at our end (search-hadoop.com) to catch up with it, as we've done for all
other projects that became TLPs recently.
At the same time, if the site skin/look changes and the HTML needs adjusting,
you can always reopen this issue and I'll submit a new patch.

bq. If we want to see the skinconf change done first, we should open/transfer
this ticket to core I believe.
I think we can do it now (before the site moves). It's really a one-line
change; the skin itself was changed as part of AVRO-626. I don't think we have
to submit a ticket to core, as this (single-line) change is in Hive's code
base.

Does that sound reasonable?

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources,
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was
> already added to the site's skin (common for all Hadoop-related projects), so
> this issue is about enabling it for Hive. The ultimate goal is to use it at
> all Hadoop sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: wrong number of records loaded to a table is returned by Hive

2010-10-01 Thread gaurav jain
One more data point:

in Hive History:

org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:
 26002996


in JT:
org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:
 MAP: 0, REDUCE: 31,208,099, TOTAL: 31,208,099






From: gaurav jain 
To: hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Sent: Fri, October 1, 2010 12:07:14 PM
Subject: Re: wrong number of records loaded to a table is returned by Hive


Hi Ning,

I also see the same behavior. Below is some data for your reference.

This behavior is observed for large values.

I believe Hive is recording non-final values at the end of the insert query:
since Hive reads the counters from the Hive history file, it may be printing
non-final values.

Relevant functions I looked at:

org.apache.hadoop.hive.ql.Driver.execute()
  SessionState.get().getHiveHistory().printRowCount(queryId);

org.apache.hadoop.hive.ql.history.HiveHistory.printRowCount(String)
  This function reads ROWS_INSERTED="~26002996" from the Hive history file.

Regards,
Gaurav Jain

--


Hive Query Output
26002996 Rows loaded to 

Hive Select Output after insert
31,208,099

From JobTracker UI:

                       MAP         REDUCE       TOTAL
Map input records      31,208,099  0            31,208,099
Map output records     31,208,099  0            31,208,099
Reduce input records   0           31,208,099   31,208,099


From Hive History File:

TaskEnd ROWS_INSERTED="~26002996"
TASK_RET_CODE="0"
TASK_HADOOP_PROGRESS="2010-10-01 18:37:39,548 Stage-1 map = 100%,  reduce = 100%"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS=
   Job Counters .Launched reduce tasks:36,
   Job Counters .Rack-local map tasks:50
   Job Counters .Launched map tasks:97
   Job Counters .Data-local map tasks:47
   ...
   ...
   Map-Reduce Framework.Map input records:31208099
   Map-Reduce Framework.Reduce output records:0
   Map-Reduce Framework.Spilled Records:88206972
   Map-Reduce Framework.Map output records:31208099
   Map-Reduce Framework.Reduce input records:28636162
TASK_ID="Stage-1" QUERY_ID="hadoop_20101001183131"
TASK_HADOOP_ID="job_201008201925_149454" TIME="1285958308044"


---




From: Ning Zhang 
To: "" 
Sent: Fri, October 1, 2010 10:45:53 AM
Subject: Re: wrong number of records loaded to a table is returned by Hive

Ping, this is a known issue. The number reported at the end of INSERT OVERWRITE
is obtained by means of Hadoop counters, which are not very reliable and are
subject to inaccuracy due to failed tasks and speculative execution.

If you are using the latest trunk, you may want to try the feature that
automatically gathers statistics during INSERT OVERWRITE TABLE. You need to
set up MySQL or HBase for partial stats publishing/aggregation. You can find
the design doc at http://wiki.apache.org/hadoop/Hive/StatsDev.

Note that stats gathering is still at an experimental stage, so please feel
free to report bugs/suggestions here or to hive-...@hadoop.apache.org.


On Oct 1, 2010, at 10:30 AM, Ping Zhu wrote:

> I had such issues on different versions of Hadoop/Hive. The version I am
> using now is Hadoop 0.20.2 with Hive 0.7; the version I once used was
> Hadoop 0.20.0 with Hive 0.5.
>
> Ping
>
> On Fri, Oct 1, 2010 at 10:23 AM, Ping Zhu  wrote:
>
>> Hi,
>>
>>  I ran a simple Hive query inserting data into a target table from a
>> source table. The number of records loaded to the target table (say,
>> number A), which is returned by running this query, differs from the
>> number (say, number B) returned by running the query "select count(1)
>> from target". I checked the number of rows in the target table's HDFS
>> files by running the command "hadoop fs -cat
>> /root/hive/metastore_db/ptarget/* | wc -l"; the number returned is
>> number B. I believe number B is the actual number of rows in the
>> target table.
>>
>>  I had this issue intermittently. Any comments?
>>
>>  Thank you very much.
>>
>>  Ping


  

[jira] Updated: (HIVE-307) "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name

2010-10-01 Thread Kirk True (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated HIVE-307:
---

Status: Patch Available  (was: Open)

Patch attached.

> "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the 
> same name
> --
>
> Key: HIVE-307
> URL: https://issues.apache.org/jira/browse/HIVE-307
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Zheng Shao
>Assignee: Kirk True
>Priority: Critical
> Attachments: HIVE-307.patch, HIVE-307.patch
>
>
> Failed with exception checkPaths: 
> /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already 
> exists
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917016#action_12917016
 ] 

Edward Capriolo commented on HIVE-1611:
---

Now that Hive is a TLP, we likely have to get the ball rolling and cut the
cord with Hadoop. I will contact infra and see what our options are. We have a
few issues:
- we need to move the SVN from a Hadoop subproject to a top-level SVN
- after we do that, we need to take the Forrest docs and move them into Hive;
then we can change the search box

If we want to see the skinconf change done first, we should open/transfer this
ticket to core, I believe.


> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources,
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was
> already added to the site's skin (common for all Hadoop-related projects), so
> this issue is about enabling it for Hive. The ultimate goal is to use it at
> all Hadoop sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.20 #379

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[heyongqiang] HIVE-1624. Patch to allow scripts in S3 location. (Vaibhav
Aggarwal via He Yongqiang)

--
[...truncated 14212 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK

mapreduce against hive raw data ?

2010-10-01 Thread Jinsong Hu

Hi, There:
 I wonder if it is possible to run map-reduce against hive's raw data.
hive supports hql, but sometimes I want to run map-reduce to do more
sophisticated processing than simple hql can handle. In that case, I need to
run my own custom map-reduce job against hive's raw data.
 I wonder if that is possible. The key issue is where to find those files
and how to deserialize them.

Can anybody point me to the right location to find the API?

Jimmy.



Re: wrong number of records loaded to a table is returned by Hive

2010-10-01 Thread gaurav jain
Hi Ning,

I also see the same behavior. Below is some data for your reference.

This behavior is observed for large values.

I believe Hive is recording non-final values at the end of the insert query:
since Hive reads the counters from the Hive history file, it may be printing
non-final values.

Relevant functions I looked at:

org.apache.hadoop.hive.ql.Driver.execute()
  SessionState.get().getHiveHistory().printRowCount(queryId);

org.apache.hadoop.hive.ql.history.HiveHistory.printRowCount(String)
  This function reads ROWS_INSERTED="~26002996" from the Hive history file.

Regards,
Gaurav Jain

--


Hive Query Output
26002996 Rows loaded to 

Hive Select Output after insert
31,208,099

From JobTracker UI:

                       MAP         REDUCE       TOTAL
Map input records      31,208,099  0            31,208,099
Map output records     31,208,099  0            31,208,099
Reduce input records   0           31,208,099   31,208,099


From Hive History File:

TaskEnd ROWS_INSERTED="~26002996"
TASK_RET_CODE="0"
TASK_HADOOP_PROGRESS="2010-10-01 18:37:39,548 Stage-1 map = 100%,  reduce = 100%"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS=
   Job Counters .Launched reduce tasks:36,
   Job Counters .Rack-local map tasks:50
   Job Counters .Launched map tasks:97
   Job Counters .Data-local map tasks:47
   ...
   ...
   Map-Reduce Framework.Map input records:31208099
   Map-Reduce Framework.Reduce output records:0
   Map-Reduce Framework.Spilled Records:88206972
   Map-Reduce Framework.Map output records:31208099
   Map-Reduce Framework.Reduce input records:28636162
TASK_ID="Stage-1" QUERY_ID="hadoop_20101001183131"
TASK_HADOOP_ID="job_201008201925_149454" TIME="1285958308044"


---




From: Ning Zhang 
To: "" 
Sent: Fri, October 1, 2010 10:45:53 AM
Subject: Re: wrong number of records loaded to a table is returned by Hive

Ping, this is a known issue. The number reported at the end of INSERT OVERWRITE
is obtained by means of Hadoop counters, which are not very reliable and are
subject to inaccuracy due to failed tasks and speculative execution.

If you are using the latest trunk, you may want to try the feature that
automatically gathers statistics during INSERT OVERWRITE TABLE. You need to
set up MySQL or HBase for partial stats publishing/aggregation. You can find
the design doc at http://wiki.apache.org/hadoop/Hive/StatsDev.

Note that stats gathering is still at an experimental stage, so please feel
free to report bugs/suggestions here or to hive-...@hadoop.apache.org.


On Oct 1, 2010, at 10:30 AM, Ping Zhu wrote:

> I had such issues on different versions of Hadoop/Hive. The version I am
> using now is Hadoop 0.20.2 with Hive 0.7; the version I once used was
> Hadoop 0.20.0 with Hive 0.5.
>
> Ping
>
> On Fri, Oct 1, 2010 at 10:23 AM, Ping Zhu  wrote:
>
>> Hi,
>>
>>  I ran a simple Hive query inserting data into a target table from a
>> source table. The number of records loaded to the target table (say,
>> number A), which is returned by running this query, differs from the
>> number (say, number B) returned by running the query "select count(1)
>> from target". I checked the number of rows in the target table's HDFS
>> files by running the command "hadoop fs -cat
>> /root/hive/metastore_db/ptarget/* | wc -l"; the number returned is
>> number B. I believe number B is the actual number of rows in the
>> target table.
>>
>>  I had this issue intermittently. Any comments?
>>
>>  Thank you very much.
>>
>>  Ping



  

[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917001#action_12917001
 ] 

Ning Zhang commented on HIVE-1611:
--

Thanks for the link, Alex. I've talked to Ashish and he said Hive has just
been approved as a TLP. There might be some work that needs to be done to move
the wiki and all documentation (I think Edward Capriolo has volunteered to do
so?). Let me ask Edward and see what he thinks.

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources,
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was
> already added to the site's skin (common for all Hadoop-related projects), so
> this issue is about enabling it for Hive. The ultimate goal is to use it at
> all Hadoop sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-10-01 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1157.
--

Resolution: Duplicate

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt, 
> HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
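
For illustration, here is a minimal sketch of the approach discussed in this
ticket: copy the jar from HDFS to the local filesystem, then load it through a
URLClassLoader. All names here are illustrative assumptions, not the actual
"add jar" implementation:

  import java.io.File;
  import java.net.URL;
  import java.net.URLClassLoader;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsJarLoader {
    // Copies an HDFS jar to a local temp file and returns a class loader for it.
    public static ClassLoader load(String hdfsJar, ClassLoader parent) throws Exception {
      Path src = new Path(hdfsJar);            // e.g. hdfs://localhost/FooTest.jar
      File local = File.createTempFile("addjar", ".jar");
      FileSystem fs = src.getFileSystem(new Configuration());
      fs.copyToLocalFile(src, new Path(local.getAbsolutePath()));
      return new URLClassLoader(new URL[] { local.toURI().toURL() }, parent);
    }
  }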



[jira] Resolved: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1684.
--

Resolution: Duplicate

Duplicate of HIVE-1669.

> intermittent failures in create_escape.q
> 
>
> Key: HIVE-1684
> URL: https://issues.apache.org/jira/browse/HIVE-1684
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
>
> [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
> lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
> -I Location -I transient_lastDdlTime -I last_modified_ -I 
> java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
> Caused by: -I [.][.][.] [0-9]* more 
> /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
>  
> /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
> [junit] 48d47
> [junit] < serialization.format\t  
> [junit] 49a49
> [junit] > serialization.format\t  
> Sometimes, I see the above failure. 
> This does not happen always, and needs to be investigated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916992#action_12916992
 ] 

Thiruvel Thirumoolan commented on HIVE-1658:


Patch is in the works.

Changes:

1. 'describe' & 'describe extended' outputs will be the same as pre-HIVE-558.
2. 'describe formatted' will use the new format for displaying columns and
additional information.

I will implement the changes similarly to how 'extended' is implemented: using
a boolean in DescTableDesc to denote the formatted keyword, and formatting the
output in DDLTask.describeTable.

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be
> name<tab>type<tab>comment
> to be in line with the previous formatting style, for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916990#action_12916990
 ] 

Ning Zhang commented on HIVE-1684:
--

This is the same as HIVE-1669, which was introduced by the new desc extended
feature. It should be addressed by HIVE-1658.

> intermittent failures in create_escape.q
> 
>
> Key: HIVE-1684
> URL: https://issues.apache.org/jira/browse/HIVE-1684
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
>
> [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
> lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
> -I Location -I transient_lastDdlTime -I last_modified_ -I 
> java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
> Caused by: -I [.][.][.] [0-9]* more 
> /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
>  
> /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
> [junit] 48d47
> [junit] < serialization.format\t  
> [junit] 49a49
> [junit] > serialization.format\t  
> Sometimes, I see the above failure. 
> This does not happen always, and needs to be investigated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-01 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Attachment: HIVE-1376.2.patch

The previous patch failed on several tests, particularly count(*) queries.
Attaching a new patch for percentile only; I will upload a patch for HIVE-1674
separately.

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Attachments: HIVE-1376.2.patch, HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1688) In the MapJoinOperator, the code uses tag as alias, which is not always true

2010-10-01 Thread Liyin Tang (JIRA)
In the MapJoinOperator, the code uses tag as alias, which is not always true


 Key: HIVE-1688
 URL: https://issues.apache.org/jira/browse/HIVE-1688
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Drivers
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang


In the MapJoinOperator and SMBMapJoinOperator, the code uses tag as alias, 
which is not always true.
Actually, alias = order[tag]



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1669) non-deterministic display of storage parameter in test

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1669:
---

Attachment: HIVE-1669.patch

Just sorting the param keys before displaying them. Test outputs are not
updated yet; I will do so along with HIVE-1658.

> non-deterministic display of storage parameter in test
> --
>
> Key: HIVE-1669
> URL: https://issues.apache.org/jira/browse/HIVE-1669
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
> Attachments: HIVE-1669.patch
>
>
> With the change to beautify 'desc extended table', the storage parameters
> are displayed in a non-deterministic manner (since the underlying
> implementation is a HashMap).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
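
For illustration, here is a minimal sketch of the fix described above
(sorting the parameter keys before display); the class and method names are
illustrative, not the actual patch:

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.Map;

  public class ParamPrinter {
    // HashMap iteration order is unspecified, so sort the keys to get
    // deterministic output in tests.
    public static String format(Map<String, String> params) {
      List<String> keys = new ArrayList<String>(params.keySet());
      Collections.sort(keys);
      StringBuilder sb = new StringBuilder();
      for (String k : keys) {
        sb.append(k).append('=').append(params.get(k)).append('\n');
      }
      return sb.toString();
    }
  }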



[jira] Commented: (HIVE-1686) XMLEncoder failing to serialize classes containing Enums for non-SUN JREs

2010-10-01 Thread Stephen Watt (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916966#action_12916966
 ] 

Stephen Watt commented on HIVE-1686:


This issue will also be fixed in the JRE with the next release of IBM Java 1.6 
(SR9) around the Dec 2010 timeframe.

> XMLEncoder failing to serialize classes containing Enums for non-SUN JREs
> -
>
> Key: HIVE-1686
> URL: https://issues.apache.org/jira/browse/HIVE-1686
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
> Environment: SLES 10 SP2, IBM Java 1.6 SR8
>Reporter: Stephen Watt
>Priority: Minor
> Fix For: 0.5.1
>
> Attachments: HIVE-1686.patch
>
>
> If one is using Hive 0.5 with IBM Java 1.6, certain Hive queries will fail in
> the Hive CLI, such as "SELECT Count(1) from TABLE", with the error "failed to
> write expression: GenericUDAFEvaluator$Mode=Class.new()". This is due to the
> fact that XMLEncoder in the JRE's beans.jar is not able to serialize classes
> with Enums without explicitly having an EnumPersistenceDelegate assigned to
> each class that needs to be serialized. This was an issue in SUN JDK 1.5 but
> not 1.6, and is still an issue in IBM Java 1.6.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1686) XMLEncoder failing to serialize classes containing Enums for non-SUN JREs

2010-10-01 Thread Stephen Watt (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916964#action_12916964
 ] 

Stephen Watt commented on HIVE-1686:


To resolve this issue, I have provided a patch that adds a variety of
EnumPersistenceDelegates, for the classes throwing this exception, to the
XMLEncoder in org.apache.hadoop.hive.ql.exec.Utilities.serializeMapRedWork().


> XMLEncoder failing to serialize classes containing Enums for non-SUN JREs
> -
>
> Key: HIVE-1686
> URL: https://issues.apache.org/jira/browse/HIVE-1686
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
> Environment: SLES 10 SP2, IBM Java 1.6 SR8
>Reporter: Stephen Watt
>Priority: Minor
> Fix For: 0.5.1
>
> Attachments: HIVE-1686.patch
>
>
> If one is using Hive 0.5 with IBM Java 1.6, certain Hive queries will fail in
> the Hive CLI, such as "SELECT Count(1) from TABLE", with the error "failed to
> write expression: GenericUDAFEvaluator$Mode=Class.new()". This is due to the
> fact that XMLEncoder in the JRE's beans.jar is not able to serialize classes
> with Enums without explicitly having an EnumPersistenceDelegate assigned to
> each class that needs to be serialized. This was an issue in SUN JDK 1.5 but
> not 1.6, and is still an issue in IBM Java 1.6.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
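
For illustration, here is a minimal sketch of the standard XMLEncoder
workaround for enums: a PersistenceDelegate that re-creates each constant via
Enum.valueOf. The registration line uses GenericUDAFEvaluator.Mode from the
error message quoted in this issue; the delegates in the actual patch may
differ:

  import java.beans.Encoder;
  import java.beans.Expression;
  import java.beans.PersistenceDelegate;

  public class EnumDelegate extends PersistenceDelegate {
    @Override
    protected Expression instantiate(Object oldInstance, Encoder out) {
      // re-create the constant as Enum.valueOf(enumClass, name)
      return new Expression(Enum.class, "valueOf",
          new Object[] { oldInstance.getClass(), ((Enum<?>) oldInstance).name() });
    }

    @Override
    protected boolean mutatesTo(Object oldInstance, Object newInstance) {
      return oldInstance == newInstance;
    }
  }

  // Registration, e.g. inside Utilities.serializeMapRedWork():
  //   encoder.setPersistenceDelegate(GenericUDAFEvaluator.Mode.class,
  //       new EnumDelegate());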



[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916963#action_12916963
 ] 

Alex Baranau commented on HIVE-1611:


Hello,
Sorry for misleading you, the patch is applicable to 
https://svn.apache.org/repos/asf/hadoop/hive/site/. The skinconf it should be 
applied to is:
https://svn.apache.org/repos/asf/hadoop/hive/site/author/src/documentation/skinconf.xml

As far as I understand, Hive hasn't set up a separate repo (since graduating
to TLP). It still uses Hadoop's common 'pelt' skin. Right?

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources,
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was
> already added to the site's skin (common for all Hadoop-related projects), so
> this issue is about enabling it for Hive. The ultimate goal is to use it at
> all Hadoop sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1686) XMLEncoder failing to serialize classes containing Enums for non-SUN JREs

2010-10-01 Thread Stephen Watt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Watt updated HIVE-1686:
---

Attachment: HIVE-1686.patch

> XMLEncoder failing to serialize classes containing Enums for non-SUN JREs
> -
>
> Key: HIVE-1686
> URL: https://issues.apache.org/jira/browse/HIVE-1686
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
> Environment: SLES 10 SP2, IBM Java 1.6 SR8
>Reporter: Stephen Watt
>Priority: Minor
> Fix For: 0.5.1
>
> Attachments: HIVE-1686.patch
>
>
> If one is using Hive 0.5 with IBM Java 1.6, certain Hive queries will fail in
> the Hive CLI, such as "SELECT Count(1) from TABLE", with the error "failed to
> write expression: GenericUDAFEvaluator$Mode=Class.new()". This is due to the
> fact that XMLEncoder in the JRE's beans.jar is not able to serialize classes
> with Enums without explicitly having an EnumPersistenceDelegate assigned to
> each class that needs to be serialized. This was an issue in SUN JDK 1.5 but
> not 1.6, and is still an issue in IBM Java 1.6.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1683) Column aliases cannot be used in a group by clause

2010-10-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916960#action_12916960
 ] 

John Sichi commented on HIVE-1683:
--

This is not a bug and should be closed as invalid.  Conceptually, GROUP BY 
processing happens before SELECT list computation, so it would be circular to 
allow such references.

> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1673:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Yongqiang

> Create table bug causes the row format property lost when serde is specified.
> -
>
> Key: HIVE-1673
> URL: https://issues.apache.org/jira/browse/HIVE-1673
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1673.1.patch
>
>
> An example:
> create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile;
> will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1687) smb_mapjoin_8.q in TestMinimrCliDriver hangs/fails

2010-10-01 Thread Namit Jain (JIRA)
smb_mapjoin_8.q in TestMinimrCliDriver hangs/fails
--

 Key: HIVE-1687
 URL: https://issues.apache.org/jira/browse/HIVE-1687
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Joydeep Sen Sarma


The test never seems to succeed for me, although it is OK for many other people.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-10-01 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: HIVE-1157.patch.v6.txt

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt, 
> HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-10-01 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916956#action_12916956
 ] 

Philip Zeyliger commented on HIVE-1157:
---

Namit,

Thanks for the review.  I've fixed the test failures.  The one you pointed out 
was a missing log line from the results.  And there was a second one having to 
do with relative paths.

Oddly enough, however, when I tried to bring the changes up to current trunk,
it turned out that HIVE-1624 conflicted and, when I looked at it, turned out
to supply the same feature as this patch. I'll upload the fixed patch for
posterity, but it looks like this issue is no longer necessary.

-- Philip

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1686) XMLEncoder failing to serialize classes containing Enums for non-SUN JREs

2010-10-01 Thread Stephen Watt (JIRA)
XMLEncoder failing to serialize classes containing Enums for non-SUN JREs
-

 Key: HIVE-1686
 URL: https://issues.apache.org/jira/browse/HIVE-1686
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.5.0
 Environment: SLES 10 SP2, IBM Java 1.6 SR8
Reporter: Stephen Watt
Priority: Minor
 Fix For: 0.5.1


If one is using Hive 0.5 with IBM Java 1.6, certain Hive queries will fail in
the Hive CLI, such as "SELECT Count(1) from TABLE", with the error "failed to
write expression: GenericUDAFEvaluator$Mode=Class.new()". This is due to the
fact that XMLEncoder in the JRE's beans.jar is not able to serialize classes
with Enums without explicitly having an EnumPersistenceDelegate assigned to
each class that needs to be serialized. This was an issue in SUN JDK 1.5 but
not 1.6, and is still an issue in IBM Java 1.6.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-307) "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name

2010-10-01 Thread Kirk True (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated HIVE-307:
---

Attachment: HIVE-307.patch

Here is a resubmission of the patch with a CLI-based unit test.

Let me know what else needs to be addressed.

Thanks!

> "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the 
> same name
> --
>
> Key: HIVE-307
> URL: https://issues.apache.org/jira/browse/HIVE-307
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Zheng Shao
>Assignee: Kirk True
>Priority: Critical
> Attachments: HIVE-307.patch, HIVE-307.patch
>
>
> Failed with exception checkPaths: 
> /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already 
> exists
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1683) Column aliases cannot be used in a group by clause

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916953#action_12916953
 ] 

Namit Jain commented on HIVE-1683:
--

A workaround is to use the original expression:

select col1, count(col2) from test group by col1;

> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t




[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916951#action_12916951
 ] 

Namit Jain commented on HIVE-1673:
--

scriptfile1.q and smb_mapjoin_8.q in TestMiniMrCliDriver are also unrelated - 
I have filed independent JIRAs for them as well.


> Create table bug causes the row format property lost when serde is specified.
> -
>
> Key: HIVE-1673
> URL: https://issues.apache.org/jira/browse/HIVE-1673
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1673.1.patch
>
>
> An example:
> create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
> DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
> will lose the row format information.




[jira] Created: (HIVE-1685) scriptfile1.1 in minimr failing intermittently

2010-10-01 Thread Namit Jain (JIRA)
scriptfile1.1 in minimr failing intermittently
-

 Key: HIVE-1685
 URL: https://issues.apache.org/jira/browse/HIVE-1685
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Joydeep Sen Sarma


 [junit] Begin query: scriptfile1.q
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/scriptfile1.q.out
 
/data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/scriptfile1.q.out
[junit] 1c1
[junit] < PREHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
STRING)
[junit] ---
[junit] > PREHOOK: query: CREATE TABLE dest1(key INT, value STRING)
[junit] 3c3
[junit] < POSTHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
STRING)
[junit] ---
[junit] > POSTHOOK: query: CREATE TABLE dest1(key INT, value STRING)
[junit] 5c5
[junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
[junit] ---
[junit] > POSTHOOK: Output: defa...@dest1
[junit] 12c12
[junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
tmap.tvalue
[junit] ---
[junit] junit.framework.AssertionFailedError: Client execution results 
failed with error code = 1
[junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
[junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
[junit] 15c15
[junit] at junit.framework.Assert.fail(Assert.java:47)
[junit] < PREHOOK: Output: defa...@scriptfile1_dest1
[junit] at 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_scriptfile1(TestMinimrCliDriver.java:522)
[junit] ---
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] > PREHOOK: Output: defa...@dest1
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] 22c22
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
tmap.tvalue
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] ---
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)
[junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)
[junit] 25,28c25,28
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] < POSTHOOK: Lineage: scriptfile1_dest1.key SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit] < POSTHOOK: Lineage: scriptfile1_dest1.value SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit] < PREHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_dest1
[junit] at junit.framework.TestSuite.run(TestSuite.java:203)
[junit] ---
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] > POSTHOOK: Output: defa...@dest1
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] > POSTHOOK: Lineage: dest1.key SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] > POSTHOOK: Lineage: dest1.value SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] > PREHOOK: query: SELECT dest1.* FROM dest1
[junit] 30,32c30,32
[junit] < PREHOOK: Input: defa...@scriptfile1_dest1
[junit] < PREHOOK: Output: 
hdfs://localhost.localdomain:59220/data/users/njain/hive_commit1/hive_commit1/build/ql/scratchdir/hive_2010-09-30_01-24-37_987_7722845044472176538/-mr-1
[junit] < POSTHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_des

Build failed in Hudson: Hive-trunk-h0.19 #556

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[heyongqiang] HIVE-1624. Patch to allow scripts in S3 location. (Vaibhav 
Aggarwal via He Yongqiang)

--
[...truncated 12232 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK

RE: release 0.6

2010-10-01 Thread Namit Jain
I am not sure what kind of downtime it would involve for us (Facebook).

We will have to make a copy of the production metastore, and then perform the 
changes.
If that takes a long time, we will have to come up with some quicker upgrade 
solution; we will try to do that today and get back to you.


Thanks,
-namit


From: Carl Steinbach [mailto:c...@cloudera.com]
Sent: Thursday, September 30, 2010 11:23 PM
To: Namit Jain
Cc: hive-dev@hadoop.apache.org
Subject: Re: release 0.6

Hi Namit,
It used to be much higher in the beginning, but quite a few users reported 
problems on some MySQL DBs. 767 seemed to work on most DBs. Before committing, 
can someone test this on some different DBs (with and without UTF encoding)?

Copying my response to Prasad from HIVE-1364:
"It's possible that people who ran into problems before were using a version of 
MySQL older than 5.0.3. These versions supported a 255 byte max length for 
VARCHARs. It's also possible that older versions of the package.jdo mapping 
contained more indexes, in which case the 767 byte limit holds. Also, UTF 
encoding should not make a difference since these are byte lengths, not 
character lengths."

Another point is that HIVE-675 added two 4000 byte VARCHARs to the mapping, and 
this patch is present in both trunk and the 0.6.0 branch. I haven't heard that 
anyone is experiencing problems because of this.

Do we really need it for 0.6, or should we test it properly, take our time, and 
then commit it if needed?

Yes, I think we really need these changes. Several people have already 
commented on the list about hitting the 767 byte limit while using the HBase 
storage handler.

What kind of testing regimen do you think is necessary for this change?

Thanks.

Carl



[jira] Created: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Namit Jain (JIRA)
intermittent failures in create_escape.q


 Key: HIVE-1684
 URL: https://issues.apache.org/jira/browse/HIVE-1684
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang


[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
 
/data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
[junit] 48d47
[junit] <   serialization.format\t  
[junit] 49a49
[junit] >   serialization.format\t  


Sometimes I see the above failure. 

It does not always happen, and needs to be investigated.




[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916944#action_12916944
 ] 

Namit Jain commented on HIVE-1673:
--

create_escape.q in TestCliDriver is failing intermittently, but it has nothing 
to do with the current patch.

> Create table bug causes the row format property lost when serde is specified.
> -
>
> Key: HIVE-1673
> URL: https://issues.apache.org/jira/browse/HIVE-1673
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1673.1.patch
>
>
> An example:
> create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
> DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
> will lose the row format information.




Build failed in Hudson: Hive-trunk-h0.18 #556

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[heyongqiang] HIVE-1624. Patch to allow scripts in S3 location. (Vaibhav 
Aggarwal via He Yongqiang)

--
[...truncated 30997 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK

[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916927#action_12916927
 ] 

Ning Zhang commented on HIVE-1611:
--

Hi Alex, some questions:
 - Hive doesn't have the file author/src/documentation/skinconf.xml, which is 
included in the patch. How does this work?
 - This patch and its comments suggest it is meant for Hadoop subprojects. Hive 
is transitioning to a TLP independent of Hadoop. Will there be an issue after 
the transition?

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources, 
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was 
> already added to the site's skin (common for all Hadoop-related projects), 
> so this issue is about enabling it for Hive. The ultimate goal is to use it 
> at all of Hadoop's sub-projects' sites.




Build failed in Hudson: Hive-trunk-h0.17 #555

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[heyongqiang] HIVE-1624. Patch to allow scripts in S3 location. (Vaibhav 
Aggarwal via He Yongqiang)

[namit] HIVE-1670 MapJoin throws an error if no column from the mapjoined table 
is selected
(Ning Zhang via namit)

--
[...truncated 10872 lines...]
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[

[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916880#action_12916880
 ] 

Alex Baranau commented on HIVE-1611:


Sorry to ping, but it looks like the issue was abandoned by committers once it 
was assigned to me...

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use the search-hadoop.com service to make search available in Hive sources, 
> MLs, wiki, etc.
> This was initially proposed on the user mailing list. The search service was 
> already added to the site's skin (common for all Hadoop-related projects), 
> so this issue is about enabling it for Hive. The ultimate goal is to use it 
> at all of Hadoop's sub-projects' sites.




[jira] Commented: (HIVE-1612) Cannot build hive for hadoop 0.21.0

2010-10-01 Thread bharath v (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916869#action_12916869
 ] 

bharath v commented on HIVE-1612:
-

Anyone working on this issue?

> Cannot build hive for hadoop 0.21.0
> ---
>
> Key: HIVE-1612
> URL: https://issues.apache.org/jira/browse/HIVE-1612
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: AJ Pahl
>
> Current trunk for 0.7.0 does not support building HIVE against the Hadoop 
> 0.21.0 release.




[jira] Created: (HIVE-1683) Column aliases cannot be used in a group by clause

2010-10-01 Thread Shrikrishna Lawande (JIRA)
Column aliases cannot be used in a group by clause
--

 Key: HIVE-1683
 URL: https://issues.apache.org/jira/browse/HIVE-1683
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shrikrishna Lawande


Column aliases cannot be used in a group by clause


Following query would fail :

select col1 as t, count(col2) from test group by t;
FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
Reference t





[jira] Commented: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916842#action_12916842
 ] 

Thiruvel Thirumoolan commented on HIVE-1452:


> However, I see different results when MAPJOIN is used. Will open another JIRA 
> for the same.

I have opened HIVE-1682 for the same.

> Mapside join on non partitioned table with partitioned table causes error
> -
>
> Key: HIVE-1452
> URL: https://issues.apache.org/jira/browse/HIVE-1452
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Thiruvel Thirumoolan
>
> I am running a script which involves two tables: one is dynamically partitioned 
> and stored in RCFile format, and the other is stored as a text file.
> The text file is around 397 MB in size and has around 24 million rows.
> {code}
> drop table joinquery;
> create external table joinquery (
>   id string,
>   type string,
>   sec string,
>   num string,
>   url string,
>   cost string,
>   listinfo array >
> ) 
> STORED AS TEXTFILE
> LOCATION '/projects/joinquery';
> CREATE EXTERNAL TABLE idtable20mil(
> id string
> )
> STORED AS TEXTFILE
> LOCATION '/projects/idtable20mil';
> insert overwrite table joinquery
>select 
>   /*+ MAPJOIN(idtable20mil) */
>   rctable.id,
>   rctable.type,
>   rctable.map['sec'],
>   rctable.map['num'],
>   rctable.map['url'],
>   rctable.map['cost'],
>   rctable.listinfo
> from rctable
> JOIN  idtable20mil on (rctable.id = idtable20mil.id)
> where
> rctable.id is not null and
> rctable.part='value' and
> rctable.subpart='value'and
> rctable.pty='100' and
> rctable.uniqid='1000'
> order by id;
> {code}
> Result:
> Possible error:
>   Data file split:string,part:string,subpart:string,subsubpart:string> is 
> corrupted.
> Solution:
>   Replace file. i.e. by re-running the query that produced the source table / 
> partition.
> -
> If I look at mapper logs.
> {verbatim}
> Caused by: java.io.IOException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.htree.HashBucket.readExternal(HashBucket.java:284)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.helper.Serialization.deserialize(Serialization.java:106)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.helper.DefaultSerializer.deserialize(DefaultSerializer.java:106)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:360)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:332)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.htree.HashDirectory.get(HashDirectory.java:195)
>   at org.apache.hadoop.hive.ql.util.jdbm.htree.HTree.get(HTree.java:155)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.get(HashMapWrapper.java:114)
>   ... 11 more
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at 
> java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
>   at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
>   at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:98)
> {verbatim}
> I am trying to create a testcase, which can demonstrate this error.




[jira] Created: (HIVE-1682) Wrong results with MAPJOIN when cols from non-MAPJOINed table are selected

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)
Wrong results with MAPJOIN when cols from non-MAPJOINed table are selected
--

 Key: HIVE-1682
 URL: https://issues.apache.org/jira/browse/HIVE-1682
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
 Environment: Hive trunk (rev 1003407)
Hadoop 20.2
Reporter: Thiruvel Thirumoolan


The results of this query are wrong:

set hive.mapjoin.cache.numrows=100;
select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar 
= invites.bar);

The results of all the queries below match:

/* This is the same as the problematic query without specifying numrows, which 
defaults to 25k, much greater than the number of rows in the pokes table */
select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar 
= invites.bar)

set hive.mapjoin.cache.numrows=100;
select /*+ MAPJOIN(invites) */ invites.bar from pokes join invites on 
(pokes.bar = invites.bar);

select invites.bar from pokes join invites on (pokes.bar = invites.bar);

select pokes.bar from pokes join invites on (pokes.bar = invites.bar);




[jira] Resolved: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan resolved HIVE-1452.


Resolution: Duplicate

HIVE-1670 fixes the EOF issue, and I don't see the problem with the queries I 
used above. Hence I am closing this one.

However, I see different results when MAPJOIN is used. Will open another JIRA 
for the same.

> Mapside join on non partitioned table with partitioned table causes error
> -
>
> Key: HIVE-1452
> URL: https://issues.apache.org/jira/browse/HIVE-1452
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Thiruvel Thirumoolan
>
> I am running a script which involves two tables: one is dynamically partitioned 
> and stored in RCFile format, and the other is stored as a text file.
> The text file is around 397 MB in size and has around 24 million rows.
> {code}
> drop table joinquery;
> create external table joinquery (
>   id string,
>   type string,
>   sec string,
>   num string,
>   url string,
>   cost string,
>   listinfo array >
> ) 
> STORED AS TEXTFILE
> LOCATION '/projects/joinquery';
> CREATE EXTERNAL TABLE idtable20mil(
> id string
> )
> STORED AS TEXTFILE
> LOCATION '/projects/idtable20mil';
> insert overwrite table joinquery
>select 
>   /*+ MAPJOIN(idtable20mil) */
>   rctable.id,
>   rctable.type,
>   rctable.map['sec'],
>   rctable.map['num'],
>   rctable.map['url'],
>   rctable.map['cost'],
>   rctable.listinfo
> from rctable
> JOIN  idtable20mil on (rctable.id = idtable20mil.id)
> where
> rctable.id is not null and
> rctable.part='value' and
> rctable.subpart='value'and
> rctable.pty='100' and
> rctable.uniqid='1000'
> order by id;
> {code}
> Result:
> Possible error:
>   Data file split:string,part:string,subpart:string,subsubpart:string> is 
> corrupted.
> Solution:
>   Replace file. i.e. by re-running the query that produced the source table / 
> partition.
> -
> If I look at mapper logs.
> {verbatim}
> Caused by: java.io.IOException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.htree.HashBucket.readExternal(HashBucket.java:284)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.helper.Serialization.deserialize(Serialization.java:106)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.helper.DefaultSerializer.deserialize(DefaultSerializer.java:106)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:360)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:332)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.htree.HashDirectory.get(HashDirectory.java:195)
>   at org.apache.hadoop.hive.ql.util.jdbm.htree.HTree.get(HTree.java:155)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.get(HashMapWrapper.java:114)
>   ... 11 more
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at 
> java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
>   at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
>   at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:98)
> {verbatim}
> I am trying to create a testcase, which can demonstrate this error.




[jira] Assigned: (HIVE-1678) NPE in MapJoin

2010-10-01 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reassigned HIVE-1678:
-

Assignee: Amareshwari Sriramadasu

> NPE in MapJoin 
> ---
>
> Key: HIVE-1678
> URL: https://issues.apache.org/jira/browse/HIVE-1678
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> The query with two map joins and a group by fails with the following NPE:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)




[jira] Updated: (HIVE-1681) ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back

2010-10-01 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1681:
-

Attachment: HIVE-1681.1.patch.txt

> ObjectStore.commitTransaction() does not properly handle transactions that 
> have already been rolled back
> 
>
> Key: HIVE-1681
> URL: https://issues.apache.org/jira/browse/HIVE-1681
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1681.1.patch.txt
>
>
> Here's the code for ObjectStore.commitTransaction() and 
> ObjectStore.rollbackTransaction():
> {code}
>   public boolean commitTransaction() {
> assert (openTrasactionCalls >= 1);
> if (!currentTransaction.isActive()) {
>   throw new RuntimeException(
>   "Commit is called, but transaction is not active. Either there are"
>   + " mismatching open and close calls or rollback was called in 
> the same trasaction");
> }
> openTrasactionCalls--;
> if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
>   transactionStatus = TXN_STATUS.COMMITED;
>   currentTransaction.commit();
> }
> return true;
>   }
>   public void rollbackTransaction() {
> if (openTrasactionCalls < 1) {
>   return;
> }
> openTrasactionCalls = 0;
> if (currentTransaction.isActive()
> && transactionStatus != TXN_STATUS.ROLLBACK) {
>   transactionStatus = TXN_STATUS.ROLLBACK;
>   // could already be rolled back
>   currentTransaction.rollback();
> }
>   }
> {code}
> Now suppose a nested transaction throws an exception which results
> in the nested pseudo-transaction calling rollbackTransaction(). This causes
> rollbackTransaction() to rollback the actual transaction, as well as to set 
> openTransactionCalls=0 and transactionStatus = TXN_STATUS.ROLLBACK.
> Suppose also that this nested transaction squelches the original exception.
> In this case the stack will unwind and the caller will eventually try to 
> commit the
> transaction by calling commitTransaction() which will see that 
> currentTransaction.isActive() returns
> FALSE and will throw a RuntimeException. The fix for this problem is
> that commitTransaction() needs to first check transactionStatus and return 
> immediately
> if transactionStatus==TXN_STATUS.ROLLBACK.




[jira] Created: (HIVE-1681) ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back

2010-10-01 Thread Carl Steinbach (JIRA)
ObjectStore.commitTransaction() does not properly handle transactions that have 
already been rolled back


 Key: HIVE-1681
 URL: https://issues.apache.org/jira/browse/HIVE-1681
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0, 0.6.0, 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach


Here's the code for ObjectStore.commitTransaction() and 
ObjectStore.rollbackTransaction():

{code}
  public boolean commitTransaction() {
assert (openTrasactionCalls >= 1);
if (!currentTransaction.isActive()) {
  throw new RuntimeException(
  "Commit is called, but transaction is not active. Either there are"
  + " mismatching open and close calls or rollback was called in 
the same trasaction");
}
openTrasactionCalls--;
if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
  transactionStatus = TXN_STATUS.COMMITED;
  currentTransaction.commit();
}
return true;
  }

  public void rollbackTransaction() {
if (openTrasactionCalls < 1) {
  return;
}
openTrasactionCalls = 0;
if (currentTransaction.isActive()
&& transactionStatus != TXN_STATUS.ROLLBACK) {
  transactionStatus = TXN_STATUS.ROLLBACK;
  // could already be rolled back
  currentTransaction.rollback();
}
  }

{code}

Now suppose a nested transaction throws an exception which results
in the nested pseudo-transaction calling rollbackTransaction(). This causes
rollbackTransaction() to rollback the actual transaction, as well as to set 
openTransactionCalls=0 and transactionStatus = TXN_STATUS.ROLLBACK.
Suppose also that this nested transaction squelches the original exception.
In this case the stack will unwind and the caller will eventually try to commit 
the
transaction by calling commitTransaction() which will see that 
currentTransaction.isActive() returns
FALSE and will throw a RuntimeException. The fix for this problem is
that commitTransaction() needs to first check transactionStatus and return 
immediately
if transactionStatus==TXN_STATUS.ROLLBACK.
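
A minimal sketch of that guard, reusing the fields from the snippet above. 
Returning false (rather than true) is an assumption: it lets the caller's 
if-not-success rollback idiom run, where rollbackTransaction() is then a no-op:

{code}
public boolean commitTransaction() {
  // Guard described above: a nested call already rolled the transaction
  // back, so return immediately instead of tripping over the
  // inactive-transaction check below.
  if (TXN_STATUS.ROLLBACK == transactionStatus) {
    return false;
  }
  assert (openTrasactionCalls >= 1);
  if (!currentTransaction.isActive()) {
    throw new RuntimeException(
        "Commit is called, but transaction is not active. Either there are"
        + " mismatching open and close calls or rollback was called in"
        + " the same trasaction");
  }
  openTrasactionCalls--;
  if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
    transactionStatus = TXN_STATUS.COMMITED;
    currentTransaction.commit();
  }
  return true;
}
{code}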





[jira] Created: (HIVE-1680) EXPLAIN for a CTAS query fails if the table already exists in the system

2010-10-01 Thread Shrikrishna Lawande (JIRA)
EXPLAIN for a CTAS query  fails if the table already exists in the system
-

 Key: HIVE-1680
 URL: https://issues.apache.org/jira/browse/HIVE-1680
 Project: Hadoop Hive
  Issue Type: Bug
  Components: CLI
Reporter: Shrikrishna Lawande
Priority: Minor


EXPLAIN for a CTAS query (create table as select) throws the error "FAILED: 
Error in semantic analysis: org.apache.hadoop.hive.ql.parse.SemanticException: 
Table already exists: temp" if the table already exists in the system.

The semantic analyzer should not validate the table name in the create table 
statement for 'EXPLAIN' queries.

To repro:


create table test (col1 int, col2 int);

 create table test1 as select * from test;


 explain create table test1 as select * from test;
FAILED: Error in semantic analysis: 
org.apache.hadoop.hive.ql.parse.SemanticException: Table already exists: test1
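
A sketch of the suggested change; isExplain, db, and the helper name are 
hypothetical, for illustration only, and not the actual SemanticAnalyzer code:

{code}
// Hypothetical guard: only reject an existing target table when the CTAS
// will actually be executed, so EXPLAIN still produces a plan.
private void validateCtasTarget(String tableName, boolean isExplain)
    throws SemanticException {
  if (isExplain) {
    return;
  }
  if (db.getTable(tableName, false) != null) {
    throw new SemanticException("Table already exists: " + tableName);
  }
}
{code}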





[jira] Created: (HIVE-1679) MetaStore does not detect and rollback failed transactions

2010-10-01 Thread Carl Steinbach (JIRA)
MetaStore does not detect and rollback failed transactions
--

 Key: HIVE-1679
 URL: https://issues.apache.org/jira/browse/HIVE-1679
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0, 0.6.0, 0.7.0
Reporter: Carl Steinbach


Most of the methods in HiveMetaStore and ObjectStore adhere to the following 
idiom when 
interacting with the ObjectStore:

{code}
boolean success = false;
try {
  ms.openTransaction();
  /* do some stuff */
  success = ms.commitTransaction();
} finally {
  if (!success) {
ms.rollbackTransaction();
  }
}
{code}

The problem with this is that ObjectStore.commitTransaction() always returns 
TRUE:

{code}
  public boolean commitTransaction() {
assert (openTrasactionCalls >= 1);
if (!currentTransaction.isActive()) {
  throw new RuntimeException(
  "Commit is called, but transaction is not active. Either there are"
  + " mismatching open and close calls or rollback was called in 
the same trasaction");
}
openTrasactionCalls--;
if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
  transactionStatus = TXN_STATUS.COMMITED;
  currentTransaction.commit();
}
return true;
  }
{code}


Consequently, the transaction appears to always succeed, and ObjectStore is never
directed to roll back transactions that have actually failed. 
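
A sketch of one possible shape of a fix, keeping the snippet's fields and 
folding in the HIVE-1681 guard; the try/catch placement is an assumption, not 
an actual patch:

{code}
public boolean commitTransaction() {
  if (TXN_STATUS.ROLLBACK == transactionStatus) {
    return false; // already rolled back by a nested call (see HIVE-1681)
  }
  assert (openTrasactionCalls >= 1);
  openTrasactionCalls--;
  if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
    try {
      currentTransaction.commit();
      transactionStatus = TXN_STATUS.COMMITED;
    } catch (RuntimeException e) {
      // Surface the failure so the caller's if (!success) idiom fires,
      // instead of unconditionally reporting success.
      transactionStatus = TXN_STATUS.ROLLBACK;
      if (currentTransaction.isActive()) {
        currentTransaction.rollback();
      }
      return false;
    }
  }
  return true;
}
{code}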


