[jira] Commented: (HIVE-307) LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name

2010-09-17 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910799#action_12910799
 ] 

Ashish Thusoo commented on HIVE-307:


Hi Kirk,

Thanks for the contribution. Can you add a simple test case with your patch?

Ashish

 LOAD DATA LOCAL INPATH fails when the table already contains a file of the 
 same name
 --

 Key: HIVE-307
 URL: https://issues.apache.org/jira/browse/HIVE-307
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Zheng Shao
Priority: Critical
 Attachments: HIVE-307.patch


 Failed with exception checkPaths: 
 /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already 
 exists
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
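For context, a minimal HiveQL sequence that can reproduce the failure described above (a sketch only; the table and file names follow the error message, and the local path is hypothetical):

  CREATE TABLE tmp_user_msg_history (msg STRING);

  -- The first load copies the local file into the table's warehouse directory.
  LOAD DATA LOCAL INPATH '/tmp/test_user_msg_history'
  INTO TABLE tmp_user_msg_history;

  -- Loading a file of the same name again (without OVERWRITE) hits the
  -- checkPaths error instead of renaming or replacing the existing copy.
  LOAD DATA LOCAL INPATH '/tmp/test_user_msg_history'
  INTO TABLE tmp_user_msg_history;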



[jira] Updated: (HIVE-307) LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name

2010-09-17 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-307:
---

  Status: Open  (was: Patch Available)
Assignee: Kirk True

Cancelling the patch because of a missing test case. Kirk, it would be great if 
you could resubmit with the test case. Otherwise the code looks fine to me.

Ashish

 LOAD DATA LOCAL INPATH fails when the table already contains a file of the 
 same name
 --

 Key: HIVE-307
 URL: https://issues.apache.org/jira/browse/HIVE-307
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Zheng Shao
Assignee: Kirk True
Priority: Critical
 Attachments: HIVE-307.patch


 Failed with exception checkPaths: 
 /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already 
 exists
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] Hive as a TLP

2010-08-30 Thread Ashish Thusoo
With 10 +1 votes this vote passes.

Owen, 

Please forward this to the Apache board.

Thanks,
Ashish 

-Original Message-
From: Tom White [mailto:t...@cloudera.com] 
Sent: Friday, August 27, 2010 10:24 AM
To: gene...@hadoop.apache.org
Subject: Re: [VOTE] Hive as a TLP

+1

Tom

On Thu, Aug 26, 2010 at 1:01 PM, Ashish Thusoo athu...@facebook.com wrote:
 The Hive development community voted and passed the following 
 resolution. The details of the vote are at

 http://www.bit.ly/aJogyU

 The PMC will comprise the current committers on Hive (as of 8/24/2010), 
 with Namit Jain as the chair.

 Please vote on sending this resolution to the Apache Board.

 Thanks,
 Ashish

 Draft Resolution to be sent to the Apache Board
 ---

 Establish the Apache Hive Project

         WHEREAS, the Board of Directors deems it to be in the best
         interests of the Foundation and consistent with the
         Foundation's purpose to establish a Project Management
         Committee charged with the creation and maintenance of
         open-source software related to parallel analysis of large
         data sets for distribution at no charge to the public.

         NOW, THEREFORE, BE IT RESOLVED, that a Project Management
         Committee (PMC), to be known as the Apache Hive Project,
         be and hereby is established pursuant to Bylaws of the
         Foundation; and be it further

         RESOLVED, that the Apache Hive Project be and hereby is
         responsible for the creation and maintenance of software
         related to parallel analysis of large data sets; and be
         it further

         RESOLVED, that the office of Vice President, Apache Hive be
         and hereby is created, the person holding such office to
         serve at the direction of the Board of Directors as the chair
         of the Apache Hive Project, and to have primary responsibility
         for management of the projects within the scope of
         responsibility of the Apache Hive Project; and be it further

         RESOLVED, that the persons listed immediately below be and
         hereby are appointed to serve as the initial members of the
         Apache Hive Project:
             * Namit Jain (na...@apache.org)
             * John Sichi (j...@apache.org)
             * Zheng Shao (zs...@apache.org)
             * Edward Capriolo (appodic...@apache.org)
             * Raghotham Murthy (r...@apache.org)
             * Ning Zhang (nzh...@apache.org)
             * Paul Yang (pa...@apache.org)
             * He Yongqiang (heyongqi...@apache.org)
             * Prasad Chakka (pras...@apache.org)
             * Joydeep Sen Sarma (jsensa...@apache.org)
             * Ashish Thusoo (athu...@apache.org)

         NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain
         be appointed to the office of Vice President, Apache Hive, to
         serve in accordance with and subject to the direction of the
         Board of Directors and the Bylaws of the Foundation until
         death, resignation, retirement, removal or disqualification,
         or until a successor is appointed; and be it further

         RESOLVED, that the initial Apache Hive PMC be and hereby is
         tasked with the creation of a set of bylaws intended to
         encourage open development and increased participation in the
         Apache Hive Project; and be it further

         RESOLVED, that the Apache Hive Project be and hereby
         is tasked with the migration and rationalization of the Apache
         Hadoop Hive sub-project; and be it further

         RESOLVED, that all responsibilities pertaining to the Apache
         Hive sub-project encumbered upon the
         Apache Hadoop Project are hereafter discharged.



[VOTE] Draft Resolution to make Hive a TLP

2010-08-24 Thread Ashish Thusoo
Folks,

I am going to make the following proposal at gene...@hadoop.apache.org

In summary this proposal does the following things:

1. Establishes the PMC as comprising the current committers of Hive (as of 
today - 8/24/2010).

2. Proposes Namit Jain as the chair of the project (PMC chairs have no more 
power than other PMC members, but they are responsible for writing regular 
reports for the Apache board, assigning rights to new committers, etc.).

3. Tasks the PMC to come up with the bylaws for governance of the project.

Please vote on this as soon as possible (yes, I should have done this as part of 
the earlier vote, but please bear with me), so that we can get the ball rolling 
on this...

Thanks,
Ashish

Draft Resolution to be sent to the Apache Board
---

Establish the Apache Hive Project

 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to parallel analysis of large
 data sets for distribution at no charge to the public.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the Apache Hive Project,
 be and hereby is established pursuant to Bylaws of the
 Foundation; and be it further

 RESOLVED, that the Apache Hive Project be and hereby is
 responsible for the creation and maintenance of software
 related to parallel analysis of large data sets; and be
 it further

 RESOLVED, that the office of Vice President, Apache Hive be
 and hereby is created, the person holding such office to
 serve at the direction of the Board of Directors as the chair
 of the Apache Hive Project, and to have primary responsibility
 for management of the projects within the scope of
 responsibility of the Apache Hive Project; and be it further

 RESOLVED, that the persons listed immediately below be and
 hereby are appointed to serve as the initial members of the
 Apache Hive Project:
 * Namit Jain (na...@apache.org)
 * John Sichi (j...@apache.org)
 * Zheng Shao (zs...@apache.org)
 * Edward Capriolo (appodic...@apache.org)
 * Raghotham Murthy (r...@apache.org)
 * Ning Zhang (nzh...@apache.org)
 * Paul Yang (pa...@apache.org)
 * He Yongqiang (heyongqi...@apache.org)
 * Prasad Chakka (pras...@apache.org)
 * Joydeep Sen Sarma (jsensa...@apache.org)
 * Ashish Thusoo (athu...@apache.org)

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain
 be appointed to the office of Vice President, Apache Hive, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification,
 or until a successor is appointed; and be it further

 RESOLVED, that the initial Apache Hive PMC be and hereby is
 tasked with the creation of a set of bylaws intended to
 encourage open development and increased participation in the
 Apache Hive Project; and be it further

 RESOLVED, that the Apache Hive Project be and hereby
 is tasked with the migration and rationalization of the Apache
 Hadoop Hive sub-project; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache
 Hive sub-project encumbered upon the
 Apache Hadoop Project are hereafter discharged.



RE: [DISCUSSION] Move to become a TLP

2010-08-20 Thread Ashish Thusoo
Thanks everyone who voted. Looks like this is unanimous at this point. I will 
start the proceedings in the Hadoop PMC to make Hive a TLP.

Ashish 

-Original Message-
From: Paul Yang [mailto:py...@facebook.com] 
Sent: Thursday, August 19, 2010 4:05 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-Original Message-
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Thursday, August 19, 2010 3:30 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-Original Message-
From: Carl Steinbach [mailto:c...@cloudera.com]
Sent: Thursday, August 19, 2010 3:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

+1

On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang nzh...@facebook.com wrote:

 +1 as well.

 On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:

  +1.
 
  Zheng
 
  On Mon, Aug 16, 2010 at 11:58 AM, John Sichi jsi...@facebook.com
 wrote:
  +1 from me.  The momentum on cross-company collaboration we're seeing
 now, plus big integration contributions such as the new storage
 handlers (HyperTable and Cassandra), are all signs that Hive is growing up
 fast.
 
  HBase recently took the same route, so I'm going to have a chat 
  with
 Jonathan Gray to find out what that involved for them.
 
  JVS
 
  On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:
 
  Yes, I think Hive is ready to become a TLP.
 
  On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo 
  athu...@facebook.com
 wrote:
 
  Nice one Ed...
 
  Folks,
 
  Please chime in. I think we should close this out next week one 
  way or
 the
  other. We can consider this a vote at this point, so please vote 
  on
 this
  issue.
 
  Thanks,
  Ashish
 
  -Original Message-
  From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
  Sent: Thursday, August 12, 2010 8:05 AM
  To: hive-dev@hadoop.apache.org
  Subject: Re: [DISCUSSION] Move to become a TLP
 
  On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo 
  athu...@facebook.com
  wrote:
  Folks,
 
   This question has come up in the PMC once again and it would be
   great to hear once more on this topic. What do people think? Are we
   ready to become a TLP?
 
  Thanks,
  Ashish
 
  I thought of one more benefit. We can rename our packages from
 
  org.apache.hadoop.hive.*
  to
  org.apache.hive.*
 
  :)
 
 
 
 
 
 
  --
  Yours,
  Zheng
  http://www.linkedin.com/in/zshao




RE: [DISCUSSION] Move to become a TLP

2010-08-13 Thread Ashish Thusoo
Nice one Ed...

Folks,

Please chime in. I think we should close this out next week one way or the 
other. We can consider this a vote at this point, so please vote on this issue.

Thanks,
Ashish 

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, August 12, 2010 8:05 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo athu...@facebook.com wrote:
 Folks,

 This question has come up in the PMC once again and it would be great to hear 
 once more on this topic. What do people think? Are we ready to become a TLP?

 Thanks,
 Ashish

I thought of one more benefit. We can rename our packages from

org.apache.hadoop.hive.*
to
org.apache.hive.*

:)


[DISCUSSION] Move to become a TLP

2010-08-11 Thread Ashish Thusoo
Folks,

This question has come up in the PMC once again and it would be great to hear once 
more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish

RE: Hive should start moving to the new hadoop mapreduce api.

2010-07-29 Thread Ashish Thusoo
+1 to this

Ashish

-Original Message-
From: yongqiang he [mailto:heyongqiang...@gmail.com] 
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to 
start moving Hive to the new MapReduce context API, and also to start 
deprecating Hadoop 0.17.0 support in Hive.
Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?


Thanks


RE: Hive should start moving to the new hadoop mapreduce api.

2010-07-29 Thread Ashish Thusoo
Yes, these are mutually exclusive.

Ashish 

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, July 29, 2010 11:20 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Aren't these things mutually exclusive?
The new MapReduce API appeared in 20.
Deprecating 17 seems reasonable, but we still have to support the old API for 
18 and 19, correct?

On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo athu...@facebook.com wrote:
 +1 to this

 Ashish

 -Original Message-
 From: yongqiang he [mailto:heyongqiang...@gmail.com]
 Sent: Thursday, July 29, 2010 10:54 AM
 To: hive-dev@hadoop.apache.org
 Subject: Hive should start moving to the new hadoop mapreduce api.

 Hi all,

 In offline discussions while fixing HIVE-1492, we thought it may be good now 
 to start moving Hive to the new MapReduce context API, and also to 
 start deprecating Hadoop 0.17.0 support in Hive.
 Basically the new MapReduce API gives Hive more control at runtime.

 Any thoughts on this?


 Thanks



RE: Hive should start moving to the new hadoop mapreduce api.

2010-07-29 Thread Ashish Thusoo
Before deciding that, we should poll the user list to see if this would be too 
disruptive for anyone.

Ashish 

-Original Message-
From: Ning Zhang [mailto:nzh...@facebook.com] 
Sent: Thursday, July 29, 2010 12:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Maybe we should make hive-0.7 the last branch to support the hadoop pre-0.20 
API, and switch later branches of Hive to the new hadoop API?

On Jul 29, 2010, at 11:53 AM, Ashish Thusoo wrote:

 Yes these are mutually exclusive.
 
 Ashish
 
 -Original Message-
 From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
 Sent: Thursday, July 29, 2010 11:20 AM
 To: hive-dev@hadoop.apache.org
 Subject: Re: Hive should start moving to the new hadoop mapreduce api.
 
  Aren't these things mutually exclusive?
  The new MapReduce API appeared in 20.
  Deprecating 17 seems reasonable, but we still have to support the old API for 
  18 and 19, correct?
 
 On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo athu...@facebook.com wrote:
 +1 to this
 
 Ashish
 
 -Original Message-
 From: yongqiang he [mailto:heyongqiang...@gmail.com]
 Sent: Thursday, July 29, 2010 10:54 AM
 To: hive-dev@hadoop.apache.org
 Subject: Hive should start moving to the new hadoop mapreduce api.
 
 Hi all,
 
  In offline discussions while fixing HIVE-1492, we thought it may be good now 
  to start moving Hive to the new MapReduce context API, and also to 
  start deprecating Hadoop 0.17.0 support in Hive.
 Basically the new MapReduce API gives Hive more control at runtime.
 
 Any thoughts on this?
 
 
 Thanks
 



RE: [howldev] Initial thoughts on authorization in howl

2010-07-29 Thread Ashish Thusoo
Hi Pradeep,

I get from this note that the authorization that you are talking about here is 
basically the management of the permissions on the hdfs directories 
corresponding to the tables and the partitions. So from that angle this sounds 
good to me. There is a whole set of permissions/authorizations with regard to 
the metadata operations themselves, e.g. who should be able to run an alter table 
add column or describe table etc. I presume that would be beyond the scope of 
this change and would come in later? I am thinking more in terms of the 
permissions model that is supported in SQL using GRANT statements etc.

I also presume that by conf variables you mean the key value properties that 
Hive can store in the metadata and not the hive conf variables, right?

Ashish

-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Wednesday, July 28, 2010 2:22 PM
To: hive-dev@hadoop.apache.org
Subject: Fwd: [howldev] Initial thoughts on authorization in howl

Begin forwarded message:

From: Pradeep Kamath prade...@yahoo-inc.com
Date: July 27, 2010 4:38:42 PM PDT
To: howl...@yahoogroups.com
Subject: [howldev] Initial thoughts on authorization in howl
Reply-To: howl...@yahoogroups.com



The initial thoughts on authorization in howl are to model authorization (for 
DDL ops like create table/drop table/add partition etc) after hdfs permissions. 
To be able to do this, we would like to extend createTable() to add the ability 
to record a different group from the user's primary group and to record the 
complete unix permissions on the table directory. Also, we would like to have a 
way for partition directories to inherit permissions and group information 
based on the table directory. To keep the metastore backward compatible for use 
with hive, I propose having conf variables to achieve these objectives:
-  table.group.name - value will indicate the 
name of the unix group for the table directory. This will be used by 
createTable() to perform a chgrp to the value provided. This property will 
provide the user the ability to choose from one of the many unix groups he is 
part of to associate with the table.
-  table.permissions - value will be of the form rwxrwxrwx to indicate 
read-write-execute permissions on the table directory. This will be used by 
createTable() to perform a chmod to the value provided. This will let the user 
decide what permissions he wants on the table.
-  partitions.inherit.permissions - a value of true will indicate that 
partitions inherit the group name and permissions of the table level directory. 
This will be used by addPartition() to perform a chgrp and chmod to the values 
as on the table directory.

I favor conf properties over API changes since the complete authorization 
design for hive is not finalized yet. These properties can be 
deprecated/removed when that is in place. These properties would also be useful 
to some installations of vanilla hive since at least DFS-level authorization can 
now be achieved by hive without the user having to manually perform chgrp and 
chmod operations on DFS.

I would like to hear from hive developers/committers whether this would be 
acceptable for hive and also thoughts from others.

Pradeep
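To make the proposal concrete, a minimal sketch of how a session might use the three properties (the property names come from the note above and are a proposal, not shipped Hive configuration; the group, table, and columns are made up):

  set table.group.name=analysts;
  set table.permissions=rwxrwxr-x;
  set partitions.inherit.permissions=true;

  -- createTable() would chgrp/chmod the new table directory to match the
  -- values above; addPartition() would then copy the same group and mode
  -- onto each partition directory.
  CREATE TABLE clicks (ts STRING, url STRING)
  PARTITIONED BY (ds STRING);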






RE: Hive Web Interface Broken YET AGAIN!

2010-07-29 Thread Ashish Thusoo
Can you point to the JIRA that introduced this problem?

Ashish 

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, July 29, 2010 7:38 AM
To: hive-dev@hadoop.apache.org
Subject: Hive Web Interface Broken YET AGAIN!

All,

While the web interface is not as widely used as the cli, people do use it. Its 
init process has been broken three times that I can remember: once by the shims, 
once by adding version numbers to the jars, and now it is affected by the libjars.

[r...@etl02 ~]# hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

I notice someone patched the cli to deal with this. There is no test coverage 
for the shell scripts.

But it seems like only some of the scripts were repaired:

bin/ext/cli.sh
bin/ext/lineage.sh
bin/ext/metastore.sh

I wonder why only half the scripts were repaired? In general, if something 
changes in hive or hadoop that causes the cli to break, we should fix it across 
the board. I feel like every time a release is coming up I test drive the web 
interface, only to find that a simple script problem stops it from running.

Edward


[jira] Commented: (HIVE-417) Implement Indexing in Hive

2010-07-27 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892932#action_12892932
 ] 

Ashish Thusoo commented on HIVE-417:


Started looking at this. One initial question I had - why is the virtualcolumn 
class in the serde2 package?

 Implement Indexing in Hive
 --

 Key: HIVE-417
 URL: https://issues.apache.org/jira/browse/HIVE-417
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
 Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
 hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, 
 hive-indexing.5.thrift.patch, hive.indexing.11.patch, idx2.png, 
 indexing_with_ql_rewrites_trunk_953221.patch


 Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-417) Implement Indexing in Hive

2010-07-27 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892939#action_12892939
 ] 

Ashish Thusoo commented on HIVE-417:


Also, how is the file name populated? Is that not done through the IOContext?

 Implement Indexing in Hive
 --

 Key: HIVE-417
 URL: https://issues.apache.org/jira/browse/HIVE-417
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
 Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
 hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, 
 hive-indexing.5.thrift.patch, hive.indexing.11.patch, idx2.png, 
 indexing_with_ql_rewrites_trunk_953221.patch


 Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1264) Make Hive work with Hadoop security

2010-07-26 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-1264:
---

Assignee: Venkatesh S

 Make Hive work with Hadoop security
 ---

 Key: HIVE-1264
 URL: https://issues.apache.org/jira/browse/HIVE-1264
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Venkatesh S
 Attachments: HiveHadoop20S_patch.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security

2010-07-26 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892517#action_12892517
 ] 

Ashish Thusoo commented on HIVE-1264:
-

Can these changes be packed into the shims layer, so that all the calls can be 
replaced with calls to shims, with the shim for 20.1xx doing the right thing?

 Make Hive work with Hadoop security
 ---

 Key: HIVE-1264
 URL: https://issues.apache.org/jira/browse/HIVE-1264
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Venkatesh S
 Attachments: HiveHadoop20S_patch.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-417) Implement Indexing in Hive

2010-07-02 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884685#action_12884685
 ] 

Ashish Thusoo commented on HIVE-417:


Looked at the code and have some questions...

Can you explain how the metastore object model is laid out? It seems that the 
table names of the index are stored in key value properties of the table that 
the index is created on. Is that correct? Would it be better to put a key 
reference from the index table to the base table instead (similar to what is 
done for partitions)?

Also, how would this be used to query the table? Can you give an example?

Is the idea here to select from the index and then pass the offsets to another 
query to look up the table? An example or a test which shows the query on the 
base table would be useful.


 Implement Indexing in Hive
 --

 Key: HIVE-417
 URL: https://issues.apache.org/jira/browse/HIVE-417
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
 Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
 hive-indexing.3.patch, hive-indexing.5.thrift.patch, 
 indexing_with_ql_rewrites_trunk_953221.patch


 Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-02 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884699#action_12884699
 ] 

Ashish Thusoo commented on HIVE-287:


@John

Another disadvantage of doing C that I can think of is the fact that count 
would become a keyword, and then any columns named count would have to be quoted. 
Not a big deal, but just something that would be a side effect of going with C.


 count distinct on multiple columns does not work
 

 Key: HIVE-287
 URL: https://issues.apache.org/jira/browse/HIVE-287
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
 Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
 HIVE-287-4.patch


 The following query does not work:
 select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
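Until multi-column count distinct is supported directly, the usual workaround is to push the distinct into a subquery (a sketch using the table and columns from the report):

  SELECT count(1) FROM (
    SELECT DISTINCT col1, col2 FROM Tbl
  ) t;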



[jira] Created: (HIVE-1449) Table aliases in order by clause lead to semantic analysis failure

2010-07-02 Thread Ashish Thusoo (JIRA)
Table aliases in order by clause lead to semantic analysis failure
--

 Key: HIVE-1449
 URL: https://issues.apache.org/jira/browse/HIVE-1449
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Fix For: 0.7.0


A simple statement of the form

select a.account_id, count(1) from tmp_ash_test2 a group by a.account_id order 
by a.account_id;

throws a semantic analysis exception

whereas

select a.account_id, count(1) from tmp_ash_test2 a group by a.account_id order 
by account_id;

works fine (the second query does not have the table alias a in the order by 
clause).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-07-01 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884401#action_12884401
 ] 

Ashish Thusoo commented on HIVE-1428:
-

Is your question about the fact that build.dir is an empty string? build.dir gets 
defined in build-common.xml, which in turn picks up properties from 
build.properties. The build.xml in the metastore directory includes 
build-common.xml, so it should be getting build.dir. How are you running this 
test?


 ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
 --

 Key: HIVE-1428
 URL: https://issues.apache.org/jira/browse/HIVE-1428
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.6.0, 0.7.0
Reporter: Paul Yang
 Attachments: HIVE-1428.patch, TestHiveMetaStoreRemote.java


 If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
 PARTITION commands will fail with an error similar to the following:
 [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e ALTER TABLE 
 mytable add partition(datestamp = '20091101', srcid = '10',action) location 
 '/user/pradeepk/mytable/20091101/10';
 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
 in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
 core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
 core-default.xml, mapred-default.xml and hdfs-default.xml respectively
 Hive history 
 file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
 FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
 get_partition failed: unknown result
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 [prade...@chargesize:~/dev/howl]
 This is due to a check that tries to retrieve the partition to see if it 
 exists. If it does not, an attempt is made to pass a null value from the 
 metastore. Since thrift does not support null return values, an exception is 
 thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: predicate pushdown, HIVE-1395, and HIVE-1342

2010-06-28 Thread Ashish Thusoo
I will look into those.

Ashish

-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Monday, June 28, 2010 4:54 PM
To: hive-dev@hadoop.apache.org
Subject: predicate pushdown, HIVE-1395, and HIVE-1342

Could the person who originally developed predicate pushdown take a look at 
these two bugs and add hints?

Thanks,
JVS



[jira] Updated: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-25 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1271:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Arvind!

 Case sensitiveness of type information specified when using custom reducer 
 causes type mismatch
 ---

 Key: HIVE-1271
 URL: https://issues.apache.org/jira/browse/HIVE-1271
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
 Fix For: 0.6.0

 Attachments: HIVE-1271-1.patch, HIVE-1271.patch


 Type information specified while using a custom reduce script is converted 
 to lower case, and causes a type mismatch during query semantic analysis. The 
 following REDUCE query, where a field is named userId, failed.
 hive> CREATE TABLE SS (
 a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 );
 OK
 hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
 INSERT OVERWRITE TABLE SS
 REDUCE *
 USING 'myreduce.py'
 AS
 (a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 )
 ;
 FAILED: Error in semantic analysis: line 2:27 Cannot insert into
 target table because column number/types are different SS: Cannot
 convert column 2 from array<struct<userId:int,y:string>> to
 array<struct<userid:int,y:string>>.
 The same query worked fine after changing userId to userid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: 6.0 and trunk look broken to me

2010-06-23 Thread Ashish Thusoo
Not sure if this is just my env but on 0.6.0 when I run the unit tests I get a 
bunch of errors of the following form:

[junit] Begin query: alter3.q
[junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
[junit] at 
org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
[junit] at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
[junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
[junit] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
[junit] at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
[junit] 

-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Wednesday, June 23, 2010 2:15 PM
To: hive-dev@hadoop.apache.org
Subject: Re: 6.0 and trunk look broken to me

(You mean 0.6, right?)

I'm not able to reproduce this (just tested with latest trunk on Linux and 
Mac).  Is anyone else seeing it?

JVS

On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:

 Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode.
 
 [edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
 [edw...@ec dist]$ bin/hive
 Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
 hive> show tables;
 FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'
 
 [edw...@ec dist]$ more /tmp/edward/hive.log
 2010-06-23 16:41:00,749 ERROR ql.Driver
 (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 
 cannot recognize input '<EOF>'
 
 org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot 
 recognize input '<EOF>'
 
   at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-23 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881998#action_12881998
 ] 

Ashish Thusoo commented on HIVE-1271:
-

I have committed this to trunk and will commit to 0.6.0 soon. One thing I did 
overlook, though: we should add a test case for this. Can you do that as part of 
another JIRA, as this one is already partially committed?

Thanks,
Ashish

 Case sensitiveness of type information specified when using custom reducer 
 causes type mismatch
 ---

 Key: HIVE-1271
 URL: https://issues.apache.org/jira/browse/HIVE-1271
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
 Fix For: 0.6.0

 Attachments: HIVE-1271-1.patch, HIVE-1271.patch


 Type information specified while using a custom reduce script is converted 
 to lower case, and causes a type mismatch during query semantic analysis. The 
 following REDUCE query, where a field is named userId, failed.
 hive> CREATE TABLE SS (
 a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 );
 OK
 hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
 INSERT OVERWRITE TABLE SS
 REDUCE *
 USING 'myreduce.py'
 AS
 (a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 )
 ;
 FAILED: Error in semantic analysis: line 2:27 Cannot insert into
 target table because column number/types are different SS: Cannot
 convert column 2 from array<struct<userId:int,y:string>> to
 array<struct<userid:int,y:string>>.
 The same query worked fine after changing userId to userid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-22 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881306#action_12881306
 ] 

Ashish Thusoo commented on HIVE-1271:
-

I am looking at this.


 Case sensitiveness of type information specified when using custom reducer 
 causes type mismatch
 ---

 Key: HIVE-1271
 URL: https://issues.apache.org/jira/browse/HIVE-1271
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
 Fix For: 0.6.0

 Attachments: HIVE-1271-1.patch, HIVE-1271.patch


 Type information specified while using a custom reduce script is converted 
 to lower case, and causes a type mismatch during query semantic analysis. The 
 following REDUCE query, where a field is named userId, failed.
 hive> CREATE TABLE SS (
 a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 );
 OK
 hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
 INSERT OVERWRITE TABLE SS
 REDUCE *
 USING 'myreduce.py'
 AS
 (a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 )
 ;
 FAILED: Error in semantic analysis: line 2:27 Cannot insert into
 target table because column number/types are different SS: Cannot
 convert column 2 from array<struct<userId:int,y:string>> to
 array<struct<userid:int,y:string>>.
 The same query worked fine after changing userId to userid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-22 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881319#action_12881319
 ] 

Ashish Thusoo commented on HIVE-1271:
-

Looks good to me. However, why remove the check on Category? Also why drop the 
default implementation of the equals method for TypeInfo? 


 Case sensitiveness of type information specified when using custom reducer 
 causes type mismatch
 ---

 Key: HIVE-1271
 URL: https://issues.apache.org/jira/browse/HIVE-1271
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
 Fix For: 0.6.0

 Attachments: HIVE-1271-1.patch, HIVE-1271.patch


 Type information specified while using a custom reduce script is converted 
 to lower case, and causes a type mismatch during query semantic analysis. The 
 following REDUCE query, where a field is named userId, failed.
 hive> CREATE TABLE SS (
 a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 );
 OK
 hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
 INSERT OVERWRITE TABLE SS
 REDUCE *
 USING 'myreduce.py'
 AS
 (a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 )
 ;
 FAILED: Error in semantic analysis: line 2:27 Cannot insert into
 target table because column number/types are different SS: Cannot
 convert column 2 from array<struct<userId:int,y:string>> to
 array<struct<userid:int,y:string>>.
 The same query worked fine after changing userId to userid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Vertical partitioning

2010-06-17 Thread Ashish Thusoo
If you are querying this data again and again, you could just create another 
table which has only those 10 columns (more like a materialized view approach, 
though that is not there in Hive yet). This of course uses up some space as 
compared to vertical partitioning, but if the rcfile performance is not good 
enough, this could be the workaround for now. Also, do you see a lot more time 
spent on I/O in your queries?

Ashish
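A minimal sketch of that workaround (hypothetical table and column names; it assumes CREATE TABLE ... AS SELECT is available in your Hive version, otherwise an INSERT OVERWRITE into a pre-created table achieves the same):

  -- Materialize only the frequently used columns into a narrow table.
  CREATE TABLE events_hot AS
  SELECT event_id, user_id, event_time, event_type
  FROM events_wide;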

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, June 17, 2010 9:02 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Vertical partitioning

On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma  
jaydeep.vishwaka...@mkhoj.com wrote:

  Just looking at the opportunity and feasibility for it. One of my tables 
  has more than 20 fields, where most of the time I need only the 10 main 
  fields. We rarely need the other fields for day-to-day analysis.

 Regards,
 Jaydeep


 Ning Zhang wrote:

  Hive supports columnar storage (RCFile) but not vertical partitioning. 
 Is there any use case for vertical partitioning?

 On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:



 Hi,

 Does hive support Vertical partitioning?

 Regards,
 Jaydeep





Vertical partitioning is just as practical in a traditional RDBMS as it would 
be in hive. Normally you would do it for a few reasons:
1) You have some rarely used columns and you want to reduce the table/row size
2) Your DBMS has terrible blob/clob/text support and the only way to get large 
objects out of your way is to put them in other tables.

If you go the route of vertical partitioning in hive, you may have to join to 
select the columns you need. I do not consider row serialization and 
deserialization to be the majority of a hive job, and in most cases hadoop 
handles one large file better than two smaller ones. Then again, we have some 
tables with 140+ columns, so I can see vertical partitioning helping with those 
tables, but it doubles the management.


RE: how to set the debug parameters of hive?

2010-06-11 Thread Ashish Thusoo
I think if you just pass the java parameters on the command line it should just 
work, so bin/hive followed by your parameters. I have not tried it though; mostly 
I am just able to debug using eclipse (you can create the related eclipse files 
by doing

cd metastore
ant model-jar
cd ..
ant eclipse-files)

Ashish
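For remote debugging specifically, one approach is to pass the standard JVM debug agent through the environment (a sketch, not verified here: it assumes bin/hive ultimately invokes bin/hadoop, which appends HADOOP_OPTS to the JVM command line; the port number is arbitrary):

  # Make the CLI JVM suspend on startup and listen for a debugger on port 8000.
  export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
  bin/hive

Then attach a remote debugger (for example Eclipse's Remote Java Application configuration) to port 8000. Note this only covers the client-side JVM; map/reduce tasks spawned by a query run in separate JVMs.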

-Original Message-
From: Zhou Shuaifeng [mailto:zhoushuaif...@huawei.com] 
Sent: Friday, June 11, 2010 12:00 AM
To: hive-dev@hadoop.apache.org
Cc: ac.pi...@huawei.com
Subject: how to set the debug parameters of hive?

Hi, I want to debug hive remotely; how do I set the config?
E.g. debugging hdfs is done by setting DEBUG_PARAMETERS in the file 'bin/hadoop', 
so how do I set the debug parameters of hive?
Thanks a lot.



-


[jira] Updated: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-06-09 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1373:


   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.6.0
   Resolution: Fixed

Committed. Thanks Vinithra!!


 Missing connection pool plugin in Eclipse classpath
 ---

 Key: HIVE-1373
 URL: https://issues.apache.org/jira/browse/HIVE-1373
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
 Fix For: 0.6.0

 Attachments: HIVE-1373.patch


 In a recent checkin, a connection pool dependency was introduced but the eclipse 
 .classpath file was not updated.  This causes launch configurations from 
 within Eclipse to fail.
 {code}
 hive> show tables;
 show tables;
 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
 deserializer)], properties:null)
 10/05/26 14:59:46 INFO ql.Driver: query plan = 
 file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
 FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
 creating transactional connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
 Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
 connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
   at 
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.initialize

[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-09 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877233#action_12877233
 ] 

Ashish Thusoo commented on HIVE-1397:
-

+1.

This would be a cool contribution.


 histogram() UDAF for a numerical column
 ---

 Key: HIVE-1397
 URL: https://issues.apache.org/jira/browse/HIVE-1397
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Mayank Lahiri
 Fix For: 0.6.0


 A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
 short, double, long, etc.) column. The result is returned as a map of (x,y) 
 histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
 The algorithm is currently adapted from "A streaming parallel decision tree 
 algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
 proportional to the number of histogram bins specified. It has no 
 approximation guarantees, but seems to work well when there is a lot of data 
 and a large number (e.g. 50-100) of histogram bins specified.
 A typical call might be:
 SELECT histogram(val, 10) FROM some_table;
 where the result would be a histogram with 10 bins, returned as a Hive map 
 object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-09 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877232#action_12877232
 ] 

Ashish Thusoo commented on HIVE-1139:
-

Arvind, I thought the whole point of this JIRA was to make HashMapWrapper 
support java.util.Map, no? If that would be a separate JIRA, what would this 
one be for? Sorry for being a bit dense here, but if you could clarify that 
would be great.

Thanks,
Ashish


 GroupByOperator sometimes throws OutOfMemory error when there are too many 
 distinct keys
 

 Key: HIVE-1139
 URL: https://issues.apache.org/jira/browse/HIVE-1139
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Arvind Prabhakar

 When a partial aggregation is performed on a mapper, a HashMap is created to 
 keep all distinct keys in main memory. This could lead to an OOM exception when 
 there are too many distinct keys for a particular mapper. A workaround is to 
 set the map split size smaller so that each mapper takes fewer rows. 
 A better solution is to use the persistent HashMapWrapper (currently used in 
 CommonJoinOperator) to spill overflow rows to disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1398) Support union all without an outer select *

2010-06-09 Thread Ashish Thusoo (JIRA)
Support union all without an outer select *
---

 Key: HIVE-1398
 URL: https://issues.apache.org/jira/browse/HIVE-1398
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo


In hive, for union alls the query has to be wrapped in a subquery, as shown 
below:

select * from 
(select c1 from t1
  union all
  select c2 from t2);

This JIRA proposes to fix that to support

select c1 from t1
union all
select c2 from t2;


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-417) Implement Indexing in Hive

2010-06-09 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877236#action_12877236
 ] 

Ashish Thusoo commented on HIVE-417:


A couple of comments on this:

A complication of doing a rewrite just after parse is that you 
lose the ability to report back errors that correspond to the original query. 
Also, the metadata that you need to do the rewrite is only available after phase 1 of 
semantic analysis. So in my opinion the rewrite should be done after semantic 
analysis but before plan generation. Is that what you had in mind...

so something like...

[Query parser]
[Query semantic analysis]
[Query optimization]
...


 Implement Indexing in Hive
 --

 Key: HIVE-417
 URL: https://issues.apache.org/jira/browse/HIVE-417
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
 Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
 hive-indexing.3.patch


 Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872962#action_12872962
 ] 

Ashish Thusoo commented on HIVE-1373:
-

One copy is done anyway from lib to dist/lib for these jars. If we go directly 
to ivy, we would copy things from the ivy cache to dist/lib, so the number of 
copies in the build process would remain the same, no? There is, of course, 
the first-time overhead of downloading these jars from their repos to the ivy 
cache.

 Missing connection pool plugin in Eclipse classpath
 ---

 Key: HIVE-1373
 URL: https://issues.apache.org/jira/browse/HIVE-1373
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Priority: Minor
 Attachments: HIVE-1373.patch


 In a recent checkin, connection pool dependency was introduced but eclipse 
 .classpath file was not updated.  This causes launch configurations from 
 within Eclipse to fail.
 {code}
 hive> show tables;
 show tables;
 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
 deserializer)], properties:null)
 10/05/26 14:59:46 INFO ql.Driver: query plan = 
 file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
 FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
 creating transactional connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
 Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
 connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
   at 
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698

[jira] Assigned: (HIVE-1368) Hive JDBC Integration with SQuirrel SQL Client support Enhanced

2010-05-28 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-1368:
---

Assignee: Sunil Kumar

Sunil, I have added you as a contributor so you can assign JIRAs to yourself.

 Hive JDBC Integration with SQuirrel SQL Client support Enhanced
 ---

 Key: HIVE-1368
 URL: https://issues.apache.org/jira/browse/HIVE-1368
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 
Reporter: Sunil Kumar
Assignee: Sunil Kumar
 Fix For: 0.5.0

 Attachments: Hive JDBC Integration with SQuirrel SQL Client support 
 Enhanced.doc, SQLClient_support.patch


 Hive JDBC Integration with SQuirrel SQL Client support Enhanced:
 The Hive JDBC client is enhanced to browse hive default schema tables through 
 the SQuirrel SQL Client.
 This enhancement helps to browse a hive table's structure, i.e. the table's 
 columns and their data types, in the SQuirrel SQL Client interface, and SQL 
 queries can also be performed on the tables through the SQuirrel SQL Client.
 To enable this, the following Hive JDBC Java files are modified and added:
 1. Methods of org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.java are 
 updated.
 2. org.apache.hadoop.hive.jdbc.ResultSet.java is updated and extended 
 (org.apache.hadoop.hive.jdbc.ExtendedHiveResultSet.java) to support 
 additional JDBC metadata.
 3. Methods of org.apache.hadoop.hive.jdbc.HiveResultSetMetaData are updated.
 4. Methods of org.apache.hadoop.hive.jdbc.HiveConnection are updated.
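
For reference, the metadata calls a client like SQuirrel issues over JDBC are
the standard java.sql ones; a minimal sketch follows (the driver class name
and URL are assumptions for a local Hive server):

{code}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListHiveTables {
  public static void main(String[] args) throws Exception {
    // Driver class and JDBC URL are assumptions for a local Hive server.
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection conn =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default");
    DatabaseMetaData md = conn.getMetaData();
    // This is the call a SQL client uses to populate its table browser.
    ResultSet tables = md.getTables(null, null, "%", new String[] { "TABLE" });
    while (tables.next()) {
      System.out.println(tables.getString("TABLE_NAME"));
    }
    conn.close();
  }
}
{code}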

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1368) Hive JDBC Integration with SQuirrel SQL Client support Enhanced

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872965#action_12872965
 ] 

Ashish Thusoo commented on HIVE-1368:
-

In my opinion, the best option would be to attach this patch to HIVE-1126 and 
name it for 0.5.0, in case others want to use it for 0.5.0, and mark this JIRA 
as a duplicate of that one.

 Hive JDBC Integration with SQuirrel SQL Client support Enhanced
 ---

 Key: HIVE-1368
 URL: https://issues.apache.org/jira/browse/HIVE-1368
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 
Reporter: Sunil Kumar
Assignee: Sunil Kumar
 Fix For: 0.5.0

 Attachments: Hive JDBC Integration with SQuirrel SQL Client support 
 Enhanced.doc, SQLClient_support.patch


 Hive JDBC Integration with SQuirrel SQL Client support Enhanced:
 The Hive JDBC client is enhanced to browse hive default schema tables through 
 the SQuirrel SQL Client.
 This enhancement helps to browse a hive table's structure, i.e. the table's 
 columns and their data types, in the SQuirrel SQL Client interface, and SQL 
 queries can also be performed on the tables through the SQuirrel SQL Client.
 To enable this, the following Hive JDBC Java files are modified and added:
 1. Methods of org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.java are 
 updated.
 2. org.apache.hadoop.hive.jdbc.ResultSet.java is updated and extended 
 (org.apache.hadoop.hive.jdbc.ExtendedHiveResultSet.java) to support 
 additional JDBC metadata.
 3. Methods of org.apache.hadoop.hive.jdbc.HiveResultSetMetaData are updated.
 4. Methods of org.apache.hadoop.hive.jdbc.HiveConnection are updated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement

2010-05-28 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-1346:
---

Assignee: Sunil Kumar

 Table column name changed to _col1,_col2 ..._coln when where clause used in 
 the select query statement
 --

 Key: HIVE-1346
 URL: https://issues.apache.org/jira/browse/HIVE-1346
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.5.0
 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1
Reporter: Sunil Kumar
Assignee: Sunil Kumar
Priority: Minor
 Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, 
 HIVE-1346_patch.patch


 When a where clause is used in the hive query, ResultSetMetaData does not 
 give the original table column names, while without a where clause 
 ResultSetMetaData gives the original table column names. I have used the 
 following code:
 String tableName = "user";
 String sql = "select * from " + tableName + " where id=1";
 result = stmt.executeQuery(sql);
 ResultSetMetaData metaData = result.getMetaData();
 int columnCount = metaData.getColumnCount();
 for (int i = 1; i <= columnCount; i++) {
   System.out.println("Column name: " + metaData.getColumnName(i));
 }
 Executing the above code I got the following result:
 Column name:_col1
 Column name:_col2
 while the original user table column names were (id, name).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872968#action_12872968
 ] 

Ashish Thusoo commented on HIVE-1346:
-

Hi Sunil,

Have you created this patch on the 0.5.0 branch or on trunk? Are you proposing 
that this goes into both 0.5.1 and trunk?

 Table column name changed to _col1,_col2 ..._coln when where clause used in 
 the select query statement
 --

 Key: HIVE-1346
 URL: https://issues.apache.org/jira/browse/HIVE-1346
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.5.0
 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1
Reporter: Sunil Kumar
Assignee: Sunil Kumar
Priority: Minor
 Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, 
 HIVE-1346_patch.patch


 When a where clause is used in the hive query, ResultSetMetaData does not 
 give the original table column names, while without a where clause 
 ResultSetMetaData gives the original table column names. I have used the 
 following code:
 String tableName = "user";
 String sql = "select * from " + tableName + " where id=1";
 result = stmt.executeQuery(sql);
 ResultSetMetaData metaData = result.getMetaData();
 int columnCount = metaData.getColumnCount();
 for (int i = 1; i <= columnCount; i++) {
   System.out.println("Column name: " + metaData.getColumnName(i));
 }
 Executing the above code I got the following result:
 Column name:_col1
 Column name:_col2
 while the original user table column names were (id, name).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872975#action_12872975
 ] 

Ashish Thusoo commented on HIVE-1346:
-

@Namit,

In what cases would colAlias ever be null? There seems to be code which checks 
for this around line 3314 in the trunk branch, but afaik we should always be 
generating a colAlias (at least the default ones). Just wanted to make sure 
that we are covering all the bases with this fix.

Ashish

 Table column name changed to _col1,_col2 ..._coln when where clause used in 
 the select query statement
 --

 Key: HIVE-1346
 URL: https://issues.apache.org/jira/browse/HIVE-1346
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.5.0
 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1
Reporter: Sunil Kumar
Assignee: Sunil Kumar
Priority: Minor
 Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, 
 HIVE-1346_patch.patch


 When a where clause is used in the hive query, ResultSetMetaData does not 
 give the original table column names, while without a where clause 
 ResultSetMetaData gives the original table column names. I have used the 
 following code:
 String tableName = "user";
 String sql = "select * from " + tableName + " where id=1";
 result = stmt.executeQuery(sql);
 ResultSetMetaData metaData = result.getMetaData();
 int columnCount = metaData.getColumnCount();
 for (int i = 1; i <= columnCount; i++) {
   System.out.println("Column name: " + metaData.getColumnName(i));
 }
 Executing the above code I got the following result:
 Column name:_col1
 Column name:_col2
 while the original user table column names were (id, name).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1375) dynamic partitions should not create some of the partitions if the query fails

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872976#action_12872976
 ] 

Ashish Thusoo commented on HIVE-1375:
-

An example would be great to help explain this problem better.

Thanks,
Ashish

 dynamic partitions should not create some of the partitions if the query fails
 --

 Key: HIVE-1375
 URL: https://issues.apache.org/jira/browse/HIVE-1375
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.6.0


 Currently, if a bad row exists, which cannot be part of a partitioning 
 column, it fails - but some of the partitions may already have been created

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1374) Query compile-only option

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872981#action_12872981
 ] 

Ashish Thusoo commented on HIVE-1374:
-

Is doing an explain on the query enough? Is the proposal to convert queries 
into explains when run with the -c option?

Also consider the following example in a query.hql script:


create table foo(bar string);

insert overwrite table foo select c1 from old_foo;

What would happen to the create statement in this compile-only option?

Maybe it is better to provide a switch to do parse-only checks?

 Query compile-only option
 -

 Key: HIVE-1374
 URL: https://issues.apache.org/jira/browse/HIVE-1374
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang

 A compile-only option might be useful for helping users quickly prototype 
 queries, fix errors, and do test runs. The proposed change would be adding a 
 -c switch that behaves like -e but only compiles the specified query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1372) New algorithm for variance() UDAF

2010-05-28 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1372:


Status: Patch Available  (was: Open)

Hi Mayank,

Thanks for the contribution. Please do a submit patch when you put up a patch 
for a JIRA.

Thanks,
Ashish

 New algorithm for variance() UDAF
 -

 Key: HIVE-1372
 URL: https://issues.apache.org/jira/browse/HIVE-1372
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1372.patch


 A new algorithm for the UDAF that computes variance. This is pretty much a 
 drop-in replacement for the current UDAF, and has two benefits: it is 
 provably numerically stable (reference included in the comments), and it 
 reduces arithmetic operations by about half.
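
For context, the standard numerically stable one-pass formulation here is
Welford's method, which updates the running mean and the sum of squared
deviations together. A minimal sketch of the technique (illustrative only,
not the patch's actual code):

{code}
// Welford's one-pass, numerically stable variance update.
public class StreamingVariance {
  private long n;
  private double mean;
  private double m2; // running sum of squared deviations from the mean

  public void add(double x) {
    n++;
    double delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean); // note: uses the already-updated mean
  }

  public double populationVariance() {
    return n > 0 ? m2 / n : 0.0;
  }
}
{code}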

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1372) New algorithm for variance() UDAF

2010-05-28 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-1372:
---

Assignee: Mayank Lahiri

Also I have added you as a contributor, so you should be able to assign JIRAs 
to yourself.

Thanks,
Ashish

 New algorithm for variance() UDAF
 -

 Key: HIVE-1372
 URL: https://issues.apache.org/jira/browse/HIVE-1372
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Mayank Lahiri
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1372.patch


 A new algorithm for the UDAF that computes variance. This is pretty much a 
 drop-in replacement for the current UDAF, and has two benefits: it is 
 provably numerically stable (reference included in the comments), and it 
 reduces arithmetic operations by about half.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1359) Unit test should be shim-aware

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872990#action_12872990
 ] 

Ashish Thusoo commented on HIVE-1359:
-

+1 to all the great suggestions in this discussion...

I have one more thing to add. Would it be more maintainable to associate the 
include/exclude information with the test as the key, as opposed to the 
version being the key, i.e.

instead of

0.20.0
  include - test1.q, test2.q ..
  exclude - test3.q

0.17.0
  include - test3.q
  exclude - test1.q

we do

test1.q
  exclude -  0.17.0

test2.q
  include - = 0.17.0

or something along those lines... this may make adding tests for new versions fairly easy.
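
A tiny helper of the sort such a harness would need for rules like
"exclude <= 0.17.0", comparing dotted version strings numerically
(illustrative only, not part of any patch):

{code}
// Compares dotted version strings such as "0.17.0" and "0.20.0" numerically,
// returning a negative, zero, or positive value like Comparable.compareTo.
public class VersionCompare {
  public static int compare(String a, String b) {
    String[] pa = a.split("\\.");
    String[] pb = b.split("\\.");
    int len = Math.max(pa.length, pb.length);
    for (int i = 0; i < len; i++) {
      int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (va != vb) {
        return va < vb ? -1 : 1;
      }
    }
    return 0;
  }
}
{code}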

 Unit test should be shim-aware
 --

 Key: HIVE-1359
 URL: https://issues.apache.org/jira/browse/HIVE-1359
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: unit_tests.txt


 Some features in Hive only work for certain Hadoop versions through shims. 
 However, the unit test structure is not shim-aware in that there is only one 
 set of queries and expected outputs for all Hadoop versions. This may not be 
 sufficient when we have different outputs for different Hadoop versions. 
 One example is CombineHiveInputFormat, which is only available from Hadoop 
 0.20. The plans using CombineHiveInputFormat and HiveInputFormat may be 
 different. Another example is archival partitions (HAR), which are also only 
 available from 0.20. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1265) Function Registry should auto-detect UDFs from UDF Description

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872993#action_12872993
 ] 

Ashish Thusoo commented on HIVE-1265:
-

Can you explain more about what you mean by it picking up the test classpath? 
When you get the classes for a package, it should return all the classes in 
that package irrespective of their location.

+1 to the general approach here.
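
A rough sketch of the annotation-scanning idea, with assumed names (this is
not Hive's actual FunctionRegistry or Description type):

{code}
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: register any class that carries a description annotation.
public class AutoRegistry {
  @Retention(RetentionPolicy.RUNTIME)
  public @interface UdfDescription {
    String name();
  }

  private final Map<String, Class<?>> udfs = new HashMap<String, Class<?>>();

  // Called for each candidate class discovered on the classpath.
  public void registerIfAnnotated(Class<?> candidate) {
    UdfDescription d = candidate.getAnnotation(UdfDescription.class);
    if (d != null) {
      udfs.put(d.name(), candidate);
    }
  }
}
{code}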

 Function Registry should auto-detect UDFs from UDF Description
 --

 Key: HIVE-1265
 URL: https://issues.apache.org/jira/browse/HIVE-1265
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1265-patch.diff


 We should be able to register functions dynamically.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1371) remove blank in rcfilecat

2010-05-28 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1371:


Status: Patch Available  (was: Open)

Hi Yongqiang,

Please do a submit patch when putting up a patch.

Thanks,
Ashish

 remove blank in rcfilecat
 -

 Key: HIVE-1371
 URL: https://issues.apache.org/jira/browse/HIVE-1371
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1371.1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1371) remove blank in rcfilecat

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872995#action_12872995
 ] 

Ashish Thusoo commented on HIVE-1371:
-

+1.

Will commit.


 remove blank in rcfilecat
 -

 Key: HIVE-1371
 URL: https://issues.apache.org/jira/browse/HIVE-1371
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1371.1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-05-28 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872997#action_12872997
 ] 

Ashish Thusoo commented on HIVE-1369:
-

I do not see any drawbacks here. I think another requirement from this was 
that the serialization be such that it is order preserving wherever there is a 
notion of order, as this serde could also be used to serialize across 
map/reduce boundaries. So if the implementation takes care of that and does 
not introduce overhead, I think this would be good.

Others, what do you think about this?

Ashish
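
On the order-preserving point: the usual trick for fixed-width numeric types
is to flip the sign bit so that comparing the encoded bytes as unsigned
values agrees with signed numeric order. A minimal sketch of that technique
(illustrative, not LazySimpleSerDe's actual code):

{code}
// Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto
// 0..0xFFFFFFFF, so an unsigned byte-wise comparison of the output matches
// the numeric order of the inputs.
public class OrderPreserving {
  public static byte[] encodeInt(int v) {
    int u = v ^ 0x80000000;
    return new byte[] {
      (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
    };
  }
}
{code}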

 LazySimpleSerDe should be able to read classes that support some form of 
 toString()
 ---

 Key: HIVE-1369
 URL: https://issues.apache.org/jira/browse/HIVE-1369
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Alex Kozlov
Priority: Minor
   Original Estimate: 2h
  Remaining Estimate: 2h

 Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
 objects. It should be pretty easy to extend the class to read any object 
 that implements a toString() method.
 Ideas or concerns?
 Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-05-27 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-1373:
---

Assignee: Vinithra Varadharajan

Have added you to the contributors so you should be able to assign things to 
yourself now.

Thx.

 Missing connection pool plugin in Eclipse classpath
 ---

 Key: HIVE-1373
 URL: https://issues.apache.org/jira/browse/HIVE-1373
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Priority: Minor
 Attachments: HIVE-1373.patch


 In a recent checkin, connection pool dependency was introduced but eclipse 
 .classpath file was not updated.  This causes launch configurations from 
 within Eclipse to fail.
 {code}
 hive> show tables;
 show tables;
 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
 deserializer)], properties:null)
 10/05/26 14:59:46 INFO ql.Driver: query plan = 
 file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
 FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
 creating transactional connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
 Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
 connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
   at 
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:153

[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-05-27 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872237#action_12872237
 ] 

Ashish Thusoo commented on HIVE-1373:
-

+1. Looks good to me. I think in the future we should move all the lib 
dependencies in the eclipse files to come from build/dist/lib, as that will 
help us migrate more stuff over to ivy.

Will run tests and commit once the tests pass.

 Missing connection pool plugin in Eclipse classpath
 ---

 Key: HIVE-1373
 URL: https://issues.apache.org/jira/browse/HIVE-1373
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Priority: Minor
 Attachments: HIVE-1373.patch


 In a recent checkin, connection pool dependency was introduced but eclipse 
 .classpath file was not updated.  This causes launch configurations from 
 within Eclipse to fail.
 {code}
 hive> show tables;
 show tables;
 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
 deserializer)], properties:null)
 10/05/26 14:59:46 INFO ql.Driver: query plan = 
 file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
 FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
 creating transactional connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
 Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
 connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
   at 
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191

[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-05-27 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872238#action_12872238
 ] 

Ashish Thusoo commented on HIVE-802:


Should we just mark this as a duplicate of 1176 in that case?

 Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
 -

 Key: HIVE-802
 URL: https://issues.apache.org/jira/browse/HIVE-802
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Todd Lipcon
Assignee: Arvind Prabhakar

 There's a bug in DataNucleus that causes this issue:
 http://www.jpox.org/servlet/jira/browse/NUCCORE-371
 To reproduce, simply put your hive source tree in a directory that contains a 
 '+' character.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2010-05-27 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872240#action_12872240
 ] 

Ashish Thusoo commented on HIVE-80:
---

Yes, I think what Ning is saying is correct. We should, however, add a test 
case to the unit tests to check that. I am not sure that we added a test case 
for the parallel execution stuff.

 Allow Hive Server to run multiple queries simulteneously
 

 Key: HIVE-80
 URL: https://issues.apache.org/jira/browse/HIVE-80
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Reporter: Raghotham Murthy
Assignee: Neil Conway
Priority: Critical
 Attachments: hive_input_format_race-2.patch


 Can use one driver object per query.
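
A minimal sketch of the one-driver-per-query idea; the Driver constructor and
run signature here are assumptions based on the class referenced elsewhere in
these traces:

{code}
import org.apache.hadoop.hive.ql.Driver;

// Illustrative only: give each concurrent request its own Driver so that no
// per-query compilation or execution state is shared across threads.
public class QueryHandler implements Runnable {
  private final String query;

  public QueryHandler(String query) {
    this.query = query;
  }

  public void run() {
    Driver driver = new Driver(); // fresh driver per query
    driver.run(query);
  }
}
{code}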

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-22 Thread Ashish Thusoo
What is the advantage of becoming a TLP to the project itself? I have heard 
that it is something that Apache wants, but considering that we are very 
comfortable with how Hive interacts with the Hadoop ecosystem as a subproject 
of Hadoop, there has to be some big incentive for the project to be a TLP, and 
nowhere have I seen how this would benefit Hive. Any thoughts on that?

Ashish


From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Wednesday, April 21, 2010 7:35 PM
To: hive-dev@hadoop.apache.org
Cc: Ashish Thusoo
Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

Hive already does the work to run on multiple versions of Hadoop, and the 
release cycle is independent of Hadoop's. I don't see why it should remain a 
subproject. I'm +1 on Hive becoming a TLP.

On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao zsh...@gmail.com wrote:
As a Hive committer, I don't feel the benefit we get from becoming a
TLP is big enough (compared with the cost) to make Hive a TLP.
From Chris's comment I see that the cost is not that big, but I still
wonder what benefit we will get from that.

Also I didn't get the idea of the joke (In fact, one could argue that
Pig opting not to be TLP yet is why Hive should go TLP). I don't see
any reasons that applies to Pig but not Hive.
We should continue the discussion here, but anything in the Pig's
discussion should also be considered here.

Zheng

On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah a...@cloudera.com wrote:
 I am personally +1 on Hive being a TLP, I think it did reach the community
 adoption and maturity level required for that. In fact, one could argue that
 Pig opting not to be TLP yet is why Hive should go TLP :) (jk).

 The real question to ask is whether there is a volunteer to take care of the
 administrative tasks, which isn't a ton of work afaiu (I am willing to
 volunteer if nobody else is up to the task, but I am not a committer and only
 contributed a minor patch for bash/cygwin).

 BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
 tradeoffs. I happen to agree with all he says, and frankly I couldn't have
 written it better myself. I highlighted certain parts of his message, but I
 recommend you read the whole thing.

 -- Forwarded message --
 From: Chris Douglas cdoug...@apache.org
 Date: Tue, Apr 13, 2010 at 11:46 PM
 Subject: Subprojects and TLP status
 To: gene...@hadoop.apache.org, priv...@hadoop.apache.org

 Most of Hadoop's subprojects have discussed becoming top-level Apache
 projects (TLPs) in the last few weeks. Most have expressed a desire to
 remain in Hadoop. The salient parts of the discussions I've read tend
 to focus on three aspects: a technical dependence on Hadoop,
 additional overhead as a TLP, and visibility both within the Hadoop
 ecosystem and in the open source community generally.

 Life as a TLP: this is not much harder than being a Hadoop subproject,
 and the Apache preferences being tossed around (particularly
 "insufficiently diverse") are not blockers. Every subproject needs to
 write a section of the report Hadoop sends to the board; almost the
 same report, sent to a new address. The initial cost is similarly
 light: copy bylaws, send a few notes to INFRA, and follow some
 directions. I think the estimated costs are far higher than they will
 be in practice. Inertia is a powerful force, but it should be
 overcome. The directions are here, and should not intimidating:

 http://apache.org/dev/project-creation.html

 Visibility: the Hadoop site does not need to change. For each
 subproject, we can literally change the hyperlinks to point to the new
 page and be done. Long-term, linking to all ASF projects that run on
 Hadoop from a prominent page is something we all want. So particularly
 in the medium-term that most are considering: visibility through the
 website will not change. Each subproject will still be linked from the
 front page.

 Hadoop would not be nearly as popular as it is without Zookeeper,
 HBase, Hive, and Pig. All statistics on work in shared MapReduce
 clusters show that users vastly prefer running Pig and Hive queries to
 writing MapReduce jobs. HBase continues to push features in HDFS that
 increase its adoption and relevance outside MapReduce, while sharing
 some of its NoSQL limelight. Zookeeper is not only a linchpin in real
 workloads, but many proposals for future features require it. The
 bottom line is that MapReduce and HDFS need these projects for
 visibility and adoption in precisely the same way. I don't think
 separate TLPs will uncouple the broader community from one another.

 Technical dependence: this has two dimensions. First, influencing
 MapReduce and HDFS. This is nonsense. Earning influence by
 contributing to a subproject is the only way to push code changes

[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket

2010-04-22 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859956#action_12859956
 ] 

Ashish Thusoo commented on HIVE-987:


I am +1 on this. I think this can open up good possibilities. I have not 
looked at the sqlline code, but how much does it depend on the actual SQL 
dialect? Plus, how easy is it to extend to hdfs-related commands? E.g. the CLI 
today has commands that can set conf variables. It also supports the hadoop 
dfs commands, which talk directly to hdfs. I am not sure if too many people 
use them, but I do. It would be great to get them integrated with sqlline if 
that is possible.


 Hive CLI Omnibus Improvement ticket
 ---

 Key: HIVE-987
 URL: https://issues.apache.org/jira/browse/HIVE-987
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Carl Steinbach
 Attachments: HIVE-987.1.patch, sqlline-1.0.8_eb.jar


 Add the following features to the Hive CLI:
 * Command History
 * ReadLine support
 ** HIVE-120: Add readline support/support for alt-based commands in the CLI
 ** Java-ReadLine is LGPL, but it depends on the GPL readline library. We 
 probably need to use JLine instead.
 * Tab completion
 ** HIVE-97: tab completion for hive cli
 * Embedded/Standalone CLI modes, and ability to connect to different Hive 
 Server instances.
 ** HIVE-818: Create a Hive CLI that connects to hive ThriftServer
 * .hiverc configuration file
 ** HIVE-920: .hiverc doesn't work
 * Improved support for comments.
 ** HIVE-430: Ability to comment desired for hive query files
 * Different output formats
 ** HIVE-49: display column header on CLI
 ** XML output format
 For additional inspiration we may want to look at the Postgres psql shell: 
 http://www.postgresql.org/docs/8.1/static/app-psql.html
 Finally, it would be really cool if we implemented this in a generic fashion 
 and spun it off as an apache-commons
 shell framework. It seems like most of the Apache Hadoop projects have their 
 own shells, and I'm sure the same is true
 for non-Hadoop Apache projects as well. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1320) NPE with lineage in a query of union alls on joins.

2010-04-22 Thread Ashish Thusoo (JIRA)
NPE with lineage in a query of union alls on joins.
---

 Key: HIVE-1320
 URL: https://issues.apache.org/jira/browse/HIVE-1320
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo


The following query generates an NPE in the lineage ctx code:

EXPLAIN
INSERT OVERWRITE TABLE dest_l1
SELECT j.*
FROM (SELECT t1.key, p1.value
  FROM src1 t1
  LEFT OUTER JOIN src p1
  ON (t1.key = p1.key)
  UNION ALL
  SELECT t2.key, p2.value
  FROM src1 t2
  LEFT OUTER JOIN src p2
  ON (t2.key = p2.key)) j;

The stack trace is:

FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116)
at 
org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
at 
org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.

2010-04-22 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1320:


Attachment: HIVE-1320.patch

Fixed the NPE. The cause was that we were not checking for inp_dep to be null 
in the union all code path. We have to do that for all operators that have 
more than one parent.
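
In effect the multi-parent merge needs a null guard; a self-contained
illustration of the shape of the fix (names simplified, not the actual patch):

{code}
import java.util.Arrays;
import java.util.List;

// Operators with several parents must skip parents whose dependency is still
// null instead of dereferencing it.
public class MergeGuard {
  static String merge(String acc, String dep) {
    return acc == null ? dep : acc + "," + dep;
  }

  public static void main(String[] args) {
    List<String> parentDeps = Arrays.asList("d1", null, "d2");
    String merged = null;
    for (String dep : parentDeps) {
      if (dep == null) {
        continue; // previously this null was merged and caused the NPE
      }
      merged = merge(merged, dep);
    }
    System.out.println(merged); // prints d1,d2
  }
}
{code}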


 NPE with lineage in a query of union alls on joins.
 ---

 Key: HIVE-1320
 URL: https://issues.apache.org/jira/browse/HIVE-1320
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1320.patch


 The following query generates an NPE in the lineage ctx code
 EXPLAIN
 INSERT OVERWRITE TABLE dest_l1
 SELECT j.*
 FROM (SELECT t1.key, p1.value
   FROM src1 t1
   LEFT OUTER JOIN src p1
   ON (t1.key = p1.key)
   UNION ALL
   SELECT t2.key, p2.value
   FROM src1 t2
   LEFT OUTER JOIN src p2
   ON (t2.key = p2.key)) j;
 The stack trace is:
 FAILED: Hive Internal Error: java.lang.NullPointerException(null)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116)
 at 
 org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
 at 
 org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.

2010-04-22 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1320:


   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
Fix Version/s: 0.6.0

 NPE with lineage in a query of union alls on joins.
 ---

 Key: HIVE-1320
 URL: https://issues.apache.org/jira/browse/HIVE-1320
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Fix For: 0.6.0

 Attachments: HIVE-1320.patch


 The following query generates an NPE in the lineage ctx code
 EXPLAIN
 INSERT OVERWRITE TABLE dest_l1
 SELECT j.*
 FROM (SELECT t1.key, p1.value
   FROM src1 t1
   LEFT OUTER JOIN src p1
   ON (t1.key = p1.key)
   UNION ALL
   SELECT t2.key, p2.value
   FROM src1 t2
   LEFT OUTER JOIN src p2
   ON (t2.key = p2.key)) j;
 The stack trace is:
 FAILED: Hive Internal Error: java.lang.NullPointerException(null)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116)
 at 
 org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
 at 
 org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-19 Thread Ashish Thusoo
Hi Folks,

Recently the Apache Board asked the Hadoop PMC if some sub projects can become 
top level projects. In the opinion of the board, big umbrella projects make it 
difficult to monitor the health of the communities within the sub projects. If 
Hive does become a TLP, then we would have to elect our own PMC and take on 
all the administrative tasks that the Hadoop PMC does for us, so there is 
definitely more administrative work involved as a TLP. The question, then, is 
whether we should take on this additional work at this time, and what tangible 
advantages and disadvantages such a move would entail for the project. I would 
like to hear what the community thinks on this issue.

Thanks,
Ashish

PS: For some reference to what is happening in the other subprojects: at this 
time PIG and Zookeeper have decided NOT to become TLPs, whereas Hbase and Avro 
have decided to become TLPs.



[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-04-14 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857110#action_12857110
 ] 

Ashish Thusoo commented on HIVE-1293:
-

I would vote for versioning. Since we do not have to deal with the complexity 
of a buffer cache, I think this would be much simpler to implement than it is 
in traditional databases. At the same time, for locks we will have to use a 
lease-based mechanism anyway in order to protect against locks leaking because 
of client crashes. And when you account for that, it seems that locking would 
not be significantly simpler to implement than versioning.
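
For reference, the lease shape mentioned above could look like this minimal
sketch (illustrative names, not a Hive API): ownership silently expires, so a
crashed client cannot leak the lock forever.

{code}
public class LeaseLock {
  private String owner;
  private long expiresAtMillis;

  // Succeeds if the lock is free or the previous holder's lease has expired.
  public synchronized boolean tryAcquire(String who, long leaseMillis) {
    long now = System.currentTimeMillis();
    if (owner == null || now >= expiresAtMillis) {
      owner = who;
      expiresAtMillis = now + leaseMillis;
      return true;
    }
    return false;
  }

  // A live client extends its lease periodically while it still holds it.
  public synchronized void renew(String who, long leaseMillis) {
    if (who.equals(owner)) {
      expiresAtMillis = System.currentTimeMillis() + leaseMillis;
    }
  }
}
{code}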


 Concurreny Model for Hive
 -

 Key: HIVE-1293
 URL: https://issues.apache.org/jira/browse/HIVE-1293
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain

 Concurrency model for Hive:
 Currently, hive does not provide a good concurrency model. The only 
 guarantee provided in case of concurrent readers and writers is that a 
 reader will not see partial data from the old version (before the write) and 
 partial data from the new version (after the write).
 This has come across as a big problem, especially for background processes 
 performing maintenance operations.
 The following possible solutions come to mind.
 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
 the query, or the write locks can be delayed till the move
 task (when the directory is actually moved). Care needs to be taken for 
 deadlocks.
 2. Versioning: The writer can create a new version if the current version is 
 being read. Note that it is not equivalent to snapshots;
 the old version can only be accessed by the current readers, and will be 
 deleted when all of them have finished.
 Comments?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-04-05 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_8.patch

Another one with test fixes.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch, 
 HIVE-1131_8.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre-execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a set of classes in the pre-execution 
 hook interface to clients, and to put the necessary transformation logic in 
 the optimizer to generate this information.
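
A sketch of the kind of hook signature this implies; the shape below is an
assumption for illustration, and the committed PreExecute/PostExecute
interfaces differ in detail:

{code}
import java.util.Map;
import java.util.Set;

// Illustrative lineage-aware hook: columnLineage maps each output column to
// the set of source columns it derives from.
public interface LineageHook {
  void run(Set<String> inputTables, Set<String> outputTables,
           Map<String, Set<String>> columnLineage) throws Exception;
}
{code}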

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-04-02 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_6.patch

With fixes to tests and with null dropped.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre-execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a set of classes in the pre-execution 
 hook interface to clients, and to put the necessary transformation logic in 
 the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-04-02 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_7.patch

Another patch, which fixes the QueryPlan to use LinkedHashMaps, as that was 
also creating instability in the tests.

 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-04-02 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Release Note: This changes the signature of PostExecute.java
Hadoop Flags: [Incompatible change]
  Status: Patch Available  (was: Open)

submitting.

 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-31 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852013#action_12852013
 ] 

Ashish Thusoo commented on HIVE-1131:
-

I looked at the ExecutionCtx stuff. There are at least 3 different unrelated 
fields in SessionState that we should also move to the ExecutionCtx. I will 
file a follow-up JIRA for it, but I think we should get this one in. I did see 
some test failures due to using HashMaps and the consequent change in ordering 
after I refreshed. Will fix that and upload a new patch.
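
As an aside on the ordering issue: golden-file tests flake under HashMap 
because its iteration order is unspecified, while LinkedHashMap iterates in 
insertion order. A tiny (illustrative) demonstration:

{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
  public static void main(String[] args) {
    Map<String, Integer> hash = new HashMap<String, Integer>();
    Map<String, Integer> linked = new LinkedHashMap<String, Integer>();
    for (String k : new String[] {"b", "a", "c"}) {
      hash.put(k, 1);
      linked.put(k, 1);
    }
    System.out.println(hash.keySet());   // unspecified order, may change
    System.out.println(linked.keySet()); // always [b, a, c]
  }
}
{code}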


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-31 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_5.patch

Added a more centralized function to decide the dependency type. Also reduced 
the number of dependency types to SIMPLE, EXPRESSION and SCRIPT: SIMPLE = a 
copy of the column, EXPRESSION = a UDF, UDAF, UDTF or UNION ALL, SCRIPT = 
produced by a user script.

Also changed the HashMap to a LinkedHashMap.
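
In other words, something like the following enum (the names follow the 
description above; the actual definition is in the patch):

{code}
public enum DependencyType {
  SIMPLE,      // the output column is a straight copy of a base column
  EXPRESSION,  // produced by a UDF, UDAF, UDTF or a UNION ALL
  SCRIPT       // produced by a user script (TRANSFORM)
}
{code}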


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch, HIVE-1131_5.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-30 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851674#action_12851674
 ] 

Ashish Thusoo commented on HIVE-1131:
-

Look at the DataContainer class. That has a partition in it, and the Dependency 
has a mapping from Partition to the dependencies. Can you explain your 
concerns about inefficiency in more detail?

For S6, the QueryPlan is actually the wrong place to store the LineageInfo. 
Because of the dynamic partitioning work that Ning is doing, I have to generate 
the partition-to-dependency mapping at run time, so I would rather store it in 
a run-time structure as opposed to a compile-time structure. SessionState fits 
that bill, though I think we should have another structure called ExecutionCtx 
for this.

For S2 I will add some more comments.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-25 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_2.patch

Patch with all the review comments incorporated. This is just the source patch. 
Will be uploading the fixed tests shortly.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-25 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849624#action_12849624
 ] 

Ashish Thusoo commented on HIVE-1131:
-

Comment 3 from Raghu and comments S2-S4 from Zheng are not yet incorporated.

The new patch overhauls things a bit to support partition-level lineage and 
does this in a post-execute hook. It gets rid of the visits and the iterator 
classes. Will fix the other comments in the patch with the test cases.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-25 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_3.patch

This fixes all the review comments. Will post the patch with tests separately.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-25 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849633#action_12849633
 ] 

Ashish Thusoo commented on HIVE-1131:
-

Also, I did not find any instance of S3 in the code. Perhaps you just mentioned 
it for completeness, but in case you do find an instance, please let me know 
the offending file.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-25 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131_4.patch

This patch has all the tests updated as well.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [ANNOUNCEMENT] Contributor Workshop at Yahoo!

2010-03-25 Thread Ashish Thusoo
Sounds like a good idea to me. If anyone @FB wants to join, maybe they could do 
it with you.

Ashish 

-Original Message-
From: Carl Steinbach [mailto:c...@cloudera.com] 
Sent: Thursday, March 25, 2010 2:09 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [ANNOUNCEMENT] Contributor Workshop at Yahoo!

I'm happy to organize this if no one else wants to. Let me know if there are 
any objections. Otherwise I will send an email to the Y! at the end of the day.

Thanks.

Carl

 On Thu, Mar 25, 2010 at 11:14 AM, Jeff Hammerbacher ham...@cloudera.com wrote:

 Has someone already emailed about a Hive workshop?

 On Thu, Mar 25, 2010 at 10:33 AM, Owen O'Malley o...@yahoo-inc.com wrote:

  Yahoo is organizing Contributor's Workshops on the day after the 
  Hadoop Summit (10 June 2010) for both Hadoop Core (HDFS  MapReduce) 
  and Pig. We would be happy to provide space for any of the other 
  Hadoop sub-projects
 as
  well!  If you are interested in organizing such a workshop for one 
  of the Hadoop sub-projects, please email us at 
  hadoopcontributorr...@yahoo-inc.com with WORKSHOP ORGANIZER (project)
 in
  the subject line.
 
  See you all at the Hadoop Summit - June 29th,
 http://www.hadoopsummit.org/
 
  Thanks,
Owen O'Malley  Eric Baldeschwieler



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

2010-02-12 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833117#action_12833117
 ] 

Ashish Thusoo commented on HIVE-1117:
-

What would be the advantage of using Avro here? We do not really have a 
requirement for cross-language clients for this. To me, throwing Avro into the 
mix just adds another dependency that is not really needed, no?

 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-02-04 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-1131:


Attachment: HIVE-1131.patch

This is just the source patch. Will publish the test patch soon.

 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1123) Checkstyle fixes

2010-02-04 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829816#action_12829816
 ] 

Ashish Thusoo commented on HIVE-1123:
-

Apart from the indentation of the throws clause, is there any other major 
sticking point? Personally, I don't have a strong preference for the 
indentation of throws; going with 2 indents probably makes it easier for 
Eclipse to catch this. @Carl, I do think there is value in publishing the 
entire set of rules that you have used.

 Checkstyle fixes
 

 Key: HIVE-1123
 URL: https://issues.apache.org/jira/browse/HIVE-1123
 Project: Hadoop Hive
  Issue Type: Task
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1123.checkstyle.patch, HIVE-1123.cli.2.patch, 
 HIVE-1123.cli.patch, HIVE-1123.common.2.patch, HIVE-1123.common.patch, 
 HIVE-1123.contrib.2.patch, HIVE-1123.contrib.patch, HIVE-1123.hwi.2.patch, 
 HIVE-1123.hwi.patch, HIVE-1123.jdbc.2.patch, HIVE-1123.jdbc.patch, 
 HIVE-1123.metastore.2.patch, HIVE-1123.metastore.patch, HIVE-1123.ql.2.patch, 
 HIVE-1123.ql.patch, HIVE-1123.serde.2.patch, HIVE-1123.serde.patch, 
 HIVE-1123.service.2.patch, HIVE-1123.service.patch, HIVE-1123.shims.2.patch, 
 HIVE-1123.shims.patch


 Fix checkstyle errors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-02-03 Thread Ashish Thusoo (JIRA)
Add column lineage information to the pre execution hooks
-

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo


We need a mechanism to pass the lineage information of the various columns of a 
table to a pre execution hook so that applications can use that for:

- auditing
- dependency checking

and many other applications.

The proposal is to expose this through a bunch of classes to the pre execution 
hook interface to the clients and put in the necessary transformation logic in 
the optimizer to generate this information.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: HIVE-49 and other forms of CLI niceness

2010-01-27 Thread Ashish Thusoo
Looks like a good suggestion. Ideally the driver code should just return a 
structure that encodes the columns separately as opposed to a single serialized 
string today and the formatting logic should all be in the CliDriver

Ashish 

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Wednesday, January 27, 2010 1:00 PM
To: hive-u...@hadoop.apache.org
Subject: HIVE-49 and other forms of CLI niceness

All,

Some simple features in Hive can really bring down the learning curve for new 
users. I am teaching some people how to use Hive.

A buddy of mine did this.
hive> select * from mt_date_test;
OK
a   2010-01-01  NULL
b   2009-12-31  NULL
c   2010-01-27  NULL

hive> select * from mt_date_test where my_date > '2010-01-01';

2010-01-27 08:18:27,008 map = 100%,  reduce = 100%
Ended Job = job_200909171715_20264
OK

I instantly suspected 1) whitespace 2) delimiters

hive> select key from mt_date_test;

OK
a   2010-01-01
b   2009-12-31
c   2010-01-27

!!BINGO!!

Should we use a pipe (|) or some other column delimiter like the MySQL CLI 
does, and have this be a property that is on by default?

hive.cli.columnseparator='\t'
hive.cli.columnseparator='|'

In its current state the user understandably made the assumption that '>' does 
not work on strings.

Should we expose the format of the results in Driver so that the CLI can 
effectively split the rows by column?
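
A rough sketch of the kind of formatting logic being proposed (the property 
name follows the suggestion above and does not exist yet):

import org.apache.hadoop.conf.Configuration;

// Illustrative only: join result columns with a configurable separator
// instead of returning a single pre-serialized string from the Driver.
public class CliRowFormatter {
  public static String format(Configuration conf, String[] columns) {
    String sep = conf.get("hive.cli.columnseparator", "\t");
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < columns.length; i++) {
      if (i > 0) {
        row.append(sep);
      }
      row.append(columns[i]);
    }
    return row.toString();
  }
}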


RE: Hive in maven

2010-01-22 Thread Ashish Thusoo
Yes you should open a JIRA for this.

Ashish 

-Original Message-
From: Gerrit [mailto:gvanvuu...@specificmedia.com] 
Sent: Friday, January 22, 2010 7:33 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive in maven

Hi,

Yes I'll start on creating the pom.xml

The fastest and recommended way of doing this is to have a Maven repo and sync 
it with the official Maven repos (one way). Future Hive releases would then 
just be a matter of uploading to this repo, and it is automatically synced 
with the official Maven repo.

If a manual upload is requested it takes more time (it says so on their 
website).

shall I open a jira for this?

Cheers,
 Gerrit

On Thu, 2010-01-21 at 12:28 -0800, Yongqiang He wrote:
 Hi Gerrit,
 
 Can you help uploading to maven?
 
 Thanks
 Yongqiang
 On 1/20/10 2:21 AM, Gerrit gvanvuu...@specificmedia.com wrote:
 
  Yep:
  
  The main maven page is:
  http://maven.apache.org/guides/mini/guide-central-repository-upload.html
  (see section
  Sync'ing your own repository to the central repository 
  automatically)
  
  For groupId and artifactId conventions see:
  http://maven.apache.org/guides/mini/guide-naming-conventions.html)
  
  
  I have been a maven user for some time now and can help out to make 
  the pom and document how to set up and deploy, if you need help.
  
  For internal repos you could use:
   http://nexus.sonatype.org/
   http://www.jfrog.org/products.php
  
  
  On Tue, 2010-01-19 at 23:31 -0800, Zheng Shao wrote:
  This is a good idea. Can you point us to some references on how to 
  upload it to maven?
  
  Zheng
  
  On Mon, Jan 18, 2010 at 1:20 PM, Gerrit 
  gvanvuu...@specificmedia.com wrote:
  
  Hi guys,
  
  Would it be possible to add the hive jars to the main maven repo? 
  If there is not objections I can make the request to the main repo 
  if you agree.
  
  The reason for this need is that I've created a Loader for the pig 
  project to read HiveRCTables
  (https://issues.apache.org/jira/browse/PIG-1117) and currently use 
  ant to directly download the libraries from the apache site using:
  <get verbose="true"
    src="${apache.dist.site}/${hive.groupId}/${hive.artifactId}/${hive.artifactId}-${hive.version}/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz"
    dest="lib-hivedeps/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz"/>
  
  I would much prefer using ivy or maven and it makes this much cleaner.
  
  Thanks,
  
   Gerrit
  
  
  
  
  
  
  
 
 



RE: Unit test result depends on platform.

2010-01-19 Thread Ashish Thusoo
Can you file a JIRA and give us the unit tests that fail? That would be very 
helpful. I suspect some of the test queries may be missing a SORT BY clause, so 
they could produce different sort orders compared to the expected output.

Ashish 

-Original Message-
From: Mafish Liu [mailto:maf...@gmail.com] 
Sent: Monday, January 18, 2010 5:30 PM
To: hive-dev@hadoop.apache.org
Subject: Re: Unit test result depends on platform.

Attached are listings of the programs.
--
maf...@gmail.com


RE: New Hive committer Ning Zhang

2010-01-11 Thread Ashish Thusoo
Congrats!!

Ashish

-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com] 
Sent: Monday, January 11, 2010 11:51 AM
To: hive-dev@hadoop.apache.org
Subject: New Hive committer Ning Zhang

Ning has done a lot of work on Hive.
Hadoop PMC recently approved Ning Zhang as a new committer to Hive.

Congratulations Ning!

--
Yours,
Zheng


[jira] Commented: (HIVE-972) support views

2009-12-22 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793711#action_12793711
 ] 

Ashish Thusoo commented on HIVE-972:


Pretty comprehensive writeup :) Here are my comments:

1. It may be better to just go with a flat model to keep things simple. Also, 
whenever we do materialized views in the future, you have an object that is 
part table and part view, and you may need the flat model anyway at that 
point. The primary reason to go with the flat model, though, is simplicity and 
a less severe migration of the metastore schema.

2. For dependency tracking, there is already code in Hive that uses 
pre-execution hooks to track lineage. That could easily be used to extract 
view dependencies (table-level dependencies) when you create the view 
metadata. Raghu also did some work on column lineage, and perhaps that can be 
used to capture column lineage.

I think for the first cut we should just go with table dependencies and leave 
the column stuff for later. We should have the lenient dependency invalidation 
scheme (perhaps for both drops and alters) because at least that way users can 
inspect view definitions and then fix them later. Accordingly, we would need 
a flag to mark an invalidated view and maybe some way of looking at that 
list. I think we can punt the cascade option for now, as it seems to be an 
optimization of the user workflow and could be added later. Thoughts? The 
restrict option, though, is probably more useful. We could have it be the 
default in the strict mode (Hive has a strict mode which disallows queries on 
partitioned tables when a where clause on the partition column is not 
specified).

Not sure what we should do about temporary functions, but if we use views to 
transform our internal logs to another schema (nectar imps -> context) then we 
may need it.

3. I am not sure if supporting LIMIT is important, but I can see good use of 
ORDER BY when we do materialized views. The sorted property could be helpful 
there and would be good to capture. We already capture those for tables.

4. I think the fast path should work seamlessly once the fast path with 
filters is done, no?

5. I think we can punt view modification for now if we support ways for folks 
to inspect the view SQL.


 support views
 -

 Key: HIVE-972
 URL: https://issues.apache.org/jira/browse/HIVE-972
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Namit Jain
Assignee: John Sichi

 Hive currently does not support views. 
 It would be a very nice feature to have.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] hive release candidate 0.4.1-rc3

2009-11-30 Thread Ashish Thusoo
+1 on the basis of tests run on the dev tar ball.

Ashish 

-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com] 
Sent: Monday, November 30, 2009 11:37 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [VOTE] hive release candidate 0.4.1-rc3

I tried binary tarball with both hadoop 0.17 and 0.20 and both worked.
Please vote.

Zheng

On Fri, Nov 27, 2009 at 7:32 AM, Zheng Shao zsh...@gmail.com wrote:
 One more modification to the Tarballs:

 Location moved to 
 http://people.apache.org/~zshao/hive-0.4.1-candidate-3/

 I also made both the source tarball and binary tarball.

 Zheng

 On Sat, Nov 21, 2009 at 11:03 AM, Zheng Shao zsh...@gmail.com wrote:
 I forgot to modify the version in build.properties and make the tarball.

 Here it is:

 svn:
 https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1-rc3/

 Tarball:
 http://people.apache.org/~zshao/hive-0.4.1-dev.tar.gz


 Please vote.

 Zheng

 On Sun, Nov 15, 2009 at 1:14 AM, Ashish Thusoo athu...@facebook.com wrote:
 Zheng,

 I cannot find the tar ball. What is the location?

 Ashish
 
 From: Zheng Shao [zsh...@gmail.com]
 Sent: Thursday, November 12, 2009 4:14 PM
 To: hive-dev@hadoop.apache.org
 Subject: Re: [VOTE] hive release candidate 0.4.1-rc2

 Please vote. We would like release 0.4.1 to go out as soon as 
 possible since it fixed some critical bugs in 0.4.0.

 Zheng

 On Wed, Nov 11, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote:
 I have made a release candidate 0.4.1-rc2.

 We've fixed several critical bugs to hive release 0.4.0. We need 
 hive release 0.4.1 out asap.

 Here are the list of changes:

   HIVE-884. Metastore Server should call System.exit() on error.
   (Zheng Shao via pchakka)

   HIVE-864. Fix map-join memory-leak.
   (Namit Jain via zshao)

   HIVE-878. Update the hash table entry before flushing in Group By
   hash aggregation (Zheng Shao via namit)

   HIVE-882. Create a new directory every time for scratch.
   (Namit Jain via zshao)

   HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff 
 via zshao)

   HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur 
 via zshao)

   HIVE-883. URISyntaxException when partition value contains special chars.
   (Zheng Shao via namit)

 *  HIVE-902. Fix cli.sh to work with hadoop versions less than 20.
   (Carl Steinbach via zshao)


 *: New since release candidate 0.4.1-rc0.


 Please vote.

 --
 Yours,
 Zheng




 --
 Yours,
 Zheng




 --
 Yours,
 Zheng




 --
 Yours,
 Zheng




--
Yours,
Zheng


[jira] Created: (HIVE-939) Extend hive streaming to support counter updates similar to hadoop streaming.

2009-11-17 Thread Ashish Thusoo (JIRA)
Extend hive streaming to support counter updates similar to hadoop streaming.
-

 Key: HIVE-939
 URL: https://issues.apache.org/jira/browse/HIVE-939
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo


The code to update hadoop counters needs to be ported from hadoop streaming to 
the streaming code in Hive.
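
For reference, Hadoop streaming's counter protocol has a task write lines of 
the form reporter:counter:group,counter,amount to its standard error. A Hive 
TRANSFORM script could use the same convention once this is ported; a sketch 
(the class and counter names are illustrative):

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Pass-through transform script that bumps a counter for every input row
// using the Hadoop streaming stderr protocol.
public class CountingTransform {
  public static void main(String[] args) throws Exception {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line); // emit the row unchanged
      System.err.println("reporter:counter:MyGroup,RowsSeen,1");
    }
  }
}
{code}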


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] hive release candidate 0.4.1-rc2

2009-11-14 Thread Ashish Thusoo
Zheng,

I cannot find the tar ball. What is the location?

Ashish

From: Zheng Shao [zsh...@gmail.com]
Sent: Thursday, November 12, 2009 4:14 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [VOTE] hive release candidate 0.4.1-rc2

Please vote. We would like release 0.4.1 to go out as soon as possible
since it fixed some critical bugs in 0.4.0.

Zheng

On Wed, Nov 11, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote:
 I have made a release candidate 0.4.1-rc2.

 We've fixed several critical bugs to hive release 0.4.0. We need hive
 release 0.4.1 out asap.

 Here are the list of changes:

   HIVE-884. Metastore Server should call System.exit() on error.
   (Zheng Shao via pchakka)

   HIVE-864. Fix map-join memory-leak.
   (Namit Jain via zshao)

   HIVE-878. Update the hash table entry before flushing in Group By
   hash aggregation (Zheng Shao via namit)

   HIVE-882. Create a new directory every time for scratch.
   (Namit Jain via zshao)

   HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff via zshao)

   HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur via zshao)

   HIVE-883. URISyntaxException when partition value contains special chars.
   (Zheng Shao via namit)

 *  HIVE-902. Fix cli.sh to work with hadoop versions less than 20.
   (Carl Steinbach via zshao)


 *: New since release candidate 0.4.1-rc0.


 Please vote.

 --
 Yours,
 Zheng




--
Yours,
Zheng


RE: Hive Performance

2009-11-09 Thread Ashish Thusoo
There are a bunch of optimizations that deal with skewed data in Hive as well. 
The optimizer is rule-based and the user has to hint the query - similar to 
what is done in an RDBMS. We have mostly done our performance work on the 
benchmark published in the SIGMOD paper.

Ashish

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Saturday, November 07, 2009 11:19 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive Performance

A friend and I were discussing Pig vs Hive in general yesterday. On the surface 
Hive is an SQL-like language and Pig is its own language, 'Pig Latin', but in 
the end I think they both end up doing column projections, joins, etc. It is a 
similar operation happening on the same cluster, so performance-wise I expect 
the performance will eventually be similar. Pig offering more SQL support, on 
the other hand, is a large undertaking.

While Pig looks very versatile, I recently emulated the example on Cloudera's 
blog for GeoIP-locating traffic in Pig. I did this in Hive with an external 
Perl script using map/transform (it did not take a page-long Pig program). I 
also think the Hive UDF framework can be used in place of many Piggybank 
functions. Also, unless I am missing something, a UDF is native Java; it seems 
like Piggybank functions are going to be piping/streaming output, and I can't 
see that performing better.

To backtrack: if Pig adds SQL, will we need Hive? If Hive adds something like 
T-SQL, will we need Pig?

On 11/7/09, Rob Stewart robstewar...@googlemail.com wrote:
 Hi there. I'm in the process of writing a paper, and as part of it I aim 
 to write (yet another) comparative study of various interfaces to Hadoop.

 This will almost certainly include Pig and Hive, probably MapReduce, 
 and maybe JAQL.

 I have read the papers published on the Hive JIRA (pig vs hive vs 
 MapReduce for 2 queries, an aggregation, and a join). I am, however, 
 wanting to know a bit from the Hive community.

 1. Do you guys (the Hive developers) have a standardized benchmarking 
 tool to use prior to each Hive release? I am thinking of something 
 similar to PigMix, used by the Pig developers. In case you don't know, 
 PigMix is a set of 12 designed queries, implemented in Pig and Java 
 Hadoop, and comparisons are made on execution time. Does the Hive community 
 have something similar?

 2. The Pig wiki points out some unique features of Pig that allow 
 optimal execution performance. For instance, they have methods to 
 optimize queries on skewed data (by taking samples of the data for 
 reduce-key allocation). Is there something about the implementation of 
 Hive that gives it some functionality not found in other interfaces? 
 And better still, is there some Hive implementation that could work 
 as a proof of concept to show any optimized features of Hive?

 3. One section suggested for investigation within the Pig development 
 team is to create an SQL-like language that could be compiled down 
 through Pig to MR jobs. If such a project were to achieve parity with 
 Hive's SQL-like interface, where would the distinction be between Pig and 
 Hive?
 Certainly, from a user's perspective, there would be very little difference.
 If the only difference turns out to be the execution performance 
 achieved by one interface over another, where would this put the 
 inferior interface (be that either Pig or Hive) in terms of its 
 relevance in the Hadoop software stack?


 Many thanks,


 Rob Stewart



RE: Make me as a member of hive developer

2009-11-02 Thread Ashish Thusoo
Hi Mohan,

The instructions to subscribe to the mailing list is here...

http://hadoop.apache.org/hive/mailing_lists.html#Developers

Ashish 

-Original Message-
From: Mohan Agarwal [mailto:mohan.agarwa...@gmail.com] 
Sent: Monday, November 02, 2009 8:45 AM
To: hive-dev@hadoop.apache.org
Subject: Make me as a member of hive developer




[jira] Commented: (HIVE-884) Metastore Server should exit if error happens

2009-10-17 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766966#action_12766966
 ] 

Ashish Thusoo commented on HIVE-884:


Can we add a test case?

Otherwise changes look good


 Metastore Server should exit if error happens
 -

 Key: HIVE-884
 URL: https://issues.apache.org/jira/browse/HIVE-884
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.1, 0.5.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-884.1.patch


 Currently, HiveMetaStore (the thrift server) is not exiting when the main 
 thread saw an Exception.
 The process should exit when that happens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-885) Better error messages for debugging serde problem at reducer input

2009-10-17 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766983#action_12766983
 ] 

Ashish Thusoo commented on HIVE-885:


Will values.next() always return a BytesWritable?


 Better error messages for debugging serde problem at reducer input
 --

 Key: HIVE-885
 URL: https://issues.apache.org/jira/browse/HIVE-885
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-885.1.patch


 Sometimes we are seeing serde exceptions at the reducer side with hadoop 0.20.
 This should help debug the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[ANNOUNCE] Hive 0.4.0 released

2009-10-14 Thread Ashish Thusoo
Hi Folks,

We have released the rc2 candidate that Namit generated as Hive 0.4.0. You 
can download it from the download page.

http://hadoop.apache.org/hive/releases.html#Download

Thanks,
Ashish


RE: Hive and MapReduce

2009-10-12 Thread Ashish Thusoo
Adding the hive-user and hive-dev lists,
and removing the common mailing list.

Can you elaborate a bit on the data size? By default Hive just relies on 
Hadoop to give you the number of mappers, depending on the number of splits 
you have in your data.

Ashish

-Original Message-
From: Touretsky, Gregory [mailto:gregory.touret...@intel.com] 
Sent: Monday, October 12, 2009 3:02 AM
To: Touretsky, Gregory; common-u...@hadoop.apache.org
Subject: RE: Hive and MapReduce

Ok, the patch below actually works. I re-built the Hadoop cluster and 
everything works now.
Now I have to understand how to force Hive to run >1 mapper for a complicated 
query on the large table...

From: Touretsky, Gregory
Sent: Sunday, October 11, 2009 4:39 PM
To: common-u...@hadoop.apache.org
Cc: Touretsky, Gregory
Subject: Hive and MapReduce

Hi,

   I'm running Hadoop 0.20.1 and Hive (checked out revision 824063).
A direct MapReduce task succeeds, but a map task created by Hive fails:

hive> select * from pokes where foo < 100;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200910111626_0001, Tracking URL = 
http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910111626_0001
Kill Command = /nfs/iil/disks/rep_tests_gtouret01/hadoop/bin/hadoop job  
-Dmapred.job.tracker=itstl0016.iil.intel.com:9001 -kill job_200910111626_0001
2009-10-11 04:26:57,844 map = 100%,  reduce = 100%
Ended Job = job_200910111626_0001 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.ExecDriver

From the logs/hadoop--jobtracker-.iil.intel.com.log:
2009-10-11 16:26:56,829 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_200910111626_0001
2009-10-11 16:26:57,091 INFO org.apache.hadoop.mapred.JobInProgress: Input size 
for job job_200910111626_0001 = 13. Number of splits = 1
2009-10-11 16:26:57,225 ERROR org.apache.hadoop.mapred.JobTracker: Job 
initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/IDC1-DC201/WE/34 (I've had the same issue with /default_rack)
at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390)
at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384)
at 
org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349)
at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147)
at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

2009-10-11 16:26:57,225 INFO org.apache.hadoop.mapred.JobTracker: Failing job 
job_200910111626_0001
2009-10-11 16:26:57,866 INFO org.apache.hadoop.mapred.JobTracker: Killing job 
job_200910111626_0001

Any suggestion?
I saw patches in 
https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712524#action_12712524,
 but I can't apply all of them cleanly to my Hadoop sources...

Thanks,
   Gregory
-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.


[jira] Updated: (HIVE-805) Session level metastore

2009-10-07 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-805:
---

Attachment: HIVE-805-1.patch

Incorporated Prasad's review comments. I have not yet disabled this for 
partitioned tables though.

 Session level metastore
 ---

 Key: HIVE-805
 URL: https://issues.apache.org/jira/browse/HIVE-805
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Fix For: 0.5.0

 Attachments: HIVE-805-1.patch, HIVE-805.patch


 Implement a shadow metastore that is in memory and runs for a session. This 
 can contain definitions for session-specific views that can be used to 
 implement data flow variables in Hive. It can also be used for testing 
 scripts. First we will support the latter use case, wherein all the DDL 
 statements in the session create objects in the session metastore and all the 
 queries are converted to explain internal. Any thoughts on load commands?
 This feature is enabled when
 set hive.session.test = true
 is done in the session.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] vote for release candidate for hive

2009-09-30 Thread Ashish Thusoo
+1.

Also sending this to the PMC for approval.

Hi PMC,

The release candidate that Namit prepared can be found at the following 
location:

http://people.apache.org/~namit/hive-0.4.0-candidate-2/

It has the hive 0.4.0 releases for hadoop 0.17, 0.18, 0.19 and 0.20.

Please try it out and vote on it.

Thanks,
Ashish



From: Min Zhou [coderp...@gmail.com]
Sent: Tuesday, September 29, 2009 6:35 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [VOTE] vote for release candidate for hive

I saw it. +1; all tests passed.

On Wed, Sep 30, 2009 at 1:59 AM, Namit Jain nj...@facebook.com wrote:

 I did find the files:

 [nj...@dev029 /tmp]$ ls -lrt hive-0.4.0-dev-hadoop-0.19.0/src
 total 33580
 drwxr-xr-x  4 njain users 4096 Aug 11 16:41 docs
 drwxr-xr-x  7 njain users 4096 Aug 11 16:41 data
 -rw-r--r--  1 njain users15675 Aug 11 16:41 README.txt
 -rw-r--r--  1 njain users 2810 Sep  2 10:44 TestTruncate.launch
 -rw-r--r--  1 njain users 2804 Sep  2 10:44 TestMTQueries.launch
 -rw-r--r--  1 njain users 2807 Sep  2 10:44 TestJdbc.launch
 -rw-r--r--  1 njain users 2808 Sep  2 10:44 TestHive.launch
 -rw-r--r--  1 njain users 2805 Sep  2 10:44 TestCliDriver.launch
 -rw-r--r--  1 njain users17045 Sep 10 15:16 build.xml
 -rw-r--r--  1 njain users  850 Sep 10 15:16 build.properties
 -rw-r--r--  1 njain users12520 Sep 10 15:16 build-common.xml
 -rw-r--r--  1 njain users33431 Sep 17 18:15 CHANGES.txt
 -rw-r--r--  1 njain users 1071 Sep 18 13:26 runscr
 -rw-r--r--  1 njain users 23392371 Sep 18 13:26
 hive-0.4.0-hadoop-0.20.0-dev.tar.gz
 -rw-r--r--  1 njain users 10735695 Sep 18 13:27
 hive-0.4.0-hadoop-0.20.0-bin.tar.gz
 drwxr-xr-x  3 njain users 4096 Sep 29 10:54 jdbc
 drwxr-xr-x  2 njain users 4096 Sep 29 10:54 ivy
 drwxr-xr-x  4 njain users 4096 Sep 29 10:54 hwi
 drwxr-xr-x  4 njain users 4096 Sep 29 10:54 eclipse-templates
 drwxr-xr-x  3 njain users 4096 Sep 29 10:54 contrib
 drwxr-xr-x  2 njain users 4096 Sep 29 10:54 conf
 drwxr-xr-x  3 njain users 4096 Sep 29 10:54 common
 drwxr-xr-x  4 njain users 4096 Sep 29 10:54 cli
 drwxr-xr-x  3 njain users 4096 Sep 29 10:54 ant
 drwxr-xr-x  2 njain users 4096 Sep 29 10:54 testutils
 drwxr-xr-x  2 njain users 4096 Sep 29 10:54 testlibs
 drwxr-xr-x  3 njain users 4096 Sep 29 10:54 shims
 drwxr-xr-x  6 njain users 4096 Sep 29 10:54 service
 drwxr-xr-x  4 njain users 4096 Sep 29 10:54 serde
 drwxr-xr-x  5 njain users 4096 Sep 29 10:54 ql
 drwxr-xr-x  4 njain users 4096 Sep 29 10:54 odbc
 drwxr-xr-x  6 njain users 4096 Sep 29 10:54 metastore
 drwxr-xr-x  2 njain users 4096 Sep 29 10:54 lib
 drwxr-xr-x  3 njain users 4096 Sep 29 10:54 bin



 I have attached the output.




 -Original Message-
 From: Min Zhou [mailto:coderp...@gmail.com]
 Sent: Tuesday, September 22, 2009 6:29 PM
 To: hive-dev@hadoop.apache.org
 Subject: Re: [VOTE] vote for release candidate for hive

 Hi Namit

 I meant

 http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.19.0-dev.tar.gz

 Min

 On Wed, Sep 23, 2009 at 5:31 AM, Namit Jain nj...@facebook.com wrote:

  Which one are you looking at ?
 
  I downloaded just now from:
 
 
 
  http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.20.0-dev.tar.gz
 
 
  and it contains CHANGE.txt and build.xml etc.
 
  Did you download the binary tarball ?
 
  Thanks,
  -namit
 
 
 
  -Original Message-
  From: Min Zhou [mailto:coderp...@gmail.com]
  Sent: Monday, September 21, 2009 7:46 PM
  To: hive-dev@hadoop.apache.org
  Subject: Re: [VOTE] vote for release candidate for hive
 
  Hi Namit,
 
  I haven't found build.xml, CHANGES.txt from your tarball. They must be
  included so that we can test it and check the changes, I think.
 
  Thanks,
  Min
 
  On Sat, Sep 19, 2009 at 4:42 AM, Namit Jain nj...@facebook.com wrote:
 
   It is available from
  
    http://people.apache.org/~namit/
  
  
   Thanks,
   -namit
  
   -Original Message-
   From: Ashish Thusoo
   Sent: Thursday, September 17, 2009 11:55 PM
   To: hive-dev@hadoop.apache.org; Namit Jain
   Subject: RE: [VOTE] vote for release candidate for hive
  
   Namit,
  
   Can you make it available from
  
    http://people.apache.org/~njain/
  
   That way people who do not have access to the apache machines will also
  be
   able to try the candidate.
  
   Thanks,
   Ashish

[ANNOUNCE] Edward Capriolo as a Hive committer

2009-09-22 Thread Ashish Thusoo
Hi Folks,

We are happy to add Edward as a committer to the Hive project. Edward has 
made many contributions to Hive over the last year, including the Hive Web 
Interface. My heartiest congratulations and a warm welcome to him in the Hive 
committers group.

Cheers,
Ashish


RE: [VOTE] vote for release candidate for hive

2009-09-18 Thread Ashish Thusoo
Namit,

Can you make it available from

http://people.apache.org/~njain/

That way people who do not have access to the apache machines will also be able 
to try the candidate.

Thanks,
Ashish

From: Namit Jain [nj...@facebook.com]
Sent: Thursday, September 17, 2009 6:32 PM
To: Namit Jain; hive-dev@hadoop.apache.org
Subject: [VOTE] vote for release candidate for hive

Following the convention

-Original Message-
From: Namit Jain
Sent: Thursday, September 17, 2009 6:31 PM
To: hive-dev@hadoop.apache.org
Subject: vote for release candidate for hive

I have created another release candidate for Hive.

  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/

Let me know if it is OK to publish this release candidate.



The only change from the previous candidate 
(https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the 
fix for

 https://issues.apache.org/jira/browse/HIVE-838


The tar ball can be found at:

people.apache.org

/home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz*



Thanks,
-namit






[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756823#action_12756823
 ] 

Ashish Thusoo commented on HIVE-78:
---

@Min

I agree with Edward's thought here. We have to foster a collaborative 
environment and not be dismissive of each other's ideas and approaches. Much 
of the work in the community happens on a volunteer basis, and whatever time 
anyone puts into the project is a bonus and should be respected by all. 

It does make sense to keep authentication separate from authorization, because 
in most environments there are already directories which deal with the former. 
Creating yet another store for passwords just leads to an administration 
nightmare, as the account administrators have to create accounts for new users 
in multiple places. So let's just focus on authorization and let the directory 
infrastructure deal with authentication. Will look at your patch as well.




 Authentication infrastructure for Hive
 --

 Key: HIVE-78
 URL: https://issues.apache.org/jira/browse/HIVE-78
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Ashish Thusoo
Assignee: Edward Capriolo
 Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
 hive-78.diff


 Allow Hive to integrate with existing user repositories for authentication 
 and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: vote for a release candidate

2009-09-14 Thread Ashish Thusoo
Clearly, even with the fix it is still dangerous for them to use LOAD INTO 
unless they understand the consistency implications or have put workarounds in 
place to address reader crashes. I agree, though, that since this is a 
regression we should restore the functionality to what it was in 0.3.

Ashish 

-Original Message-
From: Todd Lipcon [mailto:t...@cloudera.com] 
Sent: Saturday, September 12, 2009 3:45 PM
To: hive-dev@hadoop.apache.org
Subject: Re: vote for a release candidate

Hi Namit,
Yes, we have customers who are using LOAD INTO without OVERWRITE. The use case 
is for collecting session data into a table partitioned by the hour of session 
start time. Since sessions are of varying lengths, incremental loads are 
necessary as sessions finish up.

There are a couple of possible workarounds, but all of them have drawbacks.

-Todd


On Thu, Sep 10, 2009 at 6:58 PM, Namit Jain nj...@facebook.com wrote:

 I am not sure 718 is a valid requirement. I think it got in by legacy.

 Should we even support LOAD INTO ?

We only support INSERT OVERWRITE; similarly, we should only support 
LOAD OVERWRITE INTO.

Is anyone using LOAD INTO without OVERWRITE?



 Thanks,
 -namit





 -Original Message-
 From: Todd Lipcon [mailto:t...@cloudera.com]
 Sent: Thursday, September 10, 2009 4:28 PM
 To: hive-dev@hadoop.apache.org
 Subject: Re: vote for a release candidate

 What do you guys think the feasibility of HIVE-718 being fixed for 
 0.4.0 is?
 I think a completely correct solution is likely to be very tough to 
 achieve, but as is it's a regression from 0.3.0 in that the 
 functionality silently fails.

 -Todd

 On Thu, Sep 10, 2009 at 3:24 PM, Namit Jain nj...@facebook.com wrote:

  I have created a release candidate for Hive.
 
  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/
 
 
  Let me know if it is OK to publish this release candidate.
 
 
  Thanks,
  -namit
 
 
 
 
 



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-14 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755292#action_12755292
 ] 

Ashish Thusoo commented on HIVE-718:


+1

Looks good to me.


 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Namit Jain
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt, 
 hive.718.1.patch


 The bug can be reproduced as follows. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 c   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754058#action_12754058
 ] 

Ashish Thusoo commented on HIVE-718:


Apologies for not following this earlier. It caught my attention when Todd 
brought up whether we should get this into the 0.4.0 release, as this is a 
regression compared to 0.3.0. I checked the code in 0.3.0 and it seems to be 
the same as that in 0.4.0, so I am not sure this is a regression. If it is 
not, we can potentially go out with 0.4.0 without this and document it?

As is evident from this discussion, LOAD INTO and its cousin INSERT INTO (when 
we have it) are very tricky. Almost all our code has been written with 
overwrite semantics. Appending new data to an existing partition would need 
more work to get right, and I feel we should punt on it and document that 
INSERT INTO is not reliable - I think it has never been reliable.

In order to safely implement the INSERT INTO and LOAD INTO semantics, one 
approach is to introduce a notion of versions for the DML commands, encoded in 
the directory structure, i.e.

instead of storing things as 

xyz/part-

we store the files as

xyz/v1/part-

and so on and so forth. We store the latest created version in the metastore 
entry for that table. When a reader comes in, it first looks at this entry and 
then reads the corresponding version directory in the table. The versions 
themselves could be garbage collected by deleting version directories that are 
older than some configurable duration, and this could be done either lazily by 
a writer on the table or by an active garbage collector in the background. 
These are of course somewhat involved changes, but they would solve the 
isolation and atomicity problems - the latter because v1 is a directory, so 
moving data into it is a single rename and hence atomic. Thoughts?
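
To make the atomicity point concrete, here is a minimal sketch of the publish 
and garbage-collection steps against the Hadoop FileSystem API. This is an 
illustration of the proposal only, not existing Hive code; recordLatestVersion() 
is a stand-in for the metastore update, and the staging directory is assumed to 
have been populated by the writer beforehand.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VersionedPublish {

  // Publish staged files as the next version of a table's directory. The
  // staging directory is renamed into place; rename is a single namenode
  // metadata operation, so readers see either all of the new version or
  // none of it.
  public static long publish(FileSystem fs, Path tableDir, Path stagingDir,
                             long currentVersion) throws IOException {
    long next = currentVersion + 1;
    Path versionDir = new Path(tableDir, "v" + next);
    if (!fs.rename(stagingDir, versionDir)) {
      throw new IOException("could not rename " + stagingDir + " to " + versionDir);
    }
    // Advertise the new version only after the rename has succeeded.
    recordLatestVersion(tableDir, next);
    return next;
  }

  // Lazy garbage collection, run by a writer or a background process:
  // delete version directories older than maxAgeMs, never the latest one.
  public static void collectOldVersions(FileSystem fs, Path tableDir,
                                        long latestVersion, long maxAgeMs)
      throws IOException {
    long cutoff = System.currentTimeMillis() - maxAgeMs;
    for (FileStatus stat : fs.listStatus(tableDir)) {
      String name = stat.getPath().getName();
      if (name.startsWith("v") && !name.equals("v" + latestVersion)
          && stat.getModificationTime() < cutoff) {
        fs.delete(stat.getPath(), true);
      }
    }
  }

  // Stand-in for updating the table's metastore entry; the metastore, not
  // the filesystem, is the source of truth for the latest version number.
  private static void recordLatestVersion(Path tableDir, long version) {
    // metastore update would go here
  }
}
{code}

Under this scheme, a LOAD INTO would stage the current version's files together 
with the newly loaded ones and publish them as the next version, so a reader of 
the previous version never observes a half-moved file.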


 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as follows. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 c   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: vote for a release candidate

2009-09-11 Thread Ashish Thusoo
Just replied on the JIRA. Is this really a regression - the code in 0.3.0 and 
0.4.0 seems similar...

Ashish

From: Namit Jain [nj...@facebook.com]
Sent: Thursday, September 10, 2009 6:58 PM
To: hive-dev@hadoop.apache.org
Subject: RE: vote for a release candidate

I am not sure 718 is a valid requirement. I think it got in for legacy reasons.

Should we even support LOAD INTO ?

We only support INSERT OVERWRITE; similarly, we should only support LOAD 
OVERWRITE INTO.

Is anyone using LOAD INTO without OVERWRITE ?



Thanks,
-namit





-Original Message-
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, September 10, 2009 4:28 PM
To: hive-dev@hadoop.apache.org
Subject: Re: vote for a release candidate

What do you guys think the feasibility of HIVE-718 being fixed for 0.4.0 is?
I think a completely correct solution is likely to be very tough to achieve,
but as is, it's a regression from 0.3.0 in that the functionality silently
fails.

-Todd

On Thu, Sep 10, 2009 at 3:24 PM, Namit Jain nj...@facebook.com wrote:

 I have created a release candidate for Hive.

 https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/


 Let me know if it is OK to publish this release candidate.


 Thanks,
 -namit







[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754397#action_12754397
 ] 

Ashish Thusoo commented on HIVE-718:


@prasad, can you explain your comment about the external process stuff?


 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as follows. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 c   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


