RE: [VOTE] vote for release candidate for hive

2009-09-17 Thread Ashish Thusoo
Namit,

Can you make it available from

http://people.apache.org/~njain/

That way people who do not have access to the apache machines will also be able 
to try the candidate.

Thanks,
Ashish

From: Namit Jain [nj...@facebook.com]
Sent: Thursday, September 17, 2009 6:32 PM
To: Namit Jain; hive-dev@hadoop.apache.org
Subject: [VOTE] vote for release candidate for hive

Following the convention

-Original Message-
From: Namit Jain
Sent: Thursday, September 17, 2009 6:31 PM
To: hive-dev@hadoop.apache.org
Subject: vote for release candidate for hive

I have created another release candidate for Hive.

  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/

Let me know if it is OK to publish this release candidate.



The only change from the previous candidate 
(https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the 
fix for

 https://issues.apache.org/jira/browse/HIVE-838


The tarball can be found on people.apache.org at:

/home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz



Thanks,
-namit






[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile

2009-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756978#action_12756978
 ] 

Ning Zhang commented on HIVE-819:
-

Yongqiang, thanks for the explanation! Below are some more detailed comments:

1) In RCFile.c:307, it seems decompress() can be called multiple times, and the 
function doesn't check whether the data is already decompressed and, if so, 
return early. This may not cause a problem in this diff, since the callers check 
whether the data is decompressed before calling decompress(), but it is a public 
function and nothing prevents future callers from calling it twice. So it may be 
better to implement this check inside the decompress() function itself (see the 
sketch after this list). 

2) Also, in the same decompress() function, it seems it doesn't work correctly 
when the column is not compressed. Can you double-check that?

3) Add unit tests or qfiles for the following cases:
  - storage dimension: 
     (1) fields are compressed 
     (2) fields are uncompressed
  - queries dimension:
     (a) 1 column in the where-clause 
     (b) 2 references to the same column in the where-clause (e.g., a > 2 and a < 5) 
     (c) 2 references to the same column in the where-clause and the group-by 
clause respectively (e.g., where a > 2 group by a).

So there will be 6 test cases with the permutations of the 2 dimensions. For (b) 
and (c), please check that the actual column decompression is only done once. 
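
As an illustration of the guard suggested in 1), a minimal sketch of what it 
could look like inside the column buffer class (the field and method names are 
assumptions, not the actual RCFile code):

{noformat}
// Hypothetical idempotency guard for a column buffer's decompress():
private boolean decompressed = false;

public void decompress() throws IOException {
  if (decompressed) {
    return;                       // already done; a second call is a no-op
  }
  // ... existing per-column decompression would go here ...
  decompressed = true;
}
{noformat}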

> Add lazy decompress ability to RCFile
> -
>
> Key: HIVE-819
> URL: https://issues.apache.org/jira/browse/HIVE-819
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor, Serializers/Deserializers
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for a filter scanning. 
> For example, for query 'select a, b, c from table_rc_lazydecompress where 
> a>1;' we only need to decompress the block data of b,c columns when one row's 
> column 'a' in that block satisfies the filter condition.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756951#action_12756951
 ] 

Min Zhou commented on HIVE-78:
--

From your comment:
{noformat}
Daemons like HiveService and HiveWebInterface will have to run as supergroup or 
a hive group? 
{noformat}

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756949#action_12756949
 ] 

Min Zhou commented on HIVE-78:
--

I do not think the HiveServer in your mind is the same as mine, which supports 
multiple users, not just one.

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756936#action_12756936
 ] 

Edward Capriolo commented on HIVE-78:
-

@Min
 
I would think the code should apply to any client: CLI, Hive Server, or HWI. 

We should probably also provide a configuration variable:

{noformat}
<property>
  <name>hive.authorize</name>
  <value>true</value>
</property>
{noformat}
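
A hypothetical sketch of how such a flag could be checked from Java (HiveConf 
extends Hadoop's Configuration, so getBoolean is available; the property name is 
the one proposed above, not an existing variable):

{noformat}
// Returns whether the proposed hive.authorize flag is enabled.
public static boolean authorizationEnabled(org.apache.hadoop.conf.Configuration conf) {
  return conf.getBoolean("hive.authorize", false);
}
{noformat}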

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756904#action_12756904
 ] 

Min Zhou commented on HIVE-78:
--

Let me guess: you are all talking about the CLI. But we are using HiveServer as 
a multi-user server that, like mysqld, supports more than just a single user.

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[VOTE] vote for release candidate for hive

2009-09-17 Thread Namit Jain

Following the convention

-Original Message-
From: Namit Jain 
Sent: Thursday, September 17, 2009 6:31 PM
To: hive-dev@hadoop.apache.org
Subject: vote for release candidate for hive

I have created another release candidate for Hive.
 
  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ 
 
Let me know if it is OK to publish this release candidate.
 
 
 
The only change from the previous candidate 
(https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the 
fix for
 
 https://issues.apache.org/jira/browse/HIVE-838


The tarball can be found on people.apache.org at:

/home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz



Thanks,
-namit






vote for release candidate for hive

2009-09-17 Thread Namit Jain
I have created another release candidate for Hive.
 
  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ 
 
Let me know if it is OK to publish this release candidate.
 
 
 
The only change from the previous candidate 
(https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the 
fix for
 
 https://issues.apache.org/jira/browse/HIVE-838


The tarball can be found on people.apache.org at:

/home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz



Thanks,
-namit






[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile

2009-09-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756881#action_12756881
 ] 

He Yongqiang commented on HIVE-819:
---

>>Can you briefly summarize the current approach of how decompression is done 
>>and your proposal for the lazy decompression? Also more comments in the 
>>code would be much helpful.
No problem. Currently decompression is eager: the needed-columns info is passed 
into the reader, and the reader skips unneeded columns, reads only the needed 
columns into memory, and decompresses them immediately as they are read.
Lazy decompression works by not decompressing the needed columns right away: the 
still-compressed bytes are held in memory, and a callback object is passed to 
BytesRefWritable. The patch adds an interface, LazyDecompressionCallback, and 
RCFile's reader implements it as LazyDecompressionCallbackImpl. 
The LazyDecompressionCallback is used to construct the BytesRefWritable, and 
when BytesRefWritable.getData() etc. is called (that's the entry point between 
ColumnSerde/ColumnStruct and BytesRefWritable) to convert the underlying bytes 
to objects, the callback is invoked and decompression happens then.
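
As a rough sketch of the callback shape described above (illustrative only; the 
actual classes in the patch may differ):

{noformat}
import java.io.IOException;

// Hypothetical shape of the lazy-decompression hook, not the real patch.
public interface LazyDecompressionCallback {
  byte[] decompress() throws IOException;        // called on first real access
}

public class BytesRefWritable {
  private byte[] bytes;                            // raw (possibly compressed) data
  private LazyDecompressionCallback lazyCallback;  // null once decompressed

  public byte[] getData() throws IOException {
    if (lazyCallback != null) {
      bytes = lazyCallback.decompress();           // decompression happens here
      lazyCallback = null;
    }
    return bytes;
  }
}
{noformat}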

>>Is the performance regression by 4 secs with the query predicate duration > 8 
>>consistent or intermittent?
Intermittent. I tested it several more times after posting the comments. 
>>If the latter, what method of timing are you using?
I just submit a simple Hive select query in local mode and use the query finish 
time.

> Add lazy decompress ability to RCFile
> -
>
> Key: HIVE-819
> URL: https://issues.apache.org/jira/browse/HIVE-819
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor, Serializers/Deserializers
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for a filter scanning. 
> For example, for query 'select a, b, c from table_rc_lazydecompress where 
> a>1;' we only need to decompress the block data of b,c columns when one row's 
> column 'a' in that block satisfies the filter condition.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-838) in strict mode, no partition selected error

2009-09-17 Thread Raghotham Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy resolved HIVE-838.
---

   Resolution: Fixed
Fix Version/s: 0.4.0

committed to 0.4.

> in strict mode, no partition selected error
> ---
>
> Key: HIVE-838
> URL: https://issues.apache.org/jira/browse/HIVE-838
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.4.0
>
> Attachments: hive.838.1.patch, hive.838.2.patch
>
>
> set hive.mapred.mode=strict;
> select * from 
>   (select count(1) from src 
> union all
>select count(1) from srcpart where ds = '2009-08-09'
>   )x;
> Is it a blocker for 0.4 ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756823#action_12756823
 ] 

Ashish Thusoo commented on HIVE-78:
---

@Min

I agree with Edward's thoughts here. We have to foster a collaborative 
environment and not be dismissive of each other's ideas and approaches. Much of 
the work in the community happens on a volunteer basis, and whatever time anyone 
puts into the project is a bonus and should be respected by all. 

It does make sense to keep authentication separate from authorization, because 
in most environments there are already directories that deal with the former. 
Creating yet another store for passwords just leads to an administration 
nightmare, as the account administrators have to create accounts for new users 
in multiple places. So let's just focus on authorization and let the directory 
infrastructure deal with authentication. I will look at your patch as well.




> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods

2009-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-841:


   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks Cyrus

> Context.java Uses Deleted (previously Deprecated) Hadoop Methods
> 
>
> Key: HIVE-841
> URL: https://issues.apache.org/jira/browse/HIVE-841
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Cyrus Katrak
> Fix For: 0.5.0
>
> Attachments: hive841.patch
>
>
> Building Hive against Trunk/Nightly Hadoop Fails 
> (ql/src/java/org/apache/hadoop/hive/ql/Context.java)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756817#action_12756817
 ] 

Edward Capriolo commented on HIVE-78:
-

@namit,

I think I can explain why AS made sense at the time. My plan was not to 
decouple users from a rule. See my little patch.

{noformat}
+struct AccessControl {
+  1: list<string> user,
+  2: list<string> group,
+  3: list<string> database,
+  4: list<string> table,
+  5: list<string> partition,
+  6: list<string> column,
+  7: list<string> priv,
+  8: string name
+}
{noformat}

I wanted the rule to be more or less immutable, or to support a really simple syntax.

Something like this is doable
{noformat}
GRANT my_permission to USER3;
{noformat}
But it seems to imply that users are decoupled from the rule. 
This is really not true: in my design, a user or group is just another 
multivalued attribute of the rule. 

I would like the format to be interchangeable: 
{noformat}
ALTER my_permission add db 'db';
ALTER my_permission add table 'db.table';
ALTER my_permission drop table 'db.table';
{noformat}

@Min,
Above in this Jira, see Ashish's comment:

{noformat}
I agree, it is best to punt authentication to the authentication systems (LDAP, 
kerb etc. etc.) and concentrate on authorization (privileges) here. 
{noformat}

The goal here is to trust the user/group information as Hadoop does, and create 
a system that grants/revokes privileges. Authentication and authorization are 
two separate things, so our Jira is misnamed :)

I will review your patch, just to see what you came up with. As I said, you are 
farther along than I am, and this has been off my radar, so I don't mind passing 
the baton. But Namit is right: we have to agree on the syntax and on what we are 
controlling, because down the road it will be an issue.





> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods

2009-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756804#action_12756804
 ] 

Namit Jain commented on HIVE-841:
-

The changes look good - will commit if the tests pass

> Context.java Uses Deleted (previously Deprecated) Hadoop Methods
> 
>
> Key: HIVE-841
> URL: https://issues.apache.org/jira/browse/HIVE-841
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Cyrus Katrak
> Attachments: hive841.patch
>
>
> Building Hive against Trunk/Nightly Hadoop Fails 
> (ql/src/java/org/apache/hadoop/hive/ql/Context.java)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-838) in strict mode, no partition selected error

2009-09-17 Thread Raghotham Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-838:
--

  Resolution: Fixed
Release Note: 
HIVE-838. In strict mode, remove error if no partition is selected.
(Namit Jain via rmurthy)

Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

committed. thanks namit.

> in strict mode, no partition selected error
> ---
>
> Key: HIVE-838
> URL: https://issues.apache.org/jira/browse/HIVE-838
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.838.1.patch, hive.838.2.patch
>
>
> set hive.mapred.mode=strict;
> select * from 
>   (select count(1) from src 
> union all
>select count(1) from srcpart where ds = '2009-08-09'
>   )x;
> Is it a blocker for 0.4 ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (HIVE-838) in strict mode, no partition selected error

2009-09-17 Thread Raghotham Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy reopened HIVE-838:
---


Haven't committed to 0.4 yet.

> in strict mode, no partition selected error
> ---
>
> Key: HIVE-838
> URL: https://issues.apache.org/jira/browse/HIVE-838
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.838.1.patch, hive.838.2.patch
>
>
> set hive.mapred.mode=strict;
> select * from 
>   (select count(1) from src 
> union all
>select count(1) from srcpart where ds = '2009-08-09'
>   )x;
> Is it a blocker for 0.4 ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2009-09-17 Thread Cliff Resnick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cliff Resnick updated HIVE-80:
--

Attachment: org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal-1.patch

This fixes a broken patch previously submitted
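
The attachment name suggests moving per-query plan state into a ThreadLocal; a 
minimal sketch of that pattern follows (an assumption based on the file name, 
not the actual contents of the patch; class and method names are illustrative):

{noformat}
// Hypothetical ThreadLocal isolation of the per-query plan in Utilities.
public class Utilities {
  private static final ThreadLocal<MapredWork> gWork =
      new ThreadLocal<MapredWork>();

  public static void setMapRedWork(MapredWork w) {
    gWork.set(w);                 // each query thread sees only its own plan
  }

  public static MapredWork getMapRedWork() {
    return gWork.get();
  }
}
{noformat}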

> Allow Hive Server to run multiple queries simulteneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Fix For: 0.5.0
>
> Attachments: hive_input_format_race-2.patch, 
> org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal-1.patch
>
>
> Can use one driver object per query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2009-09-17 Thread Cliff Resnick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cliff Resnick updated HIVE-80:
--

Attachment: (was: 
org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal.patch)

> Allow Hive Server to run multiple queries simulteneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Fix For: 0.5.0
>
> Attachments: hive_input_format_race-2.patch
>
>
> Can use one driver object per query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: vote for release candidate for hive

2009-09-17 Thread Matt Pestritto
Please disregard.  I found the cause of my error.

Thanks.

On Thu, Sep 17, 2009 at 3:09 PM, Matt Pestritto  wrote:

> I recently switched to the 0.4 branch to do some testing and I'm running
> into a problem.
>
> When I run a query from the cli - the first one works, but the second query
> always fails with a NullPointerException.
>
> Did anyone else run into this ?
>
> Thanks
> -Matt
>
> hive> select count(1) from table1;
> Total MapReduce jobs = 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_200909171501_0001, Tracking URL =
> http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001
> Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job
> -Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001
> 2009-09-17 03:05:54,855 map = 0%,  reduce =0%
> 2009-09-17 03:06:02,895 map = 22%,  reduce =0%
> 2009-09-17 03:06:06,933 map = 44%,  reduce =0%
> 2009-09-17 03:06:11,965 map = 67%,  reduce =0%
> 2009-09-17 03:06:15,988 map = 89%,  reduce =0%
> 2009-09-17 03:06:20,009 map = 100%,  reduce =0%
> 2009-09-17 03:06:25,036 map = 100%,  reduce =11%
> 2009-09-17 03:06:30,054 map = 100%,  reduce =15%
> 2009-09-17 03:06:31,063 map = 100%,  reduce =22%
> 2009-09-17 03:06:34,075 map = 100%,  reduce =26%
> 2009-09-17 03:06:36,101 map = 100%,  reduce =100%
> Ended Job = job_200909171501_0001
> OK
> 274087
> Time taken: 45.401 seconds
> hive> select count(1) from table1;
> Total MapReduce jobs = 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154)
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> Job Submission failed with exception
> 'java.lang.RuntimeException(java.lang.NullPointerException)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
> hive>
>
>
> On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain  wrote:
>
>> https://issues.apache.org/jira/browse/HIVE-838
>>
>> is a blocker for 0.4 -
>> Once this is merged, I will have another release candidate
>>
>>
>> -Original Message-
>> From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
>> Sent: Wednesday, September 16, 2009 8:29 AM
>> To: hive-dev@hadoop.apache.org
>> Subject: Re: vote for release candidate for hive
>>
>> +1 based on running unit tests.
>>
>> /Johan
>>
>> Namit Jain wrote:
>> > Sorry, was meant for hive-dev@
>> >
>> > From: Namit Jain [mailto:nj...@facebook.com]
>> > Sent: Tuesday, September 15, 2009 1:30 PM
>> > To: hive-u...@hadoop.apache.org
>> > Subject: vote for release candidate for hive
>> >
>> >
>> > I have created another release candidate for Hive.
>> >
>> >
>> >
>> >  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/
>> >
>> >
>> >
>> >
>> >
>> > Let me know if it is OK to publish this release candidate.
>> >
>> >
>> >
>> > The only change from the previous candidate (
>> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is
>> the fix for
>> >
>> > https://issues.apache.org/jira/browse/HIVE-718
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > -namit
>> >
>> >
>> >
>> >
>>
>>
>


Re: vote for release candidate for hive

2009-09-17 Thread Matt Pestritto
I recently switched to the 0.4 branch to do some testing and I'm running
into a problem.

When I run a query from the cli - the first one works, but the second query
always fails with a NullPointerException.

Did anyone else run into this ?

Thanks
-Matt

hive> select count(1) from table1;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_200909171501_0001, Tracking URL =
http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001
Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job
-Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001
2009-09-17 03:05:54,855 map = 0%,  reduce =0%
2009-09-17 03:06:02,895 map = 22%,  reduce =0%
2009-09-17 03:06:06,933 map = 44%,  reduce =0%
2009-09-17 03:06:11,965 map = 67%,  reduce =0%
2009-09-17 03:06:15,988 map = 89%,  reduce =0%
2009-09-17 03:06:20,009 map = 100%,  reduce =0%
2009-09-17 03:06:25,036 map = 100%,  reduce =11%
2009-09-17 03:06:30,054 map = 100%,  reduce =15%
2009-09-17 03:06:31,063 map = 100%,  reduce =22%
2009-09-17 03:06:34,075 map = 100%,  reduce =26%
2009-09-17 03:06:36,101 map = 100%,  reduce =100%
Ended Job = job_200909171501_0001
OK
274087
Time taken: 45.401 seconds
hive> select count(1) from table1;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154)
at
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Job Submission failed with exception
'java.lang.RuntimeException(java.lang.NullPointerException)'
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.ExecDriver
hive>


On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain  wrote:

> https://issues.apache.org/jira/browse/HIVE-838
>
> is a blocker for 0.4 -
> Once this is merged, I will have another release candidate
>
>
> -Original Message-
> From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> Sent: Wednesday, September 16, 2009 8:29 AM
> To: hive-dev@hadoop.apache.org
> Subject: Re: vote for release candidate for hive
>
> +1 based on running unit tests.
>
> /Johan
>
> Namit Jain wrote:
> > Sorry, was meant for hive-dev@
> >
> > From: Namit Jain [mailto:nj...@facebook.com]
> > Sent: Tuesday, September 15, 2009 1:30 PM
> > To: hive-u...@hadoop.apache.org
> > Subject: vote for release candidate for hive
> >
> >
> > I have created another release candidate for Hive.
> >
> >
> >
> >  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/
> >
> >
> >
> >
> >
> > Let me know if it is OK to publish this release candidate.
> >
> >
> >
> > The only change from the previous candidate (
> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is
> the fix for
> >
> > https://issues.apache.org/jira/browse/HIVE-718
> >
> >
> >
> >
> >
> >
> >
> > Thanks,
> >
> > -namit
> >
> >
> >
> >
>
>


[jira] Updated: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods

2009-09-17 Thread Cyrus Katrak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Katrak updated HIVE-841:
--

Status: Patch Available  (was: Open)

> Context.java Uses Deleted (previously Deprecated) Hadoop Methods
> 
>
> Key: HIVE-841
> URL: https://issues.apache.org/jira/browse/HIVE-841
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Cyrus Katrak
> Attachments: hive841.patch
>
>
> Building Hive against Trunk/Nightly Hadoop Fails 
> (ql/src/java/org/apache/hadoop/hive/ql/Context.java)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile

2009-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756674#action_12756674
 ] 

Ning Zhang commented on HIVE-819:
-

A few general comments: 

 1) Can you briefly summarize the current approach of how decompression is done 
and your proposal for the lazy decompression? Also, more comments in the code 
would be much helpful.

 2) Is the 4-second performance regression with the query predicate duration > 8 
consistent or intermittent? If it is the former, is there any additional change 
that causes this regression (I thought the worst case would be decompressing all 
columns, as you mentioned, which is equivalent to the previous behavior)? If the 
latter, what method of timing are you using? If you have YourKit, can you also 
do CPU profiling? 

> Add lazy decompress ability to RCFile
> -
>
> Key: HIVE-819
> URL: https://issues.apache.org/jira/browse/HIVE-819
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor, Serializers/Deserializers
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for a filter scanning. 
> For example, for query 'select a, b, c from table_rc_lazydecompress where 
> a>1;' we only need to decompress the block data of b,c columns when one row's 
> column 'a' in that block satisfies the filter condition.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods

2009-09-17 Thread Cyrus Katrak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Katrak updated HIVE-841:
--

Attachment: hive841.patch

> Context.java Uses Deleted (previously Deprecated) Hadoop Methods
> 
>
> Key: HIVE-841
> URL: https://issues.apache.org/jira/browse/HIVE-841
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Cyrus Katrak
> Attachments: hive841.patch
>
>
> Building Hive against Trunk/Nightly Hadoop Fails 
> (ql/src/java/org/apache/hadoop/hive/ql/Context.java)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods

2009-09-17 Thread Cyrus Katrak (JIRA)
Context.java Uses Deleted (previously Deprecated) Hadoop Methods


 Key: HIVE-841
 URL: https://issues.apache.org/jira/browse/HIVE-841
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Cyrus Katrak


Building Hive against Trunk/Nightly Hadoop Fails 
(ql/src/java/org/apache/hadoop/hive/ql/Context.java)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-78) Authentication infrastructure for Hive

2009-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756662#action_12756662
 ] 

Namit Jain commented on HIVE-78:


I think we should spend some time finalizing the functionality before 
implementing it - it is very difficult to change something once it is out, due 
to all kinds of backward compatibility issues.

For the AS syntax:

Won't it be simpler to add permissions to a role, and then assign roles to a 
user?



GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission

ALTER GRANT my_permission add USER 'USER3'


Can I revoke some privileges from my_permission?

If yes, how is it different from doing the two things separately?


CREATE ROLE my_permission AS GRANT WITH_GRANT,RC, ON '*' ;
GRANT my_permission to USER1, USER2;

later

GRANT my_permission to USER3;

> Authentication infrastructure for Hive
> --
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: Edward Capriolo
> Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, 
> hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-837) virtual column support (filename) in hive

2009-09-17 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756658#action_12756658
 ] 

Prasad Chakka commented on HIVE-837:


Buckets have other semantic meaning, which is not the case for files, so we 
should not lump buckets in with meta/virtual columns. We could possibly add a 
virtual column/udf called bucket() for that.

MySQL exposes a lot of virtual data as UDFs (curtime(), database(), 
current_user(), default(column), etc.) instead of virtual columns. I think it 
makes sense to make them UDFs, just in case some virtual columns need arguments.
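
As a hypothetical illustration of the UDF style being proposed (the class name 
and hashing scheme are assumptions for illustration, not existing Hive code):

{noformat}
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// bucket(col, numBuckets): which bucket a value would hash into.
public class UDFBucket extends UDF {
  public IntWritable evaluate(Text col, IntWritable numBuckets) {
    if (col == null || numBuckets == null || numBuckets.get() <= 0) {
      return null;
    }
    int b = (col.hashCode() & Integer.MAX_VALUE) % numBuckets.get();
    return new IntWritable(b);
  }
}
{noformat}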

> virtual column support (filename) in hive
> -
>
> Key: HIVE-837
> URL: https://issues.apache.org/jira/browse/HIVE-837
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>
> Copying from some mails:
> I am dumping files into a hive partition on five-minute intervals. I am using 
> LOAD DATA into a partition.
> weblogs
> web1.00
> web1.05
> web1.10
> ...
> web2.00
> web2.05
> web1.10
> 
> Things that would be useful..
> Select files from the folder with a regex or exact name
> select * FROM logs where FILENAME LIKE(WEB1*)
> select * FROM LOGS WHERE FILENAME=web2.00
> Also it would be nice to be able to select offsets in a file, this would make 
> sense with appends
> select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=]
> select  
> substr(filename, 4, 7) as  class_A, 
> substr(filename,  8, 10) as class_B
> count( x ) as cnt
> from FOO
> group by
> substr(filename, 4, 7), 
> substr(filename,  8, 10) ;
> Hive should support virtual columns

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-837) virtual column support (filename) in hive

2009-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756630#action_12756630
 ] 

Namit Jain commented on HIVE-837:
-

Yesterday, I was having an offline conversation with Raghu, and we were thinking 
that this is similar to the concept of buckets that exists currently.
So, do we enhance the tablesample clause to include filenames also, and not 
expose it as a virtual column at all?

I think it is more intuitive to have filenames in the where clause - maybe we 
should have some virtual columns for buckets also and leave the current syntax 
for buckets as is for backward compatibility.

File pruning is a must - so, having the filename as a UDF might be more 
difficult; the UDF filename() would have to return the same value at compile 
time. So, I would prefer virtual columns instead. 

SELECT * FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and 
col2= ...
would solve the pruning problem, since the file names are part of the syntax, 
but how do you propose to select the filename in that case?

So, I think the original syntax:
select * FROM logs where FILENAME LIKE(WEB1*)
might be easier.

> virtual column support (filename) in hive
> -
>
> Key: HIVE-837
> URL: https://issues.apache.org/jira/browse/HIVE-837
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>
> Copying from some mails:
> I am dumping files into a hive partition on five-minute intervals. I am using 
> LOAD DATA into a partition.
> weblogs
> web1.00
> web1.05
> web1.10
> ...
> web2.00
> web2.05
> web1.10
> 
> Things that would be useful..
> Select files from the folder with a regex or exact name
> select * FROM logs where FILENAME LIKE(WEB1*)
> select * FROM LOGS WHERE FILENAME=web2.00
> Also it would be nice to be able to select offsets in a file, this would make 
> sense with appends
> select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=]
> select  
> substr(filename, 4, 7) as  class_A, 
> substr(filename,  8, 10) as class_B
> count( x ) as cnt
> from FOO
> group by
> substr(filename, 4, 7), 
> substr(filename,  8, 10) ;
> Hive should support virtual columns

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: vote for release candidate for hive

2009-09-17 Thread Namit Jain
https://issues.apache.org/jira/browse/HIVE-838

is a blocker for 0.4 - 
Once this is merged, I will have another release candidate


-Original Message-
From: Johan Oskarsson [mailto:jo...@oskarsson.nu] 
Sent: Wednesday, September 16, 2009 8:29 AM
To: hive-dev@hadoop.apache.org
Subject: Re: vote for release candidate for hive

+1 based on running unit tests.

/Johan

Namit Jain wrote:
> Sorry, was meant for hive-dev@
> 
> From: Namit Jain [mailto:nj...@facebook.com]
> Sent: Tuesday, September 15, 2009 1:30 PM
> To: hive-u...@hadoop.apache.org
> Subject: vote for release candidate for hive
> 
> 
> I have created another release candidate for Hive.
> 
> 
> 
>  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/
> 
> 
> 
> 
> 
> Let me know if it is OK to publish this release candidate.
> 
> 
> 
> The only change from the previous candidate 
> (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is the 
> fix for
> 
> https://issues.apache.org/jira/browse/HIVE-718
> 
> 
> 
> 
> 
> 
> 
> Thanks,
> 
> -namit
> 
> 
> 
> 



[jira] Commented: (HIVE-837) virtual column support (filename) in hive

2009-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756563#action_12756563
 ] 

Edward Capriolo commented on HIVE-837:
--

It would be nice and very useful. Sometimes I want to select my own 'partition' 
or 'datafile' explicitly, something like the queries below:

SELECT *  FROM weblogs PARTITION ('2009-09-17', '2009-09-18') WHERE col1='..' 
and col2= ...

Or users can select data files from directory:

SELECT *  FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and 
col2= ...

> virtual column support (filename) in hive
> -
>
> Key: HIVE-837
> URL: https://issues.apache.org/jira/browse/HIVE-837
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>
> Copying from some mails:
> I am dumping files into a hive partition on five-minute intervals. I am using 
> LOAD DATA into a partition.
> weblogs
> web1.00
> web1.05
> web1.10
> ...
> web2.00
> web2.05
> web1.10
> 
> Things that would be useful..
> Select files from the folder with a regex or exact name
> select * FROM logs where FILENAME LIKE(WEB1*)
> select * FROM LOGS WHERE FILENAME=web2.00
> Also it would be nice to be able to select offsets in a file, this would make 
> sense with appends
> select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=]
> select  
> substr(filename, 4, 7) as  class_A, 
> substr(filename,  8, 10) as class_B
> count( x ) as cnt
> from FOO
> group by
> substr(filename, 4, 7), 
> substr(filename,  8, 10) ;
> Hive should support virtual columns

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-819) Add lazy decompress ability to RCFile

2009-09-17 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reassigned HIVE-819:
---

Assignee: He Yongqiang

> Add lazy decompress ability to RCFile
> -
>
> Key: HIVE-819
> URL: https://issues.apache.org/jira/browse/HIVE-819
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor, Serializers/Deserializers
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for a filter scanning. 
> For example, for query 'select a, b, c from table_rc_lazydecompress where 
> a>1;' we only need to decompress the block data of b,c columns when one row's 
> column 'a' in that block satisfies the filter condition.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.