RE: [VOTE] vote for release candidate for hive
Namit, Can you make it available from http://people.apache.org/~njain/ That way people who do not have access to the apache machines will also be able to try the candidate. Thanks, Ashish From: Namit Jain [nj...@facebook.com] Sent: Thursday, September 17, 2009 6:32 PM To: Namit Jain; hive-dev@hadoop.apache.org Subject: [VOTE] vote for release candidate for hive Following the convention -Original Message- From: Namit Jain Sent: Thursday, September 17, 2009 6:31 PM To: hive-dev@hadoop.apache.org Subject: vote for release candidate for hive I have created another release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the fix for https://issues.apache.org/jira/browse/HIVE-838 The tar ball can be found at: people.apache.org /home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz* Thanks, -namit
[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile
[ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756978#action_12756978 ] Ning Zhang commented on HIVE-819: - Yongqiang, thanks for the explanation! Below are some more detailed comments: 1) In RCFile.c:307 it seems decompress() can be called multiple times, and the function doesn't check whether the data is already decompressed and, if so, return. This may not cause a problem in this diff, since the callers check whether the data is decompressed before calling decompress(), but it is a public function and nothing prevents future callers from calling it twice. So it may be better to implement this check inside the decompress() function. 2) Also, in the same decompress() function, it seems it doesn't work correctly when the column is not compressed. Can you double-check it? 3) Add unit tests or qfiles for the following cases: - storage dimension: (1) fields are compressed, (2) fields are uncompressed - query dimension: (a) 1 column in the where-clause, (b) 2 references to the same column in the where-clause (e.g., a > 2 and a < 5), (c) 2 references to the same column in the where-clause and group-by clause respectively (e.g., where a > 2 group by a). So there will be 6 test cases with the permutations of the 2 dimensions. For (b) and (c), please check that the actual column decompression is only done once. > Add lazy decompress ability to RCFile > - > > Key: HIVE-819 > URL: https://issues.apache.org/jira/browse/HIVE-819 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor, Serializers/Deserializers >Reporter: He Yongqiang >Assignee: He Yongqiang > Fix For: 0.5.0 > > Attachments: hive-819-2009-9-12.patch > > > This is especially useful for filter scanning. > For example, for query 'select a, b, c from table_rc_lazydecompress where > a>1;' we only need to decompress the block data of b,c columns when one row's > column 'a' in that block satisfies the filter condition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
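To make review suggestion (1) concrete, here is a minimal, self-contained Java sketch of an idempotent decompress() guard. It is illustrative only: the class and field names are hypothetical, and it uses java.util.zip rather than the Hadoop codecs the real RCFile code would use.

{noformat}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch of a decompress() that is safe to call more than once and that
// handles the uncompressed-column case by passing the bytes through.
public class LazyColumnBuffer {
  private final byte[] rawBytes;     // bytes as read from the file
  private final boolean compressed;  // was this column block compressed?
  private byte[] plainBytes;         // filled on the first decompress()
  private boolean decompressed = false;

  public LazyColumnBuffer(byte[] rawBytes, boolean compressed) {
    this.rawBytes = rawBytes;
    this.compressed = compressed;
  }

  public byte[] decompress() throws IOException {
    if (decompressed) {
      return plainBytes;             // guard: a second call is a no-op
    }
    if (!compressed) {
      plainBytes = rawBytes;         // uncompressed column: nothing to inflate
    } else {
      Inflater inflater = new Inflater();
      inflater.setInput(rawBytes);
      ByteArrayOutputStream out = new ByteArrayOutputStream(rawBytes.length * 4);
      byte[] buf = new byte[4096];
      try {
        while (!inflater.finished()) {
          int n = inflater.inflate(buf);
          if (n == 0 && inflater.needsInput()) {
            break;                   // truncated input; stop rather than spin
          }
          out.write(buf, 0, n);
        }
      } catch (DataFormatException e) {
        throw new IOException("corrupt column block: " + e.getMessage());
      } finally {
        inflater.end();
      }
      plainBytes = out.toByteArray();
    }
    decompressed = true;
    return plainBytes;
  }
}
{noformat}

The point of the boolean guard is that callers no longer need to coordinate: calling decompress() twice simply returns the already-materialized bytes.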
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756951#action_12756951 ] Min Zhou commented on HIVE-78: -- From the words you commented: {noformat} Daemons like HiveService and HiveWebInterface will have to run as supergroup or a hive group? {noformat} > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756949#action_12756949 ] Min Zhou commented on HIVE-78: -- I do not think the HiveServer in your mind is the same as mine, which supports multiple users, not only one. > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756936#action_12756936 ] Edward Capriolo commented on HIVE-78: - @Min I would think the code should apply to any client: cli, hive server, or HWI. We should probably also provide a configuration variable: {noformat} hive.authorize true {noformat} > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756904#action_12756904 ] Min Zhou commented on HIVE-78: -- Let me guess: you are all talking about the CLI. But we are using HiveServer as a multi-user server, not one that supports only a single user like mysqld does. > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[VOTE] vote for release candidate for hive
Following the convention -Original Message- From: Namit Jain Sent: Thursday, September 17, 2009 6:31 PM To: hive-dev@hadoop.apache.org Subject: vote for release candidate for hive I have created another release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the fix for https://issues.apache.org/jira/browse/HIVE-838 The tar ball can be found at: people.apache.org /home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz* Thanks, -namit
vote for release candidate for hive
I have created another release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the fix for https://issues.apache.org/jira/browse/HIVE-838 The tar ball can be found at: people.apache.org /home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz* Thanks, -namit
[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile
[ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756881#action_12756881 ] He Yongqiang commented on HIVE-819: --- >>Can you briefly summarize the current approach to how decompression is done and your proposal for lazy decompression? Also, more comments in the code would be very helpful. np. Currently decompression is eager: the needed-columns info is passed into the reader, and the reader skips unneeded columns, reads only the needed columns into memory, and decompresses them immediately when they are read. Lazy decompression is done by not decompressing the needed columns in the first place: we just hold the raw, still-compressed bytes in memory and pass a callback object to BytesRefWritable. The patch adds an interface, LazyDecompressionCallback, and RCFile's reader implements it as LazyDecompressionCallbackImpl. LazyDecompressionCallback is used to construct BytesRefWritable, and when BytesRefWritable.getData() etc. is called (that's the entry point between ColumnSerde/ColumnStruct and BytesRefWritable, where the underlying bytes need to be converted to objects), the callback method is invoked and decompression happens. >>Is the performance regression of 4 secs with the query predicate duration > 8 consistent or intermittent? Intermittent. I tested it a few more times after posting the comments. >>If the latter, what method of timing are you using? I just submit a simple Hive select query in local mode and use the query finish time. > Add lazy decompress ability to RCFile > - > > Key: HIVE-819 > URL: https://issues.apache.org/jira/browse/HIVE-819 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor, Serializers/Deserializers >Reporter: He Yongqiang >Assignee: He Yongqiang > Fix For: 0.5.0 > > Attachments: hive-819-2009-9-12.patch > > > This is especially useful for filter scanning. > For example, for query 'select a, b, c from table_rc_lazydecompress where > a>1;' we only need to decompress the block data of b,c columns when one row's > column 'a' in that block satisfies the filter condition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
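For readers following along, here is a stripped-down Java sketch of the callback pattern described above. It is not the HIVE-819 patch itself - the real classes live in the Hive codebase and carry more state - but it shows how holding a callback instead of decompressed bytes defers the work until getData() is called.

{noformat}
import java.io.IOException;

// The reader implements this per column block; decompress() inflates the
// block the first time somebody actually asks for the bytes.
interface LazyDecompressionCallback {
  byte[] decompress() throws IOException;
}

// Simplified stand-in for BytesRefWritable: it starts out holding only the
// callback, and materializes the plain bytes lazily.
class LazyBytesRef {
  private byte[] data;                              // null until first access
  private final LazyDecompressionCallback callback;

  LazyBytesRef(LazyDecompressionCallback callback) {
    this.callback = callback;
  }

  // Entry point used by the deserializer; this is where the lazy
  // decompression actually happens.
  byte[] getData() throws IOException {
    if (data == null) {
      data = callback.decompress();                 // pay the cost only if the column is read
    }
    return data;
  }
}
{noformat}

The design pay-off is in the filter case from the issue description: if column 'a' fails the predicate for every row in a block, getData() is never called for 'b' and 'c', so those blocks are never inflated.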
[jira] Resolved: (HIVE-838) in strict mode, no partition selected error
[ https://issues.apache.org/jira/browse/HIVE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghotham Murthy resolved HIVE-838. --- Resolution: Fixed Fix Version/s: 0.4.0 committed to 0.4. > in strict mode, no partition selected error > --- > > Key: HIVE-838 > URL: https://issues.apache.org/jira/browse/HIVE-838 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.4.0 > > Attachments: hive.838.1.patch, hive.838.2.patch > > > set hive.mapred.mode=strict; > select * from > (select count(1) from src > union all >select count(1) from srcpart where ds = '2009-08-09' > )x; > Is it a blocker for 0.4 ? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756823#action_12756823 ] Ashish Thusoo commented on HIVE-78: --- @Min I agree with Edward's thoughts here. We have to foster a collaborative environment and not be dismissive of each other's ideas and approaches. Much of the work in the community happens on a volunteer basis, and whatever time anyone puts into the project is a bonus and should be respected by all. It does make sense to keep authentication separate from authorization because in most environments there are already directories which deal with the former. Creating yet another store for passwords just leads to an administration nightmare, as the account administrators have to create accounts for new users in multiple places. So let's just focus on authorization and let the directory infrastructure deal with authentication. Will look at your patch as well. > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods
[ https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-841: Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) Committed. Thanks Cyrus > Context.java Uses Deleted (previously Deprecated) Hadoop Methods > > > Key: HIVE-841 > URL: https://issues.apache.org/jira/browse/HIVE-841 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Cyrus Katrak > Fix For: 0.5.0 > > Attachments: hive841.patch > > > Building Hive against Trunk/Nightly Hadoop Fails > (ql/src/java/org/apache/hadoop/hive/ql/Context.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756817#action_12756817 ] Edward Capriolo commented on HIVE-78: - @namit, I think I can explain why AS made sense at the time. My plan was not to decouple users from a rule. See my little patch. {noformat} +struct AccessControl { + 1: list<string> user, + 2: list<string> group, + 3: list<string> database, + 4: list<string> table, + 5: list<string> partition, + 6: list<string> column, + 7: list<string> priv, + 8: string name +} {noformat} I wanted the rule to be more or less immutable, or to support really simple syntax. Something like this is doable: {noformat} GRANT my_permission to USER3; {noformat} But it seems to imply that users are decoupled from the rule. This is really not true; in my design, a user or group is just another multivalued attribute of the rule. I would like the format to be interchangeable: {noformat} ALTER my_permission add db 'db'; ALTER my_permission add table 'db.table'; ALTER my_permission drop table 'db.table'; {noformat} @Min, see Ashish's comment above in this Jira: {noformat} I agree, it is best to punt authentication to the authentication systems (LDAP, kerb etc. etc.) and concentrate on authorization (privileges) here. {noformat} The goal here is to trust the user/group information as Hadoop does, and create a system that grants/revokes privileges. Authentication and authorization are two separate things, so our Jira is misnamed :) I will review your patch, just to see what you came up with. As I said, you are farther along than I am, and this has been off my radar, so I don't mind passing the baton. But Namit is right: we have to agree on the syntax and on what we are controlling, because down the road it will be an issue. > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
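To make the "rule with multivalued attributes" idea above easier to picture, here is a rough Java sketch of what such a named rule might look like in memory. It is purely illustrative and not part of any attached patch; the field names simply mirror the Thrift struct quoted above.

{noformat}
import java.util.HashSet;
import java.util.Set;

// A named rule that owns its user/group/object/privilege lists; GRANT and
// ALTER statements just add or remove values on the rule itself.
class AccessRule {
  final String name;
  final Set<String> users = new HashSet<String>();
  final Set<String> groups = new HashSet<String>();
  final Set<String> tables = new HashSet<String>();  // "db.table" entries
  final Set<String> privs = new HashSet<String>();   // e.g. SELECT, DROP

  AccessRule(String name) {
    this.name = name;
  }

  // GRANT my_permission TO USER3  ==  add to the rule's user attribute
  void addUser(String user) {
    users.add(user);
  }

  // ALTER my_permission add table 'db.table'
  void addTable(String table) {
    tables.add(table);
  }

  // A check the server could run before executing a statement.
  boolean allows(String user, String table, String priv) {
    return users.contains(user) && tables.contains(table) && privs.contains(priv);
  }
}
{noformat}

Under this reading, GRANT and ALTER are just different spellings of "mutate one attribute of the rule", which is why the two syntaxes in the comment are interchangeable.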
[jira] Commented: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods
[ https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756804#action_12756804 ] Namit Jain commented on HIVE-841: - The changes look good - will commit if the tests pass > Context.java Uses Deleted (previously Deprecated) Hadoop Methods > > > Key: HIVE-841 > URL: https://issues.apache.org/jira/browse/HIVE-841 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Cyrus Katrak > Attachments: hive841.patch > > > Building Hive against Trunk/Nightly Hadoop Fails > (ql/src/java/org/apache/hadoop/hive/ql/Context.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-838) in strict mode, no partition selected error
[ https://issues.apache.org/jira/browse/HIVE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghotham Murthy updated HIVE-838: -- Resolution: Fixed Release Note: HIVE-838. In strict mode, remove error if no partition is selected. (Namit Jain via rmurthy) Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) committed. thanks namit. > in strict mode, no partition selected error > --- > > Key: HIVE-838 > URL: https://issues.apache.org/jira/browse/HIVE-838 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.838.1.patch, hive.838.2.patch > > > set hive.mapred.mode=strict; > select * from > (select count(1) from src > union all >select count(1) from srcpart where ds = '2009-08-09' > )x; > Is it a blocker for 0.4 ? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (HIVE-838) in strict mode, no partition selected error
[ https://issues.apache.org/jira/browse/HIVE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghotham Murthy reopened HIVE-838: --- Haven't committed to 0.4 yet. > in strict mode, no partition selected error > --- > > Key: HIVE-838 > URL: https://issues.apache.org/jira/browse/HIVE-838 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.838.1.patch, hive.838.2.patch > > > set hive.mapred.mode=strict; > select * from > (select count(1) from src > union all >select count(1) from srcpart where ds = '2009-08-09' > )x; > Is it a blocker for 0.4 ? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simultaneously
[ https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cliff Resnick updated HIVE-80: -- Attachment: org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal-1.patch This fixes a broken patch previously submitted. > Allow Hive Server to run multiple queries simultaneously > > > Key: HIVE-80 > URL: https://issues.apache.org/jira/browse/HIVE-80 > Project: Hadoop Hive > Issue Type: Improvement > Components: Server Infrastructure >Reporter: Raghotham Murthy >Assignee: Neil Conway >Priority: Critical > Fix For: 0.5.0 > > Attachments: hive_input_format_race-2.patch, > org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal-1.patch > > > Can use one driver object per query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
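As background for readers, the patch name suggests the fix is to move per-query state in Utilities into a ThreadLocal. The Java sketch below is a generic illustration of that technique (hypothetical class, not the actual patch): each server thread gets its own copy of the state, so concurrent queries no longer overwrite each other's plan.

{noformat}
import java.util.HashMap;
import java.util.Map;

// Before: a single static map shared by every query (racy under HiveServer).
// After: each thread sees its own map, initialized on first use.
class PerQueryState {
  private static final ThreadLocal<Map<String, Object>> PLAN =
      new ThreadLocal<Map<String, Object>>() {
        @Override
        protected Map<String, Object> initialValue() {
          return new HashMap<String, Object>();
        }
      };

  static void put(String key, Object value) {
    PLAN.get().put(key, value);   // touches only the calling thread's map
  }

  static Object get(String key) {
    return PLAN.get().get(key);
  }

  static void clear() {
    PLAN.remove();                // drop this thread's state when the query ends
  }
}
{noformat}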
[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simultaneously
[ https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cliff Resnick updated HIVE-80: -- Attachment: (was: org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal.patch) > Allow Hive Server to run multiple queries simultaneously > > > Key: HIVE-80 > URL: https://issues.apache.org/jira/browse/HIVE-80 > Project: Hadoop Hive > Issue Type: Improvement > Components: Server Infrastructure >Reporter: Raghotham Murthy >Assignee: Neil Conway >Priority: Critical > Fix For: 0.5.0 > > Attachments: hive_input_format_race-2.patch > > > Can use one driver object per query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: vote for release candidate for hive
Please disregard. I found the cause of my error. Thanks. On Thu, Sep 17, 2009 at 3:09 PM, Matt Pestritto wrote: > I recently switched to the 0.4 branch to do some testing and I'm running > into a problem. > > When I run a query from the cli - the first one works, but the second query > always fails with a NullPointerException. > > Did anyone else run into this ? > > Thanks > -Matt > > hive> select count(1) from table1; > Total MapReduce jobs = 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_200909171501_0001, Tracking URL = > http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001 > Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job > -Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001 > 2009-09-17 03:05:54,855 map = 0%, reduce =0% > 2009-09-17 03:06:02,895 map = 22%, reduce =0% > 2009-09-17 03:06:06,933 map = 44%, reduce =0% > 2009-09-17 03:06:11,965 map = 67%, reduce =0% > 2009-09-17 03:06:15,988 map = 89%, reduce =0% > 2009-09-17 03:06:20,009 map = 100%, reduce =0% > 2009-09-17 03:06:25,036 map = 100%, reduce =11% > 2009-09-17 03:06:30,054 map = 100%, reduce =15% > 2009-09-17 03:06:31,063 map = 100%, reduce =22% > 2009-09-17 03:06:34,075 map = 100%, reduce =26% > 2009-09-17 03:06:36,101 map = 100%, reduce =100% > Ended Job = job_200909171501_0001 > OK > 274087 > Time taken: 45.401 seconds > hive> select count(1) from table1; > Total MapReduce jobs = 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) > at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) > at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) > Job Submission failed with exception > 'java.lang.RuntimeException(java.lang.NullPointerException)' > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.ExecDriver > hive> > > > On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain wrote: > >> https://issues.apache.org/jira/browse/HIVE-838 >> >> is a blocker for 0.4 - >> Once this is merged, I will have another release candidate 
>> >> >> -Original Message- >> From: Johan Oskarsson [mailto:jo...@oskarsson.nu] >> Sent: Wednesday, September 16, 2009 8:29 AM >> To: hive-dev@hadoop.apache.org >> Subject: Re: vote for release candidate for hive >> >> +1 based on running unit tests. >> >> /Johan >> >> Namit Jain wrote: >> > Sorry, was meant for hive-dev@ >> > >> > From: Namit Jain [mailto:nj...@facebook.com] >> > Sent: Tuesday, September 15, 2009 1:30 PM >> > To: hive-u...@hadoop.apache.org >> > Subject: vote for release candidate for hive >> > >> > >> > I have created another release candidate for Hive. >> > >> > >> > >> > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/ >> > >> > >> > >> > >> > >> > Let me know if it is OK to publish this release candidate. >> > >> > >> > >> > The only change from the previous candidate ( >> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is >> the fix for >> > >> > https://issues.apache.org/jira/browse/HIVE-718 >> > >> > >> > >> > >> > >> > >> > >> > Thanks, >> > >> > -namit >> > >> > >> > >> > >> >> >
Re: vote for release candidate for hive
I recently switched to the 0.4 branch to do some testing and I'm running into a problem. When I run a query from the cli - the first one works, but the second query always fails with a NullPointerException. Did anyone else run into this ? Thanks -Matt hive> select count(1) from table1; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= Starting Job = job_200909171501_0001, Tracking URL = http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001 Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001 2009-09-17 03:05:54,855 map = 0%, reduce =0% 2009-09-17 03:06:02,895 map = 22%, reduce =0% 2009-09-17 03:06:06,933 map = 44%, reduce =0% 2009-09-17 03:06:11,965 map = 67%, reduce =0% 2009-09-17 03:06:15,988 map = 89%, reduce =0% 2009-09-17 03:06:20,009 map = 100%, reduce =0% 2009-09-17 03:06:25,036 map = 100%, reduce =11% 2009-09-17 03:06:30,054 map = 100%, reduce =15% 2009-09-17 03:06:31,063 map = 100%, reduce =22% 2009-09-17 03:06:34,075 map = 100%, reduce =26% 2009-09-17 03:06:36,101 map = 100%, reduce =100% Ended Job = job_200909171501_0001 OK 274087 Time taken: 45.401 seconds hive> select count(1) from table1; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:155) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) Job Submission failed with exception 'java.lang.RuntimeException(java.lang.NullPointerException)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver hive> On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain wrote: > https://issues.apache.org/jira/browse/HIVE-838 > > is a blocker for 0.4 - > Once this is merged, I will have another release candidate > > > -Original Message- > From: Johan Oskarsson [mailto:jo...@oskarsson.nu] > Sent: Wednesday, September 16, 2009 8:29 AM > To: hive-dev@hadoop.apache.org > Subject: Re: vote for release candidate for hive > > +1 based on running unit tests. 
> > /Johan > > Namit Jain wrote: > > Sorry, was meant for hive-dev@ > > > > From: Namit Jain [mailto:nj...@facebook.com] > > Sent: Tuesday, September 15, 2009 1:30 PM > > To: hive-u...@hadoop.apache.org > > Subject: vote for release candidate for hive > > > > > > I have created another release candidate for Hive. > > > > > > > > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/ > > > > > > > > > > > > Let me know if it is OK to publish this release candidate. > > > > > > > > The only change from the previous candidate ( > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is > the fix for > > > > https://issues.apache.org/jira/browse/HIVE-718 > > > > > > > > > > > > > > > > Thanks, > > > > -namit > > > > > > > > > >
[jira] Updated: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods
[ https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cyrus Katrak updated HIVE-841: -- Status: Patch Available (was: Open) > Context.java Uses Deleted (previously Deprecated) Hadoop Methods > > > Key: HIVE-841 > URL: https://issues.apache.org/jira/browse/HIVE-841 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Cyrus Katrak > Attachments: hive841.patch > > > Building Hive against Trunk/Nightly Hadoop Fails > (ql/src/java/org/apache/hadoop/hive/ql/Context.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile
[ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756674#action_12756674 ] Ning Zhang commented on HIVE-819: - A few general comments: 1) Can you briefly summarize the current approach to how decompression is done and your proposal for lazy decompression? Also, more comments in the code would be very helpful. 2) Is the performance regression of 4 secs with the query predicate duration > 8 consistent or intermittent? If it is the former, are there any additional changes that cause this regression (I thought the worst case would be decompressing all columns, as you mentioned, which is equivalent to the previous behavior)? If the latter, what method of timing are you using? If you have YourKit, can you also do CPU profiling? > Add lazy decompress ability to RCFile > - > > Key: HIVE-819 > URL: https://issues.apache.org/jira/browse/HIVE-819 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor, Serializers/Deserializers >Reporter: He Yongqiang >Assignee: He Yongqiang > Fix For: 0.5.0 > > Attachments: hive-819-2009-9-12.patch > > > This is especially useful for filter scanning. > For example, for query 'select a, b, c from table_rc_lazydecompress where > a>1;' we only need to decompress the block data of b,c columns when one row's > column 'a' in that block satisfies the filter condition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods
[ https://issues.apache.org/jira/browse/HIVE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cyrus Katrak updated HIVE-841: -- Attachment: hive841.patch > Context.java Uses Deleted (previously Deprecated) Hadoop Methods > > > Key: HIVE-841 > URL: https://issues.apache.org/jira/browse/HIVE-841 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Cyrus Katrak > Attachments: hive841.patch > > > Building Hive against Trunk/Nightly Hadoop Fails > (ql/src/java/org/apache/hadoop/hive/ql/Context.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-841) Context.java Uses Deleted (previously Deprecated) Hadoop Methods
Context.java Uses Deleted (previously Deprecated) Hadoop Methods Key: HIVE-841 URL: https://issues.apache.org/jira/browse/HIVE-841 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Cyrus Katrak Building Hive against Trunk/Nightly Hadoop Fails (ql/src/java/org/apache/hadoop/hive/ql/Context.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756662#action_12756662 ] Namit Jain commented on HIVE-78: I think we should spend some time on finalizing the functionality before implementing it - it is very difficult to change something once it is out, due to all kinds of backward compatibility issues. For the AS syntax, won't it be simpler to add permissions to a role, and then assign roles to a user? GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission ALTER GRANT my_permission add USER 'USER3' Can I revoke some privileges from my_permission? If yes, how is it different from doing the two things separately? CREATE ROLE my_permission AS GRANT WITH_GRANT,RC, ON '*' ; GRANT my_permission to USER1, USER2; later GRANT my_permission to USER3; > Authentication infrastructure for Hive > -- > > Key: HIVE-78 > URL: https://issues.apache.org/jira/browse/HIVE-78 > Project: Hadoop Hive > Issue Type: New Feature > Components: Server Infrastructure >Reporter: Ashish Thusoo >Assignee: Edward Capriolo > Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, > hive-78.diff > > > Allow hive to integrate with existing user repositories for authentication > and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-837) virtual column support (filename) in hive
[ https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756658#action_12756658 ] Prasad Chakka commented on HIVE-837: Buckets have other semantic meaning, which is not the case for files, so we should not lump buckets in with meta/virtual columns. We could possibly add a virtual column/udf called bucket() for that. MySQL exposes a lot of virtual data as udfs (curtime(), database(), current_user(), default(column)) etc. instead of virtual columns. I think it makes sense to make them udfs, just in case some virtual columns need arguments. > virtual column support (filename) in hive > - > > Key: HIVE-837 > URL: https://issues.apache.org/jira/browse/HIVE-837 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain > > Copying from some mails: > I am dumping files into a hive partition at five-minute intervals. I am using > LOAD DATA into a partition. > weblogs > web1.00 > web1.05 > web1.10 > ... > web2.00 > web2.05 > web1.10 > > Things that would be useful.. > Select files from the folder with a regex or exact name > select * FROM logs where FILENAME LIKE(WEB1*) > select * FROM LOGS WHERE FILENAME=web2.00 > Also it would be nice to be able to select offsets in a file, this would make > sense with appends > select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=] > select > substr(filename, 4, 7) as class_A, > substr(filename, 8, 10) as class_B > count( x ) as cnt > from FOO > group by > substr(filename, 4, 7), > substr(filename, 8, 10) ; > Hive should support virtual columns -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-837) virtual column support (filename) in hive
[ https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756630#action_12756630 ] Namit Jain commented on HIVE-837: - Yesterday, I was having an offline conversation with Raghu, and we were thinking that this is similar to the concept of buckets that exists currently. So, do we enhance the tablesample clause to include filenames also, and not expose it as a virtual column at all? I think it is more intuitive to have filenames in the where clause - maybe we should have some virtual columns for buckets also and leave the current syntax for buckets as is for backward compatibility. File pruning is a must - so having the filename as a udf might be more difficult. The udf filename() will return the same value at compile time. So, I would prefer virtual columns instead of a udf. SELECT * FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and col2= ... would solve the pruning problem since the file names are part of the syntax, but how do you propose to select the filename in that case? So, I think the original syntax: select * FROM logs where FILENAME LIKE(WEB1*) might be easier > virtual column support (filename) in hive > - > > Key: HIVE-837 > URL: https://issues.apache.org/jira/browse/HIVE-837 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain > > Copying from some mails: > I am dumping files into a hive partition at five-minute intervals. I am using > LOAD DATA into a partition. > weblogs > web1.00 > web1.05 > web1.10 > ... > web2.00 > web2.05 > web1.10 > > Things that would be useful.. > Select files from the folder with a regex or exact name > select * FROM logs where FILENAME LIKE(WEB1*) > select * FROM LOGS WHERE FILENAME=web2.00 > Also it would be nice to be able to select offsets in a file, this would make > sense with appends > select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=] > select > substr(filename, 4, 7) as class_A, > substr(filename, 8, 10) as class_B > count( x ) as cnt > from FOO > group by > substr(filename, 4, 7), > substr(filename, 8, 10) ; > Hive should support virtual columns -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
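To illustrate the pruning argument above: if the planner can see the filename predicate at compile time, whole files can be skipped before any rows are read. The Java sketch below is a toy, hypothetical helper (not Hive code) that filters candidate files with a LIKE-style pattern.

{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy file pruner: keep only the files whose names match a SQL LIKE pattern
// such as 'web1%'. Only the surviving files would ever be opened by the scan.
class FilePruner {
  static List<String> prune(List<String> files, String likePattern) {
    // Escape literal dots, then translate LIKE wildcards into a regex:
    // '%' matches any run of characters, '_' matches a single character.
    String regex = likePattern.replace(".", "\\.")
                              .replace("%", ".*")
                              .replace("_", ".");
    Pattern p = Pattern.compile(regex);
    List<String> kept = new ArrayList<String>();
    for (String f : files) {
      if (p.matcher(f).matches()) {
        kept.add(f);
      }
    }
    return kept;
  }
}
{noformat}

With the weblog example from the issue description, prune(files, "web1%") keeps web1.00, web1.05, and so on, and skips the web2.* files entirely.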
RE: vote for release candidate for hive
https://issues.apache.org/jira/browse/HIVE-838 is a blocker for 0.4 - Once this is merged, I will have another release candidate -Original Message- From: Johan Oskarsson [mailto:jo...@oskarsson.nu] Sent: Wednesday, September 16, 2009 8:29 AM To: hive-dev@hadoop.apache.org Subject: Re: vote for release candidate for hive +1 based on running unit tests. /Johan Namit Jain wrote: > Sorry, was meant for hive-dev@ > > From: Namit Jain [mailto:nj...@facebook.com] > Sent: Tuesday, September 15, 2009 1:30 PM > To: hive-u...@hadoop.apache.org > Subject: vote for release candidate for hive > > > I have created another release candidate for Hive. > > > > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/ > > > > > > Let me know if it is OK to publish this release candidate. > > > > The only change from the previous candidate > (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is the > fix for > > https://issues.apache.org/jira/browse/HIVE-718 > > > > > > > > Thanks, > > -namit > > > >
[jira] Commented: (HIVE-837) virtual column support (filename) in hive
[ https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756563#action_12756563 ] Edward Capriolo commented on HIVE-837: -- It would be nice and very useful. Sometimes I want to select my own 'partition' or 'datafile' explicitly - something like below: SELECT * FROM weblogs PARTITION ('2009-09-17', '2009-09-18') WHERE col1='..' and col2= ... Or users could select data files from a directory: SELECT * FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and col2= ... > virtual column support (filename) in hive > - > > Key: HIVE-837 > URL: https://issues.apache.org/jira/browse/HIVE-837 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain > > Copying from some mails: > I am dumping files into a hive partition at five-minute intervals. I am using > LOAD DATA into a partition. > weblogs > web1.00 > web1.05 > web1.10 > ... > web2.00 > web2.05 > web1.10 > > Things that would be useful.. > Select files from the folder with a regex or exact name > select * FROM logs where FILENAME LIKE(WEB1*) > select * FROM LOGS WHERE FILENAME=web2.00 > Also it would be nice to be able to select offsets in a file, this would make > sense with appends > select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=] > select > substr(filename, 4, 7) as class_A, > substr(filename, 8, 10) as class_B > count( x ) as cnt > from FOO > group by > substr(filename, 4, 7), > substr(filename, 8, 10) ; > Hive should support virtual columns -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-819) Add lazy decompress ability to RCFile
[ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao reassigned HIVE-819: --- Assignee: He Yongqiang > Add lazy decompress ability to RCFile > - > > Key: HIVE-819 > URL: https://issues.apache.org/jira/browse/HIVE-819 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor, Serializers/Deserializers >Reporter: He Yongqiang >Assignee: He Yongqiang > Fix For: 0.5.0 > > Attachments: hive-819-2009-9-12.patch > > > This is especially useful for filter scanning. > For example, for query 'select a, b, c from table_rc_lazydecompress where > a>1;' we only need to decompress the block data of b,c columns when one row's > column 'a' in that block satisfies the filter condition. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.