Re: Creating branch-1

2015-06-03 Thread Vinod Kumar Vavilapalli
Hadoop uses a "Target Version" field. Not sure if this was done for all 
projects.

+Vinod

On Jun 3, 2015, at 9:16 AM, Alan Gates <alanfga...@gmail.com> wrote:

I don't think using Affects Version will work, because it is used to list which 
versions of Hive the bug affects, unless you're proposing being able to parse 
the affected version into a branch (i.e., 1.3.0 => branch-1).

I like the idea of customizing JIRA, though I don't know how hard it is.

We could also use the labels field.  It would run against master by default and 
you could also add a label to run against an additional branch.  It would have 
to find a patch matching that branch in order to run.

Alan.

Thejas Nair
June 3, 2015 at 7:51
Thanks for the insights, Sergio!
Using 'Affects Version' sounds like a good idea. However, for the case
where a patch needs to be executed against both branch-1 and master, I
think it would be more intuitive to use
"Affects Version/s: branch-master branch-1", since the version
number on the master branch will keep increasing.

We might be able to request a custom field in JIRA (say, "Test
branches") for this as well, but we could probably start with the
'Affects Version' approach.
Sergio Pena
June 2, 2015 at 15:03
Hi Alan,

Currently, the test system executes tests on a specific branch only if
there is a Jenkins job assigned to it, like trunk or spark. Any other
branch will not work. We will need to create a job for branch-1, modify the
jenkins-submit-build.sh to add the new profile, and add a new properties
file to the Jenkins instance that contains branch information.

This is a little tedious for every branch we create.

Also, I don't think the test system will grab two patches (branch-1 &
master) to execute the tests on different branches. It will get the latest
one you uploaded.

What if we use the 'Affects Version/s' field of the ticket to specify
which branches the patch needs to be executed against? Or, as you said, use
hints in the comments.

For instance:
- Affects Version/s: branch-1 # Tests on branch-1 only
- Affects Version/s: 2.0.0 branch-1 # Tests on branch-1 and master
- Affects Version/s: branch-spark # Tests on branch-spark only

If we use 'branch-xxx' as a naming convention for our branches, then we can
detect the branch from the ticket details. And if an x.x.x version is
specified, then we just execute the tests against master.

Also, branch-1 would need to be executed with MR1, right? Then the patch
file would need to be named 'HIVE--mr1.patch' so that it uses the MR1
environment.

Right now the code that parses this info is in the process_jira function in
'jenkins-common.sh', and it is called by 'jenkins-submit-build.sh'. We can
parse different branches there and let jenkins-submit-build.sh call the
correct job with the specific branch details.

Any other ideas?

- Sergio
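
For illustration, a rough sketch of the branch-detection convention described 
above. This is not the actual process_jira logic (which lives in 
jenkins-common.sh and is shell, not Java); it only shows the mapping from 
'Affects Version/s' values to target branches, assuming the 'branch-xxx' 
naming convention proposed in this thread:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: map JIRA "Affects Version/s" values to the branches a
// pre-commit run should target. A value starting with "branch-" names a branch
// directly; a plain x.x.x version means "run against master".
public class AffectsVersionBranchMapper {

  public static List<String> targetBranches(List<String> affectsVersions) {
    List<String> branches = new ArrayList<>();
    for (String version : affectsVersions) {
      String v = version.trim();
      if (v.isEmpty()) {
        continue;
      }
      String branch = v.startsWith("branch-") ? v : "master";
      if (!branches.contains(branch)) {
        branches.add(branch);
      }
    }
    return branches;
  }

  public static void main(String[] args) {
    // "2.0.0 branch-1" => tests run on master and branch-1
    System.out.println(targetBranches(Arrays.asList("2.0.0", "branch-1")));
  }
}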



Alan Gates
June 1, 2015 at 16:19
Based on our discussion and vote last week, I'm working on creating branch-1. 
I plan to make the branch tomorrow. If anyone has a large commit that they don't 
want to have to commit twice and are close to committing, let me know so 
I can make sure it gets in before I branch.

I'll also be updating 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute to clarify how 
to handle feature and bug fix patches on master and branch-1.

Also, we will need to make sure patches can be tested against both master and 
branch-1.  If I understand correctly, the test system today will run a patch 
against a branch instead of master if the patch is named with the branch name.  
There are a couple of issues with this.  First, people will often want to submit 
two versions of a patch and have both tested (one against master and one 
against branch-1) rather than one or the other.  Second, we will want a 
way for a single patch to be tested against both branches when appropriate.  The 
first case could be handled by the system picking up both the branch-1 and master 
patches and running them automatically.  The second could be handled by hints in 
the comments telling the system to run both.  I'm open to other suggestions as 
well.  Can someone familiar with the testing code point to where I'd look to 
see what it would take to make this work?

Alan.
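
A minimal sketch of the first case above (picking up both a master and a 
branch-1 patch), assuming a hypothetical naming convention of HIVE-NNNN.patch 
for master patches and HIVE-NNNN.branch-1.patch for branch patches; the actual 
convention was still being decided in this thread, and this is not the existing 
test-system code:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only, not the actual test infrastructure: choose which attachment to
// test on which branch, assuming HIVE-NNNN.branch-<name>.patch names a branch
// patch and HIVE-NNNN.patch names a master patch.
public class PatchBranchSelector {

  private static final Pattern BRANCH_PATCH =
      Pattern.compile("HIVE-\\d+\\.(branch-[\\w.-]+)\\.patch");
  private static final Pattern MASTER_PATCH =
      Pattern.compile("HIVE-\\d+(\\.\\d+)?\\.patch");

  public static Map<String, String> patchesToTest(Iterable<String> attachmentNames) {
    Map<String, String> branchToPatch = new LinkedHashMap<>();
    for (String name : attachmentNames) {
      Matcher branchMatch = BRANCH_PATCH.matcher(name);
      if (branchMatch.matches()) {
        branchToPatch.put(branchMatch.group(1), name); // e.g. branch-1 -> HIVE-1234.branch-1.patch
      } else if (MASTER_PATCH.matcher(name).matches()) {
        branchToPatch.put("master", name);             // plain patch runs against master
      }
    }
    return branchToPatch;
  }
}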



Re: Preparation for Hive-1.2 release

2015-04-27 Thread Vinod Kumar Vavilapalli
Hi,

Coming from the Apache Hadoop community.

We recently released 2.7.0 [1]. We are calling it not-yet-ready, with one 
of the primary intentions being to get it vetted by downstream projects. We plan 
to release a 2.7.1 (or 2.7.2) when it is considered stable enough for most of 
our users.

Can Hive move to 2.7.0, at least with the initial intention of 
testing/validation?

Thanks,
+Vinod

[1] [ANNOUNCE] Apache Hadoop 2.7.0 Release 
http://markmail.org/thread/ytisa4w73ym4ee65

On Apr 21, 2015, at 8:33 PM, Sushanth Sowmyan <khorg...@gmail.com> wrote:

Hi Folks,

Per my mail three weeks back, we should start getting ready to release 1.2
as a rollup. And as per my proposal to manage this release, I'd like
to start the process of forking 1.2 and making trunk 1.3.

I've set up a cwiki page where people can list development patches that
are almost done, to signal their desire that they be included in 1.2:
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

A rough timeline I see for this process would be to fork this Friday
(24th Apr) and then start rolling out RC0 by, say, Wednesday next
week. This means that if you want your JIRA included in 1.2, I would
request that it be close to completion or have a patch available
for review. Also, by the middle of next week I expect to freeze the wiki
inclusion list for features and keep it open only for bug fixes
discovered while testing the various RCs.

Please feel free to edit that wiki page with your requests, or, if you
don't have edit privileges, reply to this mail and I can add it
in. (Also, if you don't have wiki edit privileges, you should probably
ask for them. :p)

Thanks!
-Sushanth



Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-26 Thread Vinod Kumar Vavilapalli
Congratulations and keep up the great work!

+Vinod

On Feb 25, 2015, at 8:43 AM, Carl Steinbach  wrote:

> I am pleased to announce that Sergey Shelukhin has been elected to the Hive 
> Project Management Committee. Please join me in congratulating Sergey!
> 
> Thanks.
> 
> - Carl
> 



Re: [ANNOUNCE] New Hive Committers - Prasanth J and Vaibhav Gumashta

2014-04-25 Thread Vinod Kumar Vavilapalli
Congratulations folks!

+Vinod


On Thu, Apr 24, 2014 at 7:26 PM, Carl Steinbach  wrote:

> The Apache Hive PMC has voted to make Prasanth J and Vaibhav
> Gumashta committers on the Apache Hive Project.
>
> Please join me in congratulating Prasanth and Vaibhav!
>
> Thanks.
>
> - Carl
>



[jira] [Commented] (HIVE-6900) HostUtil.getTaskLogUrl signature change causes compilation to fail

2014-04-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980111#comment-13980111
 ] 

Vinod Kumar Vavilapalli commented on HIVE-6900:
---

We can fix it in 2.4.1, and Hive can depend on that release if that is the route we take.

I see you filed MAPREDUCE-5857. It is strictly a YARN issue; I'll move it to 
the right sub-project.

> HostUtil.getTaskLogUrl signature change causes compilation to fail
> --
>
> Key: HIVE-6900
> URL: https://issues.apache.org/jira/browse/HIVE-6900
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Chris Drome
> Attachments: HIVE-6900.1.patch.txt
>
>
> The signature for HostUtil.getTaskLogUrl has changed between Hadoop-2.3 and 
> Hadoop-2.4.
> Code in 
> shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> works with Hadoop-2.3 method and causes compilation failure with Hadoop-2.4.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6900) HostUtil.getTaskLogUrl signature change causes compilation to fail

2014-04-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979014#comment-13979014
 ] 

Vinod Kumar Vavilapalli commented on HIVE-6900:
---

I looked at the issue together with [~jdere]. I haven't reviewed the patch, but 
overall this can let the compilation pass. The eventual link is used elsewhere 
in Hive to pull the logs and do some processing; the link produced by the patch 
will still not work, as the URLs have changed completely.

We can do this in two halves:
 - Fix the compilation for now.
 - Then follow up in YARN with a proper API that can expose logs to users, 
and change Hive to use that.

For the compilation fix, we can either put back the previous API in YARN via 
MAPREDUCE-5830 or do the fix here in Hive, as this patch does.

Thoughts?
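
For illustration of the compilation-fix half (a sketch only, not necessarily 
what the attached patch does; the two argument lists are assumptions based on 
the description above): the shim could look the method up reflectively and fall 
back between the 2.3-style and 2.4-style signatures, so it compiles against 
either Hadoop line.

import java.lang.reflect.Method;

// Hedged sketch only. Assumes the Hadoop 2.3 signature is
// getTaskLogUrl(host, port, taskAttemptId) and that 2.4 added a leading scheme
// argument; both are resolved reflectively so the shim source does not
// reference either signature directly.
public final class TaskLogUrlShim {

  public static String getTaskLogUrl(Class<?> hostUtilClass, String scheme,
      String host, String port, String taskAttemptId) throws Exception {
    try {
      // 2.4-style: (scheme, host, port, taskAttemptId)
      Method m = hostUtilClass.getMethod("getTaskLogUrl",
          String.class, String.class, String.class, String.class);
      return (String) m.invoke(null, scheme, host, port, taskAttemptId);
    } catch (NoSuchMethodException e) {
      // 2.3-style: (host, port, taskAttemptId)
      Method m = hostUtilClass.getMethod("getTaskLogUrl",
          String.class, String.class, String.class);
      return (String) m.invoke(null, host, port, taskAttemptId);
    }
  }

  private TaskLogUrlShim() {
  }
}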

> HostUtil.getTaskLogUrl signature change causes compilation to fail
> --
>
> Key: HIVE-6900
> URL: https://issues.apache.org/jira/browse/HIVE-6900
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Chris Drome
> Attachments: HIVE-6900.1.patch.txt
>
>
> The signature for HostUtil.getTaskLogUrl has changed between Hadoop-2.3 and 
> Hadoop-2.4.
> Code in 
> shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> works with Hadoop-2.3 method and causes compilation failure with Hadoop-2.4.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2014-03-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1392#comment-1392
 ] 

Vinod Kumar Vavilapalli commented on HIVE-5317:
---

bq. MAPREDUCE-279, at 109, currently out scores us. There may be others, but it 
would be cool to have more watchers than Yarn.
Hehe, looks like we have a race. I'll go ask some of the YARN folks who are also 
watching this JIRA to stop watching this one :D

> Implement insert, update, and delete in Hive with full ACID support
> ---
>
> Key: HIVE-5317
> URL: https://issues.apache.org/jira/browse/HIVE-5317
> Project: Hive
>  Issue Type: New Feature
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: InsertUpdatesinHive.pdf
>
>
> Many customers want to be able to insert, update and delete rows from Hive 
> tables with full ACID support. The use cases are varied, but the form of the 
> queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
> ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various 
> dimension tables (eg. customer, inventory, stores) needs to be processed. The 
> dimension tables have primary keys and are typically bucketed and sorted on 
> those keys.
> * Once a day a small set (up to 100k rows) of records need to be deleted for 
> regulatory compliance.
> * Once an hour a log of transactions is exported from a RDBS and the fact 
> tables need to be updated (up to 1m rows)  to reflect the new data. The 
> transactions are a combination of inserts, updates, and deletes. The table is 
> partitioned and bucketed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6098) Merge Tez branch into trunk

2013-12-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855980#comment-13855980
 ] 

Vinod Kumar Vavilapalli commented on HIVE-6098:
---

If it is not too late, how about changing _hive.optimize.tez_ to be 
_hive.execution-engine_, taking values _(MapReduce, Tez, etc.)_, or something like 
that?

> Merge Tez branch into trunk
> ---
>
> Key: HIVE-6098
> URL: https://issues.apache.org/jira/browse/HIVE-6098
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.12.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-6098.1.patch, HIVE-6098.2.patch, HIVE-6098.3.patch, 
> hive-on-tez-conf.txt
>
>
> I think the Tez branch is at a point where we can consider merging it back 
> into trunk after review. 
> Tez itself has had its first release, most hive features are available on Tez 
> and the test coverage is decent. There are a few known limitations, all of 
> which can be handled in trunk as far as I can tell (i.e.: None of them are 
> large disruptive changes that still require a branch.)
> Limitations:
> - Union all is not yet supported on Tez
> - SMB is not yet supported on Tez
> - Bucketed map-join is executed as broadcast join (bucketing is ignored)
> Since the user is free to toggle hive.optimize.tez, it's obviously possible 
> to just run these on MR.
> I am hoping to follow the approach that was taken with vectorization and 
> shoot for a merge instead of single commit. This would retain history of the 
> branch. Also in vectorization we required at least three +1s before merge, 
> I'm hoping to go with that as well.
> I will add a combined patch to this ticket for review purposes (not for 
> commit). I'll also attach instructions to run on a cluster if anyone wants to 
> try.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: [ANNOUNCE] New Hive Committers - Jitendra Nath Pandey and Eric Hanson

2013-11-21 Thread Vinod Kumar Vavilapalli
Congratulations to both! Great job and keep up the good work!

Thanks,
+Vinod

On Nov 21, 2013, at 3:29 PM, Carl Steinbach wrote:

> The Apache Hive PMC has voted to make Jitendra Nath Pandey and Eric Hanson
> committers on the Apache Hive project.
> 
> Please join me in congratulating Jitendra and Eric!
> 
> Thanks.
> 
> Carl




Re: HowTo: Debugging/Running localmode on YARN ?

2013-11-19 Thread Vinod Kumar Vavilapalli

The local MapReduce job-runner was never supported/tested with HDFS, though it could 
logically work.

Thanks,
+Vinod


On Nov 19, 2013, at 4:58 AM, Remus Rusanu wrote:

> Hello all,
> 
> I just discovered that with 23 shims the localmode is driven by
> 
> SET  mapreduce.framework.name=local;
> 
> not by the traditional SET mapred.job.tracker=local; Has anyone put together 
> a how-to for debugging/running localmode on Yarn, like Thejas had for classic 
> Hadoop at 
> http://hadoop-pig-hive-thejas.blogspot.ie/2013/04/running-hive-in-local-mode.html
>  ?
> 
> My specific issue is that on localmode I get error launching the job due to 
> missing HDFS file:
> 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://sandbox.hortonworks.com:8020/usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
>at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
>at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:396)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
>at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:396)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:433)
>at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 
> Changing SET fs.default.name=file:///tmp; 'solves' the error, but I'm a bit 
> confused why using the (valid and running!) HDFS does not work. It seems to 
> me that the HDFS resource in question is just a concat of the default FS with 
> a localpath, not a valid HDFS name...
> 
> Thanks,
> ~Remus
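
The settings being compared in this thread, collected in one place as a sketch 
using a plain Hadoop Configuration (not Hive code; the key names are the ones 
quoted above):

import org.apache.hadoop.conf.Configuration;

// Sketch of the knobs discussed above, not Hive code. With the 23 shims, local
// mode is keyed off mapreduce.framework.name; the classic shims keyed it off
// mapred.job.tracker. fs.default.name controls where job resources (such as
// the hcatalog jar in the stack trace) are resolved, which is why pointing it
// at file:/// "solves" the FileNotFoundException.
public class LocalModeSettings {

  public static Configuration localModeConf() {
    Configuration conf = new Configuration();
    conf.set("mapreduce.framework.name", "local"); // Hadoop 2 / YARN-era switch
    conf.set("mapred.job.tracker", "local");       // classic (pre-23) switch
    conf.set("fs.default.name", "file:///tmp");    // workaround from the thread
    return conf;
  }
}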




Re: FYI Hive trunk has moved to maven

2013-10-31 Thread Vinod Kumar Vavilapalli
Awesome, great effort!

Thanks,
+Vinod


On Oct 31, 2013, at 12:11 PM, Brock Noland wrote:

> More details here
> 
> https://issues.apache.org/jira/browse/HIVE-5610
> 
> How to configure your development environment is here:
> 
> https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ




Re: [ANNOUNCE] New Hive Committer - Gunther Hagleitner

2013-07-22 Thread Vinod Kumar Vavilapalli

Congratulations!

Thanks,
+Vinod

On Jul 21, 2013, at 1:00 AM, Carl Steinbach wrote:

> The Apache Hive PMC has voted to make Gunther Hagleitner a
> committer on the Apache Hive project.
> 
> Congratulations Gunther!
> 
> Carl



[jira] [Commented] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs

2013-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702697#comment-13702697
 ] 

Vinod Kumar Vavilapalli commented on HIVE-4801:
---

Or just deprecate hive.mapred.reduce.tasks.speculative.execution.

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop 
> jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.10.0
>Reporter: Chu Tong
>Assignee: Chu Tong
> Attachments: HIVE-4801.patch
>
>
> Hive does not honor hive.mapred.map.tasks.speculative.execution parameter 
> while it comes to configuring hadoop jobs.
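
For reference, a rough sketch of what honoring such a flag would involve (not 
Hive's actual code; mapred.map.tasks.speculative.execution and 
mapreduce.map.speculative are the MR1 and MR2 job-level keys, and whether Hive 
should carry its own map-side flag at all is exactly what is being discussed 
here):

import org.apache.hadoop.mapred.JobConf;

// Rough sketch, not Hive's code: propagate a Hive-level map speculation flag
// into the Hadoop job configuration.
public class MapSpeculationSketch {

  public static void applyMapSpeculation(JobConf job, boolean speculative) {
    job.setBoolean("mapred.map.tasks.speculative.execution", speculative); // MR1 key
    job.setBoolean("mapreduce.map.speculative", speculative);              // MR2 key
  }
}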

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive

2013-07-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699820#comment-13699820
 ] 

Vinod Kumar Vavilapalli commented on HIVE-4160:
---

A huge +1 to that. Having a common set of operators will be a huge win. That 
said, I already see that the current branch follows Hive's operator base 
classes, uses HiveConf, etc. I believe that with a little effort this can be 
cleaned up and pulled apart into one separate Maven module that everyone can use.

Some points to think about:
 - The target location of the module. The dependency graph can become unwieldy.
 - Given the use of the base Operator, OperatorDesc, etc. from Hive, if there 
is interest and commitment we should do this ASAP, while we only have a handful 
of operators.
 - Make one other project demonstrate how it can be reused across ecosystem 
projects; Pig would be great, and just a few operators would be a great start.

Thoughts?

> Vectorized Query Execution in Hive
> --
>
> Key: HIVE-4160
> URL: https://issues.apache.org/jira/browse/HIVE-4160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down the 
> processing. 
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.
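
As a toy illustration of the batch layout described in that summary (names and 
sizes here are illustrative, not Hive's actual vectorization classes), the gain 
comes from tight loops over primitive arrays with no per-row virtual calls:

// Toy sketch of the vectorized model described above, not Hive's API: a batch
// of ~1000 rows stored column-wise as primitive arrays, processed in a tight
// loop with no per-row method dispatch or object allocation.
public class VectorizedBatchSketch {

  static final int BATCH_SIZE = 1024;

  static final class LongColumn {
    final long[] vector = new long[BATCH_SIZE];
    final boolean[] isNull = new boolean[BATCH_SIZE];
  }

  // out[i] = a[i] + b[i] for one batch of `size` rows.
  static void addColumns(LongColumn a, LongColumn b, LongColumn out, int size) {
    for (int i = 0; i < size; i++) {
      out.isNull[i] = a.isNull[i] || b.isNull[i];
      out.vector[i] = a.vector[i] + b.vector[i];
    }
  }
}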

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Supporting an independent build farm

2013-06-16 Thread Vinod Kumar Vavilapalli
This is from someone from Hadoop who's been on and off in Hive.

Dedicated test resources are good, but there are other (simpler?) things worth 
pursuing to begin with; suggestions from the peanut gallery:
 - Split the project into modules. Without thinking much, a simple split could 
be client, execution engine, and metastore. We did the module split in Hadoop; it 
is initially a bit of pain but pays back a lot in the future. And whenever there 
are isolated module changes, only those modules need to be tested. It also has 
the added benefit of clear modularity.
 - A separate candidate suite of pre-commit tests. It can be a subset of all 
the tests, maybe even hand-picked. Sure, it won't catch some bugs, but it is 
a reasonable compromise that worked in Hadoop.
 - And wire the pre-commit tests into JIRA/Jenkins.

Thanks,
+Vinod

On Jun 16, 2013, at 11:02 AM, Edward Capriolo wrote:

> Hive's unit test suite has gotten larger as we have added more features and
> thus it takes longer to run. On a single dual-core machine with solid
> state disks, I have to start a test run at night, and then check the next
> morning to see if the run has finished. (I have been running tests for
> maybe 2 hours and am up to escape.q)
> 
> ::opinion::
> Also for a long time the distribution of which features get reviewed,
> tested, and committed has been unfair. With more people involved in the
> project this situation has gotten better however it is still not fair. What
> sometimes ends up happening is that a good feature, which is reviewed, and
> +1ed sits uncommitted for months or years.
> 
> Some committers or groups of commiters have an agenda and dedicated testing
> resources, and others do not. This unbalances the project. It means that
> small incremental improvements and new features not important to 'large
> company with testing resources x' sit ready to be committed while other
> people working in pairs further the project to their agenda. (This last
> statement is not a condemnation of anyone, just possibly a fact of life)
> 
> ::suggestion::
> 1) The project should sponsor an open and independent build/test farm
> 2) Once a ticket is marked 'patch available' this build farm should
> automatically notice this and begin testing the patch
> 3) patches/issues which pass tests first should be considered 1st for
> inclusions
> 
> We can use a hosted testing service such as:
> http://www.cloudbees.com/platform/pricing/devcloud.cb
> 
> Q. Do any committers/interested parties like the idea?
> Q. Would anyone be interested in dedicating financial resources to getting
> this off the ground (I am)
> 
> Q. Does anyone have ideas for a better platform or a better system



[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130428-branch-0.11-bugfix.txt

Patch with only the bug fix. The previously failing tests pass now.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Fix For: 0.11.0
>
> Attachments: hive.3952.1.patch, HIVE-3952-20130226.txt, 
> HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, 
> HIVE-3952-20130424.txt, HIVE-3952-20130428-branch-0.11-bugfix.txt, 
> HIVE-3952-20130428-branch-0.11.txt, HIVE-3952-20130428-branch-0.11-v2.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.
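
A toy sketch of the rewrite the last line of the description asks for (not 
Hive's optimizer code): when a map-only task, such as the collapsed map-joins, 
feeds exactly one map-reduce task, its map-side work can be folded into the 
consumer and the intermediate job dropped.

import java.util.ArrayList;
import java.util.List;

// Toy sketch only, not Hive's optimizer: fold a map-only task's operator chain
// into the map side of the map-reduce task that consumes its output, so the
// pipeline runs as a single job.
public class MergeMapOnlyTaskSketch {

  static final class Task {
    final List<String> mapWork = new ArrayList<>();    // simplified operator names
    final List<String> reduceWork = new ArrayList<>(); // empty => map-only task

    boolean isMapOnly() {
      return reduceWork.isEmpty();
    }
  }

  static Task merge(Task mapOnly, Task mapReduce) {
    if (!mapOnly.isMapOnly()) {
      throw new IllegalArgumentException("first task must be map-only");
    }
    mapReduce.mapWork.addAll(0, mapOnly.mapWork); // prepend the map-join operators
    return mapReduce;                             // the map-only job disappears
  }
}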

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130428-branch-0.11-v2.txt

That was a case of a bad merge; here's the correct one. All of these failing tests 
pass for me now.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Fix For: 0.11.0
>
> Attachments: hive.3952.1.patch, HIVE-3952-20130226.txt, 
> HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, 
> HIVE-3952-20130424.txt, HIVE-3952-20130428-branch-0.11.txt, 
> HIVE-3952-20130428-branch-0.11-v2.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644239#comment-13644239
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3952:
---

Sure, looking..

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Fix For: 0.11.0
>
> Attachments: hive.3952.1.patch, HIVE-3952-20130226.txt, 
> HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, 
> HIVE-3952-20130424.txt, HIVE-3952-20130428-branch-0.11.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130428-branch-0.11.txt

Patch against branch-0.11, in case someone is interested.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Fix For: 0.12.0
>
> Attachments: hive.3952.1.patch, HIVE-3952-20130226.txt, 
> HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, 
> HIVE-3952-20130424.txt, HIVE-3952-20130428-branch-0.11.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641463#comment-13641463
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3952:
---

Thanks for the patch update, Namit!

Thanks also to Ashutosh, and to Namit again, for all the reviews!

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: hive.3952.1.patch, HIVE-3952-20130226.txt, 
> HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, 
> HIVE-3952-20130424.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Status: Patch Available  (was: Open)

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, HIVE-3952-20130424.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130424.txt

Sigh, the patch is broken again. Updating it.

Also addressed the review comments on the review board. Added one more test for 
validating this.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt, HIVE-3952-20130421.txt, HIVE-3952-20130424.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130421.txt

Thanks for the info, Ashutosh.

Attaching an updated patch against the latest trunk. It also fixes the offending 
test-related issues. The latest patch is also on the review board. Tx.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt, HIVE-3952-20130421.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Status: Patch Available  (was: Open)

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt, HIVE-3952-20130421.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HIVE-4313) Build fails with OOM in mvn-init stage

2013-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HIVE-4313:
---


I am also running into this; shouldn't it be 'fixed' somehow? Either address 
why it suddenly increased, or just change some build settings, perhaps?

Reopening this anyway.

> Build fails with OOM in mvn-init stage
> --
>
> Key: HIVE-4313
> URL: https://issues.apache.org/jira/browse/HIVE-4313
> Project: Hive
>  Issue Type: Wish
>  Components: Build Infrastructure
> Environment: ubuntu 10.10, 32bit
>Reporter: Navis
>Priority: Minor
>
> Recently hive build fails with OOM frequently with exception like,
> {noformat}
> mvn-init:
>  [echo] hcatalog-server-extensions
>   [get] Destination already exists (skipping): 
> /home/navis/apache/oss-hive/hcatalog/build/maven-ant-tasks-2.1.3.jar
> Caught an exception while logging the end of the build.  Exception was:
> java.lang.OutOfMemoryError: PermGen space
> java.lang.OutOfMemoryError: PermGen space
>   at java.lang.Throwable.getStackTraceElement(Native Method)
>   at java.lang.Throwable.getOurStackTrace(Throwable.java:591)
>   at java.lang.Throwable.printStackTrace(Throwable.java:462)
>   at java.lang.Throwable.printStackTrace(Throwable.java:451)
>   at org.apache.tools.ant.Main.runBuild(Main.java:828)
>   at org.apache.tools.ant.Main.startAnt(Main.java:218)
>   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
>   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
> PermGen space
> {noformat}
> or
> {noformat}
> mvn-init:
>  [echo] hcatalog-server-extensions
>   [get] Destination already exists (skipping): 
> /home/navis/apache/oss-hive/hcatalog/build/maven-ant-tasks-2.1.3.jar
> java.lang.OutOfMemoryError: PermGen space
>   at org.apache.tools.ant.Main.runBuild(Main.java:826)
>   at org.apache.tools.ant.Main.startAnt(Main.java:218)
>   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
>   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
> PermGen space
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105-20130415.txt

Yes, the clearing of the row should happen independently of row generation. 
Attaching an updated patch addressing the review comment.

> Hive MapJoinOperator unnecessarily deserializes values for all join-keys
> 
>
> Key: HIVE-4105
> URL: https://issues.apache.org/jira/browse/HIVE-4105
> Project: Hive
>  Issue Type: Bug
>        Reporter: Vinod Kumar Vavilapalli
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
> HIVE-4105-20130415.txt, HIVE-4105.patch
>
>
> We can avoid this for inner-joins. Hive does an explicit value 
> de-serialization up front so even for those rows which won't emit output. In 
> these cases, we can do just with key de-serialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Assignee: Vinod Kumar Vavilapalli
  Status: Patch Available  (was: Open)

> Hive MapJoinOperator unnecessarily deserializes values for all join-keys
> 
>
> Key: HIVE-4105
> URL: https://issues.apache.org/jira/browse/HIVE-4105
> Project: Hive
>  Issue Type: Bug
>        Reporter: Vinod Kumar Vavilapalli
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
> HIVE-4105.patch
>
>
> We can avoid this for inner-joins. Hive does an explicit value 
> de-serialization up front so even for those rows which won't emit output. In 
> these cases, we can do just with key de-serialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105.patch

Latest patch addressing Vikram's comments.

Created review board request at https://reviews.apache.org/r/10323/.

> Hive MapJoinOperator unnecessarily deserializes values for all join-keys
> 
>
> Key: HIVE-4105
> URL: https://issues.apache.org/jira/browse/HIVE-4105
> Project: Hive
>  Issue Type: Bug
>    Reporter: Vinod Kumar Vavilapalli
> Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt, 
> HIVE-4105.patch
>
>
> We can avoid this for inner-joins. Hive does an explicit value 
> de-serialization up front so even for those rows which won't emit output. In 
> these cases, we can do just with key de-serialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Status: Patch Available  (was: Open)

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job

2013-04-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624177#comment-13624177
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3952:
---

The patch named HIVE-3952-20130227.1.txt still applies on trunk.

Created a review board request: https://reviews.apache.org/r/10321/

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130301.txt

Ran my new test again; it passes. This patch can be applied on top of HIVE-4106.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, 
> HIVE-3952-20130301.txt
>
>
> Consider the query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable   
>   JOIN
> 
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne) 
>   
>   ) firstjoin 
> 
> JOIN  
> 
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105-20130301.1.txt

Patch upmerged to the latest trunk.

> Hive MapJoinOperator unnecessarily deserializes values for all join-keys
> 
>
> Key: HIVE-4105
> URL: https://issues.apache.org/jira/browse/HIVE-4105
> Project: Hive
>  Issue Type: Bug
>        Reporter: Vinod Kumar Vavilapalli
> Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt
>
>
> We can avoid this for inner-joins. Hive does an explicit value 
> de-serialization up front so even for those rows which won't emit output. In 
> these cases, we can do just with key de-serialization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-4105:
--

Attachment: HIVE-4105-20130301.txt

Here's a patch that avoids value de-serialization where it isn't needed in the 
case of an inner join.

In my microbenchmark, which map-joins a big table with a small table, this 
brought task execution time down from 15 seconds to 10 seconds on about 
3 million big-table records; the second table and the output were both very 
small. Note that you won't see this much of an improvement for non-selective 
inner joins.

If folks are interested, I'll try productionizing the benchmark.
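For readers who want to see the shape of the optimization without digging into the patch, here is a small self-contained sketch of the key-first approach. This is not Hive's MapJoinOperator code; the row layout and helper methods below are hypothetical stand-ins used only for illustration.

{code}
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of the HIVE-4105 idea: for an inner map-join,
// deserialize only the join key of each big-table row first and pay the
// cost of value deserialization only when the key has a match.
public class KeyFirstMapJoinSketch {

    // Hypothetical row layout: "key\tvalue" encoded as bytes.
    static String deserializeKey(byte[] row) {
        String s = new String(row, StandardCharsets.UTF_8);
        return s.substring(0, s.indexOf('\t'));
    }

    static String deserializeValue(byte[] row) {
        String s = new String(row, StandardCharsets.UTF_8);
        return s.substring(s.indexOf('\t') + 1);
    }

    public static void main(String[] args) {
        // Small table already loaded into an in-memory hash map.
        Map<String, String> smallTable = new HashMap<>();
        smallTable.put("k1", "small-1");
        smallTable.put("k3", "small-3");

        byte[][] bigTableRows = {
            "k1\tbig-1".getBytes(StandardCharsets.UTF_8),
            "k2\tbig-2".getBytes(StandardCharsets.UTF_8),
            "k3\tbig-3".getBytes(StandardCharsets.UTF_8),
        };

        for (byte[] row : bigTableRows) {
            String key = deserializeKey(row);     // always needed
            String match = smallTable.get(key);
            if (match == null) {
                continue;                         // inner join: row emits nothing,
                                                  // so skip value deserialization
            }
            String value = deserializeValue(row); // only for matching rows
            System.out.println(key + " -> " + value + " | " + match);
        }
    }
}
{code}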

> Hive MapJoinOperator unnecessarily deserializes values for all join-keys
> 
>
> Key: HIVE-4105
> URL: https://issues.apache.org/jira/browse/HIVE-4105
> Project: Hive
>  Issue Type: Bug
>    Reporter: Vinod Kumar Vavilapalli
> Attachments: HIVE-4105-20130301.txt
>
>
> We can avoid this for inner joins. Hive does an explicit value 
> de-serialization up front, even for those rows which won't emit any output. In 
> these cases, key de-serialization alone would suffice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HIVE-4105:
-

 Summary: Hive MapJoinOperator unnecessarily deserializes values 
for all join-keys
 Key: HIVE-4105
 URL: https://issues.apache.org/jira/browse/HIVE-4105
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli


We can avoid this for inner joins. Hive does an explicit value de-serialization 
up front, even for those rows which won't emit any output. In these cases, key 
de-serialization alone would suffice.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary

2013-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591058#comment-13591058
 ] 

Vinod Kumar Vavilapalli commented on HIVE-4014:
---

Okay, I cannot reproduce this on trunk, though I was consistently hitting it 
on hive-0.10. I'll try hive-0.10 again to confirm whether some other patch fixed it.

[~tamastarjanyi], what version are you using?

> Hive+RCFile is not doing column pruning and reading much more data than 
> necessary
> -
>
> Key: HIVE-4014
> URL: https://issues.apache.org/jira/browse/HIVE-4014
> Project: Hive
>  Issue Type: Bug
>    Reporter: Vinod Kumar Vavilapalli
>    Assignee: Vinod Kumar Vavilapalli
>
> Even with simple projection queries, I see that the HDFS bytes-read counter 
> doesn't show any reduction in the amount of data read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job

2013-02-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590141#comment-13590141
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3952:
---

Okay, will do.

Tests passed, except for the two related to input37.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt
>
>
> Consider a query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable
>   JOIN
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne)
> ) firstjoin
> JOIN
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-02-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130227.1.txt

Thanks for trying this, Amareshwari!

I've added your "INSERT OVERWRITE DIRECTORY "/dir Select " case to the test.

Here's an updated patch that should work for you; can you please try again? Tx.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>    Reporter: Namit Jain
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt
>
>
> Consider a query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable
>   JOIN
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne)
> ) firstjoin
> JOIN
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Status: Patch Available  (was: Open)

I am running tests in the background. The multiJoin test passes though.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt
>
>
> Consider a query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable
>   JOIN
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne)
> ) firstjoin
> JOIN
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job

2013-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HIVE-3952:
--

Attachment: HIVE-3952-20130226.txt

Had a patch for a while, but it took a bit of time to clean it up. Attached. Here's 
what it does:
 - The changes are in CommonJoinResolver, which collapses multi-way joins into a 
single task doing all the map-joins.
 - Every time a join is converted to a map-join, I also inspect the child task to 
see if it is an MR job, and if so merge the map-only job (M) into that MR job (a 
rough sketch of this merge step is included below).
 - Added a test to multiJoin1.q to verify that an M-MR chain collapses into a 
single MR job.
 - The memory model after this patch is very complicated; it all depends on what 
operations are performed in the second MR job. AFAIU, we also don't have a clear 
memory model for the HIVE-3952 multi-way map-join. So for now, I just added a 
config "hive.optimize.mapjoin.mapreduce" to control this. I think we need a 
bigger JIRA to figure out memory restrictions when we have these multiple 
optimizations in play.

Please review. Thanks!
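To make the merge step concrete, here is a small self-contained sketch of the idea. This is not Hive's CommonJoinResolver or Task code; the Task class below is a simplified, hypothetical stand-in used only for illustration.

{code}
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of merging a map-only (map-join) task into its child
// MapReduce task, as described in the patch summary above.
public class MapJoinMergeSketch {

    static class Task {
        final String name;
        final boolean isMapReduce;
        final List<String> mapWork = new ArrayList<>();
        Task child;

        Task(String name, boolean isMapReduce) {
            this.name = name;
            this.isMapReduce = isMapReduce;
        }
    }

    // If the map-join task's child is a plain MR task, fold the map-join
    // work into the child's map phase and drop the separate map-only job.
    static Task mergeIfPossible(Task mapJoinTask, boolean optimizationEnabled) {
        Task child = mapJoinTask.child;
        if (!optimizationEnabled || child == null || !child.isMapReduce) {
            return mapJoinTask; // keep the M -> MR chain as-is
        }
        child.mapWork.addAll(0, mapJoinTask.mapWork);
        return child;           // a single MR job now does map-join + group-by
    }

    public static void main(String[] args) {
        Task mapJoin = new Task("map-only map-join", false);
        mapJoin.mapWork.add("hash-join bigTable with small tables");
        Task groupBy = new Task("map-reduce group-by", true);
        groupBy.mapWork.add("emit group-by keys");
        mapJoin.child = groupBy;

        Task merged = mergeIfPossible(mapJoin, true);
        System.out.println(merged.name + " map work: " + merged.mapWork);
    }
}
{code}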

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>    Reporter: Namit Jain
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HIVE-3952-20130226.txt
>
>
> Consider a query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable
>   JOIN
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne)
> ) firstjoin
> JOIN
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary

2013-02-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577216#comment-13577216
 ] 

Vinod Kumar Vavilapalli commented on HIVE-4014:
---

I already tracked it down; will upload a patch soon.

> Hive+RCFile is not doing column pruning and reading much more data than 
> necessary
> -
>
> Key: HIVE-4014
> URL: https://issues.apache.org/jira/browse/HIVE-4014
> Project: Hive
>  Issue Type: Bug
>    Reporter: Vinod Kumar Vavilapalli
>    Assignee: Vinod Kumar Vavilapalli
>
> Even with simple projection queries, I see that the HDFS bytes-read counter 
> doesn't show any reduction in the amount of data read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary

2013-02-12 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HIVE-4014:
-

 Summary: Hive+RCFile is not doing column pruning and reading much 
more data than necessary
 Key: HIVE-4014
 URL: https://issues.apache.org/jira/browse/HIVE-4014
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Even with simple projection queries, I see that the HDFS bytes-read counter 
doesn't show any reduction in the amount of data read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Graduate HCatalog from the incubator and become part of Hive

2013-02-06 Thread Vinod Kumar Vavilapalli
+1 non-binding.

Thanks,
+Vinod


On Wed, Feb 6, 2013 at 8:06 PM, Namit Jain  wrote:

> +1
>
>
> On 2/5/13 2:54 PM, "Alexander Alten-Lorenz"  wrote:
>
> >+1, non-binding
> >
> >- Alex
> >
> >On Feb 5, 2013, at 10:06 AM, Sushanth Sowmyan  wrote:
> >
> >> And my axe! Erm... I mean, my +1.
> >>
> >>
> >> On Mon, Feb 4, 2013 at 10:18 PM, Alan Gates 
> >>wrote:
> >>> FYI.
> >>>
> >>> Alan.
> >>>
> >>> Begin forwarded message:
> >>>
>  From: Alan Gates 
>  Date: February 4, 2013 10:18:09 PM PST
>  To: hcatalog-...@incubator.apache.org
>  Subject: [VOTE] Graduate HCatalog from the incubator and become part
> of Hive
> 
>  The Hive PMC has voted to accept HCatalog as a submodule of Hive.
> You can see the vote thread at
> 
> http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3cCACf6R
> rzktBYD0suZxn3Pfv8XkR=vgwszrzyb_2qvesuj2vh...@mail.gmail.com%3e .  We
> now need to vote to graduate from the incubator and become a submodule
> of Hive.  This entails the following:
> 
>  1) the establishment of an HCatalog submodule in the Apache Hive
> Project;
>  2) the adoption of the Apache HCatalog codebase into the Hive
> HCatalog submodule; and
>  3) adding all currently active HCatalog committers as submodule
> committers on the Hive HCatalog submodule.
> 
>  Definitions for all these can be found in the (now adopted) Hive
> bylaws at
> 
> https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committer.
> 
>  This vote will stay open for at least 72 hours (thus 23:00 PST on
> 2/7/13).  PPMC members votes are binding in this vote, though input
> from all is welcome.
> 
>  If this vote passes the next step will be to submit the graduation
> motion to the Incubator PMC.
> 
>  Here's my +1.
> 
>  Alan.
> >>>
> >
> >--
> >Alexander Alten-Lorenz
> >http://mapredit.blogspot.com
> >German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> >
>
>


-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/


[jira] [Commented] (HIVE-3992) Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks

2013-02-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572927#comment-13572927
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3992:
---

This is one of the big things that is solved by the ORC file (HIVE-3874). Not 
saying that it shouldn't be fixed in RCFile, but we will need to modify RCFile 
to similarly include some kind of file header/footer to index into the 
row-groups.
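
As a rough illustration of the header/footer indexing idea (a sketch only; the layout below is hypothetical and is neither RCFile's nor ORC's on-disk format), a footer that records the start offset of every row group lets a reader binary-search directly to the right row group instead of scanning forward for sync markers:

{code}
import java.util.Arrays;

// Minimal sketch of a "file footer indexing the row groups" idea.
// Hypothetical layout, used only to illustrate seeking without a scan.
public class RowGroupFooterSketch {

    // Footer contents: start offsets of each row group.
    static long findRowGroupStart(long[] rowGroupOffsets, long targetOffset) {
        // Find the last row group starting at or before targetOffset.
        int idx = Arrays.binarySearch(rowGroupOffsets, targetOffset);
        if (idx < 0) {
            idx = -idx - 2; // insertion point - 1 => preceding row group
        }
        return idx < 0 ? rowGroupOffsets[0] : rowGroupOffsets[idx];
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 64_000L, 131_072L, 200_500L};
        System.out.println(findRowGroupStart(offsets, 150_000L)); // prints 131072
    }
}
{code}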

> Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
> -
>
> Key: HIVE-3992
> URL: https://issues.apache.org/jira/browse/HIVE-3992
> Project: Hive
>  Issue Type: Bug
> Environment: Ubuntu x86_64/java-1.6/hadoop-2.0.3
>Reporter: Gopal V
> Attachments: select-join-limit.html
>
>
> The following function does some bad I/O
> {code}
> public synchronized void sync(long position) throws IOException {
>   ...
>   try {
> seek(position + 4); // skip escape
> in.readFully(syncCheck);
> int syncLen = sync.length;
> for (int i = 0; in.getPos() < end; i++) {
>   int j = 0;
>   for (; j < syncLen; j++) {
> if (sync[j] != syncCheck[(i + j) % syncLen]) {
>   break;
> }
>   }
>   if (j == syncLen) {
> in.seek(in.getPos() - SYNC_SIZE); // position before
> // sync
> return;
>   }
>   syncCheck[i % syncLen] = in.readByte();
> }
>   }
> ...
> }
> {code}
> This causes a rather large number of readByte() calls which are passed onto a 
> ByteBuffer via a single byte array.
> This results in a rather large amount of CPU being burnt in the linear 
> search for the sync pattern in the input RCFile (up to 92% for a skewed 
> example - a trivial map-join + limit 100).
> This behaviour should ideally be avoided, or at least replaced by a rolling 
> hash for efficient comparison, since the sync marker has a known byte-width of 16 bytes.
> Attached the stack trace from a Yourkit profile.
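
For illustration, the rolling-hash comparison suggested in the quoted description could look roughly like the following. This is only a sketch over an in-memory byte array, not RCFile's sync() implementation; the hash constants and sample input are arbitrary.

{code}
// Rolling-hash (Rabin-Karp style) search for a fixed-width sync marker.
// A cheap hash comparison guards the full byte-by-byte check.
public class RollingSyncSearchSketch {

    private static final int BASE = 257;          // arbitrary polynomial base
    private static final long MOD = (1L << 31) - 1;

    // Returns the index of the first occurrence of sync in data at or after
    // 'from', or -1 if it is not found.
    static int findSync(byte[] data, byte[] sync, int from) {
        int n = sync.length;                      // 16 bytes per the description above
        if (data.length - from < n) {
            return -1;
        }
        long target = 0, window = 0, pow = 1;
        for (int i = 0; i < n; i++) {
            target = (target * BASE + (sync[i] & 0xff)) % MOD;
            window = (window * BASE + (data[from + i] & 0xff)) % MOD;
            if (i < n - 1) {
                pow = (pow * BASE) % MOD;         // BASE^(n-1), used to slide the window
            }
        }
        for (int i = from; i + n <= data.length; i++) {
            if (window == target && matches(data, i, sync)) {
                return i;                         // full compare only on hash match
            }
            if (i + n < data.length) {
                window = (window - (data[i] & 0xff) * pow % MOD + MOD) % MOD;
                window = (window * BASE + (data[i + n] & 0xff)) % MOD;
            }
        }
        return -1;
    }

    private static boolean matches(byte[] data, int pos, byte[] sync) {
        for (int j = 0; j < sync.length; j++) {
            if (data[pos + j] != sync[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sync = "0123456789abcdef".getBytes();   // 16-byte marker
        byte[] data = ("some row group bytes..." + "0123456789abcdef" + "more bytes").getBytes();
        System.out.println(findSync(data, sync, 0));   // prints 23
    }
}
{code}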

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572677#comment-13572677
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3874:
---

Would it make sense to create a (very) temporary svn branch for capturing 
various bug fixes from (possibly) different contributors on sub-JIRAs?

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the types of the rows aren't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3952) merge map-job followed by map-reduce job

2013-02-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned HIVE-3952:
-

Assignee: Vinod Kumar Vavilapalli

I'd like to take a stab at it.

> merge map-job followed by map-reduce job
> 
>
> Key: HIVE-3952
> URL: https://issues.apache.org/jira/browse/HIVE-3952
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>    Assignee: Vinod Kumar Vavilapalli
>
> Consider a query like:
> select count(*) FROM
> ( select idOne, idTwo, value FROM
>   bigTable
>   JOIN
>   smallTableOne on (bigTable.idOne = smallTableOne.idOne)
> ) firstjoin
> JOIN
> smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
> where smallTableOne and smallTableTwo are smaller than 
> hive.auto.convert.join.noconditionaltask.size and
> hive.auto.convert.join.noconditionaltask is set to true.
> The joins are collapsed into mapjoins, and it leads to a map-only job
> (for the map-joins) followed by a map-reduce job (for the group by).
> Ideally, the map-only job should be merged with the following map-reduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2013-01-31 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568173#comment-13568173
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3784:
---

[~namit] Sorry I was away and couldn't reply back.

Thanks for addressing my use-case, I'll play with it!

> de-emphasize mapjoin hint
> -
>
> Key: HIVE-3784
> URL: https://issues.apache.org/jira/browse/HIVE-3784
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.11.0
>
> Attachments: hive.3784.10.patch, hive.3784.11.patch, 
> hive.3784.12.patch, hive.3784.13.patch, hive.3784.14.patch, 
> hive.3784.15.patch, hive.3784.16.patch, hive.3784.17.patch, 
> hive.3784.18.patch, hive.3784.19.patch, hive.3784.1.patch, 
> hive.3784.21.patch, hive.3784.22.patch, hive.3784.2.patch, hive.3784.3.patch, 
> hive.3784.4.patch, hive.3784.5.patch, hive.3784.6.patch, hive.3784.7.patch, 
> hive.3784.8.patch, hive.3784.9.patch
>
>
> hive.auto.convert.join has been around for a long time, and is pretty stable.
> When mapjoin hint was created, the above parameter did not exist.
> The only reason for the user to specify a mapjoin currently is if they want
> it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
> Eventually, that should also go away, but that may take some time to 
> stabilize.
> There are many rules in SemanticAnalyzer to handle the following trees:
> ReduceSink -> MapJoin
> Union  -> MapJoin
> MapJoin-> MapJoin
> This should not be supported anymore. In any of the above scenarios, the
> user can get the mapjoin behavior by setting hive.auto.convert.join to true
> and not specifying the hint. This will simplify the code a lot.
> What does everyone think ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Amend Hive Bylaws + Add HCatalog Submodule

2013-01-31 Thread Vinod Kumar Vavilapalli
+1 and +1 non-binding.

Great to see this happen!

Thanks,
+Vinod


On Thu, Jan 31, 2013 at 12:14 AM, Namit Jain  wrote:

> +1 and +1
>
> On 1/30/13 6:53 AM, "Gunther Hagleitner" 
> wrote:
>
> >+1 and +1
> >
> >Thanks,
> >Gunther.
> >
> >
> >On Tue, Jan 29, 2013 at 5:18 PM, Edward Capriolo
> >wrote:
> >
> >> Measure 1: +1
> >> Measure 2: +1
> >>
> >> On Mon, Jan 28, 2013 at 2:47 PM, Carl Steinbach  wrote:
> >>
> >> > I am calling a vote on the following two measures.
> >> >
> >> > Measure 1: Amend Hive Bylaws to Define Submodules and Submodule
> >> Committers
> >> >
> >> > If this measure passes the Apache Hive Project Bylaws will be
> >> > amended with the following changes:
> >> >
> >> >
> >> >
> >>
> >>
> https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers
> >> >
> >> > The motivation for these changes is discussed in the following
> >> > email thread which appeared on the hive-dev and hcatalog-dev
> >> > mailing lists:
> >> >
> >> > http://markmail.org/thread/u5nap7ghvyo7euqa
> >> >
> >> >
> >> > Measure 2: Create HCatalog Submodule and Adopt HCatalog Codebase
> >> >
> >> > This measure provides for 1) the establishment of an HCatalog
> >> > submodule in the Apache Hive Project, 2) the adoption of the
> >> > Apache HCatalog codebase into the Hive HCatalog submodule, and
> >> > 3) adding all currently active HCatalog committers as submodule
> >> > committers on the Hive HCatalog submodule.
> >> >
> >> > Passage of this measure depends on the passage of Measure 1.
> >> >
> >> >
> >> > Voting:
> >> >
> >> > Both measures require +1 votes from 2/3 of active Hive PMC
> >> > members in order to pass. All participants in the Hive project
> >> > are encouraged to vote on these measures, but only votes from
> >> > active Hive PMC members are binding. The voting period
> >> > commences immediately and shall last a minimum of six days.
> >> >
> >> > Voting is carried out by replying to this email thread. You must
> >> > indicate which measure you are voting on in order for your vote
> >> > to be counted.
> >> >
> >> > More details about the voting process can be found in the Apache
> >> > Hive Project Bylaws:
> >> >
> >> > https://cwiki.apache.org/confluence/display/Hive/Bylaws
> >> >
> >> >
> >>
>
>


-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550898#comment-13550898
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3874:
---

bq. Can the index be made optional ? In our typical use-case, the old data is 
hardly queried - so we are willing to trade off cpu, and not
support skipping rows for old data to save some space.
The way I understand it, index creation can be specified when the file is 
written, so it can be made optional. To start with, we may in fact have no 
indices and then add them later.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the types of the rows aren't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549416#comment-13549416
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3874:
---

Bumping up the version number for ORC and transparently forwarding old data to 
the current file format should work, no?

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the types of the rows aren't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549396#comment-13549396
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3874:
---

+100 !

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the types of the rows aren't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537479#comment-13537479
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3784:
---

Was trying to play with the patch, and my earlier concern resurfaced.
bq.  With different join keys, it needs some work to merge into a single MR 
anyway - that work is independent of this change.
That isn't true. Even today, I am able to get Hive to automatically merge a 
multi-way map-join with different join keys into a single map-only job. With 
this patch, we are losing that functionality. For example, the following runs as 
a single map-only job:
{noformat}
select /*+MAPJOIN(smallTableTwo)*/ idOne, idTwo, value FROM
( select /*+MAPJOIN(smallTableOne)*/ idOne, idTwo, value FROM
  bigTable
  JOIN
  smallTableOne on (bigTable.idOne = smallTableOne.idOne)
) firstjoin
JOIN
smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo)
{noformat}


> de-emphasize mapjoin hint
> -
>
> Key: HIVE-3784
> URL: https://issues.apache.org/jira/browse/HIVE-3784
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
> hive.3784.4.patch, hive.3784.5.patch
>
>
> hive.auto.convert.join has been around for a long time, and is pretty stable.
> When mapjoin hint was created, the above parameter did not exist.
> The only reason for the user to specify a mapjoin currently is if they want
> it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
> Eventually, that should also go away, but that may take some time to 
> stabilize.
> There are many rules in SemanticAnalyzer to handle the following trees:
> ReduceSink -> MapJoin
> Union  -> MapJoin
> MapJoin-> MapJoin
> This should not be supported anymore. In any of the above scenarios, the
> user can get the mapjoin behavior by setting hive.auto.convert.join to true
> and not specifying the hint. This will simplify the code a lot.
> What does everyone think ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532548#comment-13532548
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3784:
---

Thanks for the clarification, Namit!

> de-emphasize mapjoin hint
> -
>
> Key: HIVE-3784
> URL: https://issues.apache.org/jira/browse/HIVE-3784
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
> hive.3784.4.patch, hive.3784.5.patch
>
>
> hive.auto.convert.join has been around for a long time, and is pretty stable.
> When mapjoin hint was created, the above parameter did not exist.
> The only reason for the user to specify a mapjoin currently is if they want
> it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
> Eventually, that should also go away, but that may take some time to 
> stabilize.
> There are many rules in SemanticAnalyzer to handle the following trees:
> ReduceSink -> MapJoin
> Union  -> MapJoin
> MapJoin-> MapJoin
> This should not be supported anymore. In any of the above scenarios, the
> user can get the mapjoin behavior by setting hive.auto.convert.join to true
> and not specifying the hint. This will simplify the code a lot.
> What does everyone think ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531451#comment-13531451
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3784:
---

Hi, a couple of questions:
 - Does this rule out bucketed map-join, or will hive.optimize.bucketmapjoin 
continue to work? If it is the former, shouldn't fixing that be a blocker for 
this?
 - Also, does this rule out map-joining multiple small tables in a single 
map-only job? As discussed on HIVE-3652, giving map-join hints to a nested join 
automatically converts it into a single map-only map-join.

bq. also optimizes a lot of queries - mapjoin followed by groupby.
This is great!

> de-emphasize mapjoin hint
> -
>
> Key: HIVE-3784
> URL: https://issues.apache.org/jira/browse/HIVE-3784
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
> hive.3784.4.patch
>
>
> hive.auto.convert.join has been around for a long time, and is pretty stable.
> When mapjoin hint was created, the above parameter did not exist.
> The only reason for the user to specify a mapjoin currently is if they want
> it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
> Eventually, that should also go away, but that may take some time to 
> stabilize.
> There are many rules in SemanticAnalyzer to handle the following trees:
> ReduceSink -> MapJoin
> Union  -> MapJoin
> MapJoin-> MapJoin
> This should not be supported anymore. In any of the above scenarios, the
> user can get the mapjoin behavior by setting hive.auto.convert.join to true
> and not specifying the hint. This will simplify the code a lot.
> What does everyone think ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2708) Hive MR local jobs fail on Hadoop 0.23

2012-01-12 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185146#comment-13185146
 ] 

Vinod Kumar Vavilapalli commented on HIVE-2708:
---

Amareshwari, this is very likely a MapReduce issue. Can you paste the 
complete logs, ideally with DEBUG logging enabled? I am watching this issue, so 
we can proceed here for now.

> Hive MR local jobs fail on Hadoop 0.23
> --
>
> Key: HIVE-2708
> URL: https://issues.apache.org/jira/browse/HIVE-2708
> Project: Hive
>  Issue Type: Bug
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.8.1
>
>
> Hive MR local jobs fail on 0.23 with following exception:
> Job running in-process (local Hadoop)
> Hadoop job information for null: number of mappers: 0; number of reducers: 0
> java.io.IOException: Could not find status of job:job_local_0001
>   at 
> org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:685)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:458)
>   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:710)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
> Ended Job = job_local_0001 with exception 'java.io.IOException(Could not find 
> status of job:job_local_0001)'
> Execution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira