Build failed in Hudson: Hive-trunk-h0.17 #8

2009-02-18 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/8/changes

Changes:

[zshao] HIVE-270. Add a lazy-deserialized SerDe for efficient deserialization 
of rows with primitive types. (zshao)

--
[...truncated 16709 lines...]
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column2.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column2.q.out
 
[junit] Done query: unknown_column2.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180523_68094412.txt
 
[junit] Begin query: unknown_column3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column3.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column3.q.out
 
[junit] Done query: unknown_column3.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180523_576389447.txt
 
[junit] Begin query: unknown_column4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column4.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column4.q.out
 
[junit] Done query: unknown_column4.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180523_-1227788210.txt
 
[junit] Begin query: unknown_column5.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column5.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column5.q.out
 
[junit] Done query: unknown_column5.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180523_538024551.txt
 
[junit] Begin query: unknown_column6.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
 
[junit] Done query: unknown_column6.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180523_-1608223043.txt
 
[junit] Begin query: unknown_function1.q
[junit] 

Build failed in Hudson: Hive-trunk-h0.18 #9

2009-02-18 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/9/changes

Changes:

[zshao] HIVE-270. Add a lazy-deserialized SerDe for efficient deserialization 
of rows with primitive types. (zshao)

--
[...truncated 19148 lines...]
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column2.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column2.q.out
 
[junit] Done query: unknown_column2.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180624_-1207805739.txt
 
[junit] Begin query: unknown_column3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column3.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column3.q.out
 
[junit] Done query: unknown_column3.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180624_-355194879.txt
 
[junit] Begin query: unknown_column4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column4.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column4.q.out
 
[junit] Done query: unknown_column4.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180624_238035720.txt
 
[junit] Begin query: unknown_column5.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column5.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column5.q.out
 
[junit] Done query: unknown_column5.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180624_1911369666.txt
 
[junit] Begin query: unknown_column6.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
 
[junit] Done query: unknown_column6.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180624_-295681886.txt
 
[junit] Begin query: unknown_function1.q
[junit] 

Build failed in Hudson: Hive-trunk-h0.19 #8

2009-02-18 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/8/changes

Changes:

[zshao] HIVE-270. Add a lazy-deserialized SerDe for efficient deserialization 
of rows with primitive types. (zshao)

--
[...truncated 18767 lines...]
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column2.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column2.q.out
 
[junit] Done query: unknown_column2.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180724_474849503.txt
 
[junit] Begin query: unknown_column3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column3.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column3.q.out
 
[junit] Done query: unknown_column3.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180724_1948005189.txt
 
[junit] Begin query: unknown_column4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column4.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column4.q.out
 
[junit] Done query: unknown_column4.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180724_1836088980.txt
 
[junit] Begin query: unknown_column5.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column5.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column5.q.out
 
[junit] Done query: unknown_column5.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180724_1261990314.txt
 
[junit] Begin query: unknown_column6.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
 
[junit] Done query: unknown_column6.q
[junit] Hive history 
file=http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/../build/ql/tmp/hive_job_log_hudson_200902180724_782453616.txt
 
[junit] Begin query: unknown_function1.q
[junit] 

RE: You are voted to be a Hive committer

2009-02-18 Thread Joydeep Sen Sarma
Congrats!

I guess this means Hive can now reliably survive a massive earthquake in SF Bay 
Area.

-Original Message-
From: Dhruba Borthakur [mailto:dhr...@gmail.com] 
Sent: Tuesday, February 17, 2009 10:45 PM
To: Johan Oskarsson
Cc: hive-dev@hadoop.apache.org
Subject: You are voted to be a Hive committer

 Hi Johan,

The Hadoop PMC has voted to make you a committer for the Hive subproject.
Please complete and sign the ICLA at
http://www.apache.org/licenses/icla.txt and fax it to the number
specified in the form. Once the form is processed,
you will be granted an Apache account.

thanks,
dhruba


Re: You are voted to be a Hive committer

2009-02-18 Thread Jeff Hammerbacher
Congrats Johan!

On Wed, Feb 18, 2009 at 10:55 AM, Joydeep Sen Sarma jssa...@facebook.com wrote:

 Congrats!

 I guess this means Hive can now reliably survive a massive earthquake in SF
 Bay Area.

 -Original Message-
 From: Dhruba Borthakur [mailto:dhr...@gmail.com]
 Sent: Tuesday, February 17, 2009 10:45 PM
 To: Johan Oskarsson
 Cc: hive-dev@hadoop.apache.org
 Subject: You are voted to be a Hive committer

  Hi Johan,

 The Hadoop PMC has voted to make you a committer for the Hive subproject.
 Please complete and sign the ICLA at
 http://www.apache.org/licenses/icla.txt and fax it to the number
 specified in the form. Once the form is processed,
 you will be granted an Apache account.

 thanks,
 dhruba



[jira] Commented: (HIVE-74) Hive can use CombineFileInputFormat for when the input are many small files

2009-02-18 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674742#action_12674742
 ] 

Joydeep Sen Sarma commented on HIVE-74:
---

Is it possible to do this in a way that Hive continues to compile against
0.17/18/19? I think this is almost a hard requirement.

One possibility is to have a new version of HiveInputSplit that only compiles 
against 0.20 - and have this conditionally in the code only for 0.20 and 
onwards. (for example in HiveInputFormat.java - there's a conditional tag 
(//[exclude_0_19]) that does some conditional code inclusion). I am not sure 
how this was implemented.

But even this is less than ideal. How will we deploy this with 17 (with
combinefilesplit and related patches), unless we are not using the open source
version directly?

 Hive can use CombineFileInputFormat for when the input are many small files
 ---

 Key: HIVE-74
 URL: https://issues.apache.org/jira/browse/HIVE-74
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.2.0

 Attachments: hiveCombineSplit.patch, hiveCombineSplit.patch


 There are cases when the input to a Hive job is thousands of small files. In
 this case, there is a mapper for each file. Most of the overhead of spawning
 all these mappers can be avoided if Hive used the CombineFileInputFormat
 introduced via HADOOP-4565.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
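The split-combining idea behind the issue above can be sketched in plain Python. This is a hedged illustration only, not Hadoop's CombineFileInputFormat: the greedy packing, the function name, and the file names are all invented for the example.

```python
# Hypothetical sketch: pack many small files into fewer splits so fewer
# mappers are spawned. Not the actual Hadoop/Hive implementation.

def combine_splits(files, max_split_bytes):
    """Greedily pack (path, size) pairs into splits of roughly max_split_bytes."""
    splits, current, current_size = [], [], 0
    for path, size in files:
        # Start a new split when adding this file would exceed the cap.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        splits.append(current)
    return splits

files = [("part-%05d" % i, 10_000) for i in range(1000)]  # 1000 tiny files
splits = combine_splits(files, max_split_bytes=128 * 1024 * 1024)
print(len(files), "files ->", len(splits), "split(s)")  # 1000 files -> 1 split(s)
```

With one mapper per split instead of one per file, the per-mapper startup overhead drops by roughly the combining factor.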



Re: You are voted to be a Hive committer

2009-02-18 Thread Edward Capriolo
Congrats Johan.

The subject of the email always fools me. I see an email titled "You
are voted to be a Hive committer" and I feel like I have won an
academy award. Then I open the email to find someone else is getting
one. Great sorrow. JK


Re: You are voted to be a Hive committer

2009-02-18 Thread Dhruba Borthakur
Hi Edward,

You are absolutely right! Sorry for the confusion.

dhruba

On Wed, Feb 18, 2009 at 11:36 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Congrats Johan.

 The subject of the email always fools me. I see an email titled "You
 are voted to be a Hive committer" and I feel like I have won an
 academy award. Then I open the email to find someone else is getting
 one. Great sorrow. JK



[jira] Commented: (HIVE-74) Hive can use CombineFileInputFormat for when the input are many small files

2009-02-18 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674769#action_12674769
 ] 

Joydeep Sen Sarma commented on HIVE-74:
---

where are the pools for the combinefileinputformat created (one per table)?

 Hive can use CombineFileInputFormat for when the input are many small files
 ---

 Key: HIVE-74
 URL: https://issues.apache.org/jira/browse/HIVE-74
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.2.0

 Attachments: hiveCombineSplit.patch, hiveCombineSplit.patch


 There are cases when the input to a Hive job is thousands of small files. In
 this case, there is a mapper for each file. Most of the overhead of spawning
 all these mappers can be avoided if Hive used the CombineFileInputFormat
 introduced via HADOOP-4565.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

2009-02-18 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674776#action_12674776
 ] 

Joydeep Sen Sarma commented on HIVE-131:


Please commit this to 0.2 also, since it's a pretty severe bug.

 insert overwrite directory leaves behind uncommitted/tmp files from failed 
 tasks
 

 Key: HIVE-131
 URL: https://issues.apache.org/jira/browse/HIVE-131
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
Priority: Critical
 Attachments: HIVE-131.patch.1, hive-131.patch.2


 _tmp files are getting left behind on insert overwrite directory:
 /user/jssarma/ctst1/40422_m_000195_0.deflate  r 3 13285 2008-12-07 01:47  
 rw-r--r-- jssarma supergroup
 /user/jssarma/ctst1/40422_m_000196_0.deflate  r 3 3055  2008-12-07 01:46  
 rw-r--r-- jssarma supergroup
 /user/jssarma/ctst1/_tmp.40422_m_33_0 r 3 0 2008-12-07 01:53  rw-r--r-- 
 jssarma supergroup
 /user/jssarma/ctst1/_tmp.40422_m_37_1 r 3 0 2008-12-07 01:53  rw-r--r-- 
 jssarma supergroup
 This happened with speculative execution. The code looks good (in fact, in
 this case many speculative tasks were launched, and only a couple caused
 problems). It almost seems like these files did not appear in the namespace
 until after the map-reduce job finished and the movetask did a listing of the
 output dir ..

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: You are voted to be a Hive committer

2009-02-18 Thread Johan Oskarsson
Thanks guys, I'll keep the project alive even if California slides into the
Pacific :)


/Johan

Jeff Hammerbacher wrote:

Congrats Johan!

On Wed, Feb 18, 2009 at 10:55 AM, Joydeep Sen Sarma
jssa...@facebook.com wrote:


Congrats!

I guess this means Hive can now reliably survive a massive
earthquake in SF Bay Area.

-Original Message-
From: Dhruba Borthakur [mailto:dhr...@gmail.com]
Sent: Tuesday, February 17, 2009 10:45 PM
To: Johan Oskarsson
Cc: hive-dev@hadoop.apache.org
Subject: You are voted to be a Hive committer

 Hi Johan,

The Hadoop PMC has voted to make you a committer for the Hive
subproject.
Please complete and sign the ICLA at
http://www.apache.org/licenses/icla.txt and fax it to the number
specified in the form. Once the form is processed,
you will be granted an Apache account.

thanks,
dhruba






Re: Need help on Hive.g and parser!

2009-02-18 Thread Shyam Sarkar
Thank you. I went through ANTLR. Just curious -- was there any comparison done
between JavaCC and ANTLR? How is the quality of the code generated by ANTLR
compared to JavaCC? This could be an issue if in the future we want to embed XML
or JavaScript inside HiveQL (not very important at this point).
Advanced SQL syntax embeds XML and JavaScript.

Thanks,
Shyam


--- On Tue, 2/17/09, Zheng Shao zsh...@gmail.com wrote:

 From: Zheng Shao zsh...@gmail.com
 Subject: Re: Need help on Hive.g and parser!
 To: hive-dev@hadoop.apache.org, shyam_sar...@yahoo.com
 Date: Tuesday, February 17, 2009, 10:01 PM
 We are using antlr.
 
 Basically, the rule checks the timestamp of
 HiveParser.java. If it's newer
 than Hive.g, then we don't need to regenerate
 HiveParser.java from Hive.g
 again.
 
 Zheng
 
 On Tue, Feb 17, 2009 at 12:15 PM, Shyam Sarkar
 shyam_sar...@yahoo.comwrote:
 
  Hello,
 
  Someone please explain the following build.xml spec
 for grammar build
  (required and not required) ::
 
 
 ===

   <uptodate property="grammarBuild.notRequired">
     <srcfiles dir="${src.dir}/org/apache/hadoop/hive/ql/parse" includes="**/*.g"/>
     <mapper type="merge"
       to="${build.dir.hive}/ql/gen-java/org/apache/hadoop/hive/ql/parse/HiveParser.java"/>
   </uptodate>

   <target name="build-grammar" unless="grammarBuild.notRequired">
     <echo>Building Grammar ${src.dir}/org/apache/hadoop/hive/ql/parse/Hive.g</echo>
     <java classname="org.antlr.Tool" classpathref="classpath" fork="true">
       <arg value="-fo"/>
       <arg value="${build.dir.hive}/ql/gen-java/org/apache/hadoop/hive/ql/parse"/>
       <arg value="${src.dir}/org/apache/hadoop/hive/ql/parse/Hive.g"/>
     </java>
   </target>

 =
 
  Also can someone tell me which parser generator is
 used? I used JavaCC
  in the past.
 
  Thanks,
  shyam_sar...@yahoo.com
 
 
 
 
 
 
 
 -- 
 Yours,
 Zheng
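Zheng's timestamp rule can be sketched outside Ant as a plain comparison of modification times. This is a hedged illustration; the function name and paths are mine, not part of the Hive build.

```python
# Hypothetical sketch of the Ant <uptodate>/unless pair described above: the
# parser is regenerated only if the grammar is newer than the generated source
# (or the generated source is missing). Not the actual build logic.
import os

def grammar_build_required(grammar_path, generated_path):
    """True when Hive.g is newer than HiveParser.java, or the latter is absent."""
    if not os.path.exists(generated_path):
        return True  # never generated: must build
    return os.path.getmtime(grammar_path) > os.path.getmtime(generated_path)
```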


  


[jira] Created: (HIVE-294) Support MAP(a.*), REDUCE(a.*) and TRANSFORM(a.*)

2009-02-18 Thread Zheng Shao (JIRA)
Support MAP(a.*), REDUCE(a.*) and TRANSFORM(a.*)


 Key: HIVE-294
 URL: https://issues.apache.org/jira/browse/HIVE-294
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0, 0.3.0
Reporter: Zheng Shao


The Hive language does not currently accept MAP(a.*), REDUCE(a.*), or TRANSFORM(a.*).
We should support them.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

2009-02-18 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-131:


   Resolution: Fixed
Fix Version/s: 0.3.0
   0.2.0
 Release Note: HIVE-131. Remove uncommitted files from failed tasks. 
(Joydeep Sen Sarma via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

trunk: Committed revision 745709.
branch-0.2: Committed revision 745710.



 insert overwrite directory leaves behind uncommitted/tmp files from failed 
 tasks
 

 Key: HIVE-131
 URL: https://issues.apache.org/jira/browse/HIVE-131
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
Priority: Critical
 Fix For: 0.2.0, 0.3.0

 Attachments: HIVE-131.patch.1, hive-131.patch.2


 _tmp files are getting left behind on insert overwrite directory:
 /user/jssarma/ctst1/40422_m_000195_0.deflate  r 3 13285 2008-12-07 01:47  
 rw-r--r-- jssarma supergroup
 /user/jssarma/ctst1/40422_m_000196_0.deflate  r 3 3055  2008-12-07 01:46  
 rw-r--r-- jssarma supergroup
 /user/jssarma/ctst1/_tmp.40422_m_33_0 r 3 0 2008-12-07 01:53  rw-r--r-- 
 jssarma supergroup
 /user/jssarma/ctst1/_tmp.40422_m_37_1 r 3 0 2008-12-07 01:53  rw-r--r-- 
 jssarma supergroup
 This happened with speculative execution. The code looks good (in fact, in
 this case many speculative tasks were launched, and only a couple caused
 problems). It almost seems like these files did not appear in the namespace
 until after the map-reduce job finished and the movetask did a listing of the
 output dir ..

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-276) input3_limit.q fails under 0.17

2009-02-18 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-276:


Attachment: HIVE-276.2.patch

Incorporated Ashish's comments.


 input3_limit.q fails under 0.17
 ---

 Key: HIVE-276
 URL: https://issues.apache.org/jira/browse/HIVE-276
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-276.1.patch, HIVE-276.2.patch


 The plan ql/src/test/results/clientpositive/input3_limit.q.out shows that 
 there are 2 map-reduce jobs:
 The first one is distributed and sorted as is specified by the query. The 
 reducer side has LIMIT 20.
 The second one (single reducer job imposed by LIMIT 20) does not have the 
 same sort order, so the final result is non-deterministic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
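The non-determinism described above can be shown with a toy example. This is plain Python on made-up data, not Hive code: a LIMIT applied after re-sorting is deterministic, while a LIMIT over rows in whatever order the previous job emitted them is not.

```python
# Illustrative only: same rows, two possible arrival orders from the first job.

def limit_after_sort(rows, k):
    """Re-impose the sort order before taking the first k rows."""
    return sorted(rows)[:k]

arrival_order_1 = [3, 1, 2, 5, 4]
arrival_order_2 = [5, 4, 3, 2, 1]

# Sorting before the limit gives the same answer either way:
assert limit_after_sort(arrival_order_1, 2) == limit_after_sort(arrival_order_2, 2) == [1, 2]

# Taking the first 2 rows without sorting depends on arrival order:
assert arrival_order_1[:2] != arrival_order_2[:2]
```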



[jira] Commented: (HIVE-276) input3_limit.q fails under 0.17

2009-02-18 Thread Raghotham Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674870#action_12674870
 ] 

Raghotham Murthy commented on HIVE-276:
---

+1

looks good.

 input3_limit.q fails under 0.17
 ---

 Key: HIVE-276
 URL: https://issues.apache.org/jira/browse/HIVE-276
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-276.1.patch, HIVE-276.2.patch


 The plan ql/src/test/results/clientpositive/input3_limit.q.out shows that 
 there are 2 map-reduce jobs:
 The first one is distributed and sorted as is specified by the query. The 
 reducer side has LIMIT 20.
 The second one (single reducer job imposed by LIMIT 20) does not have the 
 same sort order, so the final result is non-deterministic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



*UNIT TEST FAILURE for apache HIVE* Hadoop.Version=0.17.1 based on SVN Rev# 745710.54

2009-02-18 Thread Murli Varadachari
[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED
BUILD FAILED


[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

2009-02-18 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-279:
---

Attachment: hive-279.patch

This is a drop for an initial review, since I suspect there will be a lot of comments
:). It should work for all cases except multi-insert queries.

I have not enabled this by default, but added a new config param called
hive.optimize.ppd to enable this feature.

I have not modified existing testcases, but added a couple of new ones. I will
add more while uploading the final patch.


 Implement predicate push down for hive queries
 --

 Key: HIVE-279
 URL: https://issues.apache.org/jira/browse/HIVE-279
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Prasad Chakka
Assignee: Prasad Chakka
 Attachments: hive-279.patch


 Push predicates that are expressed in outer queries into inner queries where
 possible, so that rows get filtered out sooner.
 e.g.
 select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and
 a.gender = 'm'
 The current compiler generates the filter predicate in the reducer, after the
 join, so all the rows have to be passed from mapper to reducer. By pushing the
 filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
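The pushdown idea in the description can be illustrated with a toy join. This is plain Python, not the Hive optimizer; the tables, rows, and helper names are invented for the example.

```python
# Illustrative sketch: filtering before the join produces the same answer
# while sending far fewer rows through the join.

a = [{"uid": u, "age": 20 + (u % 3), "gender": "m" if u % 2 == 0 else "f"}
     for u in range(6)]
b = [{"uid": u, "city": "sf"} for u in range(6)]

def join(left, right, key):
    """Simple hash join on `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Without pushdown: join everything (6 rows flow through), then filter.
no_push = [r for r in join(a, b, "uid") if r["age"] == 20 and r["gender"] == "m"]

# With pushdown: filter a's rows first, then join only the survivors.
pushed = join([r for r in a if r["age"] == 20 and r["gender"] == "m"], b, "uid")

assert no_push == pushed  # same answer, fewer rows through the join
```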



[jira] Updated: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by

2009-02-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-223:


Status: Patch Available  (was: Open)

fixed a small bug

 when using map-side aggregates - perform single map-reduce group-by
 ---

 Key: HIVE-223
 URL: https://issues.apache.org/jira/browse/HIVE-223
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain
 Attachments: 223.2.txt, 223.patch1.txt


 today even when we do map side aggregates - we do multiple map-reduce jobs. 
 however - the reason for doing multiple map-reduce group-bys (for single 
 group-bys) was the fear of skews. When we are doing map side aggregates - 
 skews should not exist for the most part. There can be two reasons for skews:
 - large number of entries for a single grouping set - map side aggregates 
 should take care of this
 - badness in hash function that sends too much stuff to one reducer - we 
 should be able to take care of this by having good hash functions (and prime 
 number reducer counts)
 So i think we should be able to do a single stage map-reduce when doing 
 map-side aggregates.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
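The single-stage argument above rests on each mapper pre-aggregating its rows locally, so the shuffle carries one partial result per key per mapper instead of one record per row. A toy sketch (illustrative Python, not Hive's group-by operator):

```python
# Hypothetical sketch of map-side (hash) aggregation feeding one reduce stage.
from collections import Counter

def map_side_aggregate(rows):
    """One mapper's local pre-aggregation: key -> partial count."""
    return Counter(rows)

def reduce_side_merge(partials):
    """Single reduce stage merges the partial aggregates from all mappers."""
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

mappers = [["a", "a", "b"], ["b", "b", "c"]]          # each list = one mapper's input
partials = [map_side_aggregate(m) for m in mappers]   # shuffle carries 2-3 pairs, not 6 rows
print(reduce_side_merge(partials))  # {'a': 2, 'b': 3, 'c': 1}
```

Because each mapper emits at most one pair per distinct key, a heavily repeated key no longer translates into a flood of records at one reducer, which is the skew concern the comment addresses.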



[jira] Updated: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by

2009-02-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-223:


Status: Patch Available  (was: Open)

 when using map-side aggregates - perform single map-reduce group-by
 ---

 Key: HIVE-223
 URL: https://issues.apache.org/jira/browse/HIVE-223
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain
 Attachments: 223.2.txt, 223.3.txt, 223.patch1.txt






[jira] Updated: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by

2009-02-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-223:


Status: Open  (was: Patch Available)

 when using map-side aggregates - perform single map-reduce group-by
 ---

 Key: HIVE-223
 URL: https://issues.apache.org/jira/browse/HIVE-223
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain
 Attachments: 223.2.txt, 223.3.txt, 223.patch1.txt






[jira] Updated: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by

2009-02-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-223:


Attachment: 223.3.txt

 when using map-side aggregates - perform single map-reduce group-by
 ---

 Key: HIVE-223
 URL: https://issues.apache.org/jira/browse/HIVE-223
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain
 Attachments: 223.2.txt, 223.3.txt, 223.patch1.txt






[jira] Commented: (HIVE-291) [Hive] map-side aggregation should be automatically disabled at run-time if it is not turning out to be useful

2009-02-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674893#action_12674893
 ] 

Namit Jain commented on HIVE-291:
-

Tested one big job for correctness.

 [Hive] map-side aggregation should be automatically disabled at run-time if 
 it is not turning out to be useful
 --

 Key: HIVE-291
 URL: https://issues.apache.org/jira/browse/HIVE-291
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: 291.1.txt


 Map-side aggregation should be automatically disabled at run-time if it is 
 not turning out to be useful.
 If map-side aggregation is not reducing the number of output rows, it is a 
 drain on the mapper, since it is consuming memory and performing unnecessary 
 hash lookups.
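The proposed behavior can be sketched in Python (a hypothetical illustration; the class name, the sample size of 1000, and the 0.5 reduction threshold are assumptions, not the values the patch uses): after a sample of rows, compare distinct keys to rows seen, and stop aggregating if the hash table is not shrinking the output.

```python
from collections import Counter

# Hypothetical sketch of adaptive map-side aggregation: if the hash
# table is not reducing the row count after a sample, disable it and
# pass rows straight through to the shuffle.

class AdaptiveMapAggregator:
    def __init__(self, check_after=1000, min_reduction=0.5):
        self.table = Counter()
        self.rows_seen = 0
        self.check_after = check_after      # assumed sample size
        self.min_reduction = min_reduction  # assumed threshold
        self.enabled = True
        self.passthrough = []

    def add(self, key):
        self.rows_seen += 1
        if self.enabled:
            self.table[key] += 1
            if self.rows_seen == self.check_after:
                reduction = 1 - len(self.table) / self.rows_seen
                if reduction < self.min_reduction:
                    self.enabled = False  # aggregation isn't helping
        else:
            self.passthrough.append((key, 1))

# Nearly-unique keys: aggregation gives no reduction, so it is
# disabled after the first 1000 rows.
agg = AdaptiveMapAggregator()
for i in range(2000):
    agg.add(f"key{i}")
print(agg.enabled)  # False: all 1000 sampled keys were distinct
```

When keys repeat heavily, the same check leaves aggregation enabled, so only the unhelpful case pays the small sampling cost.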




[jira] Commented: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by

2009-02-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674892#action_12674892
 ] 

Namit Jain commented on HIVE-223:
-

Tested one big job for correctness.

 when using map-side aggregates - perform single map-reduce group-by
 ---

 Key: HIVE-223
 URL: https://issues.apache.org/jira/browse/HIVE-223
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain
 Attachments: 223.2.txt, 223.3.txt, 223.patch1.txt






JIRA_223.3.txt_UNIT_TEST_SUCCEEDED

2009-02-18 Thread Murli Varadachari

SUCCESS: BUILD AND UNIT TEST using PATCH 223.3.txt PASSED!!