[jira] Created: (HIVE-1206) Return row results in a list instead of a tab-delimited string

2010-02-28 Thread bc Wong (JIRA)
Return row results in a list instead of a tab-delimited string
--

 Key: HIVE-1206
 URL: https://issues.apache.org/jira/browse/HIVE-1206
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: bc Wong


Driver.getResults() always returns each row as a single string with 
tab-delimited fields. This breaks for data that contains tabs. It would be 
really nice if the interface allowed returning each row as a list of fields.
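
To illustrate the problem described above, here is a minimal sketch. It does not use the Hive Driver API; TabRowDemo and its helper methods are hypothetical names that only mimic a row being flattened into a tab-delimited string and split again by the caller.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Hive.
class TabRowDemo {
    // Flattens a row into one string, the way a tab-delimited result would look.
    static String toTabDelimited(List<String> fields) {
        return String.join("\t", fields);
    }

    // What a caller has to do to get fields back; -1 keeps trailing empty fields.
    static List<String> splitRow(String row) {
        return Arrays.asList(row.split("\t", -1));
    }

    public static void main(String[] args) {
        // Two fields; the second one contains a tab character.
        List<String> original = Arrays.asList("42", "hello\tworld");
        List<String> recovered = splitRow(toTabDelimited(original));
        System.out.println(original.size());  // prints 2
        System.out.println(recovered.size()); // prints 3
    }
}
```

Once the row is flattened, the caller cannot tell a field separator from a tab inside the data, so returning a list of fields avoids the ambiguity.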

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839391#action_12839391
 ] 

He Yongqiang commented on HIVE-259:
---

The code looks very good. Thanks for the work, Jerome and Zheng!
Just some minor comments:
(1) I am not familiar with the exact definition of the percentile function. 
Must percentile()'s result be a member of the input data? 
(2) HashMap and ArrayList are used to copy and sort. Can we use a TreeMap here? 
This is minor and can be ignored.
At the beginning of the new test case, 
DESCRIBE FUNCTION percentile;
DESCRIBE FUNCTION EXTENDED percentile;
appears twice.

This is a very good function to have. It would be great if we could document 
its usage on the wiki page or somewhere.

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, 
 HIVE-259.4.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles




[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-259:


Attachment: HIVE-259.5.patch

We take the method recommended by NIST.

See http://en.wikipedia.org/wiki/Percentile#Alternative_methods




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839393#action_12839393
 ] 

Zheng Shao commented on HIVE-259:
-

 (1) I am not familiar with the exact definition of the percentile function. 
 Must percentile()'s result be a member of the input data?
See the link above.

 (2) HashMap and ArrayList are used to copy and sort. Can we use a TreeMap here? 
 This is minor and can be ignored.
I think HashMap is better here. The reason is that the number of iterate calls 
is usually much higher than the number of unique numbers (the size of the 
HashMap). By using HashMap we reduce the cost of each iterate call.

 At the beginning of the new test case, .. appears twice
Fixed in HIVE-259.5.patch
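
The HashMap argument above can be sketched as follows. This is not the code from HIVE-259.5.patch: PercentileSketch and its method names are hypothetical, and the interpolation rule (linear interpolation at position p * (n - 1)) is an assumption chosen for illustration, not necessarily the NIST-recommended method the patch adopts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are not taken from the patch.
class PercentileSketch {
    private final Map<Long, Long> counts = new HashMap<>();
    private long total = 0;

    // Called once per input row: a hash update, no ordering maintained.
    void iterate(long value) {
        counts.merge(value, 1L, Long::sum);
        total++;
    }

    // Sorts the unique values once, then interpolates linearly
    // at zero-based position p * (n - 1), with p in [0, 1].
    double percentile(double p) {
        if (total == 0) throw new IllegalStateException("no input");
        List<Long> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        double position = p * (total - 1);
        long lower = (long) Math.floor(position);
        long upper = (long) Math.ceil(position);
        double lowerValue = valueAt(keys, lower);
        double upperValue = valueAt(keys, upper);
        return lowerValue + (position - lower) * (upperValue - lowerValue);
    }

    // Returns the value at a zero-based rank in the sorted multiset.
    private double valueAt(List<Long> sortedKeys, long rank) {
        long seen = 0;
        for (Long key : sortedKeys) {
            seen += counts.get(key);
            if (rank < seen) return key;
        }
        throw new IllegalArgumentException("rank out of range");
    }

    public static void main(String[] args) {
        PercentileSketch s = new PercentileSketch();
        for (long v : new long[] {1, 2, 3, 4}) s.iterate(v);
        System.out.println(s.percentile(0.5)); // prints 2.5
    }
}
```

Under this interpolation rule the median of 1, 2, 3, 4 is 2.5, which is not a member of the input. The sort cost is paid once over the unique values, while each row costs only a hash update; a TreeMap would pay the ordering cost on every row.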





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839394#action_12839394
 ] 

He Yongqiang commented on HIVE-259:
---

Looks good; will test and commit.




[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-259:
--

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: Add PERCENTILE aggregate function
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks for the hard work, Jerome Boulon and Zheng.

By the way, I manually fixed a show_function.q diff. Please document the usage 
of the percentile function on the wiki or somewhere.




Build failed in Hudson: Hive-trunk-h0.17 #375

2010-02-28 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/375/changes

Changes:

[heyongqiang] HIVE-259. Add PERCENTILE aggregate function.(Jerome Boulon, Zheng 
via He Yongqiang)

--
[...truncated 10891 lines...]
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out
[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: 

[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion

2010-02-28 Thread Vladimir Klimontovich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839427#action_12839427
 ] 

Vladimir Klimontovich commented on HIVE-1203:
-

Well, I can't assign this issue. It seems I don't have sufficient permissions.

 HiveInputFormat.getInputFormatFromCache swallows  cause exception when 
 trowing IOExcpetion
 

 Key: HIVE-1203
 URL: https://issues.apache.org/jira/browse/HIVE-1203
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0, 0.4.1, 0.5.0
Reporter: Vladimir Klimontovich
 Fix For: 0.4.2, 0.5.1, 0.6.0

 Attachments: 0.4.patch, 0.5.patch, trunk.patch


 To fix this, simply pass the caught exception as the second parameter to the 
 IOException constructor. Patches for 0.4, 0.5 and trunk are available.
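
The fix described above can be sketched as follows. This is not the code from the attached patches: CauseDemo, the helper names, and the message text are hypothetical. The two-argument constructor IOException(String, Throwable) is standard since Java 6.

```java
import java.io.IOException;

// Hypothetical demo class; not taken from HiveInputFormat.
class CauseDemo {
    // Swallows the original exception: only its message text survives.
    static IOException withoutCause(Exception e) {
        return new IOException("Cannot create an instance of InputFormat: " + e.getMessage());
    }

    // Keeps the original exception as the cause, so its stack trace is preserved.
    static IOException withCause(Exception e) {
        return new IOException("Cannot create an instance of InputFormat", e);
    }

    public static void main(String[] args) {
        Exception root = new ClassNotFoundException("com.example.MissingInputFormat");
        System.out.println(withoutCause(root).getCause());      // prints null
        System.out.println(withCause(root).getCause() == root); // prints true
    }
}
```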




[jira] Commented: (HIVE-600) Running TPC-H queries on Hive

2010-02-28 Thread Kamil Bajda-Pawlikowski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839478#action_12839478
 ] 

Kamil Bajda-Pawlikowski commented on HIVE-600:
--

Hi Yuntao,

I have attempted to run TPC-H on Hive. Thanks for really well prepared scripts!

During the first query, I realized that things were not going well. It seems 
that Aaron's concern about the number of reducers was a valid one.
However, the problem is that Hive schedules too many reducers! By default, 
Hive tries to determine the number of reduce tasks automatically from the value 
of the hive.exec.reducers.bytes.per.reducer property (the default setting is 
one reduce task per 1GB of input data). When the data is huge, 
this is inefficient. This needs to be capped!

For example, in my case there is 50GB of data per node but only 2 reduce task 
slots, so I'm getting 25 reduce task waves. Q1 ran for 1h49min. In contrast, 
when I set the hive.exec.reducers.max property to the number of reduce slots in 
my Hadoop installation, the query running time is only about 23min. Of note, 
the default value for hive.exec.reducers.max is 999.

The above issue was not too bad for the data size you used. A TPC-H dataset with 
SF=100 translates into at most 100 reducers per job, and with 40 reduce slots 
in total, each job had at most 2.5 reduce task waves. Still, your numbers could 
be somewhat better if you capped hive.exec.reducers.max at 40, per Tom White's tip 
#9 from http://www.cloudera.com/blog/2009/05/10-mapreduce-tips.
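
The arithmetic in this comment can be reproduced with a small sketch. ReducerWaves and its methods are hypothetical, not Hive code. The sketch assumes one reducer per bytesPerReducer of input, capped at maxReducers, as the comment describes, and assumes 1GB means 1,000,000,000 bytes. It also treats the 50GB and 2 slots as per-node figures, which is an assumption about how the 25 waves were derived.

```java
// Hypothetical helper; mirrors the behaviour described in the comment, not Hive's source.
class ReducerWaves {
    static final long GB = 1_000_000_000L; // assumed size of "1GB"

    // One reducer per bytesPerReducer of input (rounded up), capped at maxReducers.
    static long estimateReducers(long inputBytes, long bytesPerReducer, long maxReducers) {
        long byInput = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return Math.max(1, Math.min(maxReducers, byInput));
    }

    // Number of rounds ("waves") needed to run all reducers on the available slots.
    static double waves(long reducers, long reduceSlots) {
        return (double) reducers / reduceSlots;
    }

    public static void main(String[] args) {
        // 50GB and 2 reduce slots, default cap of 999.
        long perNode = estimateReducers(50 * GB, GB, 999);
        System.out.println(waves(perNode, 2));  // prints 25.0

        // SF=100 (about 100GB) with 40 reduce slots in total.
        long sf100 = estimateReducers(100 * GB, GB, 999);
        System.out.println(waves(sf100, 40));   // prints 2.5

        // Same data with hive.exec.reducers.max capped at the slot count.
        long capped = estimateReducers(100 * GB, GB, 40);
        System.out.println(waves(capped, 40));  // prints 1.0
    }
}
```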

Could you please confirm whether my understanding is correct?

Thank you,
Kamil





 Running TPC-H queries on Hive
 -

 Key: HIVE-600
 URL: https://issues.apache.org/jira/browse/HIVE-600
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Yuntao Jia
Assignee: Yuntao Jia
 Attachments: TPC-H_on_Hive_2009-08-11.pdf, 
 TPC-H_on_Hive_2009-08-11.tar.gz, TPC-H_on_Hive_2009-08-14.tar.gz


 The goal is to run all TPC-H (http://www.tpc.org/tpch/) benchmark queries on 
 Hive for two reasons. First, through those queries, we would like to find the 
 new features that we need to put into Hive so that Hive supports common SQL 
 queries. Second, we would like to measure the performance of Hive to find out 
 what Hive is not good at. We can then improve Hive based on that 
 information. 
 For queries that are not currently supported in Hive, I will try to rewrite 
 them as one or more Hive-supported queries. 




[jira] Updated: (HIVE-1204) typedbytes: writing to stderr kills the mapper

2010-02-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1204:
---

  Resolution: Fixed
Release Note: typedbytes: writing to stderr kills the mapper
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Namit!

 typedbytes: writing to stderr kills the mapper
 --

 Key: HIVE-1204
 URL: https://issues.apache.org/jira/browse/HIVE-1204
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0

 Attachments: hive.1204.1.patch, hive.1204.2.patch







[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-28 Thread Mafish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839515#action_12839515
 ] 

Mafish commented on HIVE-1202:
--

@Yongqiang
I ran the query:

select a.name, b.* from classes a join classes b on a.name = b.number where 
a.name  b.number 
It passed.

In this case, both tables are physical.

But when I changed one of them to a sub-query, the error occurred again:

select a.name, b.* from  (select name from classes) a join classes b on a.name 
= b.number where a.name  b.number ;

Please try this case.

 Unknown exception : null while join
 -

 Key: HIVE-1202
 URL: https://issues.apache.org/jira/browse/HIVE-1202
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.4.1
 Environment: hive-0.4.1
 hadoop 0.19.1
Reporter: Mafish
 Fix For: 0.4.1

 Attachments: HIVE-1202.branch-0.4.1.patch


 Hive throws Unknown exception : null with query:
 select * from 
 (
   select name from classes 
 ) a
   join classes b
 where a.name  b.number
 After tracing the code, I found that this bug occurs under the following
 conditions:
 1. The query is a join operation.
 2. At least one of the join sources is a physical table (the right side in
 the above case).
 3. There is a WHERE clause whose condition(s) include columns from both
 sides of the join (a.name and b.number in this case).




[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839546#action_12839546
 ] 

He Yongqiang commented on HIVE-1202:


{quote}
But when I changed one of them to a sub-query, the error occurred again:
select a.name, b.* from (select name from classes) a join classes b on a.name = 
b.number where a.name  b.number ;
{quote}

What's the error for this query?





[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-28 Thread Mafish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839547#action_12839547
 ] 

Mafish commented on HIVE-1202:
--

The error message in Hive is the same as the title: Unknown exception : null.

And the call stack is the same as in my first comment.




[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839551#action_12839551
 ] 

He Yongqiang commented on HIVE-1202:


I tried this query with the trunk code. It works fine.
hive> select a.name, b.* from (select name from classes) a join classes b on 
a.name = b.number where a.nameb.number;   
Total MapReduce jobs = 1





[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-28 Thread Mafish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839552#action_12839552
 ] 

Mafish commented on HIVE-1202:
--

Which trunk are you using?
I'm using release 0.4.1, which is checked out from 
http://svn.apache.org/repos/asf/hadoop/hive/branches/branch-0.4

$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/hadoop/hive/branches/branch-0.4
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 916543
Node Kind: directory
Schedule: normal
Last Changed Author: nzhang
Last Changed Rev: 912061
Last Changed Date: 2010-02-20 09:44:44 +0800 (Sat, 20 Feb 2010)






[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839555#action_12839555
 ] 

He Yongqiang commented on HIVE-1202:


Please try 
http://svn.apache.org/repos/asf/hadoop/hive/trunk

or you can download the latest stable 0.5 version from
http://hadoop.apache.org/hive/releases.html#Download





[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839575#action_12839575
 ] 

He Yongqiang commented on HIVE-1197:


Looks very good overall, congrats!

Just a few minor comments:
1. Can you change inputFormatClassName to use getter and setter methods?
2. There is some code duplicated from HiveInputFormat; can we reuse it?
3. In BucketizedHiveRecordReader's next, I think we should remove the check for 
curReader == null. We should throw an exception if curReader == null, which 
means the reader has been closed.
4. I think we should remove line 207 in BucketizedHiveInputFormat:   
newjob.setInputFormat(inputFormat.getClass());
5. In HiveRecordReader,
5.1 progress is calculated as (number of splits done) / (total split 
number); can we make it more accurate? Let's say the work is evenly divided 
among all splits. Something like this: (number of splits done) / (total split 
number) + currReader.getProgress();
5.2 getPos should return currReader.getPos()

Another question: do you think it is a good idea to let 
BucketizedHiveInputFormat extend HiveInputFormat? That way, the code would be 
clearer. And we should put the RecordReader and InputSplit in the same file 
as BucketizedHiveInputFormat.

 create a new input format where a mapper spans a file
 -

 Key: HIVE-1197
 URL: https://issues.apache.org/jira/browse/HIVE-1197
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.6.0

 Attachments: hive.1197.1.patch


 This will be needed for Sort merge joins.




[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file

2010-02-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839574#action_12839574
 ] 

Namit Jain commented on HIVE-1197:
--

Overall, looks good. Some general comments:

Would it be a good idea to make BucketizedHiveInputFormat extend 
HiveInputFormat, and BucketizedHiveRecordReader extend HiveRecordReader?
You won't have to copy a lot of code, and it would be easier to maintain. For 
example, the check for ExecMapper in HiveRecordReader and similar future 
optimizations would be easier to maintain.




[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file

2010-02-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839576#action_12839576
 ] 

He Yongqiang commented on HIVE-1197:


Correction about 5.1: it should be ((number of splits done) + 
currReader.getProgress()) / (total split number)
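
A small sketch of the corrected formula, next to the first suggestion for comparison. SplitProgress and its method names are hypothetical, not code from the patch, and the sketch assumes the work is evenly divided among the splits, as stated in 5.1.

```java
// Hypothetical helper; not taken from hive.1197.1.patch.
class SplitProgress {
    // Corrected formula: the current split contributes only its own 1/total share.
    static float corrected(int splitsDone, float currentSplitProgress, int totalSplits) {
        if (totalSplits <= 0) throw new IllegalArgumentException("totalSplits must be positive");
        return (splitsDone + currentSplitProgress) / totalSplits;
    }

    // First suggestion: adds the current split's progress unscaled, so it overshoots.
    static float uncorrected(int splitsDone, float currentSplitProgress, int totalSplits) {
        if (totalSplits <= 0) throw new IllegalArgumentException("totalSplits must be positive");
        return (float) splitsDone / totalSplits + currentSplitProgress;
    }

    public static void main(String[] args) {
        // 2 of 4 splits finished, the third is half read.
        System.out.println(corrected(2, 0.5f, 4));   // prints 0.625
        System.out.println(uncorrected(2, 0.5f, 4)); // prints 1.0
    }
}
```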




[jira] Updated: (HIVE-1194) sorted merge join

2010-02-28 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1194:
---

Attachment: hive-1194-2010-02-28.patch

For early review only. 
I will test it more and add more test cases.

 sorted merge join
 -

 Key: HIVE-1194
 URL: https://issues.apache.org/jira/browse/HIVE-1194
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1194-2010-02-28.patch


 If the input tables are sorted on the join key, and a mapjoin is being 
 performed, it is useful to exploit the sorted properties of the tables.
 This can lead to substantial CPU savings. It also needs to work across 
 bucketed map joins.
 Since the sorted properties of a table are not currently enforced, a new 
 parameter can be added to request the sort-merge join.
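
For readers unfamiliar with the technique, here is a generic sketch of a merge join over two inputs that are already sorted on the join key. It is not the implementation in hive-1194-2010-02-28.patch: SortMergeJoinSketch is a hypothetical name, and plain int arrays stand in for table rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Hive code.
class SortMergeJoinSketch {
    // Inner-joins two key arrays sorted in ascending order.
    // Returns pairs of {leftIndex, rightIndex} for every match,
    // including all combinations when a key repeats on both sides.
    static List<int[]> join(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;                      // left key too small: advance left
            } else if (left[i] > right[j]) {
                j++;                      // right key too small: advance right
            } else {
                int key = left[i];
                int iEnd = i, jEnd = j;
                while (iEnd < left.length && left[iEnd] == key) iEnd++;
                while (jEnd < right.length && right[jEnd] == key) jEnd++;
                // Emit the cross product of the two runs of equal keys.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(new int[] {a, b});
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> pairs = join(new int[] {1, 2, 2, 5}, new int[] {2, 2, 3, 5});
        System.out.println(pairs.size()); // prints 5
    }
}
```

Both inputs are scanned once and no hash table is built, which is one plausible source of the CPU savings mentioned in the description. The result is only correct if the inputs really are sorted, which is why an explicit parameter is proposed.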
