[jira] Created: (HIVE-650) [UDAF] implement GROUP_CONCAT(expr)

2009-07-16 Thread Min Zhou (JIRA)
[UDAF] implement GROUP_CONCAT(expr)
-

 Key: HIVE-650
 URL: https://issues.apache.org/jira/browse/HIVE-650
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Min Zhou


It's a very useful UDAF for us. 
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

GROUP_CONCAT(expr)

This function returns a string result with the concatenated non-NULL values 
from a group. It returns NULL if there are no non-NULL values. The full syntax 
is as follows: 

GROUP_CONCAT([DISTINCT] expr [,expr ...]
 [ORDER BY {unsigned_integer | col_name | expr}
 [ASC | DESC] [,col_name ...]]
 [SEPARATOR str_val])
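
The described semantics (concatenate the non-NULL values of a group, return NULL when there are none, with optional DISTINCT and SEPARATOR) can be sketched in Python. `group_concat` below is a hypothetical illustration of the behavior, not the proposed Hive UDAF:

```python
def group_concat(values, distinct=False, separator=","):
    """Concatenate non-None values from one group, MySQL GROUP_CONCAT-style.

    Returns None when the group holds no non-None values.
    """
    non_null = [str(v) for v in values if v is not None]
    if distinct:
        seen, unique = set(), []
        for v in non_null:
            if v not in seen:
                seen.add(v)
                unique.append(v)
        non_null = unique
    return separator.join(non_null) if non_null else None
```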


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-594) Add "isNull", "hashCode", "equals", "compareTo" to ObjectInspector

2009-07-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732363#action_12732363
 ] 

Zheng Shao commented on HIVE-594:
-

Without this, we could not do complete object reuse.

For example, if all columns in the first row are non-null, we need to create 
objects for each of the columns.
Then the second row comes in and all columns are null, so we have to set all 
fields in the row object to null.
Then the third row comes in and all columns are non-null, so we need to create 
new objects again.

LazySimpleSerDe has a hack to avoid this by adding a special function 
"getObject()".

With "isNull" added to ObjectInspector, we don't need to set the field in the 
row object to null; instead, we can set a bit inside the field object so that 
it knows it is null.
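
The null-bit idea can be sketched in Python. `ReusableField` is a hypothetical illustration of the reuse pattern, not Hive's actual field representation:

```python
class ReusableField:
    """A field wrapper that records nullness in a flag so the underlying
    object can be reused across rows (hypothetical sketch)."""

    def __init__(self):
        self.value = None
        self.null = True

    def set(self, value):
        self.value = value
        self.null = False
        return self

    def set_null(self):
        # Keep self.value allocated; only flip the flag.
        self.null = True
        return self

row = [ReusableField() for _ in range(2)]
# Row 1: all columns non-null -> objects created once.
row[0].set(1); row[1].set("a")
# Row 2: all columns null -> no reallocation, just flip the bit.
for f in row:
    f.set_null()
# Row 3: non-null again -> reuse the same objects.
row[0].set(2)
```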


We will need to replace the logic "xxx == null" and "xxx != null" in the code 
(mainly GenericUDFOPNull and GenericUDFOPNonNull).


> Add "isNull", "hashCode", "equals", "compareTo" to ObjectInspector
> --
>
> Key: HIVE-594
> URL: https://issues.apache.org/jira/browse/HIVE-594
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.3.0, 0.3.1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>
> ObjectInspector should delegate all methods of the object, including 
> "isNull", "hashCode", "equals", "compareTo".




[jira] Created: (HIVE-649) [UDF] now() for getting current time

2009-07-16 Thread Min Zhou (JIRA)
[UDF] now() for getting current time


 Key: HIVE-649
 URL: https://issues.apache.org/jira/browse/HIVE-649
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Min Zhou


http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_now




[jira] Assigned: (HIVE-609) optimize multi-group by

2009-07-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-609:
---

Assignee: Namit Jain

> optimize multi-group by 
> 
>
> Key: HIVE-609
> URL: https://issues.apache.org/jira/browse/HIVE-609
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.609.1.patch
>
>
> For a query like:
> from src
> insert overwrite table dest1 select col1, count(distinct colx) group by col1
> insert overwrite table dest2 select col2, count(distinct colx) group by col2;
> If map side aggregation is turned off, we currently do 4 map-reduce jobs.
> The plan can be optimized by running it in 3 map-reduce jobs, by spraying 
> over the
> distinct column first and then aggregating individual results.
> This may not be possible if there are multiple distinct columns, but the 
> above query is very common
> in data warehousing environments.
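
The 3-job plan described above can be simulated in Python. `spray_then_count_distinct` is a hypothetical sketch of the rewritten plan (spray by the distinct column to deduplicate, then run plain counts), not Hive's execution engine:

```python
from collections import defaultdict

def spray_then_count_distinct(rows, group_cols, distinct_col):
    """Sketch of the 3-job plan: the first MR job sprays rows over the
    distinct column and emits deduplicated (group-key, distinct-value)
    pairs for every requested grouping; the later jobs then only need a
    plain count per group key, with no distinct handling."""
    # Job 1: partition by distinct value, dedupe pairs for each grouping.
    deduped = {col: set() for col in group_cols}
    for row in rows:
        for col in group_cols:
            deduped[col].add((row[col], row[distinct_col]))
    # Jobs 2..n: one plain count per grouping.
    results = {}
    for col in group_cols:
        counts = defaultdict(int)
        for key, _ in deduped[col]:
            counts[key] += 1
        results[col] = dict(counts)
    return results
```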




[jira] Updated: (HIVE-609) optimize multi-group by

2009-07-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-609:


Attachment: hive.609.1.patch

> optimize multi-group by 
> 
>
> Key: HIVE-609
> URL: https://issues.apache.org/jira/browse/HIVE-609
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.609.1.patch
>
>
> For a query like:
> from src
> insert overwrite table dest1 select col1, count(distinct colx) group by col1
> insert overwrite table dest2 select col2, count(distinct colx) group by col2;
> If map side aggregation is turned off, we currently do 4 map-reduce jobs.
> The plan can be optimized by running it in 3 map-reduce jobs, by spraying 
> over the
> distinct column first and then aggregating individual results.
> This may not be possible if there are multiple distinct columns, but the 
> above query is very common
> in data warehousing environments.




[jira] Updated: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-405:


   Resolution: Fixed
Fix Version/s: 0.4.0
 Release Note: HIVE-405. Cleanup operator initialization. (Prasad Chakka 
via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Prasad.

> Cleanup operator initialization
> ---
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.9.patch, 
> hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).




[jira] Updated: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-405:
---

Attachment: hive-405.9.patch

> Cleanup operator initialization
> ---
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.9.patch, 
> hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).




[jira] Commented: (HIVE-645) A UDF that can export data to JDBC databases.

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732309#action_12732309
 ] 

Namit Jain commented on HIVE-645:
-

Actually, this is more of a user-defined procedure than a user-defined function.
Should we support them in some other way than as a select-list item?

Something like:

begin
   proc();
end;

> A UDF that can export data to JDBC databases.
> -
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-645-2.patch, hive-645.patch
>
>
> A UDF that can export data to JDBC databases.




[jira] Updated: (HIVE-645) A UDF that can export data to JDBC databases.

2009-07-16 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-645:
-

Release Note: Provides DBOutputUDF
  Status: Patch Available  (was: Open)

> A UDF that can export data to JDBC databases.
> -
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-645-2.patch, hive-645.patch
>
>
> A UDF that can export data to JDBC databases.




[jira] Updated: (HIVE-645) A UDF that can export data to JDBC databases.

2009-07-16 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-645:
-

Attachment: hive-645-2.patch

Better use of object inspectors.

> A UDF that can export data to JDBC databases.
> -
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-645-2.patch, hive-645.patch
>
>
> A UDF that can export data to JDBC databases.




[jira] Updated: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)

2009-07-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-646:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Emil.

> UDFs for conversion between different number bases (conv, hex, bin)
> ---
>
> Key: HIVE-646
> URL: https://issues.apache.org/jira/browse/HIVE-646
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
> Fix For: 0.4.0
>
> Attachments: HIVE-646.1.patch
>
>
> Add conv, hex and bin UDFs
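
The conv, hex, and bin UDFs follow MySQL semantics. A minimal Python sketch of that behavior (hypothetical helpers; the actual patch implements these as Hive UDFs in Java):

```python
def conv(n, from_base, to_base):
    """MySQL-style CONV: convert n between bases 2..36 (minimal sketch;
    ignores MySQL's unsigned wrap-around edge cases for negative input)."""
    value = int(str(n), from_base)
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if value == 0:
        return "0"
    out = []
    while value > 0:
        out.append(digits[value % to_base])
        value //= to_base
    return "".join(reversed(out))

def hex_(n):
    # HEX(n) is CONV(n, 10, 16)
    return conv(n, 10, 16)

def bin_(n):
    # BIN(n) is CONV(n, 10, 2)
    return conv(n, 10, 2)
```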




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732276#action_12732276
 ] 

Namit Jain commented on HIVE-405:
-

I don't like it, but it is a minor issue and can be done later,
so that's fine.


> Cleanup operator initialization
> ---
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732268#action_12732268
 ] 

Prasad Chakka commented on HIVE-405:


For now I will leave the initializeChildren() call inside initializeOp(). What do 
you guys say?

> Cleanup operator initialization
> ---
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732234#action_12732234
 ] 

Zheng Shao commented on HIVE-405:
-

Talked with Prasad on this offline.

Since we already have an example of initializing children before we can finish 
all the initializations (ScriptOperator kicking off the thread that gets data 
from the script), I think it makes sense to keep the "initializeChildren" call 
inside the customized initializeOp().

Another way to do this is to add a post-order initialize recursive call to the 
Operator. That is probably the cleanest approach: we would first do pre-order 
initialization, and then post-order initialization.
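
The pre-order/post-order idea can be sketched as a recursive walk over a hypothetical operator tree (illustrative names, not Hive's actual Operator API):

```python
class Operator:
    """Two-pass initialization sketch: a pre-order step runs on each
    operator before its children, then a post-order step runs once all
    of its children have finished initializing."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def initialize(self, trace):
        trace.append(("pre", self.name))    # pre-order: parent first
        for child in self.children:
            child.initialize(trace)
        trace.append(("post", self.name))   # post-order: children are ready
```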


> Cleanup operator initialization
> ---
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).




[jira] Commented: (HIVE-647) SORT BY with GROUP ignored without LIMIT

2009-07-16 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732231#action_12732231
 ] 

Ashish Thusoo commented on HIVE-647:


Actually, SORT BY is supposed to be a local sort within a reducer instead of a 
global sort. It is usually used along with DISTRIBUTE BY to define the 
manner in which the keys are distributed to a reducer and sorted within a 
reducer.

I believe that if you use ORDER BY instead of SORT BY, we automatically select 
one reducer and do the sort.
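
The distinction can be sketched in Python; the hash partitioning below is a simplified stand-in for Hive's key distribution, not its actual implementation:

```python
def sort_by(rows, key, reducers=2):
    """Sketch of Hive's SORT BY: rows are hash-partitioned across
    reducers and each partition is sorted locally, so the concatenated
    output need not be globally ordered."""
    partitions = [[] for _ in range(reducers)]
    for row in rows:
        partitions[hash(key(row)) % reducers].append(row)
    out = []
    for p in partitions:
        out.extend(sorted(p, key=key))
    return out

def order_by(rows, key):
    """Sketch of ORDER BY: a single reducer performs one global sort."""
    return sorted(rows, key=key)
```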


> SORT BY with GROUP ignored without LIMIT
> 
>
> Key: HIVE-647
> URL: https://issues.apache.org/jira/browse/HIVE-647
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when 
> a LIMIT is not supplied. If I run the following two queries, the first 
> returns properly sorted results. The second does not.
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
> DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
> DESC;
> Explain is different for the two queries as well. The first uses 3 M/R jobs 
> and the second only uses 2, which might be part of the problem.




[jira] Updated: (HIVE-644) change default size for merging files at the end of the job

2009-07-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-644:


   Resolution: Fixed
Fix Version/s: 0.4.0
 Release Note: HIVE-644. Change default size for merging files to 256MB.  
(Namit Jain via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Namit!

> change default size for merging files at the end of the job
> ---
>
> Key: HIVE-644
> URL: https://issues.apache.org/jira/browse/HIVE-644
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.4.0
>
> Attachments: hive.644.1.patch
>
>
> Currently, the size is 1G and the reducers end up taking a really long time.




[jira] Created: (HIVE-648) Better info messages on the command line

2009-07-16 Thread Zheng Shao (JIRA)
Better info messages on the command line


 Key: HIVE-648
 URL: https://issues.apache.org/jira/browse/HIVE-648
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao


We want to print out better info messages on the Hive command line.
Currently we show the progress of each map-reduce job, but it's hard to see 
which map-reduce job we are currently running.

We should show the information in a more structured way.

This is dependent on another issue where we want to show the total number of 
tasks for each query (instead of just map-reduce jobs).





[jira] Commented: (HIVE-647) SORT BY with GROUP ignored without LIMIT

2009-07-16 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732176#action_12732176
 ] 

Bill Graham commented on HIVE-647:
--

Note that the second query does sort properly if I explicitly set the number of 
reducers to 1 with the following command.

set mapred.reduce.tasks=1; 

> SORT BY with GROUP ignored without LIMIT
> 
>
> Key: HIVE-647
> URL: https://issues.apache.org/jira/browse/HIVE-647
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when 
> a LIMIT is not supplied. If I run the following two queries, the first 
> returns properly sorted results. The second does not.
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
> DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
> DESC;
> Explain is different for the two queries as well. The first uses 3 M/R jobs 
> and the second only uses 2, which might be part of the problem.




[jira] Updated: (HIVE-647) SORT BY with GROUP ignored without LIMIT

2009-07-16 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated HIVE-647:
-

Summary: SORT BY with GROUP ignored without LIMIT  (was: ORDER BY with 
GROUP ignored without LIMIT)

> SORT BY with GROUP ignored without LIMIT
> 
>
> Key: HIVE-647
> URL: https://issues.apache.org/jira/browse/HIVE-647
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when 
> a LIMIT is not supplied. If I run the following two queries, the first 
> returns properly sorted results. The second does not.
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
> DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
> DESC;
> Explain is different for the two queries as well. The first uses 3 M/R jobs 
> and the second only uses 2, which might be part of the problem.




[jira] Updated: (HIVE-396) Hive performance benchmarks

2009-07-16 Thread Yuntao Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuntao Jia updated HIVE-396:


Attachment: (was: hive_benchmark_2009-07-12.pdf)

> Hive performance benchmarks
> ---
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf, 
> hive_benchmark_2009-06-18.tar.gz, hive_benchmark_2009-07-12.pdf, 
> hive_benchmark_2009-07-12.tar.gz
>
>
> We need some performance benchmark to measure and track the performance 
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix




[jira] Updated: (HIVE-396) Hive performance benchmarks

2009-07-16 Thread Yuntao Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuntao Jia updated HIVE-396:


Attachment: hive_benchmark_2009-07-12.pdf

Revised the benchmark report. Thanks to Raghu Murthy for his help.

> Hive performance benchmarks
> ---
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf, 
> hive_benchmark_2009-06-18.tar.gz, hive_benchmark_2009-07-12.pdf, 
> hive_benchmark_2009-07-12.tar.gz
>
>
> We need some performance benchmark to measure and track the performance 
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix




[jira] Created: (HIVE-647) ORDER BY with GROUP ignored without LIMIT

2009-07-16 Thread Bill Graham (JIRA)
ORDER BY with GROUP ignored without LIMIT
-

 Key: HIVE-647
 URL: https://issues.apache.org/jira/browse/HIVE-647
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Bill Graham


For queries with GROUP BY and SORT BY, the sort is not handled properly when a 
LIMIT is not supplied. If I run the following two queries, the first returns 
properly sorted results. The second does not.

SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
DESC LIMIT 50;
SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num 
DESC;

Explain is different for the two queries as well. The first uses 3 M/R jobs and 
the second only uses 2, which might be part of the problem.




[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time

2009-07-16 Thread Alex Loddengaard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732120#action_12732120
 ] 

Alex Loddengaard commented on HIVE-425:
---

I've used both Struts and Spring, but I'm pretty certain both are much larger 
than 5MB.  That said, I found each of them to be good in their own separate 
ways.  I recall liking Spring better.  It's been a long time since my JSP MVC 
fiddling :).  Struts is an Apache project, though: [http://struts.apache.org/]. 
 Sorry I can't provide more insight, Edward.

> HWI JSP pages should be compiled at build-time instead of run-time
> --
>
> Key: HIVE-425
> URL: https://issues.apache.org/jira/browse/HIVE-425
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Alex Loddengaard
>
> HWI JSP pages are compiled via the ant jar at run-time.  Doing so at run-time 
> requires ant as a dependency and also makes developing slightly more tricky, 
> as compiler errors are not discovered until HWI is deployed and running.  HWI 
> should be instrumented in such a way where the JSP pages are compiled by ant 
> at build-time instead, just as the Hadoop status pages are.




[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time

2009-07-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732112#action_12732112
 ] 

Edward Capriolo commented on HIVE-425:
--

Alex,

Here are the current problems I see.

* JSP 
** Hard to check syntax at compile time
** References to other parts of Hive (not many, but some) may become outdated with 
API changes
* Hard to unit test. Our unit tests test the SessionManager, not the actual JSP 
flow of passing form items.

So the ideal framework would be: 
* Small (<5MB); we don't want to make Hive bigger, just better
* Mostly Java and servlet based (little/no JSP)
* Possibly a way to unit test a transaction: Login->RunQuery->Verify Results
* Apache-compatible license 
* XML/CSS (if it helps eliminate JSP)
* Low complexity (I am not interested in learning what an 
org.springframework.transaction.interceptor.AbstractFallbackTransactionAttributeSource
 is, for example :)

I know little about JSP frameworks. Do you have one that fits these needs?

> HWI JSP pages should be compiled at build-time instead of run-time
> --
>
> Key: HIVE-425
> URL: https://issues.apache.org/jira/browse/HIVE-425
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Alex Loddengaard
>
> HWI JSP pages are compiled via the ant jar at run-time.  Doing so at run-time 
> requires ant as a dependency and also makes developing slightly more tricky, 
> as compiler errors are not discovered until HWI is deployed and running.  HWI 
> should be instrumented in such a way where the JSP pages are compiled by ant 
> at build-time instead, just as the Hadoop status pages are.




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732057#action_12732057
 ] 

Namit Jain commented on HIVE-405:
-

>> Are you suggesting that we keep initializeChildren() method and override in 
>> MapOperator()?
Yes.

MapOperator() becomes more customized, but all other operators become 
simpler: initializeOp() just performs operator-specific initialization, if any.
By default, all children are initialized after that, with the 
outputObjectInspector (assuming there is only one).

MapOperator overrides this behavior.

If we want to make it more general, we can have an array of 
outputObjectInspectors, one for each child. They will be the same except for 
MapOperator, but then MapOperator can also fit in this framework. I don't think 
it is worth it to generalize at that level.





> Cleanup operator initialization
> ---
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732053#action_12732053
 ] 

Prasad Chakka commented on HIVE-405:


Are you suggesting that we keep the initializeChildren() method and override it
in MapOperator? I think the current initialization methods have distinct
responsibilities:

1) Operator.initialize() -- makes sure that all parents are initialized before
the operator is initialized. It also initializes common structures needed by
all operators. This is the only public initialization method.
2) Operator.initializeOp() -- does operator-specific initialization, including
initialization of children. It is up to the operator in what order its child
operators are initialized. The base implementation just calls
initializeChildren() with the output ObjectInspector. This is a protected
method.
3) Operator.initializeChildren() -- calls initialize() on all children. This is
a protected method.
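A minimal sketch of the three-method contract described above (only the
initialize/initializeOp/initializeChildren names come from this discussion; the
class shape and the setChildren/isInitialized helpers are illustrative, not
Hive's actual code):

```java
// Sketch of the initialization contract described in the comment above.
// Only initialize()/initializeOp()/initializeChildren() come from the
// discussion; the other members are illustrative.
abstract class Operator {
    private Operator[] children = new Operator[0];
    private boolean initialized = false;

    public void setChildren(Operator[] children) {
        this.children = children;
    }

    public boolean isInitialized() {
        return initialized;
    }

    // 1) The only public initialization method; runs at most once.
    public final void initialize(Object inputObjectInspector) {
        if (initialized) {
            return;
        }
        initialized = true;
        initializeOp(inputObjectInspector);
    }

    // 2) Operator-specific setup; the base version just forwards the
    //    input inspector to the children as the output inspector.
    protected void initializeOp(Object inputObjectInspector) {
        initializeChildren(inputObjectInspector);
    }

    // 3) Calls initialize() on every child with the output inspector.
    protected final void initializeChildren(Object outputObjectInspector) {
        for (Operator child : children) {
            child.initialize(outputObjectInspector);
        }
    }
}
```

An operator like GroupBy would override initializeOp() to build its own output
inspector before calling initializeChildren().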

I think what we have here and what you are proposing are pretty similar, except
that MapOperator becomes more customized.

But I agree that there should be an outputObjectInspector field in
Operator.java, and it should be used when calling initialize() on children in
Operator.java. I will make that change.




[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time

2009-07-16 Thread Alex Loddengaard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732039#action_12732039
 ] 

Alex Loddengaard commented on HIVE-425:
---

Hi Edward, which MVC frameworks are you going to consider?  I assume Struts and 
Spring don't fall into the "light" category, yeah?  I'm mostly just curious.

> HWI JSP pages should be compiled at build-time instead of run-time
> --
>
> Key: HIVE-425
> URL: https://issues.apache.org/jira/browse/HIVE-425
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Alex Loddengaard
>
> HWI JSP pages are compiled via the ant jar at run-time.  Doing so at run-time 
> requires ant as a dependency and also makes developing slightly more tricky, 
> as compiler errors are not discovered until HWI is deployed and running.  HWI 
> should be instrumented in such a way where the JSP pages are compiled by ant 
> at build-time instead, just as the Hadoop status pages are.




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732040#action_12732040
 ] 

Namit Jain commented on HIVE-405:
-

As you said, 2) does not matter - I don't think anything will change if we move
initializeChildren() to the end.

1) is needed - MapOperator is an exception, and it needs a different walker -
it can override initializeChildren() to do nothing.
In some sense, it has more than one outputObjectInspector, which is not true
for any other operator - ExecMapper and ExecReducer are
different.

I still think we should do 1) - and treat MapOperator as an exception.





[jira] Updated: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)

2009-07-16 Thread Emil Ibrishimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ibrishimov updated HIVE-646:
-

Fix Version/s: 0.4.0
   Status: Patch Available  (was: Open)

> UDFs for conversion between different number bases (conv, hex, bin)
> ---
>
> Key: HIVE-646
> URL: https://issues.apache.org/jira/browse/HIVE-646
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
> Fix For: 0.4.0
>
> Attachments: HIVE-646.1.patch
>
>
> Add conv, hex and bin UDFs




[jira] Updated: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)

2009-07-16 Thread Emil Ibrishimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ibrishimov updated HIVE-646:
-

Attachment: HIVE-646.1.patch

Does anyone know a faster way to check for overflow and to do unsigned division
in Java than the one implemented in conv?
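For reference, one common pre-Java-8 technique for unsigned 64-bit division is
to halve the dividend, divide, and then correct the remainder. This is a
generic sketch, not the code in HIVE-646.1.patch (Java 8 later added
Long.divideUnsigned for exactly this):

```java
// Unsigned 64-bit division when long is the only 64-bit type available.
// Generic technique; not taken from the HIVE-646 patch.
static long divideUnsigned(long dividend, long divisor) {
    if (divisor < 0) {
        // divisor >= 2^63 as unsigned, so the quotient is 0 or 1.
        return (compareUnsigned(dividend, divisor) < 0) ? 0 : 1;
    }
    if (dividend >= 0) {
        // Both values fit in the signed range; plain division works.
        return dividend / divisor;
    }
    // dividend >= 2^63 as unsigned: halve, divide, double, then fix up.
    long quotient = ((dividend >>> 1) / divisor) << 1;
    long remainder = dividend - quotient * divisor;
    return quotient + (compareUnsigned(remainder, divisor) >= 0 ? 1 : 0);
}

static int compareUnsigned(long a, long b) {
    // Flipping the sign bit maps unsigned order onto signed order.
    long x = a ^ Long.MIN_VALUE;
    long y = b ^ Long.MIN_VALUE;
    return (x < y) ? -1 : ((x == y) ? 0 : 1);
}
```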




[jira] Created: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)

2009-07-16 Thread Emil Ibrishimov (JIRA)
UDFs for conversion between different number bases (conv, hex, bin)
---

 Key: HIVE-646
 URL: https://issues.apache.org/jira/browse/HIVE-646
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Emil Ibrishimov
Assignee: Emil Ibrishimov


Add conv, hex and bin UDFs
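For context, the MySQL semantics these UDFs mirror can be sketched as follows.
This is illustrative only - it is not the implementation in the attached patch,
and the overflow and unsigned 64-bit handling discussed on this issue are
omitted:

```java
// Toy sketch of MySQL-style CONV/HEX/BIN semantics for small values.
// Not the HIVE-646 implementation; overflow and unsigned 64-bit
// handling (the hard part) are omitted.
static String conv(String n, int fromBase, int toBase) {
    long value = Long.parseLong(n.trim(), fromBase);
    return Long.toString(value, toBase).toUpperCase();
}

static String hex(long n) {
    return Long.toHexString(n).toUpperCase();
}

static String bin(long n) {
    return Long.toBinaryString(n);
}
```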




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732029#action_12732029
 ] 

Prasad Chakka commented on HIVE-405:


I thought about it but didn't do it, for two reasons:

1) Any operator can choose not to call initializeChildren(). E.g. MapOperator
does not call initializeChildren() directly but calls initialize() on its
children directly, with a different ObjectInspector object for each child.

2) ScriptOperator.initializeOp() seems to do some more work after calling
initializeChildren(), which is different from calling initializeChildren()
after initializeOp().

Even if 2) doesn't matter, I don't see how we can get around 1) in the scheme
you suggested.

If you agree, then I am going to upload a new patch which fixes TestOperators
(which was failing after merging with trunk).




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732025#action_12732025
 ] 

Namit Jain commented on HIVE-405:
-

Other than that, it looks good. You don't need to set
state = State.INIT in initializeChildren().


Operator.initialize() will look like:

...

initializeOp()
state = INIT
initializeChildren()
..


initializeChildren() can be a private method which no one else should call.







[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732019#action_12732019
 ] 

Namit Jain commented on HIVE-405:
-

Thinking about it more, you should add an outputObjectInspector to Operator,
which can be defaulted to inputObjectInspector[0] but can be overridden by a
specific initializeOp() - for example, GroupBy's.

This way, initializeChildren() in Operator need not be called with
inputObjectInspector[0]; it can be called with outputObjectInspector.





[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732016#action_12732016
 ] 

Namit Jain commented on HIVE-405:
-

Does initializeOp() even need to call initializeChildren()?

This call can be removed from all implementations of initializeOp() and added
to Operator.initialize(), after initializeOp().




[jira] Updated: (HIVE-645) A UDF that can export data to JDBC databases.

2009-07-16 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-645:
-

Attachment: hive-645.patch

First Draft patch for comments.

> A UDF that can export data to JDBC databases.
> -
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-645.patch
>
>
> A UDF that can export data to JDBC databases.




[jira] Created: (HIVE-645) A UDF that can export data to JDBC databases.

2009-07-16 Thread Edward Capriolo (JIRA)
A UDF that can export data to JDBC databases.
-

 Key: HIVE-645
 URL: https://issues.apache.org/jira/browse/HIVE-645
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor


A UDF that can export data to JDBC databases.




[jira] Updated: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-405:
---

Attachment: hive-405.7.patch

Resolved conflicts with GroupByOperator.java and UnionOperator.java. I am still
testing the latter conflict but have uploaded the latest patch.




[jira] Updated: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-405:
---

Attachment: (was: hive-405.7.patch)




[jira] Updated: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-405:
---

Attachment: hive-405.7.patch




[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time

2009-07-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731990#action_12731990
 ] 

Edward Capriolo commented on HIVE-425:
--

I have tried several things to make this process work. Without citing any one
specific error, the 'org.apache.jasper.JspC' compilation seems to require a
specific Ant/Jetty combination. I want to look into a light MVC framework -
something that will get all the code into servlets/Java classes. That way the
pages will be pre-compiled.






Hudson build is back to normal: Hive-trunk-h0.18 #158

2009-07-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/158/changes




[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731970#action_12731970
 ] 

Namit Jain commented on HIVE-405:
-

Can you regenerate the patch - there are some conflicts




Re: Error on Load into multiple Partitions

2009-07-16 Thread Matt Pestritto
Namit.

I just updated to revision 794686 and that worked.  It looks like Zheng
committed this patch in the afternoon, and it had failed for me earlier that
morning.  Bad luck on my timing, but I'm happy it works now.

Thanks.
-Matt


On Thu, Jul 16, 2009 at 10:09 AM, Namit Jain  wrote:

> Most probably, this is the same as
>
> https://issues.apache.org/jira/browse/HIVE-636
>
> which was merged just a few days back. Can you try the latest trunk?
>
>
>
>
> On 7/16/09 6:45 AM, "Matt Pestritto"  wrote:
>
> Does anyone have any idea as to the reason for this error ?
>
> Thanks in Advance
> -Matt
>
> -- Forwarded message --
> From: Matt Pestritto 
> Date: Wed, Jul 15, 2009 at 10:09 AM
> Subject: Error on Load into multiple Partitions
> To: hive-dev@hadoop.apache.org
>
>
> Hi All.
>
> Are there any existing test cases that load into multiple partitions using a
> single FROM query?  This query worked in an older revision, but the mappers
> fail when I run on trunk:
>
> java.lang.RuntimeException: Map operator initialization failed
>
>at
> org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
>
> Caused by: java.lang.NullPointerException
>at
> org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176)
>at
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204)
>
>at
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264)
>at
> org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103)
>
>
> Here is a simplified version of what I'm running and DDL to support:
> create table test_m ( client int, description string )
>   row format delimited fields terminated by '\011' lines terminated by
>   '\012' stored as textfile;
>
> create table test_m_p ( description string )
>   partitioned by ( client int ) row format delimited fields terminated by
>   '\011' lines terminated by '\012' stored as textfile;
>
> LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m;
>
> FROM test_m
> INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description
>   where client=1
> INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description
>   where client=2;
> --- contents of /tmp/m.lst
> 1	test
> 1	test2
> 1	test3
> 2	hi
> 2	hi1
> 2	hi3
>
> Thanks!
> -Matt
>
>


[jira] Commented: (HIVE-405) Cleanup operator initialization

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731954#action_12731954
 ] 

Namit Jain commented on HIVE-405:
-

I will take a look and get back to you.




[jira] Assigned: (HIVE-604) remove Schema from metastore

2009-07-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-604:
---

Assignee: Min Zhou  (was: Namit Jain)

> remove Schema from metastore
> 
>
> Key: HIVE-604
> URL: https://issues.apache.org/jira/browse/HIVE-604
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Min Zhou
> Fix For: 0.4.0
>
>
> Since Schema is not used by the metastore, remove it from there.




[jira] Commented: (HIVE-576) complete jdbc driver

2009-07-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731952#action_12731952
 ] 

Namit Jain commented on HIVE-576:
-

Now that Hive is using ASF Thrift, you can start with modifying getSchema() -
remove the new Schema() object from the metastore.
The Hive client can return Hive types instead of Thrift types.

We can have more discussion on https://issues.apache.org/jira/browse/HIVE-604
I will assign that to you.

> complete jdbc driver
> 
>
> Key: HIVE-576
> URL: https://issues.apache.org/jira/browse/HIVE-576
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Min Zhou
>Assignee: Min Zhou
> Fix For: 0.4.0
>
> Attachments: HIVE-576.1.patch, HIVE-576.2.patch, sqlexplorer.jpg
>
>
> Hive only supports a few of the JDBC interfaces; let's complete the driver.




[jira] Resolved: (HIVE-541) Implement UDFs: INSTR and LOCATE

2009-07-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-541.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Min

> Implement UDFs: INSTR and LOCATE
> 
>
> Key: HIVE-541
> URL: https://issues.apache.org/jira/browse/HIVE-541
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Min Zhou
> Attachments: HIVE-541.1.patch, HIVE-541.2.patch
>
>
> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
> These functions can be directly implemented with Text (instead of String). 
> This will make the test of whether one string contains another string much 
> faster.
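The Text-based containment test mentioned above amounts to a byte-level scan
over the UTF-8 bytes, with no String materialization. A generic sketch (not
the HIVE-541 patch itself; real INSTR/LOCATE return 1-based character
positions, while this only shows the byte comparison):

```java
// Byte-level "contains" scan over UTF-8 bytes, the kind of check
// INSTR/LOCATE can do directly on Hadoop Text buffers.
// Generic illustration; not the HIVE-541 patch code.
static int indexOfBytes(byte[] haystack, int hayLen, byte[] needle, int needleLen) {
    if (needleLen == 0) {
        return 0; // empty needle matches at the start
    }
    for (int i = 0; i + needleLen <= hayLen; i++) {
        int j = 0;
        while (j < needleLen && haystack[i + j] == needle[j]) {
            j++;
        }
        if (j == needleLen) {
            return i; // byte offset of the first match
        }
    }
    return -1; // not found
}
```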




Re: Error on Load into multiple Partitions

2009-07-16 Thread Namit Jain
Most probably, this is the same as

https://issues.apache.org/jira/browse/HIVE-636

which was merged just a few days back. Can you try the latest trunk?







Hudson build is back to normal: Hive-trunk-h0.17 #156

2009-07-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/156/changes




Fwd: Error on Load into multiple Partitions

2009-07-16 Thread Matt Pestritto


[jira] Resolved: (HIVE-515) [UDF] new string function INSTR(str,substr)

2009-07-16 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou resolved HIVE-515.
---

Resolution: Duplicate

duplicates [#HIVE-541]

> [UDF] new string function INSTR(str,substr)
> ---
>
> Key: HIVE-515
> URL: https://issues.apache.org/jira/browse/HIVE-515
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Min Zhou
>Assignee: Min Zhou
> Attachments: HIVE-515-2.patch, HIVE-515.patch
>
>
> UDF for string function INSTR(str,substr)
> This extends the function from MySQL
> http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_instr
> usage:
>  INSTR(str, substr)
>  INSTR(str, substr, start)
> example:
> {code:sql}
> select instr('abcd', 'abc') from pokes;  -- all results are '1'
> select instr('abcabc', 'ccc') from pokes;  -- all results are '0'
> select instr('abcabc', 'abc', 2) from pokes;  -- all results are '4'
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
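The INSTR semantics quoted in the ticket above can be sketched in plain Java. This is a hypothetical standalone helper mirroring the MySQL behavior (1-based positions, 0 when the substring is absent, optional 1-based start offset), not the actual Hive UDF class:

```java
public class InstrDemo {
    // 1-based INSTR with a 1-based start offset, as in MySQL.
    // Assumption: NULL inputs map to 0 here; the real UDF would return NULL.
    public static int instr(String str, String substr, int start) {
        if (str == null || substr == null) {
            return 0;
        }
        int idx = str.indexOf(substr, start - 1);  // convert to 0-based search
        return idx < 0 ? 0 : idx + 1;              // back to 1-based, 0 = not found
    }

    public static int instr(String str, String substr) {
        return instr(str, substr, 1);
    }

    public static void main(String[] args) {
        System.out.println(instr("abcd", "abc"));       // 1
        System.out.println(instr("abcabc", "ccc"));     // 0
        System.out.println(instr("abcabc", "abc", 2));  // 4
    }
}
```

The three calls in main reproduce the example queries from the ticket.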



[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

2009-07-16 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731858#action_12731858
 ] 

Min Zhou commented on HIVE-541:
---

All test cases passed on my side. How about yours?

> Implement UDFs: INSTR and LOCATE
> 
>
> Key: HIVE-541
> URL: https://issues.apache.org/jira/browse/HIVE-541
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Min Zhou
> Attachments: HIVE-541.1.patch, HIVE-541.2.patch
>
>
> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
> These functions can be directly implemented with Text (instead of String). 
> This will make the test of whether one string contains another string much 
> faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
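The speed claim in HIVE-541 above, matching directly on Text's raw UTF-8 bytes rather than on decoded Strings, reduces to a byte-array search. A minimal illustrative sketch (hypothetical helper, not Hive code) of what a Text-based containment test does under the hood:

```java
import java.nio.charset.StandardCharsets;

public class TextFindDemo {
    // Naive byte-level substring search: returns the 0-based byte offset of
    // needle inside haystack, or -1 if absent. Working on bytes avoids the
    // UTF-8 decode that a String-based implementation would pay per row.
    public static int find(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;  // mismatch, slide the window
                }
            }
            return i;  // every byte of needle matched at offset i
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] h = "abcabc".getBytes(StandardCharsets.UTF_8);
        System.out.println(find(h, "abc".getBytes(StandardCharsets.UTF_8)));  // 0
        System.out.println(find(h, "ccc".getBytes(StandardCharsets.UTF_8)));  // -1
    }
}
```

A real Text-backed UDF would run this loop over `Text.getBytes()` up to `Text.getLength()`, since the backing array can be larger than the logical value.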