[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-474:
-

Attachment: patch-474.txt

I have reworked the patch from Mafish so that it applies to trunk. The patch 
now also handles multiple columns in DISTINCT (HIVE-287).

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474.txt
>
>
> Selecting distinct values of several individual columns, for example: 
> select count(distinct user), count(distinct session) from actions;   
> currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user
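
For readers unfamiliar with the request, here is a minimal Java sketch of what
the query in the description is expected to compute: one distinct count per
column over the same rows. It only illustrates the semantics. The class and
method names are hypothetical, and it says nothing about how the attached
patch implements this inside Hive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Hive code.
class MultiDistinctSketch {

    /**
     * Each row is {user, session}. Returns
     * {count(distinct user), count(distinct session)}.
     */
    static long[] countDistinct(List<String[]> rows) {
        Set<String> users = new HashSet<>();
        Set<String> sessions = new HashSet<>();
        for (String[] row : rows) {
            // SQL COUNT(DISTINCT x) ignores NULL values.
            if (row[0] != null) {
                users.add(row[0]);
            }
            if (row[1] != null) {
                sessions.add(row[1]);
            }
        }
        return new long[] {users.size(), sessions.size()};
    }

    public static void main(String[] args) {
        List<String[]> actions = Arrays.asList(
            new String[] {"alice", "s1"},
            new String[] {"alice", "s2"},
            new String[] {"bob", "s2"},
            new String[] {"bob", null});
        long[] counts = countDistinct(actions);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

The two sets are independent of each other, which is why a single DISTINCT
key is not enough to answer both aggregates at once.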

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-474:
-

Status: Patch Available  (was: Open)



[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-10-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923033#action_12923033
 ] 

Namit Jain commented on HIVE-474:
-

I will take a look 



[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-10-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923035#action_12923035
 ] 

Namit Jain commented on HIVE-474:
-

Diff at https://review.cloudera.org/r/1052/ for review




[jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-20 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923090#action_12923090
 ] 

John Sichi commented on HIVE-1376:
--

This patch only did (1), not (3).  I think we'll still need a follow-up to avoid 
the problem for arbitrary UDAFs (unless we require them to avoid primitive 
types).

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1376.2.patch, HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2
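
A small Java sketch of one plausible reading of the failure above: when a
method with a primitive parameter is invoked reflectively and the argument
for that parameter is null, the call is rejected before the method body runs.
The Evaluator class below is a hypothetical stand-in, not the real
UDAFPercentile evaluator, and Hive's own invocation path may differ in
detail.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a simple UDAF evaluator; not Hive code.
class NullPrimitiveSketch {

    static class Evaluator {
        // Second parameter is a primitive double, as in the reported signature.
        public boolean iterate(Long value, double percentile) {
            return true;
        }
    }

    /** Invokes iterate(value, percentile) reflectively and reports the outcome. */
    static String invokeIterate(Object value, Object percentile) throws Exception {
        Method m = Evaluator.class.getMethod("iterate", Long.class, double.class);
        try {
            m.invoke(new Evaluator(), value, percentile);
            return "ok";
        } catch (IllegalArgumentException e) {
            // null cannot be unboxed into the primitive double parameter.
            return "IllegalArgumentException";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeIterate(5L, 0.5));
        System.out.println(invokeIterate(null, null));
    }
}
```

A null for the Long parameter is accepted, so only the primitive parameter
is a problem in this sketch.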




Hudson build is back to normal : Hive-trunk-h0.20 #398

2010-10-20 Thread Apache Hudson Server
See 




[jira] Updated: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-10-20 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1633:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed! Thanks Sreekanth Ramakrishnan!

> CombineHiveInputFormat fails with "cannot find dir for emptyFile"
> -
>
> Key: HIVE-1633
> URL: https://issues.apache.org/jira/browse/HIVE-1633
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Reporter: Amareshwari Sriramadasu
> Attachments: HIVE-1633.patch
>
>





[jira] Commented: (HIVE-1641) add map joined table to distributed cache

2010-10-20 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923168#action_12923168
 ] 

He Yongqiang commented on HIVE-1641:


Some tests are failing because of the plan change.

Can you refresh the diff?


Here are some more minor comments. You can address them in follow-up JIRAs or 
in your next patch (some of them are only a few lines of change).
1. 
NOTSKIPBIGTABLE is defined in both AbstractMapJoinOperator and 
CommonJoinOperator. Also, let's not use 'static'.

2.
In MapJoinObjectKey, metadataTag is always -1, yet we serialize and deserialize 
it for each key. We can avoid that by simply assuming that metadataTag is -1.

3.
In JDBMSinkOperator, 

if (hashTable.cacheSize() > 0) {
  o.setObj(res);
  needNewKey = false;
}

has no effect. 

Even if the hashTable.cacheSize() > 0 check is not taken and needNewKey = true,

in the following code,
if (needNewKey){
...
hashTable.put(keyObj, valueObj);
}

keyObj and valueObj are already in hashTable, so the put also has no effect 
other than moving the value to the head of the MRUList. But at the time of the 
put, it is already at the head because of the get().

So ideally, 

we should move most of the code into 

if (o == null) {


 if (metadataValueTag[tag] == -1) {
 .
 }

 if (needNewKey) { //this is always true here
 
 }
} else {
res = o.getObj();
res.add(value);
}

These changes may benefit client performance, which matters because we are now 
doing all the processing of the small tables on the client. 

4. 
In JDBMSinkOperator's close(), call hashTable.close(); before uploading the 
jdbm file. JDBM itself may want to do some cleanup work in close() before the 
file is uploaded.

5.
In JDBMSinkOperator, remove getPersistentFilePath(). There are no references to 
it.

6.
In MapjoinOperator's loadJDBM, remove line "int alias;"
In loadJDBM(), remove code:
"
for(int i = 0;i

> add map joined table to distributed cache
> -
>
> Key: HIVE-1641
> URL: https://issues.apache.org/jira/browse/HIVE-1641
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1641(3).txt, Hive-1641(4).patch, 
> Hive-1641(5).patch, Hive-1641.patch
>
>
> Currently, the mappers read the map-joined table directly from HDFS, which 
> makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers goes beyond a 
> few thousand, due to 
> concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and 
> read from there instead.
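
To make review point 3 in this message easier to follow, here is a minimal
Java sketch of the suggested control flow: do the put only when the lookup
misses, and otherwise append to the list that is already stored. It uses a
plain in-memory HashMap and String keys as stand-ins. The class, field, and
method names are hypothetical, and the real JDBMSinkOperator works with a
JDBM-backed table and wrapper objects, so this shows the shape of the change
rather than the actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification; not the real JDBMSinkOperator.
class SinkSketch {
    private final Map<String, List<String>> hashTable = new HashMap<>();
    int puts = 0; // counts how often put() is actually needed

    void process(String key, String value) {
        List<String> res = hashTable.get(key);
        if (res == null) {
            // First row for this key: build the list and put it once.
            res = new ArrayList<>();
            res.add(value);
            hashTable.put(key, res);
            puts++;
        } else {
            // Key already present: the stored list is updated in place,
            // so a second put() would add nothing.
            res.add(value);
        }
    }

    List<String> get(String key) {
        return hashTable.get(key);
    }

    public static void main(String[] args) {
        SinkSketch sink = new SinkSketch();
        sink.process("a", "1");
        sink.process("a", "2");
        sink.process("b", "3");
        System.out.println(sink.puts + " puts for 3 rows");
    }
}
```

Whether in-place updates are enough for a persistent table depends on how
that table caches and writes back entries, which this sketch does not model.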




[VOTE] hive 0.6.0 release candidate 0

2010-10-20 Thread John Sichi
The tarballs are at

http://people.apache.org/~jvs/hive-0.6.0-candidate-0

Carl did some sanity testing on it already, but any additional testing you can 
do before voting helps to ensure a quality release.

JVS



[jira] Created: (HIVE-1737) Two Bugs for Estimating Row Sizes in GroupByOperator

2010-10-20 Thread Siying Dong (JIRA)
Two Bugs for Estimating Row Sizes in GroupByOperator


 Key: HIVE-1737
 URL: https://issues.apache.org/jira/browse/HIVE-1737
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong


Two bugs:
1. If a UDAF uses a string type, the group-by breaks because it tries to insert 
an ArrayList into a HashMap.
2. The code that samples the size of keys handles only the String and Text 
types, but in most cases the keys are 
org.apache.hadoop.hive.serde2.lazy.LazyString, so 0 is always used.
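
The second bug can be pictured with a short Java sketch: a size estimator
that recognizes only some key types silently falls through to 0 for every
other type. LazyStringLike below is a hypothetical stand-in and not the real
org.apache.hadoop.hive.serde2.lazy.LazyString, and estimateFixed shows only
one possible remedy, which may differ from what the attached patch does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; these are not Hive classes.
class KeySizeSketch {

    // Stand-in for a lazily deserialized string key.
    static class LazyStringLike {
        private final byte[] bytes;

        LazyStringLike(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        int getLength() {
            return bytes.length;
        }
    }

    /** Mirrors the reported bug: unrecognized key types are sized as 0. */
    static int estimateBuggy(Object key) {
        if (key instanceof String) {
            return ((String) key).length();
        }
        return 0; // the lazy wrapper ends up here
    }

    /** One possible fix: also recognize the lazy wrapper type. */
    static int estimateFixed(Object key) {
        if (key instanceof String) {
            return ((String) key).length();
        }
        if (key instanceof LazyStringLike) {
            return ((LazyStringLike) key).getLength();
        }
        return 0;
    }

    public static void main(String[] args) {
        Object key = new LazyStringLike("hello");
        System.out.println(estimateBuggy(key) + " vs " + estimateFixed(key));
    }
}
```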




[jira] Created: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-20 Thread Siying Dong (JIRA)
Optimize Key Comparison in GroupByOperator
--

 Key: HIVE-1738
 URL: https://issues.apache.org/jira/browse/HIVE-1738
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong


GroupByOperator compares keys with ObjectInspectorUtils.compare(), which is 
written for general object comparison and is not optimized for the group-by 
operator. By optimizing this logic, we expect to see noticeable improvements in 
GroupByOperator.




[jira] Updated: (HIVE-1737) Two Bugs for Estimating Row Sizes in GroupByOperator

2010-10-20 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1737:
--

Attachment: HIVE-1737.1.patch

> Two Bugs for Estimating Row Sizes in GroupByOperator
> 
>
> Key: HIVE-1737
> URL: https://issues.apache.org/jira/browse/HIVE-1737
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE-1737.1.patch
>
>
> Two bugs:
> 1. If a UDAF uses a string type, the group-by breaks because it tries to 
> insert an ArrayList into a HashMap.
> 2. The code that samples the size of keys handles only the String and Text 
> types, but in most cases the keys are 
> org.apache.hadoop.hive.serde2.lazy.LazyString, so 0 is always used.




[jira] Updated: (HIVE-1737) Two Bugs for Estimating Row Sizes in GroupByOperator

2010-10-20 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1737:
--

Status: Patch Available  (was: Open)



[jira] Commented: (HIVE-1737) Two Bugs for Estimating Row Sizes in GroupByOperator

2010-10-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923320#action_12923320
 ] 

Namit Jain commented on HIVE-1737:
--

+1
