[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060838#comment-14060838
 ] 

Ashutosh Chauhan commented on HIVE-7400:


[~darranl] If you can upload a small dataset that reproduces this, that would 
be great.

 count and count distinct not correct
 

 Key: HIVE-7400
 URL: https://issues.apache.org/jira/browse/HIVE-7400
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Danran Lai

 I have a table in Hive and I want to count unique records and all records.
 Table looks like:
 {quote}   
 sid string   
 param   map<string,string>
 domain  string   
 product string
 {quote}
 And my query like this:
 {quote}
 select domain,product,count(1) as num,count(distinct param['from'])  as 
 user_num
 from table
 group by domain,product
 {quote}
 But the results are not correct. I get the right user_num, but num is wrong: 
 it is lower than the real count. The real count is about 30 million, but I 
 only get 9 million. 
 So how can I fix this so that I get the correct result?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061654#comment-14061654
 ] 

Gopal V commented on HIVE-7400:
---

Never mind, laggy JIRA updates. Saw the file now.

 count and count distinct not correct
 

 Key: HIVE-7400
 URL: https://issues.apache.org/jira/browse/HIVE-7400
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Danran Lai
 Attachments: data_15


 I have a table in Hive and I want to count unique records and all records.
 Table looks like:
 {quote}   
 sid string   
 sender   string 
 domain  string   
 product string
 {quote}
 And my query like this:
 {quote}
 select domain,product,count(1) as num,count(distinct sender)  as user_num
 from table
 group by domain,product
 {quote}
 But the results are not correct. I get the right user_num, but num is wrong: 
 it is lower than the real count. The real count is about 30 million, but I 
 only get 9 million. 
 So how can I fix this so that I get the correct result?
 ==Updated==
 Dataset is uploaded. Row format delimited fields terminated by '\t'. This 
 dataset has 150,000 rows. With this query, I got the following result:
 bq. domain1 product1 36424 36424
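
 To cross-check the query result outside Hive, the expected values can be 
 computed directly from the tab-delimited file. The following is only a 
 minimal sketch, not part of the original report. It assumes the file columns 
 appear in the order sid, sender, domain, product, and that NULL senders are 
 stored as \N (Hive's default text representation), which count(distinct) 
 ignores. The function name expected_counts and the sample rows are 
 hypothetical.

```python
from collections import defaultdict

HIVE_NULL = r"\N"  # assumed: Hive's default NULL marker in text files


def expected_counts(lines):
    """Return {(domain, product): (num, user_num)} for tab-delimited rows.

    num mirrors count(1): every row in the group is counted.
    user_num mirrors count(distinct sender): distinct non-NULL senders.
    Assumed column order: sid, sender, domain, product.
    """
    totals = defaultdict(int)
    senders = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # simplification: skip rows that do not have 4 columns
        _sid, sender, domain, product = fields
        key = (domain, product)
        totals[key] += 1
        if sender != HIVE_NULL:
            senders[key].add(sender)
    return {key: (totals[key], len(senders[key])) for key in totals}


# Hypothetical sample rows, for illustration only.
sample = [
    "s1\talice\tdomain1\tproduct1",
    "s2\tbob\tdomain1\tproduct1",
    "s3\talice\tdomain1\tproduct1",
    "s4\t\\N\tdomain1\tproduct1",
    "s5\tcarol\tdomain2\tproduct1",
]

for (domain, product), (num, user_num) in expected_counts(sample).items():
    print(domain, product, num, user_num, sep="\t")
```

 For the sample rows, domain1/product1 has num = 4 and user_num = 2, so num 
 should never be smaller than user_num. To run the same check on the 
 attachment, pass an open file handle for data_15 instead of the sample list.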





[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061651#comment-14061651
 ] 

Gopal V commented on HIVE-7400:
---

Where is the data-set? 

Can you attach it to the JIRA?



[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061662#comment-14061662
 ] 

Gopal V commented on HIVE-7400:
---

With Tez enabled, I cannot reproduce this

{code}
Status: Finished successfully
OK
domain1 product115  36424
Time taken: 6.389 seconds, Fetched: 1 row(s)
hive>
{code}



[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Danran Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061666#comment-14061666
 ] 

Danran Lai commented on HIVE-7400:
--

~Gopal V Which version of Hive are you using? In my own Hive environment, I 
can only get the wrong result.
By the way, what is Tez?



[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061672#comment-14061672
 ] 

Gopal V commented on HIVE-7400:
---

[~darranl]: I am using hive-14 which is the only branch in development at the 
moment.

And Tez is the new faster execution engine for Hive on Hadoop-2 clusters - 
http://www.slideshare.net/t3rmin4t0r/hivetez-a-performance-deep-dive/3



[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Danran Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061677#comment-14061677
 ] 

Danran Lai commented on HIVE-7400:
--

~gopalv : Thank you. Now I understand a bit better. 
But I'm using hive-11, and for a variety of reasons it's not easy for me to 
upgrade from hive-11 to hive-14. Do you know whether there is a patch for 
hive-11 that could fix this bug?
