[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong

2014-07-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068880#comment-14068880
 ] 

Lefty Leverenz commented on HIVE-4590:
--

Done: 

{quote}
The following very simple MapReduce program reads data from one table which it 
assumes to have an integer in the second column (column 1), and counts how 
many instances of each distinct value it finds. That is, it does the equivalent 
of select col1, count\(*\) from $table group by col1;.

For example, if the values in the second column are \{1,1,1,3,3,5\} the program 
will produce this output of values and counts:

1, 3
3, 2
5, 1
{quote}

* [HCatalog Input Output -- Read Example | 
https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ReadExample]
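
The revised wording can be checked with a small stand-alone sketch. This is plain Python, not the HCatalog MapReduce program from the docs. The function name count_by_column and the sample rows (including the letters in column 0) are made up for illustration.

```python
from collections import Counter


def count_by_column(rows, index=1):
    """Count how many times each distinct value occurs in column `index`.

    Roughly mirrors: select col1, count(*) from $table group by col1;
    """
    counts = Counter(row[index] for row in rows)
    return sorted(counts.items())


# Hypothetical rows: column 0 is a placeholder, column 1 holds the integer.
rows = [("a", 1), ("b", 1), ("c", 1), ("d", 3), ("e", 3), ("f", 5)]

for value, count in count_by_column(rows):
    print(f"{value}, {count}")
```

With the values \{1,1,1,3,3,5\} in column 1, this prints the same three lines as the quoted example: 1, 3 then 3, 2 then 5, 1.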

 HCatalog documentation example is wrong
 ---

 Key: HIVE-4590
 URL: https://issues.apache.org/jira/browse/HIVE-4590
 Project: Hive
  Issue Type: Bug
  Components: Documentation, HCatalog
Affects Versions: 0.10.0
Reporter: Eugene Koifman
Assignee: Lefty Leverenz
Priority: Minor

 http://hive.apache.org/docs/hcat_r0.5.0/inputoutput.html#Read+Example
 reads
 The following very simple MapReduce program reads data from one table which 
 it assumes to have an integer in the second column, and counts how many 
 different values it sees. That is, it does the equivalent of select col1, 
 count(*) from $table group by col1;.
 The description of the query is wrong.  It actually counts how many instances 
 of each distinct value it finds.  For example, if values of col1 are 
 {1,1,1,3,3,3,5} it will produce
 1, 3
 3, 2,
 5, 1
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong

2014-07-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068095#comment-14068095
 ] 

Lefty Leverenz commented on HIVE-4590:
--

[~eugene.koifman], it's past time to fix this but first I have a couple of 
questions:

#  Why does the equivalent SELECT statement say col1 while the description 
says an integer in the second column?  Does this assume column numbers start 
with zero?
#*  select col1, count\(*\) from $table group by col1;
I tried to figure it out from the MR program, but strained my brain.
#  Is there a typo in the output for your sample dataset (1,1,1,3,3,3,5)?  I 
see three 3s, not 2.  
#*  1, 3
3, 2,
5, 1
... and presumably the comma after the 2 (or 3) can be removed.

The doc has a new location, by the way:

* [HCat Input and Output -- Read Example | 
https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ReadExample]



[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong

2014-07-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068228#comment-14068228
 ] 

Eugene Koifman commented on HIVE-4590:
--

[~leftylev]
1. The MR program calls value.get(1) in reduce(), which means col1 is the 
2nd column (column numbers start at zero).  Presumably the 1st (0th) column 
could have been UserName.
2. You are correct on both points.
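
To make both answers concrete, here is a short plain-Python sketch (not the HCatalog API; the record contents and variable names are assumptions for illustration). It shows that zero-based index 1 selects the second column, and that the sample dataset as originally written contains three 3s.

```python
from collections import Counter

# Hypothetical record: position 0 is a user name, position 1 is the integer.
record = ("alice", 3)
second_column = record[1]  # zero-based index 1 selects the 2nd column

# The sample dataset exactly as written in the issue description.
original_values = [1, 1, 1, 3, 3, 3, 5]
original_counts = sorted(Counter(original_values).items())

print(second_column)
for value, count in original_counts:
    print(f"{value}, {count}")
```

Counting the original dataset gives 3 for the value 3, so either the dataset or the "3, 2" output line had to change.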
