[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong
[ https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068880#comment-14068880 ]

Lefty Leverenz commented on HIVE-4590:
--

Done:

{quote}
The following very simple MapReduce program reads data from one table which it assumes to have an integer in the second column (column 1), and counts how many instances of each distinct value it finds. That is, it does the equivalent of select col1, count\(*\) from $table group by col1;.

For example, if the values in the second column are \{1,1,1,3,3,5\} the program will produce this output of values and counts:
1, 3
3, 2
5, 1
{quote}

* [HCatalog Input Output -- Read Example | https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ReadExample]

HCatalog documentation example is wrong
---

Key: HIVE-4590
URL: https://issues.apache.org/jira/browse/HIVE-4590
Project: Hive
Issue Type: Bug
Components: Documentation, HCatalog
Affects Versions: 0.10.0
Reporter: Eugene Koifman
Assignee: Lefty Leverenz
Priority: Minor

http://hive.apache.org/docs/hcat_r0.5.0/inputoutput.html#Read+Example reads

The following very simple MapReduce program reads data from one table which it assumes to have an integer in the second column, and counts how many different values it sees. That is, it does the equivalent of select col1, count(*) from $table group by col1;.

The description of the query is wrong. It actually counts how many instances of each distinct value it finds.
For example, if values of col1 are {1,1,1,3,3,3,5) it will produce 1, 3 3, 2, 5, 1

--
This message was sent by Atlassian JIRA
(v6.2#6252)
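The counting behavior described in the corrected wording can be illustrated with a small sketch. This is not the HCatalog example program: it is a plain-Python imitation of the map and reduce steps, and the function names (map_phase, reduce_phase) and the sample rows are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(records, column=1):
    """Emit (value, 1) for the integer found in the given zero-based column."""
    for record in records:
        yield record[column], 1


def reduce_phase(pairs):
    """Sum the emitted ones per distinct key, like count(*) ... group by."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return dict(sorted(counts.items()))


# Hypothetical rows: column 0 is a made-up name, column 1 holds the integer.
rows = [("a", 1), ("b", 1), ("c", 1), ("d", 3), ("e", 3), ("f", 5)]

result = reduce_phase(map_phase(rows))
for value, count in result.items():
    print(f"{value}, {count}")
```

With the values {1,1,1,3,3,5} in column 1, the sketch prints one line per distinct value followed by the number of times it occurs, which is the grouping behavior the revised description states.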
[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong
[ https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068095#comment-14068095 ]

Lefty Leverenz commented on HIVE-4590:
--

[~eugene.koifman], it's past time to fix this but first I have a couple of questions:

# Why does the equivalent SELECT statement say col1 while the description says an integer in the second column? Does this assume column numbers start with zero?
#* select col1, count\(*\) from $table group by col1;
I tried to figure it out from the MR program, but strained my brain.
# Is there a typo in the output for your sample dataset (1,1,1,3,3,3,5)? I see three 3s, not 2.
#* 1, 3 3, 2, 5, 1
... and presumably the comma after the 2 (or 3) can be removed.

The doc has a new location, by the way:

* [HCat Input and Output -- Read Example | https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ReadExample]
[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong
[ https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068228#comment-14068228 ]

Eugene Koifman commented on HIVE-4590:
--

[~leftylev]
1. The MR program does value.get(1) in reduce(), which means col1 is the 2nd column. Presumably the 1st (0th) column could have been UserName.
2. You are correct on both.
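Both points in this reply can be checked with a short sketch. It assumes, purely for illustration, records shaped as (user name, integer) tuples; the user names are invented, and Python tuple indexing stands in for the zero-based value.get(1) call mentioned above.

```python
from collections import Counter

# The sample dataset from the issue description, as written there.
sample_values = [1, 1, 1, 3, 3, 3, 5]

# Hypothetical records: position 0 is a user name, position 1 is the integer.
records = [(f"user{i}", v) for i, v in enumerate(sample_values)]

# Zero-based access: index 1 selects the second column, as value.get(1) would.
col1 = [record[1] for record in records]

# Count how many times each distinct value occurs.
counts = Counter(col1)
for value in sorted(counts):
    print(f"{value}, {counts[value]}")
```

Run against the seven values in the issue description, the count for 3 comes out as 3 rather than 2, which matches the typo acknowledged in point 2.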