RE: Converting rows into dynamic colums in Hive
You could do it using Pivot table in MS Excel. It's under the Insert tab, first option on the left. Richin -Original Message- From: Jain Richin (Nokia-LC/Boston) Sent: Thursday, August 09, 2012 4:16 PM To: user@hive.apache.org Subject: RE: Converting rows into dynamic colums in Hive Thanks John. Is there a way to do this is excel since I am outputting the table data in csv format (using macros or something else)? Richin -Original Message- From: ext John Meagher [mailto:john.meag...@gmail.com] Sent: Thursday, August 09, 2012 10:11 AM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive R is another programming language geared to statistical analysis. It has some features that make it easy to write this sort of data manipulation. http://www.r-project.org/ Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it works): abCountsToMatrix - function(csvFilename, outputmatrixfile){ data - read.csv(csvFilename, as.is=T, check.names=F, header=F) cat(Loaded, nrow(data), rows.\n) idrows - sort(unique(c(data[ , 1]))) idcols - sort(unique(c(data[ , 2]))) cat(Found, length(idrows), X , length(idcols), .) output - array(0, c(length(idrows), length(idcols))) rownames(output) - idrows colnames(output) - idcols for(i in 1:nrow(data)){ if (i %% 100 == 0) { cat(On , i, of , nrow(data), \n) } rowidx - which(idrows == data[i, 1]) colidx - which(idcols == data[i, 2]) count - data[i,3] output[rowidx, colidx] - count } write.csv(output, file=outputmatrixfile, quote=F) } On Wed, Aug 8, 2012 at 8:58 PM, richin.j...@nokia.com wrote: John, What is R? -Original Message- From: ext John Meagher [mailto:john.meag...@gmail.com] Sent: Wednesday, August 08, 2012 4:34 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure. On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote: Thanks Ashish, that gives an idea. But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value. Is there a better way? Richin From: ext Ashish Thusoo [mailto:athu...@qubole.com] Sent: Tuesday, August 07, 2012 5:05 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, from table ) group by alpha something on those lines... Ashish On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote: Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question - Can it be done in Hive? If not, any suggestions. Thanks, Richin
Re: Converting rows into dynamic colums in Hive
R is another programming language geared to statistical analysis. It has some features that make it easy to write this sort of data manipulation. http://www.r-project.org/ Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it works): abCountsToMatrix - function(csvFilename, outputmatrixfile){ data - read.csv(csvFilename, as.is=T, check.names=F, header=F) cat(Loaded, nrow(data), rows.\n) idrows - sort(unique(c(data[ , 1]))) idcols - sort(unique(c(data[ , 2]))) cat(Found, length(idrows), X , length(idcols), .) output - array(0, c(length(idrows), length(idcols))) rownames(output) - idrows colnames(output) - idcols for(i in 1:nrow(data)){ if (i %% 100 == 0) { cat(On , i, of , nrow(data), \n) } rowidx - which(idrows == data[i, 1]) colidx - which(idcols == data[i, 2]) count - data[i,3] output[rowidx, colidx] - count } write.csv(output, file=outputmatrixfile, quote=F) } On Wed, Aug 8, 2012 at 8:58 PM, richin.j...@nokia.com wrote: John, What is R? -Original Message- From: ext John Meagher [mailto:john.meag...@gmail.com] Sent: Wednesday, August 08, 2012 4:34 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure. On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote: Thanks Ashish, that gives an idea. But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value. Is there a better way? Richin From: ext Ashish Thusoo [mailto:athu...@qubole.com] Sent: Tuesday, August 07, 2012 5:05 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, from table ) group by alpha something on those lines... Ashish On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote: Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question - Can it be done in Hive? If not, any suggestions. Thanks, Richin
RE: Converting rows into dynamic colums in Hive
Thanks John. Is there a way to do this is excel since I am outputting the table data in csv format (using macros or something else)? Richin -Original Message- From: ext John Meagher [mailto:john.meag...@gmail.com] Sent: Thursday, August 09, 2012 10:11 AM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive R is another programming language geared to statistical analysis. It has some features that make it easy to write this sort of data manipulation. http://www.r-project.org/ Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it works): abCountsToMatrix - function(csvFilename, outputmatrixfile){ data - read.csv(csvFilename, as.is=T, check.names=F, header=F) cat(Loaded, nrow(data), rows.\n) idrows - sort(unique(c(data[ , 1]))) idcols - sort(unique(c(data[ , 2]))) cat(Found, length(idrows), X , length(idcols), .) output - array(0, c(length(idrows), length(idcols))) rownames(output) - idrows colnames(output) - idcols for(i in 1:nrow(data)){ if (i %% 100 == 0) { cat(On , i, of , nrow(data), \n) } rowidx - which(idrows == data[i, 1]) colidx - which(idcols == data[i, 2]) count - data[i,3] output[rowidx, colidx] - count } write.csv(output, file=outputmatrixfile, quote=F) } On Wed, Aug 8, 2012 at 8:58 PM, richin.j...@nokia.com wrote: John, What is R? -Original Message- From: ext John Meagher [mailto:john.meag...@gmail.com] Sent: Wednesday, August 08, 2012 4:34 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure. On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote: Thanks Ashish, that gives an idea. But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value. Is there a better way? Richin From: ext Ashish Thusoo [mailto:athu...@qubole.com] Sent: Tuesday, August 07, 2012 5:05 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, from table ) group by alpha something on those lines... Ashish On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote: Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question - Can it be done in Hive? If not, any suggestions. Thanks, Richin
RE: Converting rows into dynamic colums in Hive
Thanks Ashish, that gives an idea. But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value. Is there a better way? Richin From: ext Ashish Thusoo [mailto:athu...@qubole.com] Sent: Tuesday, August 07, 2012 5:05 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, from table ) group by alpha something on those lines... Ashish On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.commailto:richin.j...@nokia.com wrote: Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question - Can it be done in Hive? If not, any suggestions. Thanks, Richin
Re: Converting rows into dynamic colums in Hive
I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure. On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote: Thanks Ashish, that gives an idea. But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value. Is there a better way? Richin From: ext Ashish Thusoo [mailto:athu...@qubole.com] Sent: Tuesday, August 07, 2012 5:05 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, from table ) group by alpha something on those lines... Ashish On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote: Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question – Can it be done in Hive? If not, any suggestions. Thanks, Richin
RE: Converting rows into dynamic colums in Hive
John, What is R? -Original Message- From: ext John Meagher [mailto:john.meag...@gmail.com] Sent: Wednesday, August 08, 2012 4:34 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure. On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote: Thanks Ashish, that gives an idea. But I am not sure about the outer select loop, I have to know all the values in Beta column beforehand to do a max on each value. Is there a better way? Richin From: ext Ashish Thusoo [mailto:athu...@qubole.com] Sent: Tuesday, August 07, 2012 5:05 PM To: user@hive.apache.org Subject: Re: Converting rows into dynamic colums in Hive you should be able to do this in hive using a group by on alpha and then using a combination of the max and if statement... something on the following lines select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NUL) as pqr, from table ) group by alpha something on those lines... Ashish On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote: Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question - Can it be done in Hive? If not, any suggestions. Thanks, Richin
RE: Converting rows into dynamic colums in Hive
You can do this using case statements and summing the values. The only item to remember here is that the values in the grid need to be numeric for the sum to work. Thanks, Ranjith From: richin.j...@nokia.com [mailto:richin.j...@nokia.com] Sent: Tuesday, August 07, 2012 3:58 PM To: user@hive.apache.org Subject: Converting rows into dynamic colums in Hive Hi All, One of my Query output looks like- AlphaBeta Gamma 123 xyz 1.0 123 abc 0.5 123 pqr 1.3 123 def 2.1 456 xyz 0.1 456 abc 0.6 456 pqr 1.9 456 def 3.2 456 lmn1.1 456 sdf 1.2 I want the output for the data visualization purpose to look like (basically taking rows from one table and making it column in another table) Alphaxyz abc pqr def lmn sdf 123 1.0 0.5 1.3 2.1 456 0.1 0.6 1.9 3.2 1.1 1.2 Question - Can it be done in Hive? If not, any suggestions. Thanks, Richin