On 11/02/2013 10:35 AM, Zhao Jin wrote:
Dear all,

I am trying to make a series of waffle plot-like figures for my data to
visualize the ratios of amino acid residues at each position. For each one
of 37 positions, there may be one to four different amino acid residues. So
the data consist of the positions, what residues are there, and the ratios
of residues. The ratios of residues at a position add up to 100, or close
to 100 (more on this soon)*. I am hoping to make a *square* waffle
plot-like figure for each position, and fill the 10 X 10 grids with colors
representing each amino acid residue and areas for grids of a certain color
corresponding to the ratio of that residue. Then I could line up all the
plots in one row from position 1 to position 37.
*: if the sum of the ratios is less than 100 at a position, that's because
of an unknown residue which I did not include in the table.

I am attaching the dput output for my data here:
structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
"G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
"Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)), .Names =
c("position",
"residue", "ratio"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
"17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
"32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
"45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65"))

Inspired by a Stack Exchange post, I am using this script to make the plots:
library(ggplot2)
col4=c('#E66101','#FDB863','#B2ABD2','#5E3C99')
dflist=list()
for (i in 1:37){
residue_num=length(which(df$position==i))
dflist[[i]]=df[df$position==i,2:3]
waffle=expand.grid(y=1:residue_num,x=seq_len(ceiling(sum(dflist[[i]]$ratio)/residue_num)))
residuevec=rep(dflist[[i]]$residue,dflist[[i]]$ratio)
waffle$residue=c(as.vector(residuevec),rep(NA,nrow(waffle)-length(residuevec)))
png(paste('plot',i,'.png',sep=''))
print(ggplot(waffle, aes(x = x, y = y, fill = residue)) + geom_tile(color =
"white") + scale_fill_manual("residue",values = col4) + coord_equal() +
theme(panel.grid.minor=element_blank(),panel.grid.major=element_blank())
+ theme(axis.ticks=element_blank()) +
theme(axis.text.x=element_blank(),axis.text.y=element_blank()) +
theme(axis.title.x=element_blank(),axis.title.y=element_blank())
)
dev.off()}

With my scripts, I could make a waffle plot, but not a *square* 10 X 10
waffle plot. Also, the grid size differs for positions with different
numbers of residues. I am suspecting that I didn't use coord_equal()
correctly.

So I wonder how I can make the plots like I described above in ggplot2 or
with some other packages. Also, is there a way to assign a color to
different residues, say, purple for alanine, blue for glycine, etc, and
incorporate that information in the for loop?

Hi Zhao,
By beginning with a 10x10 matrix of NA values and then replacing some of them with a color, I think you can do what you want. First you need a function to fill one corner of your matrix with values, leaving the rest uncolored (i.e. NA):

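fill.corner<-function(x,nrow,ncol) {
 xlen<-length(x)
 # >= so that a completely filled matrix (100 values in 10x10) is accepted
 if(nrow*ncol >= xlen) {
  newmat<-matrix(NA,nrow=nrow,ncol=ncol)
  # side of the smallest square that can hold all of the values
  xside<-1
  while(xside*xside < xlen) xside<-xside+1
  row<-1
  col<-1
  # fill the square column by column, starting at cell [1,1]
  for(xindex in seq_len(xlen)) {
   newmat[row,col]<-x[xindex]
   if(row == xside) {
    col<-col+1
    row<-1
   }
   else row<-row+1
  }
  return(newmat)
 }
 cat("Too many values in x for",nrow,"by",ncol,"\n")
 return(NULL)
}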
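
For the 10 x 10 case there is a shortcut worth noting: when more than 81 of the 100 cells are used, the square corner spans all ten rows, so filling it amounts to filling the matrix column by column and padding the rest with NA. A minimal sketch of that idea in base R follows; pad.grid is a hypothetical helper name (not a plotrix function), and it does not reproduce the smaller corner square that fill.corner builds for shorter inputs.

```r
# pad.grid is a hypothetical helper, not part of plotrix
pad.grid<-function(x,nrow=10,ncol=10) {
 if(length(x) > nrow*ncol) stop("too many values for the grid")
 # pad with NA, then let matrix() fill column by column
 matrix(c(x,rep(NA,nrow*ncol-length(x))),nrow=nrow,ncol=ncol)
}
# 87 cells of one color, 10 of another, 3 left empty
cells<-pad.grid(rep(c("red","blue"),c(87,10)))
dim(cells)
sum(is.na(cells))
```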

Then you have to split your data frame (called zjdat below) into 37 smaller data frames, one per position, and build the vector of colors to display on each of your 37 waffle plots:

library(plotrix)
# get an "alphabet" of colors
alphacol<-rainbow(18)
# the actual values in the plotted matrix don't matter
fakemat<-matrix(1:100,nrow=10)
# pick off the positions one by one
for(pos in 1:37) {
 posdf<-zjdat[zjdat$position == pos,]
 # one color per percentage point, repeated according to each residue's ratio
 rescol<-rep(alphacol[as.numeric(posdf$residue)],posdf$ratio)
 if(!is.null(resmat<-fill.corner(rescol,10,10)))
  color2D.matplot(fakemat,border="lightgray",cellcolors=resmat,
   yrev=FALSE,main=c(pos,length(resmat)))
}
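
On the question of giving each residue its own color (purple for alanine, blue for glycine and so on), one possible approach is a named vector that is looked up by the residue letter rather than by the factor level number. The sketch below uses toy data in the same layout as the posted data frame; the name rescolors and the color choices are placeholders, and only five residues are listed.

```r
# placeholder colors: only five residues are listed here, extend to all 18
rescolors<-c(A="purple",G="blue",L="darkgreen",I="orange",E="red")
# toy data in the same layout as one position of the posted data frame
posdf<-data.frame(residue=factor(c("I","L")),ratio=c(1L,99L))
# look up by residue letter, then repeat each color ratio times
rescol<-unname(rep(rescolors[as.character(posdf$residue)],posdf$ratio))
table(rescol)
```

A vector built this way could be used in place of the rainbow(18) lookup inside the loop, so that a given residue gets the same color at every position.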

That might get you started. In fact, I might even write a waffle plot function for plotrix.
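
If you would rather stay with ggplot2, the same idea seems to carry over. In the posted script the grid is built with y=1:residue_num, so its shape depends on the number of residues; building a fixed 10 x 10 grid for every position and padding with NA avoids that. This is only a sketch on toy data with three positions; the names toy and cells are made up for the example, and it assumes a ggplot2 version that provides theme_void().

```r
library(ggplot2)
# toy data in the same layout as the posted data frame (three positions only)
toy<-data.frame(position=c(1L,2L,2L,3L,3L),
 residue=c("L","I","L","A","E"),
 ratio=c(99L,1L,99L,79L,9L),stringsAsFactors=FALSE)
# one row per cell: always 100 cells per position, NA for the unfilled ones
cells<-do.call(rbind,lapply(split(toy,toy$position),function(d) {
 filled<-rep(d$residue,d$ratio)
 data.frame(position=d$position[1],
  x=rep(1:10,each=10),y=rep(1:10,times=10),
  residue=c(filled,rep(NA,100-length(filled))))
}))
# coord_equal() keeps the cells square; facet_wrap() lines the grids up in a row
p<-ggplot(cells,aes(x=x,y=y,fill=residue)) +
 geom_tile(color="white") +
 coord_equal() +
 facet_wrap(~position,nrow=1) +
 theme_void()
print(p)
```

Faceting puts all positions in one figure, which should also take care of lining the plots up from position 1 to position 37.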

Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
