[R] producing histogram-like plot

2011-03-29 Thread Karin Lagesen

Hi!

I have a dataset that looks like this:

0.0 14
0.0 3
0.9 12
0.7315
0.782
1.0 15
0.3 2
0.328

...and so on.

That is, each row holds a value between 0 and 1, followed by a count.

I would like to plot this in a histogram-like manner. I would like to
have a set of bins, each 0.1 wide, and plot the sum of the column 2
values that fall within each bin. That is, in this case I would like the
first bin, 0.0, to have the value 17, the second, 0.1, to have the value 0,
and so on, until the last bin, which has the value 15. Unfortunately, I am
unsure both how to sum these values and which plot type to use.
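One way to do the binning and summing in base R, as a minimal sketch: it
assumes the two columns are called value and count (names made up here)
and uses only the rows above that are not garbled. Each value is placed
in the bin starting at the 0.1 step at or below it, and 1.0 gets a bin of
its own, matching the bins 0.0 to 1.0 described above.

```r
# Example rows from the post (only the unambiguous ones).
dat <- data.frame(value = c(0.0, 0.0, 0.9, 1.0, 0.3),
                  count = c(14, 3, 12, 15, 2))

# Bin starts 0.0, 0.1, ..., 1.0. Computing them as (0:10) / 10 makes each
# edge the same double as the literal (0.3, 0.7, ...), whereas
# seq(0, 1, by = 0.1) can be off by a rounding error and push a value that
# sits exactly on an edge into the previous bin.
breaks <- (0:10) / 10

# findInterval() gives i such that breaks[i] <= value < breaks[i + 1];
# a value of exactly 1.0 gets i = 11, its own "1.0" bin.
bin <- findInterval(dat$value, breaks)

# Sum column 2 within each bin; bins without any rows get 0.
sums <- sapply(seq_along(breaks), function(i) sum(dat$count[bin == i]))
names(sums) <- format(breaks)

# space = 0 puts the bars edge to edge, which gives the histogram look.
barplot(sums, space = 0, xlab = "bin start", ylab = "sum of column 2")
```

barplot() is used instead of hist() because the bar heights are already
computed sums rather than counts of observations.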


Thanks in advance!

Karin
--
Karin Lagesen, Ph.D.
Centre for Ecological and Evolutionary Synthesis (CEES)
University of Oslo, Dept. of Biology
P.O. Box 1066 Blindern 0316 Oslo, Norway
Ph. +47 22844132 Fax. +47 22854001
Email karin.lage...@bio.uio.no
http://folk.uio.no/karinlag

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting histograms/density plots in a triangular layout?

2010-11-10 Thread Karin Lagesen

Hi!

I have done a set of 49 pairwise comparisons. From these I would like
to plot either histograms or density plots of the values I get. I can
plot one histogram per comparison, but I have problems getting the
layout I want. When I plot as I normally would:


histogram(~percid | orgA_orgB, data = alldata)

I get the histograms next to each other in a rectangular grid. However,
since these are pairwise (7x7), I would like to have them placed in a
triangular shape, like this:



1   x
2   x   x
3   x   x   x
    1   2   3

where each x marks a position where I want a plot, and 1, 2, 3 are the
row and column labels.


I have seen similar plots made in R, so I know it is possible; the
question is how :D
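A minimal sketch using lattice's layout, skip and as.table arguments. It
assumes the conditioning factor (orgA_orgB, as in the post) has exactly
one level per lower-triangle cell, with the levels ordered row by row; the
3x3 data below are simulated and the organism names are hypothetical.

```r
library(lattice)

# Hypothetical data: 3 organisms, every pair (i, j) with j <= i, and 50
# simulated percent-identity values per pair.
orgs  <- c("org1", "org2", "org3")
pairs <- character(0)
for (i in seq_along(orgs)) for (j in seq_len(i))
  pairs <- c(pairs, paste(orgs[i], orgs[j], sep = "_"))

set.seed(1)
alldata <- data.frame(
  percid    = runif(50 * length(pairs), 80, 100),
  # Levels in row-by-row lower-triangle order, so panels land where expected.
  orgA_orgB = factor(rep(pairs, each = 50), levels = pairs)
)

# With as.table = TRUE the n x n grid is filled left to right, top to
# bottom; 'skip' marks the grid cells above the diagonal as empty.
n    <- length(orgs)
skip <- as.vector(t(outer(seq_len(n), seq_len(n), "<")))

p <- histogram(~ percid | orgA_orgB, data = alldata,
               layout = c(n, n), skip = skip, as.table = TRUE)
print(p)
```

For the real 7x7 case, n would be 7, and the factor would need 28 levels
(or 21 if the diagonal is dropped, in which case skip would use "<=").
Replacing histogram() with densityplot() gives density curves instead.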


TIA,

Karin


[R] transparent concentric circles

2010-02-09 Thread Karin Lagesen
I have a data set which I would like to plot as a set of concentric
circles. The data represent a count of the number of characteristics
shared by various elements - an example would look like this:

1 100
2 75
3 50
4 25

That is, all four sets share 25 characteristics, three of them share 50
characteristics, and so on.

I would like to plot these as concentric circles, preferably with each
circle's size proportional to its count (this is not a must, however).
I would also like the colors of the circles to become stronger/deeper
towards the innermost circle (the one representing the number of
characteristics shared by all four).
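A minimal base-graphics sketch using symbols() with a semi-transparent
fill. It assumes "size" means area, so the radius goes with the square
root of the count. Because every circle uses the same translucent colour,
the colour builds up where circles overlap and the innermost circle comes
out darkest. Translucent colours need a device that supports them, such
as pdf().

```r
# Counts from the post: n sets sharing 'count' characteristics.
shared <- data.frame(nsets = 1:4, count = c(100, 75, 50, 25))

# A radius proportional to sqrt(count) makes each circle's area
# proportional to its count.
radius <- sqrt(shared$count / pi)

# One semi-transparent fill for every circle; overlaps get darker.
fill <- rgb(0, 0, 1, alpha = 0.25)

plot.new()
lim <- c(-1, 1) * max(radius)
plot.window(xlim = lim, ylim = lim, asp = 1)
# Largest circle first so the smaller ones are drawn on top of it.
ord <- order(shared$count, decreasing = TRUE)
symbols(rep(0, nrow(shared)), rep(0, nrow(shared)), circles = radius[ord],
        inches = FALSE, add = TRUE, bg = fill)
# Label each ring just inside its top edge.
text(0, radius[ord] - 0.4, labels = shared$count[ord])
```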

Can somebody point me to what I can use to do this?

Thanks!



Karin
-- 
Karin Lagesen, PhD



Re: [R] performing function on data frame

2009-04-16 Thread Karin Lagesen
David Hajage  writes:

> Hi Karin,
>
> I'm not sure I understand... Is this what you want ?
>
> (d$y - mean(d$y))/sd(d$y)




Yes and no.

Each column in my data frame represents one data set. For every
element in each data set I want to know its z value. That is, I want
to create a new data frame from the old one, where each element in the
new data frame is

newDF[i,j] = (oldDF[i,j] - mean(oldDF[,j])) / sd(oldDF[,j])

I think I could iterate over the data frame like this, but I keep
thinking that one of the apply functions should be used...
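A minimal sketch of the column-wise version using lapply(), assuming (as
in the example data frame from the original question) that the first
column is a non-numeric label that should be left alone. Base R's scale()
does the same centring and scaling and is shown for comparison.

```r
set.seed(42)
d <- data.frame(x = LETTERS[1:5], y = rnorm(5), z = rnorm(5))

# z score of every element, computed column by column.
zscore <- function(v) (v - mean(v)) / sd(v)
newDF  <- d
newDF[-1] <- lapply(d[-1], zscore)   # skip the label column 'x'

# scale() centres by the column mean and divides by the column sd; it
# returns a matrix and stores the centre/scale used as attributes.
m <- scale(d[-1])
```

lapply() keeps the result a data frame with the label column intact,
while scale() is handy when a plain numeric matrix is all that is needed.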

Karin
-- 
Karin Lagesen, Ph.D.
karin.lage...@medisin.uio.no
http://folk.uio.no/karinlag



[R] performing function on data frame

2009-04-15 Thread Karin Lagesen

Hi!

First, pardon me if this is a FAQ. I think I should be using some sort
of apply, but I cannot figure out how.

I have a data frame similar to this:

> d <- data.frame(x = LETTERS[1:5], y = rnorm(5), z = rnorm(5))
> d
  x  y  z
1 A  0.1605464 -0.2719820
2 B -0.9258660  1.2623117
3 C -0.3602656  1.5470351
4 D  1.2621797  1.2996500
5 E  0.6021728  0.5027095
> 

From this I want to get a new data frame containing the z scores
based on the values in each column. For instance, for element [C,y]
I would like to calculate (-0.3602656 - mean(column y)) / sd(column
y).

Thanks!


[R] xyplot key issue - line colors

2008-08-05 Thread Karin Lagesen


I have a problem regarding the colors assigned to the lines in the key
to an xy plot. I specify the plot like this:

xyplot(numbers~sqrt(breaks)|moltype+disttype, groups = type, data = alldata,
  layout = c(3,2), type = "l" , lwd = 2, col = c("gray", "skyblue"), 
  key = simpleKey(levels(alldata$type), points = FALSE, lines = TRUE, 
  columns = 2, lwd = 2, col = c("gray", "skyblue")))

However, the lines in the key (the lines that indicate which line is
which) are still blue and magenta, not gray and skyblue. I have seen
something about superposing lines on top of this somehow, but I
couldn't figure out how to do it.
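A sketch of one common lattice approach: set the colours in the theme
through par.settings (superpose.line), so that the grouped lines and the
key generated by auto.key both read them from the same place. The data
frame below is a simulated stand-in with the column names from the post.

```r
library(lattice)

# Hypothetical stand-in for 'alldata' with the same column names.
set.seed(1)
alldata <- expand.grid(breaks   = seq(0, 35, length.out = 20),
                       type     = c("Between species", "Same species"),
                       moltype  = c("16S", "23S", "5S"),
                       disttype = c("Gapped Distances", "Ungapped Distances"))
alldata$numbers <- rpois(nrow(alldata), 20)

# Colours and line width live in superpose.line, which is what both the
# grouped panel lines and the auto.key lines use.
p <- xyplot(numbers ~ sqrt(breaks) | moltype + disttype, groups = type,
            data = alldata, layout = c(3, 2), type = "l",
            par.settings = list(superpose.line = list(col = c("gray", "skyblue"),
                                                      lwd = 2)),
            auto.key = list(points = FALSE, lines = TRUE, columns = 2))
print(p)
```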

Thanks!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] conversion of data for use within barchart

2008-07-02 Thread Karin Lagesen


I have a data frame like this:


> data[1:10,]
   aaname grp   cluster count
1     Ala All Singleton   432
2     Arg All Singleton  1239
3     Asn All Singleton   396
4     Asp All Singleton   152
5     Cys All Singleton   206
6     Gln All Singleton   370
7     Glu All Singleton   211
8     Gly All Singleton   594
9     His All Singleton   213
10    Ile All Singleton    44

where the cluster column has three levels.

> levels(data$cluster)
[1] "Array" "Singleton" "rRNA" 
> 

Now, I would like to plot this like this:

barchart(aaname~count|grp, groups = cluster, data = data, stack = TRUE)

I am thus using the cluster as the grouping.

I would like to plot the relative abundance within each grouping, so
that each bar always adds up to one (or 100%). For instance, say the
Singleton cluster constitutes 40% of the Ala in the All grp, whereas
Array and rRNA make up 30% each. In that case the Singleton part of
the bar would stretch to 40% and the other two would be 30% each,
adding up to 100% in all.
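A minimal sketch: divide each count by the total for its aaname/grp
combination with ave(), then stack the proportions. The data frame is a
small simulated stand-in with the column names from the post.

```r
library(lattice)

# Hypothetical stand-in with the post's columns.
set.seed(1)
data <- expand.grid(aaname  = c("Ala", "Arg", "Asn"),
                    grp     = c("All", "Other"),
                    cluster = c("Array", "Singleton", "rRNA"))
data$count <- sample(50:500, nrow(data), replace = TRUE)

# ave() returns, for every row, the sum over rows with the same aaname and
# grp, so each bar's stacked cluster segments add up to 1.
data$prop <- data$count / ave(data$count, data$aaname, data$grp, FUN = sum)

p <- barchart(aaname ~ prop | grp, groups = cluster, data = data,
              stack = TRUE, auto.key = list(columns = 3))
print(p)
```

Multiplying prop by 100 gives percentages instead.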

I am not sure I am managing to describe what I want, so I hope
somebody understands it!

Thanks!

Karin


[R] xyplot and separate abline per plot

2008-06-27 Thread Karin Lagesen

Hello list!

I have a set of data like this:


> alldata[1:5,]
  breaks numbers         disttype moltype            type
1  0.000    6598 Gapped Distances      5S Between species
2  0.407       0 Gapped Distances      5S Between species
3  0.813    5228 Gapped Distances      5S Between species
4  1.220       0 Gapped Distances      5S Between species
5  1.627    9702 Gapped Distances      5S Between species
> levels(alldata$disttype)
[1] "Gapped Distances"   "Ungapped Distances"
> levels(alldata$type)
[1] "Between species" "Same species"   
> levels(alldata$moltype)
[1] "16S" "23S" "5S" 
> 

Which I plot like this:

xyplot(numbers~sqrt(breaks)|moltype+disttype, groups = type, data = alldata)

This results in a plot consisting of six panels. I have a set of six
values that I would like to add to these plots as a vertical line in
each panel (one separate value per panel). I think I can do this with
panel.abline somehow, but I don't know how to incorporate that into
the xyplot command, and I don't know how to specify which value goes
in which panel.
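A minimal sketch with a custom panel function. It assumes the six values
are stored in a matrix with one row per moltype level and one column per
disttype level (the values and the name vlines are hypothetical). Inside
the panel, lattice's which.packet() returns the current level index of
each conditioning variable, which picks the matching value.

```r
library(lattice)

# Hypothetical stand-in for 'alldata' with the same column names.
set.seed(1)
alldata <- expand.grid(breaks   = seq(0, 35, length.out = 20),
                       type     = c("Between species", "Same species"),
                       moltype  = c("16S", "23S", "5S"),
                       disttype = c("Gapped Distances", "Ungapped Distances"))
alldata$numbers <- rpois(nrow(alldata), 20)

# Hypothetical per-panel values, in the original 'breaks' units.
vlines <- matrix(c(1.0, 1.5, 2.0,
                   2.5, 3.0, 3.5), nrow = 3,
                 dimnames = list(levels(alldata$moltype), levels(alldata$disttype)))

p <- xyplot(numbers ~ sqrt(breaks) | moltype + disttype, groups = type,
            data = alldata, type = "l",
            panel = function(...) {
              panel.xyplot(...)        # draws the grouped lines as usual
              idx <- which.packet()    # c(moltype index, disttype index)
              # sqrt() because the x axis shows sqrt(breaks).
              panel.abline(v = sqrt(vlines[idx[1], idx[2]]), col = "red", lty = 2)
            })
print(p)
```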

I hope I am able to convey what I want:)

Thanks in advance,

karin


Re: [R] xyplot questions - axis and plotting two things in same panel

2008-06-26 Thread Karin Lagesen
"Deepayan Sarkar" <[EMAIL PROTECTED]> writes:

> On 6/25/08, Franz Mueter <[EMAIL PROTECTED]> wrote:
>> As for your first problem, try:
>>
>>  xyplot(numbers~breaks|moltype, groups = type, data = alldata, type = "l")
>>
>>
>>  -Original Message-
>>  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
>>  Behalf Of Karin Lagesen
>>  Sent: Wednesday, June 25, 2008 2:13 AM
>>  To: r-help@r-project.org
>>  Subject: [R] xyplot questions - axis and plotting two things in same panel
>
> [...]
>
>>  I am also wondering about whether it is possible to change the x axis
>>  scale here. I have data going from 0 to 35, but most of the
>>  interesting stuff is between 0 and 5. Thus I am wondering if there is
>>  any way of specifying that the 0 to 5 range should take up 30 % of the
>>  x axis (or something like that) and gradually shrink the axis after
>>  that. I have tried doing log on the x axis, but I have a lot of zeros
>>  in my data set that really breaks everything.
>
> Rather than having the software support arbitrary axis
> transformations, it would be simpler to transform the data; e.g.,
>
> xyplot(numbers~asinh(breaks) | moltype, groups = type, data = alldata,
> type = "l")
>
> That still leaves the problem of "nice" axis labels in the original
> scale. That is in general a hard problem. For special cases, you can
> specify explicit tick positions (in the transformed scale) and
> associated labels (in the original scale) using the 'scales' argument.

Since I search the archives of this list quite a lot, I thought I'd
post the solution I came up with, which worked nicely for me.

I ended up using the sqrt transformation, which gave me a very nice
plot. For the axis I used the scales argument, as follows:

# the max value on the x axis for my data is 40
labels = seq(0,40, by = 5)
atvalues = sqrt(seq(0,40, by = 5))
# the plot itself; 'x = list(...)' applies the custom ticks to the x axis only
xyplot(numbers~sqrt(breaks)|moltype, groups = type, data = alldata, type = "l",
       scales = list(x = list(at = atvalues, labels = labels)))

worked like a charm!

Thanks a lot for your help!

Karin


[R] xyplot questions - axis and plotting two things in same panel

2008-06-25 Thread Karin Lagesen

Hi list!

I am trying to use xyplot to plot some graphs.

The data I have looks like this:

> alldata[1:10,]
   breaks numbers moltype            type
1   0.000    6598      5S Between species
2   0.407       0      5S Between species
3   0.813    5228      5S Between species
4   1.220       0      5S Between species
5   1.627    9702      5S Between species
6   2.033       0      5S Between species
7   2.440    7834      5S Between species
8   2.847       0      5S Between species
9   3.253   12084      5S Between species
10  3.660      24      5S Between species
> 

where moltype and type are factors, moltype having three different
levels, and type having two.

I am now plotting things like this:

xyplot(numbers~breaks|moltype+type, data = alldata, type = "l")

which gives me six panels showing just what I want.

Now, my first problem is how to plot two graphs in the same panel. I
would like one panel per moltype, but I want the type factor to result
in two graphs plotted in the same panel. How do I specify this?

I am also wondering whether it is possible to change the x axis
scale here. My data go from 0 to 35, but most of the interesting
stuff is between 0 and 5. Is there any way of specifying that the
0 to 5 range should take up 30% of the x axis (or something like
that), with the axis gradually compressed after that? I have tried a
log x axis, but I have a lot of zeros in my data set, which break it
(log(0) is -Inf).

Thanks for your help! 

Karin


[R] using the stepfun to plot histogram outline.

2008-06-21 Thread Karin Lagesen

Hello list:)

I have lots of values from which I would like to get a histogram
outline.

An example of what I am talking about:

testdata = runif(100)
bbb = seq(0,1, by = 0.01)
hist(testdata, breaks = bbb)

I would like to get the outline of the resulting histogram.

I think I can do this using the stepfun function. However, I am
unsure how to get the data that stepfun requires.

From ?stepfun:

Arguments:

   x: numeric vector giving the knots or jump locations of the step
  function for 'stepfun()'.  For the other functions, 'x' is as
  'object' below.

   y: numeric vector one longer than 'x', giving the heights of the
  function values _between_ the x values.


I think x is the same as bbb above. I am, however, unsure how to get
the data needed for y, given that my data is in the same format as
testdata above.
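A minimal sketch: hist(..., plot = FALSE) does the binning without
drawing and returns the bin edges (breaks) and bar heights (counts).
Since stepfun() needs one more y value than x values, padding the counts
with a 0 on each side gives exactly the required length and closes the
outline at both ends.

```r
set.seed(1)
testdata <- runif(100)
bbb <- seq(0, 1, by = 0.01)

# h$breaks has 101 edges, h$counts has 100 bar heights.
h <- hist(testdata, breaks = bbb, plot = FALSE)

# y = level left of the first knot, the 100 bar heights, and the level
# right of the last knot.
outline <- stepfun(h$breaks, c(0, h$counts, 0))

plot(outline, do.points = FALSE, main = "Histogram outline",
     xlab = "testdata", ylab = "count")
```

One small difference: hist() counts intervals closed on the right by
default, while stepfun() jumps are right-continuous; this only matters
for values lying exactly on a break.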

Thanks for your help!

Karin


[R] vector comparison

2008-06-05 Thread Karin Lagesen

I know this is fairly basic, but I must have somehow missed it in the
manuals.

I have two vectors, often of unequal length. I would like to compare
them for identity. The order of the elements does not matter, but they
should contain the same elements.

I.e: I want this kind of comparison:

> if (1==1) show("yes") else show("blah")
[1] "yes"
> if (1==2) show("yes") else show("blah")
[1] "blah"
> 

Only replace the numbers with, for instance, the vectors

> a = c("a")
> b = c("b","c")
> c = c("c","b")


Now, I realize I only get a warning when comparing things, but to me
this means that I am not doing it correctly:

> if (a==a) show("yes") else show("blah")
[1] "yes"
> if (a==b) show("yes") else show("blah")
[1] "blah"
Warning message:
In if (a == b) show("yes") else show("blah") :
  the condition has length > 1 and only the first element will be used
> 
> if (b == c) show("yes") else show("blah")
[1] "blah"
Warning message:
In if (b == c) show("yes") else show("blah") :
  the condition has length > 1 and only the first element will be used
> 

I have also tried the %in% operator, but that one throws warnings too:

> if (b %in% c) show("yes") else show("blah")
[1] "yes"
Warning message:
In if (b %in% c) show("yes") else show("blah") :
  the condition has length > 1 and only the first element will be used
> 
> if (c %in% c) show("yes") else show("blah")
[1] "yes"
Warning message:
In if (c %in% c) show("yes") else show("blah") :
  the condition has length > 1 and only the first element will be used
>

So, how is this really supposed to be done?
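A minimal sketch using base R's setequal(), which compares two vectors as
sets and always returns a single TRUE or FALSE, so it is safe inside
if(). The helper same_multiset() is a hypothetical name for the stricter
case where duplicated elements must also match.

```r
a <- c("a")
b <- c("b", "c")
c <- c("c", "b")

# Same distinct elements, regardless of order or length.
if (setequal(b, c)) show("yes") else show("blah")   # prints [1] "yes"
if (setequal(a, b)) show("yes") else show("blah")   # prints [1] "blah"

# If duplicates must match too (c("b", "b", "c") vs c("b", "c")),
# compare sorted copies instead.
same_multiset <- function(x, y) identical(sort(x), sort(y))
```

The warnings above come from == and %in% returning one TRUE/FALSE per
element, of which if() only looks at the first.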

Thanks!

Karin



[R] figuring out the results from hclust

2008-05-31 Thread Karin Lagesen

I have two examples that I run hclust on:

a = c(0,1,1.5,1.5)
b = c(1,0,1.5,1.5)
c = c(1.5,1.5,0,0.5)
d = c(1.5,1.5,0.5,0)
ll = as.matrix(rbind(a,b,c,d))
test = as.dist(ll)
long = hclust(test)

a = c(0,0.3,1,1)
b = c(0.3,0,1,1)
c = c(1,1,0,0.5)
d = c(1,1,0.5,0)
ll = as.matrix(rbind(a,b,c,d))
test = as.dist(ll)
short = hclust(test)

The main difference between them is whether a and b get clustered
higher up or lower down than the c,d cluster.

I want to partition this kind of data into three clusters. I know I
can do that with cutree, which gives me the following:

> cutree(short, k=3)
a b c d 
1 1 2 3 
> cutree(long, k=3)
a b c d 
1 2 3 3 
> 

And I can also access the height vector for both:

> short$height
[1] 0.3 0.5 1.0
> long$height
[1] 0.5 1.0 1.5
> 

So I know at what heights they get merged.

What I cannot seem to get at is which of the clusters shown by cutree
correspond to which split. When I examine short in a plot, I can
easily see that the highest split (i.e. the one corresponding to the
last height, 1, in the height vector) is between the cutree clusters 1
and 2,3. In the long example this split is between 1,2 and 3. However,
I would rather not examine all of my data by hand:)

Could any of you point me to what I need to do to get at this
information? I have tried to examine the merge data in both cases, but
I am coming up short.
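A minimal sketch: because cuts of a hierarchical tree are nested, every
k = 3 cluster falls entirely on one side of the highest split, and that
split is exactly what cutree(k = 2) shows. Mapping each k = 3 cluster to
its k = 2 group therefore tells which clusters the top split separates.
top_split() is a hypothetical helper name.

```r
short <- hclust(as.dist(rbind(a = c(0, 0.3, 1, 1), b = c(0.3, 0, 1, 1),
                              c = c(1, 1, 0, 0.5), d = c(1, 1, 0.5, 0))))
long  <- hclust(as.dist(rbind(a = c(0, 1, 1.5, 1.5), b = c(1, 0, 1.5, 1.5),
                              c = c(1.5, 1.5, 0, 0.5), d = c(1.5, 1.5, 0.5, 0))))

# For each k = 3 cluster, the side (1 or 2) of the highest split it is on.
top_split <- function(hc) {
  k3 <- cutree(hc, k = 3)
  k2 <- cutree(hc, k = 2)
  tapply(k2, k3, unique)
}

top_split(short)   # clusters 2 and 3 share a side: split is 1 vs 2,3
top_split(long)    # clusters 1 and 2 share a side: split is 1,2 vs 3
```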

Thanks!

Karin


[R] deconvoluting hclust objects

2008-05-22 Thread Karin Lagesen

I have an hclust object that looks like this:

> test77

Call:
hclust(d = input)

Cluster method   : complete
Number of objects: 11

> test77$height
 [1] 0.000 0.000 0.000 0.000 0.000 0.000 0.900
 [8] 0.9473684 1.7894737 8.5771948
> test77$merge
      [,1] [,2]
 [1,]   -1   -2
 [2,]   -3    1
 [3,]   -7    2
 [4,]   -8    3
 [5,]   -4   -6
 [6,]   -5  -10
 [7,]   -9    5
 [8,]  -11    4
 [9,]    7    8
[10,]    6    9
>

I am specifically interested in what happens when you divide this
object into three clusters. When I look at the plot, the three
clusters form like this (monospace font):

 ----------
 |        |
 |     -------
 |     |     |
 2     3     6

What I want is distance information on the three groups, and also the
number of objects in each. The distances I need are in height, but I
don't know how to get at the sizes of the subtrees.

I know I can use cutree to make the cut, but I cannot see anything
systematic in which group becomes group number 1 and so forth.

The results I want would in this example be:

2   3   6   8.5771948   1.7894737
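A minimal sketch that reads the sizes straight from the merge matrix
instead of from cutree. The last row of merge joins the two subtrees
merged last; the second-to-last row must be one of its children (it can
only be used by a later merge), and its two children are the other two
groups. node_size() and three_groups() are hypothetical helper names,
and test77 is rebuilt here as a plain list from the printed merge and
height values, holding only the parts the functions use.

```r
# Number of original objects under a node of hc$merge: negative entries
# are single objects, positive ones refer to earlier rows.
node_size <- function(merge, node) {
  if (node < 0) return(1L)
  node_size(merge, merge[node, 1]) + node_size(merge, merge[node, 2])
}

three_groups <- function(hc) {
  m <- hc$merge
  n <- nrow(m)                 # row n is the top merge
  top   <- m[n, ]
  lone  <- top[top != n - 1]   # the side that is not split again
  split <- m[n - 1, ]          # the two halves of the other side
  c(sizes  = c(node_size(m, lone), node_size(m, split[1]), node_size(m, split[2])),
    height = hc$height[c(n, n - 1)])
}

test77 <- list(
  merge  = matrix(c(-1, -2,  -3, 1,  -7, 2,  -8, 3,  -4, -6,
                    -5, -10, -9, 5, -11, 4,   7, 8,   6, 9),
                  ncol = 2, byrow = TRUE),
  height = c(0, 0, 0, 0, 0, 0, 0.9, 0.9473684, 1.7894737, 8.5771948))

res <- three_groups(test77)
```

The last two sizes come out in merge-matrix column order, which need not
match the left-to-right order of the plotted dendrogram.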

Any hints for me?

Thanks!

Karin


[R] problems with data frames, factors and lists

2008-05-21 Thread Karin Lagesen

I have a function that creates a list based on some clustered data:

mix <- function(Y, pid) {
  hc = gethc(Y, pid)
  maxheight = max(hc$height)
  noingrp = processhc(hc)
  one = noingrp$one
  two = noingrp$two
  twoisone = "one"
  if (two != 1)
    twoisone = "more"
  out = list(pid = pid, one = noingrp$one, two = noingrp$two, diff = maxheight,
             noseqs = length(hc$labels), twogrp = twoisone)
  return(out)
}

example result:

> mix(tsus_same, 77)
$pid
[1] 77

$one
[1] 9

$two
[1] 2

$diff
[1] 8.577195

$noseqs
[1] 11

$twogrp
[1] "more"

>

I then use this function in another function that just runs this
function through a lot of data:


doset <- function(sameset) {
  pids = unique(c(sameset$APID, sameset$BPID))
  for (f in pids) {
    oputframe = data.frame(rbind(oputframe, mix(sameset, f)))
  }
  return(oputframe)
}

All values except $twogrp are numbers. There are two possible values
for $twogrp, "one" and "more". The first is more common and gets
added to the data frame first, so twogrp becomes a factor whose only
level is "one". As a result, I cannot add the rows where it is "more"
without getting

38: In `[<-.factor`(`*tmp*`, ri, value = "more") :
  invalid factor level, NAs generated


Now, this is a pain in the neck. How can I add these lists to the
data frame and still have $twogrp as a factor?
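A minimal sketch of one way around the factor problem: collect the rows
in a list, keep twogrp as character while binding, and turn it into a
factor once at the end with both levels declared. Since gethc() and
processhc() are not shown, fake_mix() is a hypothetical stand-in that
returns the same fields as mix(); the pids are chosen so that "one"
comes first, as in the post.

```r
# Hypothetical stand-in for mix(): same fields, made-up values.
fake_mix <- function(pid) {
  two <- pid %% 3   # arbitrary placeholder
  list(pid = pid, one = 9, two = two, diff = 8.577195, noseqs = 11,
       twogrp = if (two != 1) "more" else "one")
}

doset <- function(pids) {
  # One small data frame per pid, with twogrp kept as character...
  rows <- lapply(pids, function(f)
    as.data.frame(fake_mix(f), stringsAsFactors = FALSE))
  out <- do.call(rbind, rows)
  # ...converted to a factor once, with both levels declared up front.
  out$twogrp <- factor(out$twogrp, levels = c("one", "more"))
  out
}

res <- doset(c(4, 77, 7, 12))
```

Binding all rows in one do.call(rbind, ...) also avoids growing the data
frame inside the loop and needing oputframe to exist beforehand.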

Thanks, and I hope my code makes some sense!

Karin


[R] cloud plot has white(transparent?) background

2008-04-22 Thread Karin Lagesen


I am using the code example from the R graph gallery to look at a
cloud plot:

require(lattice)
data(iris)
print(cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
            groups = Species, screen = list(z = 20, x = -70),
            perspective = FALSE,
            key = list(title = "Iris Data", x = .15, y = .85, corner = c(0, 1),
                       border = TRUE,
                       points = Rows(trellis.par.get("superpose.symbol"), 1:3),
                       text = list(levels(iris$Species)))))

Now, in the example on the web page this comes out with a nice gray
background that makes things easier to see. Mine comes out with a
white, possibly transparent, background, and the point colors have
also changed.

How do I get the nice gray color back?

Thanks!

Karin


[R] adding bwplot to existing bwplot

2008-03-27 Thread Karin Lagesen


Hello.

I have made many normal boxplots where I added a new boxplot to an
existing one. When doing this, I used the at argument to shift the
boxplots a bit so that they fit next to each other, like this:

boxplot(data.., at = 1:number_of_categories - 0.15)
boxplot(data.., at = 1:number_of_categories + 0.15, add = TRUE)

Now I am wondering if it is possible to do the same in some way with
bwplot. The data I want to plot is like this:

> operonthings[1:5,]
          phylum   pid type no_clust no_seqs
1  Acidobacteria 15771   5S        1       1
2  Acidobacteria 12638   5S        1       2
3 Actinobacteria 16321   5S        2       6
4 Actinobacteria    92   5S        2       2
5 Actinobacteria    87   5S        1       5
>

where phylum and type are the factors I would like to plot no_clust
and no_seqs against. I basically want these in the same plot:

bwplot(no_clust~type|phylum, data = operonthings)
and
bwplot(no_seqs~type|phylum, data = operonthings)
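A sketch of one way to do this in lattice, as an alternative to shifting
boxes with at: stack the two measurements into a long data frame with a
factor saying which one each value is, then use the type:measure
interaction on the x axis so the two boxes for each type sit side by side
in every phylum panel. The data frame is a simulated stand-in with the
post's column names.

```r
library(lattice)

# Hypothetical stand-in for 'operonthings'.
set.seed(1)
operonthings <- data.frame(
  phylum   = factor(rep(c("Acidobacteria", "Actinobacteria"), each = 30)),
  type     = factor(rep(c("5S", "16S", "23S"), times = 20)),
  no_clust = rpois(60, 2) + 1,
  no_seqs  = rpois(60, 4) + 1
)

# Long format: one 'value' column plus a 'measure' factor.
long <- data.frame(
  phylum  = rep(operonthings$phylum, 2),
  type    = rep(operonthings$type, 2),
  measure = factor(rep(c("no_clust", "no_seqs"), each = nrow(operonthings))),
  value   = c(operonthings$no_clust, operonthings$no_seqs)
)

# type:measure is the interaction factor (e.g. "16S:no_clust"), giving one
# box per type/measure pair, grouped by type.
p <- bwplot(value ~ type:measure | phylum, data = long,
            scales = list(x = list(rot = 45)))
print(p)
```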

Any thoughts on how to do this?

Thanks!

Karin



[R] hclust graphics - plotting many points

2008-03-10 Thread Karin Lagesen

Hello.

I have a distance matrix with lots of distances that I use hclust to
organise. I then plot the results using the plot method of hclust.

However, the plot itself takes around 20 minutes to make, because
there are ~700 objects in the matrix. I would therefore like to save
it in some graphics format that will let me examine it further.

I tried dumping it to postscript:

postscript("myfile.ps", height = 50, pointsize=5)
plot(my_hc_object)
dev.off()

What happens is that, since most of the items in the matrix have a
distance of zero to something, everything becomes a black smear at the
bottom where I cannot tell anything apart. I tried increasing the height
and/or width and also decreasing the pointsize. None of these improved
things much.

So, I am wondering if any of you have tips on how I can get something
like what I see in the x11() window, which I can also store and
potentially show other people.
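Two things that might help, as a sketch on simulated data (700 points
with many exact duplicates, standing in for the real distance matrix):
a very wide PDF with small labels and hang = -1 so leaves line up at the
bottom, or cutting the dendrogram with cut() and drawing each lower
branch on its own page. The file names, sizes and cut height are
arbitrary choices.

```r
set.seed(1)
# Hypothetical data: 700 objects, many identical (distance 0).
x  <- matrix(round(runif(700 * 2), 1), ncol = 2)
hc <- hclust(dist(x))

# A wide page (sizes in inches) gives every leaf some horizontal room.
pdf("hclust_wide.pdf", width = 60, height = 10)
plot(hc, cex = 0.3, hang = -1)
invisible(dev.off())

# Alternatively, cut the tree at height h and draw each lower branch
# (skipping single leaves) on its own page.
parts <- cut(as.dendrogram(hc), h = 0.5)
pdf("hclust_branches.pdf")
for (br in parts$lower) if (attr(br, "members") > 1) plot(br)
invisible(dev.off())
```

parts$upper holds the part of the tree above the cut, which gives an
overview of how the branches are joined.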

Thanks!

Karin


[R] combine vector and data frame on field?

2008-02-26 Thread Karin Lagesen

I have managed to create a data frame like this:


> tsus_same_mean[1:10,]
     PID            Grp        Dist PercAln    PercId
1  12638  Acidobacteria 0.0           1.000 1.000
2     87 Actinobacteria 0.0           0.970 0.970
3     92 Actinobacteria 0.008902000   1.000 0.991
4     94 Actinobacteria 0.0           1.000 1.000
5    189 Actinobacteria 0.005876733   0.973 0.9676667
6    242 Actinobacteria 0.001734200   0.973 0.9715333
7    305 Actinobacteria 0.0           0.970 0.970
8    307 Actinobacteria 0.0           0.970 0.970
9    328 Actinobacteria 0.0           1.000 1.000
10 10689 Actinobacteria 0.0           1.000 1.000
> 

and what I think is a factor like this:

> tsuPIDCount[1:10]
 3  4  5  8  9 12 13 15 18 19
 2  2  2  3  4  7  4  2  2  3
>

Now, I'd like to combine the two. The factor levels (names) in
tsuPIDCount correspond to the field called PID in the data frame.

Any hints on how to do this? cbind just appends the vector as it is,
and I couldn't figure out how to say that each level should be matched
to the corresponding PID.
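A minimal sketch using match(), assuming tsuPIDCount is a table (or
named vector) whose names are the PIDs, as the printout suggests. The
small data frame and count table are made-up stand-ins.

```r
# Hypothetical stand-ins for tsus_same_mean and tsuPIDCount.
tsus_same_mean <- data.frame(PID = c(12638, 87, 92, 3, 4),
                             PercId = c(1.000, 0.970, 0.991, 0.950, 0.980))
tsuPIDCount <- table(c(3, 3, 4, 4, 87, 87, 87))

# match() gives, for each PID, its position among the table's names;
# PIDs without a count get NA.
pos <- match(as.character(tsus_same_mean$PID), names(tsuPIDCount))
tsus_same_mean$Count <- as.vector(tsuPIDCount)[pos]
```

merge() on a data frame built from the table would do the same lookup
but may reorder the rows; match() keeps the original row order.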

Thanks a lot for your help in advance :)

Karin


[R] clustering problem

2008-02-20 Thread Karin Lagesen

First I just want to say thanks for all the help I've had from the
list so far :)

I now have what I think is a clustering problem. I have lots of
objects between which I have measured a dissimilarity. The list has
only one entry per pair, so it is not symmetrical.

Example input:

NameA   NameB   Dist
189_1C2 189_1C1 0
189_1C3 189_1C1 0.017
189_1C3 189_1C2 0.017
189_1C4 189_1C1 0
189_1C4 189_1C2 0
189_1C4 189_1C3 0.017
189_1C5 189_1C1 0.05
189_1C5 189_1C2 0.05
189_1C5 189_1C3 0.067
189_1C5 189_1C4 0.05
189_1C6 189_1C1 0.05
189_1C6 189_1C2 0.05
189_1C6 189_1C3 0.067
189_1C6 189_1C4 0.05
189_1C6 189_1C5 0


The distance measure is 0 for identical objects and increases with
increasing dissimilarity, up to 1.

What I would like to get from these data is a hierarchical clustering
graph. In this example I would then group

189_1C2 189_1C1 189_1C4,

189_1C6 189_1C5,

and 189_1C3 off with itself.

The distance between two groups should be the mean of the distances
between their objects (I think).

I have looked at hclust, and it seems it should be able to do what I
want. However, I am unsure how to use it to get what I am looking
for.
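A minimal sketch: fill a symmetric matrix from the one-entry-per-pair
list (each distance written to both [A, B] and [B, A]), convert it with
as.dist(), and run hclust() with method = "average", which uses the mean
distance between the members of two groups as described above. The cut
height 0.01 is an arbitrary choice that separates zero distances from the
rest.

```r
# The pairwise list from the post (one row per unordered pair).
pairdist <- read.table(header = TRUE,
                       colClasses = c("character", "character", "numeric"),
                       text = "
NameA   NameB   Dist
189_1C2 189_1C1 0
189_1C3 189_1C1 0.017
189_1C3 189_1C2 0.017
189_1C4 189_1C1 0
189_1C4 189_1C2 0
189_1C4 189_1C3 0.017
189_1C5 189_1C1 0.05
189_1C5 189_1C2 0.05
189_1C5 189_1C3 0.067
189_1C5 189_1C4 0.05
189_1C6 189_1C1 0.05
189_1C6 189_1C2 0.05
189_1C6 189_1C3 0.067
189_1C6 189_1C4 0.05
189_1C6 189_1C5 0")

ids <- sort(unique(c(pairdist$NameA, pairdist$NameB)))
m <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
# Index by a two-column character matrix of (row name, column name).
m[cbind(pairdist$NameA, pairdist$NameB)] <- pairdist$Dist
m[cbind(pairdist$NameB, pairdist$NameA)] <- pairdist$Dist

hc <- hclust(as.dist(m), method = "average")
groups <- cutree(hc, h = 0.01)   # C1, C2, C4 | C3 | C5, C6
plot(hc)
```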

Thank you in advance for your help!

Karin


[R] tabulation on dataframe question

2008-02-18 Thread Karin Lagesen

I have a data frame with data similar to this:

NameA  GrpA   NameB  GrpB   Dist
A      Alpha  B      Alpha  0.2
A      Alpha  C      Beta   0.2
A      Alpha  D      Beta   0.4
B      Alpha  C      Beta   0.2
B      Alpha  D      Beta   0.1
C      Beta   D      Beta   0.3

Dist is a distance measure between two entities. The table lists all
to all distances, but the distance between two entities appears only
once. What I would like is a table giving, for each pair of groups,
the number of entity pairs whose distance satisfies a certain
condition (equal to zero, for instance).

In this case, if the requirement was Dist == 0.2, the result would be:

        Alpha  Beta
Alpha       1     2
Beta        2     0

This resulting table would be symmetrical.
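A minimal sketch: keep the rows that meet the condition, cross-tabulate
GrpA against GrpB with both factors given the same levels, and add the
transpose to make the table symmetric. The diagonal is then restored,
since within-group pairs would otherwise be counted twice. The condition
compares with a small tolerance because Dist is a floating-point column.

```r
dd <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
NameA GrpA  NameB GrpB  Dist
A     Alpha B     Alpha 0.2
A     Alpha C     Beta  0.2
A     Alpha D     Beta  0.4
B     Alpha C     Beta  0.2
B     Alpha D     Beta  0.1
C     Beta  D     Beta  0.3")

grps <- sort(unique(c(dd$GrpA, dd$GrpB)))
hit  <- dd[abs(dd$Dist - 0.2) < 1e-9, ]

# Same levels on both axes, so the table is square.
tab <- table(factor(hit$GrpA, levels = grps), factor(hit$GrpB, levels = grps))
sym <- tab + t(tab)    # count each pair in both directions
diag(sym) <- diag(tab) # ...but within-group pairs only once
```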

I hope I am able to convey what I would like, and TIA for your help!

Karin


[R] extracting rows from dataframe that match a vector

2008-02-05 Thread Karin Lagesen

Hi!

I have a large data frame from which I want to extract a subset: the
rows where a certain column has a value that matches one of the
elements in a vector I have defined. So, my question is how do I get
the rows that match one of the elements in the vector.

Example:

a = c(1:5)
b = letters[1:10]
df = data.frame(ind = a, letrs = b)

> df
   ind letrs
1    1     a
2    2     b
3    3     c
4    4     d
5    5     e
6    1     f
7    2     g
8    3     h
9    4     i
10   5     j
>


# Now I want to extract all of the rows where ind == 2, 4 or 5.
# This would be rows 2, 4, 5, 7, 9 and 10 

subgr = c(2,4,5)

My most natural inclination would be to do 

df[df$ind == subgr,]

However, this does not work:


> df[df$ind == subgr,]
  ind letrs
7   2 g
Warning message:
In df$ind == subgr :
  longer object length is not a multiple of shorter object length
>

So, which part of this is it that I have misunderstood?
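A minimal sketch of the usual fix. == compares element by element,
recycling subgr to (2, 4, 5, 2, 4, 5, ...), so each row is tested against
only one of the values; %in% instead asks, for every row, whether ind
equals any of them.

```r
a <- 1:5
b <- letters[1:10]
df <- data.frame(ind = a, letrs = b)   # 'ind' is recycled to length 10
subgr <- c(2, 4, 5)

# TRUE for every row whose ind is one of 2, 4 or 5.
hits <- df[df$ind %in% subgr, ]
```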

Thanks for your help btw!

Karin


[R] accessing the indices of outliers in a data frame boxplot

2008-01-25 Thread Karin Lagesen

I have a data frame containing a factor column, which I use to make
boxplots of the data, with one box per factor level. I would now like
to get at the rows of the data frame that correspond to the outliers.
So far I have found $out, which gives "the values of any data points
which lie beyond the extremes of the whiskers", but I haven't found
anything that gives me the indices of these outliers in the original
data frame.

I think I could simply compare the values I am plotting from my data
frame with the values for the whiskers and use that as a criterion,
but I am unsure how to do this without doing it manually. The factor I
am plotting against has 17 levels, so I'd like to see if there is a
somewhat more general solution available.
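A minimal sketch: boxplot() gets its outliers from boxplot.stats(), so
running boxplot.stats() per group and flagging the values it reports in
$out gives the matching rows. ave() applies the function within each
group but returns the results in the original row order. The data frame
and its column names (g, v) are simulated stand-ins.

```r
set.seed(1)
# Hypothetical data: one numeric column and a grouping factor.
dat <- data.frame(g = factor(rep(c("a", "b", "c"), each = 20)),
                  v = c(rnorm(20), rnorm(20, mean = 5), rnorm(20, mean = 10)))
dat$v[c(3, 25)] <- c(8, -4)   # plant two obvious outliers

# 1 where the value is an outlier within its own group, 0 otherwise.
flag <- ave(dat$v, dat$g, FUN = function(v) v %in% boxplot.stats(v)$out)
out_rows <- which(flag == 1)

# Cross-check against what boxplot() itself reports.
b <- boxplot(v ~ g, data = dat, plot = FALSE)
dat[out_rows, ]
```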

Thanks for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] contingency table on data frame

2008-01-22 Thread Karin Lagesen

I am sorry if this is a FAQ or covered in a tutorial somewhere, but I
am unable to solve this one.

What I am looking for is a count of how many different
categories (numbers in this case) appear for each level of a given factor.

Example:

> l <- c("Yes", "No", "Perhaps")
> x <- factor( sample(l, 10, replace=T), levels=l )
> m <- c(1:5)
> y <- factor( sample(m, 10, replace=T), levels=m )
> z = c(1:10)
> my_df = data.frame("Z" = z, "Y"= y, "X" = x)
> my_df
    Z Y       X
1   1 4     Yes
2   2 1      No
3   3 2 Perhaps
4   4 3     Yes
5   5 4      No
6   6 5      No
7   7 1     Yes
8   8 4 Perhaps
9   9 4     Yes
10 10 2 Perhaps
> 

I am now looking for a table that will give me this:

Yes  3   # Yes has these ys: 4,3,1,4, two are the same, ergo 3
No   3   # No has these ys: 1,4,5
Perhaps  2   # Perhaps has these ys: 2,4,2

My data frame has lots of other columns too, but I only want this
information out of it.
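A sketch using `tapply()`, which splits one column by the levels of a factor and applies a function to each piece; here the function counts distinct values with `length(unique())`. The Y and X values are hard-coded from the printout in the post rather than drawn with `sample()`, so the result is reproducible.

```r
# The values shown in the post, hard-coded instead of drawn with sample()
my_df <- data.frame(
  Z = 1:10,
  Y = factor(c(4, 1, 2, 3, 4, 5, 1, 4, 4, 2), levels = 1:5),
  X = factor(c("Yes", "No", "Perhaps", "Yes", "No", "No",
               "Yes", "Perhaps", "Yes", "Perhaps"),
             levels = c("Yes", "No", "Perhaps"))
)

# For each level of X, count how many distinct Y values occur
n_distinct_y <- tapply(my_df$Y, my_df$X, function(y) length(unique(y)))
n_distinct_y
```

An alternative is `rowSums(table(my_df$X, my_df$Y) > 0)`, which counts the non-empty cells in each row of the cross-table and should give the same numbers.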


Thank you for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] subsetting a data frame using string matching

2008-01-21 Thread Karin Lagesen

Example data frame: 


a = c("Alpha", "Beta", "Gamma", "Beeta", "Alpha", "beta")
b = c(1:6)
example = data.frame("Title" = a, "Vals" = b)


> example
  Title Vals
1 Alpha    1
2  Beta    2
3 Gamma    3
4 Beeta    4
5 Alpha    5
6  beta    6
> 

I would like to be able to get a new data frame from this data frame
containing only rows that match a certain string. In this case it
could for instance be the string "eta". I have tried various ways of
using agrep and grep, but so far I have not found anything that
worked.
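A sketch with `grepl()`, which returns TRUE or FALSE for each element and can therefore be used directly as a row index. (`grep()` returns positions instead; `example[grep("eta", example$Title), ]` should also work.) Here `fixed = TRUE` makes "eta" a literal substring rather than a regular expression; adding `ignore.case = TRUE` would make the match case-insensitive.

```r
a <- c("Alpha", "Beta", "Gamma", "Beeta", "Alpha", "beta")
b <- 1:6
example <- data.frame(Title = a, Vals = b)

# grepl() returns TRUE/FALSE per element; fixed = TRUE matches "eta"
# as a literal substring rather than as a regular expression
has_eta <- grepl("eta", example$Title, fixed = TRUE)
eta_rows <- example[has_eta, ]
eta_rows
```

This keeps the Beta, Beeta and beta rows. `agrep()` is for approximate (fuzzy) matching, which is not needed for a plain substring.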

Thank you in advance for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] creating summary functions for data frame

2007-10-11 Thread Karin Lagesen

I have a data frame that looks like this:


> gctablechromonly[1:5,]
     refseq geometry gccontent X60_origin X60_terminus  length  kingdom
1 NC_009484      cir    0.6799        179       773000 3389227 Bacteria
2 NC_009484      cir    0.6799        179       773000 3389227 Bacteria
3 NC_009484      cir    0.6799        179       773000 3389227 Bacteria
4 NC_009484      cir    0.6799        179       773000 3389227 Bacteria
5 NC_009484      cir    0.6799        179       773000 3389227 Bacteria
                  grp feature gene begin dir gc_content replicor LEADLAG
1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
>

About half of these columns are factors, for instance refseq, kingdom,
grp and feature.

Now, I have seen that I can do 

by(gctablechromonly, gctablechromonly$feature, summary)

to get useful information.

However, I am wondering how I can write my own functions to get what
I'd like. For instance, how could I get a table with grp down the side
(as rows), feature across the top (as columns), and a count of each kind
of feature within each grp?

I realize that this is probably pretty easy to do, but I do not know
enough R yet to know which words to look for in the mail archives...:)
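For counts like this no custom function should be needed: a cross-tabulation does it. A minimal sketch with `xtabs()`; the data frame `gc_demo` is a made-up stand-in for `gctablechromonly` holding only the two relevant columns, and its group and feature values are hypothetical.

```r
# A small made-up stand-in for gctablechromonly with just the two columns
# needed (the group names and features here are hypothetical)
gc_demo <- data.frame(
  grp     = factor(c("Alphaproteobacteria", "Alphaproteobacteria",
                     "Bacilli", "Bacilli", "Bacilli")),
  feature = factor(c("CDS", "tRNA", "CDS", "CDS", "rRNA"))
)

# Cross-tabulate: one row per grp level, one column per feature level,
# each cell holding the number of rows with that combination
counts <- xtabs(~ grp + feature, data = gc_demo)
counts
```

`table(gc_demo$grp, gc_demo$feature)` gives the same layout, just without the `grp`/`feature` dimension labels.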

TIA,

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] "continuous" boxplot?

2007-10-01 Thread Karin Lagesen


I have two vectors x and y, which I would like to plot against each
other. I am also displaying other data in this plot. However, I have
about 1 million points to plot, and just plotting x against y is
not very informative. What I'd like to do is a sort of
continuous box plot.

My x values go from -1 to 1 and my y values from 0 to 1, so I'd like
to plot the median and quantiles, and possibly also all of the
outliers somehow. Are there any facilities in R for doing something
like this, or would I need to do this the hard-coded way?
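A sketch of one way to get a "continuous" box plot: cut x into narrow bins, compute the quartiles of y within each bin, and draw them as lines. The simulated data, the number of points and the bin width of 0.1 are all assumptions made for illustration.

```r
# Made-up data standing in for the real x and y (1e5 points to keep it quick)
set.seed(42)
n <- 1e5
x <- runif(n, min = -1, max = 1)
y <- pmin(pmax(0.5 + 0.3 * x + rnorm(n, sd = 0.1), 0), 1)  # clipped to [0, 1]

# Cut x into bins of width 0.1 (an arbitrary choice) and find bin centres
breaks  <- seq(-1, 1, by = 0.1)
bin     <- cut(x, breaks = breaks, include.lowest = TRUE)
centres <- head(breaks, -1) + diff(breaks) / 2

# Lower quartile, median and upper quartile of y within each bin
qs <- t(sapply(split(y, bin), quantile, probs = c(0.25, 0.5, 0.75)))

# Median as a solid line, quartiles as dashed lines
plot(centres, qs[, "50%"], type = "l", ylim = c(0, 1), xlab = "x", ylab = "y")
lines(centres, qs[, "25%"], lty = 2)
lines(centres, qs[, "75%"], lty = 2)
```

If the outliers are wanted too, `boxplot(y ~ bin)` draws a full box plot per bin, including the points beyond the whiskers.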

Thank you very much for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] How to choose name for package during install (was:Re: problem loading hexbin associated package colorspace)

2007-09-27 Thread Karin Lagesen
Prof Brian Ripley <[EMAIL PROTECTED]> writes:

> Where does this 'hexbin' package come from?  The one I have installed
> (and the only one I found) is from BioC, and that does not depend on
> colorspace:
>
> Description:
>
> Package:   hexbin
> Version:   1.10.0
> Date:  2006-09-28
> Depends:   R (>= 2.0), methods, grid, lattice

I have now discovered that I had an old hexbin version installed, one
that did require colorspace. I am now trying to install version 1.10;
however, I cannot get it to load properly, since library() grabs the
first copy it finds in the list of libraries. So now for my next question:

During R CMD INSTALL, how do I specify that a package should not just
be named "hexbin" but, for instance, "hexbin_1.10", so that I can
actually tell library to load the correct one? (This seems to me to be
how the library help file tells me to solve this problem.)
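As far as I know, R CMD INSTALL has no option for installing a package under a different name, because the installed name comes from the Package field of its DESCRIPTION file. A common alternative is to install the newer version into its own library directory (`R CMD INSTALL -l <dir> <package>`) and then point `library()` at that directory through its `lib.loc` argument. The helper `find_installed_copies()` below is hypothetical, not part of any package, and the library path in the final comment is made up.

```r
# List every installed copy of a package across all library trees,
# together with the library it lives in and its version
find_installed_copies <- function(pkg) {
  ip <- installed.packages(lib.loc = .libPaths())
  ip[ip[, "Package"] == pkg, c("LibPath", "Version"), drop = FALSE]
}

find_installed_copies("hexbin")

# Then load the wanted copy explicitly (this path is hypothetical):
# library(hexbin, lib.loc = "~/R/newer-lib")
```

Removing the old copy (`remove.packages("hexbin", lib = <old library>)`) would avoid the ambiguity altogether.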

> That said, something is wrong with your installation of colorspace, so
> I suggest you reinstall it.

I reinstalled it, but it still does not load properly:

alanine[15:01]:~/work/rna_comparison/scripts/rpackages> cat colorspace/DESCRIPTION
Package: colorspace
Version: 0.95
Date: 2006-11-16
Title: Colorspace Manipulation
Author: Ross Ihaka <[EMAIL PROTECTED]>
Maintainer: Ross Ihaka <[EMAIL PROTECTED]>
Depends: R (>= 2.0.0), methods
Description: Carries out mapping between assorted color spaces.
License: BSD
URL: http://www.r-project.org
LazyLoad: yes
Packaged: Thu Nov 16 11:47:26 2006; ihaka
Built: R 2.5.1; x86_64-unknown-linux-gnu; 2007-09-27 12:58:16; unix
alanine[15:02]:~/work/rna_comparison/scripts/rpackages> 

> library(colorspace)
Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) :
in 'colorspace' methods for export not found: [, coords, plot
Error: package/namespace load failed for 'colorspace'
> 


Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] problem loading hexbin associated package colorspace

2007-09-27 Thread Karin Lagesen

I have lots of data that I need to display, and I think hexbin would
be good for it.

However, I cannot load one of the required packages associated with
the hexbin package:

> library(hexbin)
Loading required package: colorspace
Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) :
in 'colorspace' methods for export not found: [, coords, plot
Error: package 'colorspace' could not be loaded
> library(colorspace)
Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) :
in 'colorspace' methods for export not found: [, coords, plot
Error: package/namespace load failed for 'colorspace'
> sessionInfo()
R version 2.5.1 (2007-06-27) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] "grid"  "stats" "graphics"  "grDevices" "utils" "datasets" 
[7] "methods"   "base" 

other attached packages:
 lattice 
"0.14-9" 
> 

The colorspace package is version 0.95.

Is this an error with my system, this code, or something else?

Thanks for having this list btw...:)

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] Am I misunderstanding the ifelse construction?

2007-09-25 Thread Karin Lagesen

I have a function like this:

changedir <- function(dataframe) {
    dir <- dataframe$dir
    gc_content <- dataframe$gc_content
    d <- ifelse(dir == "-",
                gc_content <- -gc_content, gc_content <- gc_content)
    return(d)
}

The goal of this function is to be able to input a data frame like this:


> lala
   dir gc_content
1    +        0.5
2    -        0.5
3    +        0.5
4    -        0.5
5    +        0.5
6    -        0.5
7    +        0.5
8    -        0.5
9    +        0.5
10   -        0.5
11   +        0.5
12   -        0.5
13   +        0.5
14   -        0.5
15   +        0.5
16   -        0.5
17   +        0.5
18   -        0.5
19   +        0.5
20   -        0.5
>

And change the sign of the value of the gc_content field if the
corresponding dir field is negative.

However, when I run this through the changedir function, all of the
gc_contents become negative.

Am I misunderstanding how to use the ifelse construct? And in that
case, how should I go about doing this in a different way?
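A likely cause, offered as a sketch: `ifelse()` evaluates its `yes` argument before its `no` argument, so in `changedir()` the assignment `gc_content <- -gc_content` inside `yes` negates the whole vector first, and `no` (`gc_content <- gc_content`) then returns that already-negated vector. Passing plain values instead of assignments avoids the problem:

```r
# ifelse() picks, element by element, from two vectors of the same length:
# -gc_content where dir is "-", and gc_content unchanged elsewhere
changedir <- function(dataframe) {
  ifelse(dataframe$dir == "-", -dataframe$gc_content, dataframe$gc_content)
}

# A data frame shaped like 'lala' in the post
lala <- data.frame(dir = rep(c("+", "-"), 10), gc_content = 0.5)
d <- changedir(lala)
d
```

The general rule is that `ifelse()` is a function that returns a vector, not a control-flow statement, so its arguments should be expressions that compute values rather than statements that modify variables.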

Thank you very much in advance for your help, and I hope that my
question is not too banal!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] function on factors - how best to proceed

2007-09-19 Thread Karin Lagesen

Sorry about this one being long, and I apologise beforehand if there
is something obvious here that I have missed. I am new to creating my
own functions in R, and I am uncertain of how they work.

I have a data set that I have read into a data frame:

> gctable[1:5,]
     refseq geometry X60_origin X60_terminus  length  kingdom
1 NC_009484      cir        179       773000 3389227 Bacteria
2 NC_009484      cir        179       773000 3389227 Bacteria
3 NC_009484      cir        179       773000 3389227 Bacteria
4 NC_009484      cir        179       773000 3389227 Bacteria
5 NC_009484      cir        179       773000 3389227 Bacteria
                  grp feature gene begin dir gc_content replicor LEADLAG
1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
> 

Most of these columns are factors.

Now, I have a function that I would like to employ on this data
frame. Right now I cannot get it to work, and that seems to be due to
the columns in the data frame being factors. I tested it with a data
frame created from vectors, and it worked fine.

The function:

percentdistance <- function(origin, terminus, length, begin, replicor) {
  print(c(origin, terminus, length, begin, replicor))
  d = 0
  if (terminus > origin) {
    if (replicor == "LEFT") {
      d = -((origin - begin) %% length)
    }
    else {
      d = (begin - origin)
    }
  }
  else {
    if (replicor == "LEFT") {
      d = (origin - begin)
    }
    else {
      d = -((begin - origin) %% length)
    }
  }
  d/length*2
}

The error I get:
> percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length,
+                 gctable$begin, gctable$replicor)
[1]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [19]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [37]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [55]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [73]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [91]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
  [109]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
  [127]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
...
[99919]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99937]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99955]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99973]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[1]   2   2   2   2   2   2   2   2   2
 [ reached getOption("max.print") -- omitted 8526091 entries ]]
Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: > not meaningful for factors in: Ops.factor(terminus, origin) 
2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) {
> 

This worked nicely when the inputs were columns from a data frame created
from vectors.

I have also tried the different apply-functions, although I am
uncertain of which one would be appropriate here.


I would like to use this function to create a new data frame which
would look something like this:

new_frame = data.frame(gctable$feature, gctable$gene, gctable$kingdom, gctable$grp,
                       gctable$gc_content, percentdistance(gctable))

I am uncertain of how to proceed. Should I deconstruct the data frame
within the function, or should I get just the numbers out of the
factors and input that into the function? Or is my solution way off
from how things are done in R?
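A possible way forward, sketched under assumptions. The warning "> not meaningful for factors" suggests the numeric columns were read in as factors (often because a column contains some non-numeric entry), and the printed 87s and 2s look like internal factor level codes rather than the actual numbers. `as.numeric(as.character(f))` recovers the values (see R FAQ 7.10). Separately, `if` only looks at a single value, so the branching below is rewritten with nested `ifelse()` calls that work on whole columns. The names `factor_to_num()`, `percentdistance_vec()` and the two-row `gc_small` data are made up for illustration; the arithmetic is copied from the function in the post.

```r
# Convert a factor that holds numbers back into numeric values.
# as.numeric(f) alone would return the internal level codes instead.
factor_to_num <- function(f) as.numeric(as.character(f))

# Vectorized version of percentdistance(): nested ifelse() calls replace
# the if/else branches so that whole columns can be passed in at once
percentdistance_vec <- function(origin, terminus, length, begin, replicor) {
  d <- ifelse(terminus > origin,
              ifelse(replicor == "LEFT",
                     -((origin - begin) %% length),
                     begin - origin),
              ifelse(replicor == "LEFT",
                     origin - begin,
                     -((begin - origin) %% length)))
  d / length * 2
}

# Two made-up rows whose columns are factors, as in gctable
gc_small <- data.frame(
  X60_origin   = factor(c("179", "179")),
  X60_terminus = factor(c("773000", "773000")),
  length       = factor(c("3389227", "3389227")),
  begin        = factor(c("261", "1737")),
  replicor     = factor(c("RIGHT", "LEFT"))
)

pd <- with(gc_small, percentdistance_vec(
  factor_to_num(X60_origin), factor_to_num(X60_terminus),
  factor_to_num(length), factor_to_num(begin), as.character(replicor)))
pd
```

The result could then be combined with other columns, e.g. `data.frame(gctable[c("feature", "gene", "kingdom", "grp", "gc_content")], percentdistance = pd)`. Alternatively, reading the file with `colClasses` or `stringsAsFactors = FALSE` may keep the numeric columns numeric from the start.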

Thank you very much for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



[R] add group to boxplot

2007-09-17 Thread Karin Lagesen


I have two sets of data in data frames (of different dimensions). I
am able to make boxplots for one of these. I am using a formula, which
gives me three boxplots that are automatically placed at at = c(1:3),
since there are three groups. However, I would now like to add data
from the other data frame at position 4 in the box plot. Is there any
way of telling the first boxplot command that this should be allowed?

Hopefully an example that expresses what I need:

> data(InsectSprays)
> boxplot(count~spray, data = InsectSprays)
> Aspray = subset(InsectSprays, spray == "A")
> Aspray[] = lapply(Aspray, function(x) if (is.factor(x)) factor(x) else x)

Now, I want to add Aspray as a new boxplot at the end of the existing plot.

What do I do then?
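One way to do this with base graphics, offered as a sketch: reserve room on the x axis with `xlim` in the first call, then draw the extra data as one more box with `add = TRUE`, placed at the next free position via `at`. The variable `n_groups` and the axis label "A (subset)" are introduced here for illustration. InsectSprays has six sprays, so the extra box goes at position 7; with three groups it would go at 4.

```r
data(InsectSprays)
Aspray <- subset(InsectSprays, spray == "A")

n_groups <- nlevels(InsectSprays$spray)  # 6 sprays, A to F

# First plot: leave room on the x axis for one extra box
boxplot(count ~ spray, data = InsectSprays, xlim = c(0.5, n_groups + 1.5))

# Add the other data set as a single box at the next position;
# axes = FALSE avoids redrawing the existing axes
extra <- boxplot(Aspray$count, at = n_groups + 1, add = TRUE, axes = FALSE)

# Label the new box on the x axis
axis(1, at = n_groups + 1, labels = "A (subset)")
```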


Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag
