[R] read.alignment() crashes

2016-02-29 Thread Eric Hu
Hi, I am trying to read a FASTA file with >15K aligned sequences. However, it 
crashes with a runtime error. I have tried both R 3.2.3 and R 3.2.1 under 
Windows 7. A smaller FASTA file works perfectly fine in either case.

library(seqinr)
myseqs <- read.alignment("aligned_seqs.fasta",format="fasta")

Thanks,
Eric
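
One way to narrow down a crash like this, offered only as a sketch: parse the 
file with base R and check that every record has the same width, since a 
truncated or unaligned record is one plausible culprit worth ruling out. The 
helper name read_fasta_lines() and the small temporary file are hypothetical, 
and this does not reproduce what seqinr's parser does internally.

```r
# Minimal FASTA reader in base R (a diagnostic sketch, not seqinr's parser).
read_fasta_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                  # drop empty lines
  is_header <- grepl("^>", lines)
  n <- sum(is_header)
  # Record index for every sequence line; lines before the first header are ignored
  record <- factor(cumsum(is_header)[!is_header], levels = seq_len(n))
  # Concatenate the sequence lines that belong to each record
  seqs <- as.character(tapply(lines[!is_header], record, paste, collapse = ""))
  seqs[is.na(seqs)] <- ""                        # headers with no sequence lines
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# Hypothetical toy file: s3 is one column shorter than the others.
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">s1", "ACGT-", "ACG", ">s2", "ACGTTACG", ">s3", "ACG-TAC"), tmp)

seqs   <- read_fasta_lines(tmp)
widths <- nchar(seqs)
bad    <- names(seqs)[widths != widths[1]]       # records whose width differs from the first
print(bad)
```

If every record has the same width, the problem more likely lies with the file 
size or with the package itself, and a reproducible example for the maintainer 
would be the next step.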


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Non-uniformly distributed plot

2011-01-04 Thread Eric Hu
Hi Dennis, David and Jorge,

I hope you all had a great holiday season. Thank you so much for the good 
comments. I think all your suggestions are sufficient for my current needs. I 
guess it is safe to say that there is no easy way to plot the data according to 
the x-axis range. By the way, Spotfire can achieve this beautifully.

Eric


From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Friday, December 24, 2010 12:43 PM
To: Eric Hu
Cc: r-help@r-project.org
Subject: Re: [R] Non-uniformly distributed plot

Hi:

One way to combine Jorge's and David's solutions is to visualize the data in 
ggplot2 and/or lattice:

library(ggplot2)
x <- c(0.349763, 3.39489, 1.52249, 0.269066, 0.107872, 0.0451689,
0.590268, 0.275755, 0.751845, 1.00599, 0.652409, 2.80664, 0.0269933,
0.137307, 0.282939, 1.23008, 0.436429, 0.0555626, 1.10624, 53,
1.30411, 1.29749, 53, 3.2552, 1.189, 2.23616, 1.13259, 0.505039,
1.05812, 1.18238, 0.500926, 1.0314, 0.733468, 3.13292, 1.26685,
3.10882, 1.01719, 0.13096, 0.0529692, 0.418408, 0.213299, 0.536631,
1.82336, 1.15287, 0.192519, 0.961295, 51, 0.470511, 4.05688,
1.78098, 0.364686, 1.24533)
y <- c(0.423279, 0.473681, 0.629478, 1.09712, 0.396239, 0.273577,
0.303214, 0.628386, 0.465841, 0.687251, 0.544569, 0.635805, 0.358983,
0.16519, 0.366217, 1.08421, 0.668939, 0.181861, 0.782656, 13.3816,
1.15256, 0.965943, 20, 2.86051, 0.304939, 1.94654, 0.967576,
0.647599, 0.520811, 1.27434, 0.363666, 0.93621, 0.544573, 0.696733,
1.0031, 3.78895, 0.694053, 0.289111, 0.178439, 0.746576, 0.391725,
0.363901, 1.20297, 0.461934, 0.364011, 0.691368, 20, 0.81947,
1.69594, 1.56381, 0.900398, 0.960948)

d <- data.frame(x, y)

g <- ggplot(d, aes(log(x), log(y)))
g + geom_point() + geom_smooth(colour = 'red', size = 1) +
  geom_smooth(method = 'lm', colour = 'blue', size = 1)

The default smooth is a loess curve, which shows the curvature present in the 
residual vs. fitted plot from Jorge's solution. The predicted values from the 
linear model in the log-log scale lie along the blue line. (To get rid of the 
confidence curves, add se = FALSE to both geom_smooth() calls above.) If you 
were to fit a model to these data in the log-log scale, the plot indicates that 
a quadratic polynomial would be a reasonable next step.
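
As a sketch of that next step, assuming an ordinary least-squares fit in the 
log-log scale is what is wanted: the block below fits the linear and the 
quadratic model with lm() and compares them with anova(). The names lx, ly, 
fit1 and fit2 are introduced here for illustration only.

```r
x <- c(0.349763, 3.39489, 1.52249, 0.269066, 0.107872, 0.0451689,
0.590268, 0.275755, 0.751845, 1.00599, 0.652409, 2.80664, 0.0269933,
0.137307, 0.282939, 1.23008, 0.436429, 0.0555626, 1.10624, 53,
1.30411, 1.29749, 53, 3.2552, 1.189, 2.23616, 1.13259, 0.505039,
1.05812, 1.18238, 0.500926, 1.0314, 0.733468, 3.13292, 1.26685,
3.10882, 1.01719, 0.13096, 0.0529692, 0.418408, 0.213299, 0.536631,
1.82336, 1.15287, 0.192519, 0.961295, 51, 0.470511, 4.05688,
1.78098, 0.364686, 1.24533)
y <- c(0.423279, 0.473681, 0.629478, 1.09712, 0.396239, 0.273577,
0.303214, 0.628386, 0.465841, 0.687251, 0.544569, 0.635805, 0.358983,
0.16519, 0.366217, 1.08421, 0.668939, 0.181861, 0.782656, 13.3816,
1.15256, 0.965943, 20, 2.86051, 0.304939, 1.94654, 0.967576,
0.647599, 0.520811, 1.27434, 0.363666, 0.93621, 0.544573, 0.696733,
1.0031, 3.78895, 0.694053, 0.289111, 0.178439, 0.746576, 0.391725,
0.363901, 1.20297, 0.461934, 0.364011, 0.691368, 20, 0.81947,
1.69594, 1.56381, 0.900398, 0.960948)

# Work in the log-log scale, as in the plots above
d2 <- data.frame(lx = log(x), ly = log(y))

fit1 <- lm(ly ~ lx, data = d2)               # straight line in log-log scale
fit2 <- lm(ly ~ lx + I(lx^2), data = d2)     # adds a quadratic term

# F test for whether the quadratic term improves on the straight line
print(anova(fit1, fit2))
```

Whether the extra term is worth keeping is a judgment call that should also 
take the residual plots into account.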

This is pretty easy to do in lattice as well (sans the confidence curves):

library(lattice)
xyplot(log(y) ~ log(x), data = d, type = c('p', 'r', 'smooth'),
pch = 16, col = 'black',
panel = function(x, y, ...) {
   panel.xyplot(x, y, ..., col.line = 'blue')
   panel.loess(x, y, col.line = 'red')
   }
)

I needed to write a small panel function to get separate colors for the least 
squares line and loess curves, but maybe there's an easier way (col.line = 
c('blue', 'red') by itself doesn't work - I tried that - and it makes sense to 
me why it doesn't).

Dennis

On Thu, Dec 23, 2010 at 3:50 PM, David Winsemius 
<dwinsem...@comcast.net> wrote:

On Dec 23, 2010, at 6:41 PM, David Winsemius wrote:

On Dec 23, 2010, at 5:55 PM, Eric Hu wrote:
Thanks David. I am reposting the data here.

Jorge has already responded masterfully. He's apparently less lazy than I and 
did all the editing. A log transformation as he illustrated can be very useful 
with bivariate skewed distributions. The only variation I would have suggested 
would be to record the default par settings and restore them at the end.
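
A minimal sketch of that save-and-restore idiom, assuming the code changes 
par(mfrow = ...) for a multi-panel display. The six data points are a subset of 
the posted x and y, used only to keep the example short.

```r
# Subset of the posted data (six of the 52 points), for illustration only
x <- c(0.349763, 3.39489, 1.52249, 0.269066, 53, 1.189)
y <- c(0.423279, 0.473681, 0.629478, 1.09712, 13.3816, 0.304939)

# par() returns the previous values of the settings it changes
op <- par(mfrow = c(1, 2))

plot(x, y, main = "raw scale")
plot(log(x), log(y), main = "log-log scale")

# Restore the settings that were in effect before the call above
par(op)
```

Inside a function the same effect is usually obtained with on.exit(par(op)), 
so the settings are restored even if an error occurs.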

You could also repeat the plot and use lines() to look at the predicted values

plot(x, y, log = "xy")
lines( log(x), fit$predicted)

It's complementary to the residual plot and the QQ plot in the plot.lm display 
for consideration of the possibility that this may not be a truly 
log-log-linear relationship.
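
A hedged sketch of that overlay: since Jorge's code is not shown in this 
message, it assumes fit is a linear model of log(y) on log(x). With log = "xy" 
the axes are logarithmic but coordinates are still given in the original units, 
so the fitted values are back-transformed with exp(). The data are a subset of 
the posted values.

```r
# Subset of the posted data (six of the 52 points), for illustration only
x <- c(0.349763, 3.39489, 1.52249, 0.269066, 53, 1.189)
y <- c(0.423279, 0.473681, 0.629478, 1.09712, 13.3816, 0.304939)

# Assumed model: straight line in the log-log scale
fit <- lm(log(y) ~ log(x))

plot(x, y, log = "xy")

# Sort by x so the line is drawn left to right; back-transform the fitted values
ord  <- order(x)
yhat <- exp(fitted(fit))
lines(x[ord], yhat[ord], col = "blue")
```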



--
David

Eric

Hi,

I would like to plot a linear relationship between variable x and y.
Can anyone help me with scaled plotting and axes so that all data
points can be visualized somehow evenly? Plain plot(x,y) will
generate condensed points near (0,0) due to several large data
points. Thank you.

Eric


dput(x)
c(0.349763, 3.39489, 1.52249, 0.269066, 0.107872, 0.0451689,
0.590268, 0.275755, 0.751845, 1.00599, 0.652409, 2.80664, 0.0269933,
0.137307, 0.282939, 1.23008, 0.436429, 0.0555626, 1.10624, 53,
1.30411, 1.29749, 53, 3.2552, 1.189, 2.23616, 1.13259, 0.505039,
1.05812, 1.18238, 0.500926, 1.0314, 0.733468, 3.13292, 1.26685,
3.10882, 1.01719, 0.13096, 0.0529692, 0.418408, 0.213299, 0.536631,
1.82336, 1.15287, 0.192519, 0.961295, 51, 0.470511, 4.05688,
1.78098, 0.364686, 1.24533)
dput(y)
c(0.423279, 0.473681, 0.629478, 1.09712, 0.396239, 0.273577,
0.303214, 0.628386, 0.465841, 0.687251, 0.544569, 0.635805, 0.358983,
0.16519, 0.366217, 1.08421, 0.668939, 0.181861, 0.782656, 13.3816,
1.15256, 0.965943, 20, 2.86051, 0.304939, 1.94654, 0.967576,
0.647599, 0.520811, 1.27434, 0.363666, 0.93621, 0.544573, 0.696733,
1.0031, 3.78895, 0.694053, 0.289111, 0.178439, 0.746576, 0.391725,
0.363901

[R] Non-uniformly distributed plot

2010-12-23 Thread Eric Hu
Hi,

I would like to plot a linear relationship between variable x and y. Can anyone 
help me with scaled plotting and axes so that all data points can be visualized 
somehow evenly? Plain plot(x,y) will generate condensed points near (0,0) due 
to several large data points. Thank you.

Eric


> x
 [1]  0.3497630  3.3948900  1.5224900  0.2690660  0.1078720  0.0451689  0.5902680  0.2757550  0.7518450
[10]  1.0059900  0.6524090  2.8066400  0.0269933  0.1373070  0.2829390  1.2300800  0.4364290  0.0555626
[19]  1.1062400 53.0000000  1.3041100  1.2974900 53.0000000  3.2552000  1.1890000  2.2361600  1.1325900
[28]  0.5050390  1.0581200  1.1823800  0.5009260  1.0314000  0.7334680  3.1329200  1.2668500  3.1088200
[37]  1.0171900  0.1309600  0.0529692  0.4184080  0.2132990  0.5366310  1.8233600  1.1528700  0.1925190
[46]  0.9612950 51.0000000  0.4705110  4.0568800  1.7809800  0.3646860  1.2453300
> y
 [1]  0.423279  0.473681  0.629478  1.097120  0.396239  0.273577  0.303214  0.628386  0.465841
[10]  0.687251  0.544569  0.635805  0.358983  0.165190  0.366217  1.084210  0.668939  0.181861
[19]  0.782656 13.381600  1.152560  0.965943 20.000000  2.860510  0.304939  1.946540  0.967576
[28]  0.647599  0.520811  1.274340  0.363666  0.936210  0.544573  0.696733  1.003100  3.788950
[37]  0.694053  0.289111  0.178439  0.746576  0.391725  0.363901  1.202970  0.461934  0.364011
[46]  0.691368 20.000000  0.819470  1.695940  1.563810  0.900398  0.960948



Re: [R] Non-uniformly distributed plot

2010-12-23 Thread Eric Hu
Thanks David. I am reposting the data here.

Eric


 Hi,

 I would like to plot a linear relationship between variable x and y.  
 Can anyone help me with scaled plotting and axes so that all data  
 points can be visualized somehow evenly? Plain plot(x,y) will
 generate condensed points near (0,0) due to several large data  
 points. Thank you.

 Eric


 dput(x)
c(0.349763, 3.39489, 1.52249, 0.269066, 0.107872, 0.0451689, 
0.590268, 0.275755, 0.751845, 1.00599, 0.652409, 2.80664, 0.0269933, 
0.137307, 0.282939, 1.23008, 0.436429, 0.0555626, 1.10624, 53, 
1.30411, 1.29749, 53, 3.2552, 1.189, 2.23616, 1.13259, 0.505039, 
1.05812, 1.18238, 0.500926, 1.0314, 0.733468, 3.13292, 1.26685, 
3.10882, 1.01719, 0.13096, 0.0529692, 0.418408, 0.213299, 0.536631, 
1.82336, 1.15287, 0.192519, 0.961295, 51, 0.470511, 4.05688, 
1.78098, 0.364686, 1.24533)
 dput(y)
c(0.423279, 0.473681, 0.629478, 1.09712, 0.396239, 0.273577, 
0.303214, 0.628386, 0.465841, 0.687251, 0.544569, 0.635805, 0.358983, 
0.16519, 0.366217, 1.08421, 0.668939, 0.181861, 0.782656, 13.3816, 
1.15256, 0.965943, 20, 2.86051, 0.304939, 1.94654, 0.967576, 
0.647599, 0.520811, 1.27434, 0.363666, 0.93621, 0.544573, 0.696733, 
1.0031, 3.78895, 0.694053, 0.289111, 0.178439, 0.746576, 0.391725, 
0.363901, 1.20297, 0.461934, 0.364011, 0.691368, 20, 0.81947, 
1.69594, 1.56381, 0.900398, 0.960948)



Re: [R] Pipeline pilot fingerprint package

2010-10-13 Thread Eric Hu
Hi Rajarshi,

Here is a post I found from Pipeline pilot community help pages:

https://community.accelrys.com/message/3466


Eric

-----Original Message-----
From: Rajarshi Guha [mailto:rajarshi.g...@gmail.com] 
Sent: Wednesday, October 13, 2010 7:52 AM
To: Eric Hu
Cc: r-help@r-project.org
Subject: Re: Pipeline pilot fingerprint package

On Tue, Oct 12, 2010 at 8:54 PM, Eric Hu <eric...@gilead.com> wrote:
 Hi,

 I am trying to see if I can use R to perform more rigorous regression
 analysis. I wonder if the fingerprint package is able to handle pipeline
 pilot fingerprints (ECFC6 etc) now.

Currently no - does Pipeline Pilot output their ECFPs in a standard
format? If so, can you send me an example file? (Assuming they output
fp's for a single molecule on a single row, you could implement your
own line parser and supply it via the lf argument in fp.read. See
cdk.lf, moe.lf or bci.lf for examples.)

The other issue is how one evaluates similarity between variable
length feature fingerprints, such as ECFPs. One approach is to map the
features into a fixed length bit string. Another approach is to just
look at intersections and unions of features to evaluate the Tanimoto
score. It seems to me that the former leads to loss of resolution and
that the latter could lead to generally low Tanimoto scores.
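
The two approaches can be sketched in base R as follows. Everything here is an 
assumption made for illustration: the feature identifiers are made-up 
integers, the function names are hypothetical, the modulo folding is just one 
simple way to map features to bits, and none of it is claimed to be what 
Pipeline Pilot or the fingerprint package does.

```r
# Tanimoto score on raw feature sets: |A and B| / |A or B|
tanimoto_sets <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  if (n_union == 0) return(NA_real_)   # undefined for two empty fingerprints
  length(intersect(a, b)) / n_union
}

# Fold integer feature identifiers into a fixed-length bit vector.
# Modulo folding is an assumption; distinct features can collide on one bit.
fold_features <- function(features, nbits = 1024L) {
  bits <- logical(nbits)
  bits[(abs(features) %% nbits) + 1] <- TRUE
  bits
}

# Tanimoto score on two bit vectors of equal length
tanimoto_bits <- function(p, q) sum(p & q) / sum(p | q)

# Hypothetical feature identifiers for two molecules
fp1 <- c(1001, 2057, 33, 98765)
fp2 <- c(1001, 9, 33, 4153)

s_sets <- tanimoto_sets(fp1, fp2)      # 2 shared features out of 6 distinct ones

# nbits = 8 is deliberately tiny so that collisions are visible
s_bits <- tanimoto_bits(fold_features(fp1, 8L), fold_features(fp2, 8L))

print(c(sets = s_sets, folded = s_bits))
```

With a very short bit string, unrelated features land on the same bit and the 
folded score departs from the set-based one, which is the loss of resolution 
mentioned above.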

Do you know what Pipeline Pilot does?
-- 
Rajarshi Guha
NIH Chemical Genomics Center



[R] Pipeline pilot fingerprint package

2010-10-12 Thread Eric Hu
Hi,
I am trying to see if I can use R to perform more rigorous regression analysis. 
I wonder if the fingerprint package is able to handle pipeline pilot 
fingerprints (ECFC6 etc) now.

Thank you,
Eric



[R] Plot time range with rect or boxplot

2010-10-09 Thread Eric Hu
Hi,
I am trying to use rect (R 2.11) to plot a set of data as follows
 
 
> data
  Company    Pt     Pri     Pub
1       A WO520  8/5/09 2/11/10
2       B WO893 7/30/03 2/24/05
3       A WO258 12/8/08 6/17/10
4       C WO248 1/13/09  9/2/10


pri <- strptime(pri, "%m/%d/%y")
pub <- strptime(pub, "%m/%d/%y")

plot.new()
plot.window(xlim=c(min(pri,pub),max(pri,pub)),ylim=c(0,length(company)-1))
y <- seq(0,0.5*(length(company)-1),0.5)
h <- 0.1
rect(pri, y-h, pub, y+h, col=c("light blue","pink","yellow","red"))
 
Neither xlim nor rect/boxplot recognizes pri/pub in date format. I wonder if 
there is a good way to deal with the date plotting so the x-axis can reflect the 
actual time range.
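
One possible approach, sketched under the assumption that the dates are in 
month/day/two-digit-year form as in the table above: convert with as.Date(), 
which yields values that rect() can use as coordinates, set up the plot with 
the x-axis suppressed, and draw a date axis with axis.Date(). The data frame 
dat is rebuilt here only to make the example self-contained.

```r
dat <- data.frame(
  Company = c("A", "B", "A", "C"),
  Pt      = c("WO520", "WO893", "WO258", "WO248"),
  Pri     = c("8/5/09", "7/30/03", "12/8/08", "1/13/09"),
  Pub     = c("2/11/10", "2/24/05", "6/17/10", "9/2/10"),
  stringsAsFactors = FALSE
)

# Date objects are stored as days since 1970-01-01
pri <- as.Date(dat$Pri, format = "%m/%d/%y")
pub <- as.Date(dat$Pub, format = "%m/%d/%y")

y <- seq(0, by = 0.5, length.out = nrow(dat))   # one row per record
h <- 0.1                                        # half-height of each bar

# Empty plot spanning the full date range; both axes are drawn by hand below
plot(range(c(pri, pub)), range(y) + c(-h, h), type = "n",
     xaxt = "n", yaxt = "n", xlab = "Date", ylab = "")
rect(pri, y - h, pub, y + h,
     col = c("lightblue", "pink", "yellow", "red"))

# Yearly tick marks on the x-axis, record labels on the y-axis
axis.Date(1, at = seq(as.Date("2003-01-01"), as.Date("2011-01-01"), by = "year"),
          format = "%Y")
axis(2, at = y, labels = dat$Pt, las = 1)
```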
 
Thank you,
Eric


[R] Plot data on year/mon/day

2010-10-08 Thread Eric Hu
Hi,
I am trying to use rect (R 2.11) to plot a set of data as follows

Company  Pt    Pri        Pub
A        W200  4/5/2009   3/11/2010
B        W293  2/30/2003  3/24/2005
A        W258  2/8/2008   8/17/2010
C        W248  5/13/2009  1/2/2010


y <- seq(0,0.5*(length(company)-1),0.5)
h <- 0.1
rect(pri, y-h, pub, y+h, col=c("light blue","pink","yellow","red"))

I wonder if there is a good way to deal with the date conversion so the x-axis 
can reflect the actual time range.
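
A small sketch of the conversion step for this table, assuming 
month/day/four-digit-year strings. as.Date() returns NA for a string that is 
not a real calendar date, such as 2/30/2003 in the Pri column, so it is worth 
checking for NA before plotting. The vector name pri_raw is introduced here 
for illustration.

```r
# Pri column from the table above, as character strings
pri_raw <- c("4/5/2009", "2/30/2003", "2/8/2008", "5/13/2009")

# %Y is the four-digit year (the two-digit form would be %y)
pri <- as.Date(pri_raw, format = "%m/%d/%Y")

# Entries that could not be parsed as real calendar dates
bad <- pri_raw[is.na(pri)]
print(bad)

# Numeric form (days since 1970-01-01), usable as plotting coordinates
pri_num <- as.numeric(pri)
```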

Thank you,
Eric


