Re: [R] slow dget

2010-11-05 Thread jim holtman
dput/dget were not intended to save/restore large objects.  Understand
what is happening in the use of dput/dget.  dput is creating a text
file that can reconstitute the object with dget.  dget is having to
read the file in and then parse it:

> dget
function (file)
eval(parse(file = file))


This can be an expensive process if the object is large and complex.

save/load basically take the binary object and save it with little
additional processing and the load is just as fast.
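
As a rough illustration of the difference (the object below is made up,
so the exact timings will vary):

# hypothetical example: a moderately large data frame
x <- data.frame(matrix(rnorm(1e6), ncol = 10))

dput(x, file = "x.txt")              # write the text representation
system.time(y <- dget("x.txt"))      # read + parse + eval: slow for large objects

save(x, file = "x.RData")            # binary serialization
system.time(load("x.RData"))         # restores the same object much faster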

In general, most functions can be used both correctly and
incorrectly.  Should a warning for every potential misuse be put in
the help file?  Probably not.  It is hard to protect users against
themselves.

So timing the alternatives, as you are doing, is a good way to learn
and will help you improve your use of these features.

On Fri, Nov 5, 2010 at 11:16 PM, Jack Tanner  wrote:
> I have a data structure that is fast to dput(), but very slow to dget(). On
> disk, the file is about 35MB.
>
>> system.time(dget("r.txt"))
>   user  system elapsed
>  142.93    1.27  192.84
>
> The same data structure is fast to save() and fast to load(). The .RData file
> on disk is about 12MB.
>
>> system.time(load("r.RData"))
>   user  system elapsed
>   4.89    0.08    7.82
>
> I imagine that this is a known speed issue with dget, and that the recommended
> solution is to use load, which is fine with me. If so, perhaps a note to this
> effect could be added to the dget help page.
>
> All timings above using
>
> R version 2.12.0 (2010-10-15)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate probability

2010-11-05 Thread Joshua Wiley
Hi Jumlong,

Is this what you want?

> pnorm(q = c(2.4, 2.9), mean = 2, sd = 1)
[1] 0.6554217 0.8159399
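
If instead you want the probabilities above each score, i.e. P(X > x),
flip lower.tail:

> pnorm(q = c(2.4, 2.9), mean = 2, sd = 1, lower.tail = FALSE)
[1] 0.3445783 0.1840601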

HTH,

Josh


On Fri, Nov 5, 2010 at 9:57 PM, Jumlong Vongprasert
 wrote:
> Dear Joshua Wiley
> 2.4 and 2.9 are scores, and mean = 2, variance = 1, n = 10, with a normal
> distribution.
> Many Thanks.
> Jumlong
>
> 2010/11/6 Joshua Wiley 
>>
>> Dear Jumlong,
>>
>> Perhaps look at ?pnorm
>>
>> I am not really certain what you want to do.  Are 2.4 and 2.9 scores
>> or means?  Is the variance 2?  What distribution are you assuming
>> these values come from?  If you explain a bit more what you are after,
>> we can help more.
>>
>> Cheers,
>>
>> Josh
>>
>> On Fri, Nov 5, 2010 at 9:47 PM, Jumlong Vongprasert
>>  wrote:
>> > Dear All
>> > I have 2 values, 2.4 and 2.9, and mean = 2, variance = , with n = 10.
>> > I want to find the probability for 2.4 and 2.9.
>> > How can I do this?
>> > Many Thanks.
>> > Jumlong
>> >
>> > --
>> > Jumlong Vongprasert Assist, Prof.
>> > Institute of Research and Development
>> > Ubon Ratchathani Rajabhat University
>> > Ubon Ratchathani
>> > THAILAND
>> > 34000
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
>
>
>
> --
> Jumlong Vongprasert Assist, Prof.
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani
> THAILAND
> 34000
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate probability

2010-11-05 Thread Joshua Wiley
Dear Jumlong,

Perhaps look at ?pnorm

I am not really certain what you want to do.  Are 2.4 and 2.9 scores
or means?  Is the variance 2?  What distribution are you assuming
these values come from?  If you explain a bit more what you are after,
we can help more.

Cheers,

Josh

On Fri, Nov 5, 2010 at 9:47 PM, Jumlong Vongprasert
 wrote:
> Dear All
> I have 2 values, 2.4 and 2.9, and mean = 2, variance = , with n = 10.
> I want to find the probability for 2.4 and 2.9.
> How can I do this?
> Many Thanks.
> Jumlong
>
> --
> Jumlong Vongprasert Assist, Prof.
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani
> THAILAND
> 34000
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calculate probability

2010-11-05 Thread Jumlong Vongprasert
Dear All
I have 2 values, 2.4 and 2.9, and mean = 2, variance = , with n = 10.
I want to find the probability for 2.4 and 2.9.
How can I do this?
Many Thanks.
Jumlong

-- 
Jumlong Vongprasert Assist, Prof.
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND
34000

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calcute probability

2010-11-05 Thread Jumlong Vongprasert
Dear All

-- 
Jumlong Vongprasert Assist, Prof.
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND
34000

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assignment operator saving factor level as number

2010-11-05 Thread Jeffrey Spies
Glad you figured it out, but just be aware that if you set one value
of the column to be a character, it will make the whole vector
characters.  This could cause issues for analysis if you need numerics
or factors.  If the column is supposed to be a factor to begin with,
set it to be so; if you have two data frames, one with a column of
factors (dat2) and one with what should be a column of factors (dat1),
you can use something like this:

# note: factor(), not as.factor(), accepts a levels= argument
dat1$columnThatShouldBeFactors <- factor(
  dat1$columnThatShouldBeFactors,
  levels = levels(dat2$columnThatIsAlreadyFactors)
)
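
A tiny made-up check of that idea (dat1/dat2 below are toy data frames,
just to show the effect):

dat2 <- data.frame(columnThatIsAlreadyFactors = factor(c("a", "b", "c")))
dat1 <- data.frame(columnThatShouldBeFactors = c("b", "c", "b"),
                   stringsAsFactors = FALSE)

dat1$columnThatShouldBeFactors <- factor(
  dat1$columnThatShouldBeFactors,
  levels = levels(dat2$columnThatIsAlreadyFactors)
)

str(dat1$columnThatShouldBeFactors)
# Factor w/ 3 levels "a","b","c": 2 3 2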

Cheers,

Jeff.

On Fri, Nov 5, 2010 at 6:03 PM, Wade Wall  wrote:
> Hi all,
>
> Thanks for the help.  Jeffrey was right; my initial dataframe did not have
> the columns defined for factors.  I solved it using Jorge's example of using
> as.character.
>
> Sorry for not being more clear before.
>
> Wade
>
> On Fri, Nov 5, 2010 at 4:12 PM, Jeffrey Spies  wrote:
>>
>> Perhaps this will help:
>>
>> > test1 <- test2 <- data.frame(col1=factor(c(1:3), labels=c("a", "b",
>> > "c")))
>> > test3 <- data.frame(col1 = 1:3)
>>
>> Now:
>>
>> > test2[2,1] <- test1$col1[1]
>> > test2$col1
>> [1] a a c
>> Levels: a b c
>>
>> vs
>>
>> > test3[2,1] <- test1$col1[1]
>> > test3$col1
>> [1] 1 1 3
>>
>> Because test3's first column, col1, is a vector of numeric, and each
>> element of a vector must have the same data type (numeric, factor,
>> etc), it will coerce the data coming in to have the same data type (if
>> it can).  In this case, the data type is numeric.  Had it been a
>> character coming in, because it can't coerce a character to a numeric,
>> it would have made the entire vector a vector of characters:
>>
>> > test3[2,1] <- 'b'
>> > test3$col1
>> [1] "1" "b" "3"
>>
>> Hope that demonstrates what's probably going on,
>>
>> Jeff.
>>
>> On Fri, Nov 5, 2010 at 3:54 PM, Wade Wall  wrote:
>> > Hi all,
>> >
>> > I have a dataframe (df1) that I am trying to select values from to a
>> > second
>> > dataframe that at the current time is only for the selected items from
>> > df1
>> > (df2).  The values that I am trying to save from df1 are factors with
>> > alphanumeric names
>> >
>> > df1 looks like this:
>> >
>> > 'data.frame':   3014 obs. of  13 variables:
>> >  $ Num         : int  1 1 1 2 2 2 3 3 3 4 ...
>> >  $ Tag_Num     : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173
>> > ...
>> >  $ Site        : Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1
>> > 1 1
>> > 1 1 1 ...
>> >  $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3
>> > 3 3
>> > 4 ...
>> >   ...
>> >  $ Area        : num  463.3 29.5 101.8 152.9 34.6 ...
>> >
>> > However, whenever I try to assign values, like this
>> >
>> > df2[j,1]<-df2$Site[i]
>> >
>> > the values are changed from alphanumeric (e.g. PYBR003A) to numerals
>> > (e.g.
>> > 1).
>> >
>> > Does anyone know why this is happening and how I can assign the actual
>> > values from df1 to df2?
>> >
>> > Thanks in advance,
>> >
>> > Wade
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] slow dget

2010-11-05 Thread Jack Tanner
I have a data structure that is fast to dput(), but very slow to dget(). On
disk, the file is about 35MB.

> system.time(dget("r.txt"))
   user  system elapsed 
 142.93    1.27  192.84

The same data structure is fast to save() and fast to load(). The .RData file on
disk is about 12MB.

> system.time(load("r.RData"))
   user  system elapsed 
   4.89    0.08    7.82

I imagine that this is a known speed issue with dget, and that the recommended
solution is to use load, which is fine with me. If so, perhaps a note to this
effect could be added to the dget help page.

All timings above using

R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prob with legend in my plots!

2010-11-05 Thread David Winsemius


On Nov 5, 2010, at 6:59 PM, govin...@msu.edu wrote:


Hi,

I have a problem with the appearance of the legend in my plots. If I
specify the legend position with a keyword like "topright", it appears;
if I specify it with coordinates like "-1, 1", it does not appear.


Is -1, 1 a valid location in the range and domain of the arguments?


Can anyone help me with this?

script -
x.date <- as.Date(paste(year, month, day, sep="-"))
ts1.n.e3 <- ts(data.nemr.e3[,3])
z1.n.e3 <- zoo(ts1.n.e3, x.date)
plot(z1.n.e3, ylim = c(min(data.nemr.e3[,3]), max(data.nemr.e3[,5])),
     col="orange", main = "Monthly variations of SST in El-Nino3",
     xlab = "Year", ylab="SST (deg C)")

lines(z2.n.e3, lty = 2, col="red2")
lines(z3.n.e3, lty = 3, col="maroon3", lwd=1)
legend(-0.1, -0.1, legend=c("min", "mean", "max"), lty=c(1,2,3),
col=c("orange", "red2", "maroon3"))

attached is my plot!


No, it's not. Neither is any data with which to reproduce the problem.  
Please read the Posting Guide more thoroughly.
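
For what it is worth, the x/y form of legend() is interpreted in user
(data) coordinates, so on a plot whose x-axis is dates a value like -0.1
falls far outside the visible region.  A minimal sketch, reusing the
poster's colours and line types (untested against the real data):

usr <- par("usr")                      # plot region limits: x1, x2, y1, y2
legend(x = usr[1], y = usr[4],         # top-left corner of the plot region
       legend = c("min", "mean", "max"),
       lty = 1:3, col = c("orange", "red2", "maroon3"))

# or simply use a keyword position:
# legend("topleft", legend = c("min", "mean", "max"), lty = 1:3,
#        col = c("orange", "red2", "maroon3"))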






--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract Friday data from daily data.

2010-11-05 Thread Gabor Grothendieck
On Fri, Nov 5, 2010 at 8:24 PM, Gabor Grothendieck
 wrote:
> On Fri, Nov 5, 2010 at 1:22 PM, thornbird  wrote:
>>
>> I am new to Using R for data analysis. I have an incomplete time series
>> dataset that is in daily format. I want to extract only Friday data from it.
>> However, there are two problems with it.
>>
>> First, if Friday data is missing in that week, I need to extract the data of
>> the day prior to that Friday (e.g. Thursday).
>>
>> Second, sometimes there are duplicate Friday data (say Friday morning and
>> afternoon), but I only need the latest one (Friday afternoon).
>>
>> My question is how I can only extract the Friday data and make it a new
>> dataset so that I have data for every single week for the convenience of
>> data analysis.
>>
>
>
> There are several approaches depending on exactly what is to be
> produced.  We show two of them here using zoo.
>
>
> # read in data
>
> Lines <- "  views  number  timestamp day            time
> 1  views  910401 1246192687 Sun 6/28/2009 12:38
> 2  views  921537 1246278917 Mon 6/29/2009 12:35
> 3  views  934280 1246365403 Tue 6/30/2009 12:36
> 4  views  986463 1246888699 Mon  7/6/2009 13:58
> 5  views  995002 1246970243 Tue  7/7/2009 12:37
> 6  views 1005211 1247079398 Wed  7/8/2009 18:56
> 7  views 1011144 1247135553 Thu  7/9/2009 10:32
> 8  views 1026765 1247308591 Sat 7/11/2009 10:36
> 9  views 1036856 1247436951 Sun 7/12/2009 22:15
> 10 views 1040909 1247481564 Mon 7/13/2009 10:39
> 11 views 1057337 1247568387 Tue 7/14/2009 10:46
> 12 views 1066999 1247665787 Wed 7/15/2009 13:49
> 13 views 1077726 1247778752 Thu 7/16/2009 21:12
> 14 views 1083059 1247845413 Fri 7/17/2009 15:43
> 15 views 1083059 1247845824 Fri 7/17/2009 18:45
> 16 views 1089529 1247914194 Sat 7/18/2009 10:49"
>
> library(zoo)
>
> # read in and create a zoo series
> # - skip= over the header
> # - index=. the time index is third non-removed column.
> # - format=. convert the index to Date class using indicated format
> # - col.names= as specified
> # - aggregate= over duplicate dates keeping last
> # - colClasses= specifies "NULL" for columns we want to remove
>
> colClasses <-
>  c("NULL", "NULL", "numeric", "numeric", "NULL", "character", "NULL")
>
> col.names <- c(NA, NA, "views", "number", NA, NA, NA)
>
> # z <- read.zoo("myfile.dat", skip = 1, index = 3,
> z <- read.zoo(textConnection(Lines), skip = 1, index = 3,
>        format = "%m/%d/%Y", col.names = col.names,
>        aggregate = function(x) tail(x, 1), colClasses = colClasses)
>
> ## Now that we have read it in lets process it
>
> ## 1.
>
> # extract all Thursdays and Fridays
> z45 <- z[format(time(z), "%w") %in% 4:5,]
>
> # keep last entry in each week
> # and show result on R console
> z45[!duplicated(format(time(z45), "%U"), fromLast = TRUE), ]
>
>
> # 2. alternative approach
> # above approach labels each point as it was originally labelled
> # so if Thursday is used it gets the date of that Thursday
> # Another approach is to always label the resulting point as Friday
> # and also use the last available value even if its not Thursday
>
> # create daily grid
> g <- seq(start(z), end(z), by = "day")
>
> # fill in daily grid so Friday is filled in with prior value
> # if Friday is NA
> z.filled <- na.locf(z, xout = g)
>
> # extract Fridays (including those filled in from previous)
> # and show result on R console
> z.filled[format(time(z.filled), "%w") == "5", ]
>

Note that if the data can span more than one year then "%U" above
should be replaced with "%Y-%U" so that weeks in one year are not
lumped with weeks in other years.
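
That is, the week-selection line above becomes:

z45[!duplicated(format(time(z45), "%Y-%U"), fromLast = TRUE), ]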


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] prob with legend in my plots!

2010-11-05 Thread govindas


Hi,

I have a problem with the appearance of the legend in my plots. If I
specify the legend position with a keyword like "topright", it appears;
if I specify it with coordinates like "-1, 1", it does not appear. Can
anyone help me with this?

script - 
x.date <- as.Date(paste(year, month, day, sep="-"))
ts1.n.e3 <- ts(data.nemr.e3[,3])
z1.n.e3 <- zoo(ts1.n.e3, x.date)
plot(z1.n.e3, ylim = c(min(data.nemr.e3[,3]), max(data.nemr.e3[,5])),
     col="orange", main = "Monthly variations of SST in El-Nino3",
     xlab = "Year", ylab="SST (deg C)")
lines(z2.n.e3, lty = 2, col="red2")
lines(z3.n.e3, lty = 3, col="maroon3", lwd=1)
legend(-0.1, -0.1, legend=c("min", "mean", "max"), lty=c(1,2,3),
col=c("orange", "red2", "maroon3"))

attached is my plot! suggestions are welcome! thanks in advance!

--
Regards,
Maha!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] variable type assignment in daisy

2010-11-05 Thread Penny Adversario
Dear Rhelp,
 
I ran daisy on 5 lifestyle variables, 3 of which were nominal and 2
ordinal, and assigned the types "nominal" and "ordinal" to those
variables, respectively.  The output indicates their types as "I", for
interval(?).  Running daisy on the R example data "flower" returned the
same types in the output as the types the variables were assigned.  Why
is this so?  Below are the code and output.
 
sfq    is a nominal variable with 5 categories pertaining to
       smoking frequency and consumption (1=none, 2=≤10
       sticks/day somedays, 3=>10 sticks/day somedays, 4=≤10
       sticks/day daily, 5=>10 sticks/day daily)
afq    is a nominal variable with 5 categories pertaining to
       alcohol frequency and consumption
pafd   is a nominal variable with 5 categories pertaining to
       physical activity frequency and duration
dietp1 is an ordinal variable with 3 categories pertaining to low,
       medium, high consumption of Western diet
dietp3 is an ordinal variable with 3 categories pertaining to
       low, medium, high consumption of prudent diet
   
 
>head(lsclusjt3)
  sfq afq pafd dietp1 dietp3
1   1   1    3      1      2
2   1   1    3      3      3
3   1   1    1      2      1
4   1   1    1      2      2
5   1   2    3      3      3
6   1   1    1      2      2
 
>dm=daisy(lsclusjt3, metric="gower", stand=FALSE, type=list(nominal=c(1,2,3), ordinal=c(4,5)))
>summary(dm)
38434528 dissimilarities, summarized :
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.25000 0.35000 0.36599 0.50000 1.00000
Metric :  mixed ;  Types = I, I, I, I, I
Number of objects : 8768
 
>dfl=daisy(flower,type=list(asymm=1:3,nominal=4,ordinal=5:6,interval=7:8))
> summary(dfl)
153 dissimilarities, summarized :
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.15915 0.43576 0.53408 0.53473 0.62908 0.89099 
Metric :  mixed ;  Types = A, A, A, N, O, O, I, I 
Number of objects : 18
 
 
Penny
 


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract Friday data from daily data.

2010-11-05 Thread Gabor Grothendieck
On Fri, Nov 5, 2010 at 1:22 PM, thornbird  wrote:
>
> I am new to Using R for data analysis. I have an incomplete time series
> dataset that is in daily format. I want to extract only Friday data from it.
> However, there are two problems with it.
>
> First, if Friday data is missing in that week, I need to extract the data of
> the day prior to that Friday (e.g. Thursday).
>
> Second, sometimes there are duplicate Friday data (say Friday morning and
> afternoon), but I only need the latest one (Friday afternoon).
>
> My question is how I can only extract the Friday data and make it a new
> dataset so that I have data for every single week for the convenience of
> data analysis.
>


There are several approaches depending on exactly what is to be
produced.  We show two of them here using zoo.


# read in data

Lines <- "  views  number  timestamp daytime
1  views  910401 1246192687 Sun 6/28/2009 12:38
2  views  921537 1246278917 Mon 6/29/2009 12:35
3  views  934280 1246365403 Tue 6/30/2009 12:36
4  views  986463 1246888699 Mon  7/6/2009 13:58
5  views  995002 1246970243 Tue  7/7/2009 12:37
6  views 1005211 1247079398 Wed  7/8/2009 18:56
7  views 1011144 1247135553 Thu  7/9/2009 10:32
8  views 1026765 1247308591 Sat 7/11/2009 10:36
9  views 1036856 1247436951 Sun 7/12/2009 22:15
10 views 1040909 1247481564 Mon 7/13/2009 10:39
11 views 1057337 1247568387 Tue 7/14/2009 10:46
12 views 1066999 1247665787 Wed 7/15/2009 13:49
13 views 1077726 1247778752 Thu 7/16/2009 21:12
14 views 1083059 1247845413 Fri 7/17/2009 15:43
15 views 1083059 1247845824 Fri 7/17/2009 18:45
16 views 1089529 1247914194 Sat 7/18/2009 10:49"

library(zoo)

# read in and create a zoo series
# - skip= over the header
# - index=. the time index is third non-removed column.
# - format=. convert the index to Date class using indicated format
# - col.names= as specified
# - aggregate= over duplicate dates keeping last
# - colClasses= specifies "NULL" for columns we want to remove

colClasses <-
 c("NULL", "NULL", "numeric", "numeric", "NULL", "character", "NULL")

col.names <- c(NA, NA, "views", "number", NA, NA, NA)

# z <- read.zoo("myfile.dat", skip = 1, index = 3,
z <- read.zoo(textConnection(Lines), skip = 1, index = 3,
format = "%m/%d/%Y", col.names = col.names,
aggregate = function(x) tail(x, 1), colClasses = colClasses)

## Now that we have read it in lets process it

## 1.

# extract all Thursdays and Fridays
z45 <- z[format(time(z), "%w") %in% 4:5,]

# keep last entry in each week
# and show result on R console
z45[!duplicated(format(time(z45), "%U"), fromLast = TRUE), ]


# 2. alternative approach
# above approach labels each point as it was originally labelled
# so if Thursday is used it gets the date of that Thursday
# Another approach is to always label the resulting point as Friday
# and also use the last available value even if its not Thursday

# create daily grid
g <- seq(start(z), end(z), by = "day")

# fill in daily grid so Friday is filled in with prior value
# if Friday is NA
z.filled <- na.locf(z, xout = g)

# extract Fridays (including those filled in from previous)
# and show result on R console
z.filled[format(time(z.filled), "%w") == "5", ]

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Spatstat rmh function error message

2010-11-05 Thread Neba Funwi-Gabga
Hello,
I have fitted a Poisson Process model in spatstat using

>fit1<-ppm(points, ~elevation, covariates=list(elevation=elevation.im))

This far, everything went well, but I try to simulate the fitted model using
the function:

>sim1<-rmh(fit1)

But I get the error message:
"Extracting model information...Evaluating trend...done.
Checking arguments..determining simulation windows...Error in rmh.default(X,
start = start, control = control, ..., verbose = verbose) :
  Expanded simulation window does not contain model window"

Is there anything I am doing wrong? I am working on R version 2.12.0 and
spatstat version 1.20-5 .

I greatly appreciate any help.

*Neba.

Universitat Jaume I,
Castellon de la plana,
Spain.*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assignment operator saving factor level as number

2010-11-05 Thread Wade Wall
Hi all,

Thanks for the help.  Jeffrey was right; my initial dataframe did not have
the columns defined for factors.  I solved it using Jorge's example of using
as.character.

Sorry for not being more clear before.

Wade

On Fri, Nov 5, 2010 at 4:12 PM, Jeffrey Spies  wrote:

> Perhaps this will help:
>
> > test1 <- test2 <- data.frame(col1=factor(c(1:3), labels=c("a", "b",
> "c")))
> > test3 <- data.frame(col1 = 1:3)
>
> Now:
>
> > test2[2,1] <- test1$col1[1]
> > test2$col1
> [1] a a c
> Levels: a b c
>
> vs
>
> > test3[2,1] <- test1$col1[1]
> > test3$col1
> [1] 1 1 3
>
> Because test3's first column, col1, is a vector of numeric, and each
> element of a vector must have the same data type (numeric, factor,
> etc), it will coerce the data coming in to have the same data type (if
> it can).  In this case, the data type is numeric.  Had it been a
> character coming in, because it can't coerce a character to a numeric,
> it would have made the entire vector a vector of characters:
>
> > test3[2,1] <- 'b'
> > test3$col1
> [1] "1" "b" "3"
>
> Hope that demonstrates what's probably going on,
>
> Jeff.
>
> On Fri, Nov 5, 2010 at 3:54 PM, Wade Wall  wrote:
> > Hi all,
> >
> > I have a dataframe (df1) that I am trying to select values from to a
> second
> > dataframe that at the current time is only for the selected items from
> df1
> > (df2).  The values that I am trying to save from df1 are factors with
> > alphanumeric names
> >
> > df1 looks like this:
> >
> > 'data.frame':   3014 obs. of  13 variables:
> >  $ Num : int  1 1 1 2 2 2 3 3 3 4 ...
> >  $ Tag_Num : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173
> ...
> >  $ Site: Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1
> 1 1
> > 1 1 1 ...
> >  $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3 3
> 3
> > 4 ...
> >   ...
> >  $ Area: num  463.3 29.5 101.8 152.9 34.6 ...
> >
> > However, whenever I try to assign values, like this
> >
> > df2[j,1]<-df2$Site[i]
> >
> > the values are changed from alphanumeric (e.g. PYBR003A) to numerals
> (e.g.
> > 1).
> >
> > Does anyone know why this is happening and how I can assign the actual
> > values from df1 to df2?
> >
> > Thanks in advance,
> >
> > Wade
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with 'lars' package

2010-11-05 Thread Steve Lianoglou
Hi,

On Fri, Nov 5, 2010 at 4:14 PM, Vladimir Subbotin
 wrote:
> Hello,
>
> I have problems with 'lars' package. I found the previous post of the person
> who had the same issue, but the suggested solution in that post did not help
> me.
>
> I created the matrices:
>
> ResponseMatrix <- data.frame (GAOdecision=GAOdecision)
> PredictorsMatrix <- data.frame (WeaponvsNon = WeaponvsNon, ProductvsService
> = ProductvsService, KDuration = KDuration, BusinessSize = BusinessSize,
> Bidders = Bidders, Staging = Staging, Criteria = Criteria, KPricing =
> KPricing, Political = Political)
>
> and then
> output <- lars(PredictorsMatrix, ResponseMatrix)
> Error in one %*% x : requires numeric/complex matrix/vector arguments

This error is likely referring to variables inside the lars function
itself -- note, for instance, that its first parameter is named 'x'.

Also, your ResponseMatrix and PredictorsMatrix should be of type
*matrix*, not data.frame ... you can try to convert them with
as.matrix(ResponseMatrix). You should then ensure that all values in
your response and predictor matrices are numeric, e.g.
all(is.numeric(ResponseMatrix)), etc ... also check for odd values
like NA, NaN or Inf.
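
A minimal sketch of that conversion and those checks (variable names are
the poster's; this assumes every column really is, or can sensibly be
made, numeric):

X <- as.matrix(PredictorsMatrix)              # predictors as a numeric matrix
y <- as.numeric(ResponseMatrix$GAOdecision)   # response as a numeric vector

stopifnot(is.numeric(X), !any(is.na(X)), !any(is.na(y)))

library(lars)
output <- lars(X, y)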

Lastly, consider using the glmnet package instead of lars. It provides
a superset of the functionality of lars, and its core is written in
Fortran, so it will likely be much faster and handle larger problems.
It's also more actively maintained (note that a new version of glmnet
was just released this week).
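
For example (a sketch, reusing the X and y from above):

library(glmnet)
fit <- glmnet(X, y)     # gaussian lasso path by default
plot(fit)               # coefficient paths
coef(fit, s = 0.1)      # coefficients at an arbitrary penalty value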

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Visualisation of data structures

2010-11-05 Thread Greg Snow
There is the TkListView function in the TeachingDemos package for looking at 
list structures.  It gives you a view of the list structure with nested 
elements available to be expanded by clicking on the little plus sign.  You can 
view or run code on the selected piece, which could help create some plots of 
interest.

There are no tools to plot a general list, but if you know a specific structure 
of the elements you could create a series of plots using lapply.
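
For instance, a small sketch with a made-up list of data frames:

lst <- list(a = data.frame(x = 1:10, y = rnorm(10)),
            b = data.frame(x = 1:10, y = rnorm(10)))

# one plot per element, titled with the element's name
invisible(lapply(names(lst), function(nm)
    plot(lst[[nm]]$x, lst[[nm]]$y, main = nm, xlab = "x", ylab = "y")))

# library(TeachingDemos); TkListView(lst)   # the interactive browser above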

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Friedericksen Hope
> Sent: Friday, November 05, 2010 12:30 PM
> To: r-h...@stat.math.ethz.ch
> Subject: [R] Visualisation of data structures
> 
> Hi everyone,
> 
> I wonder if there is a package or functions to visualize data
> structures in R?
> For example I have a list with a lot of data frames - is there a
> function which plots the elements of the list?
> 
> Thanks!
> 
> Best,
> Friedericksen
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract Friday data from daily data.

2010-11-05 Thread thornbird

Thank you very much. It worked great with the testdata. I have one more
question to ask. As my data is incomplete, sometimes Thu is also missing,
and then I have no option but to pick Sat instead; if Sat is also missing,
my best option is to pick Wed, and so on. Basically I have to pick one day
as the data for that week, starting from Friday and following this order:

Fri --> (if no Fri) Thu --> (if no Thu) Sat --> (if no Sat) Wed --> (if no Wed)
Sun --> (if no Sun) Tue --> (if no Tue) Mon.

In that case, I have to write a loop with an if statement, right? Could you
please help me with that? Again thanks a lot.
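
Something like the sketch below is what I am picturing (same column names
as in the test data above), though I am not sure it is right:

# rank the weekdays by preference and, within each week, keep the
# best-ranked day (and the latest record if that day appears twice)
pref <- c(Fri = 1, Thu = 2, Sat = 3, Wed = 4, Sun = 5, Tue = 6, Mon = 7)

testdata$rank <- pref[as.character(testdata$day)]
testdata$week <- format(testdata$date, "%Y-%U")

picked <- do.call(rbind, lapply(split(testdata, testdata$week), function(d) {
    d <- d[order(d$rank, d$timestamp), ]   # best day first, earliest time first
    best <- d[d$rank == min(d$rank), ]     # rows for the preferred day
    best[nrow(best), ]                     # latest record of that day
}))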



testdata$date = as.Date(testdata$date,"%m/%d/%Y") 

Thudat = subset(testdata,day=="Thu") 
Fridat = subset(testdata,day=="Fri") 

Friday_dates = Thudat$date+1 

Friday_info = NULL 

for(i in 1:length(Friday_dates)){ 

temp = subset(Fridat,date==Friday_dates[i]) # select the Friday dates from Fridat

if(nrow(temp)>0){ # if that Friday date value exists in Fridat

Friday_info = rbind(Friday_info,temp[nrow(temp),]) # by taking nrow(temp),
# with the data already organized chronologically, you don't have to add an
# additional if statement for multiple measurements in the same day

} else { # if that Friday date value doesn't exist in Fridat

Friday_info = rbind(Friday_info,Thudat[i,]) # choosing the date from Thudat instead

} 

} 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-to-extract-Friday-data-from-daily-data-tp3029050p3029328.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with 'lars' package

2010-11-05 Thread Vladimir Subbotin
Hello,

I have problems with 'lars' package. I found the previous post of the person
who had the same issue, but the suggested solution in that post did not help
me.

I created the matrices:

ResponseMatrix <- data.frame (GAOdecision=GAOdecision)
PredictorsMatrix <- data.frame (WeaponvsNon = WeaponvsNon, ProductvsService
= ProductvsService, KDuration = KDuration, BusinessSize = BusinessSize,
Bidders = Bidders, Staging = Staging, Criteria = Criteria, KPricing =
KPricing, Political = Political)

and then
output <- lars(PredictorsMatrix, ResponseMatrix)
Error in one %*% x : requires numeric/complex matrix/vector arguments
I did not create any variable named x in my code. I also tried rm(x) at
the beginning of my code, but it did not help.

I would really appreciate any help.

Thanks,
Vlad

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Visualisation of data structures

2010-11-05 Thread Friedericksen Hope

Hi everyone,

I wonder if there is a package or functions to visualize data structures in R?
For example I have a list with a lot of data frames - is there a function which 
plots the elements of the list?

Thanks!

Best,
Friedericksen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assignment operator saving factor level as number

2010-11-05 Thread Jorge Ivan Velez
Hi Wade,

Try (untested):

df2[j,1] <- as.character(df2$Site)[i]

If that does not work, which is very likely, could you please provide
commented, minimal, self-contained, reproducible code?

HTH,
Jorge


On Fri, Nov 5, 2010 at 3:54 PM, Wade Wall <> wrote:

> Hi all,
>
> I have a dataframe (df1) that I am trying to select values from to a second
> dataframe that at the current time is only for the selected items from df1
> (df2).  The values that I am trying to save from df1 are factors with
> alphanumeric names
>
> df1 looks like this:
>
> 'data.frame':   3014 obs. of  13 variables:
>  $ Num : int  1 1 1 2 2 2 3 3 3 4 ...
>  $ Tag_Num : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173 ...
>  $ Site: Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1 1
> 1
> 1 1 1 ...
>  $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3 3 3
> 4 ...
>   ...
>  $ Area: num  463.3 29.5 101.8 152.9 34.6 ...
>
> However, whenever I try to assign values, like this
>
> df2[j,1]<-df2$Site[i]
>
> the values are changed from alphanumeric (e.g. PYBR003A) to numerals (e.g.
> 1).
>
> Does anyone know why this is happening and how I can assign the actual
> values from df1 to df2?
>
> Thanks in advance,
>
> Wade
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Management under Linux

2010-11-05 Thread jim holtman
I would do some monitoring (debugging) of the script by placing some 'gc()'
calls in the sequence of statements leading up to the problem to see what the
memory usage is at each point.  Take a close look at the sizes of your
objects.  If it is happening in some function you have called, you may have
to take a look and understand whether multiple copies are being made.  Most
problems of this type may require that you put hooks in your code (most of
the stuff that I write has them built in so I can isolate performance
problems) to gain an understanding of what is happening when.  To improve
memory allocation, you first have to understand what is causing the problem,
and not enough information has been provided for me to comment on it.  There
are lots of rules of thumb that can be used, but many depend on exactly what
you are trying to do.
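
As a sketch of what I mean by hooks (the function and label names are
arbitrary):

mem_check <- function(label) {
    cat("==", label, "==\n")
    print(gc())
    sizes <- sapply(ls(envir = .GlobalEnv),
                    function(nm) object.size(get(nm, envir = .GlobalEnv)))
    print(round(head(sort(sizes, decreasing = TRUE), 5) / 2^20, 1))  # MB
}

mem_check("before kriging")
# ... krige.conv(...) and the rest of the geoR calls go here ...
mem_check("after kriging")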

On Fri, Nov 5, 2010 at 2:59 PM, ricardo souza wrote:

>   Dear Jim,
>
> Thanks for your attention. I am running a geostatistic analysis with geoR
> that is computational intense. At the end my analysis I call the function
> krige.control and krige.conv.  Do you have any idea how to improve the
> memory allocation in Linux?
>
> Thanks,
> Ricardo
>
>
>
> De: jim holtman 
> Assunto: Re: [R] Memory Management under Linux
> Para: "ricardo souza" 
> Cc: r-help@r-project.org
> Data: Sexta-feira, 5 de Novembro de 2010, 10:21
>
>
> It would be very useful if you would post some information about what
> exactly you are doing.  We need something about the size of the data
> object you are processing ('str' would help us understand it) and then
> a portion of the script (both before and after the error message) so
> we can understand the transformation that you are doing.  It is very
> easy to generate a similar message:
>
> > x <- matrix(0,2, 2)
> Error: cannot allocate vector of size 3.0 Gb
>
> but unless you know the context, it is almost impossible to give
> advice.  It also depends on whether you are inside function calls where
> copies of objects may have been made, etc.
>
> On Thu, Nov 4, 2010 at 7:52 PM, ricardo souza  wrote:
> > Dear all,
> >
> > I am using ubuntu linux 32 with 4 Gb.  I am running a very small script
> and I always got the same error message:  CAN NOT ALLOCATE A VECTOR OF SIZE
> 231.8 Mb.
> >
> > I have reading carefully the instruction in ?Memory.  Using the function
> gc() I got very low numbers of memory (please sea below).  I know that it
> has been posted several times at r-help (
> http://tolstoy.newcastle.edu.au/R/help/05/06/7565.html#7627qlink2).
> However I did not find yet the solution to improve my memory issue in
> Linux.  Somebody cold please give some instruction how to improve my memory
> under linux?
> >
> >> gc()
> >  used (Mb) gc trigger (Mb) max used (Mb)
> > Ncells 170934  4.6 35  9.4   35  9.4
> > Vcells 195920  1.5 786432  6.0   781384  6.0
> >
> > INCREASING THE R MEMORY FOLLOWING THE INSTRUCTION IN  ?Memory
> >
> > I started R with:
> >
> > R --min-vsize=10M --max-vsize=4G --min-nsize=500k --max-nsize=900M
> >> gc()
> >  used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
> > Ncells 130433  3.5 50 13.4  25200   50 13.4
> > Vcells  81138  0.71310720 10.0 NA   499143  3.9
> >
> > It increased but not so much!
> >
> > Please, please let me know.  I have read all r-help about this matter,
> but not solution. Thanks for your attention!
> >
> > Ricardo
> >
> >
> >
> >
> >
> >
> >
> >[[alternative HTML version deleted]]
> >
> >
> > __
> > R-help@r-project.org  mailing
> list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
>
>




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assignment operator saving factor level as number

2010-11-05 Thread Jeffrey Spies
Perhaps this will help:

> test1 <- test2 <- data.frame(col1=factor(c(1:3), labels=c("a", "b", "c")))
> test3 <- data.frame(col1 = 1:3)

Now:

> test2[2,1] <- test1$col1[1]
> test2$col1
[1] a a c
Levels: a b c

vs

> test3[2,1] <- test1$col1[1]
> test3$col1
[1] 1 1 3

Because test3's first column, col1, is a vector of numeric, and each
element of a vector must have the same data type (numeric, factor,
etc), it will coerce the data coming in to have the same data type (if
it can).  In this case, the data type is numeric.  Had it been a
character coming in, because it can't coerce a character to a numeric,
it would have made the entire vector a vector of characters:

> test3[2,1] <- 'b'
> test3$col1
[1] "1" "b" "3"

Hope that demonstrates what's probably going on,

Jeff.

On Fri, Nov 5, 2010 at 3:54 PM, Wade Wall  wrote:
> Hi all,
>
> I have a dataframe (df1) that I am trying to select values from to a second
> dataframe that at the current time is only for the selected items from df1
> (df2).  The values that I am trying to save from df1 are factors with
> alphanumeric names
>
> df1 looks like this:
>
> 'data.frame':   3014 obs. of  13 variables:
>  $ Num         : int  1 1 1 2 2 2 3 3 3 4 ...
>  $ Tag_Num     : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173 ...
>  $ Site        : Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1 1 1
> 1 1 1 ...
>  $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3 3 3
> 4 ...
>   ...
>  $ Area        : num  463.3 29.5 101.8 152.9 34.6 ...
>
> However, whenever I try to assign values, like this
>
> df2[j,1]<-df2$Site[i]
>
> the values are changed from alphanumeric (e.g. PYBR003A) to numerals (e.g.
> 1).
>
> Does anyone know why this is happening and how I can assign the actual
> values from df1 to df2?
>
> Thanks in advance,
>
> Wade
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assignment operator saving factor level as number

2010-11-05 Thread jim holtman
Your example looks like you are assigning back to the first column of
df2 (Num).  Is this what you are really doing in your code?

You need to follow the posting guide:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

On Fri, Nov 5, 2010 at 3:54 PM, Wade Wall  wrote:
> Hi all,
>
> I have a dataframe (df1) that I am trying to select values from to a second
> dataframe that at the current time is only for the selected items from df1
> (df2).  The values that I am trying to save from df1 are factors with
> alphanumeric names
>
> df1 looks like this:
>
> 'data.frame':   3014 obs. of  13 variables:
>  $ Num         : int  1 1 1 2 2 2 3 3 3 4 ...
>  $ Tag_Num     : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173 ...
>  $ Site        : Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1 1 1
> 1 1 1 ...
>  $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3 3 3
> 4 ...
>   ...
>  $ Area        : num  463.3 29.5 101.8 152.9 34.6 ...
>
> However, whenever I try to assign values, like this
>
> df2[j,1]<-df2$Site[i]
>
> the values are changed from alphanumeric (e.g. PYBR003A) to numerals (e.g.
> 1).
>
> Does anyone know why this is happening and how I can assign the actual
> values from df1 to df2?
>
> Thanks in advance,
>
> Wade
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assignment operator saving factor level as number

2010-11-05 Thread Erik Iverson

Could you give a small reproducible example please?
It is not clear to me what your looping structure is
doing, or what your goal here is.

There may be a much simpler method than introducing
subscripts.

--Erik

Wade Wall wrote:

Hi all,

I have a dataframe (df1) that I am trying to select values from to a second
dataframe that at the current time is only for the selected items from df1
(df2).  The values that I am trying to save from df1 are factors with
alphanumeric names

df1 looks like this:

'data.frame':   3014 obs. of  13 variables:
 $ Num : int  1 1 1 2 2 2 3 3 3 4 ...
 $ Tag_Num : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173 ...
 $ Site: Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1 1 1
1 1 1 ...
 $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3 3 3
4 ...
  ...
 $ Area: num  463.3 29.5 101.8 152.9 34.6 ...

However, whenever I try to assign values, like this

df2[j,1]<-df2$Site[i]

the values are changed from alphanumeric (e.g. PYBR003A) to numerals (e.g.
1).

Does anyone know why this is happening and how I can assign the actual
values from df1 to df2?

Thanks in advance,

Wade

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] assignment operator saving factor level as number

2010-11-05 Thread Wade Wall
Hi all,

I have a dataframe (df1) that I am trying to select values from to a second
dataframe that at the current time is only for the selected items from df1
(df2).  The values that I am trying to save from df1 are factors with
alphanumeric names

df1 looks like this:

'data.frame':   3014 obs. of  13 variables:
 $ Num : int  1 1 1 2 2 2 3 3 3 4 ...
 $ Tag_Num : int  1195 1195 1195 1162 1162 1162 1106 1106 1106 1173 ...
 $ Site: Factor w/ 25 levels "PYBR002A","PYBR003B",..: 1 1 1 1 1 1 1
1 1 1 ...
 $ Site_IndNum : Factor w/ 1044 levels "PYBR002A_001",..: 1 1 1 2 2 2 3 3 3
4 ...
  ...
 $ Area: num  463.3 29.5 101.8 152.9 34.6 ...

However, whenever I try to assign values, like this

df2[j,1]<-df2$Site[i]

the values are changed from alphanumeric (e.g. PYBR003A) to numerals (e.g.
1).

Does anyone know why this is happening and how I can assign the actual
values from df1 to df2?

Thanks in advance,

Wade

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data acquisition with R?

2010-11-05 Thread Matt Shotwell
R implements (almost) all IO through its 'connections'. Unfortunately,
there is no API (public or private) for adding connections, and
therefore no packages that implement connections. You will find more
discussion of connections and hardware (serial, USB) interfaces in the
R-devel list archives.

There are two source code patches that implement two types of
connections that work on POSIX compliant OSs, including GNU Linux, BSD,
and Mac OS X. The first is a 'serial' connection, a high-level
connection to a serial port. The second is a 'tty' connection, a
lower-level connection to the POSIX termios interface. Both of these
solutions require that you apply the patch and recompile R. I can help
with this, if you like.

AFAIK, these are the only attempts at interfacing R with POSIX TTYs
directly.
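
As a rough interim workaround on a POSIX system, a serial device can
sometimes be read as an ordinary file once the port has been configured
outside R (the device name and settings below are hypothetical):

# configure the port first from a shell, e.g.:
#   stty -F /dev/ttyUSB0 9600 raw -echo
con <- file("/dev/ttyUSB0", open = "r")   # hypothetical device node
line <- readLines(con, n = 1)             # one record from the instrument
close(con)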

-Matt



On Fri, 2010-11-05 at 09:48 -0400, B.-MarkusS wrote:
> Hello,
> 
> I spent quite some time now searching for any hint that R can also be 
> used to address the interfaces of a computer (i.e. RS232 or USB) to 
> acquire data from measurement devices (like with the - I think it is the 
> - devices or serial toolbox of Matlab).
> 
> Is there any package available or a project going on that you know of? I 
> would so much like to have never to work with Matlab again. The only 
> thing I am really missing in R so far is the possibility to connect to 
> my measurement devices (for instance a precision balance) and record 
> data directly with R.
> 
> Please let me know whether I am just missing something or if you have 
> some information about something like that.
> 
> Thank you very much!
> Mango

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] connecting to remote database using RMySQL

2010-11-05 Thread Daniel Nordlund
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Jan Theodore Galkowski
> Sent: Friday, November 05, 2010 12:05 PM
> To: R Project
> Subject: [R] connecting to remote database using RMySQL
> 
> Apologies if this is the wrong place to ask.  I'm not aware of a
> mail list devoted to database interfaces.  Please direct me if
> so.
> 
> I am trying to use TSMySQL. It loads RMySQL, and apparently the
> CRAN version has an .onLoad script that looks for the local
> MySQL server.  If that fails, the package fails to load.
> 
> Thing is, I'm trying to access a MySQL database on a remote
> machine for which I have authorized access. Anyone know how to do
> this, or suggest where I can find out?
> 
> Thanks.  I have successfully accessed remote databases using
> RODBC.  Is that how I need to go?
> 
>  - Jan

This would be your best resource for databases and R.

https://stat.ethz.ch/mailman/listinfo/r-sig-db

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract Friday data from daily data.

2010-11-05 Thread Adrienne Wootten
Hey,

This should work, but after you read in your data make sure that your day,
date and time are separate, this should work just fine, or something like
it.

> testdata
views  number  timestamp  day   date time
1  views  910401 1246192687 Sun 6/28/2009 12:38
2  views  921537 1246278917 Mon 6/29/2009 12:35
3  views  934280 1246365403 Tue 6/30/2009 12:36
4  views  986463 1246888699 Mon  7/6/2009 13:58
5  views  995002 1246970243 Tue  7/7/2009 12:37
6  views 1005211 1247079398 Wed  7/8/2009 18:56
7  views 1011144 1247135553 Thu  7/9/2009 10:32
8  views 1026765 1247308591 Sat 7/11/2009 10:36
9  views 1036856 1247436951 Sun 7/12/2009 22:15
10 views 1040909 1247481564 Mon 7/13/2009 10:39
11 views 1057337 1247568387 Tue 7/14/2009 10:46
12 views 1066999 1247665787 Wed 7/15/2009 13:49
13 views 1077726 1247778752 Thu 7/16/2009 21:12
14 views 1083059 1247845413 Fri 7/17/2009 15:43
15 views 1083059 1247845824 Fri 7/17/2009 18:45
16 views 1089529 1247914194 Sat 7/18/2009 10:49


testdata$date = as.Date(testdata$date,"%m/%d/%Y")

Thudat = subset(testdata,day=="Thu")
Fridat = subset(testdata,day=="Fri")

Friday_dates = Thudat$date+1

Friday_info = NULL

for(i in 1:length(Friday_dates)){

temp = subset(Fridat,date==Friday_dates[i]) # select the Friday dates from Fridat

if(nrow(temp)>0){ # if that Friday date value exists in Fridat

Friday_info = rbind(Friday_info,temp[nrow(temp),]) # by taking nrow(temp),
# with the data already organized chronologically, you don't have to add an
# additional if statement for multiple measurements in the same day

} else { # if that Friday date value doesn't exist in Fridat

Friday_info = rbind(Friday_info,Thudat[i,]) # choosing the date from Thudat instead

}

}

Friday_info
   views  number  timestamp day   date  time
7  views 1011144 1247135553 Thu 2009-07-09 10:32
15 views 1083059 1247845824 Fri 2009-07-17 18:45


Also, for other things involving getting data out to monthly or weekly, you
might want to try working with some functions from the chron package.
Things like seq.dates can allow you to get the appropriate dates for a
specific day of the week for every week that you want.  something like this
for instance:

as.Date(seq.dates("7/3/2009","7/24/2009",by="weeks"),"%m/%d/%Y")

for all the Fridays in July 2009.


Hope this helps!

A

-- 
Adrienne Wootten
Graduate Research Assistant
State Climate Office of North Carolina
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University




On Fri, Nov 5, 2010 at 1:22 PM, thornbird  wrote:

>
> I am new to Using R for data analysis. I have an incomplete time series
> dataset that is in daily format. I want to extract only Friday data from
> it.
> However, there are two problems with it.
>
> First, if Friday data is missing in that week, I need to extract the data
> of
> the day prior to that Friday (e.g. Thursday).
>
> Second, sometimes there are duplicate Friday data (say Friday morning and
> afternoon), but I only need the latest one (Friday afternoon).
>
> My question is how I can only extract the Friday data and make it a new
> dataset so that I have data for every single week for the convenience of
> data analysis.
>
> Your help and time will be appreciated. Thanks.  Kevin
>
>
> Below is what my dataset looks like:
>
>   views  number  timestamp day            time
> 1  views  910401 1246192687 Sun 6/28/2009 12:38
> 2  views  921537 1246278917 Mon 6/29/2009 12:35
> 3  views  934280 1246365403 Tue 6/30/2009 12:36
> 4  views  986463 1246888699 Mon  7/6/2009 13:58
> 5  views  995002 1246970243 Tue  7/7/2009 12:37
> 6  views 1005211 1247079398 Wed  7/8/2009 18:56
> 7  views 1011144 1247135553 Thu  7/9/2009 10:32
> 8  views 1026765 1247308591 Sat 7/11/2009 10:36
> 9  views 1036856 1247436951 Sun 7/12/2009 22:15
> 10 views 1040909 1247481564 Mon 7/13/2009 10:39
> 11 views 1057337 1247568387 Tue 7/14/2009 10:46
> 12 views 1066999 1247665787 Wed 7/15/2009 13:49
> 13 views 1077726 1247778752 Thu 7/16/2009 21:12
> 14 views 1083059 1247845413 Fri 7/17/2009 15:43
> 15 views 1083059 1247845824 Fri 7/17/2009 18:45
> 16 views 1089529 1247914194 Sat 7/18/2009 10:49
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-to-extract-Friday-data-from-daily-data-tp3029050p3029050.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] connecting to remote database using RMySQL

2010-11-05 Thread Jan Theodore Galkowski
Yes, what happens when I do that is:

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
[snip]
> Sys.getenv("MYSQL_USER")
MYSQL_USER
"mf1"
> # As an example. Other env vars are defined.
> library(RMySQL)
Loading required package: DBI
Error : .onLoad failed in loadNamespace() for 'RMySQL', details:
call: utils::readRegistry("SOFTWARE\\MySQL AB", hive = "HLM",
maxdepth = 2)
error: Registry key 'SOFTWARE\MySQL AB' not found
Error: package/namespace load failed for 'RMySQL'
>

I don't even get to do anything more.

 - Jan


On Fri, 05 Nov 2010 17:14 -0200, "Henrique Dallazuanna"
 wrote:

  library(RMySQL)
  conn <- dbConnect(MySQL(), user = 'user', password =
  'password', host = '[1]your_host.com')
  Look at [2]http://biostat.mc.vanderbilt.edu/RMySQL for more
  information

On Fri, Nov 5, 2010 at 5:04 PM, Jan Theodore Galkowski
<[3]bayesianlo...@acm.org> wrote:

  Apologies if this is the wrong place to ask.  I'm not aware of
  a
  mail list devoted to database interfaces.  Please direct me if
  so.
  I am trying to use TSMySQL. It loads RMySQL, and apparent the
  CRAN version has an .onLoad script which seeks out the local
  MySQL server.  If that fails, the package fails to load.
  Thing is, I'm trying to access a MySQL database on a remote
  machine for which I have authorized access. Anyone know how to
  do
  this, or suggest where I can find out?
  Thanks.  I have successfully accessed remote databases using
  RODBC.  Is that how I need to go?
   - Jan
 [[alternative HTML version deleted]]
  __
  [4]r-h...@r-project.org mailing list
  [5]https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  [6]http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible
  code.

  --
  Henrique Dallazuanna
  Curitiba-Paraná-Brasil
  25° 25' 40" S 49° 16' 22" O

References

1. http://your_host.com/
2. http://biostat.mc.vanderbilt.edu/RMySQL
3. mailto:bayesianlo...@acm.org
4. mailto:R-help@r-project.org
5. https://stat.ethz.ch/mailman/listinfo/r-help
6. http://www.R-project.org/posting-guide.html
--
  Jan Theodore Galkowski  (o°)

  607.239.1834 [mobile]
  607.239.1834 [home]
  617.444.4995 [work]

 bayesianlo...@acm.org
 http://www.linkedin.com/in/deepdevelopment


 "Eppur si muove." --Galilei



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] connecting to remote database using RMySQL

2010-11-05 Thread Henrique Dallazuanna
library(RMySQL)
conn <- dbConnect(MySQL(), user = 'user', password = 'password', host = '
your_host.com')

Look at http://biostat.mc.vanderbilt.edu/RMySQL for more information
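Once the package loads and the connection succeeds, a typical usage sketch
with the standard DBI calls would be something like the following (untested
here; the database and table names are only placeholders):

library(RMySQL)
con <- dbConnect(MySQL(), user = 'user', password = 'password',
                 dbname = 'mydb', host = 'your_host.com')
dbListTables(con)                                         # see what is there
res <- dbGetQuery(con, "SELECT * FROM mytable LIMIT 10")  # pull a few rows
dbDisconnect(con)                                         # close the connection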

On Fri, Nov 5, 2010 at 5:04 PM, Jan Theodore Galkowski <
bayesianlo...@acm.org> wrote:

> Apologies if this is the wrong place to ask.  I'm not aware of a
> mail list devoted to database interfaces.  Please direct me if
> so.
>
> I am trying to use TSMySQL. It loads RMySQL, and apparent the
> CRAN version has an .onLoad script which seeks out the local
> MySQL server.  If that fails, the package fails to load.
>
> Thing is, I'm trying to access a MySQL database on a remote
> machine for which I have authorized access. Anyone know how to do
> this, or suggest where I can find out?
>
> Thanks.  I have successfully accessed remote databases using
> RODBC.  Is that how I need to go?
>
>  - Jan
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] connecting to remote database using RMySQL

2010-11-05 Thread Jan Theodore Galkowski
Apologies if this is the wrong place to ask.  I'm not aware of a
mail list devoted to database interfaces.  Please direct me if
so.

I am trying to use TSMySQL. It loads RMySQL, and apparently the
CRAN version has an .onLoad script which seeks out the local
MySQL server.  If that fails, the package fails to load.

Thing is, I'm trying to access a MySQL database on a remote
machine for which I have authorized access. Anyone know how to do
this, or suggest where I can find out?

Thanks.  I have successfully accessed remote databases using
RODBC.  Is that how I need to go?
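For reference, the RODBC route I mean would look roughly like the sketch
below; the DSN name 'my_remote_db' is only a placeholder for one configured
on the client, not something that exists yet:

library(RODBC)
ch <- odbcConnect("my_remote_db", uid = "user", pwd = "password")
sqlQuery(ch, "SELECT COUNT(*) FROM some_table")   # any SQL statement
odbcClose(ch)                                     # close the channel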

 - Jan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Management under Linux

2010-11-05 Thread ricardo souza
Dear Jim,

Thanks for your attention. I am running a geostatistical analysis with geoR that
is computationally intensive. At the end of my analysis I call the functions
krige.control and krige.conv.  Do you have any idea how to improve the memory
allocation in Linux?

Thanks,
Ricardo



From: jim holtman 
Subject: Re: [R] Memory Management under Linux
To: "ricardo souza" 
Cc: r-help@r-project.org
Date: Friday, 5 November 2010, 10:21

It would be very useful if you would post some information about what
exactly you are doing.  Tell us something about the size of the data
object you are processing ('str' would help us understand it) and then
a portion of the script (both before and after the error message) so
we can understand the transformation that you are doing.  It is very
easy to generate a similar message:

> x <- matrix(0,2, 2)
Error: cannot allocate vector of size 3.0 Gb

but unless you know the context, it is almost impossible to give
advice.  It also depends on if you are in some function calls were
copies of objects may have been made, etc.
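As a rough illustration of the kind of context that helps (the object and
variable names here are made up), report the structure and size of the data
and the memory state just before the call that fails:

str(mydata)                               # structure/dimensions of the object
print(object.size(mydata), units = "Mb")  # how much memory it already uses
gc()                                      # current memory use and gc trigger
## ...the call that produces "cannot allocate vector of size ..." goes here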

On Thu, Nov 4, 2010 at 7:52 PM, ricardo souza  wrote:
> Dear all,
>
> I am using ubuntu linux 32 with 4 Gb.  I am running a very small script and I 
> always got the same error message:  CAN NOT ALLOCATE A VECTOR OF SIZE 231.8 
> Mb.
>
> I have read carefully the instructions in ?Memory.  Using the function gc()
> I got very low numbers of memory (please see below).  I know that it has been
> posted several times at r-help
> (http://tolstoy.newcastle.edu.au/R/help/05/06/7565.html#7627qlink2).  However
> I have not yet found the solution to improve my memory issue in Linux.
> Could somebody please give some instructions on how to improve my memory under
> linux?
>
>> gc()
>  used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 170934  4.6 35  9.4   35  9.4
> Vcells 195920  1.5 786432  6.0   781384  6.0
>
> INCREASING THE R MEMORY FOLLOWING THE INSTRUCTION IN  ?Memory
>
> I started R with:
>
> R --min-vsize=10M --max-vsize=4G --min-nsize=500k --max-nsize=900M
>> gc()
>  used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
> Ncells 130433  3.5 50 13.4  25200   50 13.4
> Vcells  81138  0.7    1310720 10.0 NA   499143  3.9
>
> It increased but not so much!
>
> Please, please let me know.  I have read all r-help about this matter, but 
> not solution. Thanks for your attention!
>
> Ricardo
>
>
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] improve R memory under linux

2010-11-05 Thread ricardo souza
Dear Jonathan,

It is not small; I passed on the wrong information.  I am running a geostatistical
analysis with geoR that is computationally intensive. At the end of my analysis I
call the functions krige.control and krige.conv.  However, what caught my attention
is that a friend of mine was able to run the same code on his 5-year-old Mac
(32-bit OS with 4 Gb of memory).  Reading several mailing lists I learned that the
Mac has better libraries than the Linux repositories, but that the memory
allocation in Linux can be increased.  I do not know how.

Below is the memory allocation for a 32-bit Mac with 4 Gb of RAM.  I know that it
depends on the machine, but it is far better than in Linux.  Why?  How can I
improve it in Linux?

Memory allocation on the 4-year-old Mac laptop:
> gc()

           used (Mb)  gc trigger   (Mb)   max used   (Mb)
Ncells   481875 25.8      984024   52.6     984024   52.6
Vcells   481512  3.7   140511341 1072.1  641928461 4897.6

Memory allocation on the new Linux laptop (4 Gb, 32-bit Ubuntu):

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170934  4.6         35  9.4       35  9.4
Vcells 195920  1.5     786432  6.0   781384  6.0

Thanks for your attention,
Ricardo
From: Jonathan P Daily 
Subject: Re: [R] improve R memory under linux
To: "ricardo souza" 
Cc: r-help@r-project.org, r-help-boun...@r-project.org
Date: Friday, 5 November 2010, 13:36



A "very small script" should
fit just fine in an email: what are you trying to do?



Likely, you are assigning many small
variables in some loop. Even if you have 4GB of RAM available, if R assigns
3.99 GB of it and then then a call comes in to assign something of size
.02, it will tell you it can't allocate an object of size .02.

--

Jonathan P. Daily

Technician - USGS Leetown Science Center

11649 Leetown Road

Kearneysville WV, 25430

(304) 724-4480

"Is the room still a room when its empty? Does the room,

 the thing itself have purpose? Or do we, what's the word... imbue it."

     - Jubal Early, Firefly








From:
To: r-help@r-project.org
Date: 11/05/2010 11:29 AM
Subject: [R] improve R memory under linux
Sent by: r-help-boun...@r-project.org








Dear all,



I am using ubuntu linux 32 with 4 Gb.  I am running a very small script
and I always got the same error message:  CAN NOT ALLOCATE A VECTOR
OF SIZE 231.8 Mb.


I have read carefully the instructions in ?Memory.  Using the function
gc() I got very low numbers of memory (please see below).  I know
that it has been posted several times at r-help
(http://tolstoy.newcastle.edu.au/R/help/05/06/7565.html#7627qlink2).
However I have not yet found the solution to improve my memory issue in Linux.
Could somebody please give some instructions on how to improve my memory under
linux?



> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170934  4.6         35  9.4       35  9.4
Vcells 195920  1.5     786432  6.0   781384  6.0



INCREASING THE R MEMORY FOLLOWING THE INSTRUCTION IN  ?Memory



I started R with:



R --min-vsize=10M --max-vsize=4G --min-nsize=500k --max-nsize=900M

> gc()
         used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 130433  3.5         50 13.4      25200       50 13.4
Vcells  81138  0.7    1310720 10.0         NA   499143  3.9



It increased but not so much! 



Please, please let me know.  I have read all r-help about this matter,
[[elided Yahoo spam]]



Ricardo





      

                
[[alternative HTML version deleted]]



__

R-help@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.








  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsets, %in%

2010-11-05 Thread Seeliger . Curt
> Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:
>   dat <- data.frame(ID = 1:10, var = 1:10)
>   someID <- c(1,2,4,5,10)
>   subset(dat, dat$ID %in% someID)
> Is there a quick way to do the opposite ...
> 

Two operators spring to mind: ! and %nin%
subset(dat, !(dat$ID %in% someID))
subset(dat, dat$ID %nin% someID)
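Note that %nin% is not in base R; it is provided by the Hmisc package, or it
can be defined in one line:

`%nin%` <- Negate(`%in%`)          # define it yourself if Hmisc is not loaded
subset(dat, dat$ID %nin% someID)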


-- 
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsets, %in%

2010-11-05 Thread Jorge Ivan Velez
Hi MP,

Try

subset(dat, ! dat$ID %in% someID) # ! symbol

HTH,
Jorge


On Fri, Nov 5, 2010 at 10:13 AM, <> wrote:

> Hi,
>
> I have a question about %in% and subsettin data frames.
>
> Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:
>
> dat <- data.frame(ID = 1:10, var = 1:10)
> someID <- c(1,2,4,5,10)
> subset(dat, dat$ID %in% someID)
>
> Is there a quick way to do the opposite, ie to do a subset that contains
> all ID but someID? Something like %not in%, which would *remove* lines with
> ID in someID?
>
> I am asking because I need this in a more complex example where there are
> multiple lines with the same ID (data in long format) and I need to remove
> selected ID.
>
> thanks,
>
> MP
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsets, %in%

2010-11-05 Thread Jonathan P Daily
Any logical vector can be negated using !.
Does:
subset(dat, !(dat$ID %in% someID))

provide what you need?
--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
 - Jubal Early, Firefly



From: mp.sylves...@gmail.com
To: r-help@r-project.org
Date: 11/05/2010 02:21 PM
Subject: [R] subsets, %in%
Sent by: r-help-boun...@r-project.org



Hi,

I have a question about %in% and subsettin data frames.

Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:

dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1,2,4,5,10)
subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, ie to do a subset that contains 
all ID but someID? Something like %not in%, which would *remove* lines 
with 
ID in someID?

I am asking because I need this in a more complex example where there are 
multiple lines with the same ID (data in long format) and I need to remove 
 
selected ID.

thanks,

MP

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subsets, %in%

2010-11-05 Thread Erik Iverson

Well, %in% returns a logical vector...

So

subset(dat, ! ID %in% someID)

Also, from ?subset:

Note
 that ‘subset’ will be evaluated in the data frame, so columns can
 be referred to (by name) as variables in the expression

Thus, you don't need 'dat$ID', but just 'ID' in the subset argument.

-Erik

mp.sylves...@gmail.com wrote:

Hi,

I have a question about %in% and subsettin data frames.

Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:

dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1,2,4,5,10)
subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, ie to do a subset that contains  
all ID but someID? Something like %not in%, which would *remove* lines with  
ID in someID?


I am asking because I need this in a more complex example where there are  
multiple lines with the same ID (data in long format) and I need to remove  
selected ID.


thanks,

MP

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to extract Friday data from daily data.

2010-11-05 Thread thornbird

I am new to using R for data analysis. I have an incomplete time series
dataset that is in daily format. I want to extract only Friday data from it.
However, there are two problems with it.

First, if Friday data is missing in that week, I need to extract the data of
the day prior to that Friday (e.g. Thursday).

Second, sometimes there are duplicate Friday data (say Friday morning and
afternoon), but I only need the latest one (Friday afternoon). 

My question is how I can only extract the Friday data and make it a new
dataset so that I have data for every single week for the convenience of
data analysis. 

Your help and time will be appreciated. Thanks.  Kevin


Below is what my dataset looks like:

   views  number  timestamp daytime
1  views  910401 1246192687 Sun 6/28/2009 12:38
2  views  921537 1246278917 Mon 6/29/2009 12:35
3  views  934280 1246365403 Tue 6/30/2009 12:36
4  views  986463 1246888699 Mon  7/6/2009 13:58
5  views  995002 1246970243 Tue  7/7/2009 12:37
6  views 1005211 1247079398 Wed  7/8/2009 18:56
7  views 1011144 1247135553 Thu  7/9/2009 10:32
8  views 1026765 1247308591 Sat 7/11/2009 10:36
9  views 1036856 1247436951 Sun 7/12/2009 22:15
10 views 1040909 1247481564 Mon 7/13/2009 10:39
11 views 1057337 1247568387 Tue 7/14/2009 10:46
12 views 1066999 1247665787 Wed 7/15/2009 13:49
13 views 1077726 1247778752 Thu 7/16/2009 21:12
14 views 1083059 1247845413 Fri 7/17/2009 15:43
15 views 1083059 1247845824 Fri 7/17/2009 18:45
16 views 1089529 1247914194 Sat 7/18/2009 10:49

-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-to-extract-Friday-data-from-daily-data-tp3029050p3029050.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subsets, %in%

2010-11-05 Thread MP . Sylvestre
Hi,

I have a question about %in% and subsettin data frames.

Say I need to keep ID 1,2,4,5, 10 from the data frame dat. I can do:

dat <- data.frame(ID = 1:10, var = 1:10)
someID <- c(1,2,4,5,10)
subset(dat, dat$ID %in% someID)

Is there a quick way to do the opposite, ie to do a subset that contains  
all ID but someID? Something like %not in%, which would *remove* lines with  
ID in someID?

I am asking because I need this in a more complex example where there are  
multiple lines with the same ID (data in long format) and I need to remove  
selected ID.

thanks,

MP

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] X11 and pdf differences

2010-11-05 Thread statquant2

Hi guys,
I have the following problem : when plooting using the X11() device, the
output I get is different from the output I get when I use pdf().
For instance the title font size in a pdf seems to be proportionnaly bigger
that in a x11...
 
I tryed to set pdf.options() egal to x11.options(), but it didn't work, do
one of you have an idea how to deal with this ?

Thanks for reading
Cheers 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/X11-and-pdf-differences-tp3028649p3028649.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data acquisition with R?

2010-11-05 Thread B.-Markus Schuller

Hello,

I have spent quite some time now searching for any hint that R can also be
used to address the interfaces of a computer (i.e. RS232 or USB) to
acquire data from measurement devices (similar to what I think is called
the device or serial toolbox in Matlab).


Is there any package available or a project going on that you know of? I
would very much like never to have to work with Matlab again. The only
thing I am really missing in R so far is the possibility to connect to
my measurement devices (for instance a precision balance) and record
data directly with R.


Please let me know whether I am just missing something or if you have 
some information about something like that.


Thank you very much!
Mango
--
-
B.-Markus Schuller aka Mango

Sensory Ecology Group
Max-Planck-Institute for Ornithology
82319 Seewiesen, Germany

phone: +49 (0)8157 932 -378
fax:   +49 (0)8157 932 -344
email: schul...@orn.mpg.de
http://www.orn.mpg.de/nwg/abtsiemers.html
-
Never run for the bus.
Never skip tea.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help Digest, Vol 93, Issue 5

2010-11-05 Thread Joshua Ulrich
On Fri, Nov 5, 2010 at 6:00 AM,   wrote:
> Send R-help mailing list submissions to
>        r-h...@r-project.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        https://stat.ethz.ch/mailman/listinfo/r-help
> or, via email, send a message with subject or body 'help' to
>        r-help-requ...@r-project.org
>
> You can reach the person managing the list at
>        r-help-ow...@r-project.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-help digest..."
>
>
> Message: 121
> Date: Fri, 5 Nov 2010 09:28:51 +0100
> From: spela podgorsek 
> To: r-help@r-project.org
> Subject: [R] as.xts
> Message-ID:
>        
> Content-Type: text/plain; charset=UTF-8
>
> hey
>
> I am trying to turn a dataframe into xts with the function:
> as.xts,
> but it returns the error:
>
> Error in as.POSIXlt.character(x, tz, ...) :
> character string is not in a standard unambiguous format
>
> could someone give me some pointers please
>
> the data is coming from a spreadsheet via the excel, and has 5 columns
> of data (date (with the date and time), open, high, low, close) (excel
> format)
>
> ela
>

as.xts.data.frame expects the rownames of the data.frame to contain
the dates/times.  It would probably be easier to use the xts
constructor on your data.frame:

xData <- xts(Data[,-1],Data[,1])  # assumes "date" in first column

You will need to ensure that Data[,"date"] is a time-based class (e.g.
Date, POSIXt).  If it is character, you will need to convert it to a
time-based class before calling xts().
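For example, a sketch (the format string is only a guess at how the dates
might look after coming in from Excel and must be adjusted to match your data):

library(xts)
Data$date <- as.POSIXct(Data$date, format = "%Y-%m-%d %H:%M:%S")  # character -> POSIXct
xData <- xts(Data[, -1], order.by = Data$date)
head(xData)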

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com

>
> ___
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> End of R-help Digest, Vol 93, Issue 5
> *
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular Expressions

2010-11-05 Thread Gabor Grothendieck
2010/11/5 Brian Diggs :
> Is there a standard, built in way to get both (all) backreferences at the
> same time with just one call to sub (or the appropriate function)? I can
> cobble something together specifically for 2 backreferences (not extensively
> tested):
>
> both_backrefs <- function(pattern, x) {
>        s <- sub(pattern, "\\1\034\\2", x)
>        matrix(unlist(strsplit(s,"\034")), ncol=2, byrow=TRUE)
> }
>
> both_backrefs(regex, x)
>
> However, putting the parts back together into a string (with a delimiter
> that hopefully won't be in the string otherwise) just to use strsplit to
> pull them apart seems inelegant (as does making multiple calls to sub()).
>  sub() (and siblings) surely already have the backreferences as strings at
> some point in the processing, but I don't see a way to return them as a
> vector or matrix, only to substitute using backreferences (sub) or return
> indicies pointing to where the matches start (regexpr) or return the whole
> string matches (grep with value=TRUE).
>

The gsubfn package has gsubfn, which is like gsub except that it can take a
function in place of the replacement string.  The function's arguments are
the match or the back references, and the function's output replaces the
match.  It also has strapply, which does the same thing except that instead
of inserting the function's output into the string, it returns the function's
output.  See the home page at http://gsubfn.googlecode.com
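Applied to the example string from this thread, an (untested) sketch would be:

library(gsubfn)
x <- "10 Nov 13.00 (PFE1020K13)"
strapply(x, "^([0-9]{2}) +([[:alpha:]]{3})", c)[[1]]
## should return both backreferences in one character vector: "10" "Nov"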

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular Expressions

2010-11-05 Thread Brian Diggs

On 11/5/2010 12:09 AM, Prof Brian Ripley wrote:

On Thu, 4 Nov 2010, Noah Silverman wrote:


Hi,

I'm trying to figure out how to use capturing parenthesis in regular
expressions in R. (Doing this in Perl, Java, etc. is fairly trivial,
but I can't seem to find the functionality in R.)

For example, given the string: "10 Nov 13.00 (PFE1020K13)"

I want to capture the first two digits and then the month abbreviation.

In perl, this would be

/^(\d\d)\s(\w\w\w)\s/

Then I have the variables $1 and $2 assigned to the capturing
parentheses.

I've found the grep and sub commands in R, but the docs don't indicate
any way to capture things.

Any suggestions?


Read the link to ?regexp. It *does* 'indicate the way to capture
things'.

The backreference ‘\N’, where ‘N = 1 ... 9’, matches the substring
previously matched by the Nth parenthesized subexpression of the
regular expression. (This is an extension for extended regular
expressions: POSIX defines them only for basic ones.)

and there is an example on the help page for grep():

## Double all 'a' or 'b's; "\" must be escaped, i.e., 'doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

In your example

x <- "10 Nov 13.00 (PFE1020K13)"
regex <- "(\\d\\d)\\s(\\w\\w\\w).*"
sub(regex, "\\1", x, perl = TRUE)
sub(regex, "\\2", x, perl = TRUE)

A better way to do this would be something like

regex <- "([[:digit:]]{2})\\s([[:alpha:]]{3}).*"

which is also a POSIX extended regexp.


Is there a standard, built in way to get both (all) backreferences at 
the same time with just one call to sub (or the appropriate function)? 
I can cobble something together specifically for 2 backreferences (not 
extensively tested):


both_backrefs <- function(pattern, x) {
s <- sub(pattern, "\\1\034\\2", x)
matrix(unlist(strsplit(s,"\034")), ncol=2, byrow=TRUE)
}

both_backrefs(regex, x)

However, putting the parts back together into a string (with a delimiter 
that hopefully won't be in the string otherwise) just to use strsplit to 
pull them apart seems inelegant (as does making multiple calls to 
sub()).  sub() (and siblings) surely already have the backreferences as 
strings at some point in the processing, but I don't see a way to return 
them as a vector or matrix, only to substitute using backreferences 
(sub) or return indices pointing to where the matches start (regexpr) 
or return the whole string matches (grep with value=TRUE).


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loop

2010-11-05 Thread Matevž Pavlič
Hi Jim and Petr,

Both of your solutions do what I want. Thanks for the help,

Regards, m

-Original Message-
From: Petr PIKAL [mailto:petr.pi...@precheza.cz] 
Sent: Friday, November 05, 2010 6:49 AM
To: Matevž Pavlič
Cc: jim holtman; r-help@r-project.org
Subject: RE: [R] Loop

Hi

the list/loop solution given below gives you both.  You can transform
it to a data.frame

data.frame(do.call(cbind, lll),sapply(lll, names))

and add appropriate names

or you can use Jim's solution and combine those 2 steps
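For instance, a self-contained sketch of the list route, with a made-up
matrix 'mat' standing in for the real data (the names and sizes are only for
illustration):

mat <- matrix(sample(1:20, 1000, replace = TRUE), ncol = 10)  # stand-in data
lll <- vector("list", 10)
for (i in 1:10) lll[[i]] <- head(sort(table(mat[, i]), decreasing = TRUE), 5)
res <- data.frame(do.call(cbind, lll), sapply(lll, names))
names(res) <- c(paste("count", 1:10, sep = ""), paste("value", 1:10, sep = ""))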

Regards
Petr


Matevž Pavlič  napsal dne 05.11.2010 00:05:00:

> Hi Jim,
> 
> Actually, this is better, but both values are what i am looking for. 
Count and
> the value of the count. 
> Is there a way to just paste those two together?
> 
> Thanks, m
> 



> >
> > If you want to do it in loop (can be quicker sometimes) and save it 
> > to

> > list make a list
> >
> > lll<-vector("list", 10)
> >
> > and fill it with your results
> >
> > for (i in 1:10) lll[[i]]<-head(sort(table(mat[,i]), decreasing=T),5)
> >
> > and now you can call values from this lll list simply by
> >
> > lll[5]
> > [[1]]
> >
> >   9   15   136   16
> > 5199 5113 5079 5059 5057
> >
> > lll[[5]]
> >
> >   9   15   136   16
> > 5199 5113 5079 5059 5057
> >
> > or even
> >
> > lll[[5]][3]
> >  13
> > 5079
> >
> > without need for writing to individual files pasting together 
> > letters and numbers etc.
> >
> > There shall be R-intro document in your installation and it is worth 
> > reading. It is not so big, you can manage it in less then month if 
> > you

> > complete more than 3 pages per day.
> >
> > Regards
> > Petr
> >
> >
> >
> >>
> >> M
> >>
> >> -Original Message-
> >> From: Petr PIKAL [mailto:petr.pi...@precheza.cz]
> >> Sent: Thursday, November 04, 2010 3:40 PM
> >> To: Matevž Pavlič
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] Loop
> >>
> >> Hi
> >>
> >> r-help-boun...@r-project.org napsal dne 04.11.2010 14:21:38:
> >>
> >> > Hi David,
> >> >
> >> > I am still having troubles with that loop ...
> >> >
> >> > This code gives me (kinda) the name of the column/field in a data
> > frame.
> >> Filed
> >> > names are form W1-W10. But there is a space between W and a 
> >> > number
> >> > -->
> >> "W 10",
> >> > and column (field) names do not contain numbers.
> >> >
> >> > >for(i in 1:10)
> >> > >{
> >> > >vari <- paste("W",i)
> >> > >}
> >> > >vari
> >> >
> >> > [1] "W 10"
> >> >
> >> > Now as i understand than i would call different columns to R with
> >> >
> >> > >w<-lit[[vari]]
> >> >
> >> > Or am i wrong again?
> >> >
> >> > Then I would probably need another loop to create the names of 
> >> > the
> >> variables
> >> > on R, i.e. w1 to w10. Is that a general idea for the procedure?
> >>
> >> Beware of such loops. Instead of littering your workspace with
> > files/objects
> >> constructed by some paste(whatever, i) solution you can save 
> >> results in
> > list
> >> or data.frame or matrix and simply use basic subsetting procedures 
> >> or
> > lapply/
> >> sapply functions.
> >>
> >> I must say I never used such paste(...) construction yet and I work 
> >> with
> > R for
> >> quite a long time.
> >>
> >> Regards
> >> Petr
> >>
> >>
> >> >
> >> >
> >> > Thank for the help, m
> >> >
> >> > -Original Message-
> >> > From: David Winsemius [mailto:dwinsem...@comcast.net]
> >> > Sent: Wednesday, November 03, 2010 10:41 PM
> >> > To: Matevž Pavlič
> >> > Cc: r-help@r-project.org
> >> > Subject: Re: [R] Loop
> >> >
> >> >
> >> > On Nov 3, 2010, at 5:03 PM, Matevž Pavlič wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Thanks for the help and the manuals. Will come very handy i am
sure.
> >> > >
> >> > > But regarding the code i don't hink this is what i 
> >> > > wantbasically
> >
> >> > > i
> >>
> >> > > would like to repeat bellow code :
> >> > >
> >> > > w1<-table(lit$W1)
> >> > > w1<-as.data.frame(w1)
> >> >
> >> > It appears you are not reading for meaning. Burns has advised you 
> >> > how to
> >>
> >> > construct column names and use them in your initial steps. The 
> >> > `$`
> >> function is
> >> > quite limited in comparison to `[[` , so he was showing you a 
> >> > method
> >> that
> >> > would be more effective.  BTW the as.data.frame step is 
> >> > unnecessary,
> >> since the
> >> > first thing write.table does is coerce an object to a data.frame. 
> >> > The "write.table" name is misleading. It should be 
> >> > "write.data.frame". You
> >> cannot
> >> > really write tables with write.table.
> >> >
> >> > You would also use:
> >> >
> >> >   file=paste(vari, "csv", sep=".") as the file argument to 
> >> > write.table
> >> >
> >> > > write.table(w1,file="w1.csv",sep=";",row.names=T, dec=".")
> >> >
> >> > What are these next actions supposed to do after the file is
written?
> >> > Are you trying to store a group of related "w" objects that will 
> >> > later
> >> be
> >> > indexed in sequence? If so, then a list would make more sense.
> >> >
> >> > --
> >> > David.
> >> >

Re: [R] how to work with long vectors

2010-11-05 Thread William Dunlap
There was a numerical typo below, I said
the sample sizes were 5 and 10 thousand,
I should have said 10 and 20 thousand
(the size argument to sample()).

Also, I timed cover_per_2 and _3 for size
200,000 and gots times of 338 and 0.12 seconds,
respectively.  Growing the problem by a factor
to 10 made cover_per_2 used 100 times more time
and cover_per_3 c. 10 times more (the times are
too small to get an accurate ratio).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
> Sent: Friday, November 05, 2010 9:58 AM
> To: Changbin Du
> Cc: r-help@r-project.org
> Subject: Re: [R] how to work with long vectors
> 
> The following cover_per_3 uses sorting to solve
> the problem more quickly.  It still has room
> for improvement.
> 
> cover_per_3 <- function (data) 
> {
> n <- length(data)
> o <- rev(order(data))
> sdata <- data[o]
> r <- rle(sdata)$lengths
> output <- numeric(n)
> output[o] <- rep(cumsum(r), r)
> 100 * output/n
> }
> 
> (The ecdf function would probabably also do the
> job quickly.)
> 
> When trying to work on problems like this I find
> it most fruitful to work on smaller datasets and
> see how the time grows with the size of the data,
> instead of seeing how many days a it takes on a huge
> dataset.  E.g., the following compares times for
> your original function, Phil Spector's simple cleanup
> of your function, and the sort based approach for
> vectors of length 5 and 10 thousand.
> 
> > z<-sample(5e3, size=1e4, replace=TRUE) ; print(system.time(v <-
> cover_per(z))) ; print(system.time(v_2 <- cover_per_2(z))) ;
> print(system.time(v_3 <- cover_per_3(z)))
>user  system elapsed 
>   38.210.00   38.41 
>user  system elapsed 
>0.860.000.86 
>user  system elapsed 
>   0   0   0 
> > identical(v_3,v)
> [1] TRUE
> > z<-sample(1e4, size=2e4, replace=TRUE) ; print(system.time(v <-
> cover_per(z))) ; print(system.time(v_2 <- cover_per_2(z))) ;
> print(system.time(v_3 <- cover_per_3(z)))
>user  system elapsed 
>  158.480.07  159.31 
>user  system elapsed 
>3.230.003.25 
>user  system elapsed 
>0.020.000.02 
> > identical(v_3,v)
> [1] TRUE
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
> 
> > -Original Message-
> > From: r-help-boun...@r-project.org 
> > [mailto:r-help-boun...@r-project.org] On Behalf Of Changbin Du
> > Sent: Friday, November 05, 2010 9:14 AM
> > To: Phil Spector
> > Cc: 
> > Subject: Re: [R] how to work with long vectors
> > 
> > HI, Phil,
> > 
> > I used the following codes and run it overnight for 15 hours, 
> > this morning,
> > I stopped it. It seems it is still not efficient.
> > 
> > 
> > >
> > matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILL
> UMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GW>
> ZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> > sep="\t", skip=0, header=F,fill=T) #
> > > names(matt)<-c("id","reads")
> > 
> > > dim(matt)
> > [1] 3384766   2
> > 
> > >  cover<-matt$reads
> > 
> > > cover_per_2 <- function(data){
> > +   l = length(data)
> > +   output = numeric(l)
> > +   for(i in 1:l)output[i] = sum(data >= data[i])
> > +   100 * output / l
> > + }
> > 
> > > result3<-cover_per_2(cover)
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On Thu, Nov 4, 2010 at 10:37 AM, Changbin Du 
> >  wrote:
> > 
> > > Thanks Phil, that is great! I WILL try this and let you 
> > know how it goes.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 4, 2010 at 10:16 AM, Phil Spector 
> > wrote:
> > >
> > >> Changbin -
> > >>   Does
> > >>
> > >>100 * sapply(matt$reads,function(x)sum(matt$reads >=
> > >> x))/length(matt$reads)
> > >>
> > >> give what you want?
> > >>
> > >>By the way, if you want to use a loop (there's nothing 
> > wrong with
> > >> that),
> > >> then try to avoid the most common mistake that people make 
> > with loops in
> > >> R:
> > >> having your result grow inside the loop.  Here's a better 
> > way to use a
> > >> loop
> > >> to solve your problem:
> > >>
> > >> cover_per_1 <- function(data){
> > >>   l = length(data)
> > >>   output = numeric(l)
> > >>   for(i in 1:l)output[i] = 100 * sum(ifelse(data >= data[i], 1,
> > >> 0))/length(data)
> > >>   output
> > >> }
> > >>
> > >> Using some random data, and comparing to your original 
> > cover_per function:
> > >>
> > >>  dat = rnorm(1000)
> > >>> system.time(one <- cover_per(dat))
> > >>>
> > >>   user  system elapsed
> > >>  0.816   0.000   0.824
> > >>
> > >>> system.time(two <- cover_per_1(dat))
> > >>>
> > >>   user  system elapsed
> > >>  0.792   0.000   0.805
> > >>
> > >> Not that big a speedup, but it does increase quite a bit 
> > as the problem
> > >> gets
> > >> larger.
> > >>
> > >> There are two obvious ways to speed up your function:
> > >>   1)  Eliminate the 

Re: [R] how to work with long vectors

2010-11-05 Thread Changbin Du
Thanks, William. It gave me a lesson.


On Fri, Nov 5, 2010 at 9:58 AM, William Dunlap  wrote:

> The following cover_per_3 uses sorting to solve
> the problem more quickly.  It still has room
> for improvement.
>
> cover_per_3 <- function (data)
> {
>n <- length(data)
>o <- rev(order(data))
>sdata <- data[o]
>r <- rle(sdata)$lengths
>output <- numeric(n)
>output[o] <- rep(cumsum(r), r)
>100 * output/n
> }
>
> (The ecdf function would probabably also do the
> job quickly.)
>
> When trying to work on problems like this I find
> it most fruitful to work on smaller datasets and
> see how the time grows with the size of the data,
> instead of seeing how many days a it takes on a huge
> dataset.  E.g., the following compares times for
> your original function, Phil Spector's simple cleanup
> of your function, and the sort based approach for
> vectors of length 5 and 10 thousand.
>
> > z<-sample(5e3, size=1e4, replace=TRUE) ; print(system.time(v <-
> cover_per(z))) ; print(system.time(v_2 <- cover_per_2(z))) ;
> print(system.time(v_3 <- cover_per_3(z)))
>   user  system elapsed
>  38.210.00   38.41
>   user  system elapsed
>   0.860.000.86
>   user  system elapsed
>  0   0   0
> > identical(v_3,v)
> [1] TRUE
> > z<-sample(1e4, size=2e4, replace=TRUE) ; print(system.time(v <-
> cover_per(z))) ; print(system.time(v_2 <- cover_per_2(z))) ;
> print(system.time(v_3 <- cover_per_3(z)))
>   user  system elapsed
>  158.480.07  159.31
>   user  system elapsed
>   3.230.003.25
>   user  system elapsed
>   0.020.000.02
> > identical(v_3,v)
> [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -Original Message-
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of Changbin Du
> > Sent: Friday, November 05, 2010 9:14 AM
> > To: Phil Spector
> > Cc: 
> > Subject: Re: [R] how to work with long vectors
> >
> > HI, Phil,
> >
> > I used the following codes and run it overnight for 15 hours,
> > this morning,
> > I stopped it. It seems it is still not efficient.
> >
> >
> > >
> > matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILL
> UMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GW>
> ZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> > sep="\t", skip=0, header=F,fill=T) #
> > > names(matt)<-c("id","reads")
> >
> > > dim(matt)
> > [1] 3384766   2
> >
> > >  cover<-matt$reads
> >
> > > cover_per_2 <- function(data){
> > +   l = length(data)
> > +   output = numeric(l)
> > +   for(i in 1:l)output[i] = sum(data >= data[i])
> > +   100 * output / l
> > + }
> >
> > > result3<-cover_per_2(cover)
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Nov 4, 2010 at 10:37 AM, Changbin Du
> >  wrote:
> >
> > > Thanks Phil, that is great! I WILL try this and let you
> > know how it goes.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 4, 2010 at 10:16 AM, Phil Spector
> > wrote:
> > >
> > >> Changbin -
> > >>   Does
> > >>
> > >>100 * sapply(matt$reads,function(x)sum(matt$reads >=
> > >> x))/length(matt$reads)
> > >>
> > >> give what you want?
> > >>
> > >>By the way, if you want to use a loop (there's nothing
> > wrong with
> > >> that),
> > >> then try to avoid the most common mistake that people make
> > with loops in
> > >> R:
> > >> having your result grow inside the loop.  Here's a better
> > way to use a
> > >> loop
> > >> to solve your problem:
> > >>
> > >> cover_per_1 <- function(data){
> > >>   l = length(data)
> > >>   output = numeric(l)
> > >>   for(i in 1:l)output[i] = 100 * sum(ifelse(data >= data[i], 1,
> > >> 0))/length(data)
> > >>   output
> > >> }
> > >>
> > >> Using some random data, and comparing to your original
> > cover_per function:
> > >>
> > >>  dat = rnorm(1000)
> > >>> system.time(one <- cover_per(dat))
> > >>>
> > >>   user  system elapsed
> > >>  0.816   0.000   0.824
> > >>
> > >>> system.time(two <- cover_per_1(dat))
> > >>>
> > >>   user  system elapsed
> > >>  0.792   0.000   0.805
> > >>
> > >> Not that big a speedup, but it does increase quite a bit
> > as the problem
> > >> gets
> > >> larger.
> > >>
> > >> There are two obvious ways to speed up your function:
> > >>   1)  Eliminate the ifelse function, since automatic coersion from
> > >>   logical to numeric does the same thing.
> > >>   2)  Multiply by 100 and divide by the length outside the loop:
> > >>
> > >> cover_per_2 <- function(data){
> > >>   l = length(data)
> > >>   output = numeric(l)
> > >>   for(i in 1:l)output[i] = sum(data >= data[i])
> > >>   100 * output / l
> > >> }
> > >>
> > >>  system.time(three <- cover_per_2(dat))
> > >>>
> > >>   user  system elapsed
> > >>  0.024   0.000   0.027
> > >>
> > >> That makes the loop just about equivalent to the sapply solution:
> > >>
> > >>  system.time(four <- 100*sapply(dat,function(x)sum(dat >=
> > x))/length(dat))
> > >>>
> > >>   user  system elapsed
> > >>  0.024   0.000   0.026
> > >>

Re: [R] How to extract particular rows and column from a table

2010-11-05 Thread Mike Rennie
Hi Mauluda,

Next time, please read the posting guide - helping you is made a lot easier if
you provide the code that didn't work.

It sounds like you might want something like this?

#make a data frame, with some column names assigned...
aa<-data.frame(c(rep("a",5), rep("c",3)),c(rep(7,5), rep(2,3)))
aa
colnames(aa)<-c("cola", "colb")
aa

#select your items of interest...
ab<-aa$colb[aa$cola=="a"]
ab
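Applied to your own table, a sketch would be something like the following
(assuming the table was read with read.table() into an object called 'dna'
and kept the default V1..V11 column names; both name and assumption are mine):

bb <- dna[dna$V6 == 30049831, c("V2", "V6")]
## or, equivalently
bb <- subset(dna, V6 == 30049831, select = c(V2, V6))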

HTH,

Mike
On Fri, Nov 5, 2010 at 11:26 AM, Mauluda Akhtar  wrote:

> Hello,
> I'm a new user of R. I have a very big table with the following structure
> (suppose the variable name is "aa"). From this table I want to make a new
> table which will contain just the two columns V2 and V6 with some particular
> rows (suppose the variable name is "bb"). I'd like to mention that the V2 column
> represents the id that corresponds to the column V6, which represents
> the base position of the DNA. In this bb table, just as an example, I want to
> extract all the corresponding rows of the V2 column where the V6 column contains
> "30049831" (in my table there is repetition of the same base position). I
> tried
> this but failed to solve it.
> Could you please let me know how I can solve this.
>
> Thank you.
> Mauluda
>
>
> V1   V2
> V3V4  V5   V6  V7V8   V9 V10  V11
> ESMEHEP0102102796h05.w2kF59780SCF:32  CpGVariation6
> 3004983130049831+.-1NA
> ESMEHEP0102102796h05.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.31NA
> ESMEHEP0102102796h05.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.48NA
> ESMEHEP0102102796h05.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.30NA
> ESMEHEP0102102796h05.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.38NA
> ESMEHEP0102102796h05.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.14NA
> ESMEHEP0102102796h05.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102796h05.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.1NA
> ESMEHEP0102102796a04.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.25NA
> ESMEHEP0102102796a04.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.28NA
> ESMEHEP0102102796a04.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.28NA
> ESMEHEP0102102796a04.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.45NA
> ESMEHEP0102102796a04.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.29NA
> ESMEHEP0102102796a04.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102796a04.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.8NA
> ESMEHEP0102102796e06.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.20NA
> ESMEHEP0102102796e06.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.28NA
> ESMEHEP0102102796e06.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.44NA
> ESMEHEP0102102796e06.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.-1NA
> ESMEHEP0102102796e06.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.22NA
> ESMEHEP0102102796e06.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102796e06.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.0NA
> ESMEHEP0102102788c04.w2kF59780SCF:32  CpGVariation6
> 3004983130049831+.-1NA
> ESMEHEP0102102788c04.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.38NA
> ESMEHEP0102102788c04.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.31NA
> ESMEHEP0102102788c04.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.54NA
> ESMEHEP0102102788c04.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.36NA
> ESMEHEP0102102788c04.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.27NA
> ESMEHEP0102102788c04.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102788c04.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.4NA
> ESMEHEP0102102796d06.w2kF59780SCF:32CpGVariation6
> 3004983130049831+.-1NA
> ESMEHEP0102102796d06.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.0NA
> ESMEHEP0102102796d06.w2kF597

Re: [R] how to work with long vectors

2010-11-05 Thread William Dunlap
The following cover_per_3 uses sorting to solve
the problem more quickly.  It still has room
for improvement.

cover_per_3 <- function (data) 
{
n <- length(data)
o <- rev(order(data))
sdata <- data[o]
r <- rle(sdata)$lengths
output <- numeric(n)
output[o] <- rep(cumsum(r), r)
100 * output/n
}

(The ecdf function would probably also do the
job quickly.)
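For instance, the same values can be obtained in one line with rank(); this
is a sketch rather than a tested drop-in, but it uses the same ties handling
as cover_per_3 above:

cover_per_4 <- function (data)
{
    # percentage of elements that are >= each element, via rank()
    100 * rank(-data, ties.method = "max") / length(data)
}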

When trying to work on problems like this I find
it most fruitful to work on smaller datasets and
see how the time grows with the size of the data,
instead of seeing how many days it takes on a huge
dataset.  E.g., the following compares times for
your original function, Phil Spector's simple cleanup
of your function, and the sort based approach for
vectors of length 5 and 10 thousand.

> z<-sample(5e3, size=1e4, replace=TRUE) ; print(system.time(v <-
cover_per(z))) ; print(system.time(v_2 <- cover_per_2(z))) ;
print(system.time(v_3 <- cover_per_3(z)))
   user  system elapsed 
  38.210.00   38.41 
   user  system elapsed 
   0.860.000.86 
   user  system elapsed 
  0   0   0 
> identical(v_3,v)
[1] TRUE
> z<-sample(1e4, size=2e4, replace=TRUE) ; print(system.time(v <-
cover_per(z))) ; print(system.time(v_2 <- cover_per_2(z))) ;
print(system.time(v_3 <- cover_per_3(z)))
   user  system elapsed 
 158.480.07  159.31 
   user  system elapsed 
   3.230.003.25 
   user  system elapsed 
   0.020.000.02 
> identical(v_3,v)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Changbin Du
> Sent: Friday, November 05, 2010 9:14 AM
> To: Phil Spector
> Cc: 
> Subject: Re: [R] how to work with long vectors
> 
> HI, Phil,
> 
> I used the following codes and run it overnight for 15 hours, 
> this morning,
> I stopped it. It seems it is still not efficient.
> 
> 
> >
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILL
UMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GW>
ZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> sep="\t", skip=0, header=F,fill=T) #
> > names(matt)<-c("id","reads")
> 
> > dim(matt)
> [1] 3384766   2
> 
> >  cover<-matt$reads
> 
> > cover_per_2 <- function(data){
> +   l = length(data)
> +   output = numeric(l)
> +   for(i in 1:l)output[i] = sum(data >= data[i])
> +   100 * output / l
> + }
> 
> > result3<-cover_per_2(cover)
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Thu, Nov 4, 2010 at 10:37 AM, Changbin Du 
>  wrote:
> 
> > Thanks Phil, that is great! I WILL try this and let you 
> know how it goes.
> >
> >
> >
> >
> > On Thu, Nov 4, 2010 at 10:16 AM, Phil Spector 
> wrote:
> >
> >> Changbin -
> >>   Does
> >>
> >>100 * sapply(matt$reads,function(x)sum(matt$reads >=
> >> x))/length(matt$reads)
> >>
> >> give what you want?
> >>
> >>By the way, if you want to use a loop (there's nothing 
> wrong with
> >> that),
> >> then try to avoid the most common mistake that people make 
> with loops in
> >> R:
> >> having your result grow inside the loop.  Here's a better 
> way to use a
> >> loop
> >> to solve your problem:
> >>
> >> cover_per_1 <- function(data){
> >>   l = length(data)
> >>   output = numeric(l)
> >>   for(i in 1:l)output[i] = 100 * sum(ifelse(data >= data[i], 1,
> >> 0))/length(data)
> >>   output
> >> }
> >>
> >> Using some random data, and comparing to your original 
> cover_per function:
> >>
> >>  dat = rnorm(1000)
> >>> system.time(one <- cover_per(dat))
> >>>
> >>   user  system elapsed
> >>  0.816   0.000   0.824
> >>
> >>> system.time(two <- cover_per_1(dat))
> >>>
> >>   user  system elapsed
> >>  0.792   0.000   0.805
> >>
> >> Not that big a speedup, but it does increase quite a bit 
> as the problem
> >> gets
> >> larger.
> >>
> >> There are two obvious ways to speed up your function:
> >>   1)  Eliminate the ifelse function, since automatic coersion from
> >>   logical to numeric does the same thing.
> >>   2)  Multiply by 100 and divide by the length outside the loop:
> >>
> >> cover_per_2 <- function(data){
> >>   l = length(data)
> >>   output = numeric(l)
> >>   for(i in 1:l)output[i] = sum(data >= data[i])
> >>   100 * output / l
> >> }
> >>
> >>  system.time(three <- cover_per_2(dat))
> >>>
> >>   user  system elapsed
> >>  0.024   0.000   0.027
> >>
> >> That makes the loop just about equivalent to the sapply solution:
> >>
> >>  system.time(four <- 100*sapply(dat,function(x)sum(dat >= 
> x))/length(dat))
> >>>
> >>   user  system elapsed
> >>  0.024   0.000   0.026
> >>
> >>- Phil Spector
> >> Statistical 
> Computing Facility
> >> Department of Statistics
> >> UC Berkeley
> >> spec...@stat.berkeley.edu
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu

Re: [R] how to work with long vectors

2010-11-05 Thread Changbin Du
Thanks Martin! I will try it and will let your guys know how it goes.


On Fri, Nov 5, 2010 at 9:42 AM, Martin Morgan  wrote:

> On 11/05/2010 09:13 AM, Changbin Du wrote:
> > HI, Phil,
> >
> > I used the following codes and run it overnight for 15 hours, this
> morning,
> > I stopped it. It seems it is still not efficient.
> >
> >
> >>
> >
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> > sep="\t", skip=0, header=F,fill=T) #
> >> names(matt)<-c("id","reads")
> >
> >> dim(matt)
> > [1] 3384766   2
>
> [snip]
>
> >>> On Thu, 4 Nov 2010, Changbin Du wrote:
> >>>
> >>>  HI, Dear R community,
> 
>  I have one data set like this,  What I want to do is to calculate the
>  cumulative coverage. The following codes works for small data set
> (#rows
>  =
>  100), but when feed the whole data set,  it still running after 24
> hours.
>  Can someone give some suggestions for long vector?
> 
>  idreads
>  Contig79:14
>  Contig79:28
>  Contig79:313
>  Contig79:414
>  Contig79:517
>  Contig79:620
>  Contig79:725
>  Contig79:827
>  Contig79:932
>  Contig79:1033
>  Contig79:1134
> 
> 
> 
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
>  sep="\t", skip=0, header=F,fill=T) #
>  dim(matt)
>  [1] 3384766   2
> 
>  matt_plot<-function(matt, outputfile) {
>  names(matt)<-c("id","reads")
> 
>  cover<-matt$reads
> 
> 
>  #calculate the cumulative coverage.
>  + cover_per<-function (data) {
>  + output<-numeric(0)
>  + for (i in data) {
>  +   x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
>  +   output<-c(output, x)
>  + }
>  + return(output)
>  + }
> 
> 
>  result<-cover_per(cover)
>
> Hi Changbin
>
> If I understand correctly, your contigs 'start' at position 1, and have
> 'width' equal to matt$reads. You'd like to know the coverage at the last
> covered location of each contig in matt$reads.
>
> ## first time only
> source("http://bioconductor.org";)
> biocLite("IRanges")
>
> ##
> library(IRanges)
> contigs = IRanges(start=1, width=matt$reads)
> cvg = coverage(contigs) ## an RLE summarizing coverage, from position 1
> as.vector(cvg[matt$reads]) / nrow(matt)  ## at the end of each contig
>
> for a larger data set:
>
> > matt=data.frame(reads=ceiling(as.integer(runif(3384766, 1, 100
> > contigs = IRanges(start=1, width=matt$reads)
> > system.time(cvg <- coverage(contigs))
>   user  system elapsed
>  5.145   0.050   5.202
>
> Martin
>
> 
> 
>  Thanks so much!
> 
> 
>  --
>  Sincerely,
>  Changbin
>  --
> 
> [[alternative HTML version deleted]]
> 
>  __
>  R-help@r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide
>  http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
> 
> 
> >>
> >>
> >> --
> >> Sincerely,
> >> Changbin
> >> --
> >>
> >> Changbin Du
> >> DOE Joint Genome Institute
> >> Bldg 400 Rm 457
> >> 2800 Mitchell Dr
> >> Walnut Creet, CA 94598
> >> Phone: 925-927-2856
> >>
> >>
> >>
> >
> >
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>



-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creet, CA 94598
Phone: 925-927-2856

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread RICHARD M. HEIBERGER
Continuing with Daniel's example, but with different data values



a <- sample(24)
a
dim(a) <- c(3,4,2)
a
as.vector(a)

## for an array with
## dim(a) == c(3,4,2)
## a[i,j,k] means select the element in position
##i + (j-1)*3 + (k-1)*3*4

index <- function(i,j,k) {
   i + (j-1)*3 + (k-1)*3*4
}

## find the vector position described by row 2, column 1, layer 2
index(2,1,2)## this is the position in the original vector
a[2,1,2]## this is the value in that position with 3D indexing
a[index(2,1,2)] ## this is the same value with 1D vector indexing
a[14]   ## this is the same value with 1D vector indexing

## find the position in row 3, column 4, layer 1
index(3,4,1)## this is the position in the original vector
a[3,4,1]## this is the value in that position with 3D indexing
a[index(3,4,1)] ## this is the same value with 1D vector indexing
a[12]   ## this is the same value with 1D vector indexing


index(1,1,1)## this is the position in the original vector
index(2,1,1)## this is the position in the original vector
index(3,1,1)## this is the position in the original vector
index(1,2,1)## this is the position in the original vector
index(2,2,1)## this is the position in the original vector
index(3,2,1)## this is the position in the original vector
index(1,3,1)## this is the position in the original vector
index(2,3,1)## this is the position in the original vector
index(3,3,1)## this is the position in the original vector
index(1,4,1)## this is the position in the original vector
index(2,4,1)## this is the position in the original vector
index(3,4,1)## this is the position in the original vector
index(1,1,2)## this is the position in the original vector
index(2,1,2)## this is the position in the original vector
index(3,1,2)## this is the position in the original vector
index(1,2,2)## this is the position in the original vector
index(2,2,2)## this is the position in the original vector
index(3,2,2)## this is the position in the original vector
index(1,3,2)## this is the position in the original vector
index(2,3,2)## this is the position in the original vector
index(3,3,2)## this is the position in the original vector
index(1,4,2)## this is the position in the original vector
index(2,4,2)## this is the position in the original vector
index(3,4,2)## this is the position in the original vector
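
(A supplementary cross-check, not part of the original post: base R's arrayInd()
computes the inverse mapping, from a 1D storage position back to the [i, j, k]
subscripts, so it can be used to verify the index() arithmetic above.)

a <- array(sample(24), dim = c(3, 4, 2))

arrayInd(14, dim(a))   ## row 2, column 1, layer 2 -- matches index(2,1,2) == 14
arrayInd(12, dim(a))   ## row 3, column 4, layer 1 -- matches index(3,4,1) == 12

## matrix indexing with a row of subscripts picks out the same element,
## so all three of these return the same value
a[2, 1, 2]
a[14]
a[arrayInd(14, dim(a))]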

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Joshua Wiley
On Fri, Nov 5, 2010 at 9:17 AM, Stephen Liu  wrote:
> Hi Daniel,
>
> Thanks for your detail advice.  I completely understand your explain.
>
> But I can't resolve what does "a" stand for there?

the "a" just represents some vector.  It is the name of the object
that stores your data.  Like you might tell someone to go look in a
book to find some information.

>
> a[1,1,1] is 1 * 1 * 1 = 1
> a[2,1,1] is 2 * 1 * 1 = 2
> a[2,4,2] is 2 * 4 * 2 = 16
> a[3,4,2] is 3 * 4 * 2 = 24

That is the basic idea, but it may not be the most helpful way to
think of it because it depends on the length of each dimension.
For example

a[1, 2, 1] is not 1 * 2 * 1 = 2
a[1, 1, 2] is not 1 * 1 * 2 = 2

in the little 3d array I show below, it would actually be

a[1, 2, 1] = 4
a[1, 1, 2] = 13

>
> ?
>
>
> B.R.
> Stephen L
>

> - Original Message 
> From: Daniel Nordlund 
> To: r-help@r-project.org
> Sent: Fri, November 5, 2010 11:54:15 PM
> Subject: Re: [R] About 5.1 Arrays
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of Stephen Liu
>> Sent: Friday, November 05, 2010 7:57 AM
>> To: Steve Lianoglou
>> Cc: r-help@r-project.org
>> Subject: Re: [R] About 5.1 Arrays
>>
>> Hi Steve,
>>
>> > It's not clear what you're having problems understanding. By
>> > setting the "dim" attribute of your (1d) vector, you are changing
>> > its dimensions.
>>
>> I'm following An Introduction to R to learn R
>>
>> On
>>
>> 5.1 Arrays
>> http://cran.r-project.org/doc/manuals/R-intro.html#Vectors-and-assignment
>>
>>
>> It mentions:-
>> ...
>> For example if the dimension vector for an array, say a, is c(3,4,2) then
>> there
>> are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the
>> order
>> a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2].
>>
>>
>> I don't understand "on  =24 entries in a and the data vector holds
>> them in
>> the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."  the order
>> a[1,1,1],
>> a[2,1,1], ..., a[2,4,2], a[3,4,2]?  What does it mean "the order a[1,1,1],
>> a[2,1,1], ..., a[2,4,2], a[3,4,2]"?

because it is actually stored as a 1 dimensional vector, it is just
telling you the order.  For example, given some vector "a" that
contains the numbers 1 through 24, you could reshape this into a three
dimensional object.  It would be stored like:

# make a vector "a" and an array (built from "a") called a3d
> a <- 1:24
> a3d <- array(a, dim = c(3, 4, 2))
> a
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> a3d
, , 1 <--- this is the first position of the third dimension

     [,1] [,2] [,3] [,4]  <--- positions 1, 2, 3, 4 of the second dimension
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
  ^  the first dimension

, , 2 <--- the second position of the third dimension

 [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24


a[1, 1, 1] is the first element of dimension 1, the first element of
dimension 2, and the first element of dimension 3. so 1.
a[2, 1, 1] is the *second* element of dimension 1, the first element
of dimension 2, and the first element of dimension 3. so 2
a[3, 4, 2] is the third element of dimension 1, the fourth element of
dimension 2, and the second element of dimension 3. so 24.

so you can think that in the original vector "a":
1 maps to a[1, 1, 1] in the 3d array
2 maps to a[2, 1, 1].
3 maps to a[3, 1, 1]
4 maps to a[1, 2, 1]
12 maps to a[3, 4, 1]
20 maps to a[2, 3, 2]
24 maps to a[3, 4, 2]
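
(A small supplementary check, not in the original reply: which() with
arr.ind = TRUE reports the [i, j, k] subscripts at which a value sits, so the
mappings listed above can be verified directly.)

a3d <- array(1:24, dim = c(3, 4, 2))
which(a3d == 4,  arr.ind = TRUE)   ## row 1, col 2, layer 1  ->  4 maps to a[1, 2, 1]
which(a3d == 20, arr.ind = TRUE)   ## row 2, col 3, layer 2  ->  20 maps to a[2, 3, 2]
which(a3d == 24, arr.ind = TRUE)   ## row 3, col 4, layer 2  ->  24 maps to a[3, 4, 2]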

>>
>> Thanks
>>
>> B.R.
>> Stephen




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to work with long vectors

2010-11-05 Thread Martin Morgan
On 11/05/2010 09:42 AM, Martin Morgan wrote:

> ## first time only
> source("http://bioconductor.org")

oops, source("http://bioconductor.org/biocLite.R")

> biocLite("IRanges")
> 
> ##
> library(IRanges)
> contigs = IRanges(start=1, width=matt$reads)
> cvg = coverage(contigs) ## an RLE summarizing coverage, from position 1
> as.vector(cvg[matt$reads]) / nrow(matt)  ## at the end of each contig
> 
> for a larger data set:
> 
>> matt=data.frame(reads=ceiling(as.integer(runif(3384766, 1, 100
>> contigs = IRanges(start=1, width=matt$reads)
>> system.time(cvg <- coverage(contigs))
>user  system elapsed
>   5.145   0.050   5.202
> 
> Martin
> 
>
>
> Thanks so much!
>
>
> --
> Sincerely,
> Changbin
> --
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>>>
>>>
>>> --
>>> Sincerely,
>>> Changbin
>>> --
>>>
>>> Changbin Du
>>> DOE Joint Genome Institute
>>> Bldg 400 Rm 457
>>> 2800 Mitchell Dr
>>> Walnut Creet, CA 94598
>>> Phone: 925-927-2856
>>>
>>>
>>>
>>
>>
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to work with long vectors

2010-11-05 Thread Martin Morgan
On 11/05/2010 09:13 AM, Changbin Du wrote:
> HI, Phil,
> 
> I used the following codes and run it overnight for 15 hours, this morning,
> I stopped it. It seems it is still not efficient.
> 
> 
>>
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> sep="\t", skip=0, header=F,fill=T) #
>> names(matt)<-c("id","reads")
> 
>> dim(matt)
> [1] 3384766   2

[snip]

>>> On Thu, 4 Nov 2010, Changbin Du wrote:
>>>
>>>  HI, Dear R community,

 I have one data set like this.  What I want to do is to calculate the
 cumulative coverage. The following code works for a small data set (#rows =
 100), but when fed the whole data set, it is still running after 24 hours.
 Can someone give some suggestions for long vectors?

 id            reads
 Contig79:1    4
 Contig79:2    8
 Contig79:3    13
 Contig79:4    14
 Contig79:5    17
 Contig79:6    20
 Contig79:7    25
 Contig79:8    27
 Contig79:9    32
 Contig79:10   33
 Contig79:11   34


 matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
 sep="\t", skip=0, header=F,fill=T) #
 dim(matt)
 [1] 3384766   2

 matt_plot<-function(matt, outputfile) {
 names(matt)<-c("id","reads")

 cover<-matt$reads


 #calculate the cumulative coverage.
 + cover_per<-function (data) {
 + output<-numeric(0)
 + for (i in data) {
 +   x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
 +   output<-c(output, x)
 + }
 + return(output)
 + }


 result<-cover_per(cover)

Hi Changbin

If I understand correctly, your contigs 'start' at position 1, and have
'width' equal to matt$reads. You'd like to know the coverage at the last
covered location of each contig in matt$reads.

## first time only
source("http://bioconductor.org")
biocLite("IRanges")

##
library(IRanges)
contigs = IRanges(start=1, width=matt$reads)
cvg = coverage(contigs) ## an RLE summarizing coverage, from position 1
as.vector(cvg[matt$reads]) / nrow(matt)  ## at the end of each contig

for a larger data set:

> matt=data.frame(reads=ceiling(as.integer(runif(3384766, 1, 100
> contigs = IRanges(start=1, width=matt$reads)
> system.time(cvg <- coverage(contigs))
   user  system elapsed
  5.145   0.050   5.202

Martin
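
(A supplementary base-R sketch, not from Martin's reply, for readers without
Bioconductor.  It assumes matt$reads holds positive integer widths as above:
tabulate() counts how many reads have each width, and a reversed cumulative sum
then gives, at each position w, the number of reads whose width is >= w -- the
same coverage values for intervals that all start at position 1.)

w   <- matt$reads
n   <- max(w)
cnt <- tabulate(w, nbins = n)    ## how many reads end at each position
cvg <- rev(cumsum(rev(cnt)))     ## coverage at positions 1..n
cvg[w] / length(w)               ## fraction of reads covering the end of each contig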



 Thanks so much!


 --
 Sincerely,
 Changbin
 --

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


>>
>>
>> --
>> Sincerely,
>> Changbin
>> --
>>
>> Changbin Du
>> DOE Joint Genome Institute
>> Bldg 400 Rm 457
>> 2800 Mitchell Dr
>> Walnut Creet, CA 94598
>> Phone: 925-927-2856
>>
>>
>>
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to extract particular rows and column from a table

2010-11-05 Thread Mauluda Akhtar
Hello,
I'm a new user of R. I have a very big table with the following structure
(suppose its variable name is "aa"). From this table I want to make a new
table that will contain just the two columns V2 and V6, for some particular
rows (suppose its variable name is "bb"). I should mention that the V2 column
holds the id that corresponds to the V6 column, which holds the base position
in the DNA. In this bb table, just as an example, I want to extract all the
corresponding rows of the V2 column where the V6 column is "30049831" (the
same base position is repeated in my table). I tried this but failed to solve
it.
Could you please let me know how I can solve this.

Thank you.
Mauluda


V1   V2
V3V4  V5   V6  V7V8   V9 V10  V11
ESMEHEP0102102796h05.w2kF59780SCF:32  CpGVariation6
3004983130049831+.-1NA
ESMEHEP0102102796h05.w2kF59780SCF:114CpGVariation6
3004991330049913+.31NA
ESMEHEP0102102796h05.w2kF59780SCF:154CpGVariation6
3004995330049953+.48NA
ESMEHEP0102102796h05.w2kF59780SCF:170CpGVariation6
3004996930049969+.30NA
ESMEHEP0102102796h05.w2kF59780SCF:172CpGVariation6
3004997130049971+.38NA
ESMEHEP0102102796h05.w2kF59780SCF:245CpGVariation6
3005004430050044+.14NA
ESMEHEP0102102796h05.w2kF59780SCF:363CpGVariation6
3005016230050162+.0NA
ESMEHEP0102102796h05.w2kF59780SCF:382CpGVariation6
3005018130050181+.1NA
ESMEHEP0102102796a04.w2kF59780SCF:114CpGVariation6
3004991330049913+.25NA
ESMEHEP0102102796a04.w2kF59780SCF:154CpGVariation6
3004995330049953+.28NA
ESMEHEP0102102796a04.w2kF59780SCF:170CpGVariation6
3004996930049969+.28NA
ESMEHEP0102102796a04.w2kF59780SCF:172CpGVariation6
3004997130049971+.45NA
ESMEHEP0102102796a04.w2kF59780SCF:245CpGVariation6
3005004430050044+.29NA
ESMEHEP0102102796a04.w2kF59780SCF:363CpGVariation6
3005016230050162+.0NA
ESMEHEP0102102796a04.w2kF59780SCF:382CpGVariation6
3005018130050181+.8NA
ESMEHEP0102102796e06.w2kF59780SCF:114CpGVariation6
3004991330049913+.20NA
ESMEHEP0102102796e06.w2kF59780SCF:154CpGVariation6
3004995330049953+.28NA
ESMEHEP0102102796e06.w2kF59780SCF:170CpGVariation6
3004996930049969+.44NA
ESMEHEP0102102796e06.w2kF59780SCF:172CpGVariation6
3004997130049971+.-1NA
ESMEHEP0102102796e06.w2kF59780SCF:245CpGVariation6
3005004430050044+.22NA
ESMEHEP0102102796e06.w2kF59780SCF:363CpGVariation6
3005016230050162+.0NA
ESMEHEP0102102796e06.w2kF59780SCF:382CpGVariation6
3005018130050181+.0NA
ESMEHEP0102102788c04.w2kF59780SCF:32  CpGVariation6
3004983130049831+.-1NA
ESMEHEP0102102788c04.w2kF59780SCF:114CpGVariation6
3004991330049913+.38NA
ESMEHEP0102102788c04.w2kF59780SCF:154CpGVariation6
3004995330049953+.31NA
ESMEHEP0102102788c04.w2kF59780SCF:170CpGVariation6
3004996930049969+.54NA
ESMEHEP0102102788c04.w2kF59780SCF:172CpGVariation6
3004997130049971+.36NA
ESMEHEP0102102788c04.w2kF59780SCF:245CpGVariation6
3005004430050044+.27NA
ESMEHEP0102102788c04.w2kF59780SCF:363CpGVariation6
3005016230050162+.0NA
ESMEHEP0102102788c04.w2kF59780SCF:382CpGVariation6
3005018130050181+.4NA
ESMEHEP0102102796d06.w2kF59780SCF:32CpGVariation6
3004983130049831+.-1NA
ESMEHEP0102102796d06.w2kF59780SCF:114CpGVariation6
3004991330049913+.0NA
ESMEHEP0102102796d06.w2kF59780SCF:154CpGVariation6
3004995330049953+.15NA
ESMEHEP0102102796d06.w2kF59780SCF:170CpGVariation6
3004996930049969+.16NA
ESMEHEP0102102796d06.w2kF59780SCF:172CpGVariation6
3004997130049971+.16NA
ESMEHEP0102102796d06.w2kF59780SCF:245CpGVariation6
3005004430050044+.21NA
ESMEHEP0102102796d06.w2kF59780SCF:363CpGVariation6
3005016230050162+.0NA
ESMEHEP0102102796d06.w2kF59780SCF:382CpGVariation6
3005018130050181+.0NA
ESMEHEP
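
(One possible approach, sketched under the assumption that the table has
already been read into a data frame called aa with the default column names
V1..V11 shown above; this is untested against the real file.)

## keep only columns V2 and V6, for the rows where V6 is 30049831
bb <- aa[aa$V6 == 30049831, c("V2", "V6")]

## or, for several base positions at once
bb <- aa[aa$V6 %in% c(30049831, 30049913), c("V2", "V6")]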

Re: [R] About 5.1 Arrays

2010-11-05 Thread Stephen Liu
Hi Daniel,

Thanks for your detailed advice.  I completely understand your explanation.

But I can't work out what "a" stands for there.

a[1,1,1] is 1 * 1 * 1 = 1
a[2,1,1] is 2 * 1 * 1 = 2
a[2,4,2] is 2 * 4 * 2 = 16
a[3,4,2] is 3 * 4 * 2 = 24

?


B.R.
Stephen L







- Original Message 
From: Daniel Nordlund 
To: r-help@r-project.org
Sent: Fri, November 5, 2010 11:54:15 PM
Subject: Re: [R] About 5.1 Arrays

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Stephen Liu
> Sent: Friday, November 05, 2010 7:57 AM
> To: Steve Lianoglou
> Cc: r-help@r-project.org
> Subject: Re: [R] About 5.1 Arrays
> 
> Hi Steve,
> 
> > It's not clear what you're having problems understanding. By
> > setting the "dim" attribute of your (1d) vector, you are changing
> > itsdimenensions.
> 
> I'm following An Introduction to R to learn R
> 
> On
> 
> 5.1 Arrays
> http://cran.r-project.org/doc/manuals/R-intro.html#Vectors-and-assignment
> 
> 
> It mentions:-
> ...
> For example if the dimension vector for an array, say a, is c(3,4,2) then
> there
> are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the
> order
> a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2].
> 
> 
> I don't understand "on  =24 entries in a and the data vector holds
> them in
> the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."  the order
> a[1,1,1],
> a[2,1,1], ..., a[2,4,2], a[3,4,2]?  What does it mean "the order a[1,1,1],
> a[2,1,1], ..., a[2,4,2], a[3,4,2]"?
> 
> Thanks
> 
> B.R.
> Stephen
> 
> 

Stephen,

Start with a vector of length = 12.  The vector, v, is stored in consecutive 
locations in memory, one after the other.  And 


> v <- 1:12
> v
[1]  1  2  3  4  5  6  7  8  9 10 11 12

Now change then change the dimension of v to c(3,4), i.e. a matrix with 3 rows 
and 4 columns.  


> dim(v) <- c(3,4)
> v
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

The values of v are still stored in memory in consecutive locations.  But now 
you refer to the first location as v[1,1], the second as v[2,1], third as 
v[3,1] 
... and the 12th as v[3,4].  We sometimes talk about the values "going into" 
v[1,1] or more generally, v[i,j], but the values aren't going anywhere.  They 
are still stored in consecutive locations.  We are just changing how they are 
referred to when we change the dimensions.

So in the 2-dimensional matrix above, the values of the vector v "go into" the 
matrix in column order, i.e. the first column is filled first, then the second, 
...  


Now, create a 24 element vector.

> v <- 1:24
> v
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Change the dimensions to a 3-dimensional array.

> dim(v) <- c(3,4,2)
> v
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

 [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

You can visualize a 3-dimensional array as a series of 2-dimensional arrays 
stacked on top of each other.  But this is just a convenient image.  The items 
are still stored consecutively in memory.  Notice that layer one in the stack 
was "filled" first, and the first layer was "filled" just like the previous 
2-dimensional example.  But the items are still physically stored linearly, in 
consecutive locations in memory.

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to work with long vectors

2010-11-05 Thread Changbin Du
HI, Phil,

I used the following code and ran it overnight for 15 hours; this morning,
I stopped it. It seems it is still not efficient.


>
matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
sep="\t", skip=0, header=F,fill=T) #
> names(matt)<-c("id","reads")

> dim(matt)
[1] 3384766   2

>  cover<-matt$reads

> cover_per_2 <- function(data){
+   l = length(data)
+   output = numeric(l)
+   for(i in 1:l)output[i] = sum(data >= data[i])
+   100 * output / l
+ }

> result3<-cover_per_2(cover)
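
(A supplementary note, not from the original mail: both loop versions compare
every element against every other element, i.e. O(n^2) work, which is why 3.4
million rows never finish.  The same percentages can be computed in one
vectorized O(n log n) step with rank(); a sketch, assuming matt$reads is
numeric as above:)

cover_per_vec <- function(x) {
  n <- length(x)
  ## with ties.method = "min", rank(x)[i] is 1 + the number of values below x[i],
  ## so n - rank + 1 is the number of values >= x[i]
  100 * (n - rank(x, ties.method = "min") + 1) / n
}

## quick check against the loop/sapply version on a small sample
set.seed(1)
x <- sample(1:50, 200, replace = TRUE)
all.equal(cover_per_vec(x),
          100 * sapply(x, function(v) sum(x >= v)) / length(x))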












On Thu, Nov 4, 2010 at 10:37 AM, Changbin Du  wrote:

> Thanks Phil, that is great! I WILL try this and let you know how it goes.
>
>
>
>
> On Thu, Nov 4, 2010 at 10:16 AM, Phil Spector 
> wrote:
>
>> Changbin -
>>   Does
>>
>>100 * sapply(matt$reads,function(x)sum(matt$reads >=
>> x))/length(matt$reads)
>>
>> give what you want?
>>
>>By the way, if you want to use a loop (there's nothing wrong with
>> that),
>> then try to avoid the most common mistake that people make with loops in
>> R:
>> having your result grow inside the loop.  Here's a better way to use a
>> loop
>> to solve your problem:
>>
>> cover_per_1 <- function(data){
>>   l = length(data)
>>   output = numeric(l)
>>   for(i in 1:l)output[i] = 100 * sum(ifelse(data >= data[i], 1,
>> 0))/length(data)
>>   output
>> }
>>
>> Using some random data, and comparing to your original cover_per function:
>>
>>  dat = rnorm(1000)
>>> system.time(one <- cover_per(dat))
>>>
>>   user  system elapsed
>>  0.816   0.000   0.824
>>
>>> system.time(two <- cover_per_1(dat))
>>>
>>   user  system elapsed
>>  0.792   0.000   0.805
>>
>> Not that big a speedup, but it does increase quite a bit as the problem
>> gets
>> larger.
>>
>> There are two obvious ways to speed up your function:
>>   1)  Eliminate the ifelse function, since automatic coersion from
>>   logical to numeric does the same thing.
>>   2)  Multiply by 100 and divide by the length outside the loop:
>>
>> cover_per_2 <- function(data){
>>   l = length(data)
>>   output = numeric(l)
>>   for(i in 1:l)output[i] = sum(data >= data[i])
>>   100 * output / l
>> }
>>
>>  system.time(three <- cover_per_2(dat))
>>>
>>   user  system elapsed
>>  0.024   0.000   0.027
>>
>> That makes the loop just about equivalent to the sapply solution:
>>
>>  system.time(four <- 100*sapply(dat,function(x)sum(dat >= x))/length(dat))
>>>
>>   user  system elapsed
>>  0.024   0.000   0.026
>>
>>- Phil Spector
>> Statistical Computing Facility
>> Department of Statistics
>> UC Berkeley
>> spec...@stat.berkeley.edu
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, 4 Nov 2010, Changbin Du wrote:
>>
>>  HI, Dear R community,
>>>
>>> I have one data set like this,  What I want to do is to calculate the
>>> cumulative coverage. The following codes works for small data set (#rows
>>> =
>>> 100), but when feed the whole data set,  it still running after 24 hours.
>>> Can someone give some suggestions for long vector?
>>>
>>> id            reads
>>> Contig79:1    4
>>> Contig79:2    8
>>> Contig79:3    13
>>> Contig79:4    14
>>> Contig79:5    17
>>> Contig79:6    20
>>> Contig79:7    25
>>> Contig79:8    27
>>> Contig79:9    32
>>> Contig79:10   33
>>> Contig79:11   34
>>>
>>>
>>> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
>>> sep="\t", skip=0, header=F,fill=T) #
>>> dim(matt)
>>> [1] 3384766   2
>>>
>>> matt_plot<-function(matt, outputfile) {
>>> names(matt)<-c("id","reads")
>>>
>>> cover<-matt$reads
>>>
>>>
>>> #calculate the cumulative coverage.
>>> + cover_per<-function (data) {
>>> + output<-numeric(0)
>>> + for (i in data) {
>>> +   x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
>>> +   output<-c(output, x)
>>> + }
>>> + return(output)
>>> + }
>>>
>>>
>>> result<-cover_per(cover)
>>>
>>>
>>> Thanks so much!
>>>
>>>
>>> --
>>> Sincerely,
>>> Changbin
>>> --
>>>
>>>[[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
>
> --
> Sincerely,
> Changbin
> --
>
> Changbin Du
> DOE Joint Genome Institute
> Bldg 400 Rm 457
> 2800 Mitchell Dr
> Walnut Creet, CA 94598
> Phone: 925-927-2856
>
>
>


-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creet, CA 9459

Re: [R] Extracting data only for particular index values from a zoo structure

2010-11-05 Thread Gabor Grothendieck
On Fri, Nov 5, 2010 at 11:54 AM, Santosh Srinivas
 wrote:
> Thanks Gabor for pointing in the right direction.
> Looked up cycle and the doc is tough to understand " cycle gives the
> positions in the cycle of each observation." ... how is cycle defined.
>
> I just extended your idea to make it readable for an avg. user in the
> following way
> mRet[format(index(mRet),"%m")==11]
>

cycle is defined in the core of R and is extended for additional
methods by zoo.  For a zoo object applying cycle to zoo applies it to
the zoo object's index.   If the index is a yearmon object, ym, then
cycle(ym) gives the month number at each time, 1 for Jan, 2 for Feb,
etc.  If the index is a yearqtr object, yq, then cycle(yq) gives the
quarter number at each time, 1 for Q1, 2 for Q2, etc.

For a zooreg object cycle will give the position in the cycle at each
time.  If the zooreg object has frequency F then the cycle values will
be 1, 2, ..., F.
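
(A small illustration of cycle() on a yearmon index; this example is not from
the original reply.)

library(zoo)
z <- zoo(1:5, as.yearmon(2002 + 8/12) + seq(0, 4)/12)
index(z)            ## Sep 2002, Oct 2002, Nov 2002, Dec 2002, Jan 2003
cycle(z)            ## 9 10 11 12 1 -- the month number at each time
z[cycle(z) == 11]   ## just the November observation(s)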

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] as.xts

2010-11-05 Thread Joshua Wiley
Hi Ela,

as.xts() calls as.POSIXlt() to convert the dates to a date/time class.
 Evidently, the column that contains your times is not
"unambiguous" to POSIX, that is, the format is not clear.  It is
really impossible to give you much more advice without having some
sample data or what you actually tried.

If you did not specify the "order.by" argument, by default,
as.xts.data.frame() will try the rownames, but these are almost
certainly not in any readable format.  Further, even if you tell it
which column has your times, they need to be in a known format:
‘Date’, ‘POSIXct’, ‘timeDate’, as well as ‘yearmon’ and ‘yearqtr’

Things that would help:

1) provide a small example dataset that mimics your own and
illustrates your problem.  If this is unfeasible, at least provide
str(yourdata)
2) include the code you tried that lead to the error

Without more, my best suggestion is to read:

?xts
?as.xts.data.frame
?as.POSIXct
?as.Date

and depending how familiar you are with R, look at the code from:
getAnywhere(as.xts.data.frame)
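
(A minimal sketch with made-up data, not from Ela's spreadsheet, showing the
kind of conversion involved:)

library(xts)
df <- data.frame(date  = c("2010-11-01 09:15:00", "2010-11-01 09:16:00"),
                 close = c(100.5, 100.8),
                 stringsAsFactors = FALSE)
## convert the character column to POSIXct explicitly, then hand it to order.by
x <- xts(df$close,
         order.by = as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S"))
x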


HTH,

Josh

On Fri, Nov 5, 2010 at 1:28 AM, spela podgorsek  wrote:
> hey
>
> I am trying to turn a dataframe into xts with the function:
> as.xts,
> but it returns the error:
>
> Error in as.POSIXlt.character(x, tz, ...) :
> character string is not in a standard unambiguous format
>
> could someone give me some pointers please
>
> the data is coming from a spreadsheet via the excel, and has 5 columns
> of data (date (with the date and time), open, high, low, close) (excel
> format)
>
> ela
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting data only for particular index values from a zoo structure

2010-11-05 Thread Santosh Srinivas
Thanks Gabor for pointing in the right direction.
Looked up cycle, and the doc is tough to understand: "cycle gives the
positions in the cycle of each observation." ... so how is cycle defined?

I just extended your idea to make it readable for an avg. user in the
following way
mRet[format(index(mRet),"%m")==11]

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: 05 November 2010 20:59
To: Santosh Srinivas
Cc: r-help@r-project.org
Subject: Re: [R] Extracting data only for particular index values from a zoo
structure

On Fri, Nov 5, 2010 at 11:21 AM, Santosh Srinivas
 wrote:
> Hello All,
>
> I have a zoo structure as follows:
>> dput(tMRet)
> structure(c(0.00138742474397713, -0.0309023681475112, 0.0390276302410908,
> 0.0832282874685357, -0.00315002033871414, -0.0158548785709138,
> -0.0410876001496389, -0.0503189291168807, 0.00229628598547049,
> 0.112348434473647, 0.0760004696254608, 0.100820586425124,
> 0.0803767768546975,
> 0.0967805566974766, 0.054288018745434, 0.106415042990242,
> 0.0848339767191362,
> -0.0293833917022324, -0.0355384394730908, 0.0398272106900921), .Dim =
c(20L,
>
> 1L), .Dimnames = list(c("Sep 2002", "Oct 2002", "Nov 2002", "Dec 2002",
> "Jan 2003", "Feb 2003", "Mar 2003", "Apr 2003", "May 2003", "Jun 2003",
> "Jul 2003", "Aug 2003", "Sep 2003", "Oct 2003", "Nov 2003", "Dec 2003",
> "Jan 2004", "Feb 2004", "Mar 2004", "Apr 2004"), "Close"), index =
> structure(c(2002.667,
> 2002.75, 2002.833, 2002.917, 2003, 2003.083,
> 2003.167, 2003.25, 2003.333, 2003.417,
> 2003.5, 2003.583, 2003.667, 2003.75, 2003.833,
> 2003.917, 2004, 2004.083, 2004.167, 2004.25
> ), class = "yearmon"), class = "zoo")
>
> I want to extract only the values for say "November" but may span multiple
> years.
> How can I use conditions on the zoo index?
>

Try this:

> tMRet[cycle(tMRet) == 11,, drop = FALSE]
  Close
Nov 2002 0.03902763
Nov 2003 0.05428802

If you just want a 1 dimensional object omit the drop=FALSE part.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Daniel Nordlund
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Stephen Liu
> Sent: Friday, November 05, 2010 7:57 AM
> To: Steve Lianoglou
> Cc: r-help@r-project.org
> Subject: Re: [R] About 5.1 Arrays
> 
> Hi Steve,
> 
> > It's not clear what you're having problems understanding. By
> > setting the "dim" attribute of your (1d) vector, you are changing
> > its dimensions.
> 
> I'm following An Introduction to R to learn R
> 
> On
> 
> 5.1 Arrays
> http://cran.r-project.org/doc/manuals/R-intro.html#Vectors-and-assignment
> 
> 
> It mentions:-
> ...
> For example if the dimension vector for an array, say a, is c(3,4,2) then
> there
> are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the
> order
> a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2].
> 
> 
> I don't understand "on  =24 entries in a and the data vector holds
> them in
> the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."  the order
> a[1,1,1],
> a[2,1,1], ..., a[2,4,2], a[3,4,2]?  What does it mean "the order a[1,1,1],
> a[2,1,1], ..., a[2,4,2], a[3,4,2]"?
> 
> Thanks
> 
> B.R.
> Stephen
> 
> 

Stephen,

Start with a vector of length = 12.  The vector, v, is stored in consecutive 
locations in memory, one after the other.  And 

> v <- 1:12
> v
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

Now change the dimension of v to c(3,4), i.e. a matrix with 3 rows 
and 4 columns.  
 
> dim(v) <- c(3,4)
> v
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

The values of v are still stored in memory in consecutive locations.  But now 
you refer to the first location as v[1,1], the second as v[2,1], third as 
v[3,1] ... and the 12th as v[3,4].  We sometimes talk about the values "going 
into" v[1,1] or more generally, v[i,j], but the values aren't going anywhere.  
They are still stored in consecutive locations.  We are just changing how they 
are referred to when we change the dimensions.

So in the 2-dimensional matrix above, the values of the vector v "go into" the 
matrix in column order, i.e. the first column is filled first, then the second, 
...  

Now, create a 24 element vector.

> v <- 1:24
> v
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Change the dimensions to a 3-dimensional array.

> dim(v) <- c(3,4,2)
> v
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

 [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

You can visualize a 3-dimensional array as a series of 2-dimensional arrays 
stacked on top of each other.  But this is just a convenient image.  The items 
are still stored consecutively in memory.  Notice that layer one in the stack 
was "filled" first, and the first layer was "filled" just like the previous 
2-dimensional example.  But the items are still physically stored linearly, in 
consecutive locations in memory.
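
(A supplementary illustration, not in the original reply: because the storage
never moves, flattening the array gives back the original order, and a
single-number index walks straight through memory.)

v <- 1:24
dim(v) <- c(3, 4, 2)
as.vector(v)   ## 1 2 3 ... 24 -- the consecutive storage order
v[13]          ## 13: storage position 13 is the first element of the second layer
v[1, 1, 2]     ## also 13: subscripts (1,1,2) map to storage position 13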

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] improve R memory under linux

2010-11-05 Thread Jonathan P Daily
A "very small script" should fit just fine in an email: what are you 
trying to do?

Likely, you are assigning many small variables in some loop. Even if you 
have 4GB of RAM available, if R has already assigned 3.99 GB of it and then a call 
comes in to assign something of size 0.02 GB, it will tell you it can't 
allocate an object of size 0.02 GB.
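
(A small illustrative sketch, not from the original reply, of how to see where
the memory is going before the allocation fails:)

gc()   ## current allocations and garbage-collection trigger sizes

## the five largest objects in the workspace, by reported size
head(sort(sapply(ls(), function(nm) object.size(get(nm))), decreasing = TRUE), 5)
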
--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
 - Jubal Early, Firefly



From:
ricardo souza 
To:
r-help@r-project.org
Date:
11/05/2010 11:29 AM
Subject:
[R] improve R memory under linux
Sent by:
r-help-boun...@r-project.org



Dear all,

I am using ubuntu linux 32 with 4 Gb.  I am running a very small script 
and I always got the same error message:  CAN NOT ALLOCATE A VECTOR OF 
SIZE 231.8 Mb.

I have reading carefully the instruction in ?Memory.  Using the function 
gc() I got very low numbers of memory (please sea below).  I know that it 
has been posted several times at r-help (
http://tolstoy.newcastle.edu.au/R/help/05/06/7565.html#7627qlink2).  
However I did not find yet the solution to improve my memory issue in 
Linux.  Somebody cold please give some instruction how to improve my 
memory under linux? 

> gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170934  4.6 35  9.4   35  9.4
Vcells 195920  1.5 786432  6.0   781384  6.0

INCREASING THE R MEMORY FOLLOWING THE INSTRUCTION IN  ?Memory

I started R with:

R --min-vsize=10M --max-vsize=4G --min-nsize=500k --max-nsize=900M
> gc()
 used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 130433  3.5 50 13.4  25200   50 13.4
Vcells  81138  0.71310720 10.0 NA   499143  3.9

It increased but not so much! 

Please, please let me know.  I have read all r-help about this matter, but 
not solution. Thanks for your attention!

Ricardo


 
 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] boot.stepAIC NA values

2010-11-05 Thread Martin McCabe

Hi.

I have a coxph model of variables linked to survival in  
medulloblastoma.  The data were collated from various publications and  
not all authors quoted all variables.  I'd like to internally validate  
the model and have tried bootstrapping it using boot.stepAIC but it  
fails because of the NA values.  If I remove all samples with any  
missing values my dataset falls from n=227 to n=65 and, not- 
surprisingly, only the most glaringly obvious variables remain in the  
final model.  Is there a way to get the boot-strapping and variable- 
picking process to include samples with NA values, even though it's  
not statistically the most correct approach?


Martin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Steve Lianoglou
Hi,

On Fri, Nov 5, 2010 at 10:56 AM, Stephen Liu  wrote:
> Hi Steve,
>
>> It's not clear what you're having problems understanding. By
>> setting the "dim" attribute of your (1d) vector, you are changing
>> its dimensions.
>
> I'm following An Introduction to R to learn R
>
> On
>
> 5.1 Arrays
> http://cran.r-project.org/doc/manuals/R-intro.html#Vectors-and-assignment
>
>
> It mentions:-
> ...
> For example if the dimension vector for an array, say a, is c(3,4,2) then 
> there
> are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the order
> a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2].
>
>
> I don't understand "on  =24 entries in a and the data vector holds them in
> the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."  the order a[1,1,1],
> a[2,1,1], ..., a[2,4,2], a[3,4,2]?  What does it mean "the order a[1,1,1],
> a[2,1,1], ..., a[2,4,2], a[3,4,2]"?

Let's just stick with a 2d matrix -- it's easier to think about.

I'm not sure if you are coming from a different programming language
or not, so perhaps this isn't helpful if you don't, but you might
imagine holding data for a 2d matrix in an 'array of arrays'
structure.

R doesn't do this. It holds the data for a 1d, 2d, 3d, ... 10d array
in a 1d vector. The data is stored in "column major" format, so the
first column of a 2d matrix is filled first (the row index varies fastest).

If I have a 2d matrix like this:

1   2   3   4
5   6   7   8

R holds this in a 1d vector/array that looks like this:

1, 5, 2, 6, 3, 7, 4, 8

This idea follows through to higher dimensions.
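
(A quick supplementary way to see this storage order:)

m <- matrix(c(1, 5, 2, 6, 3, 7, 4, 8), nrow = 2)   ## filled column by column
m              ## reproduces the 2 x 4 matrix shown above
as.vector(m)   ## 1 5 2 6 3 7 4 8 -- the underlying 1d storage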

Hope that helps,

-steve
-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting data only for particular index values from a zoo structure

2010-11-05 Thread Gabor Grothendieck
On Fri, Nov 5, 2010 at 11:21 AM, Santosh Srinivas
 wrote:
> Hello All,
>
> I have a zoo structure as follows:
>> dput(tMRet)
> structure(c(0.00138742474397713, -0.0309023681475112, 0.0390276302410908,
> 0.0832282874685357, -0.00315002033871414, -0.0158548785709138,
> -0.0410876001496389, -0.0503189291168807, 0.00229628598547049,
> 0.112348434473647, 0.0760004696254608, 0.100820586425124,
> 0.0803767768546975,
> 0.0967805566974766, 0.054288018745434, 0.106415042990242,
> 0.0848339767191362,
> -0.0293833917022324, -0.0355384394730908, 0.0398272106900921), .Dim = c(20L,
>
> 1L), .Dimnames = list(c("Sep 2002", "Oct 2002", "Nov 2002", "Dec 2002",
> "Jan 2003", "Feb 2003", "Mar 2003", "Apr 2003", "May 2003", "Jun 2003",
> "Jul 2003", "Aug 2003", "Sep 2003", "Oct 2003", "Nov 2003", "Dec 2003",
> "Jan 2004", "Feb 2004", "Mar 2004", "Apr 2004"), "Close"), index =
> structure(c(2002.667,
> 2002.75, 2002.833, 2002.917, 2003, 2003.083,
> 2003.167, 2003.25, 2003.333, 2003.417,
> 2003.5, 2003.583, 2003.667, 2003.75, 2003.833,
> 2003.917, 2004, 2004.083, 2004.167, 2004.25
> ), class = "yearmon"), class = "zoo")
>
> I want to extract only the values for say "November" but may span multiple
> years.
> How can I use conditions on the zoo index?
>

Try this:

> tMRet[cycle(tMRet) == 11,, drop = FALSE]
  Close
Nov 2002 0.03902763
Nov 2003 0.05428802

If you just want a 1 dimensional object omit the drop=FALSE part.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] improve R memory under linux

2010-11-05 Thread ricardo souza
Dear all,

I am using Ubuntu Linux 32-bit with 4 Gb.  I am running a very small script and I 
always get the same error message:  CANNOT ALLOCATE A VECTOR OF SIZE 231.8 Mb.

I have read the instructions in ?Memory carefully.  Using the function gc() I 
get very low memory numbers (please see below).  I know that this has been 
posted several times on r-help 
(http://tolstoy.newcastle.edu.au/R/help/05/06/7565.html#7627qlink2).  However I 
have not yet found a solution to improve my memory situation on Linux.  Could 
somebody please give some instructions on how to improve my memory under Linux? 

> gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 170934  4.6 35  9.4   35  9.4
Vcells 195920  1.5 786432  6.0   781384  6.0

INCREASING THE R MEMORY FOLLOWING THE INSTRUCTION IN  ?Memory

I started R with:

R --min-vsize=10M --max-vsize=4G --min-nsize=500k --max-nsize=900M
> gc()
 used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 130433  3.5 50 13.4  25200   50 13.4
Vcells  81138  0.7    1310720 10.0 NA   499143  3.9

It increased but not so much! 

Please, please let me know.  I have read all the r-help posts about this matter, but found 
no solution. Thanks for your attention!

Ricardo


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Extracting data only for particular index values from a zoo structure

2010-11-05 Thread Santosh Srinivas
Hello All,

I have a zoo structure as follows:
> dput(tMRet)
structure(c(0.00138742474397713, -0.0309023681475112, 0.0390276302410908, 
0.0832282874685357, -0.00315002033871414, -0.0158548785709138, 
-0.0410876001496389, -0.0503189291168807, 0.00229628598547049, 
0.112348434473647, 0.0760004696254608, 0.100820586425124,
0.0803767768546975, 
0.0967805566974766, 0.054288018745434, 0.106415042990242,
0.0848339767191362, 
-0.0293833917022324, -0.0355384394730908, 0.0398272106900921), .Dim = c(20L,

1L), .Dimnames = list(c("Sep 2002", "Oct 2002", "Nov 2002", "Dec 2002", 
"Jan 2003", "Feb 2003", "Mar 2003", "Apr 2003", "May 2003", "Jun 2003", 
"Jul 2003", "Aug 2003", "Sep 2003", "Oct 2003", "Nov 2003", "Dec 2003", 
"Jan 2004", "Feb 2004", "Mar 2004", "Apr 2004"), "Close"), index =
structure(c(2002.667, 
2002.75, 2002.833, 2002.917, 2003, 2003.083, 
2003.167, 2003.25, 2003.333, 2003.417, 
2003.5, 2003.583, 2003.667, 2003.75, 2003.833, 
2003.917, 2004, 2004.083, 2004.167, 2004.25
), class = "yearmon"), class = "zoo")

I want to extract only the values for say "November" but may span multiple
years.
How can I use conditions on the zoo index?

For info, I did an aggregate over daily data to reach here.

Have a great weekend!

Thanks.
S

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Stephen Liu
Hi Steve,

> It's not clear what you're having problems understanding. By
> setting the "dim" attribute of your (1d) vector, you are changing 
> its dimensions.

I'm following An Introduction to R to learn R

On

5.1 Arrays
http://cran.r-project.org/doc/manuals/R-intro.html#Vectors-and-assignment


It mentions:-
...
For example if the dimension vector for an array, say a, is c(3,4,2) then there 
are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the order 
a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]. 


I don't understand "... = 24 entries in a and the data vector holds them in 
the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."  What does it mean, 
"the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]"?

Thanks

B.R.
Stephen





- Original Message 
From: Steve Lianoglou 
To: Stephen Liu 
Cc: Gerrit Eichner ; r-help@r-project.org
Sent: Fri, November 5, 2010 10:18:18 PM
Subject: Re: [R] About 5.1 Arrays

Hi,

On Fri, Nov 5, 2010 at 6:00 AM, Stephen Liu  wrote:
[snip]
>> A[i, j, k] is the value of the element in position (i,j,k) of array A. In
>> other words, it is the entry in row i, column j, and "layer" k (if one
>> wants to think of A as a cuboidal grid).
>
> Sorry I can't follow.  Could you pls explain in more detail.
>
> e.g.
>
>> z <- 0:23
>> dim(z) <- c(3,4,2)
>> dim(z)
> [1] 3 4 2
>
>
>> z
> , , 1
>
>      [,1] [,2] [,3] [,4]
> [1,]    0    3    6    9
> [2,]    1    4    7   10
> [3,]    2    5    8   11
>
> , , 2
>
> [,1] [,2] [,3] [,4]
> [1,]   12   15   18   21
> [2,]   13   16   19   22
> [3,]   14   17   20   23

It's not clear what you're having problems understanding. By setting
the "dim" attribute of your (1d) vector, you are changing its
dimensions.

## This is a 1d vector
R> x <- 1:12
R> x

## I can change it into 2d (like a matrix), let's do 2 rows, 6 columns
R> dim(x) <- c(2,6)
R> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    7    9   11
[2,]    2    4    6    8   10   12

If you understand that using three numbers to set the dimension means
you are making a 3d matrix

R> dim(x) <- c(2,3,2)

But the problem is you can't draw 3d in a terminal, so it just draws
the third dimension in order

R> x

x
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

###

Imagine this as a cube: ,,1 is the front layer, ,,2 is the back layer.

Just chew on it for a minute, it'll make sense.

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] About installing RBloomberg

2010-11-05 Thread Stephen Liu
Hi folks,

Debian 600 64bit desktop

> sudo R
Password:
> install.packages("RBloomberg", repos="http://R-Forge.R-project.org")


* DONE (zoo)
ERROR: dependencies ‘rcom’, ‘bitops’, ‘RUnit’ are not available for package 
‘RBloomberg’
* removing ‘/usr/local/lib/R/site-library/RBloomberg’

Failed


> install.packages("RBloomberg", "rcom", "bitops", "RUnit", 
>repos="http://R-Forge.R-project.org")
Warning in install.packages("RBloomberg", "rcom", "bitops", "RUnit", repos = 
"http://R-Forge.R-project.org") :
  'lib = "rcom"' is not writable
Would you like to create a personal library
'~/R/x86_64-pc-linux-gnu-library/2.11'
to install packages into?  (y/n) 


Do I need to create a personal library?


However I found rcom, bitops and RUnit on Debian repo;

$ apt-cache search bitops | grep r-cran
r-cran-bitops - GNU R package implementing bitwise operations
 
$ apt-cache search RUnit | grep r-cran
r-cran-runit - GNU R package providing unit testing framework

$ apt-cache search rcom | grep r-cran
r-cran-rcpp - GNU R / C++ interface classes and examples

Can I install them on Debian repo?  TIA

B.R.
Stephen L




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Steve Lianoglou
Hi,

On Fri, Nov 5, 2010 at 6:00 AM, Stephen Liu  wrote:
[snip]
>> A[i, j, k] is the value of the element in position (i,j,k) of array A. In
>> other words, it is the entry in row i, column j, and "layer" k (if one
>> wants to think of A as a cuboidal grid).
>
> Sorry I can't follow.  Could you pls explain in more detail.
>
> e.g.
>
>> z <- 0:23
>> dim(z) <- c(3,4,2)
>> dim(z)
> [1] 3 4 2
>
>
>> z
> , , 1
>
>     [,1] [,2] [,3] [,4]
> [1,]    0    3    6    9
> [2,]    1    4    7   10
> [3,]    2    5    8   11
>
> , , 2
>
>     [,1] [,2] [,3] [,4]
> [1,]   12   15   18   21
> [2,]   13   16   19   22
> [3,]   14   17   20   23

It's not clear what you're having problems understanding. By setting
the "dim" attribute of your (1d) vector, you are changing its
dimensions.

## This is a 1d vector
R> x <- 1:12
R> x

## I can change it into 2d (like a matrix), let's do 2 rows, 6 columns
R> dim(x) <- c(2,6)
R> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    7    9   11
[2,]    2    4    6    8   10   12

If you understand that using three numbers to set the dimension means
you are making a 3d matrix

R> dim(x) <- c(2,3,2)

But the problem is you can't draw 3d in a terminal, so it just draws
the third dimension in order

R> x

x
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

###

Imagine this as a cube: ,,1 is the front layer, ,,2 is the back layer.

Just chew on it for a minute, it'll make sense.

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NFFT on a Zoo?

2010-11-05 Thread Mike Marchywka







> Date: Fri, 5 Nov 2010 00:14:15 -0700
> From: flym...@gmail.com
> To: marchy...@hotmail.com
> CC: ggrothendi...@gmail.com; r-help@r-project.org; 
> rpy-l...@lists.sourceforge.net
> Subject: Re: [R] NFFT on a Zoo?
>
> FWIW: It turns out I dove into a rabbit hole:
>
> 1. Though the gaps in my 3-axis accelerometer data represent 10% data
> loss (OMG!), the number of gaps represents only 0.1% of the 3 million
> data points (BFD).
>
> 2. The data is noisy enough that 0.1% discontinuity can't affect an
> FFT. Each gap was removed simply by adjusting subsequent timestamps.
>
> 3. With the gaps removed, the remaining jitter in the timestamps is both
> small and nearly normally distributed (no systematic errors). So the
> timestamps were eliminated from further processing, and the mean
> inter-sample time was used as the sampling period.
>
> So, neither NFFT nor Zoo are needed, since a regular FFT now works just
> fine.

Well, again, it is easy to simulate these effects in R by sampling
a sine wave or other known signal at the wrong times, DFT, and see
what spectrum looks like. I suggested sine wave to start, but any
nonlinearities can be hard to estimate without a little work. You
can also do two-tone tests by hand and see IMD, harmonics, etc.


>
> The Moral of the Story is: "Take a closer look at the data before
> deciding difficult processing is needed."
>
> Homer Simpson translation: "Doh!"
>
I think Bart said " if it is in a book it must be true" and computers
of course don't make mistakes :)


> A big "Thanks!" to all who responded to my newbie posts: The R
> Community is richly blessed with wisdom, kindness and patience.

This is the kind of thing you can play with on the back of an 
envelope when bored once you get started. 

>
> -BobC
>
>
>
> On 11/03/2010 01:22 PM, Mike Marchywka wrote:
> > 
> >
> >> From: ggrothendi...@gmail.com
> >> Date: Wed, 3 Nov 2010 15:27:13 -0400
> >> To: flym...@gmail.com
> >> CC: r-help@r-project.org; rpy-l...@lists.sourceforge.net
> >> Subject: Re: [R] NFFT on a Zoo?
> >>
> >> On Wed, Nov 3, 2010 at 2:59 PM, Bob Cunningham wrote:
> >>
> >>> I have an irregular time series in a Zoo object, and I've been unable to
> >>> find any way to do an FFT on it. More precisely, I'd like to do an NFFT
> >>> (non-equispaced / non-uniform time FFT) on the data.
> >>>
> >>> The data is timestamped samples from a cheap self-logging accelerometer.
> >>> The data is weakly regular, with the following characteristics:
> >>> - short gaps every ~20ms
> >>> - large gaps every ~200ms
> >>> - jitter/noise in the timestamp
> >>>
> >>> The gaps cover ~10% of the acquisition time. And they occur often enough
> >>> that the uninterrupted portions of the data are too short to yield useful
> >>> individual FFT results, even without timestamp noise.
> >>>
> >>> My searches have revealed no NFFT support in R, but I'm hoping it may be
> >>> known under some other name (just as non-uniform time series are known as
> >>> 'zoo' rather than 'nts' or 'nuts').
> >>>
> >>> I'm using R through RPy, so any solution that makes use of numpy/scipy 
> >>> would
> >>> also work. And I care more about accuracy than speed, so a non-library
> >>> solution in R or Python would also work.
> >>>
> >>> Alternatively, is there a technique by which multiple FFTs over smaller
> >>> (incomplete) data regions may be combined to yield an improved view of the
> >>> whole? My experiments have so far yielded only useless results, but I'm
> >>> getting ready to try PCA across the set of partial FFTs.
> >>>
> >>>
> >>
> >
> > I'm pretty sure all of this is in Oppenheim and Shaffer meaning it
> > is also in any newer books. I recall something about averaging
> > but you'd need to look at details. Alternatively, and this is from
> > distant memory so maybe someone else can comment, you can just
> > feed a regularly spaced time series to anyone, go get FFTW for example,
> > and insert zeroes for missing data. This is equivalent to multiplying
> > your real data with a window function that is zero at missing points.
> > I think you can prove that multiplication
> > in time domain is convolution in FT domain so you can back this out
> > by deconvolving with your window function spectrum. This probably is not
> > painless, the window spectrum will have badly placed zeroes etc, but it
> > may be helpful.
> > Apaprently this is still a bit of an open issue,
> >
> > http://books.google.com/books?id=BW1PdOqZo6AC&pg=PA2&lpg=PA2&dq=dft+window+missing+data&source=bl&ots=fSY-iRoCNN&sig=30cC0SdkrDcp62iWc-Mv26mfNjI&hl=en&ei=AMTRTNmyMYP88AauxtzKDA&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDEQ6AEwBTgK#v=onepage&q&f=false
> >
> >
> >
> > You should be able to do the case of a sine wave with pencil and paper
> > and see if or how this really would work.
> >
> >
> >
> >> Check out the entire thread that starts here.
> >>
> >> http://www.mail-archive.com/r-help@r-project.or

Re: [R] RBloomberg on R-2.12.0

2010-11-05 Thread Duncan Temple Lang


On 11/5/10 5:20 AM, Tolga I Uzuner wrote:
> Dear R Users,
> 
> Tried to install RBloomberg with R-2.12.0 and appears RDComclient has not 
> been built for this version of R, so failed. I then tried to get RBloombergs' 
> Java API version to work, but ran into problems with RJava which does not 
> appear to exist for Windows. My platform is Windows XP SP3.
> 
> Will RDcomclient be built for R-2.12.0 anytime soon ?

It is on the Omegahat site. Just that the directories weren't linked to the 
appropriate place.
You can install it now.

 D.

> 
> Does a version of RBloomberh with a Java API really exist ? An obvious Google 
> search like "Java api rbloomberg" throws up a bunch of discussions but 
> somehow, I cannot locate a package ?
> 
> Will RJava work on Windows ?
> 
> Thanks in advance for any pointers.
> Regards,
> Tolga
> 
> 
> This email is confidential and subject to important disclaimers and
> conditions including on offers for the purchase or sale of
> securities, accuracy and completeness of information, viruses,
> confidentiality, legal privilege, and legal entity disclaimers,
> available at http://www.jpmorgan.com/pages/disclosures/email.  
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] postForm() in RCurl and library RHTMLForms

2010-11-05 Thread Duncan Temple Lang


On 11/4/10 11:31 PM, sayan dasgupta wrote:
> Thanks a lot, that's exactly what I was looking for.
> 
> Just a quick question: I agree the form gets submitted to the URL
> "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"
> 
> and I am filling up the form in the page
> "http://www.nseindia.com/content/indices/ind_histvalues.htm"
> 
> How do I submit the arguments like FromDate, ToDate, Symbol using postForm()
> and submit the query to get a similar table?
> 

Well that is what the function that RHTMLForms creates does.
So you can look at that code and see that it calls formQuery()
which ends in a call to postForm(). You could use

   debug(postForm)

and examine the arguments to it.

postForm("...jsp", FromDate = "10-"


The answer is

o = postForm("http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp",
             FromDate = "01-11-2010", ToDate = "04-11-2010",
             IndexType = "S&P CNX NIFTY", check = "new",
             style = "POST")


> 
> 
> 
> 
> 
> 
> On Fri, Nov 5, 2010 at 6:43 AM, Duncan Temple Lang
> wrote:
> 
>>
>>
>> On 11/4/10 2:39 AM, sayan dasgupta wrote:
>>> Hi RUsers,
>>>
>>> Suppose I want to see the data on the website
>>> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm";
>>>
>>> for the index "S&P CNX NIFTY" for
>>> dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
>>>
>>> then read the html table from the page using readHTMLtable()
>>>
>>> I am using this code
>>> webpage <- postForm(url,.params=list(
>>>"FromDate"="01-11-2010",
>>>"ToDate"="02-11-2010",
>>>"IndexType"="S&P CNX NIFTY",
>>>"Indicesdata"="Get Details"),
>>>  .opts=list(useragent = getOption("HTTPUserAgent")))
>>>
>>> But it doesn't give me desired result
>>
>> You need to be more specific about how it fails to give the desired result.
>>
>> You are in fact posting to the wrong URL. The form is submitted to a
>> different
>> URL -
>> http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp
>>
>>
>>
>>>
>>> Also I was trying to use the function getHTMLFormDescription from the
>>> package RHTMLForms but there we can't use the argument
>>> .opts=list(useragent = getOption("HTTPUserAgent")) which is needed for
>> this
>>> particular website
>>
>> That's not the case. The function RHTMLForms will generate for you does
>> support the .opts parameter.
>>
>> What you want is something along the lines:
>>
>>
>>  # Set default options for RCurl
>>  # requests
>> options(RCurlOptions = list(useragent = "R"))
>> library(RCurl)
>>
>>  # Read the HTML page since we cannot use htmlParse() directly
>>  # as it does not specify the user agent or an
>>  # Accept:*.*
>>
>> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm";
>> wp = getURLContent(url)
>>
>>  # Now that we have the page, parse it and use the RHTMLForms
>>  # package to create an R function that will act as an interface
>>  # to the form.
>> library(RHTMLForms)
>> library(XML)
>> doc = htmlParse(wp, asText = TRUE)
>>  # need to set the URL for this document since we read it from
>>  # text, rather than from the URL directly
>>
>> docName(doc) = url
>>
>>  # Create the form description and generate the R
>>  # function to "call" the form.
>>
>> form = getHTMLFormDescription(doc)[[1]]
>> fun = createFunction(form)
>>
>>
>>  # now we can invoke the form from R. We only need 2
>>  # inputs  - FromDate and ToDate
>>
>> o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")
>>
>>  # Having looked at the tables, I think we want the 3rd
>>  # one.
>> table = readHTMLTable(htmlParse(o, asText = TRUE),
>>which = 3,
>>header = TRUE,
>>stringsAsFactors = FALSE)
>> table
>>
>>
>>
>>
>> Yes, it is marginally involved. But that is because we cannot simply read
>> the HTML document directly with htmlParse(), because of the missing
>> Accept (& useragent) HTTP headers.
>>
>>>
>>>
>>> Thanks and Regards
>>> Sayan Dasgupta
>>>
>>>   [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guid

Re: [R] filled.contour colorbar without black color separators?

2010-11-05 Thread David Winsemius


On Nov 5, 2010, at 7:36 AM, Gregor Volberg wrote:



Dear list members,

I have been using filled.contour in order to plot EEG data. For the
colors, I used a conventional ramp from blue to red (blue - green -
yellow - red), and 100 color levels to make the plot look smooth:


(...) color.palette = colorRampPalette(c('blue','green',
'yellow','red'), space='rgb'), nlevels = 100 (...)


My problem is that filled.contour draws a black bar as a separation
between each color of the color bar (color key), so that the color bar
becomes essentially black if I use many color levels. Is there a way
to turn off this behavior? Any advice would be greatly appreciated,


I am not being categorical about this but it doesn't look simple.  I  
have looked at the code and the parameter list and had hoped to see  
either a way to pass parameters to legend or perhaps to change the  
code. Neither of those appears possible, since parameters are not sent  
to legend and legend isn't even called within the code. If I had this
need, I would be looking at moving my plotting over to levelplot in
package:lattice.
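
For what it's worth, a minimal sketch of that lattice alternative (the
matrix 'volt' below is just a made-up stand-in for the EEG amplitudes):

library(lattice)
volt <- matrix(rnorm(100 * 100), 100, 100)    # placeholder data
levelplot(volt,
          col.regions = colorRampPalette(c("blue", "green", "yellow", "red"),
                                         space = "rgb")(100),
          cuts = 99)  # many levels; the color key is drawn without separators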


--
David


Gregor




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Detect the Warning Message

2010-11-05 Thread jim holtman
?options

and then you will find the following:

warn:
sets the handling of warning messages. If warn is negative all
warnings are ignored. If warn is zero (the default) warnings are
stored until the top-level function returns. If fewer than 10 warnings
were signalled they will be printed otherwise a message saying how
many (max 50) were signalled. An object called last.warning is created
and can be printed through the function warnings. If warn is one,
warnings are printed as they occur. If warn is two or larger all
warnings are turned into errors.


Setting

options(warn = 2)

will cause the system to halt at that point.  Also setting:

options(error=utils::recover)

will drop you in the 'browser' (?browser) so you can see the values of
objects when the error occurred.

Google for 'debugging R' to get some more information.
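
A minimal sketch of how that looks around a loop (f() is a hypothetical
stand-in for your function):

options(warn = 2, error = utils::recover)  # warnings become errors; inspect on stop
for (i in 1:5000) {
    f(i)   # hypothetical; the loop now stops (and recover() runs) at the first warning
}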

On Fri, Nov 5, 2010 at 4:00 AM, Yen Lee  wrote:
> Dear all,
>
>
>
> I've written a function and repeated it for 5000 times with loops with
> different value, and the messages returned are the output I set and 15
> warnings.
>
> I would like to trace the warnings by stopping the loop when warning came
> out.
>
> Does anyone know how to make it?
>
>
>
> Thanks a lot for your help.
>
>
>
> Yen
>
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA table and lmer

2010-11-05 Thread Dennis Murphy
Hi:

Look at the structure of the experiment.
The six blocks represent different replications of the experiment.
No treatment is assigned at the block level.

Within a particular block, there are three plots, and each variety is
randomly assigned to one of them. Ideally, a separate randomization of
treatments to plots takes place in each block.

Each plot is divisible into four subplots, to which the nitrogen levels are
randomly assigned. Again, a separate randomization in each plot is ideal.

For a particular block, then, there are 12 subplots altogether, each
producing a
single measurement.

Over all replicates, we have

Six blocks
18  plots
72 subplots

which determine how the degrees of freedom (and corresponding SS) are
allocated at each size of experimental unit (or stratum):

Block level:
  Replicate (block)      5
  Error(block)           0

Plot level:
  Variety                2
  Variety x block       10   (whole plot error)

Subplot level:
  Nitrogen               3
  Nitrogen x Variety     6
  N x V x B             45   (split plot error)

Observe how the degrees of freedom add up at each size of EU.

The expected mean squares of the random effects terms are used to get their
variance component estimates. Since the data are balanced, the ANOVA (or
method of moments) approach can be used to match the observed and expected
mean squares, from which the variance component estimates are computed.

In lme() and lmer(), the variance or standard deviation component estimates
are reported rather than the mean squares, but in a balanced model
one can reconstruct the observed mean squares for each random term
by plugging the estimated variance components into the expected
mean square formulas. [In lmer(), you need to add both variance estimates
of block to get it right.] You will see that they agree.

lme():
Random effects:
 Formula: ~1 | block
        (Intercept)
StdDev:    14.64496

 Formula: ~1 | variety %in% block
        (Intercept) Residual
StdDev:    10.29863 13.30727


lmer():
Random effects:
 Groups        Name        Variance Std.Dev.
 variety:block (Intercept) 106.06   10.299
 block         (Intercept) 107.24   10.356
 block         (Intercept) 107.24   10.356
 Residual                  177.08   13.307

The error variance component is estimated by the split plot error mean
square. Since there are four observations per plot, the expected whole plot
mean square is sigma_e^2 + 4 * sigma_w^2, which is estimated by the observed
whole plot MSE. Finally, there are 12 observations per block, so its expected
mean square is sigma_e^2 + 4 * sigma_w^2 + 12 * sigma_b^2, which is estimated
by the block mean square. From the ANOVA approach, one should be able to
estimate the variance components reported in lme() and lmer() by equating
observed and expected mean squares, solving the resulting linear system from
the bottom up. Conversely, given the variance component estimates, you should
be able to reconstruct the mean squares of the three random effects terms, at
which point you should deduce that all three are doing the right thing in
their own way.
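
As a quick sketch of that last point, plugging the variance component
estimates quoted above back into the expected mean square formulas:

sigma2_e <- 177.08            # split-plot (residual) variance
sigma2_w <- 106.06            # whole-plot (variety:block) variance
sigma2_b <- 107.24 + 107.24   # both block components added, as noted above
                              # (equals lme's 14.64496^2)
ms_subplot   <- sigma2_e                                  # split-plot error MS
ms_wholeplot <- sigma2_e + 4 * sigma2_w                   # whole-plot error MS
ms_block     <- sigma2_e + 4 * sigma2_w + 12 * sigma2_b   # block MS
c(ms_subplot, ms_wholeplot, ms_block)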

This happy circumstance obtains because in balanced, normal theory models,
the methods of least squares and (restricted) maximum likelihood coincide.
In more general unbalanced data situations, the results from LS and (RE)ML
do not necessarily agree, and in fact they may even disagree about what the
degrees of freedom are for certain terms in a given model.

HTH,
Dennis


On Fri, Nov 5, 2010 at 3:50 AM, ian m s white  wrote:

> Like James Booth, I find the SSQ and MSQ in lmer output confusing. The
> F-ratio (1.485) for Variety is the same for aov, lme and lmer, but
> lmer's mean square for variety is 1.485 times the subplot residual mean
> square. In the conventional anova table for a split-plot expt, the
> variety mean square is 1.485 times *main-plot* residual mean square.
> --
> ian m s white 
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory Management under Linux

2010-11-05 Thread jim holtman
It would be very useful if you would post some information about what
exactly you are doing: something about the size of the data
object you are processing ('str' would help us understand it) and then
a portion of the script (both before and after the error message) so
we can understand the transformation that you are doing.  It is very
easy to generate a similar message:

> x <- matrix(0, 20000, 20000)
Error: cannot allocate vector of size 3.0 Gb

but unless you know the context, it is almost impossible to give
advice.  It also depends on whether you are in some function calls where
copies of objects may have been made, etc.
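
For instance (the sizes here are only illustrative):

x <- numeric(50e6)                  # roughly 400 MB of doubles
print(object.size(x), units = "Mb")
f <- function(v) { v[1] <- 0; v }   # modifying the argument forces a copy,
y <- f(x)                           # so about twice that memory is needed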

On Thu, Nov 4, 2010 at 7:52 PM, ricardo souza  wrote:
> Dear all,
>
> I am using ubuntu linux 32 with 4 Gb.  I am running a very small script and I 
> always got the same error message:  CAN NOT ALLOCATE A VECTOR OF SIZE 231.8 
> Mb.
>
> I have read the instructions in ?Memory carefully.  Using the function gc() 
> I got very low numbers for memory (please see below).  I know that this has been 
> posted several times at r-help 
> (http://tolstoy.newcastle.edu.au/R/help/05/06/7565.html#7627qlink2).  However 
> I have not yet found a solution to improve my memory issue in Linux.  
> Could somebody please give some instructions on how to improve my memory under 
> linux?
>
>> gc()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 170934  4.6     350000  9.4   350000  9.4
> Vcells 195920  1.5     786432  6.0   781384  6.0
>
> INCREASING THE R MEMORY FOLLOWING THE INSTRUCTION IN  ?Memory
>
> I started R with:
>
> R --min-vsize=10M --max-vsize=4G --min-nsize=500k --max-nsize=900M
>> gc()
>          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
> Ncells 130433  3.5     500000 13.4      25200   500000 13.4
> Vcells  81138  0.7    1310720 10.0         NA   499143  3.9
>
> It increased but not so much!
>
> Please, please let me know.  I have read all of r-help about this matter, but 
> found no solution. Thanks for your attention!
>
> Ricardo
>
>
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RBloomberg on R-2.12.0

2010-11-05 Thread Tolga I Uzuner
Dear R Users,

Tried to install RBloomberg with R-2.12.0 and it appears RDCOMClient has not been 
built for this version of R, so it failed. I then tried to get RBloomberg's Java 
API version to work, but ran into problems with rJava, which does not appear to 
exist for Windows. My platform is Windows XP SP3.

Will RDCOMClient be built for R-2.12.0 anytime soon?

Does a version of RBloomberg with a Java API really exist? An obvious Google 
search like "Java api rbloomberg" throws up a bunch of discussions but somehow, 
I cannot locate a package?

Will rJava work on Windows?

Thanks in advance for any pointers.
Regards,
Tolga


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table with values as dots in increasing sizes

2010-11-05 Thread Gabor Grothendieck
On Fri, Nov 5, 2010 at 4:45 AM, fugelpitch  wrote:
>
> I was just thinking of a way to present data and if it is possible in R.
>
> I have a data frame that looks as follows (this is just mockup data).
>
> df
> location,"species1","species2","species3","species4","species5"
> "loc1",0.44,0.28,0.37,-0.24,0.41
> "loc2",0.54,0.62,0.34,0.52,0.71
> "loc3",-0.33,0.75,-0.34,0.48,0.61
>
> location is a factor while all the species are numerical vectors.
>
> I would like to present this as a table (or something that looks like a
> table) but instead of the numbers I would like to present circles (pch = 19)
> that increases in size with increasing number. Is it also possible to make
> it change color if the value is negative. (E.g. larger blue circles
> represent larger +values while larger red circles represent larger -values)?
>

This was recently discussed on the list.  See the thread that begins here:
https://stat.ethz.ch/pipermail/r-help/2010-November/258453.html


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] filled.contour colorbar without black color separators?

2010-11-05 Thread Gregor Volberg

Dear list members, 

I have been using filled.contour in order to plot EEG data. For the colors, I 
used a conventional ramp from blue to red (blue - green - yellow - red), and 
100 color levels to make the plot look smooth: 

(...) color.palette = colorRampPalette(c('blue','green',  'yellow','red'), 
space='rgb'), nlevels = 100 (...) 

My problem is that filled.contour draws a black bar as a separation between 
each color of the color bar (color key), so that the color bar becomes essentially 
black if I use many color levels. Is there a way to turn off this behavior? Any 
advice would be greatly appreciated, 
Gregor 



-- 
Dr. rer. nat. Gregor Volberg  ( 
mailto:gregor.volb...@psychologie.uni-regensburg.de )
University of Regensburg
Institute for Experimental Psychology
93040 Regensburg, Germany
Tel: +49 941 943 3862 
Fax: +49 941 943 3233
http://www.psychologie.uni-regensburg.de/Greenlee/team/volberg/volberg.html



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ANOVA table and lmer

2010-11-05 Thread ian m s white
Like James Booth, I find the SSQ and MSQ in lmer output confusing. The
F-ratio (1.485) for Variety is the same for aov, lme and lmer, but
lmer's mean square for variety is 1.485 times the subplot residual mean
square. In the conventional anova table for a split-plot expt, the
variety mean square is 1.485 times *main-plot* residual mean square.
-- 
ian m s white 


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Stephen Liu

Hi Gerrit,

Thanks for your advice.



- snip -

> A[i, j, k] is the value of the element in position (i,j,k) of array A. In 
> other words, it is the entry in row i, column j, and "layer" k (if one 
> wants to think of A as a cuboidal grid).

Sorry I can't follow.  Could you pls explain in more detail.

e.g.

> z <- 0:23
> dim(z) <- c(3,4,2)
> dim(z)
[1] 3 4 2


> z
, , 1

     [,1] [,2] [,3] [,4]
[1,]    0    3    6    9
[2,]    1    4    7   10
[3,]    2    5    8   11

, , 2

     [,1] [,2] [,3] [,4]
[1,]   12   15   18   21
[2,]   13   16   19   22
[3,]   14   17   20   23


TIA


B.R.
Stephen L



-
AOR Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109http://www.uni-giessen.de/cms/eichner
-




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table with values as dots in increasing sizes

2010-11-05 Thread ONKELINX, Thierry
> install.packages("fortunes")
> library(fortunes)
> fortune("yoda")

Evelyn Hall: I would like to know how (if) I can extract some of the
information from the summary of my nlme.
Simon Blomberg: This is R. There is no if. Only how.
   -- Evelyn Hall and Simon 'Yoda' Blomberg
  R-help (April 2005)

df <- data.frame(matrix(rnorm(15), nrow = 3))
colnames(df) <- paste("species", 1:5, sep = "")
df$location <- paste("loc", 1:3)
install.packages("ggplot2")
library(ggplot2)
molten <- melt(df, id.vars = "location", variable_name = "species")
molten$sign <- factor(sign(molten$value))
ggplot(molten, aes(x = species, y = location, colour = sign,
                   size = abs(value))) +
  geom_point()

ggplot(molten, aes(x = species, y = location, colour = sign,
                   size = abs(value))) +
  geom_point() +
  scale_colour_manual(values = c("red", "blue"))

HTH,

Thierry




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -Oorspronkelijk bericht-
> Van: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] Namens fugelpitch
> Verzonden: vrijdag 5 november 2010 9:45
> Aan: r-help@r-project.org
> Onderwerp: [R] table with values as dots in increasing sizes
> 
> 
> I was just thinking of a way to present data and if it is 
> possible in R.
> 
> I have a data frame that looks as follows (this is just mockup data).
> 
> df
> location,"species1","species2","species3","species4","species5"
> "loc1",0.44,0.28,0.37,-0.24,0.41
> "loc2",0.54,0.62,0.34,0.52,0.71
> "loc3",-0.33,0.75,-0.34,0.48,0.61
> 
> location is a factor while all the species are numerical vectors.
> 
> I would like to present this as a table (or something that 
> looks like a
> table) but instead of the numbers I would like to present 
> circles (pch = 19) that increases in size with increasing 
> number. Is it also possible to make it change color if the 
> value is negative. (E.g. larger blue circles represent larger 
> +values while larger red circles represent larger -values)?
> 
> 
> Jonas
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/table-with-values-as-dots-in-inc
> reasing-sizes-tp3028297p3028297.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 5.1 Arrays

2010-11-05 Thread Gerrit Eichner

On Fri, 5 Nov 2010, Stephen Liu wrote:


[snip]


"0" is counted as 1 object.

Of course! It is a number like any other.


Does "object length" mean the total number of objects/entries?

Yes.


Please help me to understand follow;

"For example if the dimension vector for an array, say a, is c(3,4,2) then there
are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the order
a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."

I don't understand;
a[1,1,1], a[2,1,1], ..., a[2,4,2]

[snip]

A[i, j, k] is the value of the element in position (i,j,k) of array A. In 
other words, it is the entry in row i, column j, and "layer" k (if one 
wants to think of A as a cuboidal grid).
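
For example, with the z built earlier in this thread:

z <- 0:23
dim(z) <- c(3, 4, 2)
z[1, 1, 1]   # 0  -- row 1, column 1, layer 1 (the first element)
z[2, 1, 1]   # 1  -- row 2, column 1, layer 1
z[2, 4, 2]   # 22 -- row 2, column 4, layer 2
z[3, 4, 2]   # 23 -- row 3, column 4, layer 2 (the last element)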


Hth  -- Gerrit

-
AOR Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109http://www.uni-giessen.de/cms/eichner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] table with values as dots in increasing sizes

2010-11-05 Thread fugelpitch

I was just thinking of a way to present data and if it is possible in R.

I have a data frame that looks as follows (this is just mockup data).

df
location,"species1","species2","species3","species4","species5"
"loc1",0.44,0.28,0.37,-0.24,0.41
"loc2",0.54,0.62,0.34,0.52,0.71
"loc3",-0.33,0.75,-0.34,0.48,0.61

location is a factor while all the species are numerical vectors.

I would like to present this as a table (or something that looks like a
table) but instead of the numbers I would like to present circles (pch = 19)
that increases in size with increasing number. Is it also possible to make
it change color if the value is negative. (E.g. larger blue circles
represent larger +values while larger red circles represent larger -values)?


Jonas 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/table-with-values-as-dots-in-increasing-sizes-tp3028297p3028297.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] as.xts

2010-11-05 Thread spela podgorsek
hey

I am trying to turn a dataframe into xts with the function:
as.xts,
but it returns the error:

Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format

could someone give me some pointers please

the data is coming from a spreadsheet via Excel, and has 5 columns
of data (date (with the date and time), open, high, low, close) (Excel
format)

ela
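
A minimal sketch of the usual fix (the data frame name, column names and
date format below are assumptions; adjust them to whatever Excel exported):
convert the date column to POSIXct with an explicit format, then build the
xts object.

library(xts)
# 'dat' is the hypothetical data frame read from the spreadsheet
dat$date <- as.POSIXct(dat$date, format = "%d/%m/%Y %H:%M", tz = "GMT")
prices <- xts(dat[, c("open", "high", "low", "close")], order.by = dat$date)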

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ordinal response model in depmixS4

2010-11-05 Thread Ingmar Visser
Penny,
The ?makeDepmix page has an example of how to add your own response
distribution model.
hth, Ingmar

On Fri, Oct 22, 2010 at 3:08 PM, Penny Adversario  wrote:

> I am running a latent class regression with 3 nominal and 2 ordinal
> variables using depmixS4 but the available response models do not include
> one for ordinal response.  How do I go about this?
>
> Penny
>
>
>
>[[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2010-11-05 Thread Michael Bedward
Hello,

One approach would be to fit your distribution using MCMC with, for
example, the rjags package. Then you can use the "zeroes trick" or
"ones trick" to implement your new distribution as described here...

http://mathstat.helsinki.fi/openbugs/data/Docu/Tricks.html

You will find a summary of Bayesian / MCMC packages here...

http://cran.r-project.org/web/views/Bayesian.html

Of these, rjags is the only one I've used directly so I can't comment
on which would be easiest. Hopefully others here can offer advice.
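
To make the "zeroes trick" concrete, here is a generic rjags sketch. The
log-likelihood line is only a placeholder (an exponential density, so the
example actually runs); you would replace it with the log of your negative
binomial-Lindley probability function.

library(rjags)

model_string <- "
model {
  C0 <- 10000                     # large constant keeping phi[i] positive
  for (i in 1:N) {
    # zeros[i] is supplied as data and is always 0; its Poisson 'likelihood'
    # exp(-phi[i]) then contributes exp(loglik[i] - C0) to the joint density
    zeros[i] ~ dpois(phi[i])
    phi[i] <- -loglik[i] + C0
    loglik[i] <- log(lambda) - lambda * y[i]   # placeholder log-density
  }
  lambda ~ dgamma(0.001, 0.001)   # prior on the parameter(s)
}
"

y <- rexp(50, rate = 2)           # toy data
dat <- list(y = y, N = length(y), zeros = rep(0, length(y)))
jm <- jags.model(textConnection(model_string), data = dat)
post <- coda.samples(jm, variable.names = "lambda", n.iter = 5000)
summary(post)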

Michael


On 5 November 2010 00:25, Roes Da  wrote:
> Hello, I'm Roesda from Indonesia.
> I have trouble performing parameter estimation by the MLE method in R,
> because the distribution to be used is not one of the already known
> distributions such as the gamma, Poisson or binomial.  The distribution
> whose parameters I would like to estimate is the joint (compound)
> distribution of the negative binomial and Lindley distributions. How do I
> express it in R if the distribution is new, as I mentioned? I hope
> someone can help me. Thank you very much.
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] About 5.1 Arrays

2010-11-05 Thread Stephen Liu
Hi folks,

(Learning R)

5.1 Arrays
http://cran.r-project.org/doc/manuals/R-intro.html#Vectors-and-assignment

1)
If continued on previous example (3.1 Intrinsic attributes: mode and length),

> z <- 0:9
> dim(z) <- c(3,5,100)
Error in dim(z) <- c(3, 5, 100) : 
  dims [product 1500] do not match the length of object [10]

failed.


2)
Ran;

> z <- 0:1499
> dim(z) <- c(3,5,100)
> dim(z)
[1]   3   5 100

It worked


OR

3)
> z <- 1:1500
> dim(z) <- c(3,5,100)
> dim(z)
[1]   3   5 100

It also worked.

> z
   [1]    1    2    3    4    5    6    7    8    9   10   11   12   13   14
  [15]   15   16   17   18   19   20   21   22   23   24   25   26   27   28
.
[1485] 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498
[1499] 1499 1500

"0" is counted as 1 object.


Does "object length" mean the total number of objects/entries?


Please help me to understand follow;

"For example if the dimension vector for an array, say a, is c(3,4,2) then 
there 
are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the order 
a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]."

I don't understand;
a[1,1,1], a[2,1,1], ..., a[2,4,2]

1 * 1 * 1 / 2 * 1 * 1 / 2 * 4 * 2  is NOT 24 ?

TIA

B.R.
Stephen L




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Detect the Warning Message

2010-11-05 Thread Yen Lee
Dear all,

 

I've written a function and repeated it for 5000 times with loops with
different value, and the messages returned are the output I set and 15
warnings.

I would like to trace the warnings by stopping the loop when warning came
out.

Does anyone know how to make it?

 

Thanks a lot for your help.

 

Yen 

 

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NFFT on a Zoo?

2010-11-05 Thread Bob Cunningham

FWIW:  It turns out I dove into a rabbit hole:

1. Though the gaps in my 3-axis accelerometer data represent 10% data 
loss (OMG!), the number of gaps represents only 0.1% of the 3 million 
data points (BFD).


2. The data is noisy enough that 0.1% discontinuity can't affect an 
FFT.  Each gap was removed simply by adjusting subsequent timestamps.


3. With the gaps removed, the remaining jitter in the timestamps is both 
small and nearly normally distributed (no systematic errors).  So the 
timestamps were eliminated from further processing, and the mean 
inter-sample time was used as the sampling period.


So, neither NFFT nor Zoo are needed, since a regular FFT now works just 
fine.
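
A minimal sketch of that final approach (the zoo series 'acc' below is a
made-up stand-in for the real accelerometer data):

library(zoo)
# toy stand-in: ~50 Hz samples with a little timestamp jitter
t0  <- Sys.time()
tt  <- t0 + cumsum(rnorm(1024, mean = 0.02, sd = 0.001))
acc <- zoo(sin(2 * pi * 5 * as.numeric(tt - t0)) + rnorm(1024, sd = 0.1), tt)

dt   <- mean(diff(as.numeric(index(acc))))   # mean inter-sample time (s)
fs   <- 1 / dt                               # effective sampling rate (Hz)
spec <- Mod(fft(coredata(acc)))^2            # plain periodogram, jitter ignored
freq <- (seq_along(spec) - 1) * fs / length(spec)
plot(freq[1:(length(spec) %/% 2)], spec[1:(length(spec) %/% 2)],
     type = "l", xlab = "Frequency (Hz)", ylab = "Power")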


The Moral of the Story is: "Take a closer look at the data before 
deciding difficult processing is needed."


Homer Simpson translation: "Doh!"

A big "Thanks!" to all who responded to my newbie posts:  The R 
Community is richly blessed with wisdom, kindness and patience.


-BobC



On 11/03/2010 01:22 PM, Mike Marchywka wrote:


   

From: ggrothendi...@gmail.com
Date: Wed, 3 Nov 2010 15:27:13 -0400
To: flym...@gmail.com
CC: r-help@r-project.org; rpy-l...@lists.sourceforge.net
Subject: Re: [R] NFFT on a Zoo?

On Wed, Nov 3, 2010 at 2:59 PM, Bob Cunningham  wrote:
 

I have an irregular time series in a Zoo object, and I've been unable to
find any way to do an FFT on it.  More precisely, I'd like to do an NFFT
(non-equispaced / non-uniform time FFT) on the data.

The data is timestamped samples from a cheap self-logging accelerometer.
  The data is weakly regular, with the following characteristics:
- short gaps every ~20ms
- large gaps every ~200ms
- jitter/noise in the timestamp

The gaps cover ~10% of the acquisition time.  And they occur often enough
that the uninterrupted portions of the data are too short to yield useful
individual FFT results, even without timestamp noise.

My searches have revealed no NFFT support in R, but I'm hoping it may be
known under some other name (just as non-uniform time series are known as
'zoo' rather than 'nts' or 'nuts').

I'm using R through RPy, so any solution that makes use of numpy/scipy would
also work.  And I care more about accuracy than speed, so a non-library
solution in R or Python would also work.

Alternatively, is there a technique by which multiple FFTs over smaller
(incomplete) data regions may be combined to yield an improved view of the
whole?  My experiments have so far yielded only useless results, but I'm
getting ready to try PCA across the set of partial FFTs.

   
 


I'm pretty sure all of this is in Oppenheim and Schafer, meaning it
is also in any newer books. I recall something about averaging
but you'd need to look at details. Alternatively, and this is from
distant memory so maybe someone else can comment, you can just
feed a regularly spaced time series to anyone, go get FFTW for example,
and insert zeroes for missing data. This is equivalent to multiplying
your real data with a window function that is zero at missing points.
I think you can prove that multiplication
in time domain is convolution in FT domain so you can back this out
by deconvolving with your window function spectrum. This probably is not
painless, the window spectrum will have badly placed zeroes etc, but it
may be helpful.
Apparently this is still a bit of an open issue,

http://books.google.com/books?id=BW1PdOqZo6AC&pg=PA2&lpg=PA2&dq=dft+window+missing+data&source=bl&ots=fSY-iRoCNN&sig=30cC0SdkrDcp62iWc-Mv26mfNjI&hl=en&ei=AMTRTNmyMYP88AauxtzKDA&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDEQ6AEwBTgK#v=onepage&q&f=false



You should be able to do the case of a sine wave with pencil and paper
and see if or how this really would work.


   

Check out the entire thread that starts here.

http://www.mail-archive.com/r-help@r-project.org/msg36349.html

--
Statistics&  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
 





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] newbie question on importing and parsing file by row

2010-11-05 Thread Gerrit Eichner

Hello, Emily,

take a look at read.table() for importing (with or without header 
depending on your file which holds the data). Maybe


X <- read.table( "yourfilename", header = FALSE, row.names = 1)

and then

pvalues <- apply( X, 1,
  function( x)
   fisher.test( matrix( x, 2, 2))$p.value
)

does the job (if all the data in your file are such that fisher.test() can 
cope with them ...).
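
And since the p values are to be corrected for multiple testing afterwards,
p.adjust() can do that in one step (the method shown is just an example):

pvalues.adj <- p.adjust( pvalues, method = "holm")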


Hth  --  Gerrit


On Fri, 5 Nov 2010, Emily Wong wrote:


Hi,

I'm new to R and I have a file with many rows of values. Each row 
contains a title and values for a contingency table e.g.


row 1= title    8   0   37796   47
which is a table called 'title'
with values
8 0
37796 47

I would like to know how I can import this using R and for each row 
calculate a p value using the Fisher test. Using each p value I will do 
a multiple-testing correction.


I am unsure how to automate this process.

Many thanks,
Emily

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular Expressions

2010-11-05 Thread Noah Silverman
That's perfect! 

Don't know how I missed that.

I want to start playing with some modeling of financial data and the
only format I can download is rather ugly.  So my plan is to use a
series of Regex to extract what I want.

Noticed that you are a Prof. in applied stats.  I'm at UCLA working on
an MS in stats.  My department is fairly flexible, so I'm taking several
finance courses as part of my work.  Currently debating if I want to
graduate with an MS in June, or roll everything into a PhD and be
finished in an extra 1-2 years.


Thanks!

-N

On 11/5/10 12:09 AM, Prof Brian Ripley wrote:
> On Thu, 4 Nov 2010, Noah Silverman wrote:
>
>> Hi,
>>
>> I'm trying to figure out how to use capturing parenthesis in regular
>> expressions in R.  (Doing this in Perl, Java, etc. is fairly trivial,
>> but I can't seem to find the functionality in R.)
>>
>> For example, given the string:"10 Nov 13.00 (PFE1020K13)"
>>
>> I want to capture the first two digits and then the month abbreviation.
>>
>> In perl, this would be
>>
>> /^(\d\d)\s(\w\w\w)\s/
>>
>> Then I have the variables $1 and $2 assigned to the capturing
>> parentheses.
>>
>> I've found the grep and sub commands in R, but the docs don't
>> indicate any way to capture things.
>>
>> Any suggestions?
>
> Read the link to ?regexp.  It *does* 'indicate the way to capture
> things'.
>
>  The backreference ‘\N’, where ‘N = 1 ... 9’, matches the substring
>  previously matched by the Nth parenthesized subexpression of the
>  regular expression.  (This is an extension for extended regular
>  expressions: POSIX defines them only for basic ones.)
>
> and there is an example on the help page for grep():
>
>  ## Double all 'a' or 'b's;  "\" must be escaped, i.e., 'doubled'
>  gsub("([ab])", "\\1_\\1_", "abc and ABC")
>
> In your example
>
> x <- "10 Nov 13.00 (PFE1020K13)"
> regex <- "(\\d\\d)\\s(\\w\\w\\w).*"
> sub(regex, "\\1", x, perl = TRUE)
> sub(regex, "\\2", x, perl = TRUE)
>
> A better way to do this would be something like
>
> regex <- "([[:digit:]]{2})\\s([[:alpha:]]{3}).*"
>
> which is also a POSIX extended regexp.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular Expressions

2010-11-05 Thread Prof Brian Ripley

On Thu, 4 Nov 2010, Noah Silverman wrote:


Hi,

I'm trying to figure out how to use capturing parenthesis in regular 
expressions in R.  (Doing this in Perl, Java, etc. is fairly trivial, but I 
can't seem to find the functionality in R.)


For example, given the string:"10 Nov 13.00 (PFE1020K13)"

I want to capture the first two digits and then the month abbreviation.

In perl, this would be

/^(\d\d)\s(\w\w\w)\s/

Then I have the variables $1 and $2 assigned to the capturing parentheses.

I've found the grep and sub commands in R, but the docs don't indicate any 
way to capture things.


Any suggestions?


Read the link to ?regexp.  It *does* 'indicate the way to capture 
things'.


 The backreference ‘\N’, where ‘N = 1 ... 9’, matches the substring
 previously matched by the Nth parenthesized subexpression of the
 regular expression.  (This is an extension for extended regular
 expressions: POSIX defines them only for basic ones.)

and there is an example on the help page for grep():

 ## Double all 'a' or 'b's;  "\" must be escaped, i.e., 'doubled'
 gsub("([ab])", "\\1_\\1_", "abc and ABC")

In your example

x <- "10 Nov 13.00 (PFE1020K13)"
regex <- "(\\d\\d)\\s(\\w\\w\\w).*"
sub(regex, "\\1", x, perl = TRUE)
sub(regex, "\\2", x, perl = TRUE)

A better way to do this would be something like

regex <- "([[:digit:]]{2})\\s([[:alpha:]]{3}).*"

which is also a POSIX extended regexp.

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] newbie question on importing and parsing file by row

2010-11-05 Thread Emily Wong
Hi,

I'm new to R and I have a file with many rows of values. Each row contains a 
title and values for a contingency table
e.g.

row 1= title    8   0   37796   47
which is a table called 'title'
with values
8 0
37796 47

I would like to know how I can import this using R and for each row calculate a 
p value using the fisher test. Using each p value I will do multiple a 
correction.

I am unsure how to automate this process.

Many thanks,
Emily

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.