[R] Readjusting frequencies

2013-11-11 Thread Katherine Gobin
Dear Forum,

I have the following data.frame:

fraud_data = data.frame(no_of_frauds = c(1, 2, 4, 6, 7, 9, 10),
                        frequency = c(3, 1, 7, 11, 13, 1, 4))

> fraud_data
  no_of_frauds frequency
1            1         3
2            2         1
3            4         7
4            6        11
5            7        13
6            9         1
7           10         4


I need to regroup the data so that if a frequency is less than 5, the
corresponding class is merged into the next class, i.e. the frequencies are
added until the running total exceeds 5. Thus, in the above data.frame, since
the frequencies for no_of_frauds 1 and 2 are 3 and 1 respectively, these are
added to class 4, whose frequency becomes 3 + 1 + 7 = 11. Likewise, the
frequencies of classes 9 and 10 are 1 and 4; added together they give 5, which
does not exceed 5, and there is no further class to merge into. These should
therefore be added to the previous class, i.e. 7.

Thus I need to have

no_of_frauds       frequency
        4                   11            #  ( 3 + 1 + 7)
        6                   11           
        7                   18            #  (13 + 1 + 4)
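
One possible reading of the rule above, offered as a sketch rather than a definitive answer: walk down the classes keeping a running total, close a group as soon as the total exceeds the threshold, and fold any leftover tail into the last closed group. The function name regroup_frequencies and its threshold argument are made up for illustration, and treating a total of exactly 5 as "not enough" is an assumption taken from the worked example.

```r
# Hypothetical helper: merge classes until the running frequency exceeds `threshold`.
regroup_frequencies <- function(df, threshold = 5) {
  labels <- numeric(0)
  totals <- numeric(0)
  running <- 0
  for (i in seq_len(nrow(df))) {
    running <- running + df$frequency[i]
    if (running > threshold) {
      # Close the group and label it with the class that completed it.
      labels <- c(labels, df$no_of_frauds[i])
      totals <- c(totals, running)
      running <- 0
    }
  }
  if (running > 0) {
    if (length(totals) == 0) {
      # Nothing ever exceeded the threshold: keep everything as one group.
      labels <- df$no_of_frauds[nrow(df)]
      totals <- running
    } else {
      # Leftover tail: add it to the previous (last closed) group.
      totals[length(totals)] <- totals[length(totals)] + running
    }
  }
  data.frame(no_of_frauds = labels, frequency = totals)
}

fraud_data <- data.frame(no_of_frauds = c(1, 2, 4, 6, 7, 9, 10),
                         frequency = c(3, 1, 7, 11, 13, 1, 4))
regrouped <- regroup_frequencies(fraud_data)
print(regrouped)
```

On the posted data this gives classes 4, 6 and 7 with frequencies 11, 11 and 18, which matches the table requested above.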

Kindly guide

Regards

Katherine
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create Geotiff

2013-11-11 Thread Ludwig Hilger
Hi Karren,

not sure if this is a problem of the software you are using to view the
image after writing? I would first check the color scaling of the image in
this software. I would interpret the black background as no data.

regards,
Ludwig


Karren wrote
 Hi
 
 I am trying to export a raster as a Geotiff using -
 
 writeRaster(grazedmasstot, paste(pad, "grazedmass_total.tif"), "GTiff",
 overwrite=TRUE) -
 
 But the resulting image is incorrect, the image is tiny and shows up as a
 white object with a black background. 
 
 Does anyone have any suggestions how I can rectify this?
 
 Thanks





-
Dipl. Geogr. Ludwig Hilger
Wiss. MA
Lehrstuhl für Physische Geographie
Katholische Universität Eichstätt-Ingolstadt
Ostenstraße 18
85072 Eichstätt


Re: [R] Date handling in R is hard to understand

2013-11-11 Thread PIKAL Petr
Hi

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Alemu Tadesse
 Sent: Friday, November 08, 2013 8:41 PM
 To: r-help@r-project.org
 Subject: [R] Date handling in R is hard to understand
 
 Dear All,
 
 I usually work with time series data. The data may come in AM/PM date
 format or on 24 hour time basis. R can not recognize the two
 differences automatically - at least for me. I have to specifically
 tell R in which time format the data is. It seems that Pandas knows how
 to handle date without being told the format. The problem arises when I
 try to shift time by a certain time. Say adding 3600 to shift it
 forward, that case I have to use something like:
 Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date),
 tz="", format = "%m/%d/%Y %I:%M %p") + 3600 or Measured_data$Date <-
 as.POSIXct(as.character(Measured_data$Date),
 tz="", format = "%m/%d/%Y %H:%M") + 3600, depending on the format. The
 date also attaches MDT or MST and so on. When merging two data frames
 with dates of different format that may create a problem (I think).
 When I get data from excel it could be in any/random format and I
 needed to customize the date to use in R in one of the above formats.
 Any TIPS - for automatic processing with no need to specifically tell
 the data format ?
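
Base R has no fully automatic format detection, but a small helper can try a list of candidate formats in order. The sketch below is only an illustration: the name parse_datetime, the two candidate formats (taken from the calls quoted above) and the fixed "UTC" time zone are all assumptions. The 12-hour format is tried first because the 24-hour format can also match the leading part of an AM/PM string. Note that %p depends on the locale.

```r
# Hypothetical helper: try each candidate format until every value is parsed.
parse_datetime <- function(x,
                           formats = c("%m/%d/%Y %I:%M %p", "%m/%d/%Y %H:%M"),
                           tz = "UTC") {
  x <- as.character(x)
  # Start with an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # Only values that are still unparsed are attempted with this format.
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

parsed <- parse_datetime(c("11/08/2013 02:30 PM", "11/08/2013 14:30"))
shifted <- parsed + 3600   # shift forward by one hour
print(shifted)
```

Fixing the time zone explicitly (rather than tz = "") is one way to avoid the MDT/MST labels changing between data sets before a merge.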
 
 Another problem I saw was that when using rbind to bind data frames,
 if one column of one of the data frames is character data (say for
 example "none" - coming from mysql), R doesn't know how to
 concatenate the numeric column from the other data frame to it. I needed to

rbind/cbind can use the data.frame method, which keeps each column's own type.
With the default method, however, the result is a matrix, which must have one
common type of data in all columns (a matrix is really just a vector with dimensions).

> str(cbind(airquality, 1:153))
'data.frame':   153 obs. of  7 variables:
 $ ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ solar.r: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ day    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ 1:153  : int  1 2 3 4 5 6 7 8 9 10 ...
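
To make the difference concrete, here is a small sketch with made-up vectors (num_col and chr_col are illustrative names, not from the original post). It contrasts the matrix result with the data.frame result, and shows one possible way to turn a "none" marker into NA before converting to numeric, which may avoid the convert-and-reconvert step in an automated pipeline.

```r
num_col <- c(1.5, 2.5, 3.5)
chr_col <- c("none", "4", "5")

# Two plain vectors: cbind() builds a matrix, so everything becomes character.
as_matrix <- cbind(num_col, chr_col)

# With a data frame as first argument the data.frame method is used,
# and the numeric column stays numeric.
as_frame <- cbind(data.frame(num_col), chr_col)

# Replace the "none" marker by NA, then convert the column to numeric.
cleaned <- chr_col
cleaned[cleaned == "none"] <- NA
cleaned <- as.numeric(cleaned)

str(as_matrix)
str(as_frame)
print(cleaned)
```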

Regards
Petr


 change the numeric to character and later after binding takes place I
 had to re-convert it to numeric. But, this causes problem in an
 automated environment. Any suggestion ?
 
 Thanks
 Mihretu
 


[R] Show time in x-axis

2013-11-11 Thread mohan . radhakrishnan
Hi,
I am trying to show time (HH:MM:SS) on my x-axis. I have these
two questions.

1. The error in the code is 

Error in axis(1, at = data$Time, labels = data$Time, las = 2, cex.axis = 
1.2) : 
  (list) object cannot be coerced to type 'double'

Should I use 'POSIXct' or 'strptime'?

2. I have times that are repeated because it is the next or previous day. 
But I want to show the times and data points in sequence - as they are in 
the data frame - along the axis.

Thanks,
Mohan

      X1     X2     X3     X4 OldGenAfterFullGC      X6     X7 PermGenAfterFullGC     Time
1      0   3285 873856   3456              3285 1256128  12862              12862 19:36:16
2   3285  30437 873856  31324             30437 1256128  39212              39212 19:36:26
3 312755 313565 873856 313843            313565 1214080 182327             182327 20:36:27
4 313565 281379 873856 313789            281379 1213248 182338             147729 21:36:29
5      0   3285 873856   3456              3285 1256128  12862              12862 19:36:16

plot(data$Time,levels(data$PermGenAfterFullGC)[data$PermGenAfterFullGC],col="darkblue",pch=2,type="b",
ylab="Megabytes", xlab="Time",las=2,lwd=2, cex.lab=1,cex.axis=1,xaxt="n")

axis(1, at = data$Time, labels = data$Time, las = 2,cex.axis=1.2)
text(data$Time,data$Time, data$Time, 2, cex=1.45)




[R] MM robust

2013-11-11 Thread IZHAK shabsogh
Given the model

y = x1 / 1+ b1x2^b2

and the data

y <- c(2,3,4,5,6)
x1 <- c(0.23,0.32,0.43,0.54,0.65)
x2 <- c(0.11,0.21,0.31,0.41,0.33)

with initial parameter values

b1=0.023
b2=0.045

I am able to estimate the parameters of the above model using the nls method.
Can you please give a hint on how I can fit the same model using the MM
robust estimate to obtain the parameters? An illustration using the above
information would enable me to extend it to what I am doing.

thanks


[R] package ‘build-essential’ is not available (for R version 3.0.2)

2013-11-11 Thread Charles Evans
Hello,

I have searched on the R-Project site, R-Help archives, and the Internet
at large, and I cannot find a solution to my problem.

I am running R version 3.0.2 (2013-09-25) -- Frisbee Sailing on Ubuntu
13.04.

When I try to install several packages, including quantmod, with
dependencies=T set, I keep getting long lists of packages that
result in "installation of package 'X' had non-zero exit status".  When
I try to install X, I get another list of packages that failed to install.

After a few iterations of this, the package that I am trying to install
is listed among the packages that have failed to install.

I found a reference online to build-essential, but when I tried to
install that, I got "package ‘build-essential’ is not available (for R
version 3.0.2)".

Any hints or follow-up questions would be greatly appreciated.

C.Evans



Re: [R] Show time in x-axis

2013-11-11 Thread Jim Lemon

Hi Mohan,
Yes, you probably want to convert the Time variable. However, to 
answer both questions in one, you also probably want to stick a starting 
date on your times, incrementing this whenever a time is less than the 
previous one:


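# this will produce times for the current date
data$Time1 <- as.POSIXct(strptime(data$Time, "%H:%M:%S"))
offset <- 0
lasttime <- 0
for(timedate in seq_along(data$Time1)) {
 # compare the raw (same-day) times, before any offset is added
 rawtime <- as.numeric(data$Time1[timedate])
 # a time earlier than the previous one means the next day has started
 if(rawtime < lasttime) offset <- offset + 86400
 lasttime <- rawtime
 data$Time1[timedate] <- data$Time1[timedate] + offset
}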

Then you can use Time1 as the at argument, and Time as the 
labels argument to axis.
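
The same idea can be sketched without a loop by working in seconds since midnight. This is only an illustration under stated assumptions: the helper name to_seconds is made up, the times are the five values from the posted data, and every drop in clock time is taken to mean exactly one day has passed.

```r
times <- c("19:36:16", "19:36:26", "20:36:27", "21:36:29", "19:36:16")

# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight.
to_seconds <- function(hms) {
  parts <- do.call(rbind, strsplit(hms, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 3600 + as.numeric(parts[, 2]) * 60 + as.numeric(parts[, 3])
}

secs <- to_seconds(times)
# Each time the clock goes backwards, one more day (86400 s) is added.
day_offset <- cumsum(c(0, diff(secs) < 0)) * 86400
elapsed <- secs + day_offset
print(elapsed)
```

The increasing elapsed vector can then serve as the x values and as the at argument of axis(), with the original times as labels, for example axis(1, at = elapsed, labels = times, las = 2).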


Jim



[R] repeating values in an index two by two

2013-11-11 Thread Federico Calboli
Hi All,

I am trying to create an index that returns something like

1,2,1,2,3,4,3,4,5,6,5,6,7,8,7,8

and so on and so forth until a predetermined value (which is obviously even).  
I am trying very hard to avoid for loops or for loops front ends.

I'd be obliged if anybody could offer a suggestion.

BW

F




[R] graphics or table

2013-11-11 Thread Enzo Cocca
hi

I have this code for a cross validation:

res <- as.data.frame(CV_Pb_var)$residual
sqrt(mean(res^2))
mean(res)
mean(res^2/as.data.frame(CV_Pb_var)$var1.var)


I cannot seem to export everything in one table.


Also, can it be exported graphically?
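
A minimal sketch of one way to collect the three numbers in a single table and write it to a file. The data frame cv below is a made-up stand-in for as.data.frame(CV_Pb_var), assumed to have columns residual and var1.var as in the code above; the row labels and the temporary file name are also just illustrative.

```r
# Stand-in for as.data.frame(CV_Pb_var); the values are invented for the example.
cv <- data.frame(residual = c(0.4, -0.2, 0.1, -0.3),
                 var1.var = c(0.20, 0.10, 0.05, 0.15))

# One row per statistic, so everything ends up in one table.
cv_summary <- data.frame(
  statistic = c("root mean squared error",
                "mean error",
                "mean squared error / kriging variance"),
  value = c(sqrt(mean(cv$residual^2)),
            mean(cv$residual),
            mean(cv$residual^2 / cv$var1.var))
)

# Export the table; a temporary file is used here.
out_file <- tempfile(fileext = ".csv")
write.csv(cv_summary, out_file, row.names = FALSE)
print(cv_summary)
```

For a graphical view, something like hist(cv$residual) or a plot of the residuals against the predictions may be a reasonable starting point.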


thanks


enzo


-- 
Enzo Cocca (PhD Candidate)
Research Fellow
Università di Napoli L'Orientale
mail: enzo@gmail.com
cell: +393495087014



[R] grnn input format usage?

2013-11-11 Thread Cyril Auburtin
I'm trying grnn package, and reproduced the example (
http://cran.r-project.org/web/packages/grnn/grnn.pdf), I tried the example
with another x input column in the dataset (see below):

but I'm getting the following error, "Error in Ya * patterns1 :
non-conformable arrays", though I took care to pass an input of length 2

n <- 100
set.seed(1)

x1 <- runif(n, -2, 2)
x2 = x1^2
y0 <- x1 * x2

epsilon <- rnorm(n, 0, .1)
y <- y0 + epsilon
grnn <- learn(data.frame(y,x1, x2))
grnn <- smooth(grnn,sigma=0.1)
guess(grnn, c(2,4))
Error in Ya * patterns1 : non-conformable arrays

guess(grnn, data.frame(x1=c(2), x2=c(4)))
Error in (X - Xa) %*% t(X - Xa) :
  requires numeric/complex matrix/vector arguments



[R] Apply a function with multiple argument on each column of matrix

2013-11-11 Thread Mohammad Tanvir Ahamed
Hi there !!
I have a function like
fun <- function(x,y)
{
loe<-loess(y ~ x,span=0.9,family="gaussian")
pre<-predict(loe,data.frame(x=x))
return(pre)
}

Now i have defined :
x<-1:500

y<-matrix(rnorm(1000,3),ncol=2)

I can manipulate fun(x,y[,1]) .
But i want to apply the function on each column of matrix y . 
Any suggestion will be appreciated . 
Thanks .  

 
Best regards


... 
Tanvir Ahamed
Göteborg, Sweden


Re: [R] repeating values in an index two by two

2013-11-11 Thread andrija djurovic
Hi. Here are two approaches:

c(mapply(function(x,y) rep(c(x,y), 2), (1:10)[c(T,F)], (1:10)[c(F,T)]))

c(tapply(1:10, rep(1:(10/2), each=2), rep, 2), recursive=T)

Andrija







Re: [R] MM robust

2013-11-11 Thread S Ellison
 given the model
 
 y = x1 / 1+ b1x2^b2
...
 i am able to find the parameter of the above model using nls method, can u
 please give hint on how i can solve the same model as above using MM
 robust estimate to obtain the parameter. i mean u can illustrate using the
 above information to enable me extend to what i am doing

have a look at ?nlrob in the robustbase package.

S Ellison




Re: [R] Apply a function with multiple argument on each column of matrix

2013-11-11 Thread Uwe Ligges



apply(y, 2, function(i) fun(x, i))

Uwe Ligges
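
For reference, the apply() call can be run end to end as in the sketch below, which reuses the function and data from the question (set.seed() is added only to make the run repeatable). The second form, which passes x by name through apply()'s extra arguments, is an alternative that should give the same result.

```r
fun <- function(x, y) {
  loe <- loess(y ~ x, span = 0.9, family = "gaussian")
  predict(loe, data.frame(x = x))
}

set.seed(1)
x <- 1:500
y <- matrix(rnorm(1000, 3), ncol = 2)

# Each column of y is passed to fun() as its second argument.
smoothed <- apply(y, 2, function(column) fun(x, column))

# Same idea: x is matched by name, so the column fills the remaining argument y.
smoothed_alt <- apply(y, 2, fun, x = x)

dim(smoothed)
```

The result is a matrix with one column of fitted values per column of y.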





Re: [R] Apply a function with multiple argument on each column of matrix

2013-11-11 Thread Mohammad Tanvir Ahamed
Thanks !!
 
Best regards


... 
Tanvir Ahamed
Göteborg, Sweden





Re: [R] repeating values in an index two by two

2013-11-11 Thread Federico Calboli
Hi,

first off, thanks for the suggestion.  I managed to solve it by doing:

IND = rep(c(T,T,F,F), 5)
X = rep(NA, 20)
X[IND] = 1:10
X[!IND] = 1:10

which avoids any function -- I think mapply, apply etc call a for loop 
internally, which I'd rather avoid.
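
The logical-index approach above can be wrapped so that the upper value is a parameter. This is a sketch only; the function name paired_index is made up, and it assumes, as stated in the original question, that the upper value n is even.

```r
# Hypothetical helper: returns 1,2,1,2,3,4,3,4,... up to n (n must be even).
paired_index <- function(n) {
  stopifnot(n %% 2 == 0)
  # TRUE marks the first copy of each pair, FALSE the second copy.
  first_copy <- rep(c(TRUE, TRUE, FALSE, FALSE), length.out = 2 * n)
  out <- integer(2 * n)
  out[first_copy] <- seq_len(n)
  out[!first_copy] <- seq_len(n)
  out
}

paired_index(8)
```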

BW

F





Re: [R] package ‘build-essential’ is not available (for R version 3.0.2)

2013-11-11 Thread Joshua Ulrich
Have you read these instructions?
http://cran.r-project.org/bin/linux/ubuntu/README.html

They say to run
sudo apt-get install r-base-dev

which should install 'build-essential' (which is an Ubuntu package,
not an R package).
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com




[R] r package to solve for Nash equilibrium

2013-11-11 Thread Dereje Fentie
Is there an R package out there that solves for the pure strategy Nash
equilibrium of a two-person game? A search for Nash equilibrium in R
provides a link to the GNE package, which solves for the Generalized Nash
equilibrium. But what I would like to solve for is a pure strategy Nash
equilibrium.
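
For a finite two-person game given as two payoff matrices, pure strategy equilibria can be found by a direct best-response check, so a package may not be strictly needed. The sketch below is an illustration under assumptions: the function name pure_nash is made up, A[i, j] and B[i, j] are taken to be the payoffs to the row and column player when row i meets column j, and the example payoffs are a standard prisoner's dilemma rather than anything from the question.

```r
# Hypothetical helper: returns the (row, col) cells that are pure strategy Nash equilibria.
pure_nash <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  # Row player: within each column, is this row a best response?
  row_best <- A == matrix(apply(A, 2, max), nrow(A), ncol(A), byrow = TRUE)
  # Column player: within each row, is this column a best response?
  col_best <- B == apply(B, 1, max)
  # An equilibrium is a cell where both players are best-responding.
  which(row_best & col_best, arr.ind = TRUE)
}

# Prisoner's dilemma: strategy 1 = cooperate, strategy 2 = defect.
A <- matrix(c(-1, -3,
               0, -2), nrow = 2, byrow = TRUE)
B <- t(A)
equilibria <- pure_nash(A, B)
print(equilibria)
```

The exact comparison with == is fine for integer-like payoffs; with computed floating-point payoffs a tolerance would be safer. A game may have no pure strategy equilibrium, in which case the result has zero rows.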



Re: [R] how to introduce missing data for complete data

2013-11-11 Thread Bert Gunter
1. You need to define more explicitly exactly what you mean by randomly.

2. You need to make an honest effort to learn basic R, e.g. by
spending time with the Introduction to R document that ships with R
or an online tutorial (there are many good ones).
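
As an illustration of why the definition matters, here is a sketch of the simplest case, values missing completely at random: a fixed number of rows is drawn with sample() and their amounts are set to NA. The data frame rain copies the seven rows quoted below, the column name Stn_ID is adapted from the "Stn ID" header, and the choice of two missing values and of the seed is arbitrary.

```r
set.seed(42)   # arbitrary seed, only to make the example repeatable

rain <- data.frame(Stn_ID = 48603, Year = 71, Mth = 1, Day = 1:7,
                   Amount = c(1, 0.5, 1.3, 0.8, 0, 0, 0))

n_missing <- 2
# Draw the rows to blank out, each row being equally likely.
missing_rows <- sample(nrow(rain), n_missing)

rain_with_gaps <- rain
rain_with_gaps$Amount[missing_rows] <- NA
print(rain_with_gaps)
```

Other definitions of "randomly" (for example gaps that come in runs of consecutive days, or that depend on the rainfall amount) would need a different sampling step.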

Cheers,
Bert

On Sun, Nov 10, 2013 at 10:31 PM, dila radi dilarad...@gmail.com wrote:
 Hi,

 I'm a new R user. In my research I use rainfall data and I'm interested in
 estimating missing data. I would like to use the Normal Ratio Method to
 estimate missing data. My problem is, how do I introduce missing data
 randomly within my complete set of data?


 Stn ID  Year  Mth   Day   Amount
 48603 71 1 1 1
 48603 71 1 2 0.5
 48603 71 1 3 1.3
 48603 71 1 4 0.8
 48603 71 1 5 0
 48603 71 1 6 0
 48603 71 1 7 0
 ...

 Thank you so much for your attention and help.

 Regards,
 Dila




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374



Re: [R] repeating values in an index two by two

2013-11-11 Thread Patrick Burns

> f1
function(x) {
	one <- matrix(1:x, nrow=2)
	as.vector(rbind(one, one))
}
<environment: 0x0daaf1c0>
> f1(8)
 [1] 1 2 1 2 3 4 3 4 5 6 5 6 7 8 7 8

Pat




--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')



Re: [R] Cross Tabulation

2013-11-11 Thread David Carlson
OK. Then using aggregate():

> data$yes <- ifelse(data$response=="yes", 1, 0)
> data$no <- ifelse(data$response=="no", 1, 0)
> dataresp <- aggregate(cbind(no, yes)~region+district, data, sum)
> dataresp[,3:4] <- dataresp[,3:4]/rowSums(dataresp[,3:4])
> # or dataresp[,3:4] <- prop.table(as.matrix(dataresp[,3:4]), 1)
> dataresp
  region district  no yes
1      A        d 0.5 0.5
2      A        e 0.0 1.0
3      B        f 0.5 0.5
4      B        g 0.5 0.5
5      C        h 0.5 0.5
6      C        i 0.0 1.0
7      C        j 1.0 0.0

David

From: Peter Maclean [mailto:pmaclean2...@yahoo.com] 
Sent: Sunday, November 10, 2013 12:52 PM
To: dcarl...@tamu.edu
Subject: Re: [R] Cross Tabulation

Thanks. But I am creating lots of tables and I need Regions and
Districts to appear so as to avoid too much editing.
 
Peter Maclean
Department of Economics
UDSM

On Sunday, November 10, 2013 12:32 PM, David Carlson
dcarl...@tamu.edu wrote:
The simplest would be to create a variable combining region and
district:

> data$region_district <- with(data, paste(region, district))
> prop.table(xtabs(~region_district+response, data), 1)
              response
region_district  no yes
            A d 0.5 0.5
            A e 0.0 1.0
            B f 0.5 0.5
            B g 0.5 0.5
            C h 0.5 0.5
            C i 0.0 1.0
            C j 1.0 0.0

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Peter Maclean
Sent: Sunday, November 10, 2013 12:06 AM
To: r-help@r-project.org
Subject: Re: [R] Cross Tabulation


#Would like to create a cross-table (Region, district, response) and
#(Region, district, cost). The flat table function does not look so good
region   <- c("A","A","A","A","B","B","B","B","C","C","C","C")
district <- c("d","d","e","e","f","f","g","g","h","h","i","j")
response <- c("yes","no","yes","yes","no","yes","yes","no","yes","no","yes","no")
cost <- runif(12, 5.0, 9)
var <- c("region", "response", "district")
data <- data.frame(region, district, response, cost)
var1 <- c("region", "district", "response")
var2 <- c("region", "district", "cost")
data1 <- data[var1]
#This look okay
with(data, aggregate(x=cost, by=list(region, district), FUN=mean))
#This does not look good
#How do i remove the NaN or create a better one
prop.table(ftable(data1, exclude = c(NA, NaN)), 1)
prop.table(ftable(xtabs(~region + district+ response, data=data)),1)


Peter Maclean
Department of Economics
UDSM

    [[alternative HTML version deleted]]



Re: [R] repeating values in an index two by two

2013-11-11 Thread Carl Witthoft
Here's a rather extreme solution:

Rgames> foo<-rep(1:6,each=2)
Rgames> foo
 [1] 1 1 2 2 3 3 4 4 5 5 6 6

Rgames> foo[rep(c(1,3,2,4),3)+rep(c(0,4,8),each=4)]
 [1] 1 2 1 2 3 4 3 4 5 6 5 6

In the general case, then, it would be something like

foo<- rep(1:N, each = 2)  # foo is of length(2*N)

foo[rep(c(1,3,2,4), N/2) + rep(seq(0, 2*N - 4, by=4), each=4)]

Note that the refolding requires the sequence to have length a multiple of
4.






Re: [R] repeating values in an index two by two

2013-11-11 Thread Charles Determan Jr
Here is another solution that is a bit more flexible

tmp <- seq(8)
# split into your desired groups
max.groups <- 2
tmp.g <- split(tmp, ceiling(seq_along(tmp)/max.groups))

# do repeats, unlist, numeric index
as.numeric(unlist(rep(tmp.g, each = 2)))

Hope this works for you,
Charles






-- 
Charles Determan
Integrated Biosciences PhD Candidate
University of Minnesota



Re: [R] repeating values in an index two by two

2013-11-11 Thread Iakub Henschen
 > n <- 7
 > rep(seq(1,n,2), each=4)+c(0,1,0,1)
 [1] 1 2 1 2 3 4 3 4 5 6 5 6 7 8 7 8

rep(), seq(), rbind(), apply() ... whatever: internally there will always
be iteration via some loop :-)

Ia.




Re: [R] repeating values in an index two by two

2013-11-11 Thread William Dunlap
Or you can use the integer divide and remainder operators:
n <- 30
x <- seq(0, len=n)
(x %% 2) + (x %/% 4)*2 + 1 # period 2 oscillator + jump by 2 every fourth
 [1]  1  2  1  2  3  4  3  4  5  6  5  6  7  8  7
[16]  8  9 10  9 10 11 12 11 12 13 14 13 14 15 16
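
The approaches suggested in this thread can be checked against each other. The following is only a sketch, assuming the target value n is even; the function names pair_index_matrix and pair_index_arith are made up for illustration and wrap the matrix approach and the integer-arithmetic approach from the thread.

```r
# Matrix approach: lay 1:n out in two rows, stack the matrix on itself,
# then read it back column by column.
pair_index_matrix <- function(n) {
  stopifnot(n %% 2 == 0)
  one <- matrix(seq_len(n), nrow = 2)
  as.vector(rbind(one, one))
}

# Arithmetic approach: a period-2 oscillator plus a jump of 2 every 4 positions.
pair_index_arith <- function(n) {
  stopifnot(n %% 2 == 0)
  x <- seq(0, length.out = 2 * n)   # the index has 2*n entries
  (x %% 2) + (x %/% 4) * 2 + 1
}

a <- pair_index_matrix(8)
b <- pair_index_arith(8)
print(a)
print(all(a == b))
```

Both versions are vectorised, so neither needs an explicit for loop at the R level.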

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] graphics or table

2013-11-11 Thread Jeff Newmiller
Your code is messed up because you posted in HTML. Also, it is not reproducible 
(e.g. no sample data, incomplete analysis code). (See 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 for more on reproducibility.) Also, this looks very much like homework and the 
Posting Guide (mentioned in the footer of this email) indicates that homework 
is off topic here, and that you should rely on resources provided by your 
educational institution.

If you want to put all of these various values in one table, you will need to 
write code to do so, since you have to specify how you want that table laid 
out. E.g.

resultdf <- data.frame( Mean=mean(res), StdDev=sd(res))

but mixing single valued measures such as mean with vector valued measures such 
as residuals in a single table usually requires repeating the SVM in many rows, 
which is why this often is avoided.
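
As a rough illustration of collecting the single-valued measures in one table: the sketch below assumes the cross-validation output has columns named residual and var1.var, as in the posted code. The data frame cv and its values are made up, and the column labels ME, RMSE and MSDR are just one possible naming.

```r
# Hypothetical stand-in for as.data.frame(CV_Pb_var); values are simulated.
set.seed(1)
cv <- data.frame(residual = rnorm(50), var1.var = runif(50, 0.5, 1.5))

res <- cv$residual
resultdf <- data.frame(
  ME   = mean(res),                  # mean error
  RMSE = sqrt(mean(res^2)),          # root mean squared error
  MSDR = mean(res^2 / cv$var1.var)   # mean squared deviation ratio
)
print(resultdf)

# To export the one-row table, something like this should do:
# write.csv(resultdf, "cv_summary.csv", row.names = FALSE)
```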

Your unnecessary and inappropriate use of as.data.frame also suggests that you 
need to spend some time studying the Introduction to R document that comes with 
the software learning the difference between vectors, lists and data frames.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Enzo Cocca enzo@gmail.com wrote:
hi

I have this code for a cross validation:

res - as.data.frame(CV_Pb_var)$residual  sqrt(mean(res^2)) 
mean(res)  mean(res^2/as.data.frame(CV_Pb_var)$var1.var)


I can not seem to export everything in one table


 also  can I to be exported it graphically?


thanks


enzo



[R] Generating bootstrap samples from a panel data frame

2013-11-11 Thread Dereje Fentie
With a data frame (call it *d*) composed of 2000 individuals and *n*
observations for each individual (thus *2000n* observations in total), I
would like to generate *k* bootstrap samples with replacement from *d*.
Amongst other variables, *d* has a numeric variable *id* that takes the same
value for all observations belonging to the same individual.

Taking into consideration the panel nature of the data, I want to generate
many bootstrap samples with replacement and store each bootstrap sample
data frame for further use. Sampling (or selection into the bootstrap
sample) shall be based on individuals (on unique values of *id*) such that
if an individual is in a particular bootstrap sample, so will all
observations belonging to that individual.

How can I do this in r?
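
One possible approach is to resample the unique values of *id* and then pull in every row of each drawn individual. This is only a sketch: the toy data frame d, the function name boot_panel and the extra column new_id are assumptions made for illustration.

```r
# Toy panel: 5 individuals with 3 observations each (stand-in for the real d).
set.seed(1)
d <- data.frame(id = rep(1:5, each = 3), y = rnorm(15))

boot_panel <- function(d, k) {
  rows_by_id <- split(seq_len(nrow(d)), d$id)   # row numbers per individual
  ids <- names(rows_by_id)
  lapply(seq_len(k), function(b) {
    drawn <- sample(ids, length(ids), replace = TRUE)   # resample individuals
    out <- d[unlist(rows_by_id[drawn], use.names = FALSE), , drop = FALSE]
    # new_id keeps repeated draws of the same individual apart
    out$new_id <- rep(seq_along(drawn), times = lengths(rows_by_id[drawn]))
    rownames(out) <- NULL
    out
  })
}

samples <- boot_panel(d, k = 10)   # a list of 10 bootstrap data frames
head(samples[[1]])
```

Keeping the samples in a list makes it easy to run the same model on each one with lapply(). The new_id column matters when an individual is drawn more than once and the later analysis needs distinct panel units.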



[R] Bar Graph

2013-11-11 Thread Keniajin Wambui
I am using R 3.0.2 on a 64 bit machine

I have a data set from 1989-2002. The data has the variables serialno,
admission ward, admission date, bcg scar, temperature and year.

serialno  admin_ward  date_admn  bcg_scar  temp_axilla  yr
70162     Ward2       11-Oct-89  y         38.9         1989
70163     Ward1       11-Oct-91  y         37.2         1991
70164     Ward2       11-Oct-92  n         37.3         1992
70165     Ward1       11-Oct-93  y         38.9         1993
70166     Ward1       11-Oct-94  y         37.7         1994
70167     Ward1       11-Oct-95  y         40           1995


I want to do a bar graph of total data (serialno) vs. the data of one of
the variables, to show the available data vs. total data over the years.

I am using

ggplot(dta, aes(temp_axilla, fill=admin_ward)) + geom_bar() +
  facet_grid(. ~ yr, scales = "free", margins=F) + geom_histogram(binwidth=300)

But I cannot include the serialno, which shows the total data. How can I achieve this?

-- 
Mega Six Solutions
Web Designer and Research Consultant
Kennedy Mwai



[R] (no subject)

2013-11-11 Thread Viarti Eminita
Dear Mr/Mrs.

I am Viarti Eminita, a magister (master's) student, fifth level, in Statistics
at Bogor Agriculture University. I am now analyzing ANN on time series data
and learning the kohonen package for series data, but when I want to
predict, the predict value is still on pattern scale. I want to ask how to
change the predict value to the real data value?

example:
data <- read.table("D:/THESIS/Data/data.txt",head=T)
Ytraining <- scale(data[1:168,3])
Xtraining <- scale(data[1:168,4:6])
Xtest <- scale(data[168:180,4:6])
xyf <- xyf(Xtraining,Ytraining,grid = somgrid(5, 5, "hexagonal"))
xyf.prediction <- predict(xyf,newdata=Xtest)

Thank you, Mr/Mrs.

Best regards,

viarti



Re: [R] Earth (MARS) package with categorical predictors

2013-11-11 Thread Chris Wilkinson
Steve, thanks for your reply. Here is what I get.

pkg is a 4-level categorical vector.

> is.factor(pkg)
[1] TRUE
> summary(pkg)
BGA PGA QCC QFP 
225  36  19 178 

> dat <- earth(lifetime ~ pkg+pins+volts+temp+doi+logspd, degree=3) ## The other vars are continuous.
> s <- 243
> pr <- c(pkg[s],pins[s],volts[s],temp[s],doi[s],logspd[s])
> pkg[s]
[1] BGA
Levels: BGA PGA QCC QFP
> pr
[1]    1.000000  256.000000    3.300000  125.000000 2002.258105    4.890349
> pred <- predict(dat, newdata=pr)
Error : variable 'pkg' was fitted with type "factor" but type "numeric" was
supplied
Forging on regardless, first few rows of x are
  pkg pins volts temp      doi   logspd
1   1  256   3.3  125 2002.258 4.890349
Error: get.earth.x from model.matrix.earth from predict.earth: the number 6
of columns of x
(after factor expansion) does not match the number 8 of columns of the earth
object
    expanded x:  pkg pins volts temp doi logspd
    object$dirs: pkgPGA pkgQCC pkgQFP pins volts temp doi logspd
Possible remedy: check factors in the input data


Pkg is being passed as numeric 1. I'm unsure how to correctly specify pkg
for predict. In the example you gave, does the data include a categorical?
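
The numeric 1 most likely comes from c(), which cannot hold a factor next to numbers and so falls back to the factor's integer code. A one-row data frame keeps the factor intact. The sketch below uses lm() on made-up data as a stand-in, on the assumption that predict() for an earth model accepts a data frame as newdata (as the etitanic example quoted below does); the variable values are invented.

```r
# Simulated data with one 4-level factor and one continuous predictor.
set.seed(1)
n <- 40
pkg  <- factor(sample(c("BGA", "PGA", "QCC", "QFP"), n, replace = TRUE))
pins <- runif(n, 50, 300)
lifetime <- 10 + 2 * as.integer(pkg) + 0.01 * pins + rnorm(n)
train <- data.frame(lifetime, pkg, pins)

fit <- lm(lifetime ~ pkg + pins, data = train)   # stand-in for earth()

s <- 3
pr_bad  <- c(train$pkg[s], train$pins[s])   # factor collapses to its integer code
pr_good <- train[s, c("pkg", "pins")]       # one-row data frame, factor preserved

str(pr_bad)
str(pr_good)
pred <- predict(fit, newdata = pr_good)
print(pred)
```

If the model was fitted from vectors in the workspace rather than with a data argument, the column names of the newdata data frame need to match the variable names used in the formula.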

Chris

-Original Message-
From: Stephen Milborrow [mailto:mi...@sonic.net] 
Sent: Monday, November 11, 2013 7:21 AM
To: kins...@verizon.net
Subject: [R] Earth (MARS) package with categorical predictors

See if you can provide a simple reproducible example.  It's not clear 
exactly what the issue is from your question.  The following simple example 
gives the correct response:

data(etitanic)
a <- earth(survived~., data=etitanic)
predict(a, newdata=etitanic[1,])

Regards,
Steve

Message: 42
Date: Thu, 07 Nov 2013 23:16:18 -0500
From: Chris Wilkinson kins...@verizon.net
To: r-help@r-project.org, Chris Wilkinson kins...@verizon.net
Subject: [R] Earth (MARS) package with categorical predictors
Message-ID: ml99syxejec3ep0u4h0je78h.1383884178...@email.android.com
Content-Type: text/plain; charset=utf-8

It appears to be legitimate to include multi-level categorical and
continuous variables in defining the model for earth (e.g. y ~ cat +
cont1 + cont2) but is it also then possible use categoricals in the
predict method using the earth result? I tried but it returns an error
which is not very informative.

Thanks

Chris



Re: [R] SOLVED: Count number of consecutive zeros by group

2013-11-11 Thread Carlos Nasher
Thanks to all of you. All solutions work fine. I'm running S Ellison's
version with William's comment. Perfect for what I'm doing.
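
For readers landing on this message first, here is a sketch of one way to count only runs of zeros, along the lines of the rle()/tapply() code quoted below. The function name max_zero_run and the data frame name dat are made up for illustration; the test data are the ones from the thread.

```r
max_zero_run <- function(x) {
  r <- rle(x == 0)
  # keep only the runs where the value is TRUE (i.e. zeros);
  # the leading 0 covers groups that contain no zeros at all
  max(c(0, r$lengths[r$values]))
}

dat <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1))
res <- with(dat, tapply(x, ID, max_zero_run))
print(res)
```

The difference from taking max() over all run lengths is the subsetting by r$values, which discards runs of non-zero values.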

And sorry for using a name same as a base R function (twice) ;-)

Cheers,
Carlos


2013/11/1 PIKAL Petr petr.pi...@precheza.cz

 Hi

 Yes you are right. This gives number of zeroes not max number of
 consecutive zeroes.

 Regards
 Petr


  -Original Message-
  From: arun [mailto:smartpink...@yahoo.com]
  Sent: Friday, November 01, 2013 2:17 PM
  To: R help
  Cc: PIKAL Petr; Carlos Nasher
  Subject: Re: [R] Count number of consecutive zeros by group
 
  I think this gives a different result than the one OP asked for:
 
  df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
  2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(1, 0, 0, 1, 0,
  0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)), .Names = c("ID",
  "x"), row.names = c(NA, -22L), class = "data.frame")
 
  with(df1, sapply(split(x, ID), function(x) sum(x==0)))
 
  with(df1,tapply(x,list(ID),function(y) {rl <- rle(!y);
  max(c(0,rl$lengths[rl$values]))}))
 
 
  A.K.
 
 
  On Friday, November 1, 2013 6:01 AM, PIKAL Petr
  petr.pi...@precheza.cz wrote:
  Hi
 
  Another option is sapply/split/sum construction
 
  with(data, sapply(split(x, ID), function(x) sum(x==0)))
 
  Regards
  Petr
 
 
   -Original Message-
   From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
   project.org] On Behalf Of Carlos Nasher
   Sent: Thursday, October 31, 2013 6:46 PM
   To: S Ellison
   Cc: r-help@r-project.org
   Subject: Re: [R] Count number of consecutive zeros by group
  
   If I apply your function to my test data:
  
   ID <- c(1,1,1,2,2,3,3,3,3)
   x <- c(1,0,0,0,0,1,1,0,1)
   data <- data.frame(ID=ID,x=x)
   rm(ID,x)
  
   f2 <- function(x) {
     max( rle(x == 0)$lengths )
   }
   with(data, tapply(x, ID, f2))
  
   the result is
   1 2 3
   2 2 2
  
   which is not what I'm aiming for. It should be
   1 2 3
   2 2 1
  
   I think f2 does not return the max of consecutive zeros, but the max
   of any consecutive number... Any idea how to fix this?
  
  
   2013/10/31 S Ellison s.elli...@lgcgroup.com
  
   
   
     -Original Message-
     So I want to get the max number of consecutive zeros of variable x
     for each ID. I found rle() to be helpful for this task; so I did:

     FUN <- function(x) {
       rles <- rle(x == 0)
     }
     consec <- lapply(split(df[,2],df[,1]), FUN)

    You're probably better off with tapply and a function that returns
    what you want. You're probably also better off with a data frame
    name that isn't a function name, so I'll use dfr instead of df...

    dfr <- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) #5 ID groups
    # numbered 1-5, equal size but that doesn't matter for tapply

    f2 <- function(x) {
      max( rle(x == 0)$lengths )
    }
    with(dfr, tapply(x, ID, f2))


    S Ellison
  




-- 
-
Carlos Nasher
Buchenstr. 12
22299 Hamburg

tel:+49 (0)40 67952962
mobil:+49 (0)175 9386725
mail:  carlos.nas...@gmail.com



Re: [R] (no subject)

2013-11-11 Thread COLLINL
Hi Viarti, can you clarify your question slightly?
(1) When you say "the predict value still on pattern scale", what do you
mean?  It sounds like you are saying that the prediction values are on the
scale of the Ytraining values specifically, or do you mean that you expect
the scale to differ?
(2) When you say "how to change the predict value to the real data value",
do you mean change the scale?

Perhaps if you gave some examples of the desired outputs it would be easier.
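
If the question is about undoing scale(), here is a minimal sketch in base R. It assumes the predictions come back on the scale of the scaled training response; the data are simulated and pred_scaled is only a stand-in for model output, not actual kohonen results.

```r
set.seed(42)
y <- rnorm(20, mean = 50, sd = 10)        # response on its real scale (made up)
y_scaled <- scale(y)                      # what the model would be trained on

# scale() stores the centring and scaling constants as attributes
ctr <- attr(y_scaled, "scaled:center")
scl <- attr(y_scaled, "scaled:scale")

pred_scaled <- y_scaled[1:5]              # stand-in for predictions on the scaled scale
pred_real   <- pred_scaled * scl + ctr    # back-transform to the real scale
print(pred_real)

# Test predictors would normally be scaled with the training constants,
# not with their own mean and standard deviation:
Xtrain   <- scale(matrix(rnorm(60), ncol = 3))
Xnew     <- matrix(rnorm(15), ncol = 3)
Xnew_scl <- scale(Xnew,
                  center = attr(Xtrain, "scaled:center"),
                  scale  = attr(Xtrain, "scaled:scale"))
```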

Best,
Collin.




[R] ensemble methods

2013-11-11 Thread Iut Tri Utami
Dear Mr/Mrs

I am Iut, a graduate student at Bogor Agriculture Institute.
I read a book on ensemble methods in data mining by Seni and Elder and found
R code about bagging.
I am confused about how to call these functions and how to aggregate the
results with a majority vote.
I think there is some code missing here. What if the function is replaced with
SVM?

Example :
genPredictors <- function(seed = 123, N = 30) {
  # Load package with random number generation
  # for the multivariate normal distribution
  library(mnormt)
  # 5 features each having a standard Normal
  # distribution with pairwise correlation 0.95
  Rho <- matrix(c(1,.95,.95,.95,.95,
                  .95, 1,.95,.95,.95,
                  .95,.95,1,.95,.95,
                  .95,.95,.95,1,.95,
                  .95,.95,.95,.95,1), 5, 5)
  mu <- c(rep(0,5))
  set.seed(seed);
  x <- rmnorm(N, mu, Rho)
  colnames(x) <- c("x1", "x2", "x3", "x4", "x5")
  return(x)
}
genTarget <- function(x, N, seed = 123) {
  # Response Y is generated according to:
  # Pr(Y = 1 | x1 <= 0.5) = 0.2,
  # Pr(Y = 1 | x1 > 0.5) = 0.8
  y <- c(rep(-1, N))
  set.seed(seed);
  for (i in 1:N) {
    if ( x[i,1] <= 0.5 ) {
      if ( runif(1) <= 0.2 ) {
        y[i] <- 1
      } else {
        y[i] <- 0
      }
    } else {
      if ( runif(1) <= 0.8 ) {
        y[i] <- 1
      } else {
        y[i] <- 0
      }
    }
  }
  return(y)
}
genBStrapSamp <- function(seed = 123, N = 200, Size = 30) {
  set.seed(seed)
  sampleList <- vector(mode = "list", length = N)
  for (i in 1:N) {
    sampleList[[i]] <- sample(1:Size, replace=TRUE)
  }
  return(sampleList)
}
fitBStrapTrees <- function(data, sampleList, N) {
  treeList <- vector(mode = "list", length = N)
  for (i in 1:N) {
    tree.params=list(minsplit = 4, minbucket = 2, maxdepth = 7)
    treeList[[i]] <- fitClassTree(data[sampleList[[i]],],
                                  tree.params)
  }
  return(treeList)
}
fitClassTree <- function(x, params, w = NULL,
                         seed = 123) {
  library(rpart)
  set.seed(seed)
  # params is a list, so its elements are accessed with $
  tree <- rpart(y ~ ., method = "class",
                data = x, weights = w, cp = 0,
                minsplit = params$minsplit,
                minbucket = params$minbucket,
                maxdepth = params$maxdepth)
  return(tree)
}

thankyou very much

best regard,

Iut



Re: [R] ensemble methods

2013-11-11 Thread Bert Gunter
See the R randomForest package.

This already does ensemble classification and regression.
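
For the majority vote itself, a minimal base-R sketch follows. It assumes the class predictions of the individual trees have already been collected into a matrix with one row per observation and one column per tree (for rpart trees, each column could come from predict() with type = "class"). The votes matrix below is made up and the function name majority_vote is hypothetical.

```r
# One row per observation, one column per bootstrap tree (invented 0/1 votes).
votes <- matrix(c(1, 1, 0,
                  0, 0, 1,
                  1, 0, 1,
                  0, 0, 0), nrow = 4, byrow = TRUE)

majority_vote <- function(votes) {
  apply(votes, 1, function(v) {
    tab <- table(v)                          # count the votes per class label
    as.numeric(names(tab)[which.max(tab)])   # ties go to the first (smaller) label
  })
}

bagged <- majority_vote(votes)
print(bagged)
```

Using an odd number of trees avoids ties for a two-class problem.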

-- Bert



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374



Re: [R] Date handling in R is hard to understand

2013-11-11 Thread Alemu Tadesse
Thank you all for taking the time to look at this problem. Yes, date
handling is a problem in many languages. For this specific problem, I have
worked around rbind not being able to handle different data formats in a
column by converting the data to character and later converting it back to
numeric.

Thank you again




On Mon, Nov 11, 2013 at 3:06 AM, PIKAL Petr petr.pi...@precheza.cz wrote:

 Hi

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Alemu Tadesse
  Sent: Friday, November 08, 2013 8:41 PM
  To: r-help@r-project.org
  Subject: [R] Date handling in R is hard to understand
 
  Dear All,
 
  I usually work with time series data. The data may come in AM/PM date
  format or on 24 hour time basis. R can not recognize the two
  differences automatically - at least for me. I have to specifically
  tell R in which time format the data is. It seems that Pandas knows how
  to handle date without being told the format. The problem arises when I
  try to shift time by a certain time. Say adding 3600 to shift it
  forward, that case I have to use something like:
  Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date),
  tz="",format = "%m/%d/%Y %I:%M %p")+3600 or Measured_data$Date <-
  as.POSIXct(as.character(Measured_data$Date),
  tz="",format = "%m/%d/%Y %H:%M")+3600  depending on the format. The
  date also attaches MDT or MST and so on. When merging two data frames
  with dates of different format that may create a problem (I think).
  When I get data from excel it could be in any/random format and I
  needed to customize the date to use in R in one of the above formats.
  Any TIPS - for automatic processing with no need to specifically tell
  the data format ?
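
One way to avoid choosing the format by hand is to try both formats and pick per element. This is a sketch only: the function name parse_mixed is made up, it assumes the dates are always month/day/year, and it relies on %p recognising AM/PM, which depends on the locale.

```r
parse_mixed <- function(x, tz = "UTC") {
  x <- as.character(x)
  ampm <- as.POSIXct(x, tz = tz, format = "%m/%d/%Y %I:%M %p")  # 12-hour clock
  h24  <- as.POSIXct(x, tz = tz, format = "%m/%d/%Y %H:%M")     # 24-hour clock
  has_ampm <- grepl("[AaPp][Mm]\\s*$", x)   # does the string end in AM/PM?
  out <- h24
  out[has_ampm] <- ampm[has_ampm]
  out
}

x <- c("11/08/2013 01:30 PM", "11/08/2013 13:30", "11/08/2013 12:05 AM")
p <- parse_mixed(x)
print(p)
print(p + 3600)   # shift forward by one hour
```

Fixing the time zone explicitly (UTC here) also keeps labels such as MDT/MST from changing between data sets.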
 
  Another problem I saw was that when using rbind to bind data frames,
  if one column of one of the data frames is a character data (say for
  example none - coming from mysql) format R doesn't know how to
  concatenate numeric column from the other data frame to it. I needed to

 rbind/cbind can use the data.frame method, which keeps each column's own
 type. With the default method, however, the result is a matrix, which has
 to have a common data type in all columns (a matrix is actually just a
 vector with dimensions).

 > str(cbind(airquality, 1:153))
 'data.frame':   153 obs. of  7 variables:
  $ ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
  $ solar.r: int  190 118 149 313 NA NA 299 99 19 194 ...
  $ wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
  $ temp   : int  67 72 74 62 56 66 65 59 61 69 ...
  $ month  : int  5 5 5 5 5 5 5 5 5 5 ...
  $ day    : int  1 2 3 4 5 6 7 8 9 10 ...
  $ 1:153  : int  1 2 3 4 5 6 7 8 9 10 ...

 Regards
 Petr


  change the numeric to character and later after binding takes place I
  had to re-convert it to numeric. But, this causes problem in an
  automated environment. Any suggestion ?
 
  Thanks
  Mihretu
 


Re: [R] how to introduce missing data for complete data

2013-11-11 Thread MacQueen, Don
Here's a suggestion.

The sample() function takes random samples of sets. See
  ?sample
The set you want to take a random sample from is the rows of your data.
Represent the rows by their row numbers.
To get a vector of row numbers, you can use the seq() function. See
  ?seq

Let's suppose your data is in a data frame named 'mydat', and you want to
introduce 10 instances of missing data.

nr <- nrow(mydat)
set.to.missing <- sample( seq(nr) , 10)
mydat$Amount[set.to.missing] <- NA
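
A self-contained version of the same idea, as a sketch: the toy data frame below only imitates the layout of the rainfall data in the question, and its values are simulated.

```r
set.seed(123)
# Simulated stand-in for the rainfall data (one station, one month).
mydat <- data.frame(Stn = 48603, Year = 71, Mth = 1, Day = 1:31,
                    Amount = round(rexp(31, 1), 1))

complete <- mydat$Amount                     # keep the true values for later comparison
set.to.missing <- sample(seq_len(nrow(mydat)), 10)
mydat$Amount[set.to.missing] <- NA

print(sum(is.na(mydat$Amount)))
head(mydat)
```

Keeping a copy of the complete values makes it possible to compare the estimates from the chosen method against the truth at the rows that were blanked out.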


A simplified example of the core idea is:

 > foo <- seq(10)
 > foo
  [1]  1  2  3  4  5  6  7  8  9 10
 > foo[3] <- NA; foo
  [1]  1  2 NA  4  5  6  7  8  9 10


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 11/10/13 10:31 PM, dila radi dilarad...@gmail.com wrote:

Hi,

I'm a new R user. In my research I use rainfall data and I'm interested in
estimating missing data. I would like to use the Normal Ratio Method to
estimate missing data. My problem is, how do I introduce missing data
randomly within my complete set of data?


Stn ID  Year  Mth   Day   Amount
48603 71 1 1 1
48603 71 1 2 0.5
48603 71 1 3 1.3
48603 71 1 4 0.8
48603 71 1 5 0
48603 71 1 6 0
48603 71 1 7 0
...

Thank you so much for your attention and help.

Regards,
Dila



[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread Lopez, Dan
Hi R Experts,

How do I mark rows in a data frame based on a condition that depends on another
row in the same data frame?

I want to mark with a 1 every TT=='HC' row whose FY and ID combination also has
a TT=='TER' row.  In my example below these are rows 4, 7 and 11.
My data looks something like this:
     FY ID  TT
1  FY09  1  HC
2  FY10  1  HC
3  FY11  1  HC
4  FY12  1  HC
5  FY12  1 TER
6  FY09  2  HC
7  FY10  2  HC
8  FY10  2 TER
9  FY11  2  HC
10 FY12  2  HC
11 FY13  2  HC
12 FY13  2 TER

I know for this specific example I can use:
HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T)

However, my actual data set is NOT sorted by FY, ID and TT. TT is a binary
factor variable. I want to know if there is another way of doing the same thing
without sorting the data.
I tried the last line of code below, but it gave me unexpected results. It marks
the first three rows with 0 and everything else with 1.  Based on the warning
messages, it looks like it has something to do with "longer object length is not
a multiple of shorter object length". But I am now stumped.

#REPRODUCIBLE EXAMPLE
FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13"))
ID<-c(rep(1,5),rep(2,7))
TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
HTDF<-data.frame(FY,ID,TT)

#Summarize data and get max TT. TT is a binary factor variable
library(sqldf)
HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) MAXTT FROM HTDF GROUP BY ID,FY')

# Initiate new variable and assign 0 or 1
HTDF$EXCL<-0

# THIS IS WHERE I AM GETTING UNEXPECTED RESULTS
HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY & HTDF$ID==HTDF.MAX$ID & HTDF$TT==HTDF.MAX$MAXTT,0,1)


Dan Lopez
Workforce Analyst
LLNL
HRIM - Workforce Analytics & Metrics



Re: [R] select .txt from .txt in a directory

2013-11-11 Thread Zilefac Elvis
Thanks, AK.
All three pieces of code worked as expected.
Again, thanks so much for understanding my problem and providing the right
solutions.
Atem.



On Saturday, November 9, 2013 6:27 PM, arun smartpink...@yahoo.com wrote:
 
HI,

The code could be shortened by using ?merge or ?join().
library(plyr)
##Using the output from `lst6`


lst7 <- lapply(lst6, function(x) {
  x1 <- data.frame(Year = rep(1961:2005, each = 12), Mo = rep(1:12, 45))
  x2 <- join(x1, x, type = "left", by = c("Year", "Mo"))
})

##rest are the same (only change in object names)

sapply(lst7, nrow)
lst8 <- lapply(lst7, function(x)
  data.frame(col1 = unlist(data.frame(t(x)[-c(1:2), ]), use.names = FALSE)))
lst9 <- lapply(seq_along(lst8), function(i) {
  x <- lst11[[i]]  # lst11 comes from the earlier mail; lst8 is probably meant here
  colnames(x) <- lstf1[i]
  row.names(x) <- 1:nrow(x)
  x
})
sapply(lst9, nrow)
res2New <- do.call(cbind, lst9)
dim(res2New)
#[1] 16740    98
res2New[res2New == -.9] <- NA # change missing value identifier as in your data set
which(res2New == -.9)
#integer(0)

dates1 <- seq.Date(as.Date('1Jan1961', format = "%d%b%Y"),
                   as.Date('31Dec2005', format = "%d%b%Y"), by = "day")
dates2 <- as.character(dates1)
sldat <- split(dates2, list(gsub("-.*", "", dates2)))
lst12 <- lapply(sldat, function(x)
  lapply(split(x, gsub(".*-(.*)-.*", "\\1", x)), function(y) {
    x1 <- as.numeric(gsub(".*-.*-(.*)", "\\1", y))
    if ((31 - max(x1)) > 0) {
      x2 <- seq(max(x1) + 1, 31, 1)
      x3 <- paste0(unique(gsub("(.*-.*-).*", "\\1", y)), x2)
      c(y, x3)
    } else y
  }))
any(sapply(lst12, function(x) any(lapply(x, length) != 31)))
#[1] FALSE

lst22 <- lapply(lst12, function(x) unlist(x, use.names = FALSE))
sapply(lst22, length)
dates3 <- unlist(lst22, use.names = FALSE)
length(dates3)
res3New <- data.frame(dates = dates3, res2New, stringsAsFactors = FALSE)
str(res3New)
res3New$dates <- as.Date(res3New$dates)
res4New <- res3New[!is.na(res3New$dates), ]
res4New[1:3, 1:3]
dim(res4New)
colnames(res4) <- colnames(res4New)
identical(res4, res4New)
#[1] TRUE

A.K.






On Saturday, November 9, 2013 5:46 PM, arun smartpink...@yahoo.com wrote:


Hi,
Try:
library(stringr)
# Created the selected files (98) in a separate working folder (SubsetFiles1)
# (refer to my previous mail)
filelst <- list.files()
#Sublst <- filelst[1:2]
res <- lapply(filelst, function(x) {
  con <- file(x)
  Lines1 <- readLines(con)
  close(con)
  Lines2 <- Lines1[-1]
  Lines3 <- str_split(Lines2, "-.9M")
  Lines4 <- str_trim(unlist(lapply(Lines3, function(x) {
    x[x == ""] <- NA
    paste(x, collapse = " ")
  })))
  Lines5 <- gsub("(\\d+)[A-Za-z]", "\\1", Lines4)
  res1 <- read.table(text = Lines5, sep = "", header = FALSE, fill = TRUE)
  res1
})

##Created another folder "Modified" to store the res files
lapply(seq_along(res), function(i)
  write.table(res[[i]],
              paste("/home/arunksa111/Zl/Modified", paste0("Mod_", filelst[i]), sep = "/"),
              row.names = FALSE, quote = FALSE))

lstf1 <- list.files(path = "/home/arunksa111/Zl/Modified")

lst1 <- lapply(lstf1, function(x)
  readLines(paste("/home/arunksa111/Zl/Modified", x, sep = "/")))
which(lapply(lst1, function(x) length(grep("\\d+-.9", x))) > 0)
#[1]  7 11 14 15 30 32 39 40 42 45 46 53 60 65 66 68 69 70 73 74 75 78 80 82 83
#[26] 86 87 90 91 93

lst2 <- lapply(lst1, function(x) gsub("(\\d+)(-.9)", "\\1 \\2", x))
#lapply(lst2, function(x) x[grep("\\d+-.9", x)]) ##checking for the pattern

lst3 <- lapply(lst2, function(x) {x <- gsub("(-.9)(-.9)", "\\1 \\2", x)})
#lapply(lst3, function(x) x[grep("\\d+-.9", x)])  ##checking for the pattern
#lapply(lst3, function(x) x[grep("-.9", x)]) ###second check
lst4 <- lapply(lst3, function(x) gsub("(Day) (\\d+)", "\\1_\\2", x[-1]))
#removed the additional header "V1", "V2", etc.

#sapply(lst4, function(x) length(strsplit(x[1], " ")[[1]])) #checking the number of columns that should be present
lst5 <- lapply(lst4, function(x) unlist(lapply(x, function(y) word(y, 1, 33))))
lst6 <- lapply(lst5, function(x)
  read.table(text = x, header = TRUE, stringsAsFactors = FALSE, sep = "", fill = TRUE))
# head(lst6[[94]], 3)
lst7 <- lapply(lst6, function(x) x[x$Year >= 1961 & x$Year <= 2005, ])
#head(lst7[[45]], 3)
lst8 <- lapply(lst7, function(x) x[!is.na(x$Year), ])


lst9 <- lapply(lst8, function(x) {
  if ((min(x$Year) > 1961) | (max(x$Year) < 2005)) {
    n1 <- (min(x$Year) - 1961) * 12
    x1 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n1))
    n2 <- (2005 - max(x$Year)) * 12
    x2 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n2))
    colnames(x1) <- colnames(x)
    colnames(x2) <- colnames(x)
    x3 <- rbind(x1, x, x2)
  }
  else if ((min(x$Year) == 1961) & (max(x$Year) == 2005)) {
    if ((min(x$Mo[x$Year == 1961]) > 1) | (max(x$Mo[x$Year == 2005]) < 12)) {
      n1 <- min(x$Mo[x$Year == 1961]) - 1
      x1 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n1))
      n2 <- (12 - max(x$Mo[x$Year == 2005]))
      x2 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n2))
      colnames(x1) <- colnames(x)
      colnames(x2) <- colnames(x)
      x3 <- rbind(x1, x, x2)
    }
    else {
      x
    }
  }
})

which(sapply(lst9, nrow) != 540)
#[1] 45 46 54 64 65 66 70 75 97
lst10 <- lapply(lst9, function(x) {x1 <- x[!is.na(x$Year), ]
             hx1 <- head(x1, 1)
             tx1 <- tail(x1, 1)
             x2 <- as.data.frame(matrix(NA, ncol = ncol(x), 

Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread William Dunlap
If you have an algorithm that only works on sorted data, it is easy to
write a function that sorts [a copy of] the data, applies the algorithm,
then puts the result back in the order of the original data.  E.g.,

f <- function (data)  {
    # data[ord,] will be sorted in your required order
    ord <- with(data, order(TT, ID, FY))
    # [order(ord)] puts it back in original order
    data$EXCL3 <- 1 * duplicated(data[ord, 1:2], fromLast = TRUE)[order(ord)]
    data
}

E.g.,
> i <- c(12, 5, 10, 6, 4, 2, 1, 3, 7, 11, 9, 8)
> scrambled <- HTDF[i,]
> f(scrambled)
     FY ID  TT EXCL3
12 FY13  2 TER     0
5  FY12  1 TER     0
10 FY12  2  HC     0
6  FY09  2  HC     0
4  FY12  1  HC     1
2  FY10  1  HC     0
1  FY09  1  HC     0
3  FY11  1  HC     0
7  FY10  2  HC     1
11 FY13  2  HC     1
9  FY11  2  HC     0
8  FY10  2 TER     0
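
The sort, apply, unsort pattern used in f() can be seen in isolation on a plain vector. The sketch below is only an illustration: the vector and the running-total "algorithm" are made-up stand-ins, not part of the original problem.

```r
# Toy vector in arbitrary order (values are illustrative only).
x <- c(30, 10, 40, 20)

ord <- order(x)     # positions that would sort x
sorted <- x[ord]    # the sorted copy

# Pretend this is an algorithm that needs sorted input:
# here, a running total over the sorted values.
result_sorted <- cumsum(sorted)

# order(ord) is the inverse permutation of ord, so indexing with it
# returns each result to the position of its original element.
result <- result_sorted[order(ord)]
result
```

Each element of result lines up with the corresponding element of x, which is the same trick f() relies on to hand EXCL3 back in the caller's row order.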

Or is your dataset so large that this sorting and unsorting would take too
long or too much space?

(There are faster ways of doing this than duplicated(), but the specifics depend
on details such as whether or not there may be more than 2 FY/ID duplicates.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




[R] Data Security when using R

2013-11-11 Thread seanstclair

   Hello.  At the company I work for, I recently requested that R be installed
   on my desktop and on those of some of my colleagues.

   My company's IT/Security groups are having trouble assessing whether R
   meets their standards.

   Can anyone point me to a source where I can read about how R uses data? Does
   it store the data somewhere?  Does data ever actually leave the company's
   environment?  Etc.?

   Thanks.
   Sean


Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread arun
Hi,
You may try:

fun1 <- function(dat){
dat$EXCL3 <- 0
dat$EXCL3[dat$TT=="HC"] <- 1*as.character(interaction(dat[,1:2]))[dat$TT=="HC"] %in%
  as.character(interaction(dat[,1:2]))[dat$TT=="TER"]
dat
}

fun1(HTDF)

set.seed(14)
indx <- sample(1:nrow(HTDF),12)
HTDF1 <- HTDF[indx,]

fun1(HTDF1)
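
The matching idea behind fun1() can be spelled out step by step. The following is a sketch rather than the code from the thread: it builds the keys with paste() instead of interaction(), and the "|" separator is an assumption that only holds if that character never occurs in FY or ID.

```r
# Rebuild the poster's example data.
HTDF <- data.frame(
  FY = factor(c("FY09", "FY10", "FY11", "FY12", "FY12", "FY09",
                "FY10", "FY10", "FY11", "FY12", "FY13", "FY13")),
  ID = c(rep(1, 5), rep(2, 7)),
  TT = factor(c(rep("HC", 4), "TER", "HC", "HC", "TER",
                "HC", "HC", "HC", "TER"))
)

# One text key per row, combining FY and ID.
key <- paste(HTDF$FY, HTDF$ID, sep = "|")

# Keys that have a TER row somewhere in the data.
ter_keys <- key[HTDF$TT == "TER"]

# Flag HC rows whose key is among the TER keys; as.integer() gives 1/0.
HTDF$EXCL3 <- as.integer(HTDF$TT == "HC" & key %in% ter_keys)

which(HTDF$EXCL3 == 1)  # row positions of the flagged HC rows
```

Because %in% compares values rather than positions, the result does not depend on the order of the rows.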

A.K.






[R] colours legend, for loop,density plot

2013-11-11 Thread Mª Teresa Martinez Soriano
Hi, and thanks in advance. I have the following code:

normal <- sort(rnorm(1000))
cauchy <- sort(rcauchy(1000))
t3 <- sort(rt(1000, 3))
t10 <- sort(rt(1000, 10))
col <- c("green", "blue", "orange", "purple")
v <- list(normal, cauchy, t3, t10)
names(v) <- c("Normal", "Cauchy", "T-stud 3 df", "T-stud 10 df")
par(mfrow = c(1, 2))
plot(density(normal), col = col[[1]], main = "Funciones de densidad")
for (i in 2:4) {
  lines(density(v[[i]]), col = col[[i]], lty = i + 2)
}
legend(x = -4, y = 0.3, names(v), col = col, cex = 0.6)

The problem is that no colours appear in the legend, so I cannot tell which
curve is which. Could you please tell me what I need to change to fix this?
Thanks a lot, Tere


Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread Gabor Grothendieck

For each FY, ID group, ave applies f to TT == 'TER', returning
a logical vector that is TRUE for each HC row if a TER row is in the group
and FALSE otherwise.   Finally we add 0 to convert from
TRUE/FALSE to 1/0.

The rows of HTDF need not be in any specific order, and their
order will be preserved.

> f <- function(x) any(x) & !x
> transform(HTDF, EXCL = ave(TT == 'TER', FY, ID, FUN = f) + 0)
     FY ID  TT EXCL
1  FY09  1  HC    0
2  FY10  1  HC    0
3  FY11  1  HC    0
4  FY12  1  HC    1
5  FY12  1 TER    0
6  FY09  2  HC    0
7  FY10  2  HC    1
8  FY10  2 TER    0
9  FY11  2  HC    0
10 FY12  2  HC    0
11 FY13  2  HC    1
12 FY13  2 TER    0
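
To check the claim that row order does not matter, the same call can be run on a shuffled copy of the data. This is a sketch under assumptions: the seed is arbitrary and the shuffle is only there to mimic an unsorted data set.

```r
# Rebuild the poster's example data.
HTDF <- data.frame(
  FY = factor(c("FY09", "FY10", "FY11", "FY12", "FY12", "FY09",
                "FY10", "FY10", "FY11", "FY12", "FY13", "FY13")),
  ID = c(rep(1, 5), rep(2, 7)),
  TT = factor(c(rep("HC", 4), "TER", "HC", "HC", "TER",
                "HC", "HC", "HC", "TER"))
)

# TRUE for the non-TER members of any group that contains a TER row.
f <- function(x) any(x) & !x

# Shuffle the rows to mimic unsorted input (seed chosen arbitrarily).
set.seed(1)
scrambled <- HTDF[sample(nrow(HTDF)), ]

scrambled$EXCL <- ave(scrambled$TT == "TER",
                      scrambled$FY, scrambled$ID, FUN = f) + 0

# Row names survive the shuffle, so they identify the flagged rows.
flagged <- sort(as.integer(rownames(scrambled)[scrambled$EXCL == 1]))
flagged
```

The flagged row names should be the same ones the poster listed for the sorted data.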


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] colours legend, for loop,density plot

2013-11-11 Thread Jim Lemon


Hi Tere,
Try this:

legend(x=-2,y=0.04,names(v),col=col,cex=0.6,lty=c(1,4:6))
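
The reason this helps is that legend() only draws a line sample next to each label when a line type (lty) or line width is supplied; col then colours those samples. A self-contained sketch follows. The seed, the "topright" position and the null graphics device are assumptions made so the example runs anywhere; they are not from the original post.

```r
set.seed(123)  # arbitrary seed so the sketch is repeatable
v <- list(
  "Normal"       = rnorm(1000),
  "Cauchy"       = rcauchy(1000),
  "T-stud 3 df"  = rt(1000, 3),
  "T-stud 10 df" = rt(1000, 10)
)
cols <- c("green", "blue", "orange", "purple")
ltys <- c(1, 4:6)  # same line types the curves are drawn with

pdf(NULL)  # null device: nothing is written; drop this line to plot on screen
plot(density(v[[1]]), col = cols[1], lty = ltys[1], main = "Density functions")
for (i in 2:4) {
  lines(density(v[[i]]), col = cols[i], lty = ltys[i])
}
# Passing lty makes legend() draw a coloured line sample for each entry.
leg <- legend("topright", legend = names(v), col = cols, lty = ltys, cex = 0.6)
dev.off()
```

Keeping the colours and line types in vectors that are shared by lines() and legend() avoids the two drifting apart.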

Jim



Re: [R] Data Security when using R

2013-11-11 Thread Kevin Wright
As a starting point for answering this question, you might search Google for
"The RAppArmor Package: Enforcing Security Policies in R Using Dynamic
Sandboxing on Linux"

kw






-- 
Kevin Wright



Re: [R] Data Security when using R

2013-11-11 Thread MacQueen, Don
See below

-- 
Don MacQueen

Lawrence Livermore National Laboratory


On 11/11/13 2:01 PM, seanstcl...@verizon.net seanstcl...@verizon.net
wrote:


>   Hello.  At the company I work for, I recently requested having R
>   loaded onto my desktop and some of my colleagues.
>
>   My company's IT/Security groups are having trouble assessing whether R
>   software meets their standards.
>
>   Can anyone point me to a source where i can read about how R uses data?

I would start by downloading "An Introduction to R" from CRAN and
searching on "save" and ".RData".


>   does it store the data somewhere?

Yes. In memory to start, and optionally to disk, normally somewhere in the
user's home directory or working directory.
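
As a small illustration of that point, the sketch below writes an object to disk and reads it back. The data frame is made up, and the temporary file path is an assumption chosen so the example cleans up after itself; in an interactive session R may also offer to save the workspace to a .RData file in the working directory on exit.

```r
# Data lives in memory until it is written out explicitly.
scores <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))  # made-up data
original <- scores

# Write to a temporary file on the local disk (path chosen by R).
path <- tempfile(fileext = ".RData")
save(scores, file = path)

# Remove the in-memory copy, then restore it from the file.
rm(scores)
load(path)
restored_ok <- identical(scores, original)

unlink(path)  # delete the temporary file again
restored_ok
```

Nothing in this sequence touches the network; the file stays wherever the user tells R to put it.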

>   Does data ever actually leave the company's
>   environment?

Not unless the user does something explicit to make it happen.

>   etc...?

No less secure than, say, MS Excel, I would think.

Others with a deeper understanding than I may point out exceptions or
special cases worth knowing about ... I hope.


>   Thanks.
>   Sean


Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread Lopez, Dan


Thanks.
Dan



Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread Lopez, Dan
Great advice!
Thank you.

Dan




Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

2013-11-11 Thread Lopez, Dan
Hi Gabor,

This is a great solution!  I will use it.

Thank you!

Dan


-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Monday, November 11, 2013 3:02 PM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] How do I derive a logical variable in a dataframe based on 
another row in the same dataframe?

On Mon, Nov 11, 2013 at 3:50 PM, Lopez, Dan lopez...@llnl.gov wrote:
 Hi R Experts,

 How do I mark rows in dataframe based on a condition that's based off another 
 row in the same dataframe?

 I want to mark any combination of FY,ID, TT=='HC' rows that have a 
 FY,ID,TT=='TER' row with a 1.  In my example below this is rows 4, 7 and 11.
 My data looks something like this:
 FY ID  TT
 1  FY09  1  HC
 2  FY10  1  HC
 3  FY11  1  HC
 4  FY12  1  HC
 5  FY12  1 TER
 6  FY09  2  HC
 7  FY10  2  HC
 8  FY10  2 TER
 9  FY11  2  HC
 10 FY12  2  HC
 11 FY13  2  HC
 12 FY13  2 TER

 I know for this specific example I can use:
 HTDF$EXCL3-1*duplicated(HTDF[,1:2],fromLast=T)

 However my actual data set is NOT sorted by FY, ID and TT. TT is a binary 
 factor variable. I want to know if there is another way of doing the same 
 thing without sorting the data.
 I tried the last line of code below but it gave me unexpected results. It 
 marks the first three rows with 0 and everything else with 1.  Based on the 
 warning messages looks like it has something to do with longer object length 
 is not a multiple of shorter object length. But I am now stumped.

 #REPRODUCIBLE EXAMPLE
 FY-factor(c(FY09,FY10,FY11,FY12,FY12,FY09,FY10,FY10,
 FY11,FY12,FY13,FY13))
 ID-c(rep(1,5),rep(2,7))
 TT-factor(c(rep(HC,4),TER,HC,HC,TER,HC,HC,HC,TER))
 HTDF-data.frame(FY,ID,TT)

 #Summarize data and get max TT. TT is a binary factor variable
 library(sqldf)
 HTDF.MAX-sqldf('SELECT ID,FY,Max(TT) MAXTT FROM HTDF GROUP BY 
 ID,FY')

 # Initiate new variable and assign 0 or 1
 HTDF$EXCL-0

 # THIS IS WHERE I AM GETTING UNEXPECTE RESULTS
 HTDF$EXCL-ifelse(HTDF$FY==HTDF.MAX$FYHTDF$ID==HTDF.MAX$IDHTDF$TT==H
 TDF.MAX$MAXTT,0,1)

For each FY, ID group ave applies f to TT == 'TER' returning a logical vector 
that is TRUE for each HC if TER is in the group
ad otherwise FALSE.   Finally we add 0 to convert from
TRUE/FALSE to 1/0.

The rows of HTDF need not be in any specific order and their oreder will be 
preserved.

 f - function(x) any(x)  !x
 transform(HTDF, EXCL = ave(TT == 'TER', FY, ID, FUN = f) + 0)
 FY ID  TT EXCL
1  FY09  1  HC0
2  FY10  1  HC0
3  FY11  1  HC0
4  FY12  1  HC1
5  FY12  1 TER0
6  FY09  2  HC0
7  FY10  2  HC1
8  FY10  2 TER0
9  FY11  2  HC0
10 FY12  2  HC0
11 FY13  2  HC1
12 FY13  2 TER0


--
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] Update a variable in a dataframe based on variables in another dataframe of a different size

2013-11-11 Thread Lopez, Dan
Below is how I am currently doing this. Is there a more efficient way to do 
this?
The scenario is that I have two dataframes of different sizes. I need to update 
one binary factor variable in one of those dataframes by matching on two 
variables. If there is no match, keep the value as is; otherwise update it. Also, 
the variable being updated, TT in this case, should remain a binary factor 
variable (levels='HC','TER').

HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T)
HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER"))
HTDF2<-HTDF2[,-(3:4)]


# REPRODUCIBLE EXAMPLE DATA FOR ABOVE..
> dput(H_DF)
structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"),
ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")),
.Names = c("FY", "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L,
4L, 6L, 7L, 9L, 10L, 11L))
> dput(T_DF)
structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09",
"FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1,
2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = "factor")),
.Names = c("FY", "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame")

Dan Lopez
LLNL, HRIM - Workforce Analytics & Metrics



Re: [R] Update a variable in a dataframe based on variables in another dataframe of a different size

2013-11-11 Thread Gabor Grothendieck

Here is an sqldf solution:

> library(sqldf)
> sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join T_DF t
   using(FY, ID)")
    FY ID  TT
1 FY09  1  HC
2 FY10  1  HC
3 FY11  1  HC
4 FY12  1 TER
5 FY09  2  HC
6 FY10  2 TER
7 FY11  2  HC
8 FY12  2  HC
9 FY13  2 TER
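
For comparison, a minimal base-R sketch of the same update, assuming a pasted composite key is acceptable. The data frames are rebuilt by hand from the dput() output in the question (default row names, same values), and key_h, key_t, idx and hit are names made up for this illustration. It is one possible alternative, not the method used in the thread.

```r
H_DF <- data.frame(
  FY = factor(c("FY09","FY10","FY11","FY12","FY09","FY10","FY11","FY12","FY13")),
  ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  TT = factor(rep("HC", 9), levels = c("HC", "TER"))
)
T_DF <- data.frame(
  FY = factor(c("FY12", "FY10", "FY13"), levels = levels(H_DF$FY)),
  ID = c(1, 2, 2),
  TT = factor(rep("TER", 3), levels = c("HC", "TER"))
)

# Composite key per row, then look up each H_DF row in T_DF
key_h <- paste(H_DF$FY, H_DF$ID)
key_t <- paste(T_DF$FY, T_DF$ID)
idx <- match(key_h, key_t)        # NA where T_DF has no matching (FY, ID)

HTDF2 <- H_DF
hit <- !is.na(idx)
HTDF2$TT[hit] <- T_DF$TT[idx[hit]]  # factor assignment keeps levels HC/TER
print(HTDF2)
```

Unlike merge(), this keeps the row order of H_DF and never creates the intermediate TT.x/TT.y columns, so TT stays a two-level factor throughout.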


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] Saving then Loading Objects/Models into existing workspace.

2013-11-11 Thread Lopez, Dan
Hi R Experts,

I need some advice on how to manage the number of models/objects I have in one 
workspace.

Below is typically how I get started each time I begin or resume an analysis.

But now I am storing multiple models which are built off of dataframes with 
dims of 30,000 x 60.
I anticipate running into RAM issues. I am running 64-bit R, version 
2.15.1, on a Windows 7 PC with 8GB of RAM and an Intel Core2 Duo CPU 
E8400 @ 3.00GHz.

Let's say I have models M1 through Mk. How do I save these separately and 
then load them as needed? I am picturing storing them in one file and then calling 
one or more from that one file as needed. I hope that makes sense.

# GETTING STARTED---
#Clear current objects and workspace
rm(list=ls())

#Set Working directory
setwd()


#LOAD RDATA and History
load("FY14_RF_Model_Dan.RData")
loadhistory("FY14_RF_Model_Dan.Rhistory")
ls()

#Once I'm done I SAVE
save.image("FY14_RF_Model_Dan.RData")
savehistory("FY14_RF_Model_Dan.Rhistory")
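
One way to store models separately and reload only the ones needed is one file per object with saveRDS() and readRDS(). The sketch below is a hedged illustration, not a tested recipe for the workflow above: the two lm() fits stand in for M1..Mk, and model_dir is a hypothetical location.

```r
# Stand-ins for the fitted models M1..Mk (hypothetical; any R object works)
M1 <- lm(mpg ~ wt, data = mtcars)
M2 <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical folder for the saved models
model_dir <- file.path(tempdir(), "models")
dir.create(model_dir, showWarnings = FALSE)

# One file per model
saveRDS(M1, file.path(model_dir, "M1.rds"))
saveRDS(M2, file.path(model_dir, "M2.rds"))

# Free the memory, then bring back only the model that is needed
rm(M1, M2)
M2 <- readRDS(file.path(model_dir, "M2.rds"))
print(class(M2))
```

By contrast, load() on a single .RData file restores every object stored in it, so one file per model keeps the memory footprint under control.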

Dan Lopez
LLNL, HRIM - Workforce Analytics & Metrics




[R] Elastic-R Webinar Invite

2013-11-11 Thread Ray DiGiacomo, Jr.
Hello R-Help Mailing List:

Are you interested in running collaborative R analytics in the cloud?

Join the Knoxville R User Group and the Orange County R User Group for
a free webinar on the Elastic-R software platform.

Webinar Format:
- Introduction to Elastic-R
- Live demonstration of the Elastic-R platform on Amazon EC2
- Question and Answer period

Registration and Information:
https://www3.gotomeeting.com/register/318141670



[R] unable to install package xts

2013-11-11 Thread Wang Chongyang
I am using Ubuntu 12.04 and am unable to install xts. Here is the info:

/usr/bin/ld: cannot find -lgfortran
collect2: error: ld returned 1 exit status
make: *** [xts.so] Error 1
ERROR: compilation failed for package ‘xts’
* removing ‘/home/jasom/R/x86_64-pc-linux-gnu-library/3.0/xts’
Warning in install.packages :
  installation of package ‘xts’ had non-zero exit status

The downloaded source packages are in
‘/tmp/RtmpVH1i1S/downloaded_packages’
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.2

Thanks in advance.

CY



[R] Getting residual term out of lmer summary table

2013-11-11 Thread aline . frank
Hello

I'm working with mixed effects models using lmer() and am having trouble 
extracting all variance components of the model's random effects. I can get the 
variance of the random effect out of the summary and use it for further 
calculations, but not the variance component of the residual term. Could 
somebody help me with that problem? Thanks a lot! An example is below.

Aline



## EXAMPLE
#--

require(lme4)

## Simulate data for the example
set.seed(6)
x1 <- runif(n=100, min=10, max=100) ## a continuous variable
x2 <- runif(n=100, min=10, max=100) ## a continuous variable

treat <- rep(letters[1:4], times=25) ## a fixed factor with 4 levels
treat.effect <- 20*rep(1:4, times=25)

group.label <- rep(LETTERS[1:5], each=20)  ## the random effect
group.effect <- 10*rep(1:5, each=20)   ## there are 5 groups

## Response variable:
y <- 2*x1 + (-5)*x2 + treat.effect + group.effect + rnorm(100)

## Dataframe
d.ex <- data.frame(y, x1, x2, Group=group.label, treat)

## Apply model
mod1 <- lmer(y~x1+x2+treat+x1:treat+ (1|Group), data=d.ex)
output <- summary(mod1); output # ok, there is the variance component of the
# random effect Group and the residual term

## Now I'd like to get the variance components of the random effect Group and
## of the residual term Residual in order to do further calculations with
## these numbers
output$varcor[1] ## reveals the variance of the random effect Group
output$varcor[2] ## does not reveal the residual term! What other command do I
## need to use then?
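
A possible way to get both variance components as plain numbers, sketched under the assumption of a reasonably recent lme4 in which VarCorr() has an as.data.frame() method and sigma() is available. The sleepstudy data set that ships with lme4 stands in for d.ex, and the names m, vc, group_var and resid_var are made up for this illustration.

```r
library(lme4)

# Stand-in model: random intercept per Subject (sleepstudy ships with lme4)
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# One row per variance component, including a row labelled "Residual"
vc <- as.data.frame(VarCorr(m))
print(vc[, c("grp", "vcov", "sdcor")])

group_var <- vc$vcov[vc$grp == "Subject"]    # random-effect variance
resid_var <- vc$vcov[vc$grp == "Residual"]   # residual variance

# The residual standard deviation is also available directly
print(all.equal(resid_var, sigma(m)^2))
```

For the model in the question, the same calls would be applied to mod1, with "Group" in place of "Subject".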



Re: [R] Apply function to every 20 rows between pairs of columns in a matrix

2013-11-11 Thread arun
Hi,

It's not very clear.
set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
lst1 <- split(dat1,as.character(gl(nrow(dat1),20,nrow(dat1))))
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56], 
function(z) sum(y==z)/20)))

> length(res)
#[1] 2325  ### check here
> dim(res[[1]])
#[1] 48  8

A.K.



Hi all, I have a set of genetic SNP data that looks like 

Founder1 Founder2 Founder3 Founder4 Founder5 Founder6 Founder7 Founder8 Sample1 Sample2 Sample3 Sample... 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 

The size of the matrix is 56 columns by 46482 rows. I need to 
first bin the matrix by every 20 rows, then compare each of the first 8 
columns (founders) to each of columns 9-56, and divide the total number of 
matching letters/alleles by the total number of rows (20). Ultimately I 
need 48 matrices of 8 columns by 2342 rows, which are essentially similarity 
matrices. I have tried to extract each pair separately by something like 

length(cbind(odd[,9],odd[,1])[cbind(odd[,9],cbind(odd[,9],odd[,1])[,1])[,1]==T
  & cbind(odd[,9],odd[,1])[,2]==T,])/nrow(cbind(odd[,9],odd[,1])) 

but this is nowhere near efficient, and I do not know of a 
faster way of applying the function to every 20 rows and across multiple 
pairs. 

In the example given above, if the rows were all identical as 
shown across 20 rows, then the first row of the matrix for Sample1 would 
be 

1 1 1 0 0 0 0 0



Re: [R] Apply function to every 20 rows between pairs of columns in a matrix

2013-11-11 Thread arun


Hi,
Maybe this is what you wanted.
res2 <- lapply(row.names(res[[1]]),function(x) 
do.call(rbind,lapply(res,function(y) y[match(x, row.names(y)),])))
> length(res2)
#[1] 48
> dim(res2[[1]])
#[1] 2325    8
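
The two steps (per-block match fractions, then one matrix per sample) can be checked by hand on a toy data set. The sketch below is an illustration under assumptions, not the code used in the thread: it uses 2 founders, 2 samples and blocks of 3 rows instead of 8, 48 and 20, and the names dat, block and match_frac are hypothetical. It also differs in two deliberate ways: blocks are numbered with integer division so they stay in row order, and mean() divides by the actual block size rather than a fixed 20.

```r
# Toy data: 6 rows, 2 founders, 2 samples
dat <- data.frame(
  Founder1 = c("A", "A", "A", "T", "T", "T"),
  Founder2 = c("T", "T", "T", "T", "T", "T"),
  Sample1  = c("A", "A", "T", "T", "T", "T"),
  Sample2  = c("T", "T", "T", "A", "A", "A"),
  stringsAsFactors = FALSE
)
block_size <- 3
founders <- 1:2
samples <- 3:4

# Block number for each row: 1,1,1,2,2,2
block <- (seq_len(nrow(dat)) - 1) %/% block_size + 1

# For one block: rows = samples, columns = founders,
# entries = fraction of rows where the two columns agree
match_frac <- function(x) {
  sapply(x[founders], function(f) sapply(x[samples], function(s) mean(f == s)))
}
res <- lapply(split(dat, block), match_frac)
print(res[["1"]])

# One matrix per sample: rows = blocks, columns = founders
per_sample <- lapply(rownames(res[[1]]), function(s) {
  do.call(rbind, lapply(res, function(m) m[s, ]))
})
names(per_sample) <- rownames(res[[1]])
print(per_sample[["Sample1"]])
```

Splitting on a numeric block index keeps the list in block order 1, 2, 3, ..., whereas splitting on a character index sorts the blocks as text ("1", "10", "100", ...), which affects the row order of the bound matrices.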

A.K.


On Monday, November 11, 2013 10:20 PM, Yu-yu Ren renyan...@gmail.com wrote:

Thank you so much for that script, it works great. One additional request; how 
can I go about binding each of the 2325 matrices for each sample, resulting in 
48 matrices of 8 column by 2325 row?




On Mon, Nov 11, 2013 at 10:02 PM, arun smartpink...@yahoo.com wrote:



Hi,
I already sent a reply to R-help.  I am not sure about the 2342.

set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
lst1 <- split(dat1,as.character(gl(nrow(dat1),20,nrow(dat1))))
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56], 
function(z) sum(y==z)/20)))

> length(res)
#[1] 2325  ### check here
> dim(res[[1]])
#[1] 48  8

A.K.




On Monday, November 11, 2013 10:00 PM, Yu-yu Ren renyan...@gmail.com wrote:

Thank you, I have uploaded several example files, with intermediate outputs of 
what I have done and the logic flow.




On Mon, Nov 11, 2013 at 9:37 PM, smartpink...@yahoo.com wrote:


Hi,

Comparing the first 8 columns separately with 9-56 columns is not clear.  
Also, please provide a reproducible example (using ?dput) for others to work 
on.

A.K.
Quoted from:
http://r.789695.n4.nabble.com/Apply-function-to-every-20-rows-between-pairs-of-columns-in-a-matrix-tp4680272.html




Re: [R] unable to install package xts

2013-11-11 Thread Dirk Eddelbuettel
Wang Chongyang wchongyang at gmail.com writes:
 I am using Ubuntu 12.04 and am unable to install xts. Here is the info:
 
 /usr/bin/ld: cannot find -lgfortran

Do 'sudo apt-get install r-base-dev' to install the set of requirements 
for building packages, which includes among other things the Fortran
library you are missing here.

Dirk



[R] Test for exogeneity

2013-11-11 Thread jpm miao
Hi,



   I am building a bivariate SVAR model

y_{1t} = c_1 + \Gamma_1^{(1,1)} y_{1,t-1} + \Gamma_1^{(1,2)} y_{2,t-1}
       + \Gamma_2^{(1,1)} y_{1,t-2} + \Gamma_2^{(1,2)} y_{2,t-2} + \varepsilon_{1t}

b y_{1t} + y_{2t} = c_2 + \Gamma_1^{(2,1)} y_{1,t-1} + \Gamma_1^{(2,2)} y_{2,t-1}
                  + \Gamma_2^{(2,1)} y_{1,t-2} + \Gamma_2^{(2,2)} y_{2,t-2} + \varepsilon_{2t}



  Now y1 is relatively exogenous in that y1 impacts y2 contemporaneously
but not the other way around. Given a bivariate dataset, is there any
statistical test (in any R package or elsewhere) that helps to justify/test
the exogeneity of y1 in the present context? Is there any reference
available?



Thanks,



Miao



[R] sourcing from 2 different computers R code

2013-11-11 Thread Luca Meyer
Hi,

I have a piece of code sitting in a Dropbox directory and have installed R
3.0.2 on 2 machines: one MacBook Pro and one Sony Vaio PC.

Now, when I use

source("/Users/R")

to call the script from the Mac there are no problems, but when I use

source("C:\Users\...R")

to call the script from the Sony Vaio I get the following:

Error: '\U' used without hex digits in character string starting 'C:\U

What am I doing wrong?

Thanks in advance,

Luca



Re: [R] sourcing from 2 different computers R code

2013-11-11 Thread Pascal Oettli
Hello,

What is the result when you use source("C:/Users/...R")?
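
The reason behind that suggestion, as a small sketch: inside an R string literal a backslash starts an escape sequence, and \U in particular starts a Unicode escape, hence the error. Forward slashes, or doubled backslashes, avoid it. The path used here is hypothetical and only for illustration.

```r
# Hypothetical path, for illustration only
p_fwd <- "C:/Users/me/Dropbox/script.R"       # forward slashes also work on Windows
p_esc <- "C:\\Users\\me\\Dropbox\\script.R"   # each backslash doubled in the literal

# Both literals describe the same location once the separators are normalised
same <- identical(gsub("\\", "/", p_esc, fixed = TRUE), p_fwd)
print(same)

# A doubled backslash in source code is a single character in the string
print(nchar("\\"))

# Parsing the single-backslash form reproduces the error from the question
res <- try(parse(text = '"C:\\Users"'), silent = TRUE)
print(inherits(res, "try-error"))
```

With either spelling, source() receives a valid path string; the forward-slash form has the advantage of reading the same on the Mac and on the Windows machine.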

Regards,
Pascal



-- 
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan



Re: [R] unable to install package xts

2013-11-11 Thread Pascal Oettli
Hello,

You probably should install a Fortran compiler.

Regards,
Pascal





-- 
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan



Re: [R] sourcing from 2 different computers R code

2013-11-11 Thread Prof Brian Ripley

This is not one but two FAQs:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-file-names-work-in-Windows_003f

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

See the posting guide and the footer of this message.

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.