Re: [R] retaining characters in a csv file

2015-09-22 Thread Arunkumar Srinivasan
data.table's fread reads this as expected. Quoted strings aren't coerced.

sapply(fread('5724550,"000202075214",2005.02.17,2005.02.17,"F"\n'), class)
#          V1          V2          V3          V4          V5
#   "integer" "character" "character" "character" "character"
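
If replacing read.csv is not practical, one possible workaround (a sketch only, not something proposed in this thread) is to pass quote="" so that the quote characters stay part of the field. Type conversion then leaves those fields as character, and the quotes can be stripped afterwards. The header names below are invented for illustration, and the trick assumes that no field contains an embedded comma.

```r
# Hypothetical two-line file; the header names are invented for this sketch.
txt <- 'id,code,start,end,sex\n5724550,"000202075214",2005.02.17,2005.02.17,"F"\n'

# quote = "" disables quote processing, so "000202075214" arrives with its
# literal quote characters and cannot be mistaken for a number.
raw <- read.csv(text = txt, quote = "", stringsAsFactors = FALSE)

# Strip the surrounding quotes from every character column.
is_chr <- vapply(raw, is.character, logical(1))
raw[is_chr] <- lapply(raw[is_chr], function(x) gsub('^"|"$', "", x))

sapply(raw, class)
raw$code
```

Unquoted numeric columns such as the first one are still converted as usual; only the fields that carried quotes (or are otherwise non-numeric) remain character.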

Best,
Arun.

On Wed, Sep 23, 2015 at 12:00 AM, Therneau, Terry M., Ph.D. wrote:
> I have a csv file from an automatic process (so this will happen thousands
> of times), for which the first row is a vector of variable names and the
> second row often starts something like this:
>
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
>
> Notice the second variable which is
>   a character string (note the quotation marks)
>   a sequence of numeric digits
>   leading zeros are significant
>
> The read.csv function insists on turning this into a numeric.  Is there any
> simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the
> bloody quotes" -- I still want the first, third, etc columns to become
> numeric.  There can be more than one variable like this, and not always in
> the second position.
>
> This happens deep inside the httr library; there is an easy way for me to
> add more options to the read.csv call but it is not so easy to replace it
> with something else.
>
> Terry T
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] more complex by with data.table???

2015-06-21 Thread Arunkumar Srinivasan
Ramiro,

`dt[, lapply(.SD, mean), by=name]` is the idiomatic way.
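
For the columnNames case in the quoted question, a minimal sketch (assuming the data.table package is installed) could restrict .SD to the chosen columns with .SDcols. The data and myFunction are the ones from the question.

```r
library(data.table)

dt <- data.table(name = c(rep("a", 5), rep("b", 6)),
                 var1 = 0:10, var2 = 20:30, var3 = 40:50)
myFunction <- function(x) mean(x)
columnNames <- c("var1", "var2", "var3")

# .SDcols limits .SD to the listed columns; by = name applies it per group.
res <- dt[, lapply(.SD, myFunction), by = name, .SDcols = columnNames]
res
```

The result has one row per name and one column per entry of columnNames, which matches the shape of the do.call(rbind, result) output in the question.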

I suggest reading through the new HTML vignettes at
https://github.com/Rdatatable/data.table/wiki/Getting-started

Ista, thanks for linking to the new vignette.


On Wed, Jun 10, 2015 at 2:17 AM, Ista Zahn istaz...@gmail.com wrote:
 Hi Ramiro,

 There is a demonstration of this on the data.table wiki at
 https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html.
 You can do

 dt[, lapply(.SD, mean), by=name]

 or

 dt[, as.list(colMeans(.SD)), by=name]

 BTW, there are pretty straightforward ways to do this in base R as well, e.g,

 data.frame(t(sapply(split(df[-1], df$name), colMeans)))

 Best,
 Ista

 On Tue, Jun 9, 2015 at 4:22 PM, Ramiro Barrantes
 ram...@precisionbioassay.com wrote:
 Hello,

 I am trying to do something that I am able to do with the by function 
 within data.frame but can't figure out how to achieve with data.table.

 Consider

 dt <- data.table(name=c(rep("a",5),rep("b",6)),var1=0:10,var2=20:30,var3=40:50)
 myFunction <- function(x) { mean(x) }

 I am aware that I can do something like:

 dt[, .(meanVar1=myFunction(var1)) ,by=.(name)]

 but how could I do the equivalent of:

 df <- data.frame(name=c(rep("a",5),rep("b",6)),var1=0:10,var2=20:30,var3=40:50)
 myFunction <- function(x) { mean(x) }

 columnNames <- c("var1","var2","var3")
 result <- by(df, df$name, function(x) {
   output <- c()
   for(col in columnNames) {
     output[col] <- myFunction(x[,col])
   }
   output
 })
 do.call(rbind,result)

 Thanks in advance,
 Ramiro



Re: [R] data.frame: data-driven column selections that vary by row??

2015-04-01 Thread Arunkumar Srinivasan
David,

In data.table v1.9.5 (current development version, which you can get
from here: https://github.com/Rdatatable/data.table/wiki/Installation),
new features were added to both `melt` and `cast` for data.tables.
They both can handle multiple columns simultaneously. I think this
would be of interest to you.

Using 1.9.5, here's how I'd do it.

require(data.table) ## v1.9.5+
cols <- grep("^da2.*$", names(bw), value=TRUE)    ## (1)
splt <- split(cols, seq_len(length(cols)/2L))     ## (2)
vars <- unique(gsub("(.*?)_(.*$)", "\\1", cols))  ## (3)
vals <- unique(gsub("(.*?)_(.*$)", "\\2", cols))  ## (4)

ans1 = melt(setDT(bw), measure=splt, variable.name="disc",
            value.name=vals) ## (5)
setattr(ans1$disc, 'levels', vars) ## (6)

Explanation:
------------

1. Get all cols you've to melt
2. Split them into column pairs that should be combined together
3. Get levels for 'variable' column
4. Get column names for molten result
5. Melt by providing list of columns with each element containing the
columns you'd want to combine together in the molten result directly.
6. Set levels for variable column appropriately.
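
The six steps can be tried on a toy table. This is only a sketch: the stand-in 'bw' and its column names are invented here, since the real data is not shown in this thread. It also passes an unnamed list to measure.vars and relabels the factor with factor() instead of setattr, which are choices made for this sketch. It assumes a data.table version that supports melting into multiple value columns.

```r
library(data.table)

# Toy stand-in for 'bw' (column names are assumptions for this sketch).
bw <- data.table(timestamp  = 1:2,
                 da20_type  = c("p", "q"), da20_value = c(1.5, 2.5),
                 da2_type   = c("r", "s"), da2_value  = c(3.5, 4.5))

cols <- grep("^da2", names(bw), value = TRUE)             # (1)
splt <- unname(split(cols, seq_len(length(cols) / 2L)))   # (2)
vars <- unique(sub("_.*$", "", cols))                     # (3)
vals <- unique(sub("^[^_]*_", "", cols))                  # (4)

long <- melt(bw, id.vars = "timestamp", measure.vars = splt,
             variable.name = "disc", value.name = vals)   # (5)
long[, disc := factor(disc, labels = vars)]               # (6)
long
```

In the result, 'type' stays character and 'value' stays numeric, because each output column only combines input columns of the same kind.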

Advantages:
-----------

1. melting by combining corresponding columns together, directly, is
straightforward and easy to understand, since that's the task you want
to perform. Having to combine all columns together and then split them
back seems roundabout.

2. casting (tidyr::spread internally uses reshape2::dcast) is a
relatively complicated operation, and in this case it can be
completely avoided which will save both time and memory (see benchmark
at the bottom of post). It also reorders the result which may not be
desirable.

3. In 'bw', columns `da20_dev_type` and `da2_dev_type` are type
'factor' while others are type 'numeric'. reshape2::melt (or)
tidyr::gather, since it combines all columns will have to coerce these
different types to a common type, here 'character'. So, you'll have to
convert the columns back to the right type after casting. I think
you'll agree that's unnecessary. `melt.data.table` preserves the type
as it combines only relevant columns together.

4. Since the operation is performed in a straightforward manner (and
in C for speed), it's incredibly fast *and* memory efficient.

Benchmark (on ~180,000 rows)
----------------------------

library(tidyr)
library(dplyr)
require(data.table) ## v1.9.5+

# replacing timestamp so that rows are unique (for spread to work correctly)
bw.large = rbindlist(replicate(1e4, bw, simplify=FALSE))[, timestamp := .I][]
object.size(bw.large)/1024^2 # ~38MB

The data is 38MB, which is not at all large... but enough to illustrate.

# data.table
system.time({
cols <- grep("^da2.*$", names(bw), value=TRUE)    ## (1)
splt <- split(cols, seq_len(length(cols)/2L))     ## (2)
vars <- unique(gsub("(.*?)_(.*$)", "\\1", cols))  ## (3)
vals <- unique(gsub("(.*?)_(.*$)", "\\2", cols))  ## (4)

ans1 = melt(setDT(bw.large), measure=splt, variable.name="disc",
            value.name=vals) ## (5)
setattr(ans1$disc, 'levels', vars) ## (6)
})
#    user  system elapsed
#   0.260   0.013   0.275

Memory used: 56MB

# tidyr
system.time({
ans2 <- gather(setDF(bw.large), key = tmp, value = value,
               matches("^d[a-z]+[0-9]+"))
ans2 <- separate(ans2, tmp, c("disc", "var"), "_", extra = "merge")
ans2 <- spread(ans2, var, value)
})
#    user  system elapsed
#  15.818   1.128  17.063

Memory used : 750MB

And that's ~62x speedup.

HTH
Arun Srinivasan
Co-developer, data.table.


On Tue, Mar 31, 2015 at 8:35 PM, Tom Wright t...@maladmin.com wrote:
 Nice clean-up!!!

 On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote:
 library(tidyr)
 library(dplyr)
 bw <- gather(bw, key = tmp, value = value,
              matches("^d[a-z]+[0-9]+"))
 bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
 bw <- spread(bw, var, value)



Re: [R] DESeq vs DESeq2 different DEGs results

2014-05-09 Thread Arunkumar Srinivasan
You're on the wrong list. This is more appropriate on the bioconductor
mailing list.


On Mon, May 5, 2014 at 9:42 AM, Catalina Aguilar Hurtado
cata...@gmail.com wrote:

 Hi,

 I want to compare DESeq vs DESeq2 and I am getting a different number of DEGs,
 which I would expect to be normal. However, when I compare the 149 gene IDs
 that I get with DESeq with the 869 from DESeq2, there are only ~10 genes
 in common, which I don’t understand (using FDR < 0.05 for both). I
 want to block the Subject effect, for which I am including the reduced
 formula of ~1.

 Shouldn’t these two methods output similar results?  Because at the moment
 I could interpret my results in different ways.

 Thanks for your help,

 Catalina


 This is the DESeq script that I am using:


 DESeq

 library(DESeq)

 co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",",
 row.names=1))


 Subject=c(1,2,3,4,5,1,2,4,5)

 Treatment=c(rep("co",5),rep("c2",4))
 a.con=cbind(Subject,Treatment)

 cds=newCountDataSet(co,a.con)


 cds <- estimateSizeFactors( cds)

 cds <- estimateDispersions(cds,method="pooled-CR",
 modelFormula=count~Subject+Treatment)


 #filtering

 rs = rowSums ( counts ( cds ))
 theta = 0.2
 use = (rs > quantile(rs, probs=theta))
 table(use)
 cdsFilt= cds[ use, ]



 fit0 <- fitNbinomGLMs (cdsFilt, count~1)

 fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)

 pvals <- nbinomGLMTest (fit1, fit0)


 padj <- p.adjust( pvals, method="BH" )

 padj <- data.frame(padj)

 row.names(padj)=row.names(cdsFilt)

 padj_fil <- subset (padj,padj < 0.05 )

 dim (padj_fil)

 [1] 149   1


 ——————

 library (DESeq2)

 countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",",
 row.names=1))

 coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)

 coldata

    Subject Treatment
 F1       1        co
 F2       2        co
 F3       3        co
 F4       4        co
 F5       5        co
 H1       1        c2
 H2       2        c2
 H4       4        c2
 H5       5        c2

 dds <- DESeqDataSetFromMatrix(
   countData = countdata,
   colData = coldata,
   design = ~ Subject + Treatment)
 dds

 dds$Treatment <- relevel (dds$Treatment, "co")


 dds <- estimateSizeFactors( dds)

 dds <- estimateDispersions(dds)


 rs = rowSums ( counts ( dds ))
 theta = 0.2
 use = (rs > quantile(rs, probs=theta))
 table(use)
 ddsFilt= dds[ use, ]


 dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)

 resLRT <- results(dds)

 sum( resLRT$padj < 0.05, na.rm=TRUE )

 #[1] 869



Re: [R] Reshape large Data Frame to new format

2014-04-30 Thread Arunkumar Srinivasan
Hi Dark,

Sorry for the late response. Since you asked for a `data.table` solution as
well, here's one:

require(data.table)
dt <- as.data.table(rawData)
dt[, GRP := (0:(.N-1L))%/%25L, by=PersonID]
dt[, `:=`(var="codes", N = 1:.N), by=list(PersonID, GRP)]
dcast.data.table(dt, PersonID+GRP ~ var+N, value.var="codes")
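
The grouping arithmetic can be checked in isolation with base R. This is a small sketch with invented data: a chunk size of 3 stands in for the 25 codes per row, and the 'person' vector is hypothetical.

```r
chunk <- 3L  # stands in for the 25 codes per output row
person <- c("P1", "P1", "P1", "P1", "P1", "P1", "P1", "P2", "P2")

# Zero-based position of each code within its person.
pos <- ave(seq_along(person), person, FUN = seq_along) - 1L

grp  <- pos %/% chunk       # which output row the code lands on
slot <- pos %% chunk + 1L   # which column within that row
data.frame(person, grp, slot)
```

Integer division by the chunk size gives the row index (GRP above), and the position within each (person, row) pair gives the column index (N above).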


Arun
Co-developer of data.table package.



On Mon, Mar 24, 2014 at 9:44 PM, David Carlson dcarl...@tamu.edu wrote:

 78023, 43785, 69884, 12840, 54021 are listed as PersonID 3 in
 rawData, but PersonID 4 in resultData.
 Here is another way to get there:

 # Split codes by PersonID creating a single vector for each
 step1 <- split(rawData$codes, rawData$PersonID)
 # Figure out how many lines we need - here 3 lines
 maxlines <- ceiling(max(sapply(step1, length))/25)
 # Figure out how many entries we need - here 75 entries
 max <- maxlines*25
 # Fill in blank entries to pad each line to 75
 step2 <- lapply(step1, function(x) c(x, rep("", max-length(x))))
 # Wrap each single line into three lines
 step3 <- lapply(step2, function(x) matrix(x, maxlines, 25,
 byrow=TRUE))
 # Create PersonID vector
 PersonID <- rep(names(step1), each=maxlines)
 # Create data frame
 step4 <- data.frame(PersonID, do.call(rbind, step3),
 stringsAsFactors=FALSE)
 # Label columns
 colnames(step4) <- gsub("X", "Code", colnames(step4))
 # Delete empty rows
 step4 <- step4[apply(step4[, -1], 1, function(x) sum(x!="")>0),]

 -
 David L Carlson
 Department of Anthropology
 Texas A&M University
 College Station, TX 77840-4352


 -----Original Message-----
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of arun
 Sent: Monday, March 24, 2014 9:57 AM
 To: r-help@r-project.org
 Cc: Dark
 Subject: Re: [R] Reshape large Data Frame to new format

 Hi,
 In your 'resultData', some observations seem to be omitted.
 with(rawData,tapply(codes, PersonID,FUN=function(x) x))$Person3
 # [1] 56177 61704 70879 69033 87224 68670 65602 25476 81209 62086 35492 39771
 #[13] 14380 43858 53679 78023 43785 69884 12840 54021

 resultData[4,]
 #  PersonId Code1 Code2 Code3 Code4 Code5 Code6 Code7 Code8 Code9 Code10 Code11
 #4  Person3 56177 61704 70879 69033 87224 68670 65602 25476 81209  62086  35492
 #  Code12 Code13 Code14 Code15 Code16 Code17 Code18 Code19 Code20 Code21 Code22
 #4  39771  14380  43858  53679
 #  Code23 Code24 Code25

 One way would be:
 rawData$Seq <- with(rawData,ave(codes,PersonID,FUN=function(x)
 rep(1:25,length.out=length(x))))
 rawData$Seq1 <- with(rawData,ave(codes,PersonID,FUN=function(x)
 rep(seq(length(x) %/%25 +1),each=25,length.out=length(x))))
 res <- reshape(rawData,v.names="codes",idvar=c("PersonID","Seq1"),
 timevar="Seq",direction="wide",sep="")[,-2]
 res[is.na(res)] <- ""
 colnames(res) <- colnames(resultData)
 rownames(res) <- rownames(resultData)
 A.K.
 A.K.





 On Monday, March 24, 2014 10:15 AM, Dark
 i...@software-solutions.nl wrote:
 Hi R-experts,

 I have a data.frame that I want to reshape to a certain format so I can use
 it in a tool for further analysis.
 Basically I have a very long list with IDs of persons and their codes.

 I create a row for every person with 25 of their codes. If a person has more
 than 25 codes, I want to add another row for that person. If a row contains
 fewer than 25 codes, I want to fill it with empty string values.

 I have manually created a sample rawData and resultData and used dput so you
 can see my starting DF and the wanted result DF.

 The sample is of very limited size; the real data would contain a few
 million(!) records.

 rawData <- structure(list(PersonID = structure(c(1L, 1L, 1L, 1L,
 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Person1", "Person2",
 "Person3",
 "Person4", "Person5"), class = "factor"), codes = c(34396L,
 81878L,
 67829L, 13428L, 12992L, 63724L, 85930L, 78497L, 59578L, 50733L,
 26154L, 47205L, 74578L, 12204L, 42435L, 96643L, 35242L, 29836L,
 73031L, 11326L, 96686L, 55849L, 56415L, 11064L, 78509L, 55715L,
 75851L, 60682L, 16277L, 52763L, 23429L, 39723L, 95809L, 60081L,
 19618L, 46012L, 79188L, 54664L, 64420L, 72875L, 97428L, 74897L,
 75615L, 12023L, 21572L, 56177L, 61704L, 70879L, 69033L, 87224L,
 68670L, 65602L, 25476L, 81209L, 62086L, 35492L, 39771L, 14380L,
 43858L, 53679L, 78023L, 43785L, 69884L, 12840L, 54021L, 68002L,
 79249L, 61784L, 7L, 28935L, 91406L, 42045L, 97716L, 65690L,
 57310L, 57627L, 32227L, 43121L, 22251L, 31255L, 90660L, 89118L,
 14558L, 99824L, 25005L, 62186L, 10527L, 99438L, 85656L, 79465L,
 

Re: [R] creating table with sequences of numbers based on the table

2014-03-13 Thread Arunkumar Srinivasan
I think this'll be way simpler and also faster:

ans <- data.frame(pop = rep.int(tab$pop, tab$Freq), ind=sequence(tab$Freq))
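
On a smaller table the effect is easy to follow by eye. The three-row 'tab' below is made up for illustration; the real one is in the quoted messages.

```r
# Small invented frequency table: population id and number of individuals.
tab <- data.frame(pop = 1:3, Freq = c(2L, 1L, 3L))

# rep.int repeats each pop Freq times; sequence restarts at 1 for each pop.
ans <- data.frame(pop = rep.int(tab$pop, tab$Freq), ind = sequence(tab$Freq))
ans
```

Here pop comes out as 1 1 2 3 3 3 and ind as 1 2 1 1 2 3.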

Arun

From: Dennis Murphy djmu...@gmail.com
Reply: Dennis Murphy djmu...@gmail.com
Date: March 13, 2014 at 9:57:20 PM
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Subject:  Re: [R] creating table with sequences of numbers based on the table  

Less coding with plyr:  

tab <- read.table(text="pop Freq
1 1 30
2 2 25
3 3 30
4 4 30
5 5 30
6 6 30
7 7 30",sep="",header=TRUE)

# Function to do the work on each row
f <- function(pop, Freq) data.frame(ind = seq_len(Freq))

library(plyr)
u <- mdply(tab, f)[, -2]

Dennis  

On Thu, Mar 13, 2014 at 8:01 AM, arun smartpink...@yahoo.com wrote:  
 Hi,  
 Try:  
 Either  
  
 tab <- read.table(text="pop Freq
 1 1 30
 2 2 25
 3 3 30
 4 4 30
 5 5 30
 6 6 30
 7 7 30",sep="",header=TRUE)

 indx <- rep(1:nrow(tab),tab$Freq)
 tab1 <- transform(tab[indx,],ind=ave(seq_along(indx),indx,FUN=seq_along))[,-2]
 #or
 tab2 <- transform(tab[indx,],ind=unlist(sapply(tab$Freq,seq)))[,-2]
 identical(tab1,tab2)
 #[1] TRUE
 #or
 tab3 <- transform(tab[indx,], ind=
 with(tab,seq_len(sum(Freq))-rep(cumsum(c(0L,Freq[-length(Freq)])),Freq)))[,-2]
 identical(tab1,tab3)
 #[1] TRUE
  
 A.K.  
  
  
 I have a problem with transferring one table to another automatically. From a
 table like this:
  
 tab  
 pop Freq  
 1 1 30  
 2 2 25  
 3 3 30  
 4 4 30  
 5 5 30  
 6 6 30  
 7 7 30  
  
 I want to use number of individuals (freq) and then in next  
 table just list them with following numbers (depending on total number  
 of individuals)  
 Like this:  
 in  
 pop ind  
  
 1 1  
 1 2  
 1 3  
 1 4  
 . .  
 . .  
 1 30  
 2 1  
 2 2  
 2 3  
 2 4  
 . .  
 2 25  
 3 1  
 3 2  
 . .  
 . .  
  
 How can I do it? I think I have to use loops, but so far I have failed.
 Thank you in advance,  
 Best,  
 Malgorzata Gazda  
  


Re: [R] Assign numbers in R

2014-03-12 Thread Arunkumar Srinivasan
Here's another one: match(d, unique(d)).
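
Greg's question in the quoted message (what should happen if another 8 follows the last 1) can be made concrete. The vector d2 and the run-based variant below are additions for illustration, not from the thread.

```r
# The original vector with an extra 8 appended at the end.
d2 <- c(8, 7, 5, 5, 3, 3, 2, 1, 1, 1, 8)

by_value <- match(d2, unique(d2))            # id per distinct value
by_first <- cumsum(!duplicated(d2))          # counts first appearances only
by_run   <- cumsum(c(TRUE, diff(d2) != 0))   # new id whenever the value changes
rbind(by_value, by_first, by_run)
```

The three agree on the first ten elements but give 1, 6 and 7 respectively for the trailing 8, so the right choice depends on whether a repeated value should reuse its old number.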


Arun

From: Greg Snow 538...@gmail.com
Reply: Greg Snow 538...@gmail.com
Date: March 12, 2014 at 8:41:31 PM
To: T Bal studentt...@gmail.com
Cc: r-help r-help@r-project.org
Subject:  Re: [R] Assign numbers in R  

Here are a couple more options if you want some variety:  

> d <- c(8,7,5,5,3,3,2,1,1,1)
> as.numeric( factor(d, levels=unique(d)) )
[1] 1 2 3 3 4 4 5 6 6 6
> cumsum( !duplicated(d) )
[1] 1 2 3 3 4 4 5 6 6 6


What would you want the output to be if your d vector had another 8  
after the last 1? The different solutions will give different output.  


On Wed, Mar 12, 2014 at 3:13 AM, T Bal studentt...@gmail.com wrote:  
 Hi,  
 I have the following numbers:  
  
 d <- c(8,7,5,5,3,3,2,1,1,1)
  
 I want to convert these into the following numbers:  
  
 r:  
 1,2,3,3,4,4,5,6,6,6  
  
 So if two numbers are different increment it if they are same then assign  
 the same number:  
  
 r <- NULL
  
 for (i in 1:length(d)) {  
  
 if (d[i] != d[i+1]) {  
 r[i] =i+1;  
 }  
 else {  
 r[i] = i;  
 }  
 }  
  
 But this is not correct. How can I solve this problem? or how can I solve  
 it in a different way? Thanks a lot!  
  



--  
Gregory (Greg) L. Snow Ph.D.  
538...@gmail.com  



[R] On ^ returning a matrix when operated on a data.frame

2013-11-13 Thread Arunkumar Srinivasan
Dear R-users, 

I am wondering why ^ operator alone returns a matrix, when operated on a 
data.frame (as opposed to all other arithmetic operators). Here's an example:

DF <- data.frame(x=1:5, y=6:10)
class(DF*DF) # [1] "data.frame"
class(DF^2) # [1] "matrix"

I posted here on SO: 
http://stackoverflow.com/questions/19964897/why-does-on-a-data-frame-return-a-matrix-instead-of-a-data-frame-like-do
 and got a very nice answer - it happens because a matrix is returned (obvious 
by looking at `Ops.data.frame`). However, what I'd like to understand is, *why* 
a matrix is returned for ^ alone? Here's an excerpt from Ops.data.frame 
(Thanks to Neal Fultz):

if (.Generic %in% c("+", "-", "*", "/", "%%", "%/%")) {
    names(value) <- cn
    data.frame(value, row.names = rn, check.names = FALSE,
        check.rows = FALSE)
}
else matrix(unlist(value, recursive = FALSE, use.names = FALSE),
    nrow = nr, dimnames = list(rn, cn))


It's clear that a matrix will be returned unless `.Generic` is one of those 
arithmetic operators. My question therefore is, is there any particular reason 
why ^ operator is being missed in the if-statement here? I can't think of a 
reason where this would break. Also ?`^` doesn't seem to mention anything about 
this coercion.
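
Whatever `DF^2` itself returns in a given R version, a data.frame result can be obtained by applying the operator column by column. This is a workaround sketch, not an explanation of the behaviour asked about above.

```r
DF <- data.frame(x = 1:5, y = 6:10)

# Assigning into sq[] keeps the data.frame structure (names, row names)
# while each column is replaced by its square.
sq <- DF
sq[] <- lapply(DF, function(col) col^2)
class(sq)
```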

Please let me know if I should be posting this to R-devel list instead.

Thank you very much,
Arun




Re: [R] On ^ returning a matrix when operated on a data.frame

2013-11-13 Thread Arunkumar Srinivasan
Duncan, 
Thank you. What I meant was that ^ is the only *arithmetic operator* that
returns a matrix when applied to a data.frame. I understand it's quite old
code. Also, your explanation makes sense, with the exception of the / operator, I
suppose (I could be wrong here).

Arun


On Thursday, November 14, 2013 at 12:32 AM, Duncan Murdoch wrote:

 
 It's not just ^ that is missing, the logical relations like <, ==, etc
 also return matrices. This is very old code (I think from 1999), but I
 would guess that the reason is that the ^ and < operators always return
 values of a single type (numeric and logical respectively), whereas the
 other operators can take mixed type inputs and return mixed type outputs.
 
 Duncan Murdoch
 
  Please let me know if I should be posting this to R-devel list instead.
  
  Thank you very much,
  Arun
  
  
  
 
 
 





Re: [R] calculating mean matrix

2013-01-19 Thread Arunkumar Srinivasan
One way using `Reduce`: 

set.seed(45)
grp <- factor(rep(letters[1:10], each=10)) # equivalent of your column x
# dummy data
df   <- as.data.frame(matrix(sample(1:1000, replace=T),
ncol=length(levels(grp))))
# solution
Reduce('+', split(df, grp))/length(levels(grp))
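
As a sanity check, the `Reduce` approach can be compared with the array/apply approach from the quoted question on a few small matrices. The three matrices below are made up for this sketch.

```r
# Three invented 2 x 3 matrices to average element by element.
mats <- list(matrix(1:6, 2, 3), matrix(7:12, 2, 3), matrix(13:18, 2, 3))

# Sum the matrices pairwise, then divide by how many there are.
mean_reduce <- Reduce(`+`, mats) / length(mats)

# Same result via a 3-d array and apply over rows and columns.
arr <- array(unlist(mats), c(2, 3, length(mats)))
mean_apply <- apply(arr, 1:2, mean)

all.equal(mean_reduce, mean_apply)
```

Both routes give the same mean matrix; `Reduce` avoids building the intermediate array.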


Arun


On Saturday, January 19, 2013 at 3:49 PM, ya wrote:

 Hi list,
 
 Thank you very much for reading this post.
 
 I have a data frame that I am trying to split into a couple of data frames
 using one of the columns, say, x. After I get the data frames, I am planning
 to treat them as matrices and calculate an element-by-element mean
 matrix. Could anyone give me some advice on how to do it?
 
 So far, I know that if I have a couple of matrices, say 
 data1,data2,data3,data4...dataN, I can do it like this:
 
 data=array(cbind(data1,data2,data3,data4,dataN), c(2, 3, N))
 #2 refers to row number of matrix, 3 refers to column number of matrix, N 
 refers to number of matrices to be averaged.
 meanmtrx=apply(data,1:2,mean)
 
 but I do not know how to use the resulting data frames with cbind(). Maybe 
 there are other better ways. Any advice is appreciated.
 
 Thank you very much.
 
 Have a nice day.
 
 ya 
 
 


