[R] Error: unexpected symbol in [with read.table]
When reading in a tab-delimited file using args I keep getting the error:

Error: unexpected symbol in "Name index"
Execution halted

The code is this:

a <- read.table(args[1], sep="\t", header=T, stringsAsFactors=F)

When inputting the file directly, as follows, this produces no errors:

a <- read.table("/path/to/file/filename.txt", header=T, sep="\t", stringsAsFactors=F)

The file is such:

Name    index
Bob     1
George  2
Dave    3
Eric    4
. . . .
Andrew  20

Is there anything I should be looking out for that might be producing this error? Any help will be greatly appreciated.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error: unexpected symbol in [with read.table]
Oops - error on my part. Sorry.

On Fri, Jun 26, 2015 at 2:54 PM, Bert Gunter bgunter.4...@gmail.com wrote:

... and you should also know by now to cc the list and not respond just to me!

Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Fri, Jun 26, 2015 at 10:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

"reading in a tab delimited file using args"

What I mean by that is that I'm using a bash script to call an R script, and using the command:

args <- commandArgs(TRUE)

in my R script. In my shell script I'm calling the R program as follows:

/path/to/R/R-3.0.2/bin/Rscript

I'm not sure if that will help - sure you will all know if it doesn't.

K.

On Fri, Jun 26, 2015 at 1:39 PM, Bert Gunter bgunter.4...@gmail.com wrote:

?? Are you expecting us to guess what your code was from "reading in a tab delimited file using args"? You've posted here before and should know by now that explicit code should be provided whenever possible.

Cheers, Bert

Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Fri, Jun 26, 2015 at 10:32 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

When reading in a tab-delimited file using args I keep getting the error:

Error: unexpected symbol in "Name index"
Execution halted

The code is this:

a <- read.table(args[1], sep="\t", header=T, stringsAsFactors=F)

When inputting the file directly, as follows, this produces no errors:

a <- read.table("/path/to/file/filename.txt", header=T, sep="\t", stringsAsFactors=F)

The file is such:

Name    index
Bob     1
George  2
Dave    3
Eric    4
. . . .
Andrew  20

Is there anything I should be looking out for that might be producing this error? Any help will be greatly appreciated.
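The thread never states what the mistake was, so the following is only a minimal sketch of how a script called through Rscript might read its first argument defensively. The helper name read_from_args and the temporary demo file are hypothetical. One possibility, not confirmed by the thread, is that an "unexpected symbol" error naming the header line can appear when the data file itself is handed to Rscript as the script, so R tries to parse it as code.

```r
# Hypothetical helper: validate the command-line arguments before reading.
read_from_args <- function(args) {
  if (length(args) < 1L) {
    stop("usage: Rscript script.R <tab-delimited-file>")
  }
  if (!file.exists(args[1])) {
    stop("file not found: ", args[1])
  }
  read.table(args[1], sep = "\t", header = TRUE, stringsAsFactors = FALSE)
}

# Demo with a temporary file standing in for the real input.
tf <- tempfile(fileext = ".txt")
writeLines(c("Name\tindex", "Bob\t1", "George\t2"), tf)
a <- read_from_args(tf)
print(a)

# In the real script the call would be:
# a <- read_from_args(commandArgs(trailingOnly = TRUE))
```

With a layout like this, the shell script would call Rscript with the R script first and the data file second, so that only the R script is parsed as code.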
Re: [R] Converting unique strings to unique numbers
I found this helpful. However - the second to fourth columns come out all zero - was this the intention? That is:

X0001 0 0 0 2 1 BYX859
X0001 0 0 0 1 1 BYX894
X0001 0 0 0 2 2 BYX862
X0001 0 0 0 2 2 BYX863
X0001 0 0 0 2 2 BYX864
X0001 0 0 0 2 2 BYX865

On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:

match() will do what you want. E.g., run your data through the following function.

f <- function (data) {
    uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
    uniqStrings <- setdiff(uniqStrings, "0")
    for (j in 2:4) {
        data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
    }
    data
}

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, May 29, 2015 at 9:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I have a pedigree file as so:

X0001 BYX859 0 0 2 1 BYX859
X0001 BYX894 0 0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

X0001 1 0 0 2 1 BYX859
X0001 2 0 0 1 1 BYX894
X0001 3 2 1 2 2 BYX862
X0001 4 2 1 2 2 BYX863
X0001 5 2 1 2 2 BYX864
X0001 6 2 1 2 2 BYX865

Is this possible with factors? Thanks! K.
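A minimal sketch of the match() approach on the posted pedigree, assuming every column is read as character. The names ped and recode_ids are hypothetical. The thread does not say why the follow-up saw all zeros; one possible cause, stated here only as an assumption, is that the columns were factors, in which case the lookup vector and the values being matched may not compare as the same strings.

```r
# Read every column as character so the ID strings stay strings.
ped <- read.table(text = "
X0001 BYX859 0 0 2 1 BYX859
X0001 BYX894 0 0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865
", header = FALSE, colClasses = "character")

# Hypothetical helper: number the IDs in order of first appearance,
# keeping "0" (unknown parent) as 0.
recode_ids <- function(data, cols = 2:4) {
  ids <- unique(unlist(data[cols], use.names = FALSE))
  ids <- setdiff(ids, "0")
  for (j in cols) {
    data[[j]] <- match(data[[j]], ids, nomatch = 0L)
  }
  data
}

out <- recode_ids(ped)
print(out)
```

Because the lookup vector is built from column 2 first, the individuals are numbered in the order they appear there, which is the numbering asked for in the original post.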
[R] Converting unique strings to unique numbers
I have a pedigree file as so:

X0001 BYX859 0 0 2 1 BYX859
X0001 BYX894 0 0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

X0001 1 0 0 2 1 BYX859
X0001 2 0 0 1 1 BYX894
X0001 3 2 1 2 2 BYX862
X0001 4 2 1 2 2 BYX863
X0001 5 2 1 2 2 BYX864
X0001 6 2 1 2 2 BYX865

Is this possible with factors? Thanks! K.
Re: [R] Error importing data - wrapping?
I've tried many things:

read.csv("data.frame.txt", header=F, fill=T, stringsAsFactors=FALSE, sep="\t", colClasses="character")

read.csv2("data.frame.txt", fill=T, stringsAsFactors=FALSE, sep="\t", as.is=T, colClasses="character")

also with read.delim/2

read.table("data.frame.txt", header=F, fill=T, stringsAsFactors=FALSE, sep="\t", colClasses="character")

And a combination of various different options.

On Sat, May 9, 2015 at 11:11 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

There are many ways to import data into R, and I don't know any of them that would do what you are describing. You really need to give us some reproducible code if we are to follow along with your problem.

Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
DCN: jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

On May 9, 2015 7:59:31 AM PDT, Kate Ignatius kate.ignat...@gmail.com wrote:

I have some data that I have trouble importing...

A B C D E
A 1232 0.565
B 2323 0.5656 0.5656 0.5656
C 2323 0.5656
D 2323 0.5656
E 2323 0.5656
F 2323 0.5656
G 2323 0.5656
G 2323 0.5656 0.5656 0.5656

When I input the data it seems to go like this:

SampleID ItemB ItemC ItemD ItemE A 1232 0.565 B 2323 0.5656 0.5656 0.5656 C 2323 0.5656 D 2323 0.5656 E 2323 0.5656 F 2323 0.5656 G 2323 0.5656 G 2323 0.5656 0.5656 0.5656

with the last two columns (or the two columns with vast amounts of missing data, which are usually the last two; see Sample B) wrapping around - is there a way to prevent this? Thanks!
[R] Error importing data - wrapping?
I have some data that I have trouble importing...

A B C D E
A 1232 0.565
B 2323 0.5656 0.5656 0.5656
C 2323 0.5656
D 2323 0.5656
E 2323 0.5656
F 2323 0.5656
G 2323 0.5656
G 2323 0.5656 0.5656 0.5656

When I input the data it seems to go like this:

SampleID ItemB ItemC ItemD ItemE A 1232 0.565 B 2323 0.5656 0.5656 0.5656 C 2323 0.5656 D 2323 0.5656 E 2323 0.5656 F 2323 0.5656 G 2323 0.5656 G 2323 0.5656 0.5656 0.5656

with the last two columns (or the two columns with vast amounts of missing data, which are usually the last two; see Sample B) wrapping around - is there a way to prevent this? Thanks!
Re: [R] Error importing data - wrapping?
I've tried colClasses="character", fill=T, as.is=T, header=F, sep="\t", read.csv, read.delim, read.csv2, read.delim2. I don't know what else to try.

On Sat, May 9, 2015 at 11:13 AM, MacQueen, Don macque...@llnl.gov wrote:

Some indication of what you have tried would be useful. Assuming you are using read.table(), then the fill argument of read.table() might be what you need. If you look at the help for read.table you will find:

From ?read.table:

fill: logical. If 'TRUE' then in case the rows have unequal length, blank fields are implicitly added. See 'Details'.

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 5/9/15, 7:59 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I have some data that I have trouble importing...

A B C D E
A 1232 0.565
B 2323 0.5656 0.5656 0.5656
C 2323 0.5656
D 2323 0.5656
E 2323 0.5656
F 2323 0.5656
G 2323 0.5656
G 2323 0.5656 0.5656 0.5656

When I input the data it seems to go like this:

SampleID ItemB ItemC ItemD ItemE A 1232 0.565 B 2323 0.5656 0.5656 0.5656 C 2323 0.5656 D 2323 0.5656 E 2323 0.5656 F 2323 0.5656 G 2323 0.5656 G 2323 0.5656 0.5656 0.5656

with the last two columns (or the two columns with vast amounts of missing data, which are usually the last two; see Sample B) wrapping around - is there a way to prevent this? Thanks!
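The thread does not reach a confirmed fix, so this is only a sketch of one thing worth checking, assuming the file really is tab-separated and short rows simply end early. By default read.table() infers the number of columns from the first five lines, so rows further down with more fields can wrap onto extra lines; count.fields() shows the field count per line, and supplying col.names fixes the column count up front. The temporary file and the column names (taken from the post) are stand-ins.

```r
# Small stand-in for the real file: short rows end early.
txt <- c("A\tB\tC\tD\tE",
         "A\t1232\t0.565",
         "B\t2323\t0.5656\t0.5656\t0.5656",
         "C\t2323\t0.5656")
tf <- tempfile(fileext = ".txt")
writeLines(txt, tf)

# How many fields does each line really have?
n_fields <- count.fields(tf, sep = "\t")
print(n_fields)

# Fix the number of columns explicitly and pad short rows.
dat <- read.table(tf, sep = "\t", header = FALSE, skip = 1, fill = TRUE,
                  col.names = c("SampleID", "ItemB", "ItemC", "ItemD", "ItemE"),
                  colClasses = "character")
print(dat)
```

If count.fields() reports more fields on some line than there are header names, col.names needs at least that many entries for those rows not to wrap.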
[R] Grep out columns using a list of strings
Hi, I have a list of 150 strings, say, ap:

aajkss
dfghjk
sdfghk
...
xxcvvn

And I would like to grep out these strings from the column names in another file, af. I've tried the following but none seem to work:

aps <- af[,grep(ap, colnames(af), value=TRUE)]
aps <- af[,grep(ap, colnames(af), value=FIXED)]
aps <- af[,grep(as.character(list(ap),colnames(af))]

and also

aps <- unique (grep(ap, colnames(af))

Is there another way I can do this - maybe without using grep? Thanks! Kate.
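No reply is archived for this question, so here is a minimal sketch of two common approaches, with made-up data standing in for af and ap. grep() uses only the first element when given a vector of patterns, which is why passing ap directly does not work. For exact column names, %in% avoids grep entirely; for partial matches, the strings can be joined into one alternation pattern, assuming they contain no regex metacharacters.

```r
# Made-up stand-ins for the poster's objects.
af <- data.frame(aajkss = 1:3, other = 4:6, dfghjk = 7:9, xxcvvn_extra = 10:12)
ap <- c("aajkss", "dfghjk", "sdfghk", "xxcvvn")

# Exact matches on column names, no grep needed.
exact <- af[, colnames(af) %in% ap, drop = FALSE]

# Partial matches: one regular expression with alternatives.
pattern <- paste(ap, collapse = "|")
partial <- af[, grepl(pattern, colnames(af)), drop = FALSE]

print(colnames(exact))
print(colnames(partial))
```

drop = FALSE keeps the result a data frame even when only one column matches.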
[R] Summing certain values within columns that satisfy a certain condition
Hi, Suppose I had a data frame like so:

A B C D
0 1 0 7
0 2 0 7
0 3 0 7
0 4 0 7
0 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 5
0 5 1 5
0 4 1 5
0 8 4 7
0 0 3 0
0 0 3 4
0 0 3 4
0 0 0 5
0 2 0 6
0 0 4 0
0 0 4 0
0 0 4 0

For each row, I want to count how many column maximum values appear, to eventually get the following outcome, while ignoring zeros and NAs:

A B C D Sum
0 1 0 7 1
0 2 0 7 1
0 3 0 7 1
0 4 0 7 1
0 1 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 1 5 0
0 5 1 5 0
0 4 1 5 0
0 8 4 7 3
0 0 3 0 0
0 0 3 4 0
0 0 3 4 0
0 0 0 5 0
0 2 0 6 0
0 0 4 0 1
0 0 4 0 1
0 0 4 0 1

I've used the following code but it doesn't seem to work (my sum column is all 1s):

apply(df, 1, function(x) sum(x %in% c(pmax(x))))

Is this code too simple? Thanks! K.
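No reply is archived here, so this is a sketch built on an interpretation inferred from the expected output: for each row, count the cells that equal the maximum of their own column, ignoring zeros. The posted attempt compares values within a row instead (pmax() of a single vector just returns that vector), which is a different question. The names col_max and is_max are hypothetical.

```r
df <- read.table(text = "A B C D
0 1 0 7
0 2 0 7
0 3 0 7
0 4 0 7
0 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 5
0 5 1 5
0 4 1 5
0 8 4 7
0 0 3 0
0 0 3 4
0 0 3 4
0 0 0 5
0 2 0 6
0 0 4 0
0 0 4 0
0 0 4 0", header = TRUE)

m <- as.matrix(df)

# Maximum of each column, skipping NAs.
col_max <- apply(m, 2, max, na.rm = TRUE)

# TRUE where a cell equals its column maximum and is not zero.
is_max <- sweep(m, 2, col_max, "==") & m != 0

# Per-row count; NA cells count as no match.
df$Sum <- rowSums(is_max, na.rm = TRUE)
print(df)
```

Under this reading the row 0 8 4 7 scores 3 because 8, 4 and 7 are the maxima of columns B, C and D respectively.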
[R] grepping out columns
Hi, I've got a complicated grep problem (or not)... I currently have a file with the headings as follows:

DAY MONTH YEAR SA_TUES SA_MON SU_WED CH_TUES CH_WED CH_MON AR_TUES AR_WED AR_MON SA_THUR SU_FRI CH_THUR CH_FRI AR_THUR AR_FRI

I want to grep out all columns that have SA at the beginning of their day, including any other information pertaining to that day. Ultimately I want to end up with:

SA_TUES SA_MON CH_TUES CH_MON AR_TUES AR_MON SA_THUR CH_THUR AR_THUR

Is there a way of doing this simply with grep? Or will this need to be more complicated? Thanks! K.
Re: [R] grepping out columns
Thanks! That was helpful. Although I think there was a typo in the last line:

selected <- sort(unique(unlist(all_ind)))

but I figured it out :) K.

On Wed, Feb 18, 2015 at 4:10 PM, Federico Lasa fel...@gmail.com wrote:

David's almost works except it catches the MONTH column, just add an empty metacharacter tho.

c("DAY", "MONTH", "YEAR", "SA_TUES", "SA_MON", "SU_WED", "CH_TUES", "CH_WED", "CH_MON", "AR_TUES", "AR_WED", "AR_MON", "SA_THUR", "SU_FRI", "CH_THUR", "CH_FRI", "AR_THUR", "AR_FRI") -> columns
sa_ind <- grep("SA_", columns)
days <- gsub("SA_", "", columns[sa_ind])
days <- paste0(days, "$")
selected <- lapply(days, function(x) grep(x, columns))
selected <- sort(unique(unlist(all_ind)))
columns[selected]
[1] "SA_TUES" "SA_MON" "CH_TUES" "CH_MON" "AR_TUES" "AR_MON" "SA_THUR" "CH_THUR" "AR_THUR"

On Wed, Feb 18, 2015 at 2:55 PM, David Winsemius dwinsem...@comcast.net wrote:

On Feb 18, 2015, at 12:27 PM, Kate Ignatius wrote:

Hi, I've got a complicated grep problem (or not)... I currently have a file with the headings as follows:

Let's assume these values are in a character vector named 'dat'.

SA_TUES SA_MON SU_WED CH_TUES CH_WED CH_MON AR_TUES AR_WED AR_MON SA_THUR SU_FRI CH_THUR CH_FRI AR_THUR AR_FRI

sadays <- dat[grep("SA", dat) ]
sads <- gsub("SA_", "", sadays)
sads
#[1] "TUES" "MON" "THUR"
dat[ sapply(sads, grep, dat) ]
#[1] "SA_TUES" "CH_TUES" "AR_TUES" "SA_MON" "CH_MON" "AR_MON"
#[7] "SA_THUR" "CH_THUR" "AR_THUR"

--
David Winsemius
Alameda, CA, USA
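The approach in this thread can be sketched in runnable form as below, with the last line using selected in place of the undefined all_ind that the follow-up mentions. Anchoring the patterns with a leading "_" as well as a trailing "$" is a small variation of my own rather than what was posted; it is one way to keep MON from matching MONTH. The name hits is hypothetical.

```r
columns <- c("DAY", "MONTH", "YEAR", "SA_TUES", "SA_MON", "SU_WED",
             "CH_TUES", "CH_WED", "CH_MON", "AR_TUES", "AR_WED", "AR_MON",
             "SA_THUR", "SU_FRI", "CH_THUR", "CH_FRI", "AR_THUR", "AR_FRI")

# Days that occur with the SA_ prefix.
sa_ind <- grep("^SA_", columns)
days <- sub("^SA_", "", columns[sa_ind])

# One anchored pattern per day, e.g. "_MON$".
patterns <- paste0("_", days, "$")

# Indices of every column ending in one of those days, in original order.
hits <- lapply(patterns, function(p) grep(p, columns))
selected <- sort(unique(unlist(hits)))

print(columns[selected])
```

Sorting the indices keeps the columns in their original file order, which matches the ordering requested in the first post.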
[R] Paste every two columns together
I have genetic data as follows (simple example, actual data is much larger):

comb =

ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T

And I wish to get an output like this:

ID1 AA TG CT GC GT CG TA
ID2 GC TG CC TG CT GT TT

That is, paste every two columns together. I have this code, but I get the error:

Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1

conc <- function(x) {
  s <- seq(2, nchar(x), 2)
  paste0(x[s], x[s+1])
}

combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)

Thanks in advance!
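No answer is archived for this post, so the following is a minimal sketch, assuming comb is a data frame with the ID in column 1 and an even number of allele columns after it. The error arises because lapply() hands conc one whole column at a time, so nchar(x) is a vector rather than a single length. Working on column positions avoids that; the names first and pasted are hypothetical. Reading with colClasses = "character" matters here because a column containing only T would otherwise be read as logical.

```r
comb <- read.table(text = "
ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T
", header = FALSE, colClasses = "character")

# Positions of the first allele in each pair: columns 2, 4, 6, ...
first <- seq(2, ncol(comb), by = 2)

# Paste each column to its right-hand neighbour, pair by pair.
pasted <- as.data.frame(Map(paste0, comb[first], comb[first + 1]),
                        stringsAsFactors = FALSE)

# Put the ID column back in front.
result <- cbind(comb[1], pasted)
print(result)
```

Map() works column-wise across the two sets of columns, so the pasting is vectorised over rows and should scale to larger files.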
Re: [R] rle with data.table - is it possible?
, 2015, at 12:07 AM, Kate Ignatius wrote:

Ah, crap. Yep you're right. This is not going too well. Okay - let me try that again:

x$childseg<-0
x<-x$sumchild !=0

That previous line would appear to overwrite the entire dataframe with the value of one vector.

span<-rle(x)$lengths[rle(x)$values==TRUE]
x$childseg[x]<-rep(seq_along(span), times = span)

Does this one have any errors?

Even assuming that the code from Jeff Newmiller is creating those objects I get

x$childseg[x]<-rep(seq_along(span), times = span)
Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors

In the last line you are indexing a vector with a dataframe (or perhaps a data.table). If we use Newmiller's object and then change some of the instances of x in your code to DT we get:

DT$childseg<-0
x<-DT$sumchild !=0  # Try not to overwrite your data-objects
span<-rle(x)$lengths[rle(x)$values==TRUE]
DT$childseg[x]<-rep(seq_along(span), times = span)
DT
    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    AA     B      4      5        5        1
 5:  RA  AA    RR     B      0      5        5        1
 6:  RR  AA    RR     B      4      5        5        1
 7:  AA  AA    AA     B      4      5        5        1
 8:  AA  AA    RA     C      3      3        0        0
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RA     C      3      3        0        0

You persist in posting code where you do not explain what you are trying to do with it. You have already been told that your earlier efforts using `rle` did not make any sense. Post a complete example and then explain what you desire as an object. It's often helpful to provide a scientific background for what the data represents.

-- David.

On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius [hidden email] wrote:

On Jan 1, 2015, at 5:07 PM, Kate Ignatius [hidden email] wrote:

Apologies - mix up of syntax all over the place, a habit of mine. The last line was in there because of code beforehand so it really doesn't need to be there. Here is the proper code I hope:

childseg<-0
x<-sumchild ==0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = span)

This remains not reproducible.
We have no idea what sumchild might be and the code throws an error. My guess is that you are trying to get a result such as would be delivered by: childseg - sumchild[ sumchild != 0 ] ? David. On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller [hidden email] wrote: Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand generated result would help? A new input data frame might or might not be needed to illustrate desired results. Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line. --- Jeff NewmillerThe . . Go Live... DCN:[hidden email]Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On January 1, 2015 4:16:52 AM PST, Kate Ignatius [hidden email] wrote: Is it possible to add the following code or similar in data.table: childseg-0 x:=sumchild -0 span-rle(x)$lengths[rle(x)$values==TRUE childseg[x]-rep(seq_along(span), times = span) childseg[childseg == 0]-'' I was hoping to do this code by Group for mum, dad and child. The problem I'm having is with the span-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can be added to data.table. [Previous email had incorrect code] On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller [hidden email] wrote: I do not understand the value of using the rle function in your description, but the code below appears to produce the table you want. Note that better support for the data.table package might be found at stackexchange as the documentation specifies. 
x - read.table( text= Dad Mum Child Group AA RR RA A AA RR RR A AA AA AA B AA AA AA B RA AA RR B RR AA RR B AA AA AA B AA AA RA C AA AA RA C AA RR RA C , header=TRUE, stringsAsFactors=FALSE ) library(data.table) DT - data.table( x ) DT[ , cdad := as.integer( Dad %in% c( AA, RR ) ) ] DT[ , sumdad := 0L ] DT[ 1==DT$cdad, sumdad := sum( cdad
Re: [R] rle with data.table - is it possible?
Ah, crap. Yep you're right. This is not going too well. Okay - let me try that again: x$childseg-0 x-x$sumchild !=0 span-rle(x)$lengths[rle(x)$values==TRUE] x$childseg[x]-rep(seq_along(span), times = span) Does this one have any errors? On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius dwinsem...@comcast.net wrote: On Jan 1, 2015, at 5:07 PM, Kate Ignatius kate.ignat...@gmail.com wrote: Apologies - mix up of syntax all over the place, a habit of mine. The last line was in there because of code beforehand so it really doesn't need to be there. Here is the proper code I hope: childseg-0 x-sumchild ==0 span-rle(x)$lengths[rle(x)$values==TRUE] childseg[x]-rep(seq_along(span), times = span) This remains not reproducible. We have no idea what sumchild might be and the code throws an error. My guess is that you are trying to get a result such as would be delivered by: childseg - sumchild[ sumchild != 0 ] — David. On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand generated result would help? A new input data frame might or might not be needed to illustrate desired results. Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. 
On January 1, 2015 4:16:52 AM PST, Kate Ignatius kate.ignat...@gmail.com wrote: Is it possible to add the following code or similar in data.table: childseg-0 x:=sumchild -0 span-rle(x)$lengths[rle(x)$values==TRUE childseg[x]-rep(seq_along(span), times = span) childseg[childseg == 0]-'' I was hoping to do this code by Group for mum, dad and child. The problem I'm having is with the span-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can be added to data.table. [Previous email had incorrect code] On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: I do not understand the value of using the rle function in your description, but the code below appears to produce the table you want. Note that better support for the data.table package might be found at stackexchange as the documentation specifies. x - read.table( text= Dad Mum Child Group AA RR RA A AA RR RR A AA AA AA B AA AA AA B RA AA RR B RR AA RR B AA AA AA B AA AA RA C AA AA RA C AA RR RA C , header=TRUE, stringsAsFactors=FALSE ) library(data.table) DT - data.table( x ) DT[ , cdad := as.integer( Dad %in% c( AA, RR ) ) ] DT[ , sumdad := 0L ] DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ] DT[ , cdad := NULL ] DT[ , cmum := as.integer( Mum %in% c( AA, RR ) ) ] DT[ , summum := 0L ] DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ] DT[ , cmum := NULL ] DT[ , cchild := as.integer( Child %in% c( AA, RR ) ) ] DT[ , sumchild := 0L ] DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ] DT[ , cchild := NULL ] DT Dad Mum Child Group sumdad summum sumchild 1: AA RRRA A 2 20 2: AA RRRR A 2 21 3: AA AAAA B 4 55 4: AA AAAA B 4 55 5: RA AARR B 0 55 6: RR AARR B 4 55 7: AA AAAA B 4 55 8: AA AARA C 3 30 9: AA AARA C 3 30 10: AA RRRA C 3 30 On Tue, 30 Dec 2014, Kate Ignatius wrote: I'm trying to use both these packages and wondering whether they are possible... 
To make this simple, my ultimate goal is determine long stretches of 1s, but I want to do this within groups (hence using the data.table as I use the set key option. However, I'm I'm not having much luck making this possible. For example, for simplistic sake, I have the following data: Dad Mum Child Group AA RR RA A AA RR RR A AA AA AA B AA AA AA B RA AA RR B RR AA RR B AA AA AA B AA AA RA C AA AA RA C AA RR RA C And the following code which I know works hetdad - as.numeric(x[c(1)]==AA | x[c(1)]==RR) sumdad - rle(hetdad)$lengths[rle(hetdad)$values==1] hetmum - as.numeric(x[c(2)]==AA | x[c(2)]==RR) summum - rle(hetmum)$lengths[rle(hetmum)$values==1] hetchild - as.numeric(x[c(3)]==AA
Re: [R] rle with data.table - is it possible?
Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to do this code by Group for mum, dad and child. The problem I'm having is with the span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can be added to data.table. [Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

I do not understand the value of using the rle function in your description, but the code below appears to produce the table you want. Note that better support for the data.table package might be found at stackexchange as the documentation specifies.

x <- read.table( text=
"Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header=TRUE, stringsAsFactors=FALSE )

library(data.table)
DT <- data.table( x )
DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
DT[ , sumdad := 0L ]
DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
DT[ , cdad := NULL ]
DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
DT[ , summum := 0L ]
DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
DT[ , cmum := NULL ]
DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
DT[ , sumchild := 0L ]
DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
DT[ , cchild := NULL ]
DT
    Dad Mum Child Group sumdad summum sumchild
 1:  AA  RR    RA     A      2      2        0
 2:  AA  RR    RR     A      2      2        1
 3:  AA  AA    AA     B      4      5        5
 4:  AA  AA    AA     B      4      5        5
 5:  RA  AA    RR     B      0      5        5
 6:  RR  AA    RR     B      4      5        5
 7:  AA  AA    AA     B      4      5        5
 8:  AA  AA    RA     C      3      3        0
 9:  AA  AA    RA     C      3      3        0
10:  AA  RR    RA     C      3      3        0

On Tue, 30 Dec 2014, Kate Ignatius wrote:

I'm trying to use both these packages and wondering whether they are possible... To make this simple, my ultimate goal is to determine long stretches of 1s, but I want to do this within groups (hence using data.table, as I use the set key option). However, I'm not having much luck making this possible. For example, for simplicity's sake, I have the following data:

Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C

And the following code which I know works:

hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]

However, I wish to do the above code by Group (though this file is millions of rows long and groups will be larger, but I just wanted to simplify the example). I did something like this but of course I got an error:

LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

The reason being that I want to eventually have something like this:

Dad Mum Child Group sumdad summum sumchild
AA  RR  RA    A     2      2      0
AA  RR  RR    A     2      2      1
AA  AA  AA    B     4      5      5
AA  AA  AA    B     4      5      5
RA  AA  RR    B     0      5      5
RR  AA  RR    B     4      5      5
AA  AA  AA    B     4      5      5
AA  AA  RA    C     3      3      0
AA  AA  RA    C     3      3      0
AA  RR  RA    C     3      3      0

That is, I would like to have the specific counts next to what I'm consecutively counting per group. So for Group A for dad there are 2 AAs, there are two RRs for mum but only 1 AA or RR for the child and that is RR (so the 1 is next to the RR and not the RA). Can this be done? K.
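The stated goal of stretches of 1s within groups can be sketched in base R as below, by expanding each run's length from rle() back over the rows of that run, one group at a time. This is not the code from the thread: the helpers run_len and run_len_by and the run* column names are hypothetical, and base ave() is used here instead of data.table so the example needs no extra package. It counts consecutive runs, so it differs from the desired table above where a run is interrupted: Dad in group B gives 2 2 0 2 2 rather than 4 4 0 4 4.

```r
x <- read.table(text = "Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C", header = TRUE, stringsAsFactors = FALSE)

# Length of the run each row belongs to; 0 for rows that are not AA/RR.
run_len <- function(v) {
  r <- rle(v %in% c("AA", "RR"))
  rep(ifelse(r$values, r$lengths, 0L), times = r$lengths)
}

# Apply run_len separately within each group, keeping row order.
run_len_by <- function(v, g) {
  ave(seq_along(v), g, FUN = function(i) run_len(v[i]))
}

x$rundad   <- run_len_by(x$Dad,   x$Group)
x$runmum   <- run_len_by(x$Mum,   x$Group)
x$runchild <- run_len_by(x$Child, x$Group)
print(x)
```

Because run_len returns one value per input row, the same helper should also be usable inside a data.table grouped assignment such as DT[, rundad := run_len(Dad), by = Group], though that form is untested here.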
Re: [R] rle with data.table - is it possible?
Apologies - mix-up of syntax all over the place, a habit of mine. The last line was only in there because of earlier code, so it really doesn't need to be there. Here is the proper code, I hope:

    childseg <- 0
    x <- sumchild == 0
    span <- rle(x)$lengths[rle(x)$values == TRUE]
    childseg[x] <- rep(seq_along(span), times = span)

On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results. Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.

Jeff Newmiller

On January 1, 2015 4:16:52 AM PST, Kate Ignatius kate.ignat...@gmail.com wrote:

Is it possible to add the following code or similar in data.table:

    childseg<-0
    x:=sumchild -0
    span<-rle(x)$lengths[rle(x)$values==TRUE
    childseg[x]<-rep(seq_along(span), times = span)
    childseg[childseg == 0]<-''

I was hoping to do this code by Group for mum, dad and child. The problem I'm having is with the span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can be added to data.table. [Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

I do not understand the value of using the rle function in your description, but the code below appears to produce the table you want. Note that better support for the data.table package might be found at stackexchange, as the documentation specifies.

    x <- read.table( text=
    "Dad Mum Child Group
    AA RR RA A
    AA RR RR A
    AA AA AA B
    AA AA AA B
    RA AA RR B
    RR AA RR B
    AA AA AA B
    AA AA RA C
    AA AA RA C
    AA RR RA C
    ", header=TRUE, stringsAsFactors=FALSE )
    library(data.table)
    DT <- data.table( x )
    DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
    DT[ , sumdad := 0L ]
    DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
    DT[ , cdad := NULL ]
    DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
    DT[ , summum := 0L ]
    DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
    DT[ , cmum := NULL ]
    DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
    DT[ , sumchild := 0L ]
    DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
    DT[ , cchild := NULL ]
    DT

        Dad Mum Child Group sumdad summum sumchild
     1:  AA  RR    RA     A      2      2        0
     2:  AA  RR    RR     A      2      2        1
     3:  AA  AA    AA     B      4      5        5
     4:  AA  AA    AA     B      4      5        5
     5:  RA  AA    RR     B      0      5        5
     6:  RR  AA    RR     B      4      5        5
     7:  AA  AA    AA     B      4      5        5
     8:  AA  AA    RA     C      3      3        0
     9:  AA  AA    RA     C      3      3        0
    10:  AA  RR    RA     C      3      3        0
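The run labelling that the corrected snippet above seems to be aiming for can be sketched in base R as follows. This is only an illustration under assumptions: `label_runs` is a made-up helper name, the input is assumed to be a logical vector with no NAs, and the result vector is pre-allocated with zeros rather than started from a single 0.

```r
# Hypothetical helper (not from the thread): give each run of TRUE values
# a sequential id (1, 2, 3, ...) and leave FALSE positions at 0.
label_runs <- function(x) {
  seg <- integer(length(x))          # start with all zeros
  r <- rle(x)                        # run-length encoding of the logical vector
  span <- r$lengths[r$values]        # lengths of the TRUE runs only
  seg[x] <- rep(seq_along(span), times = span)
  seg
}

flags <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
seg <- label_runs(flags)
print(seg)
```

To apply the same idea within groups, one option is to split the flags by the grouping column, call the helper on each piece, and reassemble with `unsplit()`.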
Re: [R] rle with data.table - is it possible?
Is it possible to add the following code or similar in data.table:

    childseg<-0
    x:=sumchild -0
    span<-rle(x)$lengths[rle(x)$values==TRUE
    childseg[x]<-rep(seq_along(span), times = spanLOH)
    childseg[childseg == 0]<-''

I was hoping to do this code by SNPEFF_GENE_NAME for mum, dad and child. The problem I'm having is with the span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can be added to data.table.
Re: [R] rle with data.table - is it possible?
correct code:

    childseg<-0
    x:=sumchild -0
    span<-rle(x)$lengths[rle(x)$values==TRUE
    childseg[x]<-rep(seq_along(span), times = span)
    childseg[childseg == 0]<-''
[R] rle with data.table - is it possible?
I'm trying to use both these packages and wondering whether that is possible... To make this simple, my ultimate goal is to determine long stretches of 1s, but I want to do this within groups (hence using data.table, as I use the set key option). However, I'm not having much luck making this possible. For simplicity's sake, I have the following data:

    Dad Mum Child Group
    AA  RR  RA    A
    AA  RR  RR    A
    AA  AA  AA    B
    AA  AA  AA    B
    RA  AA  RR    B
    RR  AA  RR    B
    AA  AA  AA    B
    AA  AA  RA    C
    AA  AA  RA    C
    AA  RR  RA    C

And the following code, which I know works:

    hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
    sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
    hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
    summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
    hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
    sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]

However, I wish to run the above code by Group (this file is millions of rows long and the groups will be larger, but I just wanted to simplify the example). I did something like this, but of course I got an error:

    LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
    LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
    LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
    LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
    LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
    LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

The reason is that I eventually want to have something like this:

    Dad Mum Child Group sumdad summum sumchild
    AA  RR  RA    A     2      2      0
    AA  RR  RR    A     2      2      1
    AA  AA  AA    B     4      5      5
    AA  AA  AA    B     4      5      5
    RA  AA  RR    B     0      5      5
    RR  AA  RR    B     4      5      5
    AA  AA  AA    B     4      5      5
    AA  AA  RA    C     3      3      0
    AA  AA  RA    C     3      3      0
    AA  RR  RA    C     3      3      0

That is, I would like to have the specific counts next to what I'm consecutively counting per group. So for Group A there are 2 AAs for dad and 2 RRs for mum, but only 1 AA or RR for the child, and that is RR (so the 1 is next to the RR and not the RA). Can this be done? K.
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
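One possible way to get a per-row run length within groups, sketched in base R with the example data from the question. The helper name `run_len` and the use of `ave()` are assumptions for illustration, not something proposed in the thread. Note that this counts consecutive stretches, so it can differ from a per-group total when a group contains an interruption (Dad in Group B below).

```r
# Example data copied from the question.
x <- data.frame(
  Dad   = c("AA","AA","AA","AA","RA","RR","AA","AA","AA","AA"),
  Mum   = c("RR","RR","AA","AA","AA","AA","AA","AA","AA","RR"),
  Child = c("RA","RR","AA","AA","RR","RR","AA","RA","RA","RA"),
  Group = c("A","A","B","B","B","B","B","C","C","C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each element, the length of the run of TRUEs it
# belongs to, and 0 where the flag is FALSE.
run_len <- function(flag) {
  r <- rle(flag)
  len <- rep(r$lengths, times = r$lengths)
  ifelse(flag, len, 0L)
}

# Apply the helper separately inside each Group.
by_group <- function(values, group) {
  flag <- as.integer(values %in% c("AA", "RR"))
  ave(flag, group, FUN = function(f) run_len(f == 1L))
}

x$rundad   <- by_group(x$Dad,   x$Group)
x$runmum   <- by_group(x$Mum,   x$Group)
x$runchild <- by_group(x$Child, x$Group)
print(x)
```

With data.table, the same helper could presumably be called in `j` together with `by = Group`, for example `DT[, runchild := run_len(Child %in% c("AA", "RR")), by = Group]`; that line is untested here.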
Re: [R] Printing/Generating/Outputting a Table (Not Latex)
Thanks! I do get several errors, though, when running on Linux. Running your code, I get this:

    Error in system(cmd, intern = TRUE, wait = TRUE) : error in running command

Fiddling around with the code and running this:

    tmp <- matrix(1:9,3,3)
    tmp.tex <- latex(tmp, file='tmp.tex')
    print.default(tmp.tex)
    tmp.dvi <- dvi(tmp.tex)
    tmp.dvi
    tmp.tex
    dvips(tmp.dvi)
    dvips(tmp.tex)
    library(tools)
    texi2dvi(file='tmp.tex', pdf=TRUE, clean=TRUE)

I get this:

    Error in texi2dvi(file=tmp.tex,, : Running 'texi2dvi' on 'tmp.tex' failed.
    Messages:
    /usr/bin/texi2dvi: pdflatex exited with bad status, quitting.

I've read that it may have something to do with the path of pdflatex.

    Sys.which('pdflatex')
               pdflatex
    "/usr/bin/pdflatex"
    Sys.which('texi2dvi')
               texi2dvi
    "/usr/bin/texi2dvi"
    file.exists(Sys.which('texi2dvi'))
    [1] TRUE
    file.exists(Sys.which('pdflatex'))
    [1] TRUE

Is there a specific path I should be giving with 'pdflatex' and/or 'texi2dvi' to make this work? Thanks!

On Mon, Dec 8, 2014 at 11:13 PM, Richard M. Heiberger r...@temple.edu wrote:

yes of course, and the answer is latex() in the Hmisc package. Why were you excluding it? Details follow.

Rich

The current release of the Hmisc package has this capability on Macintosh and Linux. For Windows, you need the next release, 3.14-7, which is available now at github.

    ## windows needs these lines until the new Hmisc version is on CRAN
    install.packages("devtools")
    devtools::install_github("Hmisc", "harrelfe")

    ## All operating systems
    options(latexcmd='pdflatex')
    options(dviExtension='pdf')

    ## Macintosh
    options(xdvicmd='open')

    ## Windows, one of the following
    options(xdvicmd='c:\\progra~1\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe') ## 32-bit windows
    options(xdvicmd='c:\\progra~2\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe') ## 64 bit windows

    ## Linux
    ## I don't know the xdvicmd value

    ## this works on all R systems
    library(Hmisc)
    tmp <- matrix(1:9,3,3)
    tmp.dvi <- dvi(latex(tmp))
    print.default(tmp.dvi) ## prints filepath of the pdf file
    tmp.dvi ## displays the pdf file on your screen
Re: [R] Printing/Generating/Outputting a Table (Not Latex)
Ah yes, you're right. The log has this error:

    ! LaTeX Error: Missing \begin{document}.

Though I can't really find much online on how to resolve it.

On Tue, Dec 9, 2014 at 1:15 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

pdflatex appears to have run, because it exited. You should look at the tex log file; the problem is more likely that the LaTeX you sent out to pdflatex was incomplete.

Jeff Newmiller
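The "Missing \begin{document}" message is what LaTeX reports when it is handed a file that has no document preamble, which fits the suggestion above that the file sent to pdflatex was incomplete. A minimal base R sketch of wrapping a table fragment in a complete document is shown below. The helper name `wrap_latex`, the file name, and the hand-written tabular fragment are all assumptions for illustration; this is not how Hmisc itself does it.

```r
# Hypothetical helper: wrap a LaTeX fragment in a minimal standalone document.
wrap_latex <- function(fragment, path) {
  doc <- c("\\documentclass{article}",
           "\\begin{document}",
           fragment,
           "\\end{document}")
  writeLines(doc, path)
  invisible(path)
}

# A hand-written tabular fragment standing in for a generated table.
fragment <- c("\\begin{tabular}{rrr}",
              "1 & 4 & 7 \\\\",
              "2 & 5 & 8 \\\\",
              "3 & 6 & 9 \\\\",
              "\\end{tabular}")

out <- wrap_latex(fragment, file.path(tempdir(), "tmp_standalone.tex"))
tex_lines <- readLines(out)
print(tex_lines)
# With a working LaTeX installation, tools::texi2pdf(out) could then be tried.
```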
Re: [R] Printing/Generating/Outputting a Table (Not Latex)
I set these options:

    options(latexcmd='pdflatex')
    options(dviExtension='pdf')
    options(xdvicmd='xdvi')

Maybe one too many? I'm running on Linux.

On Tue, Dec 9, 2014 at 3:24 PM, Richard M. Heiberger r...@temple.edu wrote:

It looks like you skipped the step of setting the options. The latex function doesn't run pdflatex (by default it runs regular latex) unless you set the options as I indicated.
Re: [R] Printing/Generating/Outputting a Table (Not Latex)
Okay, all. I have it working using this:

    library(Hmisc)
    options(latexcmd='pdflatex')
    options(dviExtension='pdf')
    options(xdvicmd='gnome-open')

Running your simple code from above... my question is this: the pdf is saved in a tmp directory... where do I change the directory path? I thought it was simply this:

    tmp.dvi <- dvi(latex(m2,file='/path/to/file/tmp.pdf', label="Title"))

But maybe not. In addition, is it possible to change the page size with this? K.

On Tue, Dec 9, 2014 at 4:02 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

On 09/12/2014 20:47, Richard M. Heiberger wrote:

The last one is wrong. That is the one for which I don't know the right answer on linux. 'xdvi' displays dvi files. You need to display a pdf file. Whatever is the right program on linux to display pdf files is what belongs there. On Macintosh we can avoid knowing by using 'open', which means use the system standard. I don't know what the linux equivalent is, either the exact program or the instruction to use the standard.

xdg-open (but like OS X it depends on having the right associations set).
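On the question of where the PDF ends up: one approach that does not depend on any particular Hmisc argument is to let the file be written to the temporary directory and then copy it to the desired location with base R. The sketch below uses a placeholder file in place of the real PDF; the destination directory name is hypothetical, and in the thread's setting the source path would be whatever `print.default(tmp.dvi)` reports.

```r
# Stand-in for the generated PDF (a placeholder file for illustration only).
generated <- tempfile(fileext = ".pdf")
writeLines("placeholder", generated)

# Hypothetical destination directory; replace with the real target path.
dest_dir <- file.path(tempdir(), "tables")
dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)

dest <- file.path(dest_dir, "tmp.pdf")
ok <- file.copy(generated, dest, overwrite = TRUE)
print(ok)
```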
[R] Printing/Generating/Outputting a Table (Not Latex)
Hi, I have a simple question. I know there are plenty of packages out there that can provide code to generate a table in latex. But I was wondering whether there is one with which I can generate a table from my data (whichever way I please) and then save it as a pdf? Thanks K.
[R] recoding genetic information using gsub
I have genetic information for several thousand individuals:

    A/T T/G C/G etc

For some individuals there are genotypes like A/, C/, T/, G/ or even just /, which represent missing data, and I want to change these to the following:

    A/   A/.
    C/   C/.
    G/   G/.
    T/   T/.
    /    ./.
    /A   ./A
    /C   ./C
    /G   ./G
    /T   ./T

I've tried to use gsub with a command like the following:

    gsub("A/","[A/.]", GT[,6])

but if the genotypes aren't like the above, the command will change them to look something like:

    A/.T T/.G C/.G

Is there any way to be more specific in gsub? Thanks!
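One way to make the substitution more specific is to anchor the pattern to the start (`^`) or end (`$`) of the string, so that only a genotype with an empty side is touched. A minimal sketch, assuming each element holds exactly one genotype; `fill_missing` is a made-up helper name.

```r
# Hypothetical helper: put "." in place of a missing allele on either side.
fill_missing <- function(gt) {
  gt <- sub("^/", "./", gt)   # nothing before the slash: first allele missing
  gt <- sub("/$", "/.", gt)   # nothing after the slash: second allele missing
  gt
}

genotypes <- c("A/T", "A/", "/", "/G", "C/G")
fixed <- fill_missing(genotypes)
print(fixed)
```

Complete genotypes such as A/T are left unchanged because neither anchored pattern matches them.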
[R] grep won't work finding one column
I'm having an issue with grep: I have numerous columns that end with .at... when I use grep like so:

    df[,grep(".at",colnames(df))]

it works fine. When I have only one column that ends with .at, it does not work. Why is that? As this is a loop with a varying number of columns ending in .at, I would like some code that works with 1 to n columns. Is there something more optimal than grep? Thanks!
Re: [R] grep won't work finding one column
For example, DF will usually have numerous columns:

    sample1.at sample1.dp sample1.fg sample2.at sample2.dp sample2.fg

and so on. I'm running this code in R as part of a shell script which runs over several different file sizes, so sometimes it will come across a file with one sample in it, i.e. sample1. When the R code runs through this file, trying to grep out the sample1.at column does not work and it halts. Here is some sample data... say I want to get out only the AT_ column:

    Sample_1 AT_1
    A/A      RR
    G/G      AA
    T/T      AA
    G/A      RA
    G/G      RR
    C/C      AA
    C/C      AA
    C/T      RA
    A/A      AA
    T/G      RA

It will have a problem grepping out this single column.

On Tue, Oct 14, 2014 at 10:38 AM, John McKown john.archie.mck...@gmail.com wrote:

I can't answer your direct question. But do you realize that your code does not match your words? The grep shown does not match _only_ columns whose names end with the characters '.at'. It matches all column names which contain any character followed by the characters "at". To match only columns whose names end with the characters .at, you need:

    grep("\\.at$",colnames(df))

You might want to post an example which fails. Just to be complete, be sure to use the dput() function so that it is easy for members of the group to cut'n'paste to get your data into our own R workspace.

John McKown
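The difference between the loose and the anchored pattern can be seen on a few made-up column names. This is just an illustration; the names below are invented, and `endsWith()` is shown as a non-regex alternative that assumes R 3.3.0 or later.

```r
# Invented column names: only the first one really ends in ".at".
cols <- c("sample1.at", "date", "sample1.dp")

loose  <- grepl(".at", cols)       # "." matches any character, so "date" matches too
strict <- grepl("\\.at$", cols)    # literal dot, anchored at the end of the name
plain  <- endsWith(cols, ".at")    # no regular expression involved

print(rbind(loose, strict, plain))
```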
Re: [R] grep won't work finding one column
In the sense that it does not work: it works when there are 50 samples in the file, but not when there is only one. The usual headings are:

    sample1.at sample1.dp sample1.fg sample2.at sample2.dp sample2.fg

and so on, to a maximum of sample50.at sample50.dp sample50.fg. Using this pulls out all the .at columns perfectly:

    df[,grep(".at",colnames(df))]

When I come across a file with only one sample (sample1.at sample1.dp sample1.fg), the same call returns nothing. Oh - AT/at was just an example... that's not my problem...

On Tue, Oct 14, 2014 at 10:57 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

Your question is missing a reproducible example, and you don't say how it does not work, so we cannot tell what is going on. Two things do come to mind, though.

A) Data frame subsets with only one column by default return a vector, which is a different type of object than a single-column data frame. You would need to read ?"[.data.frame" about the drop argument if you wanted to consistently get a data frame from this expression.

B) The period is a wildcard in regular expressions. If you expect to limit your search to a literal ".at" at the end of the name, then you should use the search pattern "\\.at$" instead (the first backslash allows the second one to be stored by R in the string, and the second one is the only one seen by grep, which reads it as making the period not act like a wildcard).

You really should read about regular expressions before using them. There are many tutorials on the web about this topic.

On October 14, 2014 7:23:55 AM PDT, Kate Ignatius kate.ignat...@gmail.com wrote:

I'm having an issue with grep: I have numerous columns that end with .at... when I use grep like so:

    df[,grep(".at",colnames(df))]

it works fine. When I have one column that ends with .at, it does not work. Why is that? As this is a loop with a varying number of columns ending in .at, I would like some code that works with 1 to n columns. Is there something more optimal than grep? Thanks!
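A minimal sketch of points A and B above, assuming a toy data frame with a single sample (the data values are made up for illustration; only the column names come from the thread). It is one way to get the same kind of object back whether one column or fifty match, not necessarily the only fix.

```r
# Toy data frame with a single sample; column names follow the thread.
df <- data.frame(sample1.at = 1:3, sample1.dp = 4:6, sample1.fg = 7:9)

# "\\.at$" matches a literal ".at" at the end of a column name only.
at_cols <- grep("\\.at$", colnames(df))

# drop = FALSE keeps the result a data frame even when one column matches.
at_df <- df[, at_cols, drop = FALSE]

# Without drop = FALSE the single matching column collapses to a vector.
at_vec <- df[, at_cols]

print(class(at_df))
print(class(at_vec))
```

With 1 to n matching columns, at_df stays a data frame, so code later in the loop can keep treating it the same way.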
[R] Help with a function [along columns]
Hi all, I need help with a function. I'm trying to write a function to apply to a varying number of columns in a lot of files - hence the function... but I'm getting stuck. Here it is:

    gt <- function(x) {
      alleles <- sapply(x, function(.) strsplit(as.character(.), "/"))
      gt <- apply(x, function(.)
        ifelse(x[1] == vcf[3] & x[2] == vcf[3], 'RR',
        ifelse(x[1] == vcf[4] & x[2] == vcf[4], 'AA',
        ifelse(x[1] == vcf[3] & x[2] == vcf[4], 'RA',
        ifelse(x[1] == vcf[4] & x[2] == vcf[3], 'RA', '')))))
    }

I have family genetic files of different sizes, and at the end of the day I want to see whether the alleles of each person in the family match the ref and/or the alt and, if so, give AA, RA or RR. Like so:

    REF ALT Sample_1 GT_1 Sample_2 GT_2
    A   G   A/A      RR   A/G      RA
    T   G   G/G      AA   T/T      RR
    A   T   T/T      AA   A/A      RR
    G   A   G/A      RA   G/G      RR
    G   A   G/G      RR   G/A      RA
    T   C   C/C      AA   C/C      AA
    T   C   C/C      AA   C/C      AA
    C   T   C/T      RA   T/T      AA
    G   A   A/A      AA   A/A      AA
    T   G   T/G      RA   G/G      AA

Is there an easy way to do this? Thanks!
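One possible way to get the GT columns shown above is to work on whole columns at once instead of applying over rows. This is only a sketch: the helper name classify_gt, the toy vcf data frame, and the assumption that sample columns are named Sample_1, Sample_2, ... are all hypothetical and not taken from the original files.

```r
# Hypothetical helper: classify "A/G"-style genotypes against REF and ALT.
# Returns "RR", "AA", "RA", or "" when neither allele pattern matches.
classify_gt <- function(ref, alt, gt) {
  alleles <- strsplit(as.character(gt), "/", fixed = TRUE)
  a1 <- vapply(alleles, `[`, character(1), 1)  # first allele of each genotype
  a2 <- vapply(alleles, `[`, character(1), 2)  # second allele of each genotype
  ref <- as.character(ref)
  alt <- as.character(alt)
  ifelse(a1 == ref & a2 == ref, "RR",
  ifelse(a1 == alt & a2 == alt, "AA",
  ifelse((a1 == ref & a2 == alt) | (a1 == alt & a2 == ref), "RA", "")))
}

# Toy data: the first three rows of the table in the post.
vcf <- data.frame(REF = c("A", "T", "A"),
                  ALT = c("G", "G", "T"),
                  Sample_1 = c("A/A", "G/G", "T/T"),
                  Sample_2 = c("A/G", "T/T", "A/A"),
                  stringsAsFactors = FALSE)

# Add one GT_<n> column per Sample_<n> column, however many there are.
for (s in grep("^Sample_", names(vcf), value = TRUE)) {
  vcf[[sub("^Sample", "GT", s)]] <- classify_gt(vcf$REF, vcf$ALT, vcf[[s]])
}
print(vcf)
```

Passing REF and ALT in as arguments, rather than reaching for a global vcf inside the function, keeps the helper usable across files with different numbers of samples.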
Re: [R] Help with a function [along columns]
Just an update to this:

    gtal <- function(d) {
      alleles <- sapply(d, function(.) strsplit(as.character(.), "/"))
      gt <- unlist(lapply(alleles, function(x)
        ifelse(identical(x[[1]], vcf[,3]) & identical(x[[2]], vcf[,3]), 'RR',
        ifelse(identical(x[[1]], vcf[,4]) & identical(x[[2]], vcf[,4]), 'AA',
        ifelse(identical(x[[1]], vcf[,3]) & identical(x[[2]], vcf[,4]), 'RA',
        ifelse(identical(x[[1]], vcf[,4]) & identical(x[[2]], vcf[,3]), 'RA', ''))))))
    }

I've got something working, but I'm having trouble with the gt part... I'm getting the error: object of type 'closure' is not subsettable. vcf is my original file that I want to match against, so I'm not sure whether this is the problem.
[R] How to check to see if a variable is within a range of another variable
Is there an easy way to check whether a variable is within a +/- 10% range of another variable in R? Say I have a variable 'A': I want to know whether it is within +/- 10% of variable 'B' and, if so, create another variable 'C' to say whether it is or not. Is there a function that is able to do that?

Eventual outcome:

    A   B   C
    67  76  no
    24  23  yes
    40  45  yes
    10  12  yes
    70  72  yes
    101 90  no
    9   12  no
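A minimal sketch of one way to do this with plain vector arithmetic, assuming "within +/- 10%" means that the absolute difference is at most 10% of B. Under that definition a few pairs in the illustrated outcome (for example 40 vs 45, which differ by about 11% of B) come out as "no", so the result will not match the table exactly. The name tol is just an illustrative choice.

```r
# The A and B values from the post.
d <- data.frame(A = c(67, 24, 40, 10, 70, 101, 9),
                B = c(76, 23, 45, 12, 72, 90, 12))

# Assumed definition: A is "in range" when |A - B| <= 10% of |B|.
tol <- 0.10
d$C <- ifelse(abs(d$A - d$B) <= tol * abs(d$B), "yes", "no")

print(d)
```

Changing tol, or dividing by A instead of B, gives the other common readings of "within 10%".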
Re: [R] How to check to see if a variable is within a range of another variable
Apologies - yes, my 10% calculations seem to be slightly off. However, the function gives me all FALSE, which seems a little weird, even where both columns equal each other. Should that be right? In essence I want to check whether A and B equal each other, give or take 10%.

On Wed, Oct 1, 2014 at 6:54 PM, Peter Alspach peter.alsp...@plantandfood.co.nz wrote:

Tena koe Kate

If kateDF is a data.frame with your data, then

    apply(kateDF, 1, function(x) isTRUE(all.equal(x[2], x[1], check.attributes = FALSE, tolerance=0.1)))

comes close to (what I think) you want (but not to what you have illustrated in your 'eventual outcome'). Anyhow, it may be enough to allow you to get there.

HTH

Peter Alspach
Re: [R] if else statement in loop
Oops, I edited the code wrongly while trying to make it easier to interpret, and got the X's and Y's mixed up. Try this:

    for(i in length(1:(nrow(X)))){
      Y$IID1new <- ifelse((as.character(Y[,2]) == as.character(X[,i]) & Y$IID1new != ''), as.character(as.matrix(X[,(nrow(X)+i)])),'')
    }

The second should be like this:

    Y$IID1new <- ifelse((as.character(Y[,2]) == as.character(X[,1])), as.character(as.matrix(X[,(nrow(X)+1)])),'')
    for(i in length(2:(nrow(X)))){
      ifelse((as.character(Y[,i]) == as.character(X[,i])), Y$IID1new[is.na(Y$IID1new)] <- as.character(as.matrix(X[,(nrow(X)+i)])),'')
    }

Selecting by number of rows seems a little odd here, I know, but in real life this relies on a third data frame, say Z, which for simplicity I didn't include here. I only want to start looking at the Nth column after twice as many rows as there are in Z. For instance, if Z has 4 rows, I want to take values for IID1new starting from column 9 in X to make IID1new in Y. Does that make sense? Will this cause a problem? So it would probably be more like this if there were a Z:

    for(i in length(1:(2*nrow(Z)))){
      Y$IID1new <- ifelse((as.character(Y[,2]) == as.character(X[,i]) & Y$IID1new != ''), as.character(as.matrix(X[,(2*nrow(Z)+i)])),'')
    }

But essentially what I would like is this:

    FID   IID    IID1new
    FAM01 samas4 samas4_father
    FAM01 samas5 samas5_mother
    FAM01 samas6 samas6_sibling

I hope this is a little clearer... Let me know if there are more errors. K.

On Mon, Sep 29, 2014 at 2:39 AM, PIKAL Petr petr.pi...@precheza.cz wrote:

Hi

Please be clearer about what you want. I get many errors trying your code, and your explanation does not help much.

    > for(i in length(1:(2*nrow(X)))){
    + Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) & X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
    Error: unexpected ',' in:
    "for(i in length(1:(2*nrow(X)))){
    Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) & X$IID1new != '') ,"
    > }
    Error: unexpected '}' in "}"

    > for(i in length(1:(2*nrow(X)))){
    + Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) &
    + X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
    Error: unexpected ',' in:
    "Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) &
    X$IID1new != '') ,"
    > }

Besides, the column in X$IID1new != '' does not exist in X. Here you clearly ask for a nonexistent column, and why the heck do you want to select a column by number of rows?

    > as.character(as.matrix(X[,(2*nrow(X)+1)]))
    Error in `[.data.frame`(X, , (2 * nrow(X) + 1)) : undefined columns selected

So, based on your toy data frames, what should the result be after your computation?

Regards
Petr
Re: [R] Ifelse statement on a factor level data frame
Strange, that. I did wrap everything in as.character but all I got was the same...

    class of dbpmn[,2]  = factor
    class of dbpmn[,21] = factor
    class of dbpmn[,20] = data.frame

This has to be a problem??? I can put reproducible output here, but I'm not sure if this is going to be of help. I think it's all about factors and data frames and characters... K.

On Sun, Sep 28, 2014 at 1:15 AM, Jim Lemon j...@bitwrit.com.au wrote:

Hi Kate,

If I create a little example:

    dbpmn<-data.frame(V1=factor(sample(LETTERS[1:4],20,TRUE)),
     V2=factor(sample(LETTERS[1:4],20,TRUE)),
     V3=factor(sample(LETTERS[1:4],20,TRUE)))
    dbpmn[4]<-
     ifelse(as.character(dbpmn[,1]) == as.character(dbpmn[,(2)]),
     dbpmn[,3],"")
    dbpmn

       V1 V2 V3 V4
    1   B  D  C
    2   C  A  D
    3   C  B  A
    4   A  B  C
    5   B  D  B
    6   D  D  A  1
    7   D  D  D  4
    8   B  C  A
    9   B  D  B
    10  D  C  A
    11  A  D  C
    12  A  C  B
    13  A  A  A  1
    14  D  C  A
    15  C  D  B
    16  A  A  B  2
    17  A  C  C
    18  B  B  C  3
    19  C  C  C  3
    20  D  D  D  4

I get what I expect: the numeric value of the third element in dbpmn where the first two elements are equal. I think what you want is:

    dbpmn[4]<-
     ifelse(as.character(dbpmn[,1]) == as.character(dbpmn[,(2)]),
     as.character(dbpmn[,3]),"")
    dbpmn

       V1 V2 V3 V4
    1   B  D  C
    2   C  A  D
    3   C  B  A
    4   A  B  C
    5   B  D  B
    6   D  D  A  A
    7   D  D  D  D
    8   B  C  A
    9   B  D  B
    10  D  C  A
    11  A  D  C
    12  A  C  B
    13  A  A  A  A
    14  D  C  A
    15  C  D  B
    16  A  A  B  B
    17  A  C  C
    18  B  B  C  C
    19  C  C  C  C
    20  D  D  D  D

Jim
Re: [R] Ifelse statement on a factor level data frame
Apologies - you're right. Missed it in the pdf. K.

On Sun, Sep 28, 2014 at 10:22 AM, Bert Gunter gunter.ber...@gene.com wrote:

Inline.

On Sun, Sep 28, 2014 at 6:38 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

> Strange that, I did put everything with as.character but all I got was the same...
> class of dbpmn[,2]) = factor
> class of dbpmn[,21] = factor
> class of dbpmn[,20] = data.frame
> This has to be a problem ???

Indeed -- your failure to read documentation. I suggest you do your due diligence, read Pat Burns's link, and follow the advice given you by posting a reproducible example. More than likely the last will be unnecessary, as you will figure it out in the course of doing what you should do.

Cheers,
Bert

> I can put reproducible output here but not sure if this going to of help here. I think its all about factors and data frames and characters... K.
[R] if else statement in loop
I have two data frames. For simplicity:

X=

    V1     V2     V3     V4            V5            V6
    samas4 samas5 samas6 samas4_father samas5_mother samas6_sibling
    samas4 samas5 samas6 samas4_father samas5_mother samas6_sibling
    samas4 samas5 samas6 samas4_father samas5_mother samas6_sibling

Y=

    FID   IID
    FAM01 samas4
    FAM01 samas5
    FAM01 samas6

I want to create a new IID in Y from V4, V5 and V6 in X, using an ifelse statement in a loop. I've used something like the following (after figuring out my factor problem):

    for(i in length(1:(2*nrow(X)))){
      Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) & X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
    }

But of course this tends to overwrite. Is there an easy way to set up a loop to replace missing values? This didn't work either, but I'm not sure if it's as easy as this:

    Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) & X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
    for(i in length(2:(2*nrow(X)))){
      ifelse((as.character(Y[,i]) == as.character(Xl[,i])), X[is.na(X$IID1new)] <- as.character(as.matrix(X[(2*nrow(X)+i)])),'')
    }

Thanks! K.
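For the toy data above, a lookup with match() may avoid the loop (and the overwriting) altogether. Note also that for(i in length(1:n)) runs the body only once, with i equal to n, because length(1:n) is the single number n. The sketch below assumes that the first half of X's columns hold the original IDs, the second half hold the new labels in the same order, and every row of X is identical; the names n_id, old_ids and new_ids are hypothetical.

```r
# Toy data frames from the post.
X <- data.frame(V1 = rep("samas4", 3), V2 = rep("samas5", 3),
                V3 = rep("samas6", 3),
                V4 = rep("samas4_father", 3), V5 = rep("samas5_mother", 3),
                V6 = rep("samas6_sibling", 3),
                stringsAsFactors = FALSE)
Y <- data.frame(FID = "FAM01",
                IID = c("samas4", "samas5", "samas6"),
                stringsAsFactors = FALSE)

# Assumption: columns 1..n_id are IDs, columns n_id+1..2*n_id their new labels.
n_id <- ncol(X) / 2
old_ids <- unlist(X[1, seq_len(n_id)], use.names = FALSE)
new_ids <- unlist(X[1, n_id + seq_len(n_id)], use.names = FALSE)

# match() gives the position of each Y$IID in old_ids (NA when not found).
Y$IID1new <- new_ids[match(Y$IID, old_ids)]
print(Y)
```

IDs in Y that have no counterpart in X end up as NA rather than being overwritten by a later pass.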
[R] Ifelse statement on a factor level data frame
Quick question: I am running the following code on some variables that are factors:

    dbpmn$IID1new <- ifelse(as.character(dbpmn[,2]) == as.character(dbpmn[,(21)]), dbpmn[,20], '')

Instead of returning some value it gives me this:

    c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

Playing around with the code gives me some kind of variation on it. Is there some way to get what I want? The variable that it's supposed to give back is a bunch of sample IDs. Thanks!
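A small illustration of why numbers can come back instead of sample IDs: when the "yes" argument of ifelse() is a factor, its integer level codes are used rather than its labels. This is a sketch on made-up values (ids and same are hypothetical names), and it does not cover the separate case where a column is itself a data frame.

```r
# A factor of sample IDs; levels are sorted, so samas4 = 1 and samas5 = 2.
ids <- factor(c("samas5", "samas4"))
same <- c(TRUE, FALSE)

# Passing the factor directly yields its integer codes (as text here,
# because the "no" value is a character string).
as_codes <- ifelse(same, ids, "")

# Converting to character first yields the labels.
as_labels <- ifelse(same, as.character(ids), "")

print(as_codes)
print(as_labels)
```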
[R] ggplot2/heat map/duplicated level problem
Hi, I hope I can explain my problem clearly. I have a plink output file and I want to graph a heat map of the PI_HAT estimates. I have the following code, which has worked in the past, but this time I'm getting the warning:

    In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : duplicated levels in factors are deprecated

My code:

    require(ggplot2)
    image = (p <- ggplot(db, aes(IID1, IID2)) +
      geom_tile(aes(fill = PI_HAT), colour = "white") +
      scale_fill_gradient2(low = "blue", high = "red") +
      labs(x = "Individual 1", y = "Individual 2") +
      opts(axis.text.x = theme_text(angle=90)) +
      opts(title="", legend.position = "right"))

I'm trying to figure out whether this is a problem with duplicated PI_HAT estimates or duplicated ID pairings (though the latter shouldn't be the case, as I've used similar files in the past). What else could be the problem?

P.S. My file is quite large (300K lines), so it's pretty hard to decipher the problem off the bat, but the usual plink output file of this type has the heading:

    FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT
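A rough way to check the two suspicions raised above on a large file, using only base R. The toy db below is a made-up stand-in for the plink output (only the column names come from the post), and whether either check explains the warning in a given ggplot2 version is an open question; this is a diagnostic sketch, not a diagnosis.

```r
# Toy stand-in for the plink output; the fourth row repeats the first pairing.
db <- data.frame(IID1 = c("s1", "s1", "s2", "s1"),
                 IID2 = c("s2", "s3", "s3", "s2"),
                 PI_HAT = c(0.50, 0.25, 0.00, 0.50),
                 stringsAsFactors = TRUE)

# Rows whose (IID1, IID2) pairing has already appeared earlier in the file.
dup_pairs <- db[duplicated(db[c("IID1", "IID2")]), ]

# Index of the first repeated level in each ID factor, or 0 if there is none.
dup_levels <- c(IID1 = anyDuplicated(levels(db$IID1)),
                IID2 = anyDuplicated(levels(db$IID2)))

# Rebuilding a factor from its labels guarantees unique levels.
db$IID1 <- factor(as.character(db$IID1))
db$IID2 <- factor(as.character(db$IID2))

print(dup_pairs)
print(dup_levels)
```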
[R] data.table/ifelse conditional new variable question
Hi, I have a data.table question (as well as an if-else statement query). I have a large list of families (the file has 935 individuals, sorted by family, with families of varying sizes). At the moment the file has the columns:

    SampleID FamilyID Relationship

To avoid having to make a pedigree file by hand - i.e. adding a PaternalID and a MaternalID one by one - I want to try to write a script that will quickly do this for me (I eventually want to run this through a program such as plink). Is there a way to use data.table (maybe in conjunction with ifelse) to do this effectively?

An example of the file is something like:

    Family.ID Sample.ID Relationship
    14        62        sibling
    14        94        father
    14        63        sibling
    14        59        mother
    17        6004      father
    17        6003      mother
    17        6005      sibling
    17        368       sibling
    130       202       mother
    130       203       father
    130       204       sibling
    130       205       sibling
    130       206       sibling
    222       9         mother
    222       45        sibling
    222       34        sibling
    222       10        sibling
    222       11        sibling
    222       18        father

But the goal is to have a file like this:

    Family.ID Sample.ID Relationship PID  MID
    14        62        sibling      94   59
    14        94        father       0    0
    14        63        sibling      94   59
    14        59        mother       0    0
    17        6004      father       0    0
    17        6003      mother       0    0
    17        6005      sibling      6004 6003
    17        368       sibling      6004 6003
    130       202       mother       0    0
    130       203       father       0    0
    130       204       sibling      203  202
    130       205       sibling      203  202
    130       206       sibling      203  202
    222       9         mother       0    0
    222       45        sibling      18   9
    222       34        sibling      18   9
    222       10        sibling      18   9
    222       11        sibling      18   9
    222       18        father       0    0

I've tried searches for this but with no luck. I'd greatly appreciate any help - even if it's just a link to a great example/solution! Thanks!
Re: [R] data.table/ifelse conditional new variable question
Thanks! I think I know what is being done here, but I'm not sure how to fix the following error:

    Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] : replacement has length zero

On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Dear Kate,

Assuming you have nuclear families, one option would be:

    x <- read.table(textConnection("Family.ID Sample.ID Relationship
    14 62 sibling
    14 94 father
    14 63 sibling
    14 59 mother
    17 6004 father
    17 6003 mother
    17 6005 sibling
    17 368 sibling
    130 202 mother
    130 203 father
    130 204 sibling
    130 205 sibling
    130 206 sibling
    222 9 mother
    222 45 sibling
    222 34 sibling
    222 10 sibling
    222 11 sibling
    222 18 father"), header = TRUE)
    closeAllConnections()

    xs <- with(x, split(x, Family.ID))
    res <- do.call(rbind, lapply(xs, function(l){
      l$PID <- l$MID <- 0
      father <- with(l, Relationship == 'father')
      mother <- with(l, Relationship == 'mother')
      l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
      l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
      l
    }))
    res

HTH,
Jorge.-
Re: [R] data.table/ifelse conditional new variable question
Actually - I didn't check this before, but these are not all nuclear families (as I assumed they were). That is, some don't have a father or don't have a mother. Usually, if this is the case, PID or MID, respectively, becomes 0 for the child. How can the code be edited to account for this?
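One way the per-family step could be guarded against both situations: a missing parent (which makes the replacement have length zero) and more than one row with the same parental role (which makes the replacement longer than one and gets recycled across siblings). This is a sketch in base R on a cut-down toy data set; the helper names parent_id and add_parents are hypothetical, and stopping on duplicates is just one possible policy.

```r
# Toy data: one complete family (14) and one family without a father (2702).
x <- data.frame(
  Family.ID    = c(14, 14, 14, 14, 2702, 2702, 2702),
  Sample.ID    = c(62, 94, 63, 59, 349, 3456, 9980),
  Relationship = c("sibling", "father", "sibling", "mother",
                   "mother", "sibling", "sibling"),
  stringsAsFactors = FALSE
)

# Return the single ID with the given role, 0 if absent; stop if ambiguous.
parent_id <- function(fam, role) {
  ids <- fam$Sample.ID[fam$Relationship == role]
  if (length(ids) > 1)
    stop("family ", fam$Family.ID[1], " has ", length(ids),
         " rows marked ", role)
  if (length(ids) == 0) 0 else ids
}

# Siblings get the parents' IDs; parents themselves keep 0 in both columns.
add_parents <- function(fam) {
  sib <- fam$Relationship == "sibling"
  fam$PID <- ifelse(sib, parent_id(fam, "father"), 0)
  fam$MID <- ifelse(sib, parent_id(fam, "mother"), 0)
  fam
}

res <- do.call(rbind, lapply(split(x, x$Family.ID), add_parents))
rownames(res) <- NULL
print(res)
```

Replacing stop() with warning() would let the run finish while still listing the families that need a manual look.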
Re: [R] data.table/ifelse conditional new variable question
Yep - you're right - missing parents are indicated as zero in the M/PID field. The above code worked with a few errors: 1: In l$PID[l$Relationship == sibling] - l$Sample.ID[father] : number of items to replace is not a multiple of replacement length 2: In l$PID[l$Relationship == sibling] - l$Sample.ID[father] : number of items to replace is not a multiple of replacement length 3: In l$PID[l$Relationship == sibling] - l$Sample.ID[father] : number of items to replace is not a multiple of replacement length 4: In l$MID[l$Relationship == sibling] - l$Sample.ID[mother] : number of items to replace is not a multiple of replacement length looking at the output I get numbers where the father/mother ID should be in the M/PID field. For example: 2702 349 mother 0 0 2702 3456 sibling 0 842 2702 9980 sibling 0 842 3064 3 father 0 0 3064 4 mother 0 0 3064 5sibling 879 880 3064 86 sibling 879 880 3064 87 sibling 879 880 On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez jorgeivanve...@gmail.com wrote: Dear Kate, Try this: res - do.call(rbind, lapply(xs, function(l){ l$PID - l$MID - 0 father - with(l, Relationship == 'father') mother - with(l, Relationship == 'mother') if(sum(father) == 0) l$PID[l$Relationship == 'sibling'] - 0 else l$PID[l$Relationship == 'sibling'] - l$Sample.ID[father] if(sum(mother) == 0) l$MID[l$Relationship == 'sibling'] - 0 else l$MID[l$Relationship == 'sibling'] - l$Sample.ID[mother] l })) It is assumed that when either parent is not available the M/PID is 0. Best, Jorge.- On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote: Actually - I didn't check this before, but these are not all nuclear families (as I assumed they were). That is, some don't have a father or don't have a mother Usually if this is the case PID or MID will become 0, respectively, for the child. How can the code be edit to account for this? On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius kate.ignat...@gmail.com wrote: Thanks! 
I think I know what is being done here but not sure how to fix the following error: Error in l$PID[l$\Relationship == sibling] - l$Sample.ID[father] : replacement has length zero On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez jorgeivanve...@gmail.com wrote: Dear Kate, Assuming you have nuclear families, one option would be: x - read.table(textConnection(Family.ID Sample.ID Relationship 14 62 sibling 14 94 father 14 63 sibling 14 59 mother 17 6004 father 17 6003 mother 17 6005 sibling 17 368 sibling 130 202 mother 130 203 father 130 204 sibling 130 205 sibling 130 206 sibling 222 9 mother 222 45 sibling 222 34 sibling 222 10 sibling 222 11 sibling 222 18 father), header = TRUE) closeAllConnections() xs - with(x, split(x, Family.ID)) res - do.call(rbind, lapply(xs, function(l){ l$PID - l$MID - 0 father - with(l, Relationship == 'father') mother - with(l, Relationship == 'mother') l$PID[l$Relationship == 'sibling'] - l$Sample.ID[father] l$MID[l$Relationship == 'sibling'] - l$Sample.ID[mother] l })) res HTH, Jorge.- Best regards, Jorge.- On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius kate.ignat...@gmail.com wrote: Hi, I have a data.table question (as well as if else statement query). I have a large list of families (file has 935 individuals that are sorted by famiy of varying sizes). At the moment the file has the columns: SampleID FamilyID Relationship To prevent from having to make a pedigree file by hand - ie adding a PaternalID and a MaternalID one by one I want to try write a script that will quickly do this for me (I eventually want to run this through a program such as plink) Is there a way to use data.table (maybe in conjucntion with ifelse to do this effectively)? 
An example of the file is something like:

    Family.ID  Sample.ID  Relationship
    14         62         sibling
    14         94         father
    14         63         sibling
    14         59         mother
    17         6004       father
    17         6003       mother
    17         6005       sibling
    17         368        sibling
    130        202        mother
    130        203        father
    130        204        sibling
    130        205        sibling
    130        206        sibling
    222        9          mother
    222        45         sibling
    222        34         sibling
    222        10         sibling
    222        11         sibling
    222        18         father

But the goal is to have a file like this:

    Family.ID  Sample.ID  Relationship  PID  MID
    14         62         sibling       94   59
    14         94         father        0    0
    14         63         sibling       94   59
    14         59         mother        0    0
    17         6004       father        0    0
    17         6003       mother        0    0
    17         6005
Re: [R] data.table/ifelse conditional new variable question
Actually - your code is not wrong... because this is a large file I went through the file to see if there was anything wrong with it - looks like there are two fathers or three mothers in some families. Taking these duplicates out fixed the problem. Sorry about the confusion! And thanks so much for your help!

On Sat, Aug 16, 2014 at 9:53 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Perhaps I am missing something but I do not get the same result:

    x <- read.table(textConnection("Family.ID Sample.ID Relationship
    2702 349 mother
    2702 3456 sibling
    2702 9980 sibling
    3064 3 father
    3064 4 mother
    3064 5 sibling
    3064 86 sibling
    3064 87 sibling"), header = TRUE)
    closeAllConnections()
    xs <- with(x, split(x, Family.ID))
    res <- do.call(rbind, lapply(xs, function(l){
        l$PID <- l$MID <- 0
        father <- with(l, Relationship == 'father')
        mother <- with(l, Relationship == 'mother')
        if(sum(father) == 0) l$PID[l$Relationship == 'sibling'] <- 0
        else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
        if(sum(mother) == 0) l$MID[l$Relationship == 'sibling'] <- 0
        else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
        l
    }))
    #        Family.ID Sample.ID Relationship MID PID
    # 2702.1      2702       349       mother   0   0
    # 2702.2      2702      3456      sibling 349   0
    # 2702.3      2702      9980      sibling 349   0
    # 3064.4      3064         3       father   0   0
    # 3064.5      3064         4       mother   0   0
    # 3064.6      3064         5      sibling   4   3
    # 3064.7      3064        86      sibling   4   3
    # 3064.8      3064        87      sibling   4   3

HTH, Jorge.-

On Sun, Aug 17, 2014 at 11:47 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

Yep - you're right - missing parents are indicated as zero in the M/PID field.
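Since the warnings in this thread turned out to come from families with more than one father or mother, a check for that case before assigning parent IDs may be useful. The following is only a sketch on made-up toy data (family 3 is invented to show the duplicate case); the helper `parent_id` and the object names are hypothetical and not part of the original thread.

```r
# Toy data modelled on the thread; family 3 is a made-up case with two fathers.
x <- data.frame(
  Family.ID    = c(1, 1, 1, 2, 2, 3, 3, 3),
  Sample.ID    = c(10, 11, 12, 20, 21, 30, 31, 32),
  Relationship = c("father", "mother", "sibling",
                   "mother", "sibling",
                   "father", "father", "sibling"),
  stringsAsFactors = FALSE
)

# Count fathers and mothers per family; more than one of either is ambiguous.
n_father <- tapply(x$Relationship == "father", x$Family.ID, sum)
n_mother <- tapply(x$Relationship == "mother", x$Family.ID, sum)
bad_families <- names(n_father)[n_father > 1 | n_mother > 1]

# Hypothetical helper: the parent's Sample.ID if exactly one is present, else 0.
parent_id <- function(d, role) {
  id <- d$Sample.ID[d$Relationship == role]
  if (length(id) == 1) id else 0
}

# Drop the ambiguous families, then fill PID/MID for siblings only.
clean <- x[!x$Family.ID %in% bad_families, ]
res <- do.call(rbind, lapply(split(clean, clean$Family.ID), function(d) {
  sib <- d$Relationship == "sibling"
  d$PID <- 0
  d$MID <- 0
  d$PID[sib] <- parent_id(d, "father")
  d$MID[sib] <- parent_id(d, "mother")
  d
}))
```

Inspecting `bad_families` first shows which families need manual cleaning, rather than silently recycling IDs across siblings.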
[R] counting the number of rows that satisfy a certain criteria
I have 4 columns, and about 300K plus rows with 0s and 1s. I'm trying to count how many rows satisfy a certain criterion... for instance, how many rows are there that have the first column == 1 as well as the second column == 1. I've tried using rowSums and colSums but it keeps giving me this type of error:

    Error in rowSums(X[1] == 1 & X[2] == 1) :
      'x' must be an array of at least two dimensions

Thanks in advance!
Re: [R] counting the number of rows that satisfy a certain criteria
Thanks!

On Sat, Jun 21, 2014 at 11:05 AM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Hi Kate, You could try

    sum(X[, 1] == 1 & X[, 2] == 1)

where X is your data set. HTH, Jorge.-

On Sun, Jun 22, 2014 at 12:57 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I have 4 columns, and about 300K plus rows with 0s and 1s. I'm trying to count how many rows satisfy a certain criterion... for instance, how many rows are there that have the first column == 1 as well as the second column == 1. I've tried using rowSums and colSums but it keeps giving me this type of error:

    Error in rowSums(X[1] == 1 & X[2] == 1) :
      'x' must be an array of at least two dimensions

Thanks in advance!
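As a small illustration of the suggestion above, here is a sketch on simulated 0/1 data (the matrix `X` below is invented for the example). It also shows one way the same idea might extend to all four columns.

```r
set.seed(42)
# Simulated stand-in for the real data: 10 rows, 4 columns of 0s and 1s.
X <- matrix(rbinom(40, 1, 0.5), ncol = 4)

# Logical vector, one element per row; TRUE counts as 1 when summed.
cond <- X[, 1] == 1 & X[, 2] == 1
both <- sum(cond)

# Equivalent count via subsetting (drop = FALSE keeps a matrix for 0 or 1 rows).
both_by_subset <- nrow(X[cond, , drop = FALSE])

# Rows where every column equals 1.
all_one <- sum(rowSums(X == 1) == ncol(X))
```

The key point is that `X[, 1]` selects a whole column, whereas `X[1]` on a matrix selects a single element, which is why `rowSums` complained about dimensions.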
[R] Layout of two graphs on a page...
I'm trying to have a layout of two graphs on a page... this has worked before... but I changed the way I do my Venn diagrams, so now instead of the Venn diagram being at the bottom of the page below the bar/line graph, it takes up the whole page and overlays the bar/line graph placed on the top half... Here is my code:

    layout(matrix(c(1,2),2,1,byrow=TRUE),widths=c(1,1),heights=c(2,2))
    oldmar <- par("mar")
    par(oma=c(0,2,0,2),mar=c(5.1,4.1,4.1,3.1))
    my_tcks <- pretty(c(0,max(counts)),6)
    b <- barplot(counts,col='purple',axes=F,border=FALSE,cex.names = 0.75, las=2,
                 ylim=c(0,my_tcks[length(my_tcks)]))
    axis(2,at=my_tcks, labels=format(my_tcks, scientific = FALSE), cex.axis=0.75, las=2)
    mtext("",side=2,line=4,cex=1)
    par(new=TRUE)
    barplot(rep(NA,4),ylim=c(0,(max(ratio)+1)),axes=FALSE)
    axis(4, cex.axis=0.75, las=2)
    mtext("",side=4,line=2,cex=1)
    lines(b, ratio,col="black",lwd=2)
    par(mar=oldmar)
    par(new=FALSE)
    library(VennDiagram)
    draw.quad.venn(area1=area1,area2=area2,area3=area3,area4=area4,
                   n12=n12,n13=n13,n14=n14,n23=n23,n24=n24,n34=n34,
                   n123=n123,n124=n124,n134=n134,n234=n234,n1234=n1234,
                   category=c("A","B","C","D"),fill=c("white","white","white","white"),
                   alpha=c(0.2,0.2,0.2,0.2), euler.d=FALSE, scaled=FALSE,
                   cex=2, cat.cex=1.5, main="")
    dev.off()

I've changed around the oma and mar settings so much now that I'm a tad confused and probably overlooking something really obvious. Thanks in advance...

P.S. Let me know if more details are required (I can substitute some numbers here if it helps plot some graphs)
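One possible explanation, offered tentatively: the Venn diagram is drawn with grid graphics, and grid does not follow the regions set up by `layout()`, so it spreads over the whole page. A sketch of one workaround is to confine the base plot to the top half with `par(fig = ...)` and draw the grid output inside a viewport covering the bottom half. The function `draw_two_panels` and the stand-in circles below are hypothetical; with VennDiagram, the assumption is that the grob list returned by `draw.quad.venn(..., ind = FALSE)` could be passed in place of `venn_like`.

```r
library(grid)

# Hypothetical helper: base barplot on the top half, a grid object on the bottom half.
draw_two_panels <- function(counts, bottom_grob) {
  # Top half: restrict base graphics to the upper half of the page via 'fig'.
  op <- par(fig = c(0, 1, 0.5, 1), mar = c(3, 4, 2, 2))
  on.exit(par(op))
  barplot(counts, col = "purple", border = NA)

  # Bottom half: a grid viewport covering the lower half of the same page.
  pushViewport(viewport(x = 0.5, y = 0.25, width = 1, height = 0.5))
  grid.draw(bottom_grob)
  popViewport()
  invisible(TRUE)
}

# Stand-in for the Venn diagram: two overlapping circles.
venn_like <- gList(circleGrob(x = 0.4, r = 0.3), circleGrob(x = 0.6, r = 0.3))

pdf(NULL)   # null PDF device, nothing is written to disk
ok <- draw_two_panels(c(a = 3, b = 5, c = 2), venn_like)
invisible(dev.off())
```

The gridBase package offers a more general way to align base and grid output, if finer control over the regions is needed.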
[R] Error in merge [negative length vectors are not allowed]
Hi All, I'm trying to merge two files together using:

    combinedfiles <- merge(comb1,comb2,by=c("Place","Stall","Menu"))

comb1 is about 2 million + rows (158MB) and comb2 is about 600K+ rows (52MB). When I try to merge using the above syntax I get the error:

    Error in merge.data.frame(comb1, comb2, by = c("Place","Stall","Menu")) :
      negative length vectors are not allowed

Is there something that I'm doing wrong? I've merged larger files together in the past without a problem so am curious what might be the problem here... Thanks in advance! ~K
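A common cause of this message, as far as I know, is that the key columns are not unique in either table, so the merge is many-to-many and the number of result rows overflows what an integer can hold. The sketch below, on invented toy tables, estimates the size of the merged result before merging; the object names are hypothetical.

```r
# Toy tables with repeated key combinations (invented for the example).
comb1 <- data.frame(Place = c("A", "A", "B"), Stall = 1, Menu = "x", v1 = 1:3)
comb2 <- data.frame(Place = c("A", "A", "A", "B"), Stall = 1, Menu = "x", v2 = 1:4)
keys  <- c("Place", "Stall", "Menu")

# Collapse the key columns into one string per row and count each combination.
key1 <- do.call(paste, c(comb1[keys], sep = "\r"))
key2 <- do.call(paste, c(comb2[keys], sep = "\r"))
n1 <- table(key1)
n2 <- table(key2)

# Each shared key contributes (count in comb1) * (count in comb2) rows.
common <- intersect(names(n1), names(n2))
expected_rows <- sum(as.numeric(n1[common]) * as.numeric(n2[common]))

merged <- merge(comb1, comb2, by = keys)
```

If `expected_rows` on the real data is far larger than either input, de-duplicating the keys (or adding more key columns) before merging is probably the place to start.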
[R] Using reduce to merge multiple files
I have a list of files that I have called like so:

    main_dir <- '/path/to/files/'
    directories <- list.files(main_dir, pattern = '[[:alnum:]]', full.names=T)
    filenames <- list.files(file.path(directories,"/tmpdir/"),
                            pattern = '[[:alnum:][:punct:]]_eat.txt+$',
                            recursive = TRUE, full.names=T)

This lists around 35 files. Each has multiple columns but they all have three columns in common: Burger, Stall and Cost, which I want to merge on using:

    m1 <- Reduce(function(a, b) { merge(a, b, by=c("Burger","Stall","Cost")) }, filenames)

However, I get the error:

    Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

Is there something that I have obviously overlooked here? Thanks in advance!
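One thing that stands out is that `filenames` is a character vector of paths, so `merge()` is being handed strings rather than data frames. A minimal sketch of reading the files first and then reducing over the resulting list follows; the temporary files, the `sales_*` columns and the tab separator are assumptions made up for the example.

```r
# Write three small tab-delimited files to a temporary directory (toy data).
tmp <- tempfile("eat_")
dir.create(tmp)
for (i in 1:3) {
  d <- data.frame(Burger = c("beef", "veg"), Stall = c(1, 2), Cost = c(5, 4))
  d[[paste0("sales_", i)]] <- c(10, 20) * i   # one file-specific column
  write.table(d, file.path(tmp, paste0("shop", i, "_eat.txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

filenames <- list.files(tmp, pattern = "_eat\\.txt$", full.names = TRUE)

# Read every file into a data frame, then merge the list pairwise.
tables <- lapply(filenames, read.table, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)
m1 <- Reduce(function(a, b) merge(a, b, by = c("Burger", "Stall", "Cost")),
             tables)
```

If the non-key columns share names across files, `merge()` will append suffixes such as `.x` and `.y`, so renaming them per file beforehand may be worthwhile.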
[R] Adding segments to a dot plot in ggplot2
I'm trying to plot a GWAS (if you will) with lined segments representing an overall p-value for each gene. Here is my code:

    skatg <- ggplot(comm, aes(x = position,y = p, colour = grey)) +
        geom_point(size = 0.75) +
        geom_segment(data=rare, aes(x = txStart, y = -log10(p), xend=txEnd,
                                    yend = -log10(p), colour = darkgreen)) +
        labs(x = "Position",y = "-log10 P value") +
        facet_wrap(~ Chrom, scales = "free", ncol = 4)

Where comm is a file with 250k+ variants and genes.in.locus is a file with about 18k genes. When running this script, I get the error:

    Don't know how to automatically pick scale for object of type function. Defaulting to continuous
    Error in data.frame(x = c(40840353L, 31902418L, 19468080L, 236748505L, :
      arguments imply differing number of rows: 79746, 0

Is this because there are different numbers of rows in each data frame I'm trying to plot? If so, what is the best way to overcome this error? Example of my data is as follows:

comm:

        Name        gene         Chrom  position   p
    1   rs1037      FAM114A1     4      38924330   0.7513597
    2   rs1250      CC2D2A       4      15482477   0.9202882
    4   rs1911      USP38        4      144136193  0.8335902
    5   rs10001     STXBP2       19     7711221    0.4709547
    7   rs10001370  USP46        4      53463730   0.8759828
    8   rs1000152   ZNF462       9      109687288  0.3451001
    10  rs10002583  POLN         4      2194953    0.7878575
    12  rs10002971  EGF          4      110896050  0.5082255
    15  rs10003873  SORBS2       4      186605868  0.2309855
    16  rs10003909  ARHGAP24     4      86915848   0.8714853
    17  rs10003947  ANXA3        4      79512800   0.5141532
    18  rs10004     SSR1         6      7310259    0.6851725
    20  rs10004136  STX18        4      4463587    0.5296092
    21  rs10004516  ENPEP        4      111398208  0.8564897
    22  rs1000521   SLC8A3       14     70522484   0.6234326
    23  rs10005849  DCHS2        4      155287317  0.8192577
    24  rs10006362  RGS12        4      3319271    0.8061674
    25  rs1000640   WWP2         6      69905668   0.2682735
    26  rs10006580  PCDH18       4      138449812  0.5178650
    27  rs10006676  CYTL1        4      5021086    0.3531493
    28  rs10006845  PCDH7        4      31116375   0.4817453
    29  rs10007075  NEIL3        4      178274694  0.5433481
    31  rs10008636  TMPRSS11BNL  4      69083563   0.8346434
    32  rs10008910  UBA6         4      68500171   0.5705853
    33  rs10009228  CHRNA9       4      40356422   0.4223378

rare:

           geneName  txStart  txEnd   Chrom  position  p
    36131  YTHDC1    6026     45746   4      6026      0.5009490
    10898  FAM110C   38813    46588   19     38813     1.000
    37306  ZNF595    53178    88099   4      53178     0.1261045
    16450  KIR2DL4   57208    68123   19     57208     0.156
    28406  SCAND3    61610    77316   6      61610     0.2568
    19926  MPG       127017   135850  6      127017    00.000987456
    34149  TRIM27    174179   195169  6      174179    0.025698

I haven't included all information here. Any help will be greatly appreciated. Thanks!
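A possible reading of the error: `colour = grey` inside `aes()` refers to the base R function `grey`, not a colour name, and the segment layer also inherits the point layer's mappings. The sketch below, on a few invented rows, sets constant colours outside `aes()` as quoted strings and gives the segment layer its own mappings with `inherit.aes = FALSE`. Plotting the points on the `-log10(p)` scale is an assumption based on the axis label in the question.

```r
library(ggplot2)

# Invented miniature versions of the two tables in the question.
comm <- data.frame(Chrom = c(4, 4, 4, 19),
                   position = c(100, 200, 300, 150),
                   p = c(0.75, 0.92, 0.05, 0.47))
rare <- data.frame(Chrom = c(4, 19),
                   txStart = c(120, 100),
                   txEnd = c(280, 180),
                   p = c(0.50, 0.13))

skatg <- ggplot(comm, aes(x = position, y = -log10(p))) +
  # Constant colour: a quoted string outside aes().
  geom_point(colour = "grey", size = 0.75) +
  # Segment layer uses its own data and mappings only.
  geom_segment(data = rare,
               aes(x = txStart, xend = txEnd,
                   y = -log10(p), yend = -log10(p)),
               colour = "darkgreen", inherit.aes = FALSE) +
  labs(x = "Position", y = "-log10 P value") +
  facet_wrap(~ Chrom, scales = "free", ncol = 4)
```

Different row counts in the two data frames are fine in themselves, since each layer can carry its own `data` argument.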
[R] Mean of colMeans
Hi All, I've successfully gotten out the colMeans for 60 columns using:

    col <- colMeans(x, na.rm = TRUE, dims = 1)

My next question is: is there a way of getting a mean of all the column means (ie a mean of a mean)? Thanks!
Re: [R] Mean of colMeans
Thanks for the explanation. And tip... this was quick and dirty code so I didn't really think about naming something that is already a function in R. Data was generic - just a bunch of columns with numbers so didn't bother including that as I know that wasn't the problem. Same goes with replying - automatically went to reply, will remember to reply-all.

On Wed, May 21, 2014 at 3:11 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

That would be because col is a function in base R, and thus a poor choice of names for user objects. Nonetheless, it worked when I ran it, but you didn't provide a reproducible example so who knows.

    R> set.seed(1)
    R> x <- data.frame(matrix(runif(150), ncol=10))
    R> # col is a function, so not a good name
    R> col <- colMeans(x)
    R> mean(col)
    [1] 0.5119

It's polite to include the list on your reply. Sarah

On Wed, May 21, 2014 at 2:50 PM, Kate Ignatius kate.ignat...@gmail.com wrote:

That didn't work: gave me the error

    [1] NA
    Warning message:
    In mean.default(col) : argument is not numeric or logical: returning NA

But writing it like mean(colMeans(x, na.rm = TRUE, dims = 1)) worked. Thanks!

On Wed, May 21, 2014 at 2:31 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

Is mean(col) not what you're looking for? Sarah

On Wed, May 21, 2014 at 2:26 PM, Kate Ignatius kate.ignat...@gmail.com wrote:

Hi All, I've successfully gotten out the colMeans for 60 columns using:

    col <- colMeans(x, na.rm = TRUE, dims = 1)

My next question is: is there a way of getting a mean of all the column means (ie a mean of a mean)? Thanks!

-- Sarah Goslee http://www.functionaldiversity.org
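To illustrate the point about naming, and one thing worth keeping in mind with a mean of means, here is a short sketch on invented data. The name `col_means` is simply a suggestion that avoids masking the base function `col`.

```r
# Invented data with unevenly distributed missing values.
x <- data.frame(a = c(1, 2, 3), b = c(10, NA, NA))

col_means <- colMeans(x, na.rm = TRUE)          # a = 2, b = 10
mean_of_means <- mean(col_means)                # (2 + 10) / 2 = 6

# The mean over all individual values weights each observation equally instead.
grand_mean <- mean(as.matrix(x), na.rm = TRUE)  # (1 + 2 + 3 + 10) / 4 = 4
```

The two summaries agree only when every column has the same number of non-missing values, so which one is wanted depends on the question being asked.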
[R] Colour of geom_hline is not correct in legend
I've used geom_point and geom_hline in ggplot2 and have gotten satisfactory legends for both. However, I have one black line and one blue line in the figure but in the legend they are both black - how can I correct this in the legend to be the right colors?

    mcgc <- ggplot(sam, aes(x = m,y = ad, colour = X)) +
        geom_point(size = 0.75) +
        scale_colour_gradient2(high="red", mid="green", limits=c(0,1), guide = "colourbar") +
        geom_hline(aes(yintercept = mad, linetype = "mad"), colour = "blue", size=0.75, show_guide = TRUE) +
        geom_hline(aes(yintercept = mmad, linetype = "mmad"), colour = "black", size=0.75, show_guide = TRUE) +
        facet_wrap(~ Plan, scales = "free", ncol = 4) +
        scale_linetype_manual(name = "Plan of Health Care", values = c("mad" = 1, "mmad" = 1), guide = "legend")

I'm sure I've overwritten something here... just not sure where (am new to ggplot).

Data:

    Plan  ad  X    m         mad       mmad
    1     1   95   0.323000  0.400303  0.12
    1     2   275  0.341818  0.400303  0.12
    1     3   2    0.618000  0.400303  0.12
    1     4   75   0.32      0.400303  0.12
    1     5   13   0.399000  0.400303  0.12
    1     6   20   0.40      0.400303  0.12
    2     1   219  0.393000  0.353350  0.45
    2     2   50   0.06      0.353350  0.45
    2     3   213  0.39      0.353350  0.45
    2     4   204  0.496100  0.353350  0.45
    2     5   19   0.393000  0.353350  0.45
    2     6   201  0.388000  0.353350  0.45

Plan goes up to 40, but I've only included a snippet of data here...
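Because the line colours are set as constants rather than mapped, the linetype legend has no way of knowing them. One approach that may help is to override the legend keys' colour through `guide_legend(override.aes = ...)`. This is a sketch on invented data, with the `size` and `show_guide` arguments left out so it does not depend on a particular ggplot2 version.

```r
library(ggplot2)

# Invented miniature data set with the same column names as in the question.
sam <- data.frame(Plan = rep(1:2, each = 3),
                  m    = c(0.32, 0.34, 0.62, 0.39, 0.06, 0.50),
                  ad   = c(0.10, 0.30, 0.50, 0.20, 0.40, 0.60),
                  X    = c(0.2, 0.5, 0.9, 0.1, 0.4, 0.8),
                  mad  = rep(c(0.40, 0.35), each = 3),
                  mmad = rep(c(0.12, 0.45), each = 3))

mcgc <- ggplot(sam, aes(x = m, y = ad, colour = X)) +
  geom_point() +
  geom_hline(aes(yintercept = mad, linetype = "mad"), colour = "blue") +
  geom_hline(aes(yintercept = mmad, linetype = "mmad"), colour = "black") +
  facet_wrap(~ Plan, scales = "free", ncol = 4) +
  scale_linetype_manual(name = "Plan of Health Care",
                        values = c(mad = 1, mmad = 1)) +
  # Legend keys follow the order of the linetype breaks: "mad", then "mmad".
  guides(linetype = guide_legend(override.aes = list(colour = c("blue", "black"))))
```

The override only changes how the legend keys are drawn; the lines in the panels keep the colours given in each `geom_hline()` call.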
[R] Manipulating x axis using scale_x_continuous (but a factor is used). Is there a work around?
The code that I've used is:

    mcgc <- ggplot(sam, aes(x = person,y = m, colour = X)) +
        geom_point(size = 0.75) +
        scale_colour_gradient2(high="red", mid="green", limits=c(0,1), guide = "colourbar") +
        geom_hline(aes(yintercept = mad, linetype = "mad"), colour = "blue", size=0.75, show_guide = TRUE) +
        geom_hline(aes(yintercept = mmad, linetype = "mmad"), colour = "black", size=0.75, show_guide = TRUE) +
        facet_wrap(~ Plan, scales = "free", ncol = 4) +
        scale_linetype_manual(name = "Plan of Health Care", values = c("mad" = 1, "mmad" = 1), guide = "legend")

For this data:

    Plan  Person  X    m         mad       mmad
    1     1       95   0.323000  0.400303  0.12
    1     2       275  0.341818  0.400303  0.12
    1     3       2    0.618000  0.400303  0.12
    1     4       75   0.32      0.400303  0.12
    1     5       13   0.399000  0.400303  0.12
    1     6       20   0.40      0.400303  0.12
    2     7       219  0.393000  0.353350  0.45
    2     8       50   0.06      0.353350  0.45
    2     9       213  0.39      0.353350  0.45
    2     15      204  0.496100  0.353350  0.45
    2     19      19   0.393000  0.353350  0.45
    2     24      201  0.388000  0.353350  0.45
    3     30      219  0.567     0.1254    0.89
    3     14      50   0.679     0.1254    0.89
    3     55      213  0.1234    0.1254    0.89
    3     18      204  0.6135    0.1254    0.89
    3     59      19   0.39356   0.1254    0.89
    3     101     201  0.300     0.1254    0.89

I'm trying to manipulate the x axis using the following, only because the data can get very large and there are just too many Persons to fit on the x-axis, so I need to reduce it to keep it legible:

    scale_x_continuous(breaks = c(min(person), median(person), max(person)),
                       labels = c(min(person), median(person), max(person)))

However, given that I had to change `person` into a factor to order the data properly, the above code does not work. I get the following errors, depending on how I fiddle around with the code:

    Error: Discrete value supplied to continuous scale

    Error in Summary.factor(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, :
      min not meaningful for factors

Changing `person` to numeric does not work, as the accumulated `person` for the entire dataset will then be on each Plan figure panel, as opposed to the scale specific for each Plan. That is, the x-axis for each panel (Plan) should have a scale beginning from its lowest Person to its highest Person (ie Plan 1 should have an x-axis that goes from 1 to 6 but Plan 3 has one that goes from 14 to 101). Changing the Person to numeric, the x-axis for all panels starts at 1 and goes to 101. Is there a workaround for this?
[R] Setting alternative x-axis breaks using ggplot2
I'm not doing a Manhattan plot, but plotting AD (coloured by DP) along the genome:

    points <- ggplot(sam,aes(x = midpoint,y = ad, colour = dp, size = 3)) +
        geom_point() +
        scale_y_continuous(breaks=c(0,20,30,40)) +
        labs(x = "chr",y = "ad") +
        scale_colour_gradient2(high="red", mid="green")

However, instead of having the BP position along the bottom, I was wondering whether it's possible to have the chromosome instead. Is there an easier way to do this? I'm also trying to reduce the size of the points on the plot, but changing the size in the code does not work. Thanks!
[R] Recoding in R conditioned on a certain value.
I'm trying to work out the average of a certain value by chromosome. I've done the following, but it doesn't seem to work. Say I want to find the average AD for chromosome 1 only and paste the value next to all the positions on chromosome 1:

    sam$mmad[sam$chrom == '1'] <- (sam$ad)/(colSums(sam[c(1:nrow(sam$chrom=='1'))],))

I know this is convoluted and possibly wrong... but I would like to do this for all chromosomes. Thanks!
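For a per-group average repeated on every row of the group, base R's `ave()` may be a simpler route than subsetting chromosome by chromosome. A minimal sketch on invented data, reusing the column names `chrom`, `ad` and `mmad` from the question:

```r
# Invented data: two chromosomes with a few AD values each.
sam <- data.frame(chrom = c("1", "1", "2", "2", "2"),
                  ad    = c(10, 20, 3, 6, 9),
                  stringsAsFactors = FALSE)

# ave() returns a vector as long as the input, with each group's mean repeated.
sam$mmad <- ave(sam$ad, sam$chrom, FUN = function(v) mean(v, na.rm = TRUE))

# One value per chromosome, if a summary table is wanted instead.
chrom_means <- tapply(sam$ad, sam$chrom, mean, na.rm = TRUE)
```

This handles all chromosomes in one call, so no loop over chromosome labels is needed.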