[R] Error: unexpected symbol in [with read.table]

2015-06-26 Thread Kate Ignatius
When reading in a tab delimited file using args I keep getting the error:

Error: unexpected symbol in "Name index"

Execution halted

The code is this:

a <- read.table(args[1], sep="\t", header=T, stringsAsFactors=F)

When inputting the file directly, as follows, this produces no errors:

a <- read.table("/path/to/file/filename.txt", header=T, sep="\t",
stringsAsFactors=F)

The file is such:

Name    index
Bob     1
George  2
Dave    3
Eric    4
.
.
.
.
Andrew  20

Is there anything I should be looking out for that might be producing
this error?  Any help will be greatly appreciated.
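
For reference, a minimal sketch of the script side, assuming the script is started as `Rscript script.R /path/to/file/filename.txt` (the script name `script.R` and the helper `read_tab_file` are hypothetical, not from the post). An "unexpected symbol" message that quotes the file's header line is a parse error, which suggests R may have been asked to parse the data file as code instead of reading it as data, so it may be worth checking what is actually passed after `Rscript`.

```r
# Hypothetical helper: read the tab-delimited file named by the first
# command-line argument. `args` is what commandArgs(trailingOnly = TRUE)
# returns inside a script started with Rscript.
read_tab_file <- function(args) {
  if (length(args) < 1) stop("usage: Rscript script.R <file>")
  if (!file.exists(args[1])) stop("no such file: ", args[1])
  read.table(args[1], sep = "\t", header = TRUE, stringsAsFactors = FALSE)
}

# Self-contained demonstration: a temporary file stands in for the real one
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name\tindex", "Bob\t1", "George\t2", "Dave\t3"), tmp)
a <- read_tab_file(tmp)
print(a)

# In the real script the call would be:
# a <- read_tab_file(commandArgs(trailingOnly = TRUE))
```

Checking `length(args)` and `file.exists()` first gives a readable message when the shell script passes the wrong thing.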

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: unexpected symbol in [with read.table]

2015-06-26 Thread Kate Ignatius
Oops - error on my part.  Sorry.

On Fri, Jun 26, 2015 at 2:54 PM, Bert Gunter bgunter.4...@gmail.com wrote:
 ... and you should also know by now to cc the list and not respond just to me!
 Bert Gunter

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
-- Clifford Stoll


 On Fri, Jun 26, 2015 at 10:58 AM, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
 reading in a tab delimited file using args

 What I mean by that is that I'm using a bash script to call in an R
 script and using the command: args <- commandArgs(TRUE) in my R
 script.

 In my shell script I'm calling the R program as follows:
 /path/to/R/R-3.0.2/bin/Rscript

 I'm not sure if that will help - sure you will all know if it doesn't.

 K.

 On Fri, Jun 26, 2015 at 1:39 PM, Bert Gunter bgunter.4...@gmail.com wrote:
 ??
 Are you expecting us to guess what your code was from

 reading in a tab delimited file using args ?

 You've posted here before and should know by now that explicit code
 should be provided whenever possible.


 Cheers,
 Bert
 Bert Gunter

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
-- Clifford Stoll




Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Kate Ignatius
I found this helpful.  However - the second to fourth columns come out
all zero - was this the intention?

That is:

X0001 0 0 0  2  1 BYX859
X0001 0 0 0  1  1 BYX894
X0001 0 0 0  2  2 BYX862
X0001 0 0 0  2  2 BYX863
X0001 0 0 0  2  2 BYX864
X0001 0 0 0  2  2 BYX865

On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:
 match() will do what you want.  E.g., run your data through
 the following function.

 f <- function (data)
 {
     uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
     uniqStrings <- setdiff(uniqStrings, "0")
     for (j in 2:4) {
         data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
     }
     data
 }
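
If the ID columns were read in as factors (an assumption about how the pedigree file was imported), the function above can return zeros in columns 2 to 4 on older versions of R, where `c()` applied to factors yields integer codes instead of labels. A hedged variant that converts those columns to character first; the name `f_chr` and the way the sample rows are read are illustrative only:

```r
# Variant of the function above that coerces columns 2-4 to character
# before matching, so it does not depend on how the file was read in.
f_chr <- function(data) {
  data[2:4] <- lapply(data[2:4], as.character)
  uniqStrings <- unique(c(data[[2]], data[[3]], data[[4]]))
  uniqStrings <- setdiff(uniqStrings, "0")
  for (j in 2:4) {
    data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
  }
  data
}

# Sample pedigree rows from the thread, deliberately read as factors
ped <- read.table(text = "X0001 BYX859 0 0 2 1 BYX859
X0001 BYX894 0 0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865", stringsAsFactors = TRUE)

res <- f_chr(ped)
print(res)
```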



 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com



[R] Converting unique strings to unique numbers

2015-05-29 Thread Kate Ignatius
I have a pedigree file as so:

X0001 BYX859  0  0  2  1 BYX859
X0001 BYX894  0  0  1  1 BYX894
X0001 BYX862 BYX894 BYX859  2  2 BYX862
X0001 BYX863 BYX894 BYX859  2  2 BYX863
X0001 BYX864 BYX894 BYX859  2  2 BYX864
X0001 BYX865 BYX894 BYX859  2  2 BYX865

And I was hoping to change all unique string values to numbers.

That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4.  Essentially I would like the data to look like this:

X0001 1 0 0  2  1 BYX859
X0001 2 0 0  1  1 BYX894
X0001 3 2 1  2  2 BYX862
X0001 4 2 1  2  2 BYX863
X0001 5 2 1  2  2 BYX864
X0001 6 2 1  2  2 BYX865

Is this possible with factors?
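
On the factor question: one way this could work, sketched under the assumption that the IDs in column 2 define the numbering and that "0" means "no parent" (the object names below are illustrative), is to build a factor with explicit levels and take its integer codes:

```r
ids    <- c("BYX859", "BYX894", "BYX862", "BYX863", "BYX864", "BYX865")
father <- c("0", "0", "BYX894", "BYX894", "BYX894", "BYX894")

# The levels fix the numbering: first level -> 1, second -> 2, and so on
id_num     <- as.integer(factor(ids, levels = ids))
father_num <- as.integer(factor(father, levels = ids))

# "0" is not among the levels, so it becomes NA; turn it back into 0
father_num[is.na(father_num)] <- 0L

print(id_num)
print(father_num)
```

Passing `levels` explicitly matters here, because the default is sorted order, which would not reproduce the numbering asked for.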

Thanks!

K.



Re: [R] Error importing data - wrapping?

2015-05-09 Thread Kate Ignatius
I've tried many things:

read.csv("data.frame.txt", header=F, fill=T, stringsAsFactors=FALSE,
sep="\t", colClasses="character")
read.csv2("data.frame.txt", fill=T, stringsAsFactors=FALSE, sep="\t",
as.is=T, colClasses="character")

also with read.delim/2

read.table("data.frame.txt", header=F, fill=T, stringsAsFactors=FALSE,
sep="\t", colClasses="character")

And a combination of various different options.

On Sat, May 9, 2015 at 11:11 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 There are many ways to import data into R, and I don't know any of them that 
 would do what you are describing. You really need to give us some 
 reproducible code if we are to follow along with your problem.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.



[R] Error importing data - wrapping?

2015-05-09 Thread Kate Ignatius
I have some data that I'm having trouble importing...

A B C D E
A 1232 0.565
B 2323 0.5656 0.5656 0.5656
C 2323 0.5656
D 2323 0.5656
E 2323 0.5656
F 2323 0.5656
G 2323 0.5656
G 2323 0.5656 0.5656 0.5656

When I input the data it seems to go like this:

SampleID ItemB ItemC ItemD ItemE
A 1232 0.565
B 2323 0.5656
0.5656 0.5656
C 2323 0.5656
D 2323 0.5656
E 2323 0.5656
F 2323 0.5656
G 2323 0.5656
G 2323 0.5656 0.5656 0.5656

with the last two columns (or the two columns with vast amounts of
missing data, which are usually the last two; see sample B) wrapping
around - is there a way to prevent this?

Thanks!



Re: [R] Error importing data - wrapping?

2015-05-09 Thread Kate Ignatius
I've tried colClasses="character", fill=T, as.is=T, header=F,
sep="\t", and read.csv, read.delim, read.csv2, read.delim2. I don't know
what else to try.

On Sat, May 9, 2015 at 11:13 AM, MacQueen, Don macque...@llnl.gov wrote:
 Some indication of what you have tried would be useful. Assuming you are
 using read.table(), then the fill argument of read.table() might be what
 you need. If you look at the help for read.table you will find:

 From ?read.table:
fill: logical. If 'TRUE' then in case the rows have unequal length,
   blank fields are implicitly added.  See 'Details'.
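
A possible reason `fill` alone does not help: per `?read.table`, the number of columns is determined from the first five lines of input unless `col.names` is specified and longer, so a wider row further down can spill onto an extra line. A sketch of a workaround, assuming a tab-separated file without a header line (the temporary file and the `V1` to `V5` names are made up for illustration):

```r
# Build a small tab-separated file whose widest row is not in the first five lines
tmp <- tempfile(fileext = ".txt")
writeLines(c("A\t1232\t0.565",
             "C\t2323\t0.5656",
             "D\t2323\t0.5656",
             "E\t2323\t0.5656",
             "F\t2323\t0.5656",
             "G\t2323\t0.5656",
             "B\t2323\t0.5656\t0.5656\t0.5656"), tmp)

# Count the fields on every line and size the table for the widest one
n_col <- max(count.fields(tmp, sep = "\t"))

dat <- read.table(tmp, header = FALSE, sep = "\t", fill = TRUE,
                  col.names = paste0("V", seq_len(n_col)),
                  stringsAsFactors = FALSE)
print(dat)
```

`count.fields()` scans the whole file, so this costs one extra pass over the data before the actual read.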


 --
 Don MacQueen

 Lawrence Livermore National Laboratory
 7000 East Ave., L-627
 Livermore, CA 94550
 925-423-1062







[R] Grep out columns using a list of strings

2015-05-08 Thread Kate Ignatius
Hi,

I have a list of 150 strings, say, ap:

aajkss
dfghjk
sdfghk
...
xxcvvn


And I would like to grep out these strings from the column names in
another file, af.  I've tried the following but none seem to work:

aps <- af[,grep(ap, colnames(af), value=TRUE)]
aps <- af[,grep(ap, colnames(af), value=FIXED)]
aps <- af[,grep(as.character(list(ap),colnames(af))]

and also aps <- unique (grep(ap, colnames(af))

Is there another way I can do this - maybe without using grep?
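
One option that avoids `grep` entirely, assuming the strings in `ap` should match column names exactly, is `%in%`. If partial matches are wanted, the strings can be collapsed into a single alternation pattern, since `grep` uses only the first element when it is given a vector of patterns. The toy `ap` and `af` below are stand-ins for the real objects:

```r
ap <- c("aajkss", "dfghjk", "sdfghk")
af <- data.frame(aajkss = 1:2, other = 3:4, sdfghk_2 = 5:6, dfghjk = 7:8)

# Exact matches: keep columns whose names appear in ap
aps_exact <- af[, colnames(af) %in% ap, drop = FALSE]

# Partial matches: one regular expression of the form "a|b|c"
# (assumes the strings contain no regex metacharacters)
pattern <- paste(ap, collapse = "|")
aps_partial <- af[, grepl(pattern, colnames(af)), drop = FALSE]

print(colnames(aps_exact))
print(colnames(aps_partial))
```

`drop = FALSE` keeps the result a data frame even when only one column matches.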

Thanks!

Kate.



[R] Summing certain values within columns that satisfy a certain condition

2015-02-26 Thread Kate Ignatius
Hi,

Suppose I had a data frame like so:

A B C D
0 1 0 7
0 2 0 7
0 3 0 7
0 4 0 7
0 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 5
0 5 1 5
0 4 1 5
0 8 4 7
0 0 3 0
0 0 3 4
0 0 3 4
0 0 0 5
0 2 0 6
0 0 4 0
0 0 4 0
0 0 4 0

For each row, I want to count how many max column values appear to
eventually get the following outcome, while ignoring zeros and NAs:

A B C D Sum
0 1 0 7 1
0 2 0 7 1
0 3 0 7 1
0 4 0 7 1
0 1 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 1 5 0
0 5 1 5 0
0 4 1 5 0
0 8 4 7 3
0 0 3 0 0
0 0 3 4 0
0 0 3 4 0
0 0 0 5 0
0 2 0 6 0
0 0 4 0 1
0 0 4 0 1
0 0 4 0 1

I've used the following code but it doesn't seem to work (my sum
column is all 1s):

apply(df, 1, function(x) sum(x %in% c(pmax(x))))

Is this code too simple?
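
Reading the desired output as "for each row, count the non-zero entries that equal the maximum of their own column" (an interpretation of the example table, not something stated in the post), a sketch could look like the following. Note that `pmax(x)` with a single argument simply returns `x`, so it does not compute a maximum.

```r
df <- data.frame(
  A = rep(0, 20),
  B = c(1, 2, 3, 4, 1, 0, 0, 0, 0, 5, 4, 8, 0, 0, 0, 0, 2, 0, 0, 0),
  C = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 4, 3, 3, 3, 0, 0, 4, 4, 4),
  D = c(7, 7, 7, 7, 0, 0, 0, 0, 5, 5, 5, 7, 0, 4, 4, 5, 6, 0, 0, 0)
)

m <- as.matrix(df)
col_max <- apply(m, 2, max, na.rm = TRUE)     # per-column maxima
is_max  <- sweep(m, 2, col_max, FUN = "==")   # TRUE where a cell equals its column max

# Count per row, ignoring zeros (and NAs via na.rm)
df$Sum <- rowSums(is_max & m != 0, na.rm = TRUE)
print(df)
```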

Thanks!

K.



[R] grepping out columns

2015-02-18 Thread Kate Ignatius
Hi,

I've got a complicated grep problem (or not)...  I currently have a
file with the headings as follows:

DAY
MONTH
YEAR
SA_TUES
SA_MON
SU_WED
CH_TUES
CH_WED
CH_MON
AR_TUES
AR_WED
AR_MON
SA_THUR
SU_FRI
CH_THUR
CH_FRI
AR_THUR
AR_FRI

I want to grep out all columns that have SA at the beginning of their
day including any other information pertaining to that day.
Ultimately I want to end up with:

SA_TUES
SA_MON
CH_TUES
CH_MON
AR_TUES
AR_MON
SA_THUR
CH_THUR
AR_THUR

Is there a way of doing this simply with grep? Or will this need to be
more complicated?

Thanks!

K.



Re: [R] grepping out columns

2015-02-18 Thread Kate Ignatius
Thanks!  That was helpful.  Although I think there was a typo in the last line:

selected <- sort(unique(unlist(all_ind)))

but I figured it out :)

K.

On Wed, Feb 18, 2015 at 4:10 PM, Federico Lasa fel...@gmail.com wrote:
 David's almost works except it catches the MONTH column, just add an
 empty metacharacter tho.

 c("DAY",
 "MONTH",
 "YEAR",
 "SA_TUES",
 "SA_MON",
 "SU_WED",
 "CH_TUES",
 "CH_WED",
 "CH_MON",
 "AR_TUES",
 "AR_WED",
 "AR_MON",
 "SA_THUR",
 "SU_FRI",
 "CH_THUR",
 "CH_FRI",
 "AR_THUR",
 "AR_FRI") -> columns

 sa_ind <- grep("SA_", columns)
 days <- gsub("SA_", "", columns[sa_ind])
 days <- paste0(days, "$")
 selected <- lapply(days, function(x) grep(x, columns))
 selected <- sort(unique(unlist(all_ind)))

 columns[selected]
 [1] "SA_TUES" "SA_MON"  "CH_TUES" "CH_MON"  "AR_TUES" "AR_MON"
 [7] "SA_THUR" "CH_THUR" "AR_THUR"
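
A self-contained version of the same idea, written as a sketch with one combined pattern instead of a loop (the names `days` and `pattern` are illustrative): take the day suffixes of the `SA_` columns, then keep every column that ends in `_` plus one of those days. Anchoring on `_` and `$` keeps MONTH out of the result.

```r
columns <- c("DAY", "MONTH", "YEAR", "SA_TUES", "SA_MON", "SU_WED",
             "CH_TUES", "CH_WED", "CH_MON", "AR_TUES", "AR_WED", "AR_MON",
             "SA_THUR", "SU_FRI", "CH_THUR", "CH_FRI", "AR_THUR", "AR_FRI")

# Days that occur with the SA_ prefix
days <- sub("^SA_", "", grep("^SA_", columns, value = TRUE))

# One pattern of the form "_(DAY1|DAY2|...)$"
pattern <- paste0("_(", paste(days, collapse = "|"), ")$")
selected <- columns[grepl(pattern, columns)]
print(selected)
```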

 On Wed, Feb 18, 2015 at 2:55 PM, David Winsemius dwinsem...@comcast.net 
 wrote:

 On Feb 18, 2015, at 12:27 PM, Kate Ignatius wrote:
 Hi,

 I've got a complicated grep problem (or not)...  I currently have a
 file with the headings as follows:

 Lets assume these values are in a character vector named 'dat'.
 SA_TUES
 SA_MON
 SU_WED
 CH_TUES
 CH_WED
 CH_MON
 AR_TUES
 AR_WED
 AR_MON
 SA_THUR
 SU_FRI
 CH_THUR
 CH_FRI
 AR_THUR
 AR_FRI

  sadays <- dat[grep("SA", dat) ]
  sads <- gsub("SA_", "", sadays)
  sads
 #[1] "TUES" "MON"  "THUR"

  dat[ sapply(sads, grep, dat) ]
 #[1] "SA_TUES" "CH_TUES" "AR_TUES" "SA_MON"  "CH_MON"  "AR_MON"
 #[7] "SA_THUR" "CH_THUR" "AR_THUR"

 --
 David Winsemius
 Alameda, CA, USA



[R] Paste every two columns together

2015-01-28 Thread Kate Ignatius
I have genetic data as follows (simple example, actual data is much larger):

comb =

ID1 A A T G C T G C G T C G T A

ID2 G C T G C C T G C T G T T T

And I wish to get an output like this:

ID1 AA TG CT GC GT CG TA

ID2 GC TG CC TG CT GT TT

That is, paste every two columns together.

I have this code, but I get the error:

Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1

conc <- function(x) {
  s <- seq(2, nchar(x), 2)
  paste0(x[s], x[s+1])
}

combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)

Thanks in advance!
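
The error most likely arises because `nchar(x)` returns one value per element of `x`, while `seq()` needs a single `to` value. A sketch that pairs columns by position instead, assuming the first column is the ID and an even number of allele columns follows (`V1`, `V2`, ... are the default names from `read.table`; `combined` is an illustrative name):

```r
comb <- read.table(text = "ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T",
                   colClasses = "character")  # keep "T" from being read as logical TRUE

first <- seq(2, ncol(comb), by = 2)           # columns 2, 4, 6, ...

# Paste each column in `first` to its right-hand neighbour, row by row
pairs <- mapply(paste0, comb[first], comb[first + 1])

combined <- data.frame(ID = comb[[1]], pairs, stringsAsFactors = FALSE)
print(combined)
```

With a single-row input `mapply` would simplify to a vector instead of a matrix, so real data with one sample would need `SIMPLIFY = FALSE` or similar handling.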



Re: [R] rle with data.table - is it possible?

2015-01-03 Thread Kate Ignatius
, 2015, at 12:07 AM, Kate Ignatius wrote:

 Ah, crap.  Yep you're right.  This is not going too well. Okay - let
 me try that again:

 x$childseg <- 0
 x <- x$sumchild != 0

 That previous line would appear to overwrite the entire dataframe with the
 value of one vector.

 span <- rle(x)$lengths[rle(x)$values==TRUE]
 x$childseg[x] <- rep(seq_along(span), times = span)

 Does this one have any errors?

 Even assuming that the code from Jeff Newmiller is creating those objects, I
 get:

 x$childseg[x] <- rep(seq_along(span), times = span)

 Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors

 In the last line you are indexing a vector with a dataframe (or perhaps a
 data.table).

 If we use Newmiller's object and then change some of the instances of x in
 your code to DT we get:

 DT$childseg <- 0
 x <- DT$sumchild != 0  # Try not to overwrite your data-objects
 span <- rle(x)$lengths[rle(x)$values==TRUE]
 DT$childseg[x] <- rep(seq_along(span), times = span)
 DT

     Dad Mum Child Group sumdad summum sumchild childseg
  1:  AA  RR    RA     A      2      2        0        0
  2:  AA  RR    RR     A      2      2        1        1
  3:  AA  AA    AA     B      4      5        5        1
  4:  AA  AA    AA     B      4      5        5        1
  5:  RA  AA    RR     B      0      5        5        1
  6:  RR  AA    RR     B      4      5        5        1
  7:  AA  AA    AA     B      4      5        5        1
  8:  AA  AA    RA     C      3      3        0        0
  9:  AA  AA    RA     C      3      3        0        0
 10:  AA  RR    RA     C      3      3        0        0

 You persist in posting code where you do not explain what you are trying to
 do with it. You have already been told that your earlier efforts using `rle`
 did not make any sense. Post a complete example and then explain what you
 desire as an object. It's often helpful to provide a scientific background
 for what the data represents.

 --
 David.




Re: [R] rle with data.table - is it possible?

2015-01-02 Thread Kate Ignatius
Ah, crap.  Yep you're right.  This is not going too well. Okay - let
me try that again:

x$childseg <- 0
x <- x$sumchild != 0
span <- rle(x)$lengths[rle(x)$values==TRUE]
x$childseg[x] <- rep(seq_along(span), times = span)

Does this one have any errors?
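
One thing to watch in the snippet above is that `x` is reassigned to a logical vector in the second line, so the later `x$childseg[x]` no longer refers to a data frame. A minimal sketch that keeps the data and the mask in separate objects; the toy data frame `dat` and its `sumchild` values are assumptions chosen to resemble the thread's example:

```r
# Toy data frame standing in for the real one
dat <- data.frame(sumchild = c(0, 1, 5, 5, 5, 5, 5, 0, 0, 0))

nonzero <- dat$sumchild != 0          # logical mask, kept separate from dat
runs <- rle(nonzero)
span <- runs$lengths[runs$values]     # lengths of the TRUE runs only

# Number each run of non-zero rows 1, 2, ...; rows outside a run stay 0
dat$childseg <- 0L
dat$childseg[nonzero] <- rep(seq_along(span), times = span)
print(dat)
```

Calling `rle()` once and reusing the result also avoids computing the run lengths twice.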


On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Jan 1, 2015, at 5:07 PM, Kate Ignatius kate.ignat...@gmail.com wrote:

 Apologies - mix up of syntax all over the place, a habit of mine.  The
 last line was in there because of code beforehand so it really doesn't
 need to be there.  Here is the proper code I hope:

 childseg <- 0
 x <- sumchild == 0
 span <- rle(x)$lengths[rle(x)$values==TRUE]
 childseg[x] <- rep(seq_along(span), times = span)


 This remains not reproducible. We have no idea what sumchild might be and the 
 code throws an error. My guess is that you are trying to get a result such as 
 would be delivered by:

 childseg <- sumchild[ sumchild != 0 ]

 --
 David.



Re: [R] rle with data.table - is it possible?

2015-01-01 Thread Kate Ignatius
Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild -0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to do this code by Group for mum, dad and
child.  The problem I'm having is with the
span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
be added to data.table.

[Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 I do not understand the value of using the rle function in your description,
 but the code below appears to produce the table you want.

 Note that better support for the data.table package might be found at
 stackexchange as the documentation specifies.

 x <- read.table( text=
 "Dad Mum Child Group
 AA RR RA A
 AA RR RR A
 AA AA AA B
 AA AA AA B
 RA AA RR B
 RR AA RR B
 AA AA AA B
 AA AA RA C
 AA AA RA C
 AA RR RA C
 ", header=TRUE, stringsAsFactors=FALSE )

 library(data.table)
 DT <- data.table( x )
 DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
 DT[ , sumdad := 0L ]
 DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
 DT[ , cdad := NULL ]
 DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
 DT[ , summum := 0L ]
 DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
 DT[ , cmum := NULL ]
 DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
 DT[ , sumchild := 0L ]
 DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
 DT[ , cchild := NULL ]

 DT

     Dad Mum Child Group sumdad summum sumchild
  1:  AA  RR    RA     A      2      2        0
  2:  AA  RR    RR     A      2      2        1
  3:  AA  AA    AA     B      4      5        5
  4:  AA  AA    AA     B      4      5        5
  5:  RA  AA    RR     B      0      5        5
  6:  RR  AA    RR     B      4      5        5
  7:  AA  AA    AA     B      4      5        5
  8:  AA  AA    RA     C      3      3        0
  9:  AA  AA    RA     C      3      3        0
 10:  AA  RR    RA     C      3      3        0


 On Tue, 30 Dec 2014, Kate Ignatius wrote:

 I'm trying to use both these packages and wondering whether they are
 possible...

 To make this simple, my ultimate goal is to determine long stretches of
 1s, but I want to do this within groups (hence using data.table, as
 I use the set key option).  However, I'm not having much luck
 making this possible.

 For example, for simplicity's sake, I have the following data:

 Dad Mum Child Group
 AA RR RA A
 AA RR RR A
 AA AA AA B
 AA AA AA B
 RA AA RR B
 RR AA RR B
 AA AA AA B
 AA AA RA C
 AA AA RA C
 AA RR RA  C

 And the following code which I know works

 hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
 sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]

 hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
 summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]

 hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
 sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]

 However, I wish to do the above code by Group (though this file is
 millions of rows long and groups will be larger but just wanted to
 simplify the example).

 I did something like this but of course I got an error:

 LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
 LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
 LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
 LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
 LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
 LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

 The reason being as I want to eventually have something like this:

 Dad Mum Child Group sumdad summum sumchild
 AA RR RA A 2 2 0
 AA RR RR A 2 2 1
 AA AA AA B 4 5 5
 AA AA AA B 4 5 5
 RA AA RR B 0 5 5
 RR AA RR B 4 5 5
 AA AA AA B 4 5 5
 AA AA RA C 3 3 0
 AA AA RA C 3 3 0
 AA RR RA  C 3 3 0

 That is, I would like to have the specific counts next to what I'm
 consecutively counting per group.  So for Group A for dad there are 2
 AAs,  there are two RRs for mum but only 1 AA or RR for the child and
 that is RR (so the 1 is next to the RR and not the RA).

 Can this be done?
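
For comparison, the desired table can also be sketched in base R without `rle` or data.table, assuming the goal is "per group, count the AA/RR genotypes and show that count only on the AA/RR rows". That reading is taken from the expected output above; `count_hom` is a hypothetical helper name.

```r
x <- read.table(text = "Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C", header = TRUE, stringsAsFactors = FALSE)

count_hom <- function(genotype, group) {
  hom <- as.integer(genotype %in% c("AA", "RR"))  # 1 for AA/RR, 0 otherwise
  ave(hom, group, FUN = sum) * hom                # group total, zeroed on other rows
}

x$sumdad   <- count_hom(x$Dad,   x$Group)
x$summum   <- count_hom(x$Mum,   x$Group)
x$sumchild <- count_hom(x$Child, x$Group)
print(x)
```

This counts all AA/RR rows in a group, not only consecutive ones; for true run lengths within a group, `rle` would have to be applied inside the per-group function.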

 K.


Re: [R] rle with data.table - is it possible?

2015-01-01 Thread Kate Ignatius
Apologies - mix up of syntax all over the place, a habit of mine.  The
last line was in there because of code beforehand so it really doesn't
need to be there.  Here is the proper code I hope:

childseg <- 0
x <- sumchild == 0
span <- rle(x)$lengths[rle(x)$values==TRUE]
childseg[x] <- rep(seq_along(span), times = span)


On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 Thank you for attempting to encode what you want using R syntax, but you are 
 not really succeeding yet (too many errors). Perhaps another hand generated 
 result would help? A new input data frame might or might not be needed to 
 illustrate desired results.

 Your second and third lines are  syntactically incorrect, and I don't 
 understand what you hope to accomplish by assigning an empty string to a 
 numeric in your last line.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 On January 1, 2015 4:16:52 AM PST, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild -0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to run this code by Group for mum, dad and
child.  The problem I'm having is with the
span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can
be added to data.table.

[Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 I do not understand the value of using the rle function in your
description,
 but the code below appears to produce the table you want.

 Note that better support for the data.table package might be found at
 stackexchange as the documentation specifies.

 x <- read.table( text=
 "Dad Mum Child Group
 AA RR RA A
 AA RR RR A
 AA AA AA B
 AA AA AA B
 RA AA RR B
 RR AA RR B
 AA AA AA B
 AA AA RA C
 AA AA RA C
 AA RR RA C
 ", header=TRUE, stringsAsFactors=FALSE )

 library(data.table)
 DT <- data.table( x )
 DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
 DT[ , sumdad := 0L ]
 DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
 DT[ , cdad := NULL ]
 DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
 DT[ , summum := 0L ]
 DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
 DT[ , cmum := NULL ]
 DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
 DT[ , sumchild := 0L ]
 DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
 DT[ , cchild := NULL ]

 DT

      Dad Mum Child Group sumdad summum sumchild
   1:  AA  RR    RA     A      2      2        0
   2:  AA  RR    RR     A      2      2        1
   3:  AA  AA    AA     B      4      5        5
   4:  AA  AA    AA     B      4      5        5
   5:  RA  AA    RR     B      0      5        5
   6:  RR  AA    RR     B      4      5        5
   7:  AA  AA    AA     B      4      5        5
   8:  AA  AA    RA     C      3      3        0
   9:  AA  AA    RA     C      3      3        0
  10:  AA  RR    RA     C      3      3        0



Re: [R] rle with data.table - is it possible?

2014-12-31 Thread Kate Ignatius
Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild -0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = spanLOH)
childseg[childseg == 0]<-''

I was hoping to run this code by SNPEFF_GENE_NAME for mum, dad and
child.  The problem I'm having is with the
span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can
be added to data.table.



Re: [R] rle with data.table - is it possible?

2014-12-31 Thread Kate Ignatius
correct code:

childseg<-0
x:=sumchild -0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''


[R] rle with data.table - is it possible?

2014-12-30 Thread Kate Ignatius
I'm trying to use rle together with data.table and wondering whether that is possible...

To make this simple, my ultimate goal is to determine long stretches of
1s, but I want to do this within groups (hence using data.table, as
I use the set key option).  However, I'm not having much luck
making this possible.

For example, for simplicity's sake, I have the following data:

Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA  C

And the following code which I know works:

hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]

hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]

hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]

However, I wish to run the above code by Group (this file is
millions of rows long and the groups will be larger, but I just wanted to
simplify the example).

I did something like this but of course I got an error:

LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

The reason is that I eventually want to have something like this:

Dad Mum Child Group sumdad summum sumchild
AA RR RA A 2 2 0
AA RR RR A 2 2 1
AA AA AA B 4 5 5
AA AA AA B 4 5 5
RA AA RR B 0 5 5
RR AA RR B 4 5 5
AA AA AA B 4 5 5
AA AA RA C 3 3 0
AA AA RA C 3 3 0
AA RR RA  C 3 3 0

That is, I would like to have the specific counts next to what I'm
consecutively counting per group.  So for Group A for dad there are 2
AAs,  there are two RRs for mum but only 1 AA or RR for the child and
that is RR (so the 1 is next to the RR and not the RA).

Can this be done?

K.
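
One possible way to get per-row run lengths within each group is sketched below in base R. It is only an illustration under stated assumptions: run_len() and by_group() are hypothetical helper names, the data frame is the example above typed in by hand, and the result is the length of the consecutive AA/RR run each row sits in, which is not always the same as the per-group totals in the desired table (Dad in Group B gives 2, 2, 0, 2, 2 here rather than 4, 4, 0, 4, 4).

```r
# Example data from the post above (typed in by hand)
x <- data.frame(
  Dad   = c("AA","AA","AA","AA","RA","RR","AA","AA","AA","AA"),
  Mum   = c("RR","RR","AA","AA","AA","AA","AA","AA","AA","RR"),
  Child = c("RA","RR","AA","AA","RR","RR","AA","RA","RA","RA"),
  Group = c("A","A","B","B","B","B","B","C","C","C"),
  stringsAsFactors = FALSE
)

# run_len() is a hypothetical helper: for each element it returns the length
# of the run of consecutive AA/RR genotypes it belongs to, and 0 otherwise.
run_len <- function(g) {
  r <- rle(g %in% c("AA", "RR"))
  rep(r$lengths * r$values, times = r$lengths)
}

# Apply the helper within each Group; ave() keeps the original row order.
by_group <- function(col) {
  ave(seq_along(col), x$Group, FUN = function(i) run_len(col[i]))
}

x$rundad   <- by_group(x$Dad)
x$runmum   <- by_group(x$Mum)
x$runchild <- by_group(x$Child)
x

# With a data.table DT holding the same columns, the same helper could be
# used per group along the lines of:
#   DT[, rundad := run_len(Dad), by = Group]
```

Because rle() returns one length per run rather than one per row, the helper expands the lengths back to row level with rep(); that expansion is what lets the result be stored as a column.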



Re: [R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-09 Thread Kate Ignatius
Thanks!  I do get several errors though when running on Linux.

Running your code, I get this:

Error in system(cmd, intern = TRUE, wait = TRUE) :
error in running command

Fiddling around with the code and running this:

tmp <- matrix(1:9,3,3)
tmp.tex <- latex(tmp, file='tmp.tex')
print.default(tmp.tex)
tmp.dvi <- dvi(tmp.tex)
tmp.dvi
tmp.tex
dvips(tmp.dvi)
dvips(tmp.tex)
library(tools)
texi2dvi(file='tmp.tex', pdf=TRUE, clean=TRUE)

I get this:

Error in texi2dvi(file=tmp.tex,,  :
  Running 'texi2dvi' on 'tmp.tex' failed.
Messages:
/usr/bin/texi2dvi: pdflatex exited with bad status, quitting.

I've read that it may have something to do with the path of pdflatex.

> Sys.which('pdflatex')
           pdflatex
"/usr/bin/pdflatex"

> Sys.which('texi2dvi')
           texi2dvi
"/usr/bin/texi2dvi"

> file.exists(Sys.which('texi2dvi'))
[1] TRUE

> file.exists(Sys.which('pdflatex'))
[1] TRUE

Is there a specific path I should be giving for pdflatex and/or
texi2dvi to make this work?

Thanks!

On Mon, Dec 8, 2014 at 11:13 PM, Richard M. Heiberger r...@temple.edu wrote:
 yes of course, and the answer is latex() in the Hmisc package.
 Why were you excluding it?
 Details follow

 Rich


 The current release of the Hmisc package has this capability on
 Macintosh and Linux.
 For Windows, you need the next release, 3.14-7, which is available now on
 GitHub.

 ## windows needs these lines until the new Hmisc version is on CRAN
 install.packages("devtools")
 devtools::install_github("Hmisc", "harrelfe")

 ## All operating systems
 options(latexcmd='pdflatex')
 options(dviExtension='pdf')

 ## Macintosh
 options(xdvicmd='open')

 ## Windows, one of the following
 options(xdvicmd='c:\\progra~1\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe')
 ## 32-bit windows
 options(xdvicmd='c:\\progra~2\\Adobe\\Reader~1.0\\Reader\\AcroRd32.exe')
 ## 64 bit windows

 ## Linux
 ## I don't know the xdvicmd value


 ## this works on all R systems
 library(Hmisc)
 tmp <- matrix(1:9,3,3)
 tmp.dvi <- dvi(latex(tmp))
 print.default(tmp.dvi) ## prints filepath of the pdf file
 tmp.dvi  ## displays the pdf file on your screen
 On Mon, Dec 8, 2014 at 9:31 PM, Kate Ignatius kate.ignat...@gmail.com wrote:
 Hi,

 I have a simple question.  I know there are plenty of packages out
 there that can provide code to generate a table in latex.  But I was
 wondering whether there was one out there where I can generate a table
 from my data (which ever way I please) then allow me to save it as a
 pdf?

 Thanks

 K.



Re: [R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-09 Thread Kate Ignatius
Ah yes, you're right.

The log has this error:

! LaTeX Error: Missing \begin{document}.

Though can't really find much online on how to resolve it.
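
For what it's worth, this error usually means the .tex file handed to pdflatex is only a fragment (for example a bare tabular environment) with no \documentclass and \begin{document} around it. The sketch below is one way to wrap such a fragment by hand; wrap_fragment() is a hypothetical helper, the tabular lines are made up for illustration, and a real table fragment may need additional LaTeX packages loaded in the preamble.

```r
# Wrap a LaTeX fragment (e.g. a bare tabular environment) in a minimal
# standalone document so that pdflatex has a complete file to compile.
wrap_fragment <- function(fragment_lines, out_file) {
  doc <- c("\\documentclass{article}",
           "\\begin{document}",
           fragment_lines,
           "\\end{document}")
  writeLines(doc, out_file)
  invisible(out_file)
}

# Made-up fragment standing in for the table written to tmp.tex
fragment <- c("\\begin{tabular}{rrr}",
              "1 & 4 & 7 \\\\",
              "2 & 5 & 8 \\\\",
              "3 & 6 & 9 \\\\",
              "\\end{tabular}")

tex_file <- wrap_fragment(fragment, file.path(tempdir(), "tmp_wrapped.tex"))

# Compile only if a LaTeX installation is available:
# tools::texi2dvi(tex_file, pdf = TRUE, clean = TRUE)
```

The compile call is left commented out because it depends on the local LaTeX setup.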

On Tue, Dec 9, 2014 at 1:15 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:
 pdflatex appears to have run, because it exited. You should look at the tex 
 log file, the problem is more likely that the latex you sent out to pdflatex 
 was incomplete.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.



Re: [R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-09 Thread Kate Ignatius
I set these options:

options(latexcmd='pdflatex')
options(dviExtension='pdf')
options(xdvicmd='xdvi')

Maybe one too many?  I'm running in Linux.



On Tue, Dec 9, 2014 at 3:24 PM, Richard M. Heiberger r...@temple.edu wrote:
 It looks like you skipped the step of setting the options.
 the latex function doesn't do pdflatex (by default it does regular
 latex) unless you set the options
 as I indicated.



Re: [R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-09 Thread Kate Ignatius
Okay, all.

I got it to work using this:

library(Hmisc)
options(latexcmd='pdflatex')
options(dviExtension='pdf')
options(xdvicmd='gnome-open')

Running your simple code from above... my question is this:  the pdf
is saved in a tmp directory... where do I change the directory path? I
thought it was simply this:

tmp.dvi <- dvi(latex(m2, file='/path/to/file/tmp.pdf', label="Title"))

But maybe not.  In addition, is it possible to change the page size with this?

K.

On Tue, Dec 9, 2014 at 4:02 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 On 09/12/2014 20:47, Richard M. Heiberger wrote:

 the last one is wrong.  That is the one for which I don't know the
 right answer on linux.

 'xdvi' displays dvi files.  you need to display a pdf file.
 whatever is the right program on linux to display pdf files is what
 belongs there.

 On Macintosh we can avoid knowing by using 'open', which means use the
 system standard.
 I don't know what the linux equivalent is, either the exact program or
 the instruction to use the standard.


 xdg-open (but like OS X it depends on having the right associations set).
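
 The per-platform advice in this thread could be collected into a small helper, sketched here. pdf_viewer() is a hypothetical name; it only covers the two "open with the system default" commands mentioned above and returns NULL elsewhere (on Windows the full path to a PDF reader would still have to be supplied by hand).

```r
# Hypothetical helper: pick the "open with the system default" command
# for the current platform, or NULL if none is known.
pdf_viewer <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Darwin = "open",      # OS X
         Linux  = "xdg-open",  # relies on the desktop's file associations
         NULL)                 # e.g. Windows: set the reader path manually
}

viewer <- pdf_viewer()
if (!is.null(viewer)) {
  options(latexcmd = "pdflatex", dviExtension = "pdf", xdvicmd = viewer)
}
```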




[R] Printing/Generating/Outputting a Table (Not Latex)

2014-12-08 Thread Kate Ignatius
Hi,

I have a simple question.  I know there are plenty of packages out
there that can provide code to generate a table in latex.  But I was
wondering whether there is one where I can generate a table
from my data (whichever way I please) and then save it as a
pdf?

Thanks

K.
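
A minimal sketch of one LaTeX-free route, using only base R: print the data frame to text and draw that text onto a pdf() device. The data frame `tab` and the file name `out_file` are made up for illustration, and the layout is deliberately plain; add-on packages such as gridExtra can draw nicer tables.

```r
# Illustrative data; the column names are assumptions for this sketch.
tab <- data.frame(Name = c("Bob", "George", "Dave"), index = 1:3)

# Capture the printed table as text, one string per line.
tab_lines <- capture.output(print(tab, row.names = FALSE))

# Draw the text in a monospaced font on a small PDF page.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 4, height = 2)
par(mar = c(0, 0, 0, 0))
plot.new()
text(0, 1, paste(tab_lines, collapse = "\n"), adj = c(0, 1), family = "mono")
invisible(dev.off())
```

The page size is fixed by hand here, so a longer table would need a larger `height`.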

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] recoding genetic information using gsub

2014-12-05 Thread Kate Ignatius
I have genetic information for several thousand individuals:

A/T
T/G
C/G  etc

For some individuals there are some genotypes that are like this:  A/,
C/, T/, G/ or even just / which represents missing and I want to
change these to the following:

A/ A/.
C/ C/.
G/ G/.
T/ T/.
/ ./.
/A ./A
/C ./C
/G ./G
/T ./T

I've tried to use gsub with a command like the following:

gsub("A/","[A/.]", GT[,6])

but if genotypes aren't like the above, the command will change them to
look something like:

A/.T
T/.G
C/.G

Is there any way to be more specific in gsub?

Thanks!
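
One possible approach, sketched under the assumption that a missing allele always shows up as an empty string before or after the slash: anchor the pattern with ^ (start of string) and $ (end of string) so that complete genotypes are left alone. The vector `gt` is example data, not the poster's GT[,6].

```r
gt <- c("A/T", "A/", "/", "/G", "C/G")

# A slash at the very start means the first allele is missing.
fixed <- sub("^/", "./", gt)
# A slash at the very end means the second allele is missing.
fixed <- sub("/$", "/.", fixed)

fixed
```

Because each pattern can match at most once per string, sub() is enough and gsub() is not needed.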



[R] grep won't work finding one column

2014-10-14 Thread Kate Ignatius
I'm having an issue with grep:

I have numerous columns that end with .at... when I use grep like so:

df[,grep(".at",colnames(df))]

it works fine.  When I have only one column that ends with .at, it does not
work.  Why is that?  As this is in a loop with a varying number of columns
ending in .at, I would like some code that works with 1 to n
columns.

Is there something more optimal than grep?

Thanks!



Re: [R] grep won't work finding one column

2014-10-14 Thread Kate Ignatius
For example,

DF will usually have numerous columns with sample1.at sample1.dp
sample1.fg sample2.at sample2.dp sample2.fg and so on

I'm running this code in R as part of a shell script which runs over
several files of different sizes, so sometimes it will come across a file
with only one sample in it (i.e. sample1). When the R code runs through
such a file, trying to grep out the sample1.at column does not work and
the script halts.

Here is some sample data... say I want to get out only the AT_ column


Sample_1 AT_1
A/A RR
G/G AA
T/T AA
G/A RA
G/G RR
C/C AA
C/C AA
C/T RA
A/A AA
T/G RA

it will have a problem grepping out this single column.

On Tue, Oct 14, 2014 at 10:38 AM, John McKown
john.archie.mck...@gmail.com wrote:
 I can't answer your direct question. But do you realize that your code
 does not match your words? The grep shown does not match _only_ columns
 whose names end with the characters '.at'. It matches all column names
 which contain any character followed by the characters 'at'. To
 match only columns whose names end with the characters .at, you
 need: grep("\\.at$",colnames(df)).

 You might want to post an example which fails. Just to be complete, be
 sure to use the dput() function so that it is easy for members of the
 group to cut'n'paste to get your data into our own R workspace.

 --
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan

 Maranatha! 
 John McKown



Re: [R] grep won't work finding one column

2014-10-14 Thread Kate Ignatius
In the sense - it does not work.  it works when there are 50 samples
in the file, but it does not work when there is one.

The usual headings are:  sample1.at sample1.dp
sample1.fg sample2.at sample2.dp sample2.fg and so on to a max of
sample50.at sample50.dp sample50.fg

Using this greps out all the .at columns perfectly:

df[,grep(".at",colnames(df))]

When I come across a file where there is one sample:

sample1.at sample1.dp sample1.fg

Using this:

df[,grep(".at",colnames(df))]

returns nothing.

Oh - AT/at was just an example... that's not my problem...



On Tue, Oct 14, 2014 at 10:57 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 Your question is missing a reproducible example, and you don't say how it 
 does not work, so we cannot tell what is going on.

 Two things do come to mind, though.

 A) Data frame subsets with only one column by default return a vector, which 
 is a different type of object than a single-column data frame. You would need 
 to read ?"[.data.frame" about the drop argument if you wanted to 
 consistently get a data frame from this expression.

 B) The period is a wildcard in regular expressions. If you expect to limit 
 your search to a literal .at at the end of the name then you should use the 
 search pattern "\\.at$" instead (the first backslash allows the second one to be 
 stored by R in the string, and the second one is the only one seen by grep, 
 which it reads as making the period not act like a wildcard). You really 
 should read about regular expressions before using them. There are many 
 tutorials on the web about this topic.
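
Both points can be seen in a short sketch. The data frame below is invented for illustration (the column `cat` is there only to show the wildcard effect); only grep() and the drop argument of `[` come from the discussion above.

```r
df <- data.frame(sample1.at = c("RR", "AA"),
                 sample1.dp = c(10, 12),
                 cat        = c("x", "y"),
                 stringsAsFactors = FALSE)

# Unescaped period: matches any character followed by "at", so "cat" is hit too.
loose <- grep(".at", colnames(df))

# Escaped period plus end anchor: only names ending in a literal ".at".
idx <- grep("\\.at$", colnames(df))

# With one matching column, the default subset collapses to a plain vector ...
as_vector <- df[, idx]
# ... while drop = FALSE keeps a one-column data frame for any number of matches.
sub_df <- df[, idx, drop = FALSE]
```

Code that later calls ncol() or colnames() on the result only behaves the same for 1 and n columns in the drop = FALSE form.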




[R] Help with a function [along columns]

2014-10-13 Thread Kate Ignatius
Hi all,

I need help with a function.  I'm trying to write a function to apply
to varying number of columns in a lot of files - hence the function...
but I'm getting stuck.  Here it is:

gt <- function(x) {
alleles <- sapply(x, function(.) strsplit(as.character(.), "/"))
gt <- apply(x, function(.) ifelse(x[1] == vcf[3] & x[2] == vcf[3], 'RR',
ifelse(x[1] == vcf[4] & x[2] == vcf[4], 'AA',
ifelse(x[1] == vcf[3] & x[2] == vcf[4], 'RA',
ifelse(x[1] == vcf[4] & x[2] == vcf[3], 'RA', '')
}

I have different sized family genetic files and at the end of the day
I want to see whether the alleles of each person in the family match
the ref and/or the alt and if so, give AA, RA or RR.

Like so:

REF ALT Sample_1 GT_1 Sample_2 GT_2
A G A/A RR A/G RA
T G G/G AA T/T RR
A T T/T AA A/A RR
G A G/A RA G/G RR
G A G/G RR G/A RA
T C C/C AA C/C AA
T C C/C AA C/C AA
C T C/T RA T/T AA
G A A/A AA A/A AA
T G T/G RA G/G AA


Is there an easy way to do this?

Thanks!
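
A minimal vectorised sketch of the classification described above, assuming each genotype is two single alleles separated by "/". The function name `call_gt` and the small example vectors are hypothetical; REF, ALT and the sample values are taken from the first rows of the table in the post.

```r
# Classify each genotype as RR, RA or AA relative to the ref and alt alleles.
call_gt <- function(gt, ref, alt) {
  parts <- strsplit(as.character(gt), "/", fixed = TRUE)
  a1 <- vapply(parts, `[`, character(1), 1)   # first allele
  a2 <- vapply(parts, `[`, character(1), 2)   # second allele
  ifelse(a1 == ref & a2 == ref, "RR",
  ifelse(a1 == alt & a2 == alt, "AA",
  ifelse((a1 == ref & a2 == alt) | (a1 == alt & a2 == ref), "RA", "")))
}

ref <- c("A", "T", "G")
alt <- c("G", "G", "A")
gt_1 <- call_gt(c("A/A", "G/G", "G/A"), ref, alt)
gt_2 <- call_gt(c("A/G", "T/T", "G/G"), ref, alt)
```

For a file with a varying number of sample columns, the same function could be applied to each sample column in turn, for example with lapply().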



Re: [R] Help with a function [along columns]

2014-10-13 Thread Kate Ignatius
Just an update to this:

gtal <- function(d) {
alleles <- sapply(d, function(.) strsplit(as.character(.), "/"))
gt <- unlist(lapply(alleles, function(x)
   ifelse(identical(x[[1]], vcf[,3]) & identical(x[[2]], vcf[,3]), 'RR',
   ifelse(identical(x[[1]], vcf[,4]) & identical(x[[2]], vcf[,4]), 'AA',
   ifelse(identical(x[[1]], vcf[,3]) & identical(x[[2]], vcf[,4]), 'RA',
   ifelse(identical(x[[1]], vcf[,4]) & identical(x[[2]],
vcf[,3]), 'RA', ''))
}

I've got something working but I'm having trouble with the gt part...
I'm getting the error: object of type 'closure' is not subsettable.
The vcf is my original file that I want to match against, so I'm not sure
whether this is a problem.
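
For context, a small sketch of one common way this message arises: it is what R reports when `[` or `[[` is applied to a function instead of a data object. Whether that is what happened in the code above cannot be told from the post; the name `f` is purely illustrative.

```r
f <- function(x) x + 1   # 'f' is a function (a closure), not a data object

# Subsetting a function fails; capture the message instead of stopping.
msg <- tryCatch(f[1], error = function(e) conditionMessage(e))
msg
```

A useful check in such a case is class() on every object that is being subset.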



[R] How to check to see if a variable is within a range of another variable

2014-10-01 Thread Kate Ignatius
Is there an easy way to check whether a variable is within a +/- 10%
range of another variable in R?

Say I have a variable 'A': I want to check whether it's within +/- 10% of
variable 'B' and, if so, create another variable 'C' to say whether it
is or not.

Is there a function that is able to do that?

eventual outcome:
A B C
67 76 no
24 23 yes
40 45 yes
10 12 yes
70 72 yes
101 90 no
9 12 no



Re: [R] How to check to see if a variable is within a range of another variable

2014-10-01 Thread Kate Ignatius
Apologies - yes, my 10% calculations seem to be slightly off.

However, the function gives me all FALSEs, which seems a little
weird, even where both columns equal each other.  Should that be
right?

In essence I want to check whether A and B equal each other, give or take 10%.
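
A minimal sketch of a direct, vectorised check, assuming "within 10%" means that the absolute difference is at most 10% of B. The data frame `d` reuses the A and B values from the original question; note that with this definition several rows come out differently from the hand-made 'eventual outcome'.

```r
d <- data.frame(A = c(67, 24, 40, 10, 70, 101, 9),
                B = c(76, 23, 45, 12, 72, 90, 12))

# TRUE where A lies within +/- 10% of B, measured relative to B.
within_10 <- abs(d$A - d$B) <= 0.10 * abs(d$B)
d$C <- ifelse(within_10, "yes", "no")
d
```

If the tolerance should instead be relative to A, or to the mean of A and B, only the right-hand side of the comparison needs to change.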

On Wed, Oct 1, 2014 at 6:54 PM, Peter Alspach
peter.alsp...@plantandfood.co.nz wrote:
 Tena koe Kate

 If kateDF is a data.frame with your data, then

 apply(kateDF, 1, function(x) isTRUE(all.equal(x[2], x[1], check.attributes = 
 FALSE, tolerance=0.1)))

 comes close to (what I think) you want (but not to what you have illustrated 
 in your 'eventual outcome').  Anyhow, it may be enough to allow you to get 
 there.

 HTH 

 Peter Alspach



Re: [R] if else statement in loop

2014-09-29 Thread Kate Ignatius
Ooops,

I edited the code wrongly while trying to make it easier to interpret and
got the X's and Y's mixed up.  Try this:

for(i in length(1:(nrow(X)))){
Y$IID1new <- ifelse((as.character(Y[,2]) == as.character(X[,i]) &
Y$IID1new != ''), as.character(as.matrix(X[,(nrow(X)+i)])),'')
}

The second should be like this:

Y$IID1new <- ifelse((as.character(Y[,2]) == as.character(X[,1])),
as.character(as.matrix(X[,(nrow(X)+1)])),'')

for(i in length(2:(nrow(X)))){
ifelse((as.character(Y[,i]) == as.character(X[,i])),
Y$IID1new[is.na(Y$IID1new)] <-
as.character(as.matrix(X[,(nrow(X)+i)])),'')
}

Selecting by the number of rows seems a little odd here, I know, but
in real life this relies on a third data frame, say Z, which for
simplicity I didn't include. I only want to start looking at the
columns of X that come after twice the number of rows in Z.
For instance, if Z has 4 rows, I want to take values for IID1new
starting from column 9 in X to make IID1new in Y. Does that make
sense? Will this cause a problem?

So it will probably be more like this if there were a Z:

for(i in length(1:(2*nrow(Z)))){
Y$IID1new <- ifelse((as.character(Y[,2]) == as.character(X[,i]) &
Y$IID1new != ''), as.character(as.matrix(X[,(2*nrow(Z)+i)])),'')
}

But essentially what I would like is this:

FID IID IID1new
FAM01 samas4 samas4_father
FAM01 samas5 samas5_mother
FAM01 samas6 samas6_sibling

I hope this is a little clearer...

Let me know if there are more errors.

K.
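
A hedged sketch of a loop-free way to get the desired IID1new column, assuming (as in the toy data) that the first three columns of X hold the old IDs and the next three hold the matching new IDs, identical in every row. One general point first: length(1:n) is just n, so for(i in length(1:n)) runs its body once with i equal to n; seq_len(n) is the usual way to loop from 1 to n. The names `old_ids` and `new_ids` are introduced here for illustration.

```r
X <- data.frame(V1 = rep("samas4", 3), V2 = rep("samas5", 3),
                V3 = rep("samas6", 3), V4 = rep("samas4_father", 3),
                V5 = rep("samas5_mother", 3), V6 = rep("samas6_sibling", 3),
                stringsAsFactors = FALSE)
Y <- data.frame(FID = "FAM01", IID = c("samas4", "samas5", "samas6"),
                stringsAsFactors = FALSE)

# Build a lookup from the first row of X: old ID -> new ID.
old_ids <- unlist(X[1, c("V1", "V2", "V3")])
new_ids <- unlist(X[1, c("V4", "V5", "V6")])

# match() finds each Y$IID among the old IDs; unmatched IDs would give NA.
Y$IID1new <- unname(new_ids[match(Y$IID, old_ids)])
Y
```

With a third data frame Z deciding where the new-ID columns start, only the column selections for `old_ids` and `new_ids` would change.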

On Mon, Sep 29, 2014 at 2:39 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
 Hi

 Please be clearer about what you want. I get many errors trying your code 
 and your explanation does not help much.

 for(i in length(1:(2*nrow(X)))){
 + Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) & 
 X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
 Error: unexpected ',' in:
 for(i in length(1:(2*nrow(X)))){
 Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) & 
 X$IID1new != '') ,
 }
 Error: unexpected '}' in }
 for(i in length(1:(2*nrow(X)))){
 + Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) &
 + X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
 Error: unexpected ',' in:
 Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) &
 X$IID1new != '') ,
 }


 Besides, the column X$IID1new does not exist in X.

 Here you clearly ask for a nonexistent column, and why the heck do you want to 
 select a column by the number of rows?

 as.character(as.matrix(X[,(2*nrow(X)+1)]))
 Error in `[.data.frame`(X, , (2 * nrow(X) + 1)) :
   undefined columns selected

 So, based on your toy data frames, what should the result be after your 
 computation?

 Regards
 Petr



Re: [R] Ifelse statement on a factor level data frame

2014-09-28 Thread Kate Ignatius
Strange, that.

I did wrap everything in as.character but all I got was the same...

class of dbpmn[,2] = factor
class of dbpmn[,21] = factor
class of dbpmn[,20] = data.frame

This has to be a problem ???

I can put reproducible output here but I'm not sure if this is going to be of
help. I think it's all about factors and data frames and
characters...

K.

On Sun, Sep 28, 2014 at 1:15 AM, Jim Lemon j...@bitwrit.com.au wrote:
 On Sun, 28 Sep 2014 12:49:41 AM Kate Ignatius wrote:
 Quick question:

 I am running the following code on some variables that are factors:

 dbpmn$IID1new <- ifelse(as.character(dbpmn[,2]) ==
 as.character(dbpmn[,(21)]), dbpmn[,20], '')

 Instead of returning some value it gives me this:

 c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

 Playing around with the code, gives me some kind of variation to it.
 Is there some way to get me what I want.  The variable that its
 suppose to give back is a bunch of sampleIDs.

 Hi Kate,
 If I create a little example:

 dbpmn<-data.frame(V1=factor(sample(LETTERS[1:4],20,TRUE)),
   V2=factor(sample(LETTERS[1:4],20,TRUE)),
   V3=factor(sample(LETTERS[1:4],20,TRUE)))
 dbpmn[4]<-
  ifelse(as.character(dbpmn[,1]) == as.character(dbpmn[,(2)]),
  dbpmn[,3],"")
 dbpmn
V1 V2 V3 V4
 1   B  D  C
 2   C  A  D
 3   C  B  A
 4   A  B  C
 5   B  D  B
 6   D  D  A  1
 7   D  D  D  4
 8   B  C  A
 9   B  D  B
 10  D  C  A
 11  A  D  C
 12  A  C  B
 13  A  A  A  1
 14  D  C  A
 15  C  D  B
 16  A  A  B  2
 17  A  C  C
 18  B  B  C  3
 19  C  C  C  3
 20  D  D  D  4

 I get what I expect, the numeric value of the third element in dbpmn
 where the first two elements are equal. I think what you want is:

 dbpmn[4]<-
  ifelse(as.character(dbpmn[,1]) == as.character(dbpmn[,(2)]),
  as.character(dbpmn[,3]),"")
 dbpmn
V1 V2 V3 V4
 1   B  D  C
 2   C  A  D
 3   C  B  A
 4   A  B  C
 5   B  D  B
 6   D  D  A  A
 7   D  D  D  D
 8   B  C  A
 9   B  D  B
 10  D  C  A
 11  A  D  C
 12  A  C  B
 13  A  A  A  A
 14  D  C  A
 15  C  D  B
 16  A  A  B  B
 17  A  C  C
 18  B  B  C  C
 19  C  C  C  C
 20  D  D  D  D

 Jim
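
The difference between the two versions can be reduced to a few lines. This is only an illustrative sketch with made-up IDs: ifelse() does not keep the factor attributes of its `yes` argument, so a factor comes back as its underlying integer codes unless it is converted with as.character() first.

```r
ids <- factor(c("idB", "idA", "idC"))   # levels sort as idA, idB, idC
keep <- c(TRUE, FALSE, TRUE)

# Factor passed directly: the integer level codes leak through.
codes <- ifelse(keep, ids, "")
# Converted first: the labels are returned.
labels <- ifelse(keep, as.character(ids), "")

codes
labels
```

The same applies when the `yes` argument is a column taken from a data frame that was read with strings stored as factors.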




Re: [R] Ifelse statement on a factor level data frame

2014-09-28 Thread Kate Ignatius
Apologies - you're right.  Missed it in the pdf.

K.

On Sun, Sep 28, 2014 at 10:22 AM, Bert Gunter gunter.ber...@gene.com wrote:
 Inline.

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 Clifford Stoll




 On Sun, Sep 28, 2014 at 6:38 AM, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
 Strange that,

 I did put everything with as.character but all I got was the same...

 class of dbpmn[,2]) = factor
 class of dbpmn[,21]  = factor
 class of  dbpmn[,20] = data.frame

 This has to be a problem ???

 Indeed -- your failure to read documentation.

 I suggest you do your due diligence, read Pat Burns's link, and follow
 the advice given you by posting a reproducible example. More than
 likely the last will be unnecessary as you will figure it out in the
 course of doing what you should do.

 Cheers,
 Bert




[R] if else statement in loop

2014-09-28 Thread Kate Ignatius
I have two data frames

For simplicity:

X=

V1 V2 V3  V4 V5 V6
samas4 samas5 samas6 samas4_father samas5_mother samas6_sibling
samas4 samas5 samas6 samas4_father samas5_mother samas6_sibling
samas4 samas5 samas6 samas4_father samas5_mother samas6_sibling

Y=

FID IID
FAM01 samas4
FAM01 samas5
FAM01 samas6

I want to create a new IID in Y using V4 V5 V6 in X using an
ifelse statement in a loop.  I've used something like the following
(after figuring out my factor problem):

for(i in length(1:(2*nrow(X)))){
Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) &
X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')
}

But of course this tends to overwrite.

Is there an easy way to set up a loop to replace missing values? This
didn't work either but I'm not sure if it's as easy as this:

Y$IID1new <- ifelse((as.character(Y[,2]) == as.characterXl[,i]) &
X$IID1new != '') , as.character(as.matrix(X[,(2*nrow(X)+i)])),'')

for(i in length(2:(2*nrow(X)))){
ifelse((as.character(Y[,i]) == as.character(Xl[,i])),
X[is.na(X$IID1new)] <- as.character(as.matrix(X[(2*nrow(X)+i)])),'')
}

Thanks!

K.



[R] Ifelse statement on a factor level data frame

2014-09-27 Thread Kate Ignatius
Quick question:

I am running the following code on some variables that are factors:

dbpmn$IID1new <- ifelse(as.character(dbpmn[,2]) ==
as.character(dbpmn[,(21)]), dbpmn[,20], '')

Instead of returning some value it gives me this:

c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

Playing around with the code gives me some kind of variation on it.
Is there some way to get what I want?  The variable that it's
supposed to give back is a bunch of sampleIDs.

Thanks!



[R] ggplot2/heat map/duplicated level problem

2014-08-17 Thread Kate Ignatius
Hi,

I hope I can explain my problem clearly

I have a plink output file and want to graph a heat map of the
PI_HAT estimates.  I have the following code that has worked in the
past but this time I'm getting the warning:

In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
paste0(labels,  :
  duplicated levels in factors are deprecated

My code:

require(ggplot2)

image = (p <- ggplot(db, aes(IID1, IID2)) +
   geom_tile(aes(fill = PI_HAT), colour = "white") +
   scale_fill_gradient2(low = "blue", high = "red") +
   labs(x = "Individual 1",y = "Individual 2") +
   opts(axis.text.x = theme_text(angle=90)) +
   opts(title="",legend.position = "right"))

I'm trying to figure out whether this is a problem with duplicated
PI_HAT estimates or duplicated ID pairings (though the latter
shouldn't be the case as I've used similar files in the past).

What else could be the problem?

P.S.  My file is quite large (300K lines) so it's pretty hard to
spot the problem right away, but the usual plink output file of
this type has the heading:

FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT
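
One of the two suspicions, repeated ID pairings, can be checked directly in base R. The sketch below uses a tiny invented data frame `db` with the relevant columns; it only tests for repeated (IID1, IID2) rows and does not by itself explain the warning.

```r
db <- data.frame(IID1   = c("s1", "s1", "s2", "s1"),
                 IID2   = c("s2", "s3", "s3", "s2"),
                 PI_HAT = c(0.5, 0.25, 0, 0.5),
                 stringsAsFactors = FALSE)

# TRUE for every row whose (IID1, IID2) pair has already appeared above it.
dup_pairs <- duplicated(db[, c("IID1", "IID2")])

any(dup_pairs)    # is there at least one repeated pairing?
db[dup_pairs, ]   # the repeated rows themselves
```

On a 300K-line file, sum(dup_pairs) gives a quick count before looking at individual rows.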



[R] data.table/ifelse conditional new variable question

2014-08-16 Thread Kate Ignatius
Hi,

I have a data.table question (as well as if else statement query).

I have a large list of families (the file has 935 individuals,
sorted by family, with families of varying sizes).  At the moment the file has the
columns:

SampleID FamilyID Relationship

To avoid having to make a pedigree file by hand, i.e. adding a
PaternalID and a MaternalID one by one, I want to try to write a script
that will quickly do this for me (I eventually want to run this
through a program such as plink).  Is there a way to use data.table
(maybe in conjunction with ifelse) to do this effectively?

An example of the file is something like:

Family.ID Sample.ID Relationship
14   62  sibling
14  94  father
14   63  sibling
14   59 mother
17 6004  father
17   6003 mother
17 6005   sibling
17 368   sibling
130   202 mother
130   203  father
130   204   sibling
130   205   sibling
130   206   sibling
222 9 mother
222 45  sibling
222 34  sibling
222 10  sibling
222 11  sibling
222 18  father

But the goal is to have a file like this:

Family.ID Sample.ID Relationship PID MID
14   62  sibling 94 59
14  94  father 0 0
14   63  sibling 94 59
14   59 mother 0 0
17 6004  father 0 0
17   6003 mother 0 0
17 6005   sibling 6004 6003
17 368   sibling 6004 6003
130   202 mother 0 0
130   203  father 0 0
130   204   sibling 203 202
130   205   sibling 203 202
130   206   sibling 203 202
222 9 mother 0 0
222 45  sibling 18 9
222 34  sibling 18 9
222 10  sibling 18 9
222 11  sibling 18 9
222 18  father 0 0

I've tried searches for this but with no luck.  Greatly appreciate any
help - even if its just a link to a great example/solution!

Thanks!



Re: [R] data.table/ifelse conditional new variable question

2014-08-16 Thread Kate Ignatius
Thanks!

I think I know what is being done here but not sure how to fix the
following error:

Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  replacement has length zero



On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:
 Dear Kate,

 Assuming you have nuclear families, one option would be:

 x <- read.table(textConnection("Family.ID Sample.ID Relationship
 14   62  sibling
 14  94  father
 14   63  sibling
 14   59 mother
 17 6004  father
 17   6003 mother
 17 6005   sibling
 17 368   sibling
 130   202 mother
 130   203  father
 130   204   sibling
 130   205   sibling
 130   206   sibling
 222 9 mother
 222 45  sibling
 222 34  sibling
 222 10  sibling
 222 11  sibling
 222 18  father"), header = TRUE)
 closeAllConnections()

 xs <- with(x, split(x, Family.ID))
 res <- do.call(rbind, lapply(xs, function(l){
 l$PID <- l$MID <- 0
 father <- with(l, Relationship == 'father')
 mother <- with(l, Relationship == 'mother')
 l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
 l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
 l
 }))
 res

 HTH,
 Jorge.-


 Best regards,
 Jorge.-



 On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius kate.ignat...@gmail.com
 wrote:

 Hi,

 I have a data.table question (as well as if else statement query).

 I have a large list of families (file has 935 individuals that are
 sorted by famiy of varying sizes).  At the moment the file has the
 columns:

 SampleID FamilyID Relationship

 To prevent from having to make a pedigree file by hand - ie adding a
 PaternalID and a MaternalID one by one I want to try write a script
 that will quickly do this for me  (I eventually want to run this
 through a program such as plink)   Is there a way to use data.table
 (maybe in conjucntion with ifelse to do this effectively)?

 An example of the file is something like:

 Family.ID Sample.ID Relationship
        14        62      sibling
        14        94       father
        14        63      sibling
        14        59       mother
        17      6004       father
        17      6003       mother
        17      6005      sibling
        17       368      sibling
       130       202       mother
       130       203       father
       130       204      sibling
       130       205      sibling
       130       206      sibling
       222         9       mother
       222        45      sibling
       222        34      sibling
       222        10      sibling
       222        11      sibling
       222        18       father

 But the goal is to have a file like this:

 Family.ID Sample.ID Relationship  PID  MID
        14        62      sibling   94   59
        14        94       father    0    0
        14        63      sibling   94   59
        14        59       mother    0    0
        17      6004       father    0    0
        17      6003       mother    0    0
        17      6005      sibling 6004 6003
        17       368      sibling 6004 6003
       130       202       mother    0    0
       130       203       father    0    0
       130       204      sibling  203  202
       130       205      sibling  203  202
       130       206      sibling  203  202
       222         9       mother    0    0
       222        45      sibling   18    9
       222        34      sibling   18    9
       222        10      sibling   18    9
       222        11      sibling   18    9
       222        18       father    0    0

 I've tried searches for this but with no luck.  Greatly appreciate any
 help - even if its just a link to a great example/solution!

 Thanks!



Re: [R] data.table/ifelse conditional new variable question

2014-08-16 Thread Kate Ignatius
Actually - I didn't check this before, but these are not all nuclear
families (as I assumed they were).  That is, some don't have a father
or don't have a mother.  Usually if this is the case, PID or MID will
become 0, respectively, for the child.  How can the code be edited to
account for this?



Re: [R] data.table/ifelse conditional new variable question

2014-08-16 Thread Kate Ignatius
Yep - you're right - missing parents are indicated as zero in the M/PID field.

The above code ran, but with a few warnings:

1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
  number of items to replace is not a multiple of replacement length

Looking at the output, I get numbers where the father/mother ID should
be in the M/PID field.  For example:

2702   349   mother    0    0
2702  3456  sibling    0  842
2702  9980  sibling    0  842
3064     3   father    0    0
3064     4   mother    0    0
3064     5  sibling  879  880
3064    86  sibling  879  880
3064    87  sibling  879  880

On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:
 Dear Kate,

 Try this:

 res <- do.call(rbind, lapply(xs, function(l){
   l$PID <- l$MID <- 0
   father <- with(l, Relationship == 'father')
   mother <- with(l, Relationship == 'mother')
   if(sum(father) == 0)
     l$PID[l$Relationship == 'sibling'] <- 0
   else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
   if(sum(mother) == 0)
     l$MID[l$Relationship == 'sibling'] <- 0
   else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
   l
 }))

 It is assumed that when either parent is not available the M/PID is 0.

 Best,
 Jorge.-



Re: [R] data.table/ifelse conditional new variable question

2014-08-16 Thread Kate Ignatius
Actually - your code is not wrong... because this is a large file I
went through the file to see if there was anything wrong with it -
looks like there are two fathers or three mothers in some families.
Taking these duplicates out fixed the problem.

Sorry about the confusion!  And thanks so much for your help!
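
A minimal sketch of how such families could be found before running the code, assuming a hypothetical toy data frame `ped` with the same three columns as in this thread (family 2 is given two fathers on purpose). When a family has two fathers, `l$Sample.ID[father]` has length two, which is what triggers the "not a multiple of replacement length" warning.

```r
# Hypothetical toy pedigree; family 2 has two fathers on purpose.
ped <- data.frame(
  Family.ID    = c(1, 1, 1, 2, 2, 2, 2),
  Sample.ID    = c(10, 11, 12, 20, 21, 22, 23),
  Relationship = c("father", "mother", "sibling",
                   "father", "father", "mother", "sibling"),
  stringsAsFactors = FALSE
)

# Count how often each relationship occurs within each family.
counts <- table(ped$Family.ID, ped$Relationship)

# Families with more than one father or more than one mother.
bad <- rownames(counts)[counts[, "father"] > 1 | counts[, "mother"] > 1]
bad
```

The families listed in `bad` can then be inspected by hand before the parental IDs are filled in.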

On Sat, Aug 16, 2014 at 9:53 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:
 Perhaps I am missing something but I do not get the same result:

 x <- read.table(textConnection("Family.ID Sample.ID Relationship
 2702  349   mother
 2702  3456  sibling
 2702  9980  sibling
 3064  3  father
 3064  4  mother
 3064  5    sibling
 3064  86   sibling
 3064  87   sibling"), header = TRUE)
 closeAllConnections()

 xs <- with(x, split(x, Family.ID))
 res <- do.call(rbind, lapply(xs, function(l){
   l$PID <- l$MID <- 0
   father <- with(l, Relationship == 'father')
   mother <- with(l, Relationship == 'mother')
   if(sum(father) == 0)
     l$PID[l$Relationship == 'sibling'] <- 0
   else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
   if(sum(mother) == 0)
     l$MID[l$Relationship == 'sibling'] <- 0
   else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
   l
 }))
 #        Family.ID Sample.ID Relationship MID PID
 # 2702.1      2702       349       mother   0   0
 # 2702.2      2702      3456      sibling 349   0
 # 2702.3      2702      9980      sibling 349   0
 # 3064.4      3064         3       father   0   0
 # 3064.5      3064         4       mother   0   0
 # 3064.6      3064         5      sibling   4   3
 # 3064.7      3064        86      sibling   4   3
 # 3064.8      3064        87      sibling   4   3

 HTH,
 Jorge.-





[R] counting the number of rows that satisfy a certain criteria

2014-06-21 Thread Kate Ignatius
I have 4 columns, and about 300K plus rows with 0s and 1s.

I'm trying to count how many rows satisfy certain criteria... for
instance, how many rows have the first column == 1 as
well as the second column == 1.

I've tried using rowSums and colSums but it keeps giving me this type of error:

Error in rowSums(X[1] == 1 & X[2] == 1) :
  'x' must be an array of at least two dimensions

Thanks in advance!



Re: [R] counting the number of rows that satisfy a certain criteria

2014-06-21 Thread Kate Ignatius
Thanks!

On Sat, Jun 21, 2014 at 11:05 AM, Jorge I Velez
jorgeivanve...@gmail.com wrote:
 Hi Kate,

 You could try

 sum(X[, 1] == 1 & X[, 2] == 1)

 where X is your data set.

 HTH,
 Jorge.-
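
A small sketch of the suggestion above, assuming a hypothetical 0/1 data frame `X` standing in for the real table of 300K+ rows. It also shows an equivalent count written with rowSums(), which needs a two-dimensional object (here two columns at once) rather than a single condition vector.

```r
# Hypothetical 0/1 data standing in for the real table.
X <- data.frame(
  V1 = c(1, 1, 0, 1, 0),
  V2 = c(1, 0, 0, 1, 1),
  V3 = c(0, 0, 1, 1, 1),
  V4 = c(1, 1, 1, 0, 0)
)

# X[, 1] extracts the first column as a plain vector; `&` combines the two
# tests element by element, and sum() counts the TRUE values.
n_both <- sum(X[, 1] == 1 & X[, 2] == 1)

# The same count with rowSums(): a row qualifies when both of its first two
# columns equal 1, i.e. when the row sum of the two comparisons is 2.
n_both_rs <- sum(rowSums(X[, 1:2] == 1) == 2)

n_both
n_both_rs
```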





[R] Layout of two graphs on a page...

2014-06-21 Thread Kate Ignatius
I'm trying to lay out two graphs on a page... this has worked
before... but I changed the way I do my Venn diagrams, so now,
instead of the Venn diagram being at the bottom of the page below the
bar/line graph, it takes up the whole page and overlays the
bar/line graph placed on the top half...

Here is my code:

layout(matrix(c(1,2),2,1,byrow=TRUE),widths=c(1,1),heights=c(2,2))
oldmar <- par("mar")
par(oma=c(0,2,0,2),mar=c(5.1,4.1,4.1,3.1))

my_tcks <- pretty(c(0,max(counts)),6)

b <- barplot(counts,col='purple',axes=F,border=FALSE,cex.names = 0.75,
             las=2, ylim=c(0,my_tcks[length(my_tcks)]))
axis(2,at=my_tcks, labels=format(my_tcks, scientific = FALSE),
     cex.axis=0.75, las=2)
mtext("",side=2,line=4,cex=1)

par(new=TRUE)
barplot(rep(NA,4),ylim=c(0,(max(ratio)+1)),axes=FALSE)
axis(4, cex.axis=0.75, las=2)
mtext("",side=4,line=2,cex=1)
lines(b, ratio,col="black",lwd=2)

par(mar=oldmar)
par(new=FALSE)

library(VennDiagram)
draw.quad.venn(area1=area1,area2=area2,area3=area3,area4=area4,
               n12=n12,n13=n13,n14=n14,n23=n23,n24=n24,n34=n34,
               n123=n123,n124=n124,n134=n134,n234=n234,n1234=n1234,
               category=c("A","B","C","D"),
               fill=c("white","white","white","white"),
               alpha=c(0.2,0.2,0.2,0.2), euler.d=FALSE, scaled=FALSE,
               cex=2, cat.cex=1.5, main="")
dev.off()

I've changed the oma and mar settings around so much now that I'm a
tad confused and probably overlooking something really obvious.

Thanks in advance...

P.S.  Let me know if more details are required (I can substitute some
numbers here if it helps plot some graphs)



[R] Error in merge [negative length vectors are not allowed]

2014-06-16 Thread Kate Ignatius
Hi All,

I'm trying to merge two files together using:

combinedfiles <- merge(comb1,comb2,by=c("Place","Stall","Menu"))

comb1 is about 2 million + rows (158MB) and comb2 is about 600K+ rows (52MB).

When I try to merge using the above syntax I get the error:

Error in merge.data.frame(comb1, comb2, by = c("Place", "Stall", "Menu")) :
  negative length vectors are not allowed

Is there something that I'm doing wrong?  I've merged larger files
together in the past without a problem so am curious what might be the
problem here...

Thanks in advance!
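
The thread does not record a resolution. One common cause of this error, offered here only as an assumption, is that key combinations repeat in both tables, so the merge becomes many-to-many and the result grows past what a vector can index. A minimal sketch with hypothetical toy tables that estimates the merged row count before merging:

```r
# Hypothetical small tables; the key (Place, Stall, Menu) repeats in both.
comb1 <- data.frame(Place = c("a", "a", "b"), Stall = c(1, 1, 2),
                    Menu = c("x", "x", "y"), v1 = 1:3)
comb2 <- data.frame(Place = c("a", "a", "b"), Stall = c(1, 1, 2),
                    Menu = c("x", "x", "y"), v2 = 4:6)
keys <- c("Place", "Stall", "Menu")

# Collapse the three key columns into one string per row.
key1 <- do.call(paste, c(comb1[keys], sep = "|"))
key2 <- do.call(paste, c(comb2[keys], sep = "|"))

# Number of rows per key combination in each table.
k1 <- table(key1)
k2 <- table(key2)

# Each shared key contributes (rows in comb1) * (rows in comb2) to the merge.
shared <- intersect(names(k1), names(k2))
expected_rows <- sum(as.numeric(k1[shared]) * as.numeric(k2[shared]))

merged <- merge(comb1, comb2, by = keys)
c(expected = expected_rows, actual = nrow(merged))
```

If `expected_rows` comes out in the billions for the real data, removing or aggregating the duplicated keys first would be the thing to try.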

~K



[R] Using reduce to merge multiple files

2014-06-12 Thread Kate Ignatius
I have a list of files that I have called like so:

main_dir <- '/path/to/files/'
directories <- list.files(main_dir, pattern = '[[:alnum:]]', full.names=T)

filenames <- list.files(file.path(directories,"/tmpdir/"),  pattern =
'[[:alnum:][:punct:]]_eat.txt+$', recursive = TRUE, full.names=T)

This lists around 35 files.  Each has multiple columns but they all
have three columns in common: Burger, Stall and Cost, which I want to
merge on using:

m1 <- Reduce(function(a, b) { merge(a, b,
by=c("Burger","Stall","Cost")) }, filenames)

However, I get the error:

Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

Is there something that I have obviously overlooked here?

Thanks in advance!
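
A likely explanation is that `filenames` holds character strings, so merge() is handed file names rather than data frames. A minimal sketch under that assumption: it writes three hypothetical tab-delimited files to a temporary directory (the `extra1`..`extra3` columns are invented for illustration), reads them into a list, and only then calls Reduce().

```r
# Write three hypothetical files to a temporary directory so the sketch is
# self-contained; in the thread these would be the *_eat.txt files.
tmp <- tempfile("eat_")
dir.create(tmp)
for (i in 1:3) {
  d <- data.frame(Burger = c("beef", "veg"), Stall = c(1, 2), Cost = c(5, 4))
  d[[paste0("extra", i)]] <- c(i, i * 10)
  write.table(d, file.path(tmp, paste0("s", i, "_eat.txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
filenames <- list.files(tmp, pattern = "_eat\\.txt$", full.names = TRUE)

# Read every file first: merge() needs data frames, not file names.
tables <- lapply(filenames, read.table, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)

# Merge the data frames pairwise on the three shared columns.
m1 <- Reduce(function(a, b) merge(a, b, by = c("Burger", "Stall", "Cost")),
             tables)
m1
```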



[R] Adding segments to a dot plot in ggplot2

2014-05-22 Thread Kate Ignatius
I'm trying to plot a GWAS (if you will) with line segments
representing an overall p-value for each gene.  Here is my code:

skatg <- ggplot(comm, aes(x = position, y = p, colour = grey)) +
  geom_point(size = 0.75) +
  geom_segment(data = rare, aes(x = txStart, y = -log10(p),
                                xend = txEnd, yend = -log10(p),
                                colour = darkgreen)) +
  labs(x = "Position", y = "-log10 P value") +
  facet_wrap(~ Chrom, scales = "free", ncol = 4)

Where comm is a file with 250k+ variants and genes.in.locus is a file
with about 18k genes.

When running this script, I get the error

Don't know how to automatically pick scale for object of type
function. Defaulting to continuous
Error in data.frame(x = c(40840353L, 31902418L, 19468080L, 236748505L,  :
  arguments imply differing number of rows: 79746, 0

Is this because there is a different number of rows in each data frame
I'm trying to plot?  If so, what is the best way to overcome this error?

Example of my data is as follows:

comm:

         Name        gene Chrom  position         p
1      rs1037    FAM114A1     4  38924330 0.7513597
2      rs1250      CC2D2A     4  15482477 0.9202882
4      rs1911       USP38     4 144136193 0.8335902
5     rs10001      STXBP2    19   7711221 0.4709547
7  rs10001370       USP46     4  53463730 0.8759828
8   rs1000152      ZNF462     9 109687288 0.3451001
10 rs10002583        POLN     4   2194953 0.7878575
12 rs10002971         EGF     4 110896050 0.5082255
15 rs10003873      SORBS2     4 186605868 0.2309855
16 rs10003909    ARHGAP24     4  86915848 0.8714853
17 rs10003947       ANXA3     4  79512800 0.5141532
18    rs10004        SSR1     6   7310259 0.6851725
20 rs10004136       STX18     4   4463587 0.5296092
21 rs10004516       ENPEP     4 111398208 0.8564897
22  rs1000521      SLC8A3    14  70522484 0.6234326
23 rs10005849       DCHS2     4 155287317 0.8192577
24 rs10006362       RGS12     4   3319271 0.8061674
25  rs1000640        WWP2     6  69905668 0.2682735
26 rs10006580      PCDH18     4 138449812 0.5178650
27 rs10006676       CYTL1     4   5021086 0.3531493
28 rs10006845       PCDH7     4  31116375 0.4817453
29 rs10007075       NEIL3     4 178274694 0.5433481
31 rs10008636 TMPRSS11BNL     4  69083563 0.8346434
32 rs10008910        UBA6     4  68500171 0.5705853
33 rs10009228      CHRNA9     4  40356422 0.4223378


rare:

      geneName txStart  txEnd Chrom position         p
36131   YTHDC1    6026  45746     4     6026 0.5009490
10898  FAM110C   38813  46588    19    38813 1.000
37306   ZNF595   53178  88099     4    53178 0.1261045
16450  KIR2DL4   57208  68123    19    57208 0.156
28406   SCAND3   61610  77316     6    61610 0.2568
19926      MPG  127017 1358506 127017 00.000987456
34149   TRIM27  174179 195169     6   174179 0.025698

I haven't included all information here.

Any help will be greatly appreciated.

Thanks!



[R] Mean of colMeans

2014-05-21 Thread Kate Ignatius
Hi All,

I've successfully gotten out the colMeans for 60 columns using:

col <- colMeans(x, na.rm = TRUE, dims = 1)

My next question is: is there a way of getting a mean of all the
column means (ie a mean of a mean)?

Thanks!



Re: [R] Mean of colMeans

2014-05-21 Thread Kate Ignatius
Thanks for the explanation.  And tip... this was quick and dirty code
so I didn't really think about using a name that is already a
function in R.  The data was generic - just a bunch of columns with
numbers - so I didn't bother including it as I know that wasn't the
problem. Same goes for replying - I automatically hit reply; I will
remember to reply-all.
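
For the record, a minimal sketch of the point about naming, assuming a hypothetical two-column data frame with one missing value. It stores the column means under a name that does not clash with base::col(), and shows that the mean of the column means is not necessarily the mean of all values once columns differ in their number of NAs.

```r
# Hypothetical data with one missing value.
x <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))

# Store the column means under a name that does not mask base::col().
col_means <- colMeans(x, na.rm = TRUE)

# Mean of the column means: every column counts equally.
mean_of_means <- mean(col_means)

# Mean of all values: every non-missing cell counts equally. The two differ
# as soon as columns have different numbers of missing values.
grand_mean <- mean(as.matrix(x), na.rm = TRUE)

c(mean_of_means = mean_of_means, grand_mean = grand_mean)
```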

On Wed, May 21, 2014 at 3:11 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 That would be because col is a function in base R, and thus a poor
 choice of names for user objects. Nonetheless, it worked when I ran
 it, but you didn't provide reproducible example so who knows.

 R> set.seed(1)
 R> x <- data.frame(matrix(runif(150), ncol=10))
 R> # col is a function, so not a good name
 R> col <- colMeans(x)
 R> mean(col)
 [1] 0.5119

 It's polite to include the list on your reply.

 Sarah

 On Wed, May 21, 2014 at 2:50 PM, Kate Ignatius kate.ignat...@gmail.com 
 wrote:
 That didn't work: gave me the error =

 [1] NA
 Warning message:
 In mean.default(col) : argument is not numeric or logical: returning NA

 But writing it like: mean(colMeans(x, na.rm = TRUE, dims = 1)), worked

 Thanks!

 On Wed, May 21, 2014 at 2:31 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Is

 mean(col)

 not what you're looking for?

 Sarah



 --
 Sarah Goslee
 http://www.functionaldiversity.org



[R] Colour of geom_hline is not correct in legend

2014-04-06 Thread Kate Ignatius
I've used geom_point and geom_hline in ggplot2 and have gotten
satisfactory legends for both.  However, I have one black line and one
blue line in the figure but in the legend they are both black - how
can I correct this in the legend to be the right colors?

mcgc <- ggplot(sam, aes(x = m, y = ad, colour = X)) +
  geom_point(size = 0.75) +
  scale_colour_gradient2(high = "red", mid = "green",
                         limits = c(0, 1), guide = "colourbar") +
  geom_hline(aes(yintercept = mad, linetype = "mad"),
             colour = "blue", size = 0.75, show_guide = TRUE) +
  geom_hline(aes(yintercept = mmad, linetype = "mmad"),
             colour = "black", size = 0.75, show_guide = TRUE) +
  facet_wrap(~ Plan, scales = "free", ncol = 4) +
  scale_linetype_manual(name = "Plan of Health Care",
                        values = c("mad" = 1, "mmad" = 1), guide = "legend")

I'm sure I've overwritten something here... just not sure where (I am
new to ggplot).

Data:

Plan  ad X   m  mad  mmad
1  1 95 0.323000 0.400303 0.12
1  2 275 0.341818 0.400303 0.12
1  3  2 0.618000 0.400303 0.12
1  4 75 0.32 0.400303 0.12
1  5 13 0.399000 0.400303 0.12
1  6 20 0.40 0.400303 0.12
2  1 219 0.393000 0.353350 0.45
2  2 50 0.06 0.353350 0.45
2  3 213 0.39 0.353350 0.45
2  4 204 0.496100 0.353350 0.45
2  5 19 0.393000 0.353350 0.45
2  6 201 0.388000 0.353350 0.45

Plan goes up to 40, but I've only included a snippet of data here...



[R] Manipulating x axis using scale_x_continuous (but a factor is used). Is there a work around?

2014-04-06 Thread Kate Ignatius
My code that I've used is:

mcgc <- ggplot(sam, aes(x = person, y = m, colour = X)) +
  geom_point(size = 0.75) +
  scale_colour_gradient2(high = "red", mid = "green",
                         limits = c(0, 1), guide = "colourbar") +
  geom_hline(aes(yintercept = mad, linetype = "mad"),
             colour = "blue", size = 0.75, show_guide = TRUE) +
  geom_hline(aes(yintercept = mmad, linetype = "mmad"),
             colour = "black", size = 0.75, show_guide = TRUE) +
  facet_wrap(~ Plan, scales = "free", ncol = 4) +
  scale_linetype_manual(name = "Plan of Health Care",
                        values = c("mad" = 1, "mmad" = 1), guide = "legend")

For this data:

Data:

Plan  Person X   m  mad  mmad
1  1 95 0.323000 0.400303 0.12
1  2 275 0.341818 0.400303 0.12
1  3  2 0.618000 0.400303 0.12
1  4 75 0.32 0.400303 0.12
1  5 13 0.399000 0.400303 0.12
1  6 20 0.40 0.400303 0.12
2  7 219 0.393000 0.353350 0.45
2  8 50 0.06 0.353350 0.45
2  9 213 0.39 0.353350 0.45
2  15 204 0.496100 0.353350 0.45
2  19 19 0.393000 0.353350 0.45
2  24 201 0.388000 0.353350 0.45
3  30 219 0.567 0.1254 0.89
3  14 50 0.679 0.1254 0.89
3  55 213 0.1234 0.1254 0.89
3  18 204 0.6135 0.1254 0.89
3  59 19 0.39356 0.1254 0.89
3  101 201 0.300 0.1254 0.89

I'm trying to manipulate the x axis using the following, only because
the data can get very large and there are just way too many Persons to
fit on the x-axis, so I need to reduce the labels until they are legible:

scale_x_continuous(breaks = c(min(person), median(person),
max(person)), labels = c(min(person), median(person), max(person)))

However, given that I had to change `person` into a factor to order
the data properly, the above code does not work.  I get the errors,
depending on how I fiddle around with the code:

Error: Discrete value supplied to continuous scale
Error in Summary.factor(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,  :
  min not meaningful for factors

Changing `person` to numeric does not work, as the accumulated
`person` for the entire dataset will then be on each Plan figure
panel, as opposed to the scale specific for each Plan. That is, the
x-axis for each panel (Plan) should have a scale beginning from its
lowest Person to its highest Person (ie Plan 1 should have an x-axis
that goes from 1 to 6 but Plan 3 has one that goes from 14 to 101).
Changing the Person to numeric, the x-axis for all panels starts at 1
and goes to 101.

Is there a work around for this?
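
One possible workaround, sketched under assumptions: keep Person as a factor and pass a function as the `breaks` argument of scale_x_discrete(). As far as I know, ggplot2 calls such a function with the scale limits, and with free x scales each panel should get its own limits. The helper name `three_breaks` and the reduced toy data are hypothetical; the plotting part only runs if ggplot2 is installed.

```r
# Keep only the first, middle and last of the levels present on a scale.
three_breaks <- function(lims) {
  idx <- unique(c(1, ceiling(length(lims) / 2), length(lims)))
  lims[idx]
}

# Reduced toy version of the data in this message (Plans 1 and 3 only).
sam <- data.frame(
  Plan   = rep(c(1, 3), each = 6),
  Person = c(1, 2, 3, 4, 5, 6, 30, 14, 55, 18, 59, 101),
  m      = c(0.323, 0.341818, 0.618, 0.32, 0.399, 0.40,
             0.567, 0.679, 0.1234, 0.6135, 0.39356, 0.300)
)

# What the helper returns for the six persons of Plan 1.
three_breaks(as.character(1:6))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # factor(Person) orders the levels numerically; each facet then labels
  # only the breaks chosen by three_breaks() from its own levels.
  p <- ggplot(sam, aes(x = factor(Person), y = m)) +
    geom_point(size = 0.75) +
    facet_wrap(~ Plan, scales = "free_x") +
    scale_x_discrete(breaks = three_breaks)
}
```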



[R] Setting alternative x-axis breaks using gglpot2

2014-04-05 Thread Kate Ignatius
I'm not doing a Manhattan plot, but plotting AD (coloured by DP) along
the genome:

points <- ggplot(sam, aes(x = midpoint, y = ad, colour = dp, size = 3)) +
  geom_point() +
  scale_y_continuous(breaks = c(0, 20, 30, 40)) +
  labs(x = "chr", y = "ad") +
  scale_colour_gradient2(high = "red", mid = "green")

However, instead of having the BP position along the bottom, I was
wondering whether it's possible to have the chromosome instead.  Is
there an easy way to do this?

I'm also trying to reduce the size of the points on the
plot, but changing the size in the code does not work.

Thanks!
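
A minimal sketch of one common approach, assuming hypothetical toy data with a `chrom` column alongside `midpoint` and `ad`: shift each chromosome's positions by the combined length of the chromosomes before it, then put one axis tick per chromosome at its centre. The point size is set outside aes(), since a constant inside aes() is treated as data to map rather than as a size. The plotting part only runs if ggplot2 is installed.

```r
# Hypothetical positions on two chromosomes.
sam <- data.frame(chrom    = c(1, 1, 1, 2, 2),
                  midpoint = c(100, 500, 900, 200, 600),
                  ad       = c(10, 20, 30, 25, 35))

# Offset for each chromosome: summed lengths (here the largest observed
# position) of all chromosomes that come before it.
chr_len <- tapply(sam$midpoint, sam$chrom, max)
offset  <- cumsum(c(0, head(chr_len, -1)))
names(offset) <- names(chr_len)

# Genome-wide x coordinate, and one tick per chromosome at its centre.
sam$gpos <- unname(sam$midpoint + offset[as.character(sam$chrom)])
ticks <- tapply(sam$gpos, sam$chrom, function(z) mean(range(z)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sam, aes(x = gpos, y = ad)) +
    geom_point(size = 0.75) +                      # constant size, not mapped
    scale_x_continuous(breaks = as.numeric(ticks), # chromosome names as labels
                       labels = names(ticks)) +
    labs(x = "chr", y = "ad")
}
```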



[R] Recoding in R conditioned on a certain value.

2014-04-05 Thread Kate Ignatius
I'm trying to work out the average of a certain value by chromosome.
I've done the following, but it doesn't seem to work:

Say, I want to find average AD for chromosome 1 only and paste the
value next to all the positions on chromosome 1:

sam$mmad[sam$chrom == '1'] <-
(sam$ad)/(colSums(sam[c(1:nrow(sam$chrom=='1'))],))

I know this is convoluted and possibly wrong... but I would like to do
this for all chromosomes.
this for all chromosomes.

Thanks!
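
A minimal sketch of one way to do this for all chromosomes at once, assuming a hypothetical toy `sam` with only the `chrom` and `ad` columns. Base R's ave() returns, for every row, the group mean of that row's chromosome, so the result can be assigned straight to a new column.

```r
# Hypothetical toy data: two chromosomes with a few positions each.
sam <- data.frame(chrom = c("1", "1", "2", "2", "2"),
                  ad    = c(10, 20, 3, 6, 9),
                  stringsAsFactors = FALSE)

# For each row, the mean of `ad` over all rows on the same chromosome.
sam$mmad <- ave(sam$ad, sam$chrom, FUN = mean)
sam
```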
