Re: [R] Help with simple as.POSIXlt or strptime

2013-09-28 Thread Jim Lemon

On 09/29/2013 01:27 AM, Leopoldo Catania wrote:

Hi,
I really don't know what is wrong with my code, I have a character object
and I need to have a POSIXlt object; my code is:

date="Mon, 23 Sep 2013 06:45:05 GMT"
as.POSIXlt(date,format="%a, %d %b %Y %H:%M:%S %Z")

[1] NA
even with

strptime(date,"%a, %d %b %Y %H:%M:%S %Z")

[1] NA
Also if I remove "Mon," and "GMT"

date2="23 Sep 2013 06:45:05"
as.POSIXlt(date2,format="%d %b %Y %H:%M:%S")

[1] NA

strptime(date2,format="%d %b %Y %H:%M:%S")

[1] NA
If I try to run the last code line in ?strptime the result is:
## An RFC 822 header (Eastern Canada, during DST)
strptime("Tue, 23 Mar 2010 14:36:38 -0400",  "%a, %d %b %Y %H:%M:%S %z")
[1] NA


Hi Leopoldo,
The %Z is only included in the format string for output. Try these:

strptime(date,"%a, %d %b %Y %H:%M:%S")
[1] "2013-09-23 06:45:05"
strptime(date,"%a, %d %b %Y %H:%M:%S",tz="GMT")
[1] "2013-09-23 06:45:05 GMT"

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help about a R command

2013-09-28 Thread Duncan Murdoch

On 13-09-28 11:18 AM, Srinivas Sridhara wrote:

Hi,

I was trying to get an answer to this issue:

bookRatingData <- read.table(file.choose(),header=TRUE,nrows=1048570)

Warning message:
In read.table(file.choose(), header = TRUE, nrows = 1048570) :
   incomplete final line found by readTableHeader on
'C:\Users\srinivas\Downloads\BX-Book-Ratings (2).xlsx'



I tried opening the data file (BX-Book-Ratings (2).xlsx) and added a new
line and then saved it. However, that didn't fix the incomplete final line
issue. I still have the same problem. Any suggestions?


That's just a warning.  It says that the last line in your file doesn't 
have a line ending (LF, or CR/LF).  On Unix that often signals a 
problem.  On Windows it's pretty common.


Duncan Murdoch



Thanks,
Srinivas



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: C stack usage is too close to the limit when using list.files()

2013-09-28 Thread William Dunlap
The issue is not symbolic links per se, but ones that form loops.
Note that you can detect such loops by running 'find -L ...' and 
looking for the error messages.  (find by default does not follow
any symbolic links, which can be a problem also.)

It is a shortcoming of the current version of list.files().

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: jgrn...@gmail.com [mailto:jgrn...@gmail.com] On Behalf Of Jonathan
> Greenberg
> Sent: Saturday, September 28, 2013 10:51 AM
> To: William Dunlap
> Cc: r-help
> Subject: Re: [R] Error: C stack usage is too close to the limit when using 
> list.files()
> 
> Thanks all -- ok, so the symbolic link issue is a distinct
> possibility, but fundamentally doesn't solve the issue since most
> users will have symbolic links on their machines SOMEPLACE, so a full
> drive scan will run into these issues --  is list.files calling find,
> or is it using a different algorithm?  This seems like a shortcoming
> in the list.files algorithm -- is there a better solution (short of a
> System call, which I'm still not sure will work on Macs without Xcode
> -- a colleague of mine did NOT have Xcode, and reported not being able
> to run find from the command line) -- perhaps a different package?
> 
> --j
> 
> On Fri, Sep 27, 2013 at 3:08 PM, William Dunlap  wrote:
> > Toss a couple of extra files in there and you will see the output grow 
> > exponentially.
> >
> > % touch dir/IMPORTANT_1 dir/subdir/IMPORTANT_2
> >
> > and in R those two new files cause 82 more strings to appear in list.file's 
> > output:
> >
> >> nchar(list.files("dir", recursive=TRUE))
> >  [1]  11  18  33  40  55  62  77  84  99 106 121 128 143 150 165 172 187 
> > 194 209
> > [20] 216 231 238 253 260 275 282 297 304 319 326 341 348 363 370 385 392 
> > 407 414
> > [39] 429 436 451 458 473 480 495 502 517 524 539 546 561 568 583 590 605 
> > 612 627
> > [58] 634 649 656 671 678 693 700 715 722 737 744 759 766 781 788 803 810 
> > 825 832
> > [77] 847 854 869 876 891 898 901
> >
> > 'find', by default, does not following symbolic links.
> >
> > % find dir
> > dir
> > dir/subdir
> > dir/subdir/IMPORTANT_2
> > dir/subdir/linkToUpperDir
> > dir/IMPORTANT_1
> >
> > The -L option makes it follow them, but it won't follow loops:
> >
> > % find -L dir
> > dir
> > dir/subdir
> > dir/subdir/IMPORTANT_2
> > find: File system loop detected; `dir/subdir/linkToUpperDir' is part of the 
> > same file
> system loop as `dir'.
> > dir/IMPORTANT_1
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf
> >> Of William Dunlap
> >> Sent: Friday, September 27, 2013 12:56 PM
> >> To: Jonathan Greenberg; r-help
> >> Subject: Re: [R] Error: C stack usage is too close to the limit when using 
> >> list.files()
> >>
> >> Do you have some symbolic links that make loops in your file system?
> >> list.files() has problems with such loops and find does not.  E.g.,  on a 
> >> Linux box:
> >>
> >> % cd /tmp
> >> % mkdir dir dir/subdir
> >> % cd dir/subdir
> >> % ln -s ../../dir linkToUpperDir
> >> % cd /tmp
> >> % R --quiet
> >> > list.files("dir", recursive=TRUE, full=TRUE)
> >> [1]
> >>
> "dir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToU
> >>
> pperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkT
> >>
> oUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/li
> >>
> nkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdi
> >>
> r/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/su
> >>
> bdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir
> >>
> /subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpper
> >>
> Dir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUp
> >>
> perDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkTo
> >>
> UpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/lin
> >> kToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir"
> >> > system("find dir")
> >> dir
> >> dir/subdir
> >> dir/subdir/linkToUpperDir
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >>
> >> > -Original Message-
> >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
> >> > On
> Behalf
> >> > Of Jonathan Greenberg
> >> > Sent: Friday, September 27, 2013 12:13 PM
> >> > To: r-help
> >> > Subject: [R] Error: C stack usage is too close to the limit when using 
> >> > list.files()
> >> >
> >> > R-helpers:
> >> >
> >> > I'm running a file search on my entire drive (Mac OS X) using:
> >> >
> >> > files_found <-
> list.files(dir="/",pattern=somepattern,recursive=TRUE,full.names=TRUE)
> >> > w

Re: [R] Strange result from single [] extract operator

2013-09-28 Thread William Dunlap
> First, I tried
>  library(datasets)
> > data<-airquality
> > data[nrow(data)-1:nrow(data),]
> 
> and received 152 rows sorted desc. Could you explain why it worked this way? 
> I changed
> the extract line to:
> data[(nrow(data)-1):nrow(data),]
> 
> and then I received what I wanted but still am curious about the performance 
> of my
> previous code.

Back up a little and compare
nrow(data)-1:nrow(data)
to
(nrow(data)-1):nrow(data)
or compare
   3-1:3
to
   (3-1):3

Then look at the help file for "Syntax".

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of rafae...@poczta.fm
> Sent: Saturday, September 28, 2013 8:18 AM
> To: r-help@r-project.org
> Subject: [R] Strange result from single [] extract operator
> 
> Hi All,
> 
> I am using Rx64 3.0.1 on Windows 7 x64, and wanted to get two last rows from 
> dataset.
> First, I tried
>  library(datasets)
> > data<-airquality
> > data[nrow(data)-1:nrow(data),]
> 
> and received 152 rows sorted desc. Could you explain why it worked this way? 
> I changed
> the extract line to:
> data[(nrow(data)-1):nrow(data),]
> 
> and then I received what I wanted but still am curious about the performance 
> of my
> previous code.
> 
> Yours faithfully,
> Rafał
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Command line r

2013-09-28 Thread Tomek R
Hi,
I have found myself often doing simple statistical analysis using Linux command 
line on a single dataset. Therefore, I put a perl script together, which makes 
it 
easier:https://github.com/religa/statshttps://github.com/religa/stats/blob/master/r
The idea behind simpleR is that it becomes a standard part of any Linux pipe. 
For example, to get a summary of your data, one would have to type: 'r summary 
file.txt'; to plot it: 'r -p file.txt'; to calculate sum of numbers from 1 to 
100: 'seq 1 100 | r sum -'; 
I tried to test it under Linux / Mac, including various types of pipes and 
redirects. I am looking forward to your comments on how to improve it. 
Tomek

  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help about a R command

2013-09-28 Thread Srinivas Sridhara
Hi,

I was trying to get an answer to this issue:

bookRatingData <- read.table(file.choose(),header=TRUE,nrows=1048570)

Warning message:
In read.table(file.choose(), header = TRUE, nrows = 1048570) :
  incomplete final line found by readTableHeader on
'C:\Users\srinivas\Downloads\BX-Book-Ratings (2).xlsx'



I tried opening the data file (BX-Book-Ratings (2).xlsx) and added a new
line and then saved it. However, that didn't fix the incomplete final line
issue. I still have the same problem. Any suggestions?

Thanks,
Srinivas

-- 
Srinivas Sridhara
Westford, MA 01886
Tel: (978) 467-6857

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with simple as.POSIXlt or strptime

2013-09-28 Thread Leopoldo Catania
Hi,
I really don't know what is wrong with my code, I have a character object
and I need to have a POSIXlt object; my code is:
>date="Mon, 23 Sep 2013 06:45:05 GMT"
>as.POSIXlt(date,format="%a, %d %b %Y %H:%M:%S %Z")
[1] NA
even with
>strptime(date,"%a, %d %b %Y %H:%M:%S %Z")
[1] NA
Also if I remove "Mon," and "GMT"
>date2="23 Sep 2013 06:45:05"
>as.POSIXlt(date2,format="%d %b %Y %H:%M:%S")
[1] NA
>strptime(date2,format="%d %b %Y %H:%M:%S")
[1] NA
If I try to run the last code line in ?strptime the result is:
## An RFC 822 header (Eastern Canada, during DST)
strptime("Tue, 23 Mar 2010 14:36:38 -0400",  "%a, %d %b %Y %H:%M:%S %z")
[1] NA

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

I'm using RStudio.

Regards,
Leopoldo.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Strange result from single [] extract operator

2013-09-28 Thread rafael . 7
Hi All,

I am using Rx64 3.0.1 on Windows 7 x64, and wanted to get two last rows from 
dataset. First, I tried
 library(datasets)
> data<-airquality
> data[nrow(data)-1:nrow(data),]

and received 152 rows sorted desc. Could you explain why it worked this way? I 
changed the extract line to:
data[(nrow(data)-1):nrow(data),]

and then I received what I wanted but still am curious about the performance of 
my previous code.

Yours faithfully,
Rafał 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: C stack usage is too close to the limit when using list.files()

2013-09-28 Thread Berend Hasselman

On 28-09-2013, at 19:51, Jonathan Greenberg  wrote:

> Thanks all -- ok, so the symbolic link issue is a distinct
> possibility, but fundamentally doesn't solve the issue since most
> users will have symbolic links on their machines SOMEPLACE, so a full
> drive scan will run into these issues --  is list.files calling find,
> or is it using a different algorithm?  This seems like a shortcoming
> in the list.files algorithm -- is there a better solution (short of a
> System call, which I'm still not sure will work on Macs without Xcode
> -- a colleague of mine did NOT have Xcode, and reported not being able
> to run find from the command line) -- perhaps a different package?
> 

Since Mac OS X Tiger at least  there is a find utility.
By doing a search here http://www.opensource.apple.com you can see that there 
is a find utility since 10.0
(to be found here 
http://www.opensource.apple.com/source/shell_cmds/shell_cmds-17.1/ ).
 
There must be something wrong with your colleague's setup.

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: C stack usage is too close to the limit when using list.files()

2013-09-28 Thread Jonathan Greenberg
Thanks all -- ok, so the symbolic link issue is a distinct
possibility, but fundamentally doesn't solve the issue since most
users will have symbolic links on their machines SOMEPLACE, so a full
drive scan will run into these issues --  is list.files calling find,
or is it using a different algorithm?  This seems like a shortcoming
in the list.files algorithm -- is there a better solution (short of a
System call, which I'm still not sure will work on Macs without Xcode
-- a colleague of mine did NOT have Xcode, and reported not being able
to run find from the command line) -- perhaps a different package?

--j

On Fri, Sep 27, 2013 at 3:08 PM, William Dunlap  wrote:
> Toss a couple of extra files in there and you will see the output grow 
> exponentially.
>
> % touch dir/IMPORTANT_1 dir/subdir/IMPORTANT_2
>
> and in R those two new files cause 82 more strings to appear in list.file's 
> output:
>
>> nchar(list.files("dir", recursive=TRUE))
>  [1]  11  18  33  40  55  62  77  84  99 106 121 128 143 150 165 172 187 194 
> 209
> [20] 216 231 238 253 260 275 282 297 304 319 326 341 348 363 370 385 392 407 
> 414
> [39] 429 436 451 458 473 480 495 502 517 524 539 546 561 568 583 590 605 612 
> 627
> [58] 634 649 656 671 678 693 700 715 722 737 744 759 766 781 788 803 810 825 
> 832
> [77] 847 854 869 876 891 898 901
>
> 'find', by default, does not following symbolic links.
>
> % find dir
> dir
> dir/subdir
> dir/subdir/IMPORTANT_2
> dir/subdir/linkToUpperDir
> dir/IMPORTANT_1
>
> The -L option makes it follow them, but it won't follow loops:
>
> % find -L dir
> dir
> dir/subdir
> dir/subdir/IMPORTANT_2
> find: File system loop detected; `dir/subdir/linkToUpperDir' is part of the 
> same file system loop as `dir'.
> dir/IMPORTANT_1
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf
>> Of William Dunlap
>> Sent: Friday, September 27, 2013 12:56 PM
>> To: Jonathan Greenberg; r-help
>> Subject: Re: [R] Error: C stack usage is too close to the limit when using 
>> list.files()
>>
>> Do you have some symbolic links that make loops in your file system?
>> list.files() has problems with such loops and find does not.  E.g.,  on a 
>> Linux box:
>>
>> % cd /tmp
>> % mkdir dir dir/subdir
>> % cd dir/subdir
>> % ln -s ../../dir linkToUpperDir
>> % cd /tmp
>> % R --quiet
>> > list.files("dir", recursive=TRUE, full=TRUE)
>> [1]
>> "dir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToU
>> pperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkT
>> oUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/li
>> nkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdi
>> r/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/su
>> bdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir
>> /subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpper
>> Dir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUp
>> perDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkTo
>> UpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir/subdir/lin
>> kToUpperDir/subdir/linkToUpperDir/subdir/linkToUpperDir"
>> > system("find dir")
>> dir
>> dir/subdir
>> dir/subdir/linkToUpperDir
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>> > -Original Message-
>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
>> > On Behalf
>> > Of Jonathan Greenberg
>> > Sent: Friday, September 27, 2013 12:13 PM
>> > To: r-help
>> > Subject: [R] Error: C stack usage is too close to the limit when using 
>> > list.files()
>> >
>> > R-helpers:
>> >
>> > I'm running a file search on my entire drive (Mac OS X) using:
>> >
>> > files_found <- 
>> > list.files(dir="/",pattern=somepattern,recursive=TRUE,full.names=TRUE)
>> > where somepattern is a search pattern (which I have confirmed via a
>> > unix "find / -name somepattern" only returns ~ 3 results).
>> >
>> > I keep getting an error:
>> >
>> > Error: C stack usage is too close to the limit
>> >
>> > when running this command.  Any ideas on 1) how to fix this or 2) if
>> > there is an alternative to using list.files() to accomplish this
>> > search without resorting to an external package?
>> >
>> > Cheers!
>> >
>> > --jonathan
>> >
>> >
>> > --
>> > Jonathan A. Greenberg, PhD
>> > Assistant Professor
>> > Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
>> > Department of Geography and Geographic Information Science
>> > University of Illinois at Urbana-Champaign
>> > 259 Computing Applications Building, MC-150
>> > 605 East Springfield Avenue
>> > Champaign, IL  61820-6371
>> > Phone: 217-300-1924
>> > http://www.geog.illinois.edu/~jgr

Re: [R] makeCluster help needed

2013-09-28 Thread Uwe Ligges

Can you please upgrade R to R-3.0.2 and use the parallel package?
And can you please explain why you want to start the workers manually? 
I'd be happy to look into the details if you can reproduce the problem 
with a recent version of R and the parallel package.


Best,
Uwe Ligges





On 28.09.2013 03:20, Jeffrey Flint wrote:

This is in regards to the SNOW library.

I'm using Windows.  The problem is that makeSOCKcluster hangs in R as well
as the DOS command line.  Below I've shown that it completes the Rscript
until it reaches the line "slaveLoop(master)" , at which point it hangs.

=

In R:


cl <-

makeSOCKcluster(names=c("localhost","localhost"),manual=T,outfile="jeff.log")
Manually start worker on localhost with
  C:/PROGRA~1/R/R-214~1.2/bin/Rscript.exe "C:/Program
Files/R/R-2.14.2/library/snow/RSOCKnode.R" MASTER=localhost PORT=11590
OUT=jeff.log SNOWLIB=C:/Program Files/R/R-2.14.2/library
[HANGS]


On the DOS Command Line:

C:\Documents and Settings\Jeff>C:/PROGRA~1/R/R-214~1.2/bin/Rscript.exe
"C:/Program Files/R/R-2.14.2/library/snow/RSOCKno
de.R" MASTER=localhost PORT=11590 OUT=jeff.log SNOWLIB=C:/Program
Files/R/R-2.14.2/library
[HANGS]
^C
C:\Documents and Settings\Jeff>type jeff.log
starting worker for localhost:11590




In the file RSOCKnode.R, stalls after last line, after executing
"slaveLoop(master)".




local({
 master <- "localhost"
 port <- "8765"
 snowlib <- Sys.getenv("R_SNOW_LIB")
 outfile <- Sys.getenv("R_SNOW_OUTFILE")

 args <- commandArgs()
 pos <- match("--args", args)
 args <- args[-(1 : pos)]
 for (a in args) {
 pos <- regexpr("=", a)
 name <- substr(a, 1, pos - 1)
 value <- substr(a,pos + 1, nchar(a))
 switch(name,
MASTER = master <- value,
PORT = port <- value,
SNOWLIB = snowlib <- value,
OUT = outfile <- value,
RANK = rank <- value,
TMPWS = tmpWsName <- value)
 }
 ## these should be passed as arguments to makeNWSmaster
 Sys.setenv(MASTER = master)
 Sys.setenv(PORT = port)
 Sys.setenv(RANK = rank)
 Sys.setenv(TMPWS = tmpWsName)

 if (! (snowlib %in% .libPaths()))
 .libPaths(c(snowlib, .libPaths()))
 library(methods) ## because Rscript as of R 2.7.0 doesn't load methods
 library(nws)
 library(snow)

 sinkWorkerOutput(outfile)
 master <- makeNWSmaster()
 sendData(master, "ping")
 cat("starting NWS worker\n")
 slaveLoop(master)
})

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with automation of WinBUGS from R

2013-09-28 Thread Anamika Chaudhuri
Hi:
I have a question related to the R Code which calls BUGS. I have run the
model in WinBUGS and it runs fine giving me the expected results.  Below is
the automation code used when I had single outcome or univariate data for
Y’s. Now I want to use it for multiple outcomes. So the trial.data below
had one subscript earlier and now its two. There are 3 simulations for
testing.Not sure where to add the subscript in the code so that the same
process can be repeated for 2 outcomes instead of one.
library("R2WinBUGS")

> #Set working directory
> setwd("C://Tina/USB_Backup_042213/Testing")
> trial.data <- read.table("MVN_Test.csv", header=T)
> trial.data
  X.V11.V21.V31.V41.V51.V12.V22.V32.V42.V52
11,11,6,8,5,25,13,1,13,8,22
2 2,9,1,7,9,25,13,1,18,9,12
3   3,6,4,9,10,26,15,1,14,11,12
>
> p1_true<- read.table("p1.csv",header=F)
> p1_true
  V1
1 0.15171877
2 0.07220343
3 0.13898233
4 0.13129042
5 0.43438334
>
> p2_true<-read.table("p2.csv",header=F)
> p2_true
  V1
1 0.20562945
2 0.04400286
3 0.25019831
4 0.18854418
5 0.22946289
> bugs.output <- list()

for(i in 1:3){
   Y <- as.integer(trial.data[i,])
   bugs.output[[i]] <- bugs(
   data=list(Y=as.matrix(Y), Nf=20, n=60,), # how to pass the other
parameters like mn which is a vector, prec which is a matrix, R a matrix
here?,
 inits=list(

list(gamma=c(0,0),T=structure(.Data=c(0.5,0,0.5,0),.Dim=c(2,2)))
   ),
   model.file="M-LN_model_trial.txt",
   parameters.to.save = c("p","rho","sigma2"),
n.chains=1, n.iter=12000, n.burnin=5000,
bugs.directory="C://Tina/USB_Backup_042213/winbugs14/WinBUGS14",
working.directory=NULL)}
bugs.output[[i]]$sims.list$p_trans<-t(bugs.output[[i]]$sims.list$p)
bugs.output[[i]]$sims.list$coverage<-matrix(data=0,nrow=20,ncol=1)
bugs.output[[i]]$sims.list$int_width<-matrix(data=0,nrow=20,ncol=1)
bugs.output[[i]]$sims.list$sqr_err<-matrix(data=0,nrow=20,ncol=1)
bugs.output[[i]]$sims.list$abs_bias<-matrix(data=0,nrow=20,ncol=1)

bugs.output[[i]]$sims.list$p_true<-p_true
for (j in 1:5) # change for no of sites
{
if
# define coverage
((bugs.output[[i]]$sims.list$p_true[j,]>quantile(bugs.output[[i]]$sims.list$p_trans[j,],c(.025)))
&
(bugs.output[[i]]$sims.list$p_true[j,]__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What is a "good fit" Brier score and Harrel's C Index

2013-09-28 Thread David Winsemius


On Sep 28, 2013, at 8:14 AM, E Joffe wrote:



Hi all,

I am evaluating survival models using Brier score ("peperr") and  
Harrel's

C-Index ("Hmisc").


It's spelled 'Harrell'.


I am wondering:

1. What would be considered a "good fit" according to these scores  
(like the

heuristic levels we have for R square in linear regressions) ?

2. Are there any papers to cite on the matter (I couldn't find any) ?


Frank Harrell's excellent text "Regression Modeling Strategies" has an  
extensive discussion of "goodness of fit" and the principles of model  
comparison. It's both too involved as well as off-topic for Rhelp. The  
other text to consult is Steyerberg's "Clinical Prediction Models".




3. Is there any paper to cite that discusses the limitation of using
traditional reporting for model fit in survival analysis as opposed  
to these

measures  ?


I predict that the RMS bibliography would be an excellent place to  
start your search.


Despite getting his name attached to what he calls the 'c-index', I  
don't think one could call Frank Harrell a proponent of that measure  
or any of the "competitors". It's really just a dressed up/transformed  
AUC. The message I have taken from reading his book and listening to  
presentations is that one should apply biologic tests of sensibility  
as well as careful investigation of the functional relationships  
between candidate predictors and the outcomes of interest. He speaks  
very disparagingly about automatic procedures.


--

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

2013-09-28 Thread Frank Harrell
This entire procedure is not valid.  You cannot use a penalized method for
selecting variables then use an unpenalized procedure on those selected.

Frank

David Winsemius wrote
> On Sep 28, 2013, at 2:39 AM, E Joffe wrote:
> 
>> Hi all,
>>
>> I am using COX LASSO (glmnet / coxnet) regression to analyze a  
>> dataset of
>> 394 obs. / 268 vars.
>> I use the following procedure:
>> 1.   Construct a coxnet on the entire dataset (by cv.glmnet)
>> 2.   Pick the significant features by selecting the non-zero coefficient
>> under the best lambda selected by the model
>> 3.   Build a coxph model with bi-directional stepwise feature selection
>> limited to the coxnet selected features.
> 
> I was a bit puzzled by the third step. Once you had a reduced model  
> from glmnet, what was the statistical basis for further elimination of  
> variables?
> 
> (Quite apart from the statistical issues, I was rather surprised that  
> this procedure even produced results since the 'step' function is not  
> described in the 'stats' package as applying to 'coxph' model objects.)
> 
>>  
>> To validate the model I use both Brier score (library=peperr) and  
>> Harrel's [Harrell]
>> C-Index (library=Hmisc) with a bootstrap of 50 iterations.
>>
>>
>> MY QUESTION :  I am getting fairly good C-Index (0.76) and Brier  
>> (0.07)
>> values for the models however per the coxnet the %Dev explained by  
>> the model
>> is at best 0.27 and when I plot the survfit of the coxph the plotted
>> confidence interval is very large.
> 
> How many events did you have?  (The width of CI's is most importantly  
> dependent on event counts and not particularly improved by a high case  
> count. The power considerations are very similar to those of a  
> binomial test.)
> 
> 
>> What am I missing here ?
> 
> Perhaps sufficient events? (You also seem to be missing a description  
> of the study goals.)
> 
> 
> -- 
> David.
> 
>>
>> %DEV=27%
>>
>>
>>
>> Brier score - 0.07  ($ipec.coxglmnet -> [1] 7.24)
>> C-Index - 0.76 ($cIndex -> [1] 0.763)
>>
>>
>>
>> DATA: [Private Health Information - can't publish] 394 obs./268 vars.
>>
>> CODE (need to define a dataset with 'time' and 'status' variables):
>>
>> library("survival")
>> library ("glmnet")
>> library ("c060")
>> library ("peperr")
>> library ("Hmisc")
>>
>>#creat Y (survival matrix) for glmnet
>>surv_obj <- Surv(dataset$time,dataset$status)
>>
>>
>>## tranform categorical variables into binary variables with  
>> dummy for
>> dataset
>>predict_matrix <- model.matrix(~ ., data=dataset,
>>   contrasts.arg = lapply
>> (dataset[,sapply(dataset, is.factor)], contrasts))
>>
>>## remove the statu/time variables from the predictor matrix (x)  
>> for
>> glmnet
>>predict_matrix <- subset (predict_matrix, select=c(-time,-status))
>>
>>## create a glmnet cox object using lasso regularization and cross
>> validation
>>glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")
>>
>>## get the glmnet model on the full dataset
>>glmnet.obj <- glmnet.cv$glmnet.fit
>>
>># find lambda index for the models with least partial likelihood
>> deviance (by cv.glmnet)
>>optimal.lambda <- glmnet.cv$lambda.min# For a more parsimoneous
>> model use lambda.1se
>>lambda.index <- which(glmnet.obj$lambda==optimal.lambda)
>>
>>
>># take beta for optimal lambda
>>optimal.beta  <- glmnet.obj$beta[,lambda.index]
>>
>># find non zero beta coef
>>nonzero.coef <- abs(optimal.beta)>0
>>selectedBeta <- optimal.beta[nonzero.coef]
>>
>># take only covariates for which beta is not zero
>>selectedVar   <- predict_matrix[,nonzero.coef]
>>
>># create a dataframe for trainSet with time, status and selected
>> variables in binary representation for evaluation in pec
>>reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))
>>
>># glmnet.cox only with meaningful features selected by stepwise
>> bidirectional AIC feature selection
>>glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~
>> .,data=reformat_dataSet),direction="both")
>>
>>
>>
>>
>> ##--
>> -
>>##MODEL PERFORMANCE
>>
>> ##--
>> -
>>##
>>
>>
>>## Calculate the Brier score - pec does its own bootstrap so this
>> function runs on i=51 (i.e., whole trainset)
>>
>>## Brier score calculation to cox-glmnet
>>peperr.coxglmnet <- peperr(response=surv_obj, x=selectedVarCox,
>>fit.fun=fit.coxph, load.all=TRUE,
>> 
>> indices=resample.indices(n=nrow(surv_obj),
>> method="boot", sample.n=50))
>>
>># Get error predictions from peperr
>>prederr.coxglmnet <- perr(peperr.coxglmnet)
>>
>># Integrated prediction 

Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

2013-09-28 Thread David Winsemius


On Sep 28, 2013, at 2:39 AM, E Joffe wrote:


Hi all,

I am using COX LASSO (glmnet / coxnet) regression to analyze a  
dataset of

394 obs. / 268 vars.
I use the following procedure:
1.  Construct a coxnet on the entire dataset (by cv.glmnet)
2.  Pick the significant features by selecting the non-zero coefficient
under the best lambda selected by the model
3.  Build a coxph model with bi-directional stepwise feature selection
limited to the coxnet selected features.


I was a bit puzzled by the third step. Once you had a reduced model  
from glmnet, what was the statistical basis for further elimination of  
variables?


(Quite apart from the statistical issues, I was rather surprised that  
this procedure even produced results since the 'step' function is not  
described in the 'stats' package as applying to 'coxph' model objects.)




To validate the model I use both Brier score (library=peperr) and  
Harrel's [Harrell]

C-Index (library=Hmisc) with a bootstrap of 50 iterations.


MY QUESTION :  I am getting fairly good C-Index (0.76) and Brier  
(0.07)
values for the models however per the coxnet the %Dev explained by  
the model

is at best 0.27 and when I plot the survfit of the coxph the plotted
confidence interval is very large.


How many events did you have?  (The width of CI's is most importantly  
dependent on event counts and not particularly improved by a high case  
count. The power considerations are very similar to those of a  
binomial test.)




What am I missing here ?


Perhaps sufficient events? (You also seem to be missing a description  
of the study goals.)



--
David.



%DEV=27%



Brier score - 0.07  ($ipec.coxglmnet -> [1] 7.24)
C-Index - 0.76 ($cIndex -> [1] 0.763)



DATA: [Private Health Information - can't publish] 394 obs./268 vars.

CODE (need to define a dataset with 'time' and 'status' variables):

library("survival")
library ("glmnet")
library ("c060")
library ("peperr")
library ("Hmisc")

   #creat Y (survival matrix) for glmnet
   surv_obj <- Surv(dataset$time,dataset$status)


   ## tranform categorical variables into binary variables with  
dummy for

dataset
   predict_matrix <- model.matrix(~ ., data=dataset,
  contrasts.arg = lapply
(dataset[,sapply(dataset, is.factor)], contrasts))

   ## remove the statu/time variables from the predictor matrix (x)  
for

glmnet
   predict_matrix <- subset (predict_matrix, select=c(-time,-status))

   ## create a glmnet cox object using lasso regularization and cross
validation
   glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")

   ## get the glmnet model on the full dataset
   glmnet.obj <- glmnet.cv$glmnet.fit

   # find lambda index for the models with least partial likelihood
deviance (by cv.glmnet)
   optimal.lambda <- glmnet.cv$lambda.min# For a more parsimoneous
model use lambda.1se
   lambda.index <- which(glmnet.obj$lambda==optimal.lambda)


   # take beta for optimal lambda
   optimal.beta  <- glmnet.obj$beta[,lambda.index]

   # find non zero beta coef
   nonzero.coef <- abs(optimal.beta)>0
   selectedBeta <- optimal.beta[nonzero.coef]

   # take only covariates for which beta is not zero
   selectedVar   <- predict_matrix[,nonzero.coef]

   # create a dataframe for trainSet with time, status and selected
variables in binary representation for evaluation in pec
   reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

   # glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection
   glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")




##--
-
   ##MODEL PERFORMANCE

##--
-
   ##


   ## Calculate the Brier score - pec does its own bootstrap so this
function runs on i=51 (i.e., whole trainset)

   ## Brier score calculation to cox-glmnet
   peperr.coxglmnet <- peperr(response=surv_obj, x=selectedVarCox,
   fit.fun=fit.coxph, load.all=TRUE,

indices=resample.indices(n=nrow(surv_obj),

method="boot", sample.n=50))

   # Get error predictions from peperr
   prederr.coxglmnet <- perr(peperr.coxglmnet)

   # Integrated prediction error Brier score calculation
   ipec.coxglmnet<-ipec(prederr.coxglmnet,
eval.times=peperr.coxglmnet$attribute, response=surv_obj)


 ## C-Index calculation 50 iter bootstrapping

 for (i in 1:50){
   print (paste("Iteration:",i))
   train <- sample(1:nrow(dataset), nrow(dataset), replace =  
TRUE) ##

random sampling with replacement
   # create a dataframe for trainSet with time, status and  
selected

variables in binary representation for evaluation in pec
   reformat_trainSet <- reformat_data

[R] What is a "good fit" Brier score and Harrel's C Index

2013-09-28 Thread E Joffe

Hi all, 

I am evaluating survival models using Brier score ("peperr") and Harrel's
C-Index ("Hmisc").
I am wondering:

1. What would be considered a "good fit" according to these scores (like the
heuristic levels we have for R square in linear regressions) ?

2. Are there any papers to cite on the matter (I couldn't find any) ?

3. Is there any paper to cite that discusses the limitation of using
traditional reporting for model fit in survival analysis as opposed to these
measures  ?

Thanks,
Erel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Compare species presence and absence between sites

2013-09-28 Thread arun


I couldn't get the number you showed below each column number.  How did you get 
that one?

library(XLConnect)
wb<- loadWorkbook("is_matrix1.xls")
dataRM<- readWorksheet(wb,sheet="is_matrix",rownames=1)
mat1<- as.matrix(dataRM)
different <- function(x, y) x == 1 & y == 0

dat<- expand.grid(rownames(mat1),rownames(mat1),stringsAsFactors=FALSE)
dat1<- dat[!paste0(dat[,1],dat[,2])%in% paste0(rownames(mat1),rownames(mat1)),]
lst1<- lapply(seq_len(nrow(dat1)),function(i) {x<-mat1[unlist(dat1[i,]),]; 
which(different(x[1,],x[2,]))}) 
names(lst1)<- paste(dat1[,1],dat1[,2],sep="_")
  tail(lst1,1)
#$New_Guinea_Australia
#D2781 D2782 D2784 D2785 D2786 D2787 D2796 D0005 D0006 
 #   1 2 4 5 6 7 8 9    10    #indicates the column 
index


#compare that with:

 tail(lapply(seq_len(nrow(dat1)),function(i) {x<-mat1[unlist(dat1[i,]),];x}),1)
[[1]]
   D2781 D2782 D2783 D2784 D2785 D2786 D2787 D2796 D0005 D0006 D0008
New_Guinea 1 1 0 1 1 1 1 1 1 1 0
Australia  0 0 0 0 0 0 0 0 0 0 0
   D0009 D0010 D0011 D0012 D3394 D3395 D3396 D3397 D3398 D3399 D3400
New_Guinea 0 0 0 0 0 0 0 0 0 0 0
Australia  0 0 0 0 0 0 0 0 0 0 0
   D3401
New_Guinea 0
Australia  0



#If you wanted the actual columns that satify the condition:
lst2<- lapply(seq_len(nrow(dat1)),function(i) {x<-mat1[unlist(dat1[i,]),]; 
indx<- different(x[1,],x[2,]);x[,indx,drop=FALSE]})
 tail(lst2,1)
#[[1]]
 #  D2781 D2782 D2784 D2785 D2786 D2787 D2796 D0005 D0006
#New_Guinea 1 1 1 1 1 1 1 1 1
#Australia  0 0 0 0 0 0 0 0 0

A.K.







From: Elaine Kuo 
To: arun  
Sent: Saturday, September 28, 2013 12:23 AM
Subject: Re: [R] Compare species presence and absence between sites



Hello Arun, 

Thanks for the immediate response.
Please kindly find the attached file.
For example, under D2202, it is 3402, not 2202.
Please kindly advise if this is a misunderstanding.

Thanks

Elaine



On Sat, Sep 28, 2013 at 11:01 AM, arun  wrote:


>
>
>Hi,
>I didn't understand the problem.  Could you please specify what you meant by 
>the "species ID number and the number"?
>Tx
>
>
>From: Elaine Kuo 
>To: arun 
>Sent: Friday, September 27, 2013 10:51 PM
>
>Subject: Re: [R] Compare species presence and absence between sites
>
>
>
>
>sorry, 
>I meant that the species ID number and the number are not the same.
>Need more sleep :P
>
>Elaine 
>
>
>
>On Sat, Sep 28, 2013 at 10:50 AM, Elaine Kuo  wrote:
>
>Dear arun, 
>>
>>
>>Thanks for the code.
>>
>>
>>However, the number below each species ID is not identical.
>>Please kindly find attached file (I sent you once.) for test help.
>>Thanks again.
>>
>>
>>Elaine
>>
>>
>>
>>
>>Code
>>
>>
>>Try:
>>set.seed(248)
>> mat1<- matrix(sample(0:1,5*100,replace=TRUE),ncol=100,dimnames=list( 
>>c("Hokkaido","Honshu","Shikoku","Kyushu","Amami")
>>
>>,paste0("D",sprintf("%03d",1:100))) )
>>##to change
>> dat<- expand.grid(rownames(mat1),rownames(mat1),stringsAsFactors=FALSE)
>>dat1<- dat[!paste0(dat[,1],dat[,2])%in% 
>>paste0(rownames(mat1),rownames(mat1)),]
>>
>>#same code:
>>
>>
>
>>lst1<- lapply(seq_len(nrow(dat1)),function(i) {x<-mat1[unlist(dat1[i,]),]; 
>>which(different(x[1,],x[2,]))}) #using Rui's function
>> names(lst1)<- paste(dat1[,1],dat1[,2],sep="_")
>>A.K.
>>
>>
>>
>>On Sat, Sep 28, 2013 at 7:39 AM, arun  wrote:
>>
>>
>>>
>>>Elaine,
>>>Try:
>>>set.seed(248)
>>> mat1<- matrix(sample(0:1,5*100,replace=TRUE),ncol=100,dimnames=list( 
>>>c("Hokkaido","Honshu","Shikoku","Kyushu","Amami")
>>>
>>>,paste0("D",sprintf("%03d",1:100))) )
>>>##to change
>>> dat<- expand.grid(rownames(mat1),rownames(mat1),stringsAsFactors=FALSE)
>>>dat1<- dat[!paste0(dat[,1],dat[,2])%in% 
>>>paste0(rownames(mat1),rownames(mat1)),]
>>>
>>>#same code:
>>>
>>>lst1<- lapply(seq_len(nrow(dat1)),function(i) {x<-mat1[unlist(dat1[i,]),]; 
>>>which(different(x[1,],x[2,]))}) #using Rui's function
>>> names(lst1)<- paste(dat1[,1],dat1[,2],sep="_")
>>>A.K.
>>>
>>>
>>>
>>>From: Elaine Kuo 
>>>To: arun 
>>>Sent: Friday, September 27, 2013 6:17 PM
>>>
>>>Subject: Re: [R] Compare species presence and absence between sites
>>>
>>>
>>>
>>>Hello Arun, 
>>>
>>>Thanks for the code, but I have problem replacing Island A-D with real names.
>>>Please kindly indicate where to change in the code.
>>>The real names include Hokkaido, Honshu, Shikoku, Kyushu, and Amami.
>>>Thanks again.
>>>
>>>Elaine
>>>
>>>
>>>
>>>On Fri, Sep 27, 2013 at 9:31 PM, arun  wrote:
>>>
>>>Just to add:

If you wanted the difference of every combination of rows:
set.seed(248)
 mat1<- 
matrix(sample(0:1,5*100,replace=TRUE),ncol=100,dimnames=list(LETTERS[1:5],paste0("D",sprintf("%03d",1:100)))
 )
 dat<-e

Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

2013-09-28 Thread Bert Gunter
This appears to be a statistics, not an R-help question, so should
probably be asked on a statistics list, not here (e.g.
stats.stackexchange.com).

But if I understand your issue correctly, perhaps the heart f the
matter is: why do you think a stable fit must explain a lot of the
variation?  Feel free to ignore if I I'm wrong or discuss further on
the statistics list.

Cheers,
Bert

On Sat, Sep 28, 2013 at 2:39 AM, E Joffe  wrote:
> Hi all,
>
> I am using COX LASSO (glmnet / coxnet) regression to analyze a dataset of
> 394 obs. / 268 vars.
> I use the following procedure:
> 1.  Construct a coxnet on the entire dataset (by cv.glmnet)
> 2.  Pick the significant features by selecting the non-zero coefficient
> under the best lambda selected by the model
> 3.  Build a coxph model with bi-directional stepwise feature selection
> limited to the coxnet selected features.
>
> To validate the model I use both Brier score (library=peperr) and Harrel's
> C-Index (library=Hmisc) with a bootstrap of 50 iterations.
>
>
> MY QUESTION :  I am getting fairly good C-Index (0.76) and Brier (0.07)
> values for the models however per the coxnet the %Dev explained by the model
> is at best 0.27 and when I plot the survfit of the coxph the plotted
> confidence interval is very large.
> What am I missing here ?
>
> %DEV=27%
>
>
>
> Brier score - 0.07  ($ipec.coxglmnet -> [1] 7.24)
> C-Index - 0.76 ($cIndex -> [1] 0.763)
>
>
>
> DATA: [Private Health Information - can't publish] 394 obs./268 vars.
>
> CODE (need to define a dataset with 'time' and 'status' variables):
>
> library("survival")
> library ("glmnet")
> library ("c060")
> library ("peperr")
> library ("Hmisc")
>
> #creat Y (survival matrix) for glmnet
> surv_obj <- Surv(dataset$time,dataset$status)
>
>
> ## tranform categorical variables into binary variables with dummy for
> dataset
> predict_matrix <- model.matrix(~ ., data=dataset,
>contrasts.arg = lapply
> (dataset[,sapply(dataset, is.factor)], contrasts))
>
> ## remove the statu/time variables from the predictor matrix (x) for
> glmnet
> predict_matrix <- subset (predict_matrix, select=c(-time,-status))
>
> ## create a glmnet cox object using lasso regularization and cross
> validation
> glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")
>
> ## get the glmnet model on the full dataset
> glmnet.obj <- glmnet.cv$glmnet.fit
>
> # find lambda index for the models with least partial likelihood
> deviance (by cv.glmnet)
> optimal.lambda <- glmnet.cv$lambda.min# For a more parsimoneous
> model use lambda.1se
> lambda.index <- which(glmnet.obj$lambda==optimal.lambda)
>
>
> # take beta for optimal lambda
> optimal.beta  <- glmnet.obj$beta[,lambda.index]
>
> # find non zero beta coef
> nonzero.coef <- abs(optimal.beta)>0
> selectedBeta <- optimal.beta[nonzero.coef]
>
> # take only covariates for which beta is not zero
> selectedVar   <- predict_matrix[,nonzero.coef]
>
> # create a dataframe for trainSet with time, status and selected
> variables in binary representation for evaluation in pec
> reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))
>
> # glmnet.cox only with meaningful features selected by stepwise
> bidirectional AIC feature selection
> glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~
> .,data=reformat_dataSet),direction="both")
>
>
>
>
> ##--
> -
> ##MODEL PERFORMANCE
>
> ##--
> -
> ##
>
>
> ## Calculate the Brier score - pec does its own bootstrap so this
> function runs on i=51 (i.e., whole trainset)
>
> ## Brier score calculation to cox-glmnet
> peperr.coxglmnet <- peperr(response=surv_obj, x=selectedVarCox,
> fit.fun=fit.coxph, load.all=TRUE,
> indices=resample.indices(n=nrow(surv_obj),
> method="boot", sample.n=50))
>
> # Get error predictions from peperr
> prederr.coxglmnet <- perr(peperr.coxglmnet)
>
> # Integrated prediction error Brier score calculation
> ipec.coxglmnet<-ipec(prederr.coxglmnet,
> eval.times=peperr.coxglmnet$attribute, response=surv_obj)
>
>
>   ## C-Index calculation 50 iter bootstrapping
>
>   for (i in 1:50){
> print (paste("Iteration:",i))
> train <- sample(1:nrow(dataset), nrow(dataset), replace = TRUE) ##
> random sampling with replacement
> # create a dataframe for trainSet with time, status and selected
> variables in binary representation for evaluation in pec
> reformat_trainSet <- reformat_dataSet [train,]
>
>
> # glmnet.cox only with meaningful features selected by stepwise
> bidirectional AIC feature sel

Re: [R] Plot lines whose angle and length depict vector quantities

2013-09-28 Thread Conor Ryan
Hi Greg, thanks for your help. I found vectorFields, as suggested by Jim
Lemon, in plotrix to be slightly easier (given my relatively basic R
language knowledge!). Kind Regards, Conor


On 27 September 2013 22:07, Greg Snow <538...@gmail.com> wrote:

> The ms.arrows along with my.symbols in the TeachingDemos package does not
> require start and end points, it takes a single point along with the angle
> and length (and the length can be a single constant to have all the arrows
> the same length, or a variable to have different lengths).  You can also
> set an adjustment parameter to decide if the arrows should be centered on
> the point, start at the point, or end at the point.
>
> The ms.arrows function was designed for exactly what you describe.
>
>
> On Fri, Sep 27, 2013 at 12:56 PM, Conor Ryan  wrote:
>
>> I am trying to plot points on a map for each ship locations (lat/long),
>> where
>> each point is a line whose angle (degrees) denotes ships heading and whose
>> line length denotes it's speed. Unfortunately arrows(); p.arrows (sfsmisc)
>> and ms.arrows (TeachingDemos) require start and end coordinates but I only
>> have a single coordinate and an angle to work with. Can you suggest any
>> other packages or commands that might allow me to plot this?
>> Alternatively,
>> can anyone suggest a method of making 'angle' a vector quantity when using
>> the arrows function? Any advice would be much appreciated!
>>
>> [[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot lines whose angle and length depict vector quantities

2013-09-28 Thread Conor Ryan
Certainly not a homework question. Elementary questions do not necessarily
indicate that one is a student! Always learning...

Dr Conor Ryan


On 28 September 2013 11:39, Patrick Burns  wrote:

> On 27/09/2013 21:01, Bert Gunter wrote:
>
>> On Fri, Sep 27, 2013 at 12:44 PM, Sarah Goslee 
>> wrote:
>>
>>> It's a straightforward trigonometry problem, isn't it?
>>>
>>
>> Indeed! (  (r,theta) to (x,y) coordinates  ) . So I wonder if this is
>> a homework problem. If so, the OP should note that we try not to do
>> homework here.
>>
>
> So that would make it unanimous that everyone here is trying
> not to do homework.
>
> Pat
>
>
>
>> Cheers,
>> Bert
>>
>>>
>>> On Fri, Sep 27, 2013 at 2:56 PM, Conor Ryan  wrote:
>>>
 I am trying to plot points on a map for each ship locations (lat/long),
 where
 each point is a line whose angle (degrees) denotes ships heading and
 whose
 line length denotes it's speed. Unfortunately arrows(); p.arrows
 (sfsmisc)
 and ms.arrows (TeachingDemos) require start and end coordinates but I
 only
 have a single coordinate and an angle to work with. Can you suggest any
 other packages or commands that might allow me to plot this?
 Alternatively,
 can anyone suggest a method of making 'angle' a vector quantity when
 using
 the arrows function? Any advice would be much appreciated!


>>> --
>>> Sarah Goslee
>>> http://www.**functionaldiversity.org
>>>
>>> __**
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/**
>>> posting-guide.html 
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>>
> --
> Patrick Burns
> pbu...@pburns.seanet.com
> twitter: @burnsstat @portfolioprobe
> http://www.portfolioprobe.com/**blog 
> http://www.burns-stat.com
> (home of:
>  'Impatient R'
>  'The R Inferno'
>  'Tao Te Programming')
>
>
> __**
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/**listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/**
> posting-guide.html 
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

2013-09-28 Thread E Joffe
Hi all,

I am using COX LASSO (glmnet / coxnet) regression to analyze a dataset of
394 obs. / 268 vars.
I use the following procedure:
1.  Construct a coxnet on the entire dataset (by cv.glmnet) 
2.  Pick the significant features by selecting the non-zero coefficient
under the best lambda selected by the model 
3.  Build a coxph model with bi-directional stepwise feature selection
limited to the coxnet selected features.

To validate the model I use both Brier score (library=peperr) and Harrel's
C-Index (library=Hmisc) with a bootstrap of 50 iterations.


MY QUESTION :  I am getting fairly good C-Index (0.76) and Brier (0.07)
values for the models however per the coxnet the %Dev explained by the model
is at best 0.27 and when I plot the survfit of the coxph the plotted
confidence interval is very large. 
What am I missing here ?

%DEV=27%
 


Brier score - 0.07  ($ipec.coxglmnet -> [1] 7.24)
C-Index - 0.76 ($cIndex -> [1] 0.763)



DATA: [Private Health Information - can't publish] 394 obs./268 vars.

CODE (need to define a dataset with 'time' and 'status' variables):

library("survival")
library ("glmnet")
library ("c060")
library ("peperr")
library ("Hmisc")

#creat Y (survival matrix) for glmnet
surv_obj <- Surv(dataset$time,dataset$status) 

  
## tranform categorical variables into binary variables with dummy for
dataset
predict_matrix <- model.matrix(~ ., data=dataset, 
   contrasts.arg = lapply
(dataset[,sapply(dataset, is.factor)], contrasts))

## remove the statu/time variables from the predictor matrix (x) for
glmnet
predict_matrix <- subset (predict_matrix, select=c(-time,-status))

## create a glmnet cox object using lasso regularization and cross
validation
glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")

## get the glmnet model on the full dataset
glmnet.obj <- glmnet.cv$glmnet.fit

# find lambda index for the models with least partial likelihood
deviance (by cv.glmnet) 
optimal.lambda <- glmnet.cv$lambda.min# For a more parsimoneous
model use lambda.1se 
lambda.index <- which(glmnet.obj$lambda==optimal.lambda) 


# take beta for optimal lambda 
optimal.beta  <- glmnet.obj$beta[,lambda.index] 

# find non zero beta coef 
nonzero.coef <- abs(optimal.beta)>0 
selectedBeta <- optimal.beta[nonzero.coef] 

# take only covariates for which beta is not zero 
selectedVar   <- predict_matrix[,nonzero.coef] 

# create a dataframe for trainSet with time, status and selected
variables in binary representation for evaluation in pec
reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

# glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection 
glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")

 
  
 
##--
-
##MODEL PERFORMANCE
 
##--
-
## 

  
## Calculate the Brier score - pec does its own bootstrap so this
function runs on i=51 (i.e., whole trainset) 

## Brier score calculation to cox-glmnet
peperr.coxglmnet <- peperr(response=surv_obj, x=selectedVarCox,
fit.fun=fit.coxph, load.all=TRUE,
indices=resample.indices(n=nrow(surv_obj),
method="boot", sample.n=50))

# Get error predictions from peperr
prederr.coxglmnet <- perr(peperr.coxglmnet)

# Integrated prediction error Brier score calculation
ipec.coxglmnet<-ipec(prederr.coxglmnet,
eval.times=peperr.coxglmnet$attribute, response=surv_obj)


  ## C-Index calculation 50 iter bootstrapping
  
  for (i in 1:50){
print (paste("Iteration:",i))
train <- sample(1:nrow(dataset), nrow(dataset), replace = TRUE) ##
random sampling with replacement
# create a dataframe for trainSet with time, status and selected
variables in binary representation for evaluation in pec
reformat_trainSet <- reformat_dataSet [train,]


# glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection 
glmnet.cox.meaningful.test <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")

selectedVarCox   <-
predict_matrix[,attr(glmnet.cox.meaningful.test$terms,"term.labels")] 
reformat_testSet <- as.data.frame(cbind(surv_obj,selectedVarCox))
reformat_testSet <- reformat_dataSet [-train,]
  

# compute c-index (Harrell's) for cox-glmnet models
if (is.null(glmnet.cox.meaningful)){
  cIn

Re: [R] Plot lines whose angle and length depict vector quantities

2013-09-28 Thread Conor Ryan
Hi Jim, vectorField worked a treat - thanks!


On 28 September 2013 01:57, Jim Lemon  wrote:

> On 09/28/2013 04:56 AM, Conor Ryan wrote:
>
>> I am trying to plot points on a map for each ship locations (lat/long),
>> where
>> each point is a line whose angle (degrees) denotes ships heading and whose
>> line length denotes it's speed. Unfortunately arrows(); p.arrows (sfsmisc)
>> and ms.arrows (TeachingDemos) require start and end coordinates but I only
>> have a single coordinate and an angle to work with. Can you suggest any
>> other packages or commands that might allow me to plot this?
>> Alternatively,
>> can anyone suggest a method of making 'angle' a vector quantity when using
>> the arrows function? Any advice would be much appreciated!
>>
>>  Hi Conor,
> Try the vector.field function (plotrix).
>
> Jim
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logical indexing not working

2013-09-28 Thread Patrick Burns

This is Circle 8.1.6 of 'The R Inferno'.

http://www.burns-stat.com/documents/books/the-r-inferno/

Pat


On 27/09/2013 19:20, Mariki Zietsman wrote:

I have a data frame frugivore.abundance.S1 where some columns are factors and others are 
numbers.For example these are my independent variables and "density" is my dependent 
variable. census<-c(1:70)sites<-c(1:5)birds<-c(1:45)

I want to select the data where sites is 1 and birds are 1,23,24 or 29
So I write:fa1<-frugivore.abundance.S1attach(fa1)(abund.frug.RN1<-fa1[sites==1 
& birds==c(1,23,24,29),])
This code doesn't print all the data it should for some reason. It seems to not print 
rows where "density" has the same value as another row with the same criteria.
i.e. if in the original data we have the following then only rows 1 and 3 will 
be printed, not all of them:
census   sites   birds   density1 1 1 0.0032
 1 1 0.0033 1 1 
0.001
Can anyone help me out with this please?
RegardsMariki

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot lines whose angle and length depict vector quantities

2013-09-28 Thread Patrick Burns

On 27/09/2013 21:01, Bert Gunter wrote:

On Fri, Sep 27, 2013 at 12:44 PM, Sarah Goslee  wrote:

It's a straightforward trigonometry problem, isn't it?


Indeed! (  (r,theta) to (x,y) coordinates  ) . So I wonder if this is
a homework problem. If so, the OP should note that we try not to do
homework here.


So that would make it unanimous that everyone here is trying
not to do homework.

Pat



Cheers,
Bert


On Fri, Sep 27, 2013 at 2:56 PM, Conor Ryan  wrote:

I am trying to plot points on a map for each ship locations (lat/long),
where
each point is a line whose angle (degrees) denotes ships heading and whose
line length denotes it's speed. Unfortunately arrows(); p.arrows (sfsmisc)
and ms.arrows (TeachingDemos) require start and end coordinates but I only
have a single coordinate and an angle to work with. Can you suggest any
other packages or commands that might allow me to plot this? Alternatively,
can anyone suggest a method of making 'angle' a vector quantity when using
the arrows function? Any advice would be much appreciated!



--
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.