Re: [R] Error: unexpected '<' in when modifying existing functions

2012-01-14 Thread Rui Esteves
Thank you both.

1) As Duncan said, if I leave <environment: namespace:stats> out, it
will not work, since it uses .C and .Fortran functions that kmeans
calls.

2) I don't know how to use as.environment() (I did not understand it
from reading the help).

3) Setting environment(kmeansnew) <- environment(stats::kmeans) does
not work either.

4) Using fix() works, but then I don't know how to store just the
function in an external file, to use it on another computer, for
example.  If I use save(myfunc, "myFile.R", ASCII=TRUE) it doesn't work
when I try to load it again using myfunc = load("myFile.R").

Rui


On Sat, Jan 14, 2012 at 3:22 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 12-01-13 8:05 PM, Peter Langfelder wrote:

 On Fri, Jan 13, 2012 at 4:57 PM, Rui Esteves <ruimax...@gmail.com> wrote:

 Hi.
 I am trying to modify kmeans function.
 It seems that is failing something obvious with the workspace.
 I am a newbie and here is my code:


 <environment: namespace:stats>

 Error: unexpected '<' in "<"


 Do not include the last line

 <environment: namespace:stats>

 it is not part of the function definition. Simply leave it out and
 your function will be defined in the  user workspace (a.k.a. global
 environment).


 That's only partly right.  Leaving it off will define the function in the
 global environment, but the definition might not work, because that's where
 it will look up variables, and the original function would look them up in
 the stats namespace.  I don't know if that will matter, but it might lead to
 tricky bugs.

 What you should do when modifying a function from a package is set the
 environment to the same environment a function in the package would normally
 get, i.e. to the stats namespace.  I think the as.environment() function can
 do this, but I always forget the syntax; an easier way is the following:

 Create the new function:

 kmeansnew <- function (...) ...

 Set its environment the same as the old one:

 environment(kmeansnew) <- environment(stats::kmeans)

 BTW, if you use the fix() function to get a copy for editing, it will do
 this for you automatically.
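
Concretely, the two steps might look like this (a minimal sketch; the
body below is only a placeholder standing in for your edited copy of
the kmeans code):

kmeansnew <- function(x, centers, ...) {
    # ... your edited copy of the body of stats::kmeans goes here ...
    stats::kmeans(x, centers, ...)   # placeholder so the sketch runs
}
environment(kmeansnew) <- environment(stats::kmeans)
environment(kmeansnew)  # should print <environment: namespace:stats>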

 Duncan Murdoch



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems with plotCI

2012-01-14 Thread Martin Maechler
 "JL" == Jim Lemon <j...@bitwrit.com.au>
     on Sat, 14 Jan 2012 18:52:50 +1100 writes:

JL On 01/14/2012 06:35 PM, Jim Lemon wrote:
 On 01/13/2012 11:09 PM, Lasse DSR-mail wrote:
 Got problems with plotCI (plotrix) ...

JL Whoops - looks like the R help list doesn't accept R
JL source code as attachments any more. 

nonsense, sorry.

As I say every few months:
Whether an attachment is accepted or not does *not* depend on its
content proper, but on its MIME type; R-help, e.g., accepts
    text/plain
If your e-mail software has changed, and no longer uses (or
allows you to use) text/plain for plain text such as R source
code, then you should blame the provider of your e-mail software 
...  or alas provide text inline, as you did.

Martin Maechler, ETH Zurich
(and R-help maintainer).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting regression line in with lattice

2012-01-14 Thread matteo dossena
Weidong,
thanks for the suggestion, but I also need to show which trt each point
belongs to.
My problem has been solved, by the way:
I've been told to add a group subscript object within the panel function, and
then use panel.points to plot the original data as data points and panel.lines
to draw the predicted regression line of the model, as sketched below.
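
In code, the approach looks roughly like this (a sketch, assuming the
'combined' data frame and 'my.fill' from the code quoted below):

library(lattice)
xyplot(var ~ temp | subj, data = combined,
       panel = function(x, y, ..., subscripts) {
           w    <- combined$which[subscripts]
           fill <- my.fill[combined$trt[subscripts]]
           # original data as filled points, coloured by treatment
           panel.points(x[w == "original"], y[w == "original"],
                        pch = 21, fill = fill[w == "original"], col = "black")
           # model predictions drawn as a line
           panel.lines(x[w == "model"], y[w == "model"], col = "black")
       })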

cheers
m.



On 13 Jan 2012, at 19:57, Weidong Gu wrote:

 Hi,
 
 Since trt is a factor, you can use it for indexing. Try this in the code:
 fill <- my.fill[combined$trt[subscripts]]
 
 Weidong Gu
 
 On Fri, Jan 13, 2012 at 11:30 AM, matteo dossena m.doss...@qmul.ac.uk wrote:
 #Dear All,
 #I'm having a bit of a trouble here, please help me...
 #I have this data
 set.seed(4)
 mydata <- data.frame(var = rnorm(100),
                      temp = rnorm(100),
                      subj = as.factor(rep(c(1:10), 5)),
                      trt = rep(c("A", "B"), 50))
 
 #and this model that fits them
 lm <- lm(var ~ temp * subj, data = mydata)
 
 #I want to plot the results with lattice and fit the regression line,
 #predicted with my model, through them.
 #To do so, I'm using the approach outlined in "Lattice tricks for the power
 #useR" by D. Sarkar
 
 temp_rng <- range(mydata$temp, finite = TRUE)
 
 grid <- expand.grid(temp = do.breaks(temp_rng, 30),
                     subj = unique(mydata$subj),
                     trt = unique(mydata$trt))
 
 model <- cbind(grid, var = predict(lm, newdata = grid))
 
 orig <- mydata[c("var", "temp", "subj", "trt")]
 
 combined <- make.groups(original = orig, model = model)
 
 
 xyplot(var ~ temp | subj,
        data = combined,
        groups = which,
        type = c("p", "l"),
        distribute.type = TRUE
        )
 
 
 # so far everything is fine, but I also want to assign a fill to the data
 # points for the two treatments trt=1 and trt=2
 # so I have written this piece of code, which works fine, but when it comes
 # to plotting the regression line, it seems that type is not recognized by the
 # panel function...
 
 my.fill <- c("black", "grey")
 
 plot <- with(combined,
        xyplot(var ~ temp | subj,
               data = combined,
               group = combined$which,
               type = c("p", "l"),
               distribute.type = TRUE,
               panel = function(x, y, ..., subscripts){
                   fill <- my.fill[combined$trt[subscripts]]
                   panel.xyplot(x, y, pch = 21, fill = my.fill, col = "black")
               },
               key = list(space = "right",
                          text = list(c("trt1", "trt2"), cex = 0.8),
                          points = list(pch = c(21), fill = c("black", "grey")),
                          rep = FALSE)
               )
        )
 plot
 
 #I've also tried to move type and distribute.type within panel.xyplot, as
 #well as subsetting the data in panel.xyplot like this
 
 plot <- with(combined,
        xyplot(var ~ temp | subj,
               data = combined,
               panel = function(x, y, ..., subscripts){
                   fill <- my.fill[combined$trt[subscripts]]
                   panel.xyplot(x[combined$which == "original"],
                                y[combined$which == "original"],
                                pch = 21, fill = my.fill, col = "black")
                   panel.xyplot(x[combined$which == "model"],
                                y[combined$which == "model"],
                                type = "l", col = "black")
               },
               key = list(space = "right",
                          text = list(c("trt1", "trt2"), cex = 0.8),
                          points = list(pch = c(21), fill = c("black", "grey")),
                          rep = FALSE)
               )
        )
 plot
 
 #but no success with that either...
 #can anyone help me to get the predicted values plotted as a line instead of
 #being points?
 #really appreciate it
 #matteo
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems with plotCI

2012-01-14 Thread Jim Lemon

On 01/14/2012 09:22 PM, Martin Maechler wrote:

 "JL" == Jim Lemon <j...@bitwrit.com.au>
     on Sat, 14 Jan 2012 18:52:50 +1100 writes:


 JL  On 01/14/2012 06:35 PM, Jim Lemon wrote:
   On 01/13/2012 11:09 PM, Lasse DSR-mail wrote:
   Got problems with plotCI (plotrix) ...

 JL  Whoops - looks like the R help list doesn't accept R
 JL  source code as attachments any more.

nonsense, sorry.

As I say every few months:
Whether an attachment is accepted or not does *not* depend on its
content proper, but on its MIME type; R-help, e.g., accepts
    text/plain
If your e-mail software has changed, and no longer uses (or
allows you to use) text/plain for plain text such as R source
code, then you should blame the provider of your e-mail software
...  or alas provide text inline, as you did.

Martin Maechler, ETH Zurich
(and R-help maintainer).


Hmmm, I send messages in plain text by default, so I've listed 
r-project.org as a plain text only domain (I use the Thunderbird email 
client). I'll see if this fixes it.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Converting .Rout file to pdf via Sweave automatically

2012-01-14 Thread Duncan Murdoch

On 12-01-13 10:21 PM, Parag Magunia wrote:

The R documentation mentions that to create a PDF or DVI file from an Rnw
template, the Sweave command can be used.

However, is there any way to go from a .Rout file straight to pdf with an Rnw 
template ?

What I'm trying to avoid is adding the Sweave markup to the .tex file manually.

What I think I'm missing is the exact arguments to the Sweave command.

I tried numerous forms of:

Sweave("batch.Rout", RweaveLatex(), "myR.Rnw");

but without any success.


You misunderstand Sweave.  You don't add markup to the Rout file or the
tex file; you add markup to the input file, which is usually given the
.Rnw extension.


For a case where you want to print everything in the batch file and 
there are no graphs, you could just put markup at the very beginning and 
at the very end.  Figures are slightly more complicated, but it sounds 
like you don't need them.
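
For example, a minimal wrapper along those lines might look like the
following (the file names and the single-chunk layout are illustrative
assumptions, not a prescription):

\documentclass{article}
\begin{document}
<<>>=
# run and echo the entire batch script inside a single code chunk
source("batch.R", echo = TRUE)
@
\end{document}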


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Determining if an object name does not exist

2012-01-14 Thread Ajay Askoolum
Is there a way to tell whether an object name 1. is valid 2. is not going to 
cause a collision with an existing object by the same name?

Thanks.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: unexpected '<' in when modifying existing functions

2012-01-14 Thread Duncan Murdoch

On 12-01-14 3:58 AM, Rui Esteves wrote:

Thank you both.

1) As Duncan said, if I leave <environment: namespace:stats> out, it
will not work, since it uses .C and .Fortran functions that kmeans
calls.

2) I don't know how to use as.environment() (I did not understand it
from reading the help).

3) Setting environment(kmeansnew) <- environment(stats::kmeans) does
not work either.


I think you need to explain what "does not work" means.  What did you 
do, and how do you know it didn't work?




4) Using fix() works, but then I don't know how to store just the
function in an external file, to use it on another computer, for
example.  If I use save(myfunc, "myFile.R", ASCII=TRUE) it doesn't work
when I try to load it again using myfunc = load("myFile.R").


Don't use load() on a source file.  Use load() on a binary file produced 
by save().  You could save() your working function, but then you can't 
edit it outside of R.  To produce a .R file that you can use in another 
session, you're going to need to produce the function, then modify the 
environment, using 2 or 3 above.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Determining if an object name does not exist

2012-01-14 Thread Duncan Murdoch

On 12-01-14 5:47 AM, Ajay Askoolum wrote:

Is there a way to tell whether an object name 1. is valid 2. is not going to 
cause a collision with an existing object by the same name?


For 1, you could put your names in a character vector x, then check 
whether x and make.names(x) are identical; if so, x contains 
syntactically valid names.  (Do remember that almost anything can be a 
name if you put it in back quotes.)


For 2, you could check exists(x) to find if objects with those names exist.
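
A small sketch of both checks, with made-up names in x:

x <- c("myVar", "2bad", "ok.name")
identical(x, make.names(x))  # FALSE here: "2bad" is not syntactically valid
make.names(x)                # shows what R would coerce each name to
sapply(x, exists)            # TRUE where an object of that name is visible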

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: unexpected '<' in when modifying existing functions

2012-01-14 Thread Rui Esteves
All of these tries lead to the same result:
1) First I defined kmeansnew with the content of kmeans, but leaving
the <environment: namespace:stats> out.
Then I ran environment(kmeansnew) <- environment(stats::kmeans) at the
command line.
2) kmeansnew <- kmeans() { environment(kmeansnew) <-
environment(stats::kmeans) }
3) kmeansnew <- kmeans() {}   environment(kmeansnew) <-
environment(stats::kmeans)

When I do kmeansnew(iris[-5], 4) it returns:
 Error in do_one(nmeth) : object 'R_kmns' not found

'R_kmns' is a .Fortran routine that is called by the original kmeans().
It is the same error as if I had just left <environment:
namespace:stats> out.



On Sat, Jan 14, 2012 at 11:50 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 12-01-14 3:58 AM, Rui Esteves wrote:

 Thank you both.

 1) As Duncan said, if I leave <environment: namespace:stats> out, it
 will not work, since it uses .C and .Fortran functions that kmeans
 calls.

 2) I don't know how to use as.environment() (I did not understand it
 from reading the help).

 3) Setting environment(kmeansnew) <- environment(stats::kmeans) does
 not work either.


 I think you need to explain what "does not work" means.  What did you do,
 and how do you know it didn't work?


 4) Using fix() works, but then I don't know how to store just the
 function in an external file, to use it on another computer, for
 example.  If I use save(myfunc, "myFile.R", ASCII=TRUE) it doesn't work
 when I try to load it again using myfunc = load("myFile.R").


 Don't use load() on a source file.  Use load() on a binary file produced by
 save().  You could save() your working function, but then you can't edit it
 outside of R.  To produce a .R file that you can use in another session,
 you're going to need to produce the function, then modify the environment,
 using 2 or 3 above.

 Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multidimensional array calculation

2012-01-14 Thread Johannes Radinger
Dear Jean,

Thank you, expand.grid was the function I needed.

/johannes


 
 See 
 ?expand.grid 
 
 For example, 
  df <- expand.grid(L=L, AR=AR, SO=SO, T=T) 
  df$y <- fun(df$L, df$AR, df$SO, df$T) 
 
 Jean 
 
 
 Johannes Radinger wrote on 01/13/2012 12:28:46 PM:
 
  Hello,
  
  probably it is quite easy but I can't get it: I have
  multiple numeric vectors and a function using
  all of them to calculate a new value:
  
  L <- c(200,400,600)
  AR <- c(1.5)
  SO <- c(1,3,5)
  T <- c(30,365)
  
  fun <- function(L,AR,SO,T){
 exp(L*AR+sqrt(SO)*log(T))
  }
  
  How can I get an array or dataframe where
  all possible combinations of the factors are listed
  and the new value is calculated.
  
  I thought about an array like:
  array(NA, dim = c(3,1,3,2), dimnames=list(c("200","400","600"),c("1.5"),
  c("1","3","5"),c("30","365")))
  
  but how can I get the array populated according to the function?
  
  As I want to get a 2D dataframe in the end, I will probably use the
  melt.array() function from the reshape package, or is there another way
  to simply get such a full-factorial dataframe with all possible
  combinations?
  
  Best regards,
  Johannes
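
Putting Jean's suggestion together with the inputs quoted above gives a
runnable sketch:

L  <- c(200, 400, 600)
AR <- c(1.5)
SO <- c(1, 3, 5)
T  <- c(30, 365)
fun <- function(L, AR, SO, T) exp(L * AR + sqrt(SO) * log(T))

df <- expand.grid(L = L, AR = AR, SO = SO, T = T)  # all 18 combinations
df$y <- fun(df$L, df$AR, df$SO, df$T)              # vectorized evaluation
head(df)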


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fUtilities removed -- use fBasics

2012-01-14 Thread Martin Maechler
 David Winsemius <dwinsem...@comcast.net>
     on Fri, 13 Jan 2012 13:52:57 -0500 writes:

 On Jan 13, 2012, at 12:33 PM, Dominic Comtois wrote:

 When setting up my new machine, I had the surprise to see
 that Package 'fUtilities' was removed from the CRAN
 repository.
 

 https://stat.ethz.ch/pipermail/rmetrics-core/2012-January/000554.html
 https://stat.ethz.ch/pipermail/rmetrics-core/2011-November/000549.html

indeed. thank you David (and Google, I presume ..)

 
 This is problematic for my work. I use many of its
 functions, and it will complicate things a lot if other
 programmers want to use my previous code in the
 future. Plus, nowhere can I find the justification for
 its removal.

For a longer time, the Rmetrics management had planned to
deprecate  fUtilities (and fSeries and fCalendar),
basically refactoring the functionality ``approximately'' along
the lines of

     old package     replacement package(s)
     -----------     ----------------------
     fUtilities      fBasics
     fSeries         timeSeries
     fCalendar       timeDate

though clearly not a 1:1 replacement, but a refactoring, as said above.
fBasics, indeed 'Depends' on both timeSeries and timeDate,
so I think it is safe to say that you should replace

   'fUtilities' by 'fBasics'
everywhere ... and things should work...

Yes, the communication about these plans was not put out the
way it should have been; and indeed the deprecation would not
necessarily have meant that the package be dropped without proper notice.
One excuse has been the lack of resources and health on the side
of Rmetrics.

Disclaimer: I am one of rmetrics-c...@r-project.org, having
been an active co-maintainer of some parts of the Rmetrics collection,
but I have not been part of the management nor the foundation.

Martin Maechler, ETH Zurich

 You need to send your questions to the maintainers. They
 apparently did not respond to the requests to fix the
 errors.

 
 Thanks for any info on this

 You should perhaps subscribe to the list that is
 established for discussion on this and related packages.

 -- 

 David Winsemius, MD West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide
 http://www.R-project.org/posting-guide.html and provide
 commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: unexpected '<' in when modifying existing functions

2012-01-14 Thread Duncan Murdoch

On 12-01-14 6:08 AM, Rui Esteves wrote:

All of these tries lead to the same result:
1) First I defined kmeansnew with the content of kmeans, but leaving
the <environment: namespace:stats> out.
Then I ran environment(kmeansnew) <- environment(stats::kmeans) at the
command line.
2) kmeansnew <- kmeans() { environment(kmeansnew) <-
environment(stats::kmeans) }
3) kmeansnew <- kmeans() {}   environment(kmeansnew) <-
environment(stats::kmeans)

When I do kmeansnew(iris[-5], 4) it returns:
  Error in do_one(nmeth) : object 'R_kmns' not found

'R_kmns' is a .Fortran routine that is called by the original kmeans().
It is the same error as if I had just left <environment:
namespace:stats> out.


Number 1 is what you should do.  When you do that and print kmeansnew in 
the console, does it list the environment at the end?  What does

environment(kmeansnew) print?
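
For reference, when the environment has been set correctly, the check
should look like this (assuming step 1 was carried out as described):

environment(stats::kmeans)
## <environment: namespace:stats>
environment(kmeansnew)  # should print the same namespace environment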

Duncan Murdoch





On Sat, Jan 14, 2012 at 11:50 AM, Duncan Murdoch
murdoch.dun...@gmail.com  wrote:

On 12-01-14 3:58 AM, Rui Esteves wrote:


Thank you both.

1) As Duncan said, if I leave <environment: namespace:stats> out, it
will not work, since it uses .C and .Fortran functions that kmeans
calls.

2) I don't know how to use as.environment() (I did not understand it
from reading the help).

3) Setting environment(kmeansnew) <- environment(stats::kmeans) does
not work either.



I think you need to explain what "does not work" means.  What did you do,
and how do you know it didn't work?



4) Using fix() works, but then I don't know how to store just the
function in an external file, to use it on another computer, for
example.  If I use save(myfunc, "myFile.R", ASCII=TRUE) it doesn't work
when I try to load it again using myfunc = load("myFile.R").



Don't use load() on a source file.  Use load() on a binary file produced by
save().  You could save() your working function, but then you can't edit it
outside of R.  To produce a .R file that you can use in another session,
you're going to need to produce the function, then modify the environment,
using 2 or 3 above.

Duncan Murdoch


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Date/time

2012-01-14 Thread claire5
Hey guys,

I have been trying for some time to nicely plot some of my data, I have
around 1800 values for some light intensity, taken every hour of the day
over almost 2 months.

My data file looks like:

     Date    Time. GMT.02.00 Intensity
1    06.10.11 11:00:00        AM   x
2    06.10.11 12:00:00        PM   x
3    06.10.11 01:00:00        PM   x
4    06.10.11 02:00:00        PM   x

As I am pretty new to R, I am totally struggling with this issue. Does
anyone have an idea of how I could plot the data nicely, and whether I need
to change my data file?

Thanks a lot for your help


--
View this message in context: 
http://r.789695.n4.nabble.com/Date-time-tp4294499p4294499.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Benjamin Weber
Spencer

I highly appreciate your input. What we need is a standard for
statistics. That may reinvent the way how we see data.
The recent crisis is the best proof that we are lost in our own
generated information overload. The traditional approach is not
working anymore.

Finding the right members for the initial committee would be the
hardest but most important part.

Another point is that I am only a student of 21 years, with limited
financial capabilities with respect to what I can commit to such
work.
But I have my motivation, which is the *real* engine to advance an
idea. I am open to working on it in my spare time. Over time I would
become an expert in my own field; that is implicit in such a decision.
I don't have the background of a statistician, but I know what the
relevance of data is.
It may be a solution that a fresher gives a new perspective. Starting
from scratch is at some point beneficial.
It will be even harder for a person like me to convince the
experienced professionals to overcome their own conventional schemes
and procedures, because my approach would pay no respect to the
established ones. Why the hell should I know it better than the
experts? I respect single solutions; they might work in a specific
situation, but they make it impossible to put everything together into
the big picture which is finally required.

I am really interested in leading the initiative of such a new
standard. My problem is how to start.

Would a scientific paper proposing the development of a standard
be a starting point?

Benjamin

On 14 January 2012 08:19, Spencer Graves
spencer.gra...@structuremonitoring.com wrote:
      A traditional way to exit a chaotic situation as you describe is to try
 to establish a standards committee, invite participation from suppliers and
 users of whatever (data in this case), apply for registration with the
 International Standards Organization, and organize meetings, draft and
 circulate a proposed standard, etc.  A statistician who had published maybe
 100 papers and 3 books told me that his work on ISO 9000 (I think) made a
 larger contribution to humanity than anything else he had done.  Work on
 standards is one of the most boring, tedious activities I can imagine -- and
 can potentially be the most impactful thing one does in this life:  If you
 have an ISO standard number for something, people who are starting something
 new may find it and follow it.  People who are working to upgrade something
 may tell their management, "Let's follow this standard."  Customers
 sometimes ask their suppliers, "If you follow the standard, you might get
 more customers."


      I think you could get support for such a standard effort from the
 American Association for the Advancement of Science, the American Economics
 Association, the American Statistical Association, and many other
 organizations, including many on-line science journals that today pressure
 authors of papers to put the data behind their published paper in the public
 domain, downloadable from their web site, etc.


      IMHO.
      Spencer


 On 1/13/2012 3:39 PM, Benjamin Weber wrote:

 The whole issue is related to the mismatch of (1) the publisher of the
 data and (2) the user at the rendezvous point.
 Both the publisher and the user don't know anything about the
 rendezvous point. Both want to meet but don't meet in reality.
 The user wastes time to find the rendezvous point defined by the
 publisher.
 The publisher assumes any rendezvous point. As per the number of
 publishers, the variety of the fields and the flavor of each expert,
 we end up in today's data world. Everyone has to waste his precious
 time to find out the rendezvous point. Only experts do know in which
 corner to focus their search on - but even they need their time to
 find what they want.
 However, each expert (of each profession) believes that his approach
 is the best one in the world.
 Finally we have a state of total confusion, where only experts can
 handle the information and non-experts can not even access the data
 without diving fully into the flood of data and their specialities.
 That's my point: Data is not accessible.

 The discussion should follow a strategical approach:
 - Is the classical csv file (in all its varieties) the simplest and best
 way?
 - Isn't it the responsibility of the R community to recommend
 standards for different kinds of data?
 With the existence of this rendezvous point the publisher would know a
 specific point which is favorable from the user's point of view. That
 is missing.
 Only a rendezvous point defined by the community can be a 'known'
 rendezvous point for all stakeholders, globally.

 I do believe that the publisher's greatest interest is data
 accessibility. Where is the toolkit we provide them to enable them to
 serve us the data exactly as we want it? No, we just try to build even
 more packages to be lost in the noise of information.

 I disagree with a proposed solution to 

Re: [R] tm package, custom reader

2012-01-14 Thread Milan Bouchet-Valat
On Friday 13 January 2012 at 09:00 -0800, pl.r...@gmail.com wrote:
 I need help with creating a custom XML reader for use with the tm package.  The
 objective is to create a corpus for analysis.  The files that I'm working with
 come from Solr and are in a funky XML format; nevertheless I'm able to
 parse the XML files using the solrDocs.R function provided by Duncan Temple
 Lang.
 
 The problem I'm having is that once I parse the document, I need to create a
 custom reader that would be compatible with the tm package.
 
 If someone has built a custom reader for the tm package, or has some ideas of
 how to go about this, I would greatly appreciate the help.
I've just written a custom XML source for tm a few days ago, so I guess
I can help. First, tm has a document explaining how to write an XML
reader [1], and it's relatively easy.

Though I think you shouldn't base your tm reader on the functions in
solrDocs.R, since they don't share the same structure as what tm
expects. But you can probably adapt the code from there.

To sum up how tm extensions work, you should have one function parsing
the XML file and returning one XML string for each document in a corpus:
this is the source. And one function parsing these per-document XML
strings, and filling the document's body and meta-data from the XML
tags. I think your code can be simpler than solrDocs.R since you
probably know beforehand which tags are useful for you, which aren't,
and what their types are.
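
For the parsing half, the general shape with the XML package looks like
this (the document string, tag names and fields below are hypothetical,
chosen only to illustrate per-document extraction; this is not tm's API):

library(XML)
txt <- '<doc><str name="id">42</str><str name="body">some text</str></doc>'
doc <- xmlParse(txt, asText = TRUE)
# pull the fields of interest out by XPath
id      <- xpathSApply(doc, "//str[@name='id']",   xmlValue)
content <- xpathSApply(doc, "//str[@name='body']", xmlValue)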

Feel free to ask for help on specific issues you may have. But please
provide a short XML example (and possible code). Also, when you're done,
please consider making this available, either from tm itself, or from a
new package, if it can be useful to others.


Regards

1: http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Date/time

2012-01-14 Thread R. Michael Weylandt
What exactly is your problem? People quite like the zoo package for
handling and plotting time series: perhaps that will work for you?

Michael

On Sat, Jan 14, 2012 at 4:35 AM, claire5 claire.moran...@free.fr wrote:
 Hey guys,

 I have been trying for some time to nicely plot some of my data, I have
 around 1800 values for some light intensity, taken every hour of the day
 over almost 2 months.

 My data file looks like:

     Date    Time. GMT.02.00 Intensity
 1    06.10.11 11:00:00        AM   x
 2    06.10.11 12:00:00        PM   x
 3    06.10.11 01:00:00        PM   x
 4    06.10.11 02:00:00        PM     x

 As I am pretty new to R, I am totally struggling with this issue. Does
 anyone have an idea of how I could plot the data nicely, and whether I need
 to change my data file?

 Thanks a lot for your help


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Date-time-tp4294499p4294499.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Date/time

2012-01-14 Thread claire5
Well, I am not sure how to use the zoo package, to be honest.
I am trying to plot all 1800 data points in the same graph, but of course it
looks super messy. And R does not really recognize the time data input.
So I just want to plot the time series, kind of. The problem is that the x
value is date and time, and I don't know how to tell R that yet.
I would like it to be a line then, no points; I guess a very long line.
y would have the light data and x the time.
And of course it sounds unrealistic, but it would be great to have just the
days on the x axis, not each value for every hour.

I hope I am clear; somehow it is really unclear in my head as well :)

--
View this message in context: 
http://r.789695.n4.nabble.com/Date-time-tp4294499p4294878.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Jason Edgecombe
Web services are only part of the problem. In essence, there are at 
least two facets:

1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it really becomes useful 
when both are included. I think #2 is the harder problem to address. 
Software can usually be written to handle #1 by making a useful 
abstraction layer. #2 means that data has consistent names and meanings, 
and this requires people to agree on common definitions and a common 
naming convention.
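
A toy illustration of the two facets in R (the column names are made
up, and an inline string stands in for the actual download):

# facet 1: transport -- fetch the raw data (here from an inline string)
raw <- read.csv(text = "iso3,time_period,obs_value\nDEU,2010,42")
# facet 2: semantics -- map provider-specific names onto a common model
common <- data.frame(country = raw$iso3,
                     year    = raw$time_period,
                     value   = raw$obs_value)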


RDF (Resource Description Framework) and its related technologies 
(SPARQL, OWL, etc) are one of the many attempts to try to address this. 
While this effort would benefit R, I think it's best if it's part of a 
larger effort.


Services such as DBpedia and Freebase are trying to unify many data sets 
using RDF.


The task view and package ideas are great. I'm just adding another 
perspective.


Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

HI Benjamin:

What would make this easier is if these sites used standardized web services,
so it would only require writing once.  data.gov is the worst example; they
spun their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is
supported in the ncdf4 package.  My own group has a service called ERDDAP that
is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R  (and matlab) scripts that automate the extract for certain cases, 
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector  (EDC) that  
provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to 
subset  data that is served by OPeNDAP, ERDDAP, certain Sensor Observation 
Service (SOS) servers,  and have it read directly into R.  It is freely 
available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized  (OPeNDAP, 
SOS) or is easy to implement  (ERDDAP).

-Roy


On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:


Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not apt to utilize all the public
databases in an efficient manner.
I observed that the most tedious part with R is searching and downloading
the data from public databases and putting it into the right format. I
could not find a package on CRAN which offers exactly this fundamental
capability.
Imagine R is the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would put R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key of information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as a contribution in a vital discussion.

Benjamin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

**
The contents of this message do not reflect any position of the U.S. Government or 
NOAA.
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: roy.mendelss...@noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

Old age and treachery will overcome youth and skill.
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Date/time

2012-01-14 Thread Gabor Grothendieck
On Sat, Jan 14, 2012 at 4:35 AM, claire5 claire.moran...@free.fr wrote:
 I have been trying for some time to nicely plot some of my data, I have
 around 1800 values for some light intensity, taken every hour of the day
 over almost 2 months.

 My data file looks like:

     Date    Time. GMT.02.00 Intensity
 1    06.10.11 11:00:00        AM   x
 2    06.10.11 12:00:00        PM   x
 3    06.10.11 01:00:00        PM   x
 4    06.10.11 02:00:00        PM     x

 As I am pretty new to R, I am totally struggling with this issue. Does
 anyone have an idea of how I could plot the data nicely, and whether I need
 to change my data file?


With the zoo package it's as follows.  For the actual data, which
resides in a file rather than in the character string Lines, we would
replace text = Lines with something like "myfile.dat".

# sample data
Lines <- "Date    Time. GMT.02.00 Intensity
1    06.10.11 11:00:00    AM   1
2    06.10.11 12:00:00    PM   2
3    06.10.11 01:00:00    PM   3
4    06.10.11 02:00:00    PM   4"

library(zoo)
z <- read.zoo(text = Lines, index = 1:3, tz = "", format = "%m.%d.%y %r")
plot(z)


We might alternately want to use chron date/times to avoid time zone
problems later (as per R News 4/1).  In that case it would be:

library(zoo)
library(chron)
toChron <- function(d, t, p) as.chron(paste(d, t, p), format = "%m.%d.%y %r")
z <- read.zoo(text = Lines, index = 1:3, FUN = toChron)
plot(z)

Note that in both cases we could omit header = TRUE because there is
one more data column than header column so it can deduce the correct
header= value.

Read the 5 zoo vignettes and particularly the one on read.zoo as well
as the help files for more info.


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] HOW To CHANGE THE TYPE OF NUMBER IN THE X-Y AXIS in the (barplot) GRAPH?

2012-01-14 Thread Yakamu Yakamu
Dear all,
I have trouble where I have to make all the fonts in my graphs Times New
Roman.
I know now how to change fonts for the x-axis/y-axis labels (from
http://www.statmethods.net/advgraphs/parameters.html ),
but HOW CAN I ALSO CHANGE THE TYPE OF FONT FOR THE NUMBERS INTO Times New
Roman?
Thank you very much in advance,
Kind regards,
YAKAMU
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change state names to abbreviations in an irregular list of names, abbreviations, null values, and foreign provinces

2012-01-14 Thread Ben Bolker
David Kikuchi dkikuchi at email.unc.edu writes:

 I'm trying to create maps of reptile abundance in different states  
 counties using data from Herp.net, which provides lists of specimens 
 with the places that they were found.  First I would like to parse the 
 list by state using 2-letter abbreviations, since I'm focusing on 
 certain regions.  To do this, I've been trying to create a vector 
 (state2) that gives all state names as 2-letter abbreviations, using 
 advice given on the thread: 
 http://tolstoy.newcastle.edu.au/R/help/05/09/12136.html
 

  [snip] 

 state2 <- rep(NA, length(tener$State.Province))
 for(i in 1:length(tener$Institution)){
      if(tener$State.Province[i] != ''){
          if(grep(tener$State.Province[i], state.name) > 0){
              state2[i] <- state.abb[grep(tener$State.Province[i],
 state.name)]
          }
          else{
              state2[i] <- NA
          }
      }
      else{
          state2[i] <- NA
      }
 }


  I think you might be looking for length(grep(...)) > 0, but
is this an easier way?

state.province <- c("Massachusetts","Ontario","Cuba","","Pennsylvania")
myabbr <- state.abb[match(state.province, state.name)]

myabbr
## [1] "MA" NA   NA   NA   "PA"

   (You described your problem pretty clearly, but a reproducible
example would have been nice)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantiles in boxplot

2012-01-14 Thread peter dalgaard

On Jan 14, 2012, at 16:07 , René Brinkhuis wrote:

 
 Based
 on your information I created a custom function for calculating the 
 first and third quartile according to the 'boxplot logic'.

A more compact (though not as readable) version is afforded by stats:::fivenum.

A convenient description is (I believe) that the hinges are the medians of the
bottom and top halves of the sorted observations (with the middle observation
counting in both halves if n is odd).

> x <- rnorm(121)
> fivenum(x)
[1] -2.4596038 -0.6034689  0.1105829  0.6686026  2.2580863
> median(sort(x)[1:floor((length(x)+1)/2)])
[1] -0.6034689
> median(sort(x)[ceiling((length(x)+1)/2):length(x)])
[1] 0.6686026

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Mike Marchywka

LOL, I remember posting about this in the past. The US gov agencies vary but
most are quite good. The big problem appears to be people who push proprietary
or commercial standards for which only one effective source exists. Some
formats, like Excel and PDF, come to mind, and there is a disturbing trend
towards their adoption in some places where raw data is needed by many. The
best thing to do is contact the information provider and let them know you
want raw data, not images or stuff that works in limited commercial software
packages. Often data sources are valuable and the revenue model impacts
availability.

If you are just arguing over different open formats, it is usually easy for
someone to write some conversion code and publish it - CSV to JSON would not
be a problem, for example. Data of course are quite variable and there is
nothing wrong with giving the provider his choice.


 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol
 2. mapping the data to a common model

 Having #1 makes the import/download easier, but it really becomes useful
 when both are included. I think #2 is the harder problem to address.
 Software can usually be written to handle #1 by making a useful
 abstraction layer. #2 means that data has consistent names and meanings,
 and this requires people to agree on common definitions and a common
 naming convention.

 RDF (Resource Description Framework) and its related technologies
 (SPARQL, OWL, etc) are one of the many attempts to try to address this.
 While this effort would benefit R, I think it's best if it's part of a
 larger effort.

 Services such as DBpedia and Freebase are trying to unify many data sets
 using RDF.

 The task view and package ideas are great. I'm just adding another
 perspective.

 Jason

 On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
  HI Benjamin:
 
  What would make this easier is if these sites used standardized web 
  services, so it would only require writing once. data.gov is the worst 
  example; they spun their own, weak service.
 
  There is a lot of environmental data available through OPeNDAP, and that is
  supported in the ncdf4 package. My own group has a service called ERDDAP
  that is entirely RESTful, see:
 
  http://coastwatch.pfel.noaa.gov/erddap
 
  and
 
  http://upwell.pfeg.noaa.gov/erddap
 
  We provide R (and matlab) scripts that automate the extract for certain 
  cases, see:
 
  http://coastwatch.pfeg.noaa.gov/xtracto/
 
  We also have a tool called the Environmental Data Connector (EDC) that 
  provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you 
  to subset data that is served by OPeNDAP, ERDDAP, certain Sensor 
  Observation Service (SOS) servers, and have it read directly into R. It is 
  freely available at:
 
  http://www.pfeg.noaa.gov/products/EDC/
 
  We can write such tools because the service is either standardized 
  (OPeNDAP, SOS) or is easy to implement (ERDDAP).
 
  -Roy
 
 
  On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
 
  Dear R Users -
 
  R is a wonderful software package. CRAN provides a variety of tools to
  work on your data. But R is not apt to utilize all the public
  databases in an efficient manner.
  I observed that the most tedious part with R is searching and downloading
  the data from public databases and putting it into the right format. I
  could not find a package on CRAN which offers exactly this fundamental
  capability.
  Imagine R is the unified interface to access (and analyze) all public
  data in the easiest way possible. That would create a real impact,
  would put R a big leap forward and would enable us to see the world
  with different eyes.
 
  There is a lack of a direct connection to the API of these databases,
  to name a few:
 
  - Eurostat
  - OECD
  - IMF
  - Worldbank
  - UN
  - FAO
  - data.gov
  - ...
 
  The ease of access to the data is the key of information processing with R.
 
  How can we handle the flow of information noise? R has to give an
  answer to that with an extensive API to public databases.
 
  I would love your comments and ideas as a contribution in a vital 
  discussion.
 
  Benjamin
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  **
  The contents of this message do not reflect any position of the U.S. 
  Government or NOAA.
  **
  Roy Mendelssohn
  Supervisory Operations Research Analyst
  NOAA/NMFS
  Environmental Research Division
  Southwest Fisheries Science Center
  

Re: [R] Averaging within a range of values

2012-01-14 Thread Gabor Grothendieck
On Fri, Jan 13, 2012 at 6:34 AM, doggysaywhat chwh...@ucsd.edu wrote:
 Hello all.

 I have two data frames.
 Group    Start    End
 G1       200      700
 G2       500      1000
 G3       2000     3000
 G4       4000     6000
 G5       7000     8000


 and

 Pos      C0     C1
 200      0.9    0.6
 500      0.8    0.8
 800      0.9    0.7
 1000     0.7    0.6
 2000     0.6    0.4
 2500     1.2    0.8
 3000     0.6    1.5
 3500     0.7    0.7
 4000     0.8    0.8
 4500     0.6    0.6
 5000     0.9    0.9
 5500     0.7    0.8
 6000     0.8    0.7
 6500     0.4    0.4
 7000     0.5    0.8
 7500     0.7    0.9
 8000     0.9    0.5
 8500     0.8    0.6
 9000     0.9    0.8


 I need to conditionally average all values in columns C0 and C1 based upon
 the bins I defined in the first data frame.  For example, for the bin G1 in
 the first dataframe, the values are 200 to 700, so I would average the value
 at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing for
 C1.

 I can do this in Excel with array formulas, but I'm relatively new to R and
 would like to know if there is a function that will perform the same action.  I
 don't know if this will help, but the Excel array function I used was
 average(if((range>=start)*(range<=end),range)), where the range is the
 entire pos column.

 Initially I looked at the aggregate function.   I can use aggregate when I
 give a single vector to be used for grouping such as (A,B,C) but I'm not
 sure how to define grouping as the bin 200-500 and the second bin as
 500-1000 etc. and use that as my grouping vector.


Here is an sqldf solution where the two input data frames are d1 and
d2 (as in Jeff's post).  Note that Group is quoted since it's an SQL
keyword:

library(sqldf)

sqldf("select d1.'Group', avg(d2.C0), avg(d2.C1)
   from d1, d2
   where d2.Pos between d1.Start and d1.End
   group by d1.'Group'")

The result is:

  Group avg(d2.C0) avg(d2.C1)
1    G1       0.85      0.700
2    G2       0.80      0.700
3    G3       0.80      0.900
4    G4       0.76      0.760
5    G5       0.70      0.733
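
For comparison, a base-R sketch of the same computation, with d1 and d2
entered from the two tables quoted above:

d1 <- data.frame(Group = paste("G", 1:5, sep = ""),
                 Start = c(200, 500, 2000, 4000, 7000),
                 End   = c(700, 1000, 3000, 6000, 8000))
d2 <- data.frame(Pos = c(200, 500, 800, 1000, 2000, 2500, 3000, 3500,
                         4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500,
                         8000, 8500, 9000),
                 C0  = c(0.9, 0.8, 0.9, 0.7, 0.6, 1.2, 0.6, 0.7, 0.8, 0.6,
                         0.9, 0.7, 0.8, 0.4, 0.5, 0.7, 0.9, 0.8, 0.9),
                 C1  = c(0.6, 0.8, 0.7, 0.6, 0.4, 0.8, 1.5, 0.7, 0.8, 0.6,
                         0.9, 0.8, 0.7, 0.4, 0.8, 0.9, 0.5, 0.6, 0.8))
t(sapply(seq_len(nrow(d1)), function(i) {
    in.bin <- d2$Pos >= d1$Start[i] & d2$Pos <= d1$End[i]  # rows in bin i
    c(C0 = mean(d2$C0[in.bin]), C1 = mean(d2$C1[in.bin]))
}))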

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Benjamin Weber
Mike

We see that the publishers are aware of the problem. They don't think
that the raw data is usable for the user. Consequently they
acknowledge this fact with the proprietary formats. Yes, they resign
themselves to the information overload. That's pathetic.

It is not a question of *which* data format, it is a question about
the general concept. Where do publisher and user meet? There has to be
one *defined* point which all parties agree on. I disagree with your
statement that the publisher should just publish csv or cook his own
API. That leads to fragmentation and inaccessibility of data. We want
data to be accessible.

A more pragmatic approach is needed to revolutionize the way we go
about raw data.

Benjamin

On 14 January 2012 22:17, Mike Marchywka marchy...@hotmail.com wrote:

 LOL, I remember posting about this in the past. The US gov agencies vary but
 most are quite good. The big problem appears to be people who push proprietary
 or commercial standards for which only one effective source exists. Some
 formats, like Excel and PDF, come to mind, and there is a disturbing trend
 towards their adoption in some places where raw data is needed by many. The
 best thing to do is contact the information provider and let them know you
 want raw data, not images or stuff that works in limited commercial software
 packages. Often data sources are valuable and the revenue model impacts
 availability.

 If you are just arguing over different open formats, it is usually easy for
 someone to write some conversion code and publish it - CSV to JSON would not
 be a problem, for example. Data of course are quite variable and there is
 nothing wrong with giving the provider his choice.

 
 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol
 2. mapping the data to a common model

 Having #1 makes the import/download easier, but it really becomes useful
 when both are included. I think #2 is the harder problem to address.
 Software can usually be written to handle #1 by making a useful
 abstraction layer. #2 means that data has consistent names and meanings,
 and this requires people to agree on common definitions and a common
 naming convention.

 RDF (Resource Description Framework) and its related technologies
 (SPARQL, OWL, etc) are one of the many attempts to try to address this.
 While this effort would benefit R, I think it's best if it's part of a
 larger effort.

 Services such as DBpedia and Freebase are trying to unify many data sets
 using RDF.

  The task view and package ideas are great. I'm just adding another
 perspective.

 Jason

 On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
  HI Benjamin:
 
  What would make this easier is if these sites used standardized web 
  services, so it would only require writing once. data.gov is the worst 
   example; they spun their own, weak service.
 
   There is a lot of environmental data available through OPeNDAP, and that
   is supported in the ncdf4 package. My own group has a service called
   ERDDAP that is entirely RESTful, see:
 
  http://coastwatch.pfel.noaa.gov/erddap
 
  and
 
  http://upwell.pfeg.noaa.gov/erddap
 
  We provide R (and matlab) scripts that automate the extract for certain 
  cases, see:
 
  http://coastwatch.pfeg.noaa.gov/xtracto/
 
  We also have a tool called the Environmental Data Connector (EDC) that 
   provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you 
  to subset data that is served by OPeNDAP, ERDDAP, certain Sensor 
  Observation Service (SOS) servers, and have it read directly into R. It is 
  freely available at:
 
  http://www.pfeg.noaa.gov/products/EDC/
 
  We can write such tools because the service is either standardized 
  (OPeNDAP, SOS) or is easy to implement (ERDDAP).
 
  -Roy
 
 
  On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
 
  Dear R Users -
 
  R is a wonderful software package. CRAN provides a variety of tools to
  work on your data. But R is not apt to utilize all the public
  databases in an efficient manner.
   I observed that the most tedious part with R is searching and downloading
  the data from public databases and putting it into the right format. I
  could not find a package on CRAN which offers exactly this fundamental
  capability.
  Imagine R is the unified interface to access (and analyze) all public
  data in the easiest way possible. That would create a real impact,
  would put R a big leap forward and would enable us to see the world
  with different eyes.
 
  There is a lack of a direct connection to the API of these databases,
  to name a few:
 
  - Eurostat
  - OECD
  - IMF
  - Worldbank
  - UN
  - FAO
  - data.gov
  - ...
 
  The ease of access to the data is the key of information processing with 
  

Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Joshua Wiley
I have been following this thread, but there are many aspects of it
which are unclear to me.  Who are the publishers?  Who are the users?
What is the problem?  I have a vague sense for some of these, but it
seems to me like one valuable starting place would be creating a
document that clarifies everything.  It is easier to tackle a concrete
problem (e.g., agree on a standard numerical representation of dates
and times a la ISO 8601) than something diffuse (e.g., information
overload).

Good luck,

Josh

On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weber m...@bwe.im wrote:
 Mike

 We see that the publishers are aware of the problem. They don't think
 that the raw data is usable for the user. Consequently they
 acknowledge this fact with the proprietary formats. Yes, they resign
 themselves to the information overload. That's pathetic.

 It is not a question of *which* data format, it is a question about
 the general concept. Where do publisher and user meet? There has to be
 one *defined* point which all parties agree on. I disagree with your
 statement that the publisher should just publish csv or cook his own
 API. That leads to fragmentation and inaccessibility of data. We want
 data to be accessible.

 A more pragmatic approach is needed to revolutionize the way we go
 about raw data.

 Benjamin

 On 14 January 2012 22:17, Mike Marchywka marchy...@hotmail.com wrote:

 LOL, I remember posting about this in the past. The US gov agencies vary but 
 most are quite good. The big problem appears to be people who push 
 proprietary or commercial standards for which only one effective source 
 exists. Some formats, like Excel and PDF, come to mind and there is a 
 disturbing trend towards their adoption in some places where raw data is 
 needed by many. The best thing to do is contact the information provider and 
 let them know you want raw data, not images or stuff that works in limited 
 commercial software packages. Often data sources are valuable and the revenue 
 model impacts availability.

 If you are just arguing over different open formats, it is usually easy for 
 someone to write some conversion code and publish it - CSV to JSON would not 
 be a problem for example. Data of course are quite variable and there is 
 nothing wrong with giving the provider his choice.

 
 Date: Sat, 14 Jan 2012 10:21:23 -0500
 From: ja...@rampaginggeek.com
 To: r-help@r-project.org
 Subject: Re: [R] The Future of R | API to Public Databases

 Web services are only part of the problem. In essence, there are at
 least two facets:
 1. downloading the data using some protocol
 2. mapping the data to a common model

 Having #1 makes the import/download easier, but it really becomes useful
 when both are included. I think #2 is the harder problem to address.
 Software can usually be written to handle #1 by making a useful
 abstraction layer. #2 means that data has consistent names and meanings,
 and this requires people to agree on common definitions and a common
 naming convention.

 RDF (Resource Description Framework) and its related technologies
 (SPARQL, OWL, etc) are one of the many attempts to try to address this.
 While this effort would benefit R, I think it's best if it's part of a
 larger effort.

 Services such as DBpedia and Freebase are trying to unify many data sets
 using RDF.

 The task view and package ideas a great ideas. I'm just adding another
 perspective.

 Jason

 On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:
  HI Benjamin:
 
  What would make this easier is if these sites used standardized web 
  services, so it would only require writing once. data.gov is the worst 
  example, they spun their own, weak service.
 
  There is a lot of environmental data available through OPenDAP, and that 
  is supported in the ncdf4 package. My own group has a service called 
  ERDDAP that is entirely RESTFul, see:
 
  http://coastwatch.pfel.noaa.gov/erddap
 
  and
 
  http://upwell.pfeg.noaa.gov/erddap
 
  We provide R (and matlab) scripts that automate the extract for certain 
  cases, see:
 
  http://coastwatch.pfeg.noaa.gov/xtracto/
 
  We also have a tool called the Environmental Data Connector (EDC) that 
  provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you 
  to subset data that is served by OPeNDAP, ERDDAP, certain Sensor 
  Observation Service (SOS) servers, and have it read directly into R. It 
  is freely available at:
 
  http://www.pfeg.noaa.gov/products/EDC/
 
  We can write such tools because the service is either standardized 
  (OPeNDAP, SOS) or is easy to implement (ERDDAP).
 
  -Roy
 
 
  On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
 
  Dear R Users -
 
  R is a wonderful software package. CRAN provides a variety of tools to
  work on your data. But R is not apt to utilize all the public
  databases in an efficient manner.
  I observed the most tedious part with R is searching and downloading
  the data from public databases and putting it into the right format.

[R] How can I change font type in graph (including all the text in legend, and the number in x-y axis)

2012-01-14 Thread Yakamu Yakamu
Dear all, 
I would like to make a survival analysis graph line with all fonts in Times 
New Roman,
Including all the numbers in x-y axis and the legend explanation.
I know how to change fonts for the x-y axis labels (from 
http://www.statmethods.net/advgraphs/parameters.html ) 
and this is what i did :
# SURVIVAL PLOT
colsurvival <- c("black", "black", "black", "black")
windowsFonts(A=windowsFont("Times New Roman"))

plot(fit1, lty=c(2, 1, 4, 3), lwd=2, col=colsurvival, yscale=100,
frame.plot=FALSE)
title(xlab="results", cex.lab=1.3, cex.axis=1.3, ylab="percentage survival",
family="A")
legend("bottomleft", ...etc...)
I have the titles all in Times New Roman, but not the numbers on the x-y axis.
Is there anyone can help me here? Thank you very much in advance,
Kind regards,
Yakamu
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fUtilities removed -- use fBasics

2012-01-14 Thread Dominic Comtois
Thank you for your prompt and useful replies. I will be using fBasics from
now on.

Regards,

Dominic Comtois, Montréal

-----Original Message-----
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
Sent: 14 January 2012 06:39
To: David Winsemius
Cc: Dominic Comtois; r-help@r-project.org; rmetrics-c...@r-project.org
Subject: Re: [R] fUtilities removed -- use fBasics

 David Winsemius dwinsem...@comcast.net
 on Fri, 13 Jan 2012 13:52:57 -0500 writes:

 On Jan 13, 2012, at 12:33 PM, Dominic Comtois wrote:

 When setting up my new machine, I was surprised to see
 that Package 'fUtilities' was removed from the CRAN
 repository.
 

 https://stat.ethz.ch/pipermail/rmetrics-core/2012-January/000554.html
 https://stat.ethz.ch/pipermail/rmetrics-core/2011-November/000549.html

indeed. thank you David (and Google, I presume ..)

 
 This is problematic for my work. I use many of its
 functions, and it will complicate things a lot if other
 programmers want to use my previous code in the
 future. Plus, nowhere can I find the justification for
 its removal.

For a longer time, the Rmetrics management had planned to deprecate
fUtilities (and fSeries and fCalendar), basically refactoring the
functionality ``approximately'' along the lines of

 old package   replacement pkgs
 -----------   ----------------
 fUtilities    fBasics
 fSeries       timeSeries
 fCalendar     timeDate

but clearly not a 1:1 replacement, but a refactoring as said above.
fBasics, indeed 'Depends' on both timeSeries and timeDate, so I think it is
safe to say that you should replace

   fUtilities by fBasics
everywhere ... and things should work...

Yes, the communication about these plans was not put out the way it
should have been; and indeed the deprecation would not necessarily have meant
that the package be dropped without proper notice.
One excuse has been the lack of resources and health on the side of
Rmetrics.

Disclaimer: I am one of rmetrics-c...@r-project.org, as having been an
active co-maintainer of some parts of the Rmetrics collection, but I have
not been part of the management nor the foundation.

Martin Maechler, ETH Zurich

 You need to send your questions to the maintainers. They
 apparently did not respond to the requests to fix the
 errors.

 
 Thanks for any info on this

 You should perhaps subscribe to the list that is
 established for discussion on this and related packages.

 -- 

 David Winsemius, MD West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide
 http://www.R-project.org/posting-guide.html and provide
 commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] metafor: weights computation in Mantel-Haenszel method

2012-01-14 Thread Ignacio López De Ullibarri Galparsoro

Dear R users, 

In metafor 1.6-0, the Mantel-Haenszel method is implemented by the rma.mh() 
function. I have observed that the sum of the weights computed by weights(x) 
doesn't add to 100% when x is an object of class rma.mh. The consequences of 
this fact can be clearly seen when a forest diagram is drawn with forest(x), 
which calls weights(x) (or more precisely, the method weights.rma.mh() defined 
in the package). 

Is this, as I suppose, a bug? 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nabble? Was Re: function to replace values doesn't work on vectors

2012-01-14 Thread Duncan Murdoch

On 12-01-13 11:48 AM, Sarah Goslee wrote:
 ...

I hope that it was a momentary glitch; greater disagreement between Nabble
and the email list will cause all sorts of fun. If the interface,
whatever it is, starts stripping out code, I'll have to quit answering
Nabble queries entirely.


You know, that's a great suggestion.  I'm now filtering out all messages 
with nabble.com in the Message-ID.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tm package, custom reader

2012-01-14 Thread Andy Adamiec
On Sat, Jan 14, 2012 at 12:41 PM, Milan Bouchet-Valat nalimi...@club.fr wrote:

 Le samedi 14 janvier 2012 à 12:24 -0600, Andy Adamiec a écrit :
  Hi Milan,
 
 
  The xml solr files are not in a typical format, here is an example
  http://www.omegahat.org/RSXML/solr.xml
  I'm not sure how to parse the documents without using the solrDocs.R
  function, and how to make the function compatible with the tm package.
 Indeed, this doesn't seem to be easy to parse using the generic XML
 source from tm. So it will be easier for you to create your own custom
 source from scratch. Have a look at the source.R and reader.R files in
 the tm source: you need to replicate the behavior of one of the sources.

 The code should include the following functions:

 readSorl <- FunctionGenerator(function(...) {
    function(elem, language, id) {
    # Use elem$content, which contains an item set by SorlSource() below,
    # and create a PlainTextDocument() from it,
    # putting the data where appropriate (text, meta-data)
    }
 })

 SorlSource <- function(x) {
    # Parse the XML file using functions from solrDocs.R, and
    # create 'content', which is a list with one item for each document,
    # to pass to readSorl() one by one

    s <- tm:::.Source(readSorl, "UTF-8", length(content), FALSE,
                      seq(1, length(content)), 0, FALSE)
    s$Content <- content
    s$URI <- match.call()$x
    class(s) <- c("SorlSource", "Source")
    s
 }

 getElem <- function(x) UseMethod("getElem", x)
 getElem.SorlSource <- function(x) {
    list(content = x$Content[[x$Position]], uri = match.call()$x)
 }

 eoi <- function(x) UseMethod("eoi", x)
 eoi.SorlSource <- function(x) length(x$Content) <= x$Position


 Hope this helps
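
For orientation, a hedged usage sketch once the functions above are defined (not from the original thread; the exact Corpus() arguments vary across tm versions, and "solr.xml" is a hypothetical file name):

library(tm)
src <- SorlSource("solr.xml")    # hypothetical input file
corpus <- Corpus(src)            # readSorl() acts as the source's default reader
corpus[[1]]                      # first PlainTextDocument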




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I change font type in graph (including all the text in legend, and the number in x-y axis)

2012-01-14 Thread David Winsemius


On Jan 14, 2012, at 2:19 PM, Yakamu Yakamu wrote:


Dear all,
I would like to make a survival analysis graph line with all fonts  
in Times New Roman,

Including all the numbers in x-y axis and the legend explanation.
I know how to change fonts for the x-y axis labels (from http://www.statmethods.net/advgraphs/parameters.html 
 )

and this is what i did :
# SURVIVAL PLOT
colsurvival <- c("black", "black", "black", "black")
windowsFonts(A=windowsFont("Times New Roman"))

plot(fit1, lty=c(2, 1, 4, 3), lwd=2, col=colsurvival, yscale=100,
frame.plot=FALSE)
title(xlab="results", cex.lab=1.3, cex.axis=1.3, ylab="percentage
survival", family="A")

legend("bottomleft", ...etc...)
I have the titles all in Times New Roman, but not the numbers on the x-y
axis.


(Since you only passed "A" as an argument to `title`, why would this
be expected to bleed over into the axis? I doubt that cex.axis is
having any effect, either.)



Is there anyone can help me here? Thank you very much in advance,


You may want to see if passing a family argument to `plot` has an  
effect on what is eventually a call to `axis`. That's also (probably)   
where you should be inserting the cex.axis. Cannot test since I  
don't use windows (and you didn't include a reproducible  sample,  
anyway.)
(In other situations the font argument is often a number rather than  
the results of a call to a font-function. See the par help page)
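
A minimal sketch of that suggestion (untested; Windows-only, with 'fit1' and the legend labels assumed from the original post): set the family in par() before plotting, so the axis annotation inherits it too:

windowsFonts(A = windowsFont("Times New Roman"))
op <- par(family = "A", cex.axis = 1.3)    # axis numbers now use Times
plot(fit1, lty = c(2, 1, 4, 3), lwd = 2, yscale = 100, frame.plot = FALSE)
title(xlab = "results", ylab = "percentage survival", cex.lab = 1.3)
legend("bottomleft", legend = c("g1", "g2", "g3", "g4"),  # hypothetical labels
       lty = c(2, 1, 4, 3), lwd = 2)
par(op)                                    # restore the previous settings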





Kind regards,
Yakamu



You should learn to post in plain text and PLEASE stop replying to
existing threads when you are submitting a new question. It screws up
the threading.



PLEASE do read the posting guide http://www.R-project.org/posting-guide.html



AND provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Paul Gilbert
The situation for this kind of interface is much more advanced (for 
economic time series data) than has been suggested in other postings. 
Several of the organizations you mention support SDMX and I believe 
there is a working R interface to SDMX which has not yet been made 
public. A more complete list of organizations that I think already have 
working server side support for SDMX is: the OECD, Eurostat, the ECB, 
the IMF, the UN, the BIS, the Federal Reserve Board, the World Bank, the 
Italian Statistics agency, and to a small extent by the Bank of Canada. 
 I have a working API to several time series databases (TS* packages on 
CRAN), and a partially working interface to SDMX, but have postponed 
further development of that in the hope that the already working code 
will be made available. Please see http://tsdbi.r-forge.r-project.org/ 
for more details. I would, of course, be happy to have other developers 
involved in this project. If you think you can contribute then see 
r-forge.r-project.org for details on how to join projects.


Paul

On 12-01-14 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Sat, 14 Jan 2012 02:44:07 +0530
From: Benjamin Weber m...@bwe.im
To: r-help@r-project.org
Subject: [R] The Future of R | API to Public Databases
Message-ID:
cany9q8k+zyvrkjjgbjp+jtnyaw15gqkocivyvpgwgyqa9dl...@mail.gmail.com
Content-Type: text/plain; charset=UTF-8

Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to
work on your data. But R is not apt to utilize all the public
databases in an efficient manner.
I observed the most tedious part with R is searching and downloading
the data from public databases and putting it into the right format. I
could not find a package on CRAN which offers exactly this fundamental
capability.
Imagine R is the unified interface to access (and analyze) all public
data in the easiest way possible. That would create a real impact,
would put R a big leap forward and would enable us to see the world
with different eyes.

There is a lack of a direct connection to the API of these databases,
to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key of information processing with R.

How can we handle the flow of information noise? R has to give an
answer to that with an extensive API to public databases.

I would love your comments and ideas as a contribution in a vital discussion.

Benjamin


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] HOW To CHANGE THE TYPE OF NUMBER IN THE X-Y AXIS in the (barplot) GRAPH?

2012-01-14 Thread Jim Lemon

On 01/15/2012 02:16 AM, Yakamu Yakamu wrote:

Dear all,
I have troubles where I have to make all the fonts in my graphs into Times New
Roman,
I know now how to change fonts for the x-axis-y-axis labels (from
http://www.statmethods.net/advgraphs/parameters.html )
  but HOW CAN I ALSO CHANGE THE TYPE OF FONT FOR THE NUMBER INTO Times New 
Roman?


Hi Yakamu,
Try this:

par(family="times")
plot(...)

This changes the tick labels (which is what I think you want) to Times.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simulating stable VAR process

2012-01-14 Thread John C Frain
Mark, statquant2

As I understand the question it is not to test if a VAR is stable but how
to construct a VAR that is stable and automatically satisfies the condition
Mark has taken from Lutkepohl. The algorithm that I have set out will
automatically satisfy that condition. The matrix that should be estimated
by the algorithm is A on the last line of page 15 of Lutkepohl.
 Incidentally the corresponding matrix for the example on page 15 is
singular. The algorithm that I have set out will only lead to systems with
a non-singular matrix.

I still don't see how a matrix generated in this way corresponds to a real
economic system.  Of course you may have some other constraints in mind
that would make the generated system correspond to something more real.

John

On Saturday, 14 January 2012, Mark Leeds marklee...@gmail.com wrote:
 Hi statquant2 and john: In the first chapter of Lutkepohl, it is shown
that stability of
 a VAR(p) is the same as

 det(I_k - A1*z - ... - Ap*z^p) does not equal zero for |z| <= 1,

 where I_k - A1*z - ... - Ap*z^p is referred to as the reverse characteristic
polynomial.

 So, statquant2, given your A's, one way to do it would be to
check the roots of the
 polynomial implied by taking the determinant of your polynomial.

 There's an example on pg 17 of lutkepohl if you have it. If you don't, I
can fax it to you
 over the weekend if you want it.
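
A hedged aside, not from the original thread: that condition can be checked numerically without expanding the determinant, because stability is equivalent to all eigenvalues of the companion (stacked VAR(1)) matrix lying strictly inside the unit circle; the vars package's roots() reports exactly these moduli. A minimal sketch, where A is assumed to be a list of the k x k coefficient matrices A1, ..., Ap:

is_stable <- function(A) {
  k <- nrow(A[[1]]); p <- length(A)
  comp <- do.call(cbind, A)                  # block row [A1 A2 ... Ap]
  if (p > 1)
    comp <- rbind(comp, cbind(diag(k * (p - 1)), matrix(0, k * (p - 1), k)))
  all(Mod(eigen(comp, only.values = TRUE)$values) < 1)
}

A1 <- matrix(c(0.5, 0.1, 0.4, 0.5), 2, 2)    # made-up coefficients
A2 <- matrix(c(0, 0.25, 0, 0), 2, 2)
is_stable(list(A1, A2))                      # TRUE for this pair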



 On Fri, Jan 13, 2012 at 8:34 PM, John C Frain fra...@gmail.com wrote:

 I think that you must approach this in a different way.

 1 Draw a set of random eigenvalues with modulus < 1
 2 Draw a set of random eigenvectors.
 3 From these you can, with some matrix manipulations, derive the
 corresponding Var coefficients.

 If your original coefficients were drawn at random I suspect that the VAR
 would not be stable. I am curious about what you are trying to do.

 John

 On Friday, 13 January 2012, statquant2 statqu...@gmail.com wrote:
  Hello Paul
  Thanks for the answer but my point is not how to simulate a VAR(p)
process
  and check that it is stable.
  My question is more how can I generate a VAR(p) such that I already
know
  that it is stable.
 
  We know a condition that assures that it is stable (see first message)
but
  this is not a condition on coefficients etc...
  What I want is
  generate say a 1000 random VAR(3) processes over say 500 time periods
that
  will be STABLE (meaning If I run stability() all will pass the test)
 
  When I try to do that it seems that none of the VAR I am generating
pass
  this test, so I assume that the class of stable VAR(p) is very small
  compared to the whole VAR(p) process.
 
 
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 --
 John C Frain
 Economics Department
 Trinity College Dublin
 Dublin 2
 Ireland
 www.tcd.ie/Economics/staff/frainj/home.html
 mailto:fra...@tcd.ie
 mailto:fra...@gmail.com


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
John C Frain
Economics Department
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] par.plot() for repeated measurements

2012-01-14 Thread Jonas Stein
 I am using the package gamlss in R to plot repeated measurements. The
 command I am using is par.plot(). It works great except one thing about the
 label of the axises. I tried to label both x and y axises using ylab and
 xlab options. But the plot only gives variable variables. The labels did not
 show up. Below is the code I used.  Any comments are appreciated! Thanks. 

 library(gamlss)
 enable2r <- read.csv("D:\\lzg\\jointmodel\\enable2r.csv", header=T)
 enable2r$ID <- factor(enable2r$ID)
 par.plot(factpal~timetodeath2, data=enable2r, sub=ID, ylim=c(45,184),
 ylab='FACIT-PAL', xlab='Time to death', color=FALSE, lwd=1)

i can not use your example, as i have no enable2r.csv, but perhaps
you have luck if you change 

xlab='Time to death'
to
xlab="Time to death"

kind regards,

-- 
Jonas Stein n...@jonasstein.de
https://github.com/jonasstein/R-Reference-Card

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Jason Edgecombe

Hi,

Happy to oblige:

Background: All data is dirty to some degree. A large portion of the 
time spent in doing data analysis is spent doing data cleaning (removing 
invalid data, transforming data columns into something useful). When 
data from multiple sources are used, then some time must be spent in 
making the data mergeable.


Publishers: anyone who provides data in large quantities, usually 
governments and public organizations.


Users: anyone who wants to use the data. This could be journalists, 
scientists, a concerned citizen, other organizations, etc...


Problems:
1. users of data have a hard time finding data, if they can find it at 
all. This is the rendezvous point. There should be a common service or 
place to publicize the data and allow people to find it. data markets 
such as Infochimps can help with this.


2. data is often published using different protocols. Some data sets are 
so big, that the data is accessed using a custom API. Many of these 
services use web services, but method names vary. This is a technical 
problem that can be worked around using libraries to translate from one 
protocol to another. A 3rd party may also help here by aggregating data 
sets. Publisher-specific libraries have been proposed to help address 
this, but I think those are also a compromise.


3. data sets rarely use common data fields/columns and what they measure 
may vary slightly. Having common names and definitions for often-used 
columns allows for confidence in merging the data, and more accurate 
insights may be made.


If these issues can be solved, then a large amount of data analysts' time 
can be freed up by reducing the data cleansing phase. On top of that, if 
the data can be merged in an automated way, then even laymen can do 
their own analysis. This problem is similar, if not identical, to the 
one being addressed by the semantic web movement.


These problems can't be solved just by using ISO-formatted dates; part 
of the problem is getting people to use common meanings for the fields.


Here is an example to illustrate: Public universities publish data such 
as the number of students enrolled. This number is often broken down by 
undergraduate and graduate students, but you have to know how that is 
measured. Are post-baccalaureates counted as graduate students? Were the 
students counted by head count or by full-time equivalent (FTE) (sum of 
total enrolled credit hours  / credit hours for a full-time student). 
Even the definition of FTE varies by university or by university system.


Jason


On 01/14/2012 01:51 PM, Joshua Wiley wrote:

I have been following this thread, but there are many aspects of it
which are unclear to me.  Who are the publishers?  Who are the users?
What is the problem?  I have a vague sense for some of these, but it
seems to me like one valuable starting place would be creating a
document that clarifies everything.  It is easier to tackle a concrete
problem (e.g., agree on a standard numerical representation of dates
and times a la ISO 8601) than something diffuse (e.g., information
overload).

Good luck,

Josh

On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weberm...@bwe.im  wrote:

Mike

We see that the publishers are aware of the problem. They don't think
that the raw data is usable for the user. Consequently they
recognize this fact with the proprietary formats. Yes, they resign
themselves to the information overload. That's pathetic.

It is not a question of *which* data format, it is a question about
the general concept. Where do publisher and user meet? There has to be
one *defined* point which all parties agree on. I disagree with your
statement that the publisher should just publish csv or cook his own
API. That leads to fragmentation and inaccessibility of data. We want
data to be accessible.

A more pragmatic approach is needed to revolutionize the way we go
about raw data.

Benjamin

On 14 January 2012 22:17, Mike Marchywkamarchy...@hotmail.com  wrote:







LOL, I remember posting about this in the past. The US gov agencies vary but most are 
quite good. The big problem appears to be people who push proprietary or commercial 
standards for which only one effective source exists. Some formats, like Excel 
and PDF, come to mind and there is a disturbing trend towards their adoption in some places 
where raw data is needed by many. The best thing to do is contact the information provider 
and let them know you want raw data, not images or stuff that works in limited commercial 
software packages. Often data sources are valuable and the revenue model impacts 
availability.

If you are just arguing over different open formats, it is usually easy for 
someone to write some conversion code and publish it - CSV to JSON would not be a 
problem for example. Data of course are quite variable and there is 
nothing wrong with giving the provider his choice.



Date: Sat, 14 Jan 2012 10:21:23 -0500
From: 

Re: [R] simulating stable VAR process

2012-01-14 Thread John C Frain
Mark

This should be reasonably straightforward. In the simplest case you wish to
draw a random complex number in the unit circle. This is best done in polar
coordinates.

If r is a random number on (0,1) and theta a random number on (0, 2*Pi),
then with x = r*cos(theta) and y = r*sin(theta), x + i*y is inside the unit
circle. As such roots come in conjugate pairs, a second root is x - i*y. If you
then need an odd number of roots, the final one can simply be a random number on
(0,1). You do not need to use a uniform distribution but can use any
distribution on the required intervals, or constrain more of the eigenvalues
to be real.

John
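
A hedged sketch of that recipe for the VAR(1) case (my illustration, not code from the thread): draw conjugate eigenvalue pairs inside the unit circle, matching conjugate pairs of random eigenvectors, and rebuild A = P diag(lambda) P^-1, which is real by construction:

random_stable_A <- function(k) {
  stopifnot(k >= 2)
  npair <- k %/% 2
  lam <- complex(modulus = runif(npair), argument = runif(npair, 0, 2 * pi))
  lam <- as.vector(rbind(lam, Conj(lam)))   # interleave conjugate pairs
  P <- NULL
  for (j in seq_len(npair)) {
    v <- complex(real = rnorm(k), imaginary = rnorm(k))
    P <- cbind(P, v, Conj(v))               # eigenvectors in matching pairs
  }
  if (k %% 2) {                             # odd k: one extra real eigenvalue
    lam <- c(lam, runif(1, -1, 1))
    P <- cbind(P, complex(real = rnorm(k)))
  }
  Re(P %*% diag(lam) %*% solve(P))          # imaginary parts cancel
}

A <- random_stable_A(3)
max(Mod(eigen(A)$values))                   # strictly less than 1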

On Sunday, 15 January 2012, Mark Leeds marklee...@gmail.com wrote:
 hi john. I think I follow you. But, in your algorithm, is it
straightforward to
 generate a set of eigenvalues with modulus less than 1? Thanks.


 On Sat, Jan 14, 2012 at 5:31 PM, John C Frain fra...@gmail.com wrote:

 Mark, statquant2

 As I understand the question it is not to test if a VAR is stable but how
to construct a VAR that is stable and automatically satisfies the condition
Mark has taken from Lutkepohl. The algorithm that I have set out will
automatically satisfy that condition. The matrix that should be estimated
by the algorithm is A on the last line of page 15 of Lutkepohl.
 Incidentally the corresponding matrix for the example on page 15 is
singular. The algorithm that I have set out will only lead to systems with
a non-singular matrix.

 I still don't see how a matrix generated in this way corresponds to a
real economic system.  Of course you may have some other constraints in
mind that would make the generated system correspond to something more real.

 John

 On Saturday, 14 January 2012, Mark Leeds marklee...@gmail.com wrote:
 Hi statquant2 and john: In the first chapter of Lutkepohl, it is shown
that stability of
 a VAR(p) is the same as

 det(I_k - A1*z - ... - Ap*z^p) does not equal zero for |z| <= 1,

 where I_k - A1*z - ... - Ap*z^p is referred to as the reverse
characteristic polynomial.

 So, statquant2, given your A's, one way to do it would be to
check the roots of the
 polynomial implied by taking the determinant of your polynomial.

 There's an example on pg 17 of lutkepohl if you have it. If you don't, I
can fax it to you
 over the weekend if you want it.



 On Fri, Jan 13, 2012 at 8:34 PM, John C Frain fra...@gmail.com wrote:

 I think that you must approach this in a different way.

 1 Draw a set of random eigenvalues with modulus < 1
 2 Draw a set of random eigenvectors.
 3 From these you can, with some matrix manipulations, derive the
 corresponding Var coefficients.

 If your original coefficients were drawn at random I suspect that the
VAR
 would not be stable. I am curious about what you are trying to do.

 John

 On Friday, 13 January 2012, statquant2 statqu...@gmail.com wrote:
  Hello Paul
  Thanks for the answer but my point is not how to simulate a VAR(p)
process
  and check that it is stable.
  My question is more how can I generate a VAR(p) such that I already
know
  that it is stable.
 
  We know a condition that assures that it is stable (see first message)
but
  this is not a condition on coefficients etc...
  What I want is
  generate say a 1000 random VAR(3) processes over say 500 time periods
that
  will be STABLE (meaning If I run stability() all will pass the test)
 
  When I try to do that it seems that none of the VAR I am generating
pass
  this test, so I assume that the class of stable VAR(p) is very small
  compared to the whole VAR(p) process.
 
 
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 --
 John C Frain
 Economics Department
 Trinity College Dublin
 Dublin 2
 Ireland


-- 
John C Frain
Economics Department
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simulating stable VAR process

2012-01-14 Thread Mark Leeds
gotcha john. thanks.


On Sat, Jan 14, 2012 at 9:28 PM, John C Frain fra...@gmail.com wrote:

 Mark

 This should be reasonably straightforward. In the simplest case you wish to
 draw a random complex number in the unit circle. This is best done in polar
 coordinates.

 If r is a random number on (0,1) and theta a random number on (0, 2*Pi),
 then with x = r*cos(theta) and y = r*sin(theta), x + i*y is inside the unit
 circle. As such roots come in conjugate pairs, a second root is x - i*y. If you
 then need an odd number of roots, the final one can simply be a random number on
 (0,1). You do not need to use a uniform distribution but can use any
 distribution on the required intervals, or constrain more of the eigenvalues
 to be real.

 John

 On Sunday, 15 January 2012, Mark Leeds marklee...@gmail.com wrote:
  hi john. I think I follow you. But, in your algorithm, is it
 straightforward to
  generate a set of eigenvalues with modulus less than 1? Thanks.
 
 
  On Sat, Jan 14, 2012 at 5:31 PM, John C Frain fra...@gmail.com wrote:
 
  Mark, statquant2
 
  As I understand the question it is not to test if a VAR is stable but
 how to construct a VAR that is stable and automatically satisfies the
 condition Mark has taken from Lutkepohl. The algorithm that I have set out
 will automatically satisfy that condition. The matrix that should be
 estimated by the algorithm is A on the last line of page 15 of Lutkepohl.
  Incidentally the corresponding matrix for the example on page 15 is
 singular. The algorithm that I have set out will only lead to systems with
 a non-singular matrix.
 
  I still don't see how a matrix generated in this way corresponds to a
 real economic system.  Of course you may have some other constraints in
 mind that would make the generated system correspond to something more real.
 
  John
 
  On Saturday, 14 January 2012, Mark Leeds marklee...@gmail.com wrote:
  Hi statquant2 and john: In the first chapter of Lutkepohl, it is shown
 that stability of
  a VAR(p) is the same as
 
  det(I_k - A1*z - ... - Ap*z^p) does not equal zero for |z| <= 1,

  where I_k - A1*z - ... - Ap*z^p is referred to as the reverse
 characteristic polynomial.
 
  So, statquant2, given your A's, one way to do it would be to
 check the roots of the
  polynomial implied by taking the determinant of your polynomial.
 
  There's an example on pg 17 of lutkepohl if you have it. If you don't,
 I can fax it to you
  over the weekend if you want it.
 
 
 
  On Fri, Jan 13, 2012 at 8:34 PM, John C Frain fra...@gmail.com wrote:
 
  I think that you must approach this in a different way.
 
  1 Draw a set of random eigenvalues with modulus < 1
  2 Draw a set of random eigenvectors.
  3 From these you can, with some matrix manipulations, derive the
  corresponding Var coefficients.
 
  If your original coefficients were drawn at random I suspect that the
 VAR
  would not be stable. I am curious about what you are trying to do.
 
  John
 
  On Friday, 13 January 2012, statquant2 statqu...@gmail.com wrote:
   Hello Paul
   Thanks for the answer but my point is not how to simulate a VAR(p)
 process
   and check that it is stable.
   My question is more how can I generate a VAR(p) such that I already
 know
   that it is stable.
  
   We know a condition that assures that it is stable (see first
 message) but
   this is not a condition on coefficients etc...
   What I want is
   generate say a 1000 random VAR(3) processes over say 500 time
 periods that
   will be STABLE (meaning If I run stability() all will pass the test)
  
   When I try to do that it seems that none of the VAR I am generating
 pass
   this test, so I assume that the class of stable VAR(p) is very small
   compared to the whole VAR(p) process.
  
  
  
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
  --
  John C Frain
  Economics Department
  Trinity College Dublin
  Dublin 2
  Ireland
 

 --
 John C Frain
 Economics Department
 Trinity College Dublin
 Dublin 2
 Ireland
 www.tcd.ie/Economics/staff/frainj/home.html
 mailto:fra...@tcd.ie
 mailto:fra...@gmail.com



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GUI preferences are not saved

2012-01-14 Thread Phillip Feldman
I'm running R version 2.14.1 (2011-12-22) on a 32-bit Windows machine.
 I've edited the GUI preferences to increase the font size, saving my
preferences after doing so, but the next time I start an R session, my
changes to the GUI preferences are lost.  Is there a way to make the
GUI preference changes permanent?

Phillip

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Averaging within a range of values

2012-01-14 Thread cberry
doggysaywhat chwh...@ucsd.edu writes:

 My apologies for the context problem.  I'll explain.  

 df1 is a matrix of genes labeled g1 through g5 with start positions in the
 START column and end positions in the END column.

 df2 is a matrix of chromatin modification values at positions along the DNA.  

 I want to average chromatin modification values for each gene from the start
 to the end position.  So this would involve pulling out all values for
 column C0 that are between pos 200 and 700 for the first gene and averaging
 them.  Then, I would pull all values from 500 to 1000, and continue for each
 gene.  

This type of operation is what the IRanges and GenomicRanges packages
were developed for.

Suggest you install both (from bioconductor.org), then review 

http://www.bioconductor.org/help/course-materials/2011/CSAMA/Tuesday/Morning%20Talks/IRangesLecture.pdf

and the vignettes for those packages and the help page for
'findOverlaps'.
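
For concreteness, a hedged sketch of that route (the column names ID/START/END for df1, and position in column 1 plus values in C0 for df2, are assumed from the earlier posts):

library(IRanges)
genes <- IRanges(start = df1$START, end = df1$END)  # one range per gene
sites <- IRanges(start = df2[, 1], width = 1)       # one range per position
hits  <- findOverlaps(sites, genes)                 # position-in-gene matches
tapply(df2$C0[queryHits(hits)], df1[subjectHits(hits), 1], mean)  # mean per gene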

If that doesn't solve your problem, post to the bioconductor list.

HTH,

Chuck



 The example I gave previously was a short one, but I will be doing this for
 around 1000 genes with different positions.  This is why just removing one
 group.

 This was something I tried to come up with that allowed me to use start and
 end positions.  Your advice to use the cut is working.  

 start <- df1[,2]
 end <- df1[,3]

 i <- 0  # counter initialization (implied by the original loop)
 while(i < length(start)){
   i <- i+1
   print(cut(df2[,1], c(start[i], end[i])))
 }

 These were the results

  [1] <NA>      (200,700] <NA>      <NA>      <NA>      <NA>      <NA>
  [8] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
 [15] <NA>      <NA>      <NA>      <NA>      <NA>
 Levels: (200,700]
  [1] <NA>        <NA>        (500,1e+03] (500,1e+03] <NA>        <NA>
  [7] <NA>        <NA>        <NA>        <NA>        <NA>        <NA>
 [13] <NA>        <NA>        <NA>        <NA>        <NA>        <NA>
 [19] <NA>
 Levels: (500,1e+03]
  [1] <NA>          <NA>          <NA>          <NA>          <NA>
  [6] (2e+03,3e+03] (2e+03,3e+03] <NA>          <NA>          <NA>
 [11] <NA>          <NA>          <NA>          <NA>          <NA>
 [16] <NA>          <NA>          <NA>          <NA>
 Levels: (2e+03,3e+03]
  [1] <NA>          <NA>          <NA>          <NA>          <NA>
  [6] <NA>          <NA>          <NA>          <NA>          (4e+03,6e+03]
 [11] (4e+03,6e+03] (4e+03,6e+03] (4e+03,6e+03] <NA>          <NA>
 [16] <NA>          <NA>          <NA>          <NA>
 Levels: (4e+03,6e+03]
  [1] <NA>          <NA>          <NA>          <NA>          <NA>
  [6] <NA>          <NA>          <NA>          <NA>          <NA>
 [11] <NA>          <NA>          <NA>          <NA>          <NA>
 [16] (7e+03,8e+03] (7e+03,8e+03] <NA>          <NA>
 Levels: (7e+03,8e+03]


 This is producing the right bins for each of the results, but I'm not sure
 how to put this into a data frame.  When I did this.


 start <- df1[,2]
 end <- df1[,3]

 i <- 0  # counter initialization (implied by the original loop)
 while(i < length(start)){
   i <- i+1
   bins <- cut(df2[,1], c(start[i], end[i]))
 }

 the bins variable was the last level.  
 Is there a way to assign the results of the while statement to a
 dataframe?

 Many thanks



-- 
Charles C. BerryDept of Family/Preventive Medicine
cberry at ucsd edu  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] add column with values found in another data frame

2012-01-14 Thread jdog76
 I am positive this problem has a very simple solution, but I have been
unable to find it, so I am asking for your help. I need to know how to look
something up in one data frame and add it as a column in another.  If I have
a data frame that looks like this:

 frame1
  ID  score  test age
1  Guy1 10 1  20
2 Guy1 13 2  20
3 Guy2 9 1  33
4 Guy2 11 2  33

and another frame that looks like this:

 frame2
  ID
1 Guy1
2 Guy2

How do I add a column to frame2 so it looks like this:

  ID age
1 Guy1 20
2 Guy2 33

I know this must be simple, but I couldn't find the solution by searching.

thanks so much
Jeremy



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Estimate the average abundance using Poisson regression with a log link.

2012-01-14 Thread frankthetank
Hello, please excuse the simplicity of this question as I am not very good
with stats. I am taking a class, using R which I am learning at the same
time, and the question asks us to "Estimate the average abundance using
Poisson regression with a log link". I can estimate the abundance from x,
but I can't seem to figure out how to get the average abundance in this
method. Any suggestions would be welcome as I have spent about 4 hours
trying to figure this one out.

Thanks.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How can I doing Quality adjusted survival analysis in R?

2012-01-14 Thread Pedro Mota Veiga
Hi R  users
I need to estimate, with Kaplan-Meier methodology, a Quality adjusted
survival analysis. Is it possible to do this in R?
Thanks in advance. 
Best Regards 

Pedro Mota Veiga


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Estimate the average abundance using Poisson regression with a log link.

2012-01-14 Thread Rui Barradas
P.S.

I don't understand what you mean by log link but if it's the use of a
log-normal to get improved confidence intervals, package 'SPECIES'
implements it, unlike 'Rcapture', which only gives point estimates.

Rui Barradas





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Estimate the average abundance using Poisson regression with a log link.

2012-01-14 Thread Rui Barradas
There is a no homework rule, but:

1. abundance + poisson seems to be an animal species abundance problem
in a capture-recapture framework.
2. If so, check out packages 'Rcapture' and 'SPECIES'. They both implement
several estimators, such as Burnham and Overton's jackknife or Chao's
estimators. (The Poisson model is a natural one.)
3. Personally, I prefer the first, but this is because I'm more used to it
and have never worked with 'SPECIES', just took a look at it.

Rui Barradas
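
If instead the assignment is the literal one-liner it sounds like, an intercept-only Poisson GLM with a log link models log(mean abundance), so exponentiating the fitted coefficient recovers the average. A minimal sketch with made-up counts:

set.seed(1)
counts <- rpois(50, lambda = 4)   # hypothetical abundance data
fit <- glm(counts ~ 1, family = poisson(link = "log"))
exp(coef(fit))                    # estimated average abundance
mean(counts)                      # the same value, by construction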



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] add column with values found in another data frame

2012-01-14 Thread Jorge I Velez
Hi Jeremy,

Try

 frame1 <- structure(list(ID = structure(c(1L, 1L, 2L, 2L), .Label =
c("Guy1",
"Guy2"), class = "factor"), score = c(10L, 13L, 9L, 11L), test = c(1L,
2L, 1L, 2L), age = c(20L, 20L, 33L, 33L)), .Names = c("ID", "score",
"test", "age"), class = "data.frame", row.names = c("1", "2",
"3", "4"))

 frame2 <- structure(list(ID = structure(1:2, .Label = c("Guy1", "Guy2"),
class = "factor")),
.Names = "ID", class = "data.frame", row.names = c("1", "2"))

 merge(frame1, frame2, by = "ID")
    ID score test age
 1 Guy1    10    1  20
 2 Guy1    13    2  20
 3 Guy2     9    1  33
 4 Guy2    11    2  33

 subset(frame1, ID %in% frame2$ID, select = c(ID, age))
     ID age
 1 Guy1  20
 2 Guy1  20
 3 Guy2  33
 4 Guy2  33

See ?subset and ?merge for more information.

HTH,
Jorge.-


On Sat, Jan 14, 2012 at 3:51 PM, jdog76  wrote:

  I am positive this problem has a very simple solution, but I have been
 unable to find it, so I am asking for your help. I need to know how to look
 something up in one data frame and add it as a column in another.  If I
 have
 a data frame that looks like this:

  frame1
  ID  score  test age
 1  Guy1 10 1  20
 2 Guy1 13 2  20
 3 Guy2 9 1  33
 4 Guy2 11 2  33

 and another frame that looks like this:

  frame2
  ID
 1 Guy1
 2 Guy2

 How do I add a column to frame2 so it looks like this:

  ID age
 1 Guy1 20
 2 Guy2 33

 I know this must be simple, but I couldn't find the solution by searching.

 thanks so much
 Jeremy



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] add column with values found in another data frame

2012-01-14 Thread Pete Brecknock

jdog76 wrote
 
 I am positive this problem has a very simple solution, but I have been
 unable to find it, so I am asking for your help. I need to know how to
 look something up in one data frame and add it as a column in another.  If
 I have a data frame that looks like this:
 
 frame1
   ID  score  test age
 1  Guy1 10 1  20
 2 Guy1 13 2  20
 3 Guy2 9 1  33
 4 Guy2 11 2  33
 
 and another frame that looks like this:
 
 frame2
   ID
 1 Guy1
 2 Guy2
 
 How do I add a column to frame2 so it looks like this:
 
   ID age
 1 Guy1 20
 2 Guy2 33
 
 I know this must be simple, but I couldn't find the solution by searching.
 
 thanks so much
 Jeremy
 

How about 

frame2$age = frame1[match(frame2$ID, frame1$ID), "age"]

print(frame2)
ID age
1 Guy1  20
2 Guy2  33
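
An equivalent route via merge(), for comparison, collapsing the duplicated
(ID, age) rows first:

frame2 <- merge(frame2, unique(frame1[, c("ID", "age")]), by = "ID")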

HTH

Pete




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coloring counties on a full US map based on a certain criterion

2012-01-14 Thread Ray Brownrigg

On 14/01/2012 10:33 a.m., Dimitri Liakhovitski wrote:

Somewhat related question out of curiousity:
Does anyone know how often the list of the counties and county names
is updated in this package? Or is it done centrally for all packages
that deal with US counties?
Thanks!
Dimitri


Well, I would hazard a guess that the package maintainer would know :-)

The answer to your first question is "As and when the package maintainer 
is informed of errors or changes".


The answer to your second question is "No".

Ray

On Fri, Jan 13, 2012 at 3:41 PM, Ray Brownrigg
ray.brownr...@ecs.vuw.ac.nz  wrote:

On 14/01/2012 8:04 a.m., Sarah Goslee wrote:

On Fri, Jan 13, 2012 at 1:52 PM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:

Just to clarify, according to help about the fill argument:
logical flag that says whether to draw lines or fill areas. If FALSE,
the lines bounding each region will be drawn (but only once, for
interior lines). If TRUE, each region will be filled using colors from
the col = argument, and bounding lines will not be drawn.
We have fill=TRUE - so why are the county borders still drawn?
Thank you!
Dimitri


This prompted me to check the code:

if fill=TRUE, map() calls polygon()
if fill=FALSE, map() calls lines()

But polygon() draws borders by default.

plot(c(0,1), c(0,1), type="n")
polygon(c(0,0,1,1), c(0,1,1,0), col="yellow")

To not draw borders, the border argument is provided:

plot(c(0,1), c(0,1), type="n")
polygon(c(0,0,1,1), c(0,1,1,0), col="yellow", border=NA)

But that fails in map():

map('county', 'iowa', fill=TRUE, col=rainbow(20), border=NA)

Error in par(pin = p) :
   invalid value specified for graphical parameter pin

because border is used as a named argument in map() already, for setting
the
size of the plot area, so there's no way to alter the border argument
to polygon.


Coincidentally, I became aware of this just recently.  When the maps package
was created (way back in the 'new' S era), polygon() didn't add borders,
and that is why ?map states that fill does not add borders.  A workaround is
to change the map() option border= to myborder= (it is then used twice in
map()).


The work-around I suggested previously (lty=0) seems to be the only
way to deal with the problem.


In fact I believe there is another workaround if you don't want to modify
the code; use the option resolution=0 in the map() call. I.e. try (in
Sarah's original Iowa example):

map('county', 'iowa', fill= TRUE, col = classcolors[countycol],
resolution=0, lty=0)

This ensures that the polygon boundaries match up.

I'll fix the border issue in the next version of maps (*not* the one just
uploaded to CRAN, which was to add Cibola County to NM).

Ray Brownrigg

Sarah







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Future of R | API to Public Databases

2012-01-14 Thread Prof Brian Ripley

On 14/01/2012 18:51, Joshua Wiley wrote:

I have been following this thread, but there are many aspects of it
which are unclear to me.  Who are the publishers?  Who are the users?
What is the problem?  I have a vague sense for some of these, but it
seems to me like one valuable starting place would be creating a
document that clarifies everything.  It is easier to tackle a concrete
problem (e.g., agree on a standard numerical representation of dates
and times a la ISO 8601) than something diffuse (e.g., information
overload).


Let alone something as vague as 'the future of R' (for which the R-devel 
list is the appropriate one).  I believe the original poster is being 
egocentric: as someone said earlier, she has never had need of this 
concept, and I believe that is true of the vast majority of R users.


The development of R per se is primarily driven by the needs of the core 
developers and those around them.  Other R communities have set up 
their own special-interest groups and sets of packages, and that would 
seem the way forward here.



Good luck,

Josh

On Sat, Jan 14, 2012 at 10:02 AM, Benjamin Weberm...@bwe.im  wrote:

Mike

We see that the publishers are aware of the problem. They don't think
that the raw data is usable for the user. Consequently they
recognize this fact with the proprietary formats. Yes, they resign
themselves to the information overload. That's pathetic.

It is not a question of *which* data format, it is a question about
the general concept. Where do publisher and user meet? There has to be
one *defined* point which all parties agree on. I disagree with your
statement that the publisher should just publish csv or cook his own
API. That leads to fragmentation and inaccessibility of data. We want
data to be accessible.

A more pragmatic approach is needed to revolutionize the way we go
about raw data.

Benjamin

On 14 January 2012 22:17, Mike Marchywkamarchy...@hotmail.com  wrote:








LOL, I remember posting about this in the past. The US gov agencies vary but most are 
quite good. The big problem appears to be people who push proprietary or commercial 
standards for which only one effective source exists. Some formats, like Excel 
and PDF, come to mind and there is a disturbing trend towards their adoption in some places 
where raw data is needed by many. The best thing to do is contact the information provider 
and let them know you want raw data, not images or stuff that works in limited commercial 
software packages. Often data sources are valuable and the revenue model impacts 
availability.

If you are just arguing over different open formats, it is usually easy for 
someone to write some conversion code and publish it - CSV to JSON would not be a 
problem for example. Data of course are quite variable and there is 
nothing wrong with giving the provider his choice.



Date: Sat, 14 Jan 2012 10:21:23 -0500
From: ja...@rampaginggeek.com
To: r-help@r-project.org
Subject: Re: [R] The Future of R | API to Public Databases

Web services are only part of the problem. In essence, there are at
least two facets:
1. downloading the data using some protocol
2. mapping the data to a common model

Having #1 makes the import/download easier, but it really becomes useful
when both are included. I think #2 is the harder problem to address.
Software can usually be written to handle #1 by making a useful
abstraction layer. #2 means that data has consistent names and meanings,
and this requires people to agree on common definitions and a common
naming convention.

RDF (Resource Description Framework) and its related technologies
(SPARQL, OWL, etc) are one of the many attempts to try to address this.
While this effort would benefit R, I think it's best if it's part of a
larger effort.

Services such as DBpedia and Freebase are trying to unify many data sets
using RDF.

The task view and package ideas a great ideas. I'm just adding another
perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

HI Benjamin:

What would make this easier is if these sites used standardized web services, 
so it would only require writing once. data.gov is the worst example, they spun 
their own, weak service.

There is a lot of environmental data available through OPenDAP, and that is 
supported in the ncdf4 package. My own group has a service called ERDDAP that 
is entirely RESTFul, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R (and matlab) scripts that automate the extract for certain cases, 
see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector (EDC) that provides 
a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to subset data 
that is served by OPeNDAP, ERDDAP, certain Sensor Observation Service (SOS) 
servers, and have it read directly into R. It is freely available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized (OPenDAP, SOS) or is easy to implement (ERDDAP).

Re: [R] GUI preferences are not saved

2012-01-14 Thread Prof Brian Ripley

On 15/01/2012 01:57, Phillip Feldman wrote:

I'm running R version 2.14.1 (2011-12-22) on a 32-bit Windows machine.
  I've edited the GUI preferences to increase the font size, saving my
preferences after doing so, but the next time I start an R session, my
changes to the GUI preferences are lost.  Is there a way to make the
GUI preference changes permanent?


Save them in the right place.  See ?Rconsole for where it is looking 
(and a file you can edit directly).
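
A hedged way to see which copies a given installation would use (paths as documented in ?Rconsole; R_USER usually resolves to the user's home or Documents folder):

file.path(Sys.getenv("R_USER"), "Rconsole")   # per-user copy, read in preference
file.path(R.home("etc"), "Rconsole")          # site-wide default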


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.