[R] How to process each element in 3 minute interval using a for loop in R?

2014-07-09 Thread Takatsugu Kobayashi
Hi R-users,

This should be a simple question: How can I delay each iteration of a loop
by a few minutes? I need to limit the request rate while fetching the
longitudes and latitudes of 2000 addresses from the Google API.

I have searched with keywords like interval, minutes, and delay,
but no directly relevant clues have come up yet.

Many thanks in advance.

Best,

Taka

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to process each element in 3 minute interval using a for loop in R?

2014-07-09 Thread Prof Brian Ripley

On 09/07/2014 08:17, Takatsugu Kobayashi wrote:

> Hi R-users,
>
> This should be a simple question: How can I delay each loop process in some
> minutes? The reason for this is I need to avoid too much traffic to get
> longitudes and latitudes of 2000 addresses using google API.
>
> I am searching for solutions with keywords like interval, minutes, delay,
> but no directly relevant clues have come up yet.

See ?Sys.sleep

> Many thanks in advance.
>
> Best,
>
> Taka



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
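
A minimal sketch of the Sys.sleep approach pointed to above. The helper names (process_with_delay, fake_geocode) and the placeholder addresses are hypothetical and only illustrate the pattern; the real geocoding call would replace fake_geocode. The delay is kept short here so the sketch runs quickly; 180 seconds would give the 3 minute interval from the subject line.

```r
# Apply `fun` to each item, pausing between calls (but not after the last one).
process_with_delay <- function(items, fun, delay_seconds) {
  results <- vector("list", length(items))
  for (i in seq_along(items)) {
    results[[i]] <- fun(items[[i]])
    if (i < length(items)) Sys.sleep(delay_seconds)  # wait before the next request
  }
  results
}

# Hypothetical stand-in for a real geocoding request.
fake_geocode <- function(address) {
  list(address = address, lon = NA_real_, lat = NA_real_)
}

addresses <- c("address one", "address two", "address three")
results <- process_with_delay(addresses, fake_geocode, delay_seconds = 0.1)
```

Skipping the pause after the final element avoids one pointless wait at the end of a long run.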



Re: [R] reorder a list

2014-07-09 Thread Lorenzo Alfieri
Thanks Bill and the other guys for the variety of useful replies!
In fact I'm working with pretty big lists (with ~35000 sublists) and Bill's 
solution is the fastest one in terms of computing time.
Now comes the second part of the question... :-)
I've my usual list of values and time indices to sort:
A1 <- list(c(1:4), c(2,4,5), 23, c(4,5,13))
and then another list A2 with variables which have to be paired with the values 
of A1:
A2 <- sapply(A1, exp)    # (in my case there's no exp relation between A1 and
                         # A2, they're completely uncorrelated. That's just an example)
> A2
[[1]]
[1]  2.718282  7.389056 20.085537 54.598150

[[2]]
[1]   7.389056  54.598150 148.413159

[[3]]
[1] 9744803446

[[4]]
[1]     54.59815    148.41316 442413.39201

Now I'd like to reorder the elements of A2 according to the same rule applied 
for A1:
f <- function (x) {
    lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
    split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
}
B1 <- f(A1)

and thus obtain a list B2 which looks like this:
> B2
$`1`
[1] 2.718282

$`2`
[1] 7.389056 7.389056

$`3`
[1] 20.08554

$`4`
[1] 54.59815 54.59815 54.59815

$`5`
[1] 148.4132 148.4132

$`13`
[1] 442413.4

$`23`
[1] 9744803446

(In this example each element is the exp() of the sublist name, but in a 
general case they would be uncorrelated, and the resulting elements of each 
sublist would be different)
Any idea?
Alfio
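
One possible way to get B2, sketched under the assumption that A1 and A2 always have the same shape (each sublist of A2 has one entry per entry of the matching sublist of A1): flatten both lists and split the A2 values by the A1 values, the same grouping that f() uses for the time indices. lapply is used instead of sapply only to guarantee a list result.

```r
A1 <- list(c(1:4), c(2, 4, 5), 23, c(4, 5, 13))
A2 <- lapply(A1, exp)   # example pairing only; any list with the same shape works

# Flatten both lists in the same order, then group the A2 values by the A1 values.
B2 <- split(unlist(A2, use.names = FALSE), unlist(A1, use.names = FALSE))

names(B2)   # one element per distinct value in A1
B2[["4"]]   # the three A2 entries paired with value 4
```

Because both lists are flattened in the same order, each A2 entry stays attached to its A1 value without an explicit loop.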
 

> From: wdun...@tibco.com
> Date: Tue, 8 Jul 2014 12:11:09 -0700
> Subject: Re: [R] reorder a list
> To: alfio...@hotmail.com
> CC: r-help@r-project.org
>
> f <- function (x) {
>     lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
>     split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
> }
> f(A1) # gives about what you want (has, e.g., name "23", not position
>       # 23, in output)
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Tue, Jul 8, 2014 at 9:39 AM, Lorenzo Alfieri alfio...@hotmail.com wrote:
> > Hi,
> > I'm trying to find a way to reorder the elements of a list.
> > Let's say I have a list like this:
> > A1 <- list(c(1:4), c(2,4,5), 23, c(4,5,13))
> >
> > A1
> > [[1]]
> > [1] 1 2 3 4
> >
> > [[2]]
> > [1] 2 4 5
> >
> > [[3]]
> > [1] 23
> >
> > [[4]]
> > [1]  4  5 13
> >
> > All the elements included in it are values, while each sublist is a time
> > index.
> > Now, I'd like to reorder the list (without looping) so as to obtain one
> > sublist for each value, which includes all the time indices where that
> > value appears.
> > In other words, the result should look like this:
> > A2
> > [[1]]
> > [1] 1
> >
> > [[2]]
> > [1] 1 2    # because value 2 appears in time indices [[1]] and [[2]] of A1
> >
> > [[3]]
> > [1] 1
> >
> > [[4]]
> > [1] 1 2 4
> >
> > [[5]]
> > [1] 2 4
> >
> > [[13]]
> > [1] 4
> >
> > [[23]]
> > [1] 3
> >
> > Any suggestion?
> > Thanks
> > Alfio


[R] matrix built by diagonal matrices with a given structure (2nd trial)

2014-07-09 Thread Carlo Giovanni Camarda

Dear R-users,

Three weeks ago I sent the mail below, but I didn't receive any
response. Maybe it was overlooked.

Thanks anyway for all the help you give us through this mailing list,
Giancarlo Camarda

[...]

I have a matrix made up of a series of vectors that do not overlap in the
row dimension and are arranged in a given structure. Something like:

| a1,  0,  0,  0 |
|  0, a2, a3,  0 |
| a4,  0,  0, a5 |

where the ai are column vectors of equal length, m.

My aim is to construct a new matrix formed by diagonal matrices built from
the mentioned vectors and placed following the original structure.
Something like:

| diag(a1),        0,        0,        0 |
|        0, diag(a2), diag(a3),        0 |
| diag(a4),        0,        0, diag(a5) |

Of course the zeros are vectors of length m in the first scheme and empty
(m times m) matrices in the second.

I found a way to obtain what I need by selecting rows from an augmented
version of the original matrix, which I construct using the Kronecker
product. I was wondering whether there is a more elegant and
straightforward procedure.

See below a simple reproducible example of my challenge in which the
length of the vectors is 4.

Thanks in advance for your help,

Giancarlo Camarda

## size of the diagonal matrices
## or length of the vectors
m <- 4
## the original matrix
ze <- rep(0,m)
A <- cbind(c(1,2,3,4,ze,13,14,15,16),
           c(ze,5,6,7,8,ze),
           c(ze,9,10,11,12,ze),
           c(ze,ze,17,18,19,20))
## augmenting the original matrix
A1 <- kronecker(A, diag(m))
## which rows to select
w1 <- seq(1, m^2, length=m)
w2 <- seq(0, 2*m^2, by=m^2)
w0 <- outer(w1, w2, FUN="+")
w <- c(w0)
## final matrix
A2 <- A1[w,]
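
One possible alternative, offered as a sketch rather than a definitive answer: repeat every column of A m times and multiply elementwise by a mask made of tiled identity blocks. The names expanded, mask and A2_alt are introduced here for illustration, and the approach assumes nrow(A) is a multiple of m. It avoids building the larger m^2-row intermediate matrix.

```r
m <- 4
ze <- rep(0, m)
A <- cbind(c(1, 2, 3, 4, ze, 13, 14, 15, 16),
           c(ze, 5, 6, 7, 8, ze),
           c(ze, 9, 10, 11, 12, ze),
           c(ze, ze, 17, 18, 19, 20))

# Each column of A repeated m times, side by side: (3m) x (4m).
expanded <- A[, rep(seq_len(ncol(A)), each = m)]

# Tiling of m x m identity blocks with the same dimensions as `expanded`.
mask <- kronecker(matrix(1, nrow(A) / m, ncol(A)), diag(m))

# Keeping only the masked entries leaves diag(ai) in each block position.
A2_alt <- expanded * mask
```

Within block (i, j), row k of expanded holds the k-th element of the vector in every column, and the identity mask keeps only the diagonal entry, which gives diag(ai) in place.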












Re: [R] How to process each element in 3 minute interval using a for loop in R?

2014-07-09 Thread Michael Sumner
On 9 Jul 2014 17:19, Takatsugu Kobayashi taquito2...@gmail.com wrote:

> Hi R-users,
>
> This should be a simple question: How can I delay each loop process in
> some minutes? The reason for this is I need to avoid too much traffic to get
> longitudes and latitudes of 2000 addresses using google API.

R is too fast? In a loop? Preposterous!

> I am searching for solutions with keywords like interval, minutes, delay,
> but no directly relevant clues have come up yet.
>
> Many thanks in advance.
>
> Best,
>
> Taka



Re: [R] Transform a data.frame with ; sep column and another one in a new one with the same two column but with repetitions

2014-07-09 Thread João Azevedo Patrício

On 05-07-2014 00:43, John McKown wrote:

I messed up my original response by not including r-help in the
distribution. And now I won't look as bad because, after a short nap,
I have a new, much shorter (but more difficult, for me, to understand)
answer.

#
# The original data is in the variable x.
z=data.frame(TC=x$TC,
WC=I(mapply(strsplit,x$WC,MoreArgs=list(';'),USE.NAMES=FALSE)));
result=data.frame(TC=rep(x$TC,sapply(z$WC,length)),WC=unlist(z$WC));
#

There may be a way to eliminate the temporary variable z. Maybe I
need another nap!

The heart of this is the mapply, which results in a list where each
entry in the list is another list. And the entries in embedded list
are the list of results from the output of strsplit() on the WC
information.

If this needs to be a function, then

splitUp <- function(x) {
 z=data.frame(TC=x$TC,
WC=I(mapply(strsplit,x$WC,MoreArgs=list(';'),USE.NAMES=FALSE)));
 result=data.frame(TC=rep(x$TC,sapply(z$WC,length)),WC=unlist(z$WC));
 return(result);
}

Then invoke it with:

flattened.result <- splitUp(original.data.frame);

On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício
joao.patri...@gmx.pt wrote:

Hi,

I've been trying to solve this issue but with no success.

I have some data like this:

1  TC  WC
2  0   Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy
3  0   Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
4  2   Physics, Nuclear; Physics, Particles & Fields
5  0   Chemistry, Inorganic & Nuclear
6  2   Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering

And I need to have this:

1  TC  WC
2  0   Instruments & Instrumentation
2  0   Nuclear Science & Technology
2  0   Physics, Particles & Fields
2  0   Spectroscopy
3  0   Nanoscience & Nanotechnology
3  0   Materials Science, Multidisciplinary
3  0   Physics, Applied
4  2   Physics, Nuclear
4  2   Physics, Particles & Fields
5  0   Chemistry, Inorganic & Nuclear
6  2   Chemistry, Physical
6  2   Materials Science, Multidisciplinary
6  2   Metallurgy & Metallurgical Engineering

This means repeating the row for each element in WC while keeping the same
value in TC. The goal is to check how many TC (sum) there are by WC, when WC
is multiple.

I've tried to separate the column using strsplit but then I cannot keep
track of TC.

thanks in advance.
--
João Azevedo Patrício

I've been testing it and the results are coming out nicely.

It grabs a CSV taken from ISI Web of Science, works it out and produces
a table organized by WC (Web of Science category) with the number of papers
per area, citations and impact factor.


My code is like this right now:

library(stringr)     ## for str_trim()
library(data.table)  ## for as.data.table()

## get citations and web of science categories file
isi <- read.table("file.csv", header = TRUE, sep = ";")

isisplit=data.frame(TC=isi$TC,
  WC=I(mapply(strsplit,isi$WC,MoreArgs=list(';'),USE.NAMES=FALSE)));
result=data.frame(TC=rep(isi$TC,sapply(isisplit$WC,length)),WC=unlist(isisplit$WC));

## work on the flattened table from here on
result$WC <- str_trim(result$WC)
## creates a table with the list of WCategories and the specific citations
wccitations <- aggregate(result$TC, by = list(Category = result$WC), FUN = sum)

colnames(wccitations) <- c("WC", "TC")
## creates a table with the number of pubs by WCategories
wcproduction <- table(result$WC)

wcproduction <- as.data.table(wcproduction)
colnames(wcproduction) <- c("WC", "PUB")
wc <- data.frame(WC = wccitations$WC, PUB = wcproduction$PUB,
                 TC = wccitations$TC,
                 IMP = round((wcproduction$PUB/wccitations$TC), digits = 2))

wc[wc == Inf] = 0 ## replaces Inf in impact by 0
write.table(wc, file = "file.csv", sep = ";", dec = ",")
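
A compact base R sketch of the flatten-then-aggregate idea discussed in this thread. The toy data frame x is made up for illustration, and lengths() and trimws() are assumed to be available (R 3.2.0 or later, which is newer than the R version current when this thread was written); sapply(parts, length) and stringr's str_trim() are the equivalents used above.

```r
# Toy input in the shape described in the thread (values are illustrative).
x <- data.frame(
  TC = c(0, 2),
  WC = c("Nanoscience & Nanotechnology; Physics, Applied",
         "Physics, Nuclear; Physics, Particles & Fields"),
  stringsAsFactors = FALSE
)

parts <- strsplit(x$WC, ";", fixed = TRUE)   # one character vector per row
flat <- data.frame(
  TC = rep(x$TC, times = lengths(parts)),    # repeat TC once per category
  WC = trimws(unlist(parts)),                # drop the space left after ';'
  stringsAsFactors = FALSE
)

# Total citations and number of papers per category.
citations <- aggregate(TC ~ WC, data = flat, FUN = sum)
papers <- as.data.frame(table(WC = flat$WC), responseName = "PUB")
```

Aggregating on the flattened table (one row per paper and category) is what keeps TC attached to each individual category.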


--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

Take 2 seconds to think before you act



[R] Cutting hierarchical cluster tree at specific height fails

2014-07-09 Thread Johannes Radinger
Hi,

I'd like to cut a hierarchical cluster tree calculated with hclust at a
specific height.
However, I always get the following error message:
Error in cutree(hc, h = 60) :
  the 'height' component of 'tree' is not sorted (increasingly)


Here is a working example to show that when specifying a height in cutree()
the code fails. In contrast, specifying the number of clusters in cutree()
works.
What is the exact problem and how can I solve it?

x <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,80,15))
y <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,150,25))
df <- data.frame(x,y)
plot(df)

hc <- hclust(dist(df,method = "euclidean"), method="centroid")
plot(hc)

df$memb <- cutree(hc, h = 60) # this does not work
df$memb <- cutree(hc, k = 3) # this works!

plot(df$x,df$y,col=df$memb)


Thank you for your hints!

Best regards,
Johannes
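
A hedged sketch of one likely explanation: centroid linkage can produce inversions, meaning a later merge happens at a lower height than an earlier one, and cutree() refuses a height cut when the merge heights are not increasing. The check below uses is.unsorted() on the height component; the seed and the switch to average linkage are choices made here for illustration, not a recommendation about which linkage suits the data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 80, 15))
y <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 150, 25))
df <- data.frame(x, y)
d <- dist(df, method = "euclidean")

hc_centroid <- hclust(d, method = "centroid")
# TRUE here means the merge heights are not increasing, which cutree(h = ) rejects.
is.unsorted(hc_centroid$height)

# Average linkage yields non-decreasing heights, so a height cut is accepted.
hc_average <- hclust(d, method = "average")
memb <- cutree(hc_average, h = 60)
table(memb)
```

Cutting by k, as in the original post, remains an option when centroid linkage is required, since that form of the call worked there.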



[R] [R-pkgs] icd9 - a new R package

2014-07-09 Thread Jack Wasey
Dear R people,

The new package 'icd9' provides a range of tools for working with ICD-9-CM codes.

http://cran.r-project.org/web/packages/icd9/index.html
https://github.com/jackwasey/icd9

ICD-9 (clinical modification) is primarily used for categorizing
diseases in the USA for hospital administration, whereas ICD-10 is
used by the rest of the world for disease surveillance. This package
is currently restricted to ICD-9-CM codes.

I've seen other R code which manipulates ICD-9 codes, but the mistake
is often made of thinking they are numeric. This is not the case, e.g.
100.0 is different from 100 and 100.00. This package takes care of
validating these codes, explaining them (converting code to plain
English), comparing them, and attributing codes to groups of codes to
assign co-morbidities to patients. ICD-9 codes are often provided in a
shortened format without a decimal place, and these have distinct
validation rules. Functions to convert between decimal and short forms
are provided. All key parts use vectorized code, and comorbidities for
a million patient visits can be assigned in a few seconds on a modest
workstation.
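
A small base R illustration of the point about numeric codes. It deliberately does not use any icd9 package functions, so nothing here should be read as that package's API; it only shows how a numeric conversion collapses codes that are distinct as strings.

```r
codes <- c("100", "100.0", "100.00")

# As character strings the three codes stay distinct.
n_distinct_chr <- length(unique(codes))

# Converted to numbers they all become the same value, so the distinction is lost.
n_distinct_num <- length(unique(as.numeric(codes)))

c(character = n_distinct_chr, numeric = n_distinct_num)
```

Reading code columns as character (for example with colClasses = "character" in read.csv) avoids the silent conversion.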

SAS code is published by AHRQ to allow assignment of ICD-9 codes to
comorbidities. This package contains some SAS source to R code
translation, so that the canonical ICD-9-CM to comorbidity mapping
provided by AHRQ can be derived directly without the cumbersome and
error-prone manual task of re-encoding the relationships in R. I
believe a SAS to R converter was an April Fools' joke some time ago,
but this is indeed a very limited answer to that problem.
http://www.biostatistics.dk/sas2r/index.html

A short vignette covers the major use-cases.
http://cran.r-project.org/web/packages/icd9/vignettes/icd9.pdf

The code is supported by a fairly thorough test suite, and is well
documented in the hope that it will be easier for users of the package
to understand it, and to get involved. I chose only to export key
functions where I had thought carefully about the external API, but
all internal functions are documented and contain potentially useful
nuggets for power users.

Comments and contributions are most welcome. In particular, I'd love
to see unit tests corresponding to any failures you may encounter
working with your own ICD-9 data.

Hope you find this useful.
Jack

--
Jack Wasey
Resident Physician, Anesthesiology and Critical Care Medicine
Johns Hopkins Hospital
Baltimore, MD, USA

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] [R-pkgs] h2o - fast scalable glm, deeplearning, gbm, randomForest, plyr for big datasets

2014-07-09 Thread SriSatish Ambati
http://cran.r-project.org/web/packages/h2o/
Please try h2o,

H2O is a fast, scalable, open source R package for Generalized Linear Modeling,
Deep Learning, Gradient Boosting, RandomForest and k-means on large,
terabyte-scale datasets. This package allows you to scale R over Hadoop-like
datasets in-memory on multiple machines.

Under the guidance of a strong scientific advisory council from Stanford,
the likes of Stephen Boyd, Rob Tibshirani and Trevor Hastie, our distributed
systems team built an R package that calls fast implementations of GLM,
GBM, DeepLearning and RandomForest to allow modeling on big data in open
source. It also offers connectors to Hadoop, S3 and other file formats, and
extensibility via R expressions at scale, plyr and Java.

thanks, Sri

-- 
ceo & co-founder, 0xdata Inc (http://www.0xdata.com/)
culture.code.customer



[R] R Studio v3.0.3 for Windows 32bits is too slow

2014-07-09 Thread Phan, Truong Q
Hi R'er,

I have a dataset which is a matrix of 7502 x 1426 (rows x columns).
The data is in CSV format and is around 68 MB in size. This dataset is less
than 10% of our full dataset.
I have been adopting the anomaly detection method described at
http://www.mattpeeples.net/kmeans.html .
It has been running for more than 24 hours and still hasn't completed the
calculation.
I did manage to run it with a smaller dataset (i.e., 2100 rows x 1426 columns).
It took around 12 hours to run.

I have a few questions and need your expert guidance.

1)  Is there a better open source tool (e.g., R Studio) that can do
everything in one place: prepare data, build models, validate models, test
models and present data? I am looking for a tool that will allow me to do the
same as in the above link (Matt Peeples' blog).

2)  Is there an open source tool to perform the above that will run on top
of the Hadoop ecosystem?

3)  Can we use R Studio for Windows as a client on top of the Hadoop
ecosystem? If yes, please point me to a site that has use cases or
samples.

Thanks and Regards,
Truong Phan



[R] Error evaluating partitioning around medoids clustering method R clValid package

2014-07-09 Thread Scott Davis
I have a data.frame with 300 observations of 36 numerical, categorical, and
NA variables. I am trying to evaluate the partitioning around medoids
clustering algorithm for a marketing segmentation study. My original
dataset has over 130,000 observations, but I took a sample for ease of
reproducibility.


My machine runs Mac OS X 10.9.3:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)


Problem: Getting an error when doing internal and stability evaluation with
the clValid CRAN package in R.


Code:

# Convert csv to data.frame
frame <- as.data.frame(Smallstore1)
library(cluster)

# Create dissimilarity matrix
# Gower coefficient for finding distance between mixed variables
daisy1 <- daisy(frame, metric = "gower", type = list(ordratio = c(1:36)))

# k-medoid algorithm with 3 clusters
kanswers <- pam(daisy1, 3, diss = TRUE)

# Evaluate k-medoid clustering algorithm with 2 to 6 clusters
# Import clValid package
library(clValid)

# Internal validation
internval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "internal")
# Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <-
# Biobase::exprs(obj),  : EXPR must be a length 1 vector

# Error in summary(internval1) :
#   error in evaluating the argument 'object' in selecting a method for
#   function 'summary': Error: object 'internval1' not found

# Stability validation
stabval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "stability")
# Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <-
# Biobase::exprs(obj),  : EXPR must be a length 1 vector


Data:


I put the data.frame in a dissimilarity matrix using the daisy function and
used partitioning around medoids with 3 clusters. The daisy and pam
functions come from the cluster CRAN package in R. Since the data.frame has
mixed values, the gower distance coefficient is used. Here's the head of
the first 7 variables, but I took out the names of the email for privacy
reasons.


> head(frame)
  user_id          email   Age Gender Household.Income Marital.Status Presence.of.children
1   12945 @bellycard.com    NA   Male               NA             NA                   NA
2   12947 @bellycard.com    NA   Male               NA             NA                   NA
3   12990     @gmail.com    NA     NA               NA             NA                   NA
4   13160     @gmail.com 25-34   Male        100k-125k         Single                   No
5   13195     @gmail.com    NA   Male         75k-100k         Single                   No
6   13286     @gmail.com    NA     NA               NA             NA                   NA


Please let me know if I can provide more information.
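
A hedged reading of the error, sketched with made-up data: the object returned by daisy() carries a two-element class vector, so a switch() on class(obj), as shown in the error message, cannot pick a single branch. The toy data frame and the silhouette-based comparison below are assumptions for illustration only; they are not what clValid computes, just one way to compare numbers of clusters when only a dissimilarity object is available.

```r
library(cluster)  # daisy(), pam()

# Small mixed-type toy data (illustrative values, not the poster's data).
toy <- data.frame(
  age    = factor(c("25-34", "35-44", "25-34", NA, "35-44", "25-34")),
  gender = factor(c("Male", "Female", NA, "Male", "Female", "Male")),
  visits = c(3, 10, 1, 7, 12, 2)
)

d <- daisy(toy, metric = "gower")
class(d)   # more than one class name, hence "EXPR must be a length 1 vector"

# Average silhouette width of PAM for several k, computed from the dissimilarities.
ks <- 2:4
avg_sil <- sapply(ks, function(k) pam(d, k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- ks
avg_sil
```

A larger average silhouette width suggests better separated clusters, which gives a rough internal check without passing the dissimilarity object to clValid().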
-- 
Scott Davis
Cell: (408)826-9561
Skype ID: Scdavis61
San Jose, CA.



[R] HPGL or PCL plotting device? Or otherwise plotting plots

2014-07-09 Thread Thomas Levine
Hi,

I want to print plots on a Roland DXY-1100 plotter.
How can I do this from R? I think the easiest thing
would be a graphics device for Printer Command
Language or Hewlett-Packard Graphics Language, but
I haven't managed to find any of those.

Thanks

Tom



Re: [R] HPGL or PCL plotting device? Or otherwise plotting plots

2014-07-09 Thread Thomas Levine
Oh it was easier than I thought.

  postscript('project-contracts.ps')
  hist(log(projects$n.contracts))
  dev.off()

Then run this from the shell.

  pstoedit -f plot-hpgl project-contracts.ps project-contracts.hpgl

And send it to the plotter.
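
The two steps above can also be driven from a single R script. This is only a sketch: the file names and the sample data are made up, the conversion is skipped when no pstoedit executable is found on the PATH, and the plot-hpgl format name is taken from the shell command above rather than verified against any particular pstoedit build.

```r
ps_file   <- file.path(tempdir(), "example-hist.ps")
hpgl_file <- sub("\\.ps$", ".hpgl", ps_file)

# Step 1: write the plot to a PostScript file.
postscript(ps_file)
hist(log(c(1, 2, 2, 3, 5, 8, 13, 21)))  # illustrative data, not projects$n.contracts
dev.off()

# Step 2: convert to HPGL with pstoedit, if it is installed.
if (nzchar(Sys.which("pstoedit"))) {
  status <- system2("pstoedit",
                    c("-f", "plot-hpgl", shQuote(ps_file), shQuote(hpgl_file)))
} else {
  message("pstoedit not found on PATH; skipping conversion")
}
```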

On 09 Jul 13:10, Thomas Levine wrote:
> Hi,
>
> I want to print plots on a Roland DXY-1100 plotter.
> How can I do this from R? I think the easiest thing
> would be a graphics device for Printer Command
> Language or Hewlett-Packard Graphics Language, but
> I haven't managed to find any of those.
>
> Thanks
>
> Tom



Re: [R] R Studio v3.0.3 for Windows 32bits is too slow

2014-07-09 Thread Bert Gunter
RStudio is a separate product with its own support. Post there, not here.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Tue, Jul 8, 2014 at 7:34 PM, Phan, Truong Q
troung.p...@team.telstra.com wrote:
> Hi R'er,
>
> I have a dataset which has a matrix of 7502 x 1426 (rows x columns).
> The data is in a CSV format which has a size around 68Mb. This dataset is
> less than 10% of our dataset.
> I have been adopting the Anomaly detection method as described by
> http://www.mattpeeples.net/kmeans.html .
> It has been running more than 24hrs and still haven't completed the
> calculation.
> I did manage to run it with a smaller dataset (ie, 2100 rows x 1426 columns).
> It took around 12hrs to run.
>
> I have a few questions and need your expertise guidance.
>
> 1)  Is there any better Open source tools to use to do in one tool (eg, R
> Studio): prepare data, build models, validate models, test models and present
> data. I am looking a tool which will allow me to do the same as per the above
> link (Matt Peeples' blog).
>
> 2)  Is there an Open source tools to perform the above which will allow
> me to run on top of Hadoop eco-system?
>
> 3)  Can we use R Studio for windows as a client to run on top of Hadoop
> eco-system? If yes, please point me to the site where they have a use cases
> or samples.
>
> Thanks and Regards,
> Truong Phan



Re: [R] eclat problem

2014-07-09 Thread Alvaro Flores
Thanks Michael, everything is going perfectly right now! I didn't expect such an
extensive list of itemsets given the insights on the data that I have. And the
error message didn't give me the right clue. Now it is working fine! Thanks for
your time!

Best Regards.


Alvaro.




-----Original Message-----
From: Michael Hahsler [mailto:mhahs...@lyle.smu.edu]
Sent: Wednesday, 09 July 2014 10:02
To: r-help@r-project.org; Alvaro Flores
Subject: Re: [R] eclat problem

Hi Alvaro,

this was a tricky problem. Under Windows, R uses the trio library (different
from the package Trio, which creates very similar error messages) for printf
support. arules currently contains a bug that results in an invalid format
string for printf when an error message is created. For your problem below the
error message should read "out of memory", but since creating the error message
produces an invalid printf format string, you see the internal error under
Windows instead. This problem will be fixed in the next release of arules
(version 1.1-4).

Note, however, that your code still runs out of memory, and you need to
increase the support and/or restrict the number of items in the itemsets (both
via the list passed as parameter; see also class?ECparameter).

-Michael
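
A sketch of the kind of parameter change described above, assuming the arules package is installed. The Groceries data set that ships with arules stands in for the poster's transactions, and the threshold values are arbitrary illustrations; support and maxlen are the entries of the parameter list that raise the minimum support and cap the itemset size.

```r
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data("Groceries")  # example transactions bundled with arules

  # A higher support and a cap on itemset length keep the result set small.
  itemsets <- eclat(Groceries,
                    parameter = list(support = 0.01, maxlen = 4))
  print(length(itemsets))
}
```

Lowering support toward values like 0.001 makes the number of frequent itemsets grow quickly, which is where memory limits tend to be reached.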

On 07.07.2014 22:56, Alvaro Flores wrote:
 
 
  I'm working with the arules package and I'm constantly trying to mine frequent
  itemsets in different datasets. But recently R kept returning the same error
  message:

  Error in eclat(txn, parameter = list(supp = 0.001)) :
    internal error in trio library

  Only this particular dataset gives me problems.

  Has anyone ever encountered and fixed this error?

  Here is an example of the transaction data set:
 
  items
 
  1  {001200-3,
 
   004100-3,
 
   004200-5,
 
   004500-9,
 
   004600-5}
 
  2  {001524-K,
 
   002100-2}
 
  3  {00179,
 
   03807,
 
   08019,
 
   09314,
 
   12432}
 
  4  {002000}
 
  5  {002600-4,
 
   002700-0}
 
  6  {004115-F,
 
   02/100073A,
 
   02/630935A,
 
   044.1567.0,
 
   044.1567.0/I,
 
   1010301FA,
 
   1012015-400-,
 
   1117285,
 
   1118100-201-4020,
 
   1118105-051-M,
 
   173171,
 
   1903628,
 
   1903628/I,
 
   1903629,
 
   1903629/I,
 
   1907566,
 
   1907567,
 
   1907570,
 
   1907571,
 
   1931018,
 
   2.4419.340.0,
 
   215420/N,
 
   2654408-N,
 
   2992242,
 
   2992544,
 
   2996416,
 
   2VC-115561,
 
   320/04133A,
 
   4102AZL.14.100N00,
 
   4110Z.14.30,
 
   4625547,
 
   477556,
 
   477556/O,
 
   478736,
 
   478736/O,
 
   500054655,
 
   581/18096,
 
   6170005,
 
   957E-6731 A,
 
   BF8T-6731 BA,
 
   BG2X-6731 CA,
 
   DBPN-6731 A,
 
   F2NN-6714 AB,
 
   LF16015,
 
   LF3000,
 
   LF3345,
 
   LF3346,
 
   LF3349,
 
   LF3806,
 
   LF4054,
 
   LF9009,
 
   RE504836,
 
   RE59754,
 
   T19044/I,
 
   TAE-115561,
 
   W-950/7,
 
   ZP520}
 
  7  {005226,
 
   012.0348.0,
 
   012.0349.0,
 
   02/910150A,
 
   1105010E834N00,
 
   1105020D354,
 
   1117011-630-W,
 
   1117025-621-,
 
   1372444,
 
   1393640,
 
   1457434310001,
 
   1521219,
 
   1873018,
 
   1901605,
 
   1902134,
 
   1902138,
 
   1902138/I,
 
   1907640,
 
   1907640/I,
 
   1908547,
 
   1908547/I,
 
   1930010,
 
   19BG920-30001,
 
   20430751,
 
   20514654,
 
   20976003/O,
 
   20998367,
 
   215460,
 
   26560143,
 
   26560201,
 
   26560201/I,
 
   2710806,
 
   2992241,
 
   2992241/I,
 
   2992300,
 
   2992662,
 
   2992662/I,
 
   2995711,
 
   2997376,
 
   2R0-127177,
 
   2R0-127177 A,
 
   2RD-127491,
 
   32/401102,
 
   32/912001A,
 
   32/925423,
 
   32/925760,
 
   32/925869,
 
   32/925915,
 
   320/07155,
 
   343144,
 
   4102H.15.110,
 
   4102H.15.110N00,
 
   4102H.15.20,
 
   500315480,
 
   500315480/I,
 
   500316868,
 
   550228,
 
   550228/N,
 
   582042,
 
   612630080011N00,
 
   612630080087,
 
   7146717,
 
   8159975/O,
 
   81BASE921,
 
   98439681,
 
   AR50041,
 
   BC1132N01,
 
   BF0X-9155 AA,
 
   BF5T-9155 AB,
 
   BF8T-9155 DA,
 
   DDN-99162 B,
 
   DONN-9N074 BG,
 
   E5HT-9155 CA,
 
   E7HN-9155 AA,
 
   FF42000,
 
   FF5421,
 
   FF5458,
 
   FF5488,
 
   FS1000,
 
   FS1015,
 
   FS1241,
 
   FS1242,
 
   FS1280,
 
   PSD460/1,
 
   PSD970/1,
 
   R28-30M,
 
   RC45MB,
 
   RE62418,
 
   RK120MBQ2,
 
   T22VA,
 
   WK-723}
 
  8  {005227,
 
   

[R] Revolutions blog: June 2014 Roundup

2014-07-09 Thread David Smith
Revolution Analytics staff and guests write about R every weekday at
the Revolutions blog:
 http://blog.revolutionanalytics.com
and every month I post a summary of articles from the previous month
of particular interest to readers of r-help.

In case you missed them, here are some articles related to R from the
month of June:

The useR! 2014 conference in Los Angeles opened with 16 tutorials:
http://bit.ly/1rSoqeh

DataInformed published my article on how various companies use R:
http://bit.ly/1rSos5L

Joe Rickert reviews the new book Applied Predictive Modeling by Max
Kuhn and Kjell Johnson, which is rich with examples in R and the
caret package: http://bit.ly/1rSopam

Hadley Wickham's new ggvis package features a new syntax to create
interactive ggplot2-style graphics: http://bit.ly/1rSoqei

Guest poster Wayne Smith reviews the R and Statistics presentations at
the Intel International Science and Engineering Fair:
http://bit.ly/1rSos5K

Bank of America uses R to make mundane tables stand out, as reported
in a recent FastCoLabs article: http://bit.ly/1rSoqen

DataCamp created an infographic comparing SAS, R and SPSS: http://bit.ly/1rSos5M

Prizes on offer for the best R graphic mapping the locations of R user
groups: http://bit.ly/1rSoqeo

Guy Abel used the circlize package to visualize the players in the
World Cup and the location of their home teams: http://bit.ly/1rSos5N

How to create a clean financial data set for backtesting using data
from Quandl: http://bit.ly/1rSoqep

R's popularity continues to surge, with high rankings in the latest
KDNuggets poll and Redmonk language rankings: http://bit.ly/1rSos5O

Analysis of movie palettes using Python and R reveals that Hollywood
cinematographers prefer orange and blue: http://bit.ly/1rSoqeq

There are now 141 R user groups worldwide, with recent additions in
Chennai (India), Exeter (UK), Miami (FL), Durham (NH), Albany (NY) and
Charlotte (NC): http://bit.ly/1rSos5Q

Two more companies share how they use R: the ride-sharing company
Uber, and CultureAmp (a people intelligence platform):
http://bit.ly/1rSoquF

Tutorial on constructing a term structure of interest rates with R:
http://bit.ly/1rSos5R

R is featured in a Dataversity article on the relevance of open source
analytics for businesses: http://bit.ly/1rSoqer

The China R Users Conference attracted more than 1000 attendees:
http://bit.ly/1rSoquE

A look at the state of the art in Deep Learning research, including
the darch and deepnet packages: http://bit.ly/1rSos5U

One million students have enrolled in Coursera courses based on R:
http://bit.ly/1rSoquG

An updated function for reading data into R from Google Spreadsheets
that works with Google's current security model: http://bit.ly/1rSos5W

General interest stories (not related to R) in the past month
included: the illusory colors of Benham's Top (http://bit.ly/1rSos5V),
an airport performance of All by Myself (http://bit.ly/1rSoquH), and
beer bottle harmonies (http://bit.ly/1rSos5X).

Meeting times for local R user groups (http://bit.ly/eC5YQe) can be
found on the updated R Community Calendar at: http://bit.ly/bb3naW

If you're looking for more articles about R, you can find summaries
from previous months at http://blog.revolutionanalytics.com/roundups/.
You can receive daily blog posts via email using services like
blogtrottr.com, or join the Revolution Analytics mailing list at
http://revolutionanalytics.com/newsletter to be alerted to new
articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions
to me at da...@revolutionanalytics.com or via Twitter (I'm
@revodavid).

Cheers,
# David

-- 
David M Smith da...@revolutionanalytics.com
Chief Community Officer, Revolution Analytics
http://blog.revolutionanalytics.com
Tel: +1 (650) 646-9523 (Chicago IL, USA)
Twitter: @revodavid

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to include factor levels into plot title?

2014-07-09 Thread Bea GD

Hi all,

I'd like to include the levels of one of my variables in the title of a 
plot. I'd like these factor levels to be concatenated. E.g. 'These are 
the levels: setosa, versicolor, virginica'.


I've been working with this code but I don't get the desired results. 
Any suggestions would be a great help. Thanks!


dd <- iris

plot(dd$Sepal.Length, dd$Petal.Length,
     main=sprintf("These are the levels: %s", levels(dd$Species)))


Re: [R] Cutting hierarchical cluster tree at specific height fails

2014-07-09 Thread David L Carlson
To cut the tree, the clustering algorithm must produce consistently increasing 
height values with no reversals. You used one of the two options in hclust that 
does not do this. Note the following from the hclust manual page:

"Note however, that methods "median" and "centroid" are not leading to a 
monotone distance measure, or equivalently the resulting dendrograms can have 
so called inversions (which are hard to interpret)."

The cutree manual page:

"Cutting trees at a given height is only possible for ultrametric trees (with 
monotone clustering heights)."

Use a different method (but not "median").
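
As a rough sketch of that advice: the data below are simulated to resemble the original post, the seed is arbitrary, and "average" linkage is just one illustrative choice of a method with monotone heights, not a recommendation for any particular data set.

```r
# Simulated data resembling the original post; the seed is an arbitrary choice.
set.seed(42)
x <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 80, 15))
y <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 150, 25))
d <- dist(data.frame(x, y), method = "euclidean")

# "centroid" (like "median") can produce inversions, i.e. unsorted heights.
hc_centroid <- hclust(d, method = "centroid")
centroid_has_inversions <- is.unsorted(hc_centroid$height)

# "average" linkage yields monotone heights, so cutting at a height works.
hc_average <- hclust(d, method = "average")
memb <- cutree(hc_average, h = 60)
table(memb)
```

Checking is.unsorted() on the height component before calling cutree(h = ...) is a quick way to see whether a given tree can be cut at a height at all.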

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Johannes Radinger
Sent: Wednesday, July 9, 2014 7:07 AM
To: R help
Subject: [R] Cutting hierarchical cluster tree at specific height fails

Hi,

I'd like to cut a hierarchical cluster tree calculated with hclust at a
specific height.
However, I always get the following error message:
Error in cutree(hc, h = 60) :
  the 'height' component of 'tree' is not sorted (increasingly)


Here is a working example to show that when specifying a height in cutree()
the code fails. In contrast, specifying the number of clusters in cutree()
works.
What is the exact problem and how can I solve it?

x <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,80,15))
y <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,150,25))
df <- data.frame(x,y)
plot(df)

hc <- hclust(dist(df,method = "euclidean"), method="centroid")
plot(hc)

df$memb <- cutree(hc, h = 60) # this does not work
df$memb <- cutree(hc, k = 3) # this works!

plot(df$x,df$y,col=df$memb)


Thank you for your hints!

Best regards,
Johannes


Re: [R] How to include factor levels into plot title?

2014-07-09 Thread Sarah Goslee
How about:


plot(dd$Sepal.Length, dd$Petal.Length,
     main=paste("These are the levels:",
                paste(levels(dd$Species), collapse=", ")))
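
The reason sprintf() alone does not give the desired title is that it is vectorised: with three levels it returns three strings, one per level. A small sketch of two equivalent ways to collapse the levels first (the variable names lv, title1 and title2 are only illustrative):

```r
lv <- levels(iris$Species)

# sprintf() is vectorised over its arguments: one string per level.
per_level <- sprintf("These are the levels: %s", lv)
length(per_level)

# Collapse the levels into a single string before building the title.
title1 <- paste("These are the levels:", paste(lv, collapse = ", "))
title2 <- sprintf("These are the levels: %s", toString(lv))

# Draw the plot only in an interactive session.
if (interactive()) {
  plot(iris$Sepal.Length, iris$Petal.Length, main = title1)
}
```

toString() is a shorthand for paste(x, collapse = ", "), so both titles come out the same.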


Thanks for the actual reproducible example!

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] eclat problem

2014-07-09 Thread Michael Hahsler

Hi Alvaro,

this was a tricky problem. Under Windows, R uses the trio library 
(different from the package Trio, which creates very similar error 
messages) for printf support. arules currently contains a bug that 
results in an invalid format string for printf when an error message is 
created. For your problem below the error message should read "out of 
memory", but since creating the error message produces an invalid printf 
format string, under Windows you see the internal error instead. This 
problem will be fixed in the next release of arules (version 1.1-4).


Note however that your code still runs out of memory, and you need to 
increase the support and/or restrict the number of items in the itemsets 
(both via the list for parameter; see also class? ECparameter).
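
A minimal sketch of what such a call might look like, assuming the arules package is installed and using its bundled Groceries data in place of the original transactions. The thresholds 0.01 and 3 are arbitrary illustrations, not tuned values.

```r
# Illustrative settings: a higher support threshold and a cap on itemset
# size both reduce the number of itemsets that must be held in memory.
params <- list(supp = 0.01, maxlen = 3)

# Assumes the arules package is available; Groceries ships with it.
if (requireNamespace("arules", quietly = TRUE)) {
  data("Groceries", package = "arules")
  itemsets <- arules::eclat(Groceries, parameter = params)
  print(length(itemsets))
}
```

Raising supp or lowering maxlen further shrinks the result, so it can help to start restrictive and loosen the settings step by step.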


-Michael

On 07.07.2014 22:56, Alvaro Flores wrote:


 I'm working with the arules package and I'm constantly trying to mine 
frequent itemsets in different datasets. But recently R kept returning 
the same error message:

 Error in eclat(txn, parameter = list(supp = 0.001)) :
   internal error in trio library

 It is just this particular dataset that gives me problems.

 Has anyone ever encountered and fixed this error?

 Here is an example of the transaction data set:

 items

 1  {001200-3,

  004100-3,

  004200-5,

  004500-9,

  004600-5}

 2  {001524-K,

  002100-2}

 3  {00179,

  03807,

  08019,

  09314,

  12432}

 4  {002000}

 5  {002600-4,

  002700-0}

 6  {004115-F,

  02/100073A,

  02/630935A,

  044.1567.0,

  044.1567.0/I,

  1010301FA,

  1012015-400-,

  1117285,

  1118100-201-4020,

  1118105-051-M,

  173171,

  1903628,

  1903628/I,

  1903629,

  1903629/I,

  1907566,

  1907567,

  1907570,

  1907571,

  1931018,

  2.4419.340.0,

  215420/N,

  2654408-N,

  2992242,

  2992544,

  2996416,

  2VC-115561,

  320/04133A,

  4102AZL.14.100N00,

  4110Z.14.30,

  4625547,

  477556,

  477556/O,

  478736,

  478736/O,

  500054655,

  581/18096,

  6170005,

  957E-6731 A,

  BF8T-6731 BA,

  BG2X-6731 CA,

  DBPN-6731 A,

  F2NN-6714 AB,

  LF16015,

  LF3000,

  LF3345,

  LF3346,

  LF3349,

  LF3806,

  LF4054,

  LF9009,

  RE504836,

  RE59754,

  T19044/I,

  TAE-115561,

  W-950/7,

  ZP520}

 7  {005226,

  012.0348.0,

  012.0349.0,

  02/910150A,

  1105010E834N00,

  1105020D354,

  1117011-630-W,

  1117025-621-,

  1372444,

  1393640,

  1457434310001,

  1521219,

  1873018,

  1901605,

  1902134,

  1902138,

  1902138/I,

  1907640,

  1907640/I,

  1908547,

  1908547/I,

  1930010,

  19BG920-30001,

  20430751,

  20514654,

  20976003/O,

  20998367,

  215460,

  26560143,

  26560201,

  26560201/I,

  2710806,

  2992241,

  2992241/I,

  2992300,

  2992662,

  2992662/I,

  2995711,

  2997376,

  2R0-127177,

  2R0-127177 A,

  2RD-127491,

  32/401102,

  32/912001A,

  32/925423,

  32/925760,

  32/925869,

  32/925915,

  320/07155,

  343144,

  4102H.15.110,

  4102H.15.110N00,

  4102H.15.20,

  500315480,

  500315480/I,

  500316868,

  550228,

  550228/N,

  582042,

  612630080011N00,

  612630080087,

  7146717,

  8159975/O,

  81BASE921,

  98439681,

  AR50041,

  BC1132N01,

  BF0X-9155 AA,

  BF5T-9155 AB,

  BF8T-9155 DA,

  DDN-99162 B,

  DONN-9N074 BG,

  E5HT-9155 CA,

  E7HN-9155 AA,

  FF42000,

  FF5421,

  FF5458,

  FF5488,

  FS1000,

  FS1015,

  FS1241,

  FS1242,

  FS1280,

  PSD460/1,

  PSD970/1,

  R28-30M,

  RC45MB,

  RE62418,

  RK120MBQ2,

  T22VA,

  WK-723}

 8  {005227,

  2641311,

  2641371,

  2641406,

  2641725,

  2641729,

  2641808,

  376518,

  4757883,

  72013,

  72061,

  8190393,

  9986316,

  D8NN-9350 AA,

  DDN-9350,

  RE42211}

 9  {0055,

  0087,

  0482,

  0484,

  0531,

  11329,

  8311}

 10 {007.0762.0/40,

  014.0428.0,

  1114036,

  1118369,

  1118375,

  1118376,

  1118377,

  1118379,

  1305546,

  1312934,

  1677591,

  1677592,

  1677593,

  2.1539.130.0,

  2.1539.259.0,

  20515059/C,

  275092/C,

  275636/C,

  2RD-107124,

  31358393-G,

  3135X031,

  3135X063,

  4622074,

  4622074/G,

  4742199,

  4742202,

  4770623,

  4803030/G,

  500337911,

  

Re: [R] R Studio v3.0.3 for Windows 32bits is too slow

2014-07-09 Thread Jeff Newmiller
Grumpy today, Bert?

While it is a fact that RStudio is a separate tool from R, it is clear from the 
question that the OP is interested in capabilities that R is providing and he 
simply cannot tell the difference.

OP:

1) "Better" is a word that leads to pointless arguments. You will have to be 
the judge of what works for you. I caution you that Open Source tools almost 
always achieve success by interoperating with other OS tools, and much of the 
success you have already obtained is the result of many contributions, of which 
R and its contributed packages deserve the lion's share of credit. RStudio is a 
very convenient editor that makes using R and LaTeX and Markdown and version 
control easier, but it is unlikely that either the blame for your 
dissatisfaction or the credit for your success should be attributed to RStudio.

I have successfully used all sorts of plain text editors and command line 
interfaces with R, and if you plan to scale up your projects then you will 
likely want to be very clear on this distinction between editors and computing 
tools so you can distribute your work on multiple parallel servers (where 
editors may not necessarily even be helpful) even if you choose to use RStudio 
as your controlling environment for launching such tasks.

2) and 3) I know that R has contributed packages that can manage Hadoop data 
processing, but I have no personal experience with them. Google is your 
friend... especially if you keep in mind that these tools are not all found in 
one monolithic package.

For future reference: this is a plain text mailing list, so please adjust your 
mail client appropriately when sending to this list. Also, there are 
considerable resources mentioned in the Posting Guide that you should be aware 
of... see the link below.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On July 9, 2014 7:10:00 AM PDT, Bert Gunter gunter.ber...@gene.com wrote:
RStudio is a separate product with its own support. Post there, not
here.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Tue, Jul 8, 2014 at 7:34 PM, Phan, Truong Q
troung.p...@team.telstra.com wrote:
 Hi R'er,

 I have a dataset which is a matrix of 7502 x 1426 (rows x columns).
 The data is in CSV format and is around 68 MB in size. This
dataset is less than 10% of our full dataset.
 I have been adopting the anomaly detection method described at
http://www.mattpeeples.net/kmeans.html .
 It has been running for more than 24 hours and still hasn't completed the
calculation.
 I did manage to run it with a smaller dataset (i.e., 2100 rows x 1426
columns). It took around 12 hours to run.

 I have a few questions and need your expert guidance.

 1)  Are there any better open source tools that can do everything in one
tool (e.g., R Studio): prepare data, build models, validate models, test
models and present data? I am looking for a tool which will allow me to do
the same as in the above link (Matt Peeples' blog).

 2)  Are there open source tools to perform the above which will
allow me to run on top of the Hadoop ecosystem?

 3)  Can we use R Studio for Windows as a client to run on top of the
Hadoop ecosystem? If yes, please point me to a site that has
use cases or samples.

 Thanks and Regards,
 Truong Phan


Re: [R] Survival Analysis with an Historical Control

2014-07-09 Thread Andrews, Chris
The code is actually available at the websites you provide.  Try "View page 
source" in your browser.  The most cryptic code isn't needed because the math 
functions (e.g., incomplete gamma function) are available in R.
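
For the parametric case, a minimal sketch of a one-sample exponential test can be written directly in R. It assumes the standard result that, under exponential survival with hazard lambda, 2 * lambda * (total follow-up time) is approximately chi-squared with 2d degrees of freedom, where d is the number of events (exact under Type II censoring). The follow-up times and event indicators below are made up for illustration, and this is not necessarily the calculation the SWOG calculator performs.

```r
# Hypothetical follow-up times (weeks) and event indicators
# (1 = progression observed, 0 = censored); illustration only.
wks   <- c(8, 12, 52, 28, 44, 14, 3, 52, 35, 6)
event <- c(1, 0, 0, 1, 1, 1, 1, 0, 1, 1)

d     <- sum(event)           # number of observed events
total <- sum(wks)             # total time on test

rate_hat   <- d / total       # maximum likelihood estimate of the hazard
median_hat <- log(2) / rate_hat

# Null hypothesis: median PFS is no greater than the historical 16 weeks.
median0 <- 16
rate0   <- log(2) / median0

# Large total follow-up relative to the number of events favours a longer median.
stat    <- 2 * rate0 * total
p_value <- pchisq(stat, df = 2 * d, lower.tail = FALSE)

# Smallest sample median that would be significant at alpha = 0.05, one-tailed.
crit_median <- qchisq(0.95, df = 2 * d) / (2 * d) * median0

c(median_hat = median_hat, crit_median = crit_median, p_value = p_value)
```

The critical median depends on the number of events, so reproducing a published cutoff such as 20.57 weeks would require knowing the sample size that was assumed.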


-Original Message-
From: Paul Miller [mailto:pjmiller...@yahoo.com] 
Sent: Tuesday, July 08, 2014 12:00 PM
To: r-help@r-project.org
Subject: [R] Survival Analysis with an Historical Control

Hello All,

I'm trying to figure out how to perform a survival analysis with an historical 
control. I've spent some time looking online and in my books but haven't found 
much showing how to do this. Was wondering if there is an R package that can do 
it, or if there are resources somewhere that show the actual steps one takes, 
or if some knowledgeable person might be willing to share some code. 

Here is a statement that describes the sort of analysis I'm being asked to do.

A one-sample parametric test assuming an exponential form of survival was used 
to test the hypothesis that the treatment produces a median PFS no greater than 
the historical control PFS of 16 weeks.  A sample median PFS greater than 20.57 
weeks would fall beyond the critical value associated with the null hypothesis, 
and would be considered statistically significant at alpha = .05, 1 tailed.  

My understanding is that the cutoff of 20.57 weeks was obtained using an online 
calculator that can be found at:

http://www.swogstat.org/stat/public/one_survival.htm

Thus far, I've been unable to determine what values were plugged into the 
calculator to get the cutoff.

There's another calculator for a nonparametric test that can be found at:

http://www.swogstat.org/stat/public/one_nonparametric_survival.htm

It would be nice to try doing this using both a parametric and a 
non-parametric model.

So my first question would be whether the approach outlined above is valid or 
if the analysis should be done some other way. If the basic idea is correct, is 
it relatively easy (for a Terry Therneau type genius) to implement the whole 
thing using R? The calculator is a great tool, but, if reasonable, it would be 
nice to be able to look at some code to see how the numbers actually get 
produced.

Below are some sample survival data and code in case this proves helpful.

Thanks,

Paul

###
### Example Data: GD2 Vaccine
###

connection <- textConnection("
GD2  1   8 12  GD2  3 -12 10  GD2  6 -52  7
GD2  7  28 10  GD2  8  44  6  GD2 10  14  8
GD2 12   3  8  GD2 14 -52  9  GD2 15  35 11
GD2 18   6 13  GD2 20  12  7  GD2 23  -7 13
GD2 24 -52  9  GD2 26 -52 12  GD2 28  36 13
GD2 31 -52  8  GD2 33   9 10  GD2 34 -11 16
GD2 36 -52  6  GD2 39  15 14  GD2 40  13 13
GD2 42  21 13  GD2 44 -24 16  GD2 46 -52 13
GD2 48  28  9  GD2  2  15  9  GD2  4 -44 10
GD2  5  -2 12  GD2  9   8  7  GD2 11  12  7
GD2 13 -52  7  GD2 16  21  7  GD2 17  19 11
GD2 19   6 16  GD2 21  10 16  GD2 22 -15  6
GD2 25   4 15  GD2 27  -9  9  GD2 29  27 10
GD2 30   1 17  GD2 32  12  8  GD2 35  20  8
GD2 37 -32  8  GD2 38  15  8  GD2 41   5 14
GD2 43  35 13  GD2 45  28  9  GD2 47   6 15
")

hsv <- data.frame(scan(connection, list(VAC="", PAT=0, WKS=0, X=0)))
hsv <- transform(hsv, CENS=ifelse(WKS < 1, 1, 0), WKS=abs(WKS))
head(hsv)

require(survival)

survObj <- Surv(hsv$WKS, hsv$CENS==0) ~ 1

km <- survfit(survObj, type=c("kaplan-meier"))
print(km)

paraExp <- survreg(survObj, dist="exponential")
print(paraExp)



Re: [R] reorder a list

2014-07-09 Thread William Dunlap
Is the following 'g' what you want?  A better example might be with
   A2a <- lapply(A1, function(x)x+seq_along(x)/(100*length(x)))

g <- function (x, y) {
    xLengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
    yLengths <- vapply(y, FUN = length, FUN.VALUE = 0L)
    stopifnot(identical(xLengths, yLengths))
    split(unlist(y, use.names = FALSE), unlist(x, use.names = FALSE))
}
Used as
> g(A1,A2)
$`1`
[1] 2.718282

$`2`
[1] 7.389056 7.389056

$`3`
[1] 20.08554

$`4`
[1] 54.59815 54.59815 54.59815

$`5`
[1] 148.4132 148.4132

$`13`
[1] 442413.4

$`23`
[1] 9744803446
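
To see that 'g' really pairs each element of the second list with the corresponding value of the first, here is a self-contained sketch using the suggested A2a, whose entries differ within each group. The name res is only illustrative.

```r
g <- function(x, y) {
  xLengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
  yLengths <- vapply(y, FUN = length, FUN.VALUE = 0L)
  # Both lists must have the same shape for the pairing to make sense.
  stopifnot(identical(xLengths, yLengths))
  # Flatten both lists and group the y values by the matching x value.
  split(unlist(y, use.names = FALSE), unlist(x, use.names = FALSE))
}

A1  <- list(1:4, c(2, 4, 5), 23, c(4, 5, 13))
# Each element gets a small offset that depends on its position in its sublist.
A2a <- lapply(A1, function(x) x + seq_along(x) / (100 * length(x)))

res <- g(A1, A2a)
names(res)
res[["4"]]   # one entry from each of sublists 1, 2 and 4, in that order
```

Because the offsets differ, the three entries under "4" are distinguishable, which makes it easy to check which sublist each one came from.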


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Jul 9, 2014 at 2:04 AM, Lorenzo Alfieri alfio...@hotmail.com
wrote:

 Thanks Bill and the other guys for the variety of useful replies!
 In fact I'm working with pretty big lists (with ~35000 sublists) and
 Bill's solution is the fastest one in terms of computing time.
 Now comes the second part of the question... :-)
 I've my usual list of values and time indices to sort:
 A1<-list(c(1:4),c(2,4,5),23,c(4,5,13))
 and then another list A2 with variables which have to be paired with the
 values of A1:
 A2<-sapply(A1, exp)    # (in my case there's no exp relation between A1
 # and A2; they're completely uncorrelated. That's just an example)
 > A2
 [[1]]
 [1]  2.718282  7.389056 20.085537 54.598150

 [[2]]
 [1]   7.389056  54.598150 148.413159

 [[3]]
 [1] 9744803446

 [[4]]
 [1]     54.59815    148.41316 442413.39201

 Now I'd like to reorder the elements of A2 according to the same rule
 applied for A1:

 f <- function (x) {
     lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
     split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
 }
 B1<-f(A1)

 and thus obtain a list B2 which looks like this:
 > B2
 $`1`
 [1] 2.718282

 $`2`
 [1] 7.389056 7.389056

 $`3`
 [1] 20.08554

 $`4`
 [1] 54.59815 54.59815 54.59815

 $`5`
 [1] 148.4132 148.4132

 $`13`
 [1] 442413.4

 $`23`
 [1] 9744803446

 (In this example each element is the exp() of the sublist name, but in a
 general case they would be uncorrelated, and the resulting elements of each
 sublist would be different)
 Any idea?
 Alfio


  From: wdun...@tibco.com
  Date: Tue, 8 Jul 2014 12:11:09 -0700
  Subject: Re: [R] reorder a list
  To: alfio...@hotmail.com
  CC: r-help@r-project.org

 
  f <- function (x) {
      lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
      split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
  }
  f(A1) # gives about what you want (has, e.g., name "23", not position
  # 23, in output)
  Bill Dunlap
  TIBCO Software
  wdunlap tibco.com
 
 
  On Tue, Jul 8, 2014 at 9:39 AM, Lorenzo Alfieri alfio...@hotmail.com
 wrote:
   Hi,
   I'm trying to find a way to reorder the elements of a list.
   Let's say I have a list like this:
   A1<-list(c(1:4),c(2,4,5),23,c(4,5,13))
  
   > A1
   [[1]]
   [1] 1 2 3 4
  
   [[2]]
   [1] 2 4 5
  
   [[3]]
   [1] 23
  
   [[4]]
   [1]  4  5 13
  
   All the elements included in it are values, while each sublist is a
   time index.
   Now, I'd like to reorder the list (without looping) so as to obtain one
   sublist for each value, which includes all the time indices where each
   value appears.
   In other words, the result should look like this:
   > A2
   [[1]]
   [1] 1
  
   [[2]]
   [1] 1 2 #because value 2 appears in the time indices [[1]] and [[2]] of A1
  
   [[3]]
   [1] 1
  
   [[4]]
   [1] 1 2 4
  
   [[5]]
   [1] 2 4
  
   [[13]]
   [1] 4
  
   [[23]]
   [1] 3
  
   Any suggestion?
   Thanks
   Alfio
  
  

Re: [R] reorder a list

2014-07-09 Thread Lorenzo Alfieri
Thanks for the suggestion.
I found that I get the result I wanted with this simple command:

split(unlist(A2),unlist(A1))

$`1`
[1] 2.718282

$`2`
[1] 7.389056 7.389056

$`3`
[1] 20.08554

$`4`
[1] 54.59815 54.59815 54.59815

$`5`
[1] 148.4132 148.4132

$`13`
[1] 442413.4

$`23`
[1] 9744803446

which is indeed very similar to Bill's solution

Alfio


[R] symbols in a data frame

2014-07-09 Thread Sam Albers
Hello,

I have recently received a dataset from a metal analysis company. The
dataset is filled with less-than symbols ("<"). What I am looking for is an
efficient way to subset for any whole numbers from the dataset. The column
is automatically formatted as a factor because of the < symbols, making it
difficult to deal with the numbers in a useful way.

So in sum, any ideas on how I could subset the example below for only whole
numbers?

Thanks in advance!

Sam

#code

metals <-
structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
= c("Antimony",
"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
"77", "89", "951"), class = "factor")), .Names = c("Parameter",
"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")


Re: [R] symbols in a data frame

2014-07-09 Thread Sarah Goslee
Hi Sam,

I'd take the similar tack of removing the "<" instead. Note that if you
import the data frame using the stringsAsFactors=FALSE argument, you
don't need the first step.

metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

R> str(metals)
'data.frame':   19 obs. of  2 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6
7 8 9 10 11 ...
 $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...
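
A small sketch of the import route mentioned above, assuming the data arrive as a CSV file. The inline text and the name metals_demo are stand-ins for the real file and data frame.

```r
# A tiny stand-in for the lab's CSV file (hypothetical contents).
csv_text <- "Parameter,Cedar.Creek
Antimony,<100
Copper,516
Cadmium,<10"

# With stringsAsFactors = FALSE the column is read as character,
# so no as.character() step is needed before gsub().
metals_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)
was_character <- is.character(metals_demo$Cedar.Creek)

metals_demo$Cedar.Creek <- as.numeric(gsub("<", "", metals_demo$Cedar.Creek))
str(metals_demo)
```

Note that stripping the "<" discards the information that a value was below the detection limit, so it may be worth keeping the original column alongside the numeric one.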

Sarah



-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] symbols in a data frame

2014-07-09 Thread Marc Schwartz
Sam,

You can use ?gsub to remove the '<' characters from the column and then use 
?subset to select the records you wish.

Note that gsub() returns a character vector, so you want to coerce to numeric.

> as.numeric(gsub("<", "", metals$Cedar.Creek))
 [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
[14]  500  100  951 1000 1000  100


For example:

> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
   Parameter Cedar.Creek
1   Antimony        <100
2    Arsenic        <100
4  Beryllium        <100
7     Cobalt        <100
13  Selenium        <100
15  Thallium        <100
19  Antimony        <100


> subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
    Parameter Cedar.Creek
1    Antimony        <100
2     Arsenic        <100
3      Barium        <500
4   Beryllium        <100
5     Cadmium         <10
7      Cobalt        <100
10    Mercury         <10
11 Molybdenum        <200
12     Nickel        <500
13   Selenium        <100
14     Silver        <500
15   Thallium        <100
19   Antimony        <100


You can also just create a new column that is numeric and go from there:

metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))

> str(metals)
'data.frame':   19 obs. of  3 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
 $ Cedar.Creek: Factor w/ 45 levels "<1","<10","<100",..: 3 3 7 3 2 4 3 34 36 2 ...
 $ CC.Num     : num  100 100 500 100 10 1000 100 516 550 10 ...


> metals
    Parameter Cedar.Creek CC.Num
1    Antimony        <100    100
2     Arsenic        <100    100
3      Barium        <500    500
4   Beryllium        <100    100
5     Cadmium         <10     10
6    Chromium       <1000   1000
7      Cobalt        <100    100
8      Copper         516    516
9        Lead         550    550
10    Mercury         <10     10
11 Molybdenum        <200    200
12     Nickel        <500    500
13   Selenium        <100    100
14     Silver        <500    500
15   Thallium        <100    100
16        Tin         951    951
17   Vanadium       <1000   1000
18       Zinc       <1000   1000
19   Antimony        <100    100



Regards,

Marc Schwartz



Re: [R] symbols in a data frame

2014-07-09 Thread Bert Gunter
Well, ?grep and ?regex are clearly apropos here -- dealing with
character data is an essential skill for handling input from diverse
sources with various formatting conventions. I suggest you go through
one of the many regular expression tutorials on the web to learn more.

But this may not be the important issue here at all. If "<k" means the
value is left-censored at k -- i.e. we know it's less than k but not
how much less -- then Sarah's proposal is not what you want to do.
Exactly what you do want to do depends on context, and as it concerns
statistical methodology, is not something that should be discussed
here. Consult a local statistician if this is a correct guess.
Otherwise ignore.

... and please post in plain text in future (as requested) as HTML can
get garbled.

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
> Hi Sam,
>
> I'd take the similar tack of removing the < instead. Note that if you
> import the data frame using the stringsAsFactors=FALSE argument, you
> don't need the first step.
>
> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
> R> str(metals)
> 'data.frame':   19 obs. of  2 variables:
>  $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>  $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...
>
> Sarah
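
The remark about stringsAsFactors=FALSE can be illustrated with a minimal sketch. The inline CSV text and the name metals_chr are made up for illustration (the lab's real file will look different). In R 4.0.0 and later, stringsAsFactors=FALSE is already the default for read.csv(), so the argument is spelled out here only to make the point explicit.

```r
# Toy CSV standing in for the lab's file (hypothetical contents).
csv_text <- "Parameter,Cedar.Creek
Antimony,<100
Copper,516
Lead,550
Cadmium,<10"

# With stringsAsFactors = FALSE the column arrives as character,
# so no as.character() step is needed before gsub().
metals_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(metals_chr)

# fixed = TRUE treats "<" as a literal character rather than a regex.
metals_chr$CC.Num <- as.numeric(gsub("<", "", metals_chr$Cedar.Creek,
                                     fixed = TRUE))
metals_chr
```

Note that this strips the detection-limit marker without recording it, which is exactly the concern raised about left-censored values.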




Re: [R] How to include factor levels into plot title?

2014-07-09 Thread Beatriz

@ Sarah
Thanks a lot, paste does the job perfectly!


On 09/07/2014 17:46, Sarah Goslee wrote:

How about:


plot(dd$Sepal.Length, dd$Petal.Length,
     main=paste("These are the levels:",
                paste(levels(dd$Species), collapse=", ")))


Thanks for the actual reproducible example!

Sarah

On Wed, Jul 9, 2014 at 11:24 AM, Bea GD aguitatie...@hotmail.com wrote:

Hi all,

I'd like to include the levels of one of my variables in the title of a
plot. I'd like these factor levels to be concatenated. E.g. 'These are the
levels: setosa, versicolor, virginica'.

I've been working with this code but I don't get the desired results. Any
suggestions would be a great help. Thanks!

dd <- iris

plot(dd$Sepal.Length, dd$Petal.Length,
  main=sprintf("These are the levels: %s", levels(dd$Species)))
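
A short sketch of why the sprintf() attempt above does not give a single title while the paste() version does. It uses only base R and the built-in iris data; the variable names titles, one_title and same_title are made up for illustration.

```r
dd <- iris

# sprintf() is vectorised: three levels give three separate strings,
# not one concatenated title.
titles <- sprintf("These are the levels: %s", levels(dd$Species))
length(titles)

# Collapsing the levels first yields a single title string.
one_title <- paste("These are the levels:",
                   paste(levels(dd$Species), collapse = ", "))
one_title

# sprintf() also works once the levels have been collapsed;
# toString() joins its input with ", ".
same_title <- sprintf("These are the levels: %s",
                      toString(levels(dd$Species)))
identical(one_title, same_title)
```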





Re: [R] symbols in a data frame

2014-07-09 Thread Sam Albers
Thanks for all the responses. It is sometimes difficult to outline
exactly what you need. These responses were helpful to get there.
Speaking to Bert's point a bit, I needed a column to identify where
the < symbol was used. If I knew more about R I think I might be
embarrassed to post my solution to that problem, but here is how I used
Sarah's solution and still kept the info about detection limits. I'm
sure there is a more elegant way:

metals <-
structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
.Label = c("Antimony",
"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
"77", "89", "951"), class = "factor")), .Names = c("Parameter",
"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")



# keep a copy of the original factor, then strip "<" and convert to numeric
metals$temp1 <- metals$Cedar.Creek
metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

# rows whose original label differs from the cleaned value carried a "<"
metals$temp2 <- metals$temp1 == metals$Cedar.Creek
metals$Detection <- factor(ifelse(metals$temp2 == TRUE, "Measured", "Limit"))
metals[, c(1, 2, 5)]
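
One possibly tidier variant of the steps above, offered only as a sketch: test for the leading "<" directly with grepl() instead of comparing the old and new columns. The toy vector cedar and the data frame res are made up for illustration; on the real data the same calls would apply to as.character(metals$Cedar.Creek).

```r
# Toy values standing in for the Cedar.Creek column (illustrative only).
cedar <- c("<100", "<100", "<500", "516", "550", "<10", "951")

# TRUE where the lab reported a detection limit rather than a measurement.
censored <- grepl("^<", cedar)

# Remove a leading "<" (if any) and convert to numeric.
value <- as.numeric(sub("^<", "", cedar))

res <- data.frame(raw = cedar,
                  value = value,
                  Detection = factor(ifelse(censored, "Limit", "Measured")),
                  stringsAsFactors = FALSE)
res

# Only the rows that were actually measured:
res[res$Detection == "Measured", ]
```

This keeps the numeric value and the censoring flag side by side without temporary columns. How the "Limit" rows should then be treated in an analysis is the separate statistical question raised earlier in the thread.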


Thanks again!

Sam


Re: [R] HPGL or PCL plotting device? Or otherwise plotting plots

2014-07-09 Thread Thomas Levine
Actually, this doesn't _quite_ do what I want;
I want different R colors (1, 2, 3, &c.) to select
different pens in HPGL (SP1, SP2, SP3, &c.),
but the HPGL file I get selects only pen 1.

A hacky way to do this would be to generate
a few different postscript files for the different
colors on the plot, create the corresponding HPGL
files, edit the SP command in each of them, and
concatenate them. But maybe there's a better way?
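
The "edit the SP command and concatenate" step of that hacky approach might look roughly like the sketch below. It is an assumption, not verified here, that the HPGL written by pstoedit selects its pen with a command of the form SP followed by a number. The helper set_pen and the two toy HPGL strings are hypothetical stand-ins for the per-colour files.

```r
# Hypothetical helper: rewrite every pen-select command in an HPGL string
# so that it selects the given pen number.
set_pen <- function(hpgl, pen) gsub("SP[0-9]+", paste0("SP", pen), hpgl)

# Toy stand-ins for the contents of two per-colour HPGL files
# (real files would be read with readLines() and collapsed).
layers <- c("IN;SP1;PU0,0;PD100,100;",
            "IN;SP1;PU0,100;PD100,0;")

# Layer k gets pen k, then the layers are joined into one plot file.
rewritten <- unname(mapply(set_pen, layers, seq_along(layers)))
combined  <- paste(rewritten, collapse = "\n")
cat(combined, "\n")
```

Repeated initialisation commands from each layer may also need removing before the result is sent to the plotter; that depends on the actual file contents.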

On 09 Jul 13:32, Thomas Levine wrote:
 Oh it was easier than I thought.
 
   postscript('project-contracts.ps')
   hist(log(projects$n.contracts))
   dev.off()
 
 Then run this from the shell.
 
   pstoedit -f plot-hpgl project-contracts.ps project-contracts.hpgl
 
 And send it to the plotter.
 
 On 09 Jul 13:10, Thomas Levine wrote:
  Hi,
  
  I want to print plots on a Roland DXY-1100 plotter.
  How can I do this from R? I think the easiest thing
  would be a graphics device for Printer Command
  Language or Hewlett-Packard Graphics Language, but
  I haven't managed to find any of those.
  
  Thanks
  
  Tom



[R] Cansisc: Error in eigen(eHe, symmetric = TRUE)

2014-07-09 Thread Maria Judith Carmona H
Hi,

I have a problem using the function candisc() from the candisc package.

bosques1 <- read.csv("bosques1.csv", header=TRUE, encoding="latin1")
bosques1 <- na.exclude(bosques1)
attach(bosques1)

# Regression model
mod <-
lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
Acanthaceae, Apocinaceae, Araceae, Araliaceae, Arecaceae,
Aspleniaceae, Begoniaceae,
Blechnaceae, Bromeliaceae, Clusiaceae, Cyclanthaceae,
Davalliaceae, Denstaedtiaceae,
Dryopteridaceae, Ericaceae, Gesneriaceae, Hymenophyllaceae,
indet., Lauraceae, Lomariopsidaceae, Lycopodiaceae, Melastomataceae,
Moraceae, Myrsinaceae, Ophioglossaceae,
Orchidaceae, Peperomia, Piperaceae, Poaceae, Polypodiaceae,
Primulaceae, Pteridaceae,
Pteridophyta.taxa, Rubiaceae, Vittariaceae) ~ sitio,
data=bosques1)
summary(mod)

# Plot 1
can <- candisc(mod, term="sitio", data=bosques1, ndim=1, eig=T)
### The error happens here, so I cannot run the plot.
plot(can, titles.1d = c("Puntuación canónica", "Estructura"))
summary(can, means = FALSE, scores = TRUE, coef = c("std"), digits = 2)

The error is:
Error in eigen(eHe, symmetric = TRUE) : infinite or missing values in 'x'
In addition: Warning message:
In sqrt(wmd) : NaNs produced

Please help!

-- 
Maria Judith Carmona Higuita.
Biology student - Universidad de Antioquia
Medellín - Colombia

Happiness happens when you fit into your life, when you fit
so harmoniously that whatever you do is a joy to you.
Suddenly you will know it, and meditation will follow you. If you love the work you do,
if you love the way you live, then you are already meditating and nothing can
distract you. Osho



[R] using match to obtain non-sorted index values from non-sorted vector

2014-07-09 Thread Folkes, Michael
Hello all,

I've been struggling with the best way to find index values from a large
vector with elements that will match elements of a subset vector [the
table argument in match()].

BUT the index values can't come out sorted (as we'd get in which(X %in%
Y)).

My 'population' vector can't be sorted.

pop.df <- data.frame(pop=c(1,6,4,3,10))

The subset:  Tset = c(10,3,6)

So I'd like to get these index values (from pop.df), in this order:
5,4,2

If it could be sorted I could use:

which(sort(pop.df$pop) %in% sort(Tset))

But sorting will cause more grief later, so best not mess with it.

Here is my hopefully adequate MWE of a solution. I'm keen to see if
anybody has a better suggestion.

Thanks!

_

###BEGIN R

# pop is the full set of values; it has no info on their ranking.
# I don't want to sort these data. They need to remain in this order.
pop.df <- data.frame(pop=c(1,6,4,3,10))

# rank.df is my data frame that tells me the top three rankings (derived
# elsewhere)
rank.df <- data.frame(rank=1:3, Tset = c(10,3,6))   # Target set

# match.df will be my source of row index based on rank
match.df <- data.frame(match.vec=match(pop.df$pop, table=rank.df$Tset),
                       index.vec=1:nrow(pop.df))

# rank.df will now include the index location in pop.df where I can
# find the top three ranks.
rank.df <- merge(rank.df, match.df, by.x='rank', by.y='match.vec')

rank.df

###END

 

___

Michael Folkes
Salmon Stock Assessment
Canadian Dept. of Fisheries & Oceans
Pacific Biological Station
3190 Hammond Bay Rd.
Nanaimo, B.C., Canada
V9T-6N7
Ph (250) 756-7264 Fax (250) 756-7053  michael.fol...@dfo-mpo.gc.ca






Re: [R] using match to obtain non-sorted index values from non-sorted vector

2014-07-09 Thread David L Carlson
There may be a faster way, but 

> sapply(Tset, function(x) which(pop.df$pop==x))
[1] 5 4 2

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



Re: [R] symbols in a data frame

2014-07-09 Thread Claudia Beleites
Hi Sam,

> But this may not be the important issue here at all. If "<k" means the
> value is left-censored at k -- i.e. we know it's less than k but not
> how much less -- then Sarah's proposal is not what you want to do.
> Exactly what you do want to do depends on context, and as it concerns
> statistical methodology, is not something that should be discussed
> here. Consult a local statistician if this is a correct guess.

I'd like to chime in with Bert's advice here. Unless the <LOQ values are
very few*, they have the potential to seriously mess up any further data
analysis.

Actually, I'd recommend you go one step back and ask the analysis lab
whether they can supply you with the uncensored data, specifying the
LOQ separately. 

A while ago I posted some illustrations about such censoring
at LOQ situations on cross validated, which may help you in forming a
decision how to go on:
http://stats.stackexchange.com/a/30739/4598

Claudia (Analytical Chemist  Chemometrician)


*or you know that they'll not matter for the particular data analysis
you want to do




-- 
Claudia Beleites, Chemist
Spectroscopy/Imaging
Leibniz Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399



Re: [R] using match to obtain non-sorted index values from non-sortedvector

2014-07-09 Thread Folkes, Michael
So nice! 
Apply wins again.
Thanks David.
Michael



Re: [R] using match to obtain non-sorted index values from non-sortedvector

2014-07-09 Thread David Winsemius

On Jul 9, 2014, at 1:13 PM, Folkes, Michael wrote:

> So nice! 
> Apply wins again.

I doubt that `sapply( ..., which(,) )` would win a foot race with `match`:

> match(Tset, pop.df$pop)
[1] 5 4 2
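
Beyond speed, the two approaches behave differently when a target value is missing from the population. A minimal sketch using the thread's vectors; the extra vector Tset2, containing a 7 that is not in pop, is made up for illustration.

```r
pop  <- c(1, 6, 4, 3, 10)
Tset <- c(10, 3, 6)

# On the thread's data both give the positions 5, 4, 2.
match(Tset, pop)
sapply(Tset, function(x) which(pop == x))

# A target value absent from pop (7 is an invented example):
Tset2 <- c(10, 7, 6)

m <- match(Tset2, pop)                           # integer vector, NA for the 7
s <- sapply(Tset2, function(x) which(pop == x))  # a list: one result is empty
m
s
```

match() always returns one position per target (the first match, or NA), whereas the sapply()/which() version changes shape when a target is absent or occurs more than once.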

-- 
David.

David Winsemius
Alameda, CA, USA



Re: [R] using match to obtain non-sorted index values from non-sortedvector

2014-07-09 Thread Folkes, Michael
Oh dear,
I seem to have suffered a case of reversed arguments. 
This explains my surprise why R didn't have this in a function already -
as it does!
I was following the pattern of  search.vector %in% pattern, but match()
arguments are opposite this.

Thanks to both Davids.
Michael
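
A small sketch of how the argument roles play out, using the vectors from this thread. As documented in ?match, `x %in% table` is defined as `match(x, table, nomatch = 0) > 0`, so both take the values being looked up first and the vector being searched second. What differs is which vector the returned positions refer to. The names in_pop, in_tset and sorted are made up for illustration.

```r
pop  <- c(1, 6, 4, 3, 10)
Tset <- c(10, 3, 6)

# For each element of Tset, its position in pop (the order of Tset is kept).
in_pop <- match(Tset, pop)

# For each element of pop, its position in Tset (NA where pop's value
# is not one of the targets).
in_tset <- match(pop, Tset)

# Positions in pop that hold any target value; which() returns them
# in increasing order, which is the sorting the original post wanted to avoid.
sorted <- which(pop %in% Tset)

in_pop
in_tset
sorted
```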



Re: [R] R Studio v3.0.3 for Windows 32bits is too slow

2014-07-09 Thread Rolf Turner


On 10/07/14 04:24, Jeff Newmiller wrote:


Grumpy today, Bert?


<SNIP>

Bert is ***always*** grumpy! :-)  If he weren't, I'd get worried.

But then someone else, not more than a million miles from this email, 
has a strong tendency to be grumpy (acerbic?) as well.


Of course ***I*** am ***never*** grumpy! :-)

cheers,

Rolf



Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)

2014-07-09 Thread John Fox
Dear Maria Judith Carmona Higuita,

Since you didn't include enough information (such as your access to your data) 
to reproduce the error, one can only guess. My guess: you have fewer 
observations in your data set than response variables on the LHS of the 
multivariate linear model.
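
A small simulated illustration of this guess (made-up data and names, not the poster's): when a multivariate linear model has more response columns than residual degrees of freedom, the error sum-of-squares-and-products matrix is rank deficient. That is the kind of singularity that could plausibly upset the eigen decomposition, though whether it is the cause here cannot be known without the data.

```r
set.seed(1)
n <- 6                                            # observations (deliberately few)
p <- 10                                           # response variables
site <- factor(rep(c("a", "b", "c"), each = 2))   # hypothetical grouping factor
Y <- matrix(rnorm(n * p), nrow = n, ncol = p)

mod <- lm(Y ~ site)                # multivariate linear model

E <- crossprod(residuals(mod))     # p x p error SSP matrix
rank_E <- qr(E)$rank               # numerical rank of E

cat("responses:", p,
    " residual df:", mod$df.residual,
    " rank of E:", rank_E, "\n")
```

A quick check on real data would be to compare the number of complete rows, minus the number of levels of the grouping factor, with the number of columns inside cbind().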

I hope this helps,
 John


John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Wed, 9 Jul 2014 11:36:35 -0500
 Maria Judith Carmona H juditycarm...@gmail.com wrote:
 Hi,
 
 I have a problem using the function candisc() from the candisc package.
 
 bosques1 <- read.csv("bosques1.csv", header=TRUE, encoding="latin1")
 bosques1 <- na.exclude(bosques1)
 attach(bosques1)
 
 # Regression model
 mod <-
 lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
 Acanthaceae, Apocinaceae, Araceae, Araliaceae, Arecaceae,
 Aspleniaceae, Begoniaceae,
 Blechnaceae, Bromeliaceae, Clusiaceae, Cyclanthaceae,
 Davalliaceae, Denstaedtiaceae,
 Dryopteridaceae, Ericaceae, Gesneriaceae, Hymenophyllaceae,
 indet., Lauraceae, Lomariopsidaceae, Lycopodiaceae, Melastomataceae,
 Moraceae, Myrsinaceae, Ophioglossaceae,
 Orchidaceae, Peperomia, Piperaceae, Poaceae, Polypodiaceae,
 Primulaceae, Pteridaceae,
 Pteridophyta.taxa, Rubiaceae, Vittariaceae) ~ sitio,
 data=bosques1)
 summary(mod)
 
 # Plot 1
 can <- candisc(mod, term="sitio", data=bosques1, ndim=1, eig=T)
 ### The error happens here, so I cannot run the plot.
 plot(can, titles.1d = c("Puntuación canónica", "Estructura"))
 summary(can, means = FALSE, scores = TRUE, coef = c("std"), digits = 2)
 
 The error is:
 Error in eigen(eHe, symmetric = TRUE) : infinite or missing values in 'x'
 In addition: Warning message:
 In sqrt(wmd) : NaNs produced
 
 Please help!
 
 -- 
 Maria Judith Carmona Higuita.
 Biology student - Universidad de Antioquia
 Medellín - Colombia
 
 Happiness happens when you fit with your life, when you fit so
 harmoniously that whatever you do is a joy to you. Suddenly you will
 know it, and meditation will follow you. If you love the work you do,
 if you love the way you live, then you are already meditating and nothing can
 distract you. Osho
 


Re: [R] symbols in a data frame

2014-07-09 Thread MacQueen, Don
After reading the metals data frame, I would do this:

metals$result <- as.numeric(gsub('<','',metals$Cedar.Creek))
metals$flag <- ifelse(grepl('<',metals$Cedar.Creek),'<','h')

Also, assuming you got your data into R using read.table(),
read.csv(), or similar, I would include
   stringsAsFactors=FALSE

as another argument to the function call. You don't need factors at this
point.
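
A minimal sketch of this approach on a small made-up character vector (the values and the names raw, result and censored are assumptions for illustration, standing in for a column read with stringsAsFactors=FALSE):

```r
# Hypothetical stand-in for metals$Cedar.Creek read as character
raw <- c("<100", "516", "<10", "550", "0.13")

# Strip the "<" marker and convert what remains to numeric
result <- as.numeric(sub("<", "", raw, fixed = TRUE))

# Record which values were reported as below a detection limit
censored <- grepl("<", raw, fixed = TRUE)

parsed <- data.frame(raw = raw, result = result, censored = censored,
                     stringsAsFactors = FALSE)
print(parsed)
```

Keeping the censored column alongside the numeric result preserves the information that a value is a limit rather than a measurement, which matters for any later analysis.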

-Don
-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 7/9/14 11:02 AM, Sam Albers tonightstheni...@gmail.com wrote:

Thanks for all the responses. It is sometimes difficult to outline
exactly what you need. These responses were helpful to get there.
Speaking to Bert's point a bit, I needed a column to identify where
the < symbol was used. If I knew more about R I think I might be
embarrassed to post my solution to that problem, but here is how I used
Sarah's solution and still kept the info about detection limits. I'm
sure there is a more elegant way:

metals <-
structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
.Label = c("Antimony",
"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek =
structure(c(3L,
3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
"77", "89", "951"), class = "factor")), .Names = c("Parameter",
"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")



metals$temp1 <- metals$Cedar.Creek
metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

metals$temp2 <- metals$temp1==metals$Cedar.Creek
metals$Detection <- factor(ifelse(metals$temp2==TRUE,"Measured","Limit"))
metals[,c(1,2,5)]


Thanks again!

Sam

On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter gunter.ber...@gene.com
wrote:
 Well, ?grep and ?regex are clearly apropos here -- dealing with
 character data is an essential skill for handling input from diverse
 sources with various formatting conventions. I suggest you go through
 one of the many regular expression tutorials on the web to learn more.

 But this may not be the important issue here at all. If <k means the
 value is left censored at k -- i.e. we know it's less than k but not
 how much less -- then Sarah's proposal is not what you want to do.
 Exactly what you do want to do depends on context, and as it concerns
 statistical methodology, is not something that should be discussed
 here. Consult a local statistician if this is a correct guess.
 Otherwise ignore.

 ... and please post in plain text in future (as requested) as HTML can
 get garbled.

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 Clifford Stoll




 On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee sarah.gos...@gmail.com
wrote:
 Hi Sam,

 I'd take the similar tack of removing the < instead. Note that if you
 import the data frame using the stringsAsFactors=FALSE argument, you
 don't need the first step.

 metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
 metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
 metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

 R> str(metals)
 'data.frame':19 obs. of  2 variables:
  $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6
 7 8 9 10 11 ...
  $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...

 Sarah


 On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers
tonightstheni...@gmail.com wrote:
 Hello,

 I have recently received a dataset from a metal analysis company. The
 dataset is filled with less-than symbols. What I am looking for is an
 efficient way to subset for any whole numbers from the dataset. The
 column is automatically formatted as a factor because of the < symbols,
 making it difficult to deal with the numbers in a useful way.

 So, in sum, any ideas on how I could subset the example below for only
 whole numbers?

 Thanks in advance!

 Sam

 #code

 metals -


 structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
 = c(Antimony,
 Arsenic, Barium, Beryllium, Boron (Hot Water Soluble),
 Cadmium, Chromium, Cobalt, Copper, Lead, Mercury,
 Molybdenum, Nickel, pH 1:2, Selenium, Silver, Thallium,
 Tin, Vanadium, Zinc), class = factor), Cedar.Creek =
structure(c(3L,
 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
 4L, 4L, 3L), .Label = c(1, 10, 100, 1000, 200,
 5, 500, 0.1, 0.13, 0.5, 0.8, 1.07, 1.1, 1.4,
 1.5, 137, 154, 163, 165, 169, 178, 2.3, 2.4,
 22, 24, 244, 27.2, 

[R] function completing properly

2014-07-09 Thread Janet Choate
Hi R community,
i created a function (mkdate) as follows:

mkdate = function(x) {
x$date = as.Date(paste(x$year, x$month, x$day, sep="-"))
x$wy = ifelse(x$month >= 10, x$year+1, x$year)
x$yd = as.integer(format(as.Date(x$date), format="%j"))
x$wyd = cal.wyd(x)
x
}

the function results in adding the new columns date, wy, yd, and wyd to the
table i apply it to.
this has always worked in R version 2.14.2.
however, in R version 3.1.0 - instead of my mkdate function adding those
columns to my existing table, it just overwrites my table and leaves me
with just a list of the last variable created by my mkdate function. so i
end up with just a list of numbers representing wyd, and lose all the data
in my original table.

does anyone know what would now be causing this to occur, and what i need
to do to make my function work properly again?

thank you for any assistance,
Janet



Re: [R] R Studio v3.0.3 for Windows 32bits is too slow

2014-07-09 Thread peter dalgaard
Grumpy today, Jeff?

For the concrete issue, I'd conjecture that the base problem is that there are 
way too many columns in the data and that the nature of the method is not 
properly understood. It is not obvious that k-means clustering based on 
Euclidean distance makes sense in 1426-dimensional space. It is quite possible 
that the data set does not even consist of columns measured in the same units. 
Even if it does fit the problem, it is quite computationally intensive. Some 
sort of feature extraction or data reduction technique is likely to be required.

So basically, further study of the methodology, or contact with a machine 
learning expert (which I am not) seems advisable.
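
As one possible sketch of such a data reduction step, the following standardises the columns, keeps a few principal components, and runs k-means on those scores instead of on the full column space. The data are simulated, and the sizes, the number of components and the number of clusters are arbitrary assumptions, not recommendations for the real data set.

```r
set.seed(42)
n <- 300; p <- 60                        # made-up sizes, far smaller than 7502 x 1426
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[1:100, 1:5] <- X[1:100, 1:5] + 4       # plant one artificial group

# Standardise columns (they may be in different units), then reduce dimension
pc <- prcomp(X, center = TRUE, scale. = TRUE)
n_pc <- 5                                # arbitrary choice for this sketch
scores <- pc$x[, seq_len(n_pc), drop = FALSE]

# Cluster in the reduced space rather than in all p dimensions
km <- kmeans(scores, centers = 2, nstart = 10)
table(km$cluster)
```

Whether principal components are an appropriate reduction for a given problem is a methodological question in its own right. The point of the sketch is only that the clustering step then works on 5 columns rather than 60.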

-pd  


On 09 Jul 2014, at 18:24 , Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 Grumpy today, Bert?
 
 While it is a fact that RStudio is a separate tool from R, it is clear from 
 the question that the OP is interested in capabilities that R is providing 
 and he simply cannot tell the difference.
 
 OP:
 
 1) Better is a word that leads to pointless arguments. You will have to be 
 the judge of what works for you. I caution you that Open Source tools almost 
 always achieve success by interoperating with other OS tools, and much of the 
 success you have already obtained is the result of many contributions, of 
 which R and its contributed packages deserve the lion's share of credit. 
 RStudio is a very convenient editor that makes using R and LaTeX and Markdown 
 and version control easier, but it is unlikely that either the blame for your 
 dissatisfaction or the credit for your success should be attributed to 
 RStudio.
 
 I have successfully used all sorts of plain text editors and command line 
 interfaces with R, and if you plan to scale up your projects then you will 
 likely want to be very clear on this distinction between editors and 
 computing tools so you can distribute your work on multiple parallel servers 
 (where editors may not necessarily even be helpful) even if you choose to use 
 RStudio as your controlling environment for launching such tasks.
 
 2) and 3) I know that R has contributed packages that can manage Hadoop data 
 processing, but I have no personal experience with them. Google is your 
 friend... especially if you keep in mind that these tools are not all found 
 in one monolithic package.
 
 For future reference: this is a plain text mailing list, so please adjust 
 your mail client appropriately when sending to this list. Also, there are 
 considerable resources mentioned in the Posting Guide that you should be 
 aware of... see the link below.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 --- 
 Sent from my phone. Please excuse my brevity.
 
 On July 9, 2014 7:10:00 AM PDT, Bert Gunter gunter.ber...@gene.com wrote:
 RStudio is a separate product with its own support. Post there, not
 here.
 
 -- Bert
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 Clifford Stoll
 
 
 
 
 On Tue, Jul 8, 2014 at 7:34 PM, Phan, Truong Q
 troung.p...@team.telstra.com wrote:
 Hi R users,
 
 I have a dataset which is a matrix of 7502 x 1426 (rows x columns).
 The data is in CSV format, with a size of around 68 MB. This
 dataset is less than 10% of our full dataset.
 I have been adopting the anomaly detection method described at
 http://www.mattpeeples.net/kmeans.html .
 It has been running for more than 24 hours and still hasn't completed the
 calculation.
 I did manage to run it with a smaller dataset (i.e., 2100 rows x 1426
 columns). It took around 12 hours to run.
 
 I have a few questions and need your expert guidance.
 
 1)  Is there a better open source tool that can do all of this in one
 tool (e.g., R Studio): prepare data, build models, validate models, test
 models and present data? I am looking for a tool which will allow me to do
 the same as in the above link (Matt Peeples' blog).
 
 2)  Is there an open source tool to perform the above which will
 allow me to run on top of the Hadoop ecosystem?
 
 3)  Can we use R Studio for Windows as a client to run on top of the
 Hadoop ecosystem? If yes, please point me to a site that has
 use cases or samples.
 
 Thanks and Regards,
 Truong Phan
 
 

Re: [R] function completing properly

2014-07-09 Thread Jeff Newmiller
I think you are mistaken. Please provide an example of how you used this 
function in any version of R that behaved as you describe.
Also, please post in plain text to avoid the what-you-see-is-not-what-we-see 
feature that HTML email provides.
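
A reproducible example of the kind requested might look like the sketch below. It assumes the stripped comparison in the posted function was >=, and it uses a made-up stand-in for cal.wyd(), which was not shown in the post (here: day of the water year, counting 1 October as day 1). The table tab is invented test data.

```r
# Hypothetical stand-in for cal.wyd(); the real function was not posted.
cal.wyd <- function(x) {
  start <- as.Date(paste(x$wy - 1, 10, 1, sep = "-"))  # 1 October before each date
  as.integer(x$date - start) + 1L
}

mkdate <- function(x) {
  x$date <- as.Date(paste(x$year, x$month, x$day, sep = "-"))
  x$wy   <- ifelse(x$month >= 10, x$year + 1, x$year)
  x$yd   <- as.integer(format(x$date, format = "%j"))
  x$wyd  <- cal.wyd(x)
  x      # return the whole data frame, not just the last column
}

tab <- data.frame(year = c(2013, 2014), month = c(10, 7), day = c(1, 9))
tab <- mkdate(tab)   # assign the returned data frame back to the table
str(tab)
```

Written this way, the function returns the full data frame with the four new columns. If only a vector of wyd values comes back, it is worth checking how the result is being assigned and whether the last line of the function is still x.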
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On July 9, 2014 4:47:39 PM PDT, Janet Choate jsc@gmail.com wrote:
Hi R community,
i created a function (mkdate) as follows:

mkdate = function(x) {
x$date = as.Date(paste(x$year, x$month, x$day, sep="-"))
x$wy = ifelse(x$month >= 10, x$year+1, x$year)
x$yd = as.integer(format(as.Date(x$date), format="%j"))
x$wyd = cal.wyd(x)
x
}

the function results in adding the new columns date, wy, yd, and wyd to
the
table i apply it to.
this has always worked in R version 2.14.2.
however, in R version 3.1.0 - instead of my mkdate function adding
those
columns to my existing table, it just overwrites my table and leaves me
with just a list of the last variable created by my mkdate function. so
i
end up with just a list of numbers representing wyd, and lose all the
data
in my original table.

does anyone know what would now be causing this to occur, and what i
need
to do to make my function work properly again?

thank you for any assistance,
Janet



Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)

2014-07-09 Thread John Fox
Dear Judith,

I take it from your reply that you have *more* observations than there are 
response variables in the multivariate linear model, but since you still 
haven't provided access to the data, it's still impossible to tell what the 
problem is. 

I don't follow your application, possibly because I'm ignorant of the area in 
which you work, but also possibly because there's insufficient information 
about that too. When you say that there are 0 abundances, I assume that this 
doesn't mean that some abundance values are 0 for *all* observations. If that's 
the case, then that would I believe produce a computational error, though not I 
think the one you observed. As an aside, if there are many 0 abundances then 
using the multivariate normal distribution for the responses likely isn't 
reasonable, which is what you're doing, but this in itself won't produce a 
computational error in candisc().

So, to reiterate, without the data, there's not much more that I can say. 
Because I'm out of town and will be traveling tomorrow, I'm unlikely to be able 
to respond again for several days.

 Best,
 John

On Wed, 9 Jul 2014 17:54:56 -0500
 Maria Judith Carmona H juditycarm...@gmail.com wrote:
 Dear John,
 
 I am including abundance values in my data set, so obviously I have zero
 abundances.
 The problem is that if I plot only the factors (biomasa, altdosel, altsoto,
 cobertura, riqarb, elevacion, temperatura, precipitacion) I get the
 graphic; the same happens when I include only the families. But I want to
 see the effect of all these factors+families on this plot. In fact, I
 included only certain families:
 
 prueba4 <-
 lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
 Araceae,Begoniaceae,Bromeliaceae,Clusiaceae,Cyclanthaceae,Ericaceae,Gesneriaceae,
 Melastomataceae,Orchidaceae,Piperaceae,Pteridophyta) ~
 sitio, data=bosques.p)
 canprueba2 <- candisc(prueba2, term="sitio", data=bosques.p, ndim=1)
 Error in eigen (EHD, symmetric = TRUE): infinite or missing values in 'x'
 In addition: Warning message:
 In sqrt (wmd): NaNs produced
 
 You see I get the same error.
 
 Best regards,
 Judith
 
 

[R] Installing rgdal and rjags packages on a linux cluster

2014-07-09 Thread Adam Zeilinger

Dear R Help,

I'm trying to install the rjags and rgdal packages on a linux cluster 
running R 3.0.3.  However, I'm having problems installing them 
successfully.  Both packages require external programs (JAGS and GDAL, 
respectively), which have been successfully installed.


For rjags, the error message reads:
configure: error: Location of JAGS headers not defined.  Use configure 
arg '--with-jags-include' or environment variable 'JAGS_INCLUDE'


I tried the following:
 install.packages("rjags", configure.args = list("--with-jags-include"))
This returns a different error:
configure: error: Problem with header file yes/Console.h

From my readings of various help pages, it seems that I need to 
download the developer version of the rjags package, in order to supply 
the header files.  Is this correct?  If so, where do I find developer 
packages and how do I install them?  R package development is new to me.


For rgdal, the error message reads:
Error: gdal-config not found
The gdal-config script distributed with GDAL could not be found.

Here, it's my understanding that I need to install PROJ.4 libraries and 
the developer versions of the rgdal and proj4 packages.  Is this correct?


Are the problems with installing rjags and rgdal basically the same?  
Could the problems be caused by running an older version of R?
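
One reading of the second rjags error, offered as a sketch and not a confirmed diagnosis: a configure option passed without a value is conventionally treated as "yes", which would be consistent with the message about yes/Console.h. The option probably needs the directory that holds the JAGS headers. The path below is a placeholder, not a real location. The last lines only check whether the gdal-config script named in the rgdal error is visible to R.

```r
# Hypothetical location of the JAGS headers on the cluster; adjust to your site.
jags_include <- "/path/to/jags/include/JAGS"

# Give the option a value instead of passing it bare
rjags_args <- paste0("--with-jags-include=", jags_include)
print(rjags_args)

# Not run here (needs JAGS installed and network access):
# install.packages("rjags", configure.args = rjags_args)

# rgdal's configure step looks for the gdal-config script; an empty string
# here means it is not on the PATH that R sees.
gdal_config <- Sys.which("gdal-config")
print(gdal_config)
```

On a cluster, external libraries are often installed outside the default search paths, so both problems may come down to telling the package configure scripts where those installations live.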


Any help would be greatly appreciated.
Adam Zeilinger

--
Adam Zeilinger
Postdoctoral scholar
Berkeley Initiative for Global Change Biology
University of California Berkeley
http://www.linkedin.com/in/adamzeilinger/



Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)

2014-07-09 Thread Maria Judith Carmona H
Dear John,

I am including abundance values in my data set, so obviously I have zero
abundances.
The problem is that if I plot only the factors (biomasa, altdosel, altsoto,
cobertura, riqarb, elevacion, temperatura, precipitacion) I get the
graphic; the same happens when I include only the families. But I want to
see the effect of all these factors+families on this plot. In fact, I
included only certain families:

prueba4 <-
lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
Araceae,Begoniaceae,Bromeliaceae,Clusiaceae,Cyclanthaceae,Ericaceae,Gesneriaceae,
Melastomataceae,Orchidaceae,Piperaceae,Pteridophyta) ~
sitio, data=bosques.p)
canprueba2 <- candisc(prueba2, term="sitio", data=bosques.p, ndim=1)
Error in eigen (EHD, symmetric = TRUE): infinite or missing values in 'x'
In addition: Warning message:
In sqrt (wmd): NaNs produced

You see I get the same error.

Best regards,
Judith


-- 
Maria Judith Carmona Higuita.
Biology student - Universidad de Antioquia
Medellín - Colombia

Happiness happens when you fit with your life, when you fit so
harmoniously that whatever you do is a joy to you. Suddenly you will
know it, and meditation will follow you. If you love the work you do,
if you love the way you live, then you are already meditating and nothing can
distract you. Osho



Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)

2014-07-09 Thread Maria Judith Carmona H
Dear John

There is my data set.

Thanks.


On Wed, Jul 9, 2014 at 8:12 PM, John Fox j...@mcmaster.ca wrote:

 Dear Judith,

 I take it from your reply that you have *more* observations than there are
 response variables in the multivariate linear model, but since you still
 haven't provided access to the data, it's still impossible to tell what the
 problem is.

 I don't follow your application, possibly because I'm ignorant of the area
 in which you work, but also possibly because there's insufficient
 information about that too. When you say that there are 0 abundances, I
 assume that this doesn't mean that some abundance values are 0 for *all*
 observations. If that's the case, then that would I believe produce a
 computational error, though not I think the one you observed. As an aside,
 if there are many 0 abundances then using the multivariate normal
 distribution for the responses likely isn't reasonable, which is what
 you're doing, but this in itself won't produce a computational error in
 candisc().

 So, to reiterate, without the data, there's not much more that I can say.
 Because I'm out of town and will be traveling tomorrow, I'm unlikely to be
 able to respond again for several days.

  Best,
  John


[R] Information about font

2014-07-09 Thread Sébastien Bihorel
Hi,

I have this set of R scripts which are run on a linux box and create plots
with the lattice package. I do not specify any custom font family, so I
believe that whatever is the default font on my system is used in the plot.
1- how can I know which is the default font used in my plots?
2- is this font specific to R or can it be used by external tools?
3- if this font can be used by external tools, how can I know the location
of this font on my system?

Thank you in advance for your help

Sebastien



Re: [R] Information about font

2014-07-09 Thread David Winsemius

On Jul 9, 2014, at 7:47 PM, Sébastien Bihorel wrote:

 Hi,
 
 I have this set of R scripts which are run on a linux box and create plots
 with the lattice package. I do not specify any custom font family, so I
 believe that whatever is the default font on my system is used in the plot.
 1- how can I know which is the default font used in my plots?
 2- is this font specific to R or can it be used by external tools?
 3- if this font can be used by external tools, how can I know the location
 of this font on my system?

Fonts are specific to the graphical device being used. You have not specified 
what device you are using.

?Devices

The fonts are provided by your OS setup. 

?pdfFonts
?Type1Font
?grid::gpar
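
As a minimal sketch of how these settings might be inspected from within R, assuming the pdf device is the one in use (the original post does not say which device it is), the following queries the device's default family, the families the device knows about, and the metric files behind the default:

```r
library(grDevices)

# Default family the pdf() device uses when none is requested
pdf_family <- pdf.options()$family

# Font database of the pdf() device: a named list of Type1Font objects
pdf_db <- pdfFonts()
known_families <- names(pdf_db)

# Metric (.afm) file names that describe the default family
metric_files <- pdf_db[[pdf_family]]$metrics

# Font size settings in lattice's theme, if lattice is installed.
# pdf(NULL) opens a device that writes no file, so nothing is left on disk.
if (requireNamespace("lattice", quietly = TRUE)) {
  pdf(NULL)
  print(lattice::trellis.par.get("fontsize"))
  dev.off()
}

print(pdf_family)
print(head(known_families))
print(metric_files)
```

Other devices (png, X11, cairo-based ones) resolve fonts differently, so the answer for those would have to come from their own help pages.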

 
 Thank you in advance for your help
 
 Sebastien
 
   [[alternative HTML version deleted]]


Still having trouble understanding your mail client?

-- 
David Winsemius
Alameda, CA, USA



[R] Decision Tree

2014-07-09 Thread Abhinaba Roy
Hi R-helpers,

Is it possible to change the color of the boxes when plotting decision
trees using 'fancyRpartPlot()' (from the rattle package, which draws the
tree with rpart.plot)?
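
One way to colour the boxes is the box.col argument of prp() in rpart.plot. A minimal sketch, assuming the kyphosis example data shipped with rpart and two arbitrary colours; whether a given version of fancyRpartPlot() passes box.col through is not confirmed here, so the sketch calls prp() directly:

```r
library(rpart)

# Small classification tree on the kyphosis data that ships with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

# For a classification tree, frame$yval holds the index of the predicted
# class at each node, so it can be used to pick one colour per node.
box_cols <- c("lightblue", "salmon")[fit$frame$yval]

# Draw the tree only if rpart.plot is installed
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(fit, type = 2, extra = 1, box.col = box_cols)
}
```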

-- 
Regards,
Abhinaba Roy



[R] quantmod: How could I change the name in chartSeries

2014-07-09 Thread William

Hi guys,

I am a beginner with the excellent R package quantmod, and I don't know
how to change the y-axis name in the chartSeries function. I want to write
a function like the one below, so that a single line of code completes
the financial analysis.


The following function is designed to show some aspects of the SP500.
I want the label on the chart, which currently shows stock.name, to read
SP500. Is there any way to do this?


THX

William


#
stock.price <- function(stock.name, stock.code){
  ## Loading..
  library(zoo)
  library(xts)
  library(TTR)
  library(Defaults)
  library(quantmod)
#--
  ## Theme: white
  theme.white <- chartTheme("white")
  names(theme.white)
  theme.white$bg.col <- "white"
  theme.white$up.col <- "red"
  theme.white$dn.col <- "green"
#--
  ## main function
  stock.name <- getSymbols(stock.code, from = "2010-01-01",
                           to = Sys.Date(), src = "yahoo",
                           auto.assign = FALSE)

  chartSeries(stock.name, theme = theme.white,
              # subset = 'last 12 months',
              TA = "addVo(); addSMA(); addEnvelope();
                    addMACD(); addMomentum(); addROC();
                    addBBands()")
  addLines(v = which(stock.name[,4] == max(stock.name[,4])),
           col = "gray")
}
#
stock.price("SP500", "^GSPC")
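
chartSeries() has a name argument that sets the title printed on the chart. In the function above, stock.name is overwritten by the downloaded prices, so the label is lost. A minimal sketch, assuming synthetic prices in place of the Yahoo download; plot_stock and stock.data are hypothetical names used only for illustration:

```r
# Hypothetical helper: keeps the label and the price data in separate variables
plot_stock <- function(stock.name, stock.data) {
  chartSeries(stock.data, name = stock.name,
              theme = chartTheme("white"), TA = NULL)
  invisible(stock.name)
}

# Synthetic OHLC matrix standing in for getSymbols("^GSPC", ...)
set.seed(1)
n <- 60
close <- 100 + cumsum(rnorm(n))
ohlc <- cbind(Open = close, High = close + 1, Low = close - 1, Close = close)
dates <- seq(as.Date("2014-01-01"), by = "day", length.out = n)

# Draw the chart only if quantmod (and with it xts) is installed
if (requireNamespace("quantmod", quietly = TRUE)) {
  library(quantmod)
  stock.data <- xts(ohlc, order.by = dates)
  plot_stock("SP500", stock.data)
}
```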



RE: [R-es] Resumen de R-help-es, Vol 65, Envío 13

2014-07-09 Thread Isidro Hidalgo
Exactly: plenty of hard work and little fear of looking foolish when asking
question after question. A good combination...
Regards.

Isidro Hidalgo Arellano
Observatorio Regional de Empleo
Consejería de Empleo y Economía
http://www.jccm.es



 -Original Message-
 From: r-help-es-boun...@r-project.org [mailto:r-help-es-bounces@r-project.org]
 On behalf of Eva Prieto Castro
 Sent: Wednesday, 9 July 2014 8:51
 To: ERNESTO JIMENEZ GARRIDO; r-help-es@r-project.org
 Subject: Re: [R-es] Resumen de R-help-es, Vol 65, Envío 13

 Hello, Ernesto:

 I think your decision is a mistake. Knowledge comes to us gradually, as a
 result of making an effort, asking questions, and having people around
 who, however much they know, are willing and able to adapt to the other
 person's situation. After all, the final stage of learning is teaching...

 My knowledge of R is not advanced, and I have never hesitated to ask.
 I invite you to do the same.

 Regards.
 Eva


 On Wednesday, 9 July 2014 8:00, ERNESTO JIMENEZ GARRIDO
 ernestjime...@ub.edu wrote:



 Dear all,
 I signed up to this mailing list in order to make progress in learning R
 and R-Commander. My limited knowledge makes me see that this is not the
 forum I need; its level is too advanced for me.
 If you could point me to a suitable forum, one where I could discuss
 problems with the first steps in R and R-Commander, I would be grateful.
 Thank you in advance. I wish you well in your slow progress with this
 common, universal and free program.

 Ernest Jiménez Garrido
 Professor Associat UB
 Departament d'Econometria, Estadística i Economia Espanyola
 
 From: r-help-es-boun...@r-project.org [r-help-es-boun...@r-project.org]
 on behalf of r-help-es-requ...@r-project.org [r-help-es-request@r-project.org]
 Sent: Tuesday, 8 July 2014 19:43
 To: r-help-e...@r-project.org
 Subject: Resumen de R-help-es, Vol 65, Envío 13



 Today's topics:

   1. Paquete generado no detectan ambiente particular creado.
      (Eva Prieto Castro)
   2. Re: Paquete generado no detectan ambiente particular creado.
      (Eva Prieto Castro)
   3. Re: Paquete generado no detectan ambiente particular creado.
      (miguel.angel.rodriguez.mui...@sergas.es)
   4. Re: Paquete generado no detectan ambiente particular creado.
      (Eva Prieto Castro)
   5. Re: Paquete generado no detectan ambiente particular creado.
      (rubenfcasal)


 --

 Message: 1
 Date: Tue, 8 Jul 2014 11:38:22 +0100

 To: r-help-es r-help-es@r-project.org
 Subject: [R-es] Paquete generado no detectan ambiente particular creado.
 Message-ID: 1404815902.81335.yahoomail...@web171506.mail.ir2.yahoo.com
 Content-Type: text/plain

 Good morning:

 Could someone please create an R script with the code below and try to
 package it? I always used to manage this, but with the current version of
 R (3.1.0), once the package zip has been built and loaded from the RGui,
 the environment I created (.Ch.env) is not detected. It is as if a package
 could now consist (for practical purposes) only of functions, without
 allowing an underlying data structure such as the one formed by lGlo and
 bStarted, both of which live in the created environment (.Ch.env).

 .Ch.env <- new.env()
 .Ch.env$lGlo <- list()
 .Ch.env$bStarted <- FALSE

 CheckGloCreated <- function() {
   if (.Ch.env$bStarted == TRUE) {
     stop("Data structures were already initialized.", call. = FALSE)
   }
 }

 ChrL.Start <- function() {
   CheckGloCreated()
   .Ch.env$bStarted <- TRUE
   cat("Tested.\n")
 }


 The only peculiarity when packaging is that in Ch-internal.r (if you call
 the package Ch) you have to correct the line generated by
 package.skeleton and replace it with the following:

 .Ch.env <- new.env()



 Thanks in advance.

 Regards, Eva
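
The pattern in this message, a private environment holding mutable state shared by several functions, can be tried outside a package first. A minimal sketch reusing the names from the post; running it as a plain script is an assumption made for illustration, and inside a package the same new.env() call would sit in one of the files under R/:

```r
# Private environment that holds the shared state
.Ch.env <- new.env()
.Ch.env$lGlo <- list()
.Ch.env$bStarted <- FALSE

# Stops with an error if the state was already initialised
CheckGloCreated <- function() {
  if (isTRUE(.Ch.env$bStarted)) {
    stop("Data structures were already initialized.", call. = FALSE)
  }
}

# Initialises the state once; a second call triggers the error above
ChrL.Start <- function() {
  CheckGloCreated()
  .Ch.env$bStarted <- TRUE
  cat("Tested.\n")
}

ChrL.Start()   # first call flips the flag

# Second call: capture the error message instead of aborting the script
second <- tryCatch(ChrL.Start(), error = function(e) conditionMessage(e))
print(second)
```

Because environments are modified by reference, the flag set inside ChrL.Start() is visible to CheckGloCreated() on the next call without any global assignment.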


Re: [R-es] La lista está para ayudar...

2014-07-09 Thread José Antonio Palazón Ferrando

Hello:

For a bit of history, we should remember, at least the oldest hands here,
that this list is meant to help those who have problems and find it hard
to turn to the specialised lists, whether because of the language or
because of a lack of experience with R, or even with more applied matters.

It is a list for helping one another without resorting to the old "read
the f... manual"; I think that from the beginning the list has run with
unbeatable cordiality and that help has been given at every level;
besides, I am sure it has fostered some personal relationships.

I hope to go on being useful, above all to newcomers, although there are
always fingers and keyboards quicker than mine :o(

Greetings and thanks to all list members: the askers, the answerers and
the arguers ;-)


On 09/07/14 09:53, miguel.angel.rodriguez.mui...@sergas.es wrote:

Hello Ernesto.

Since you are thinking of leaving the list, I would (first) go ahead and ask
something (one of those questions you have been saving for the right forum).
What do you have to lose?
:-)


Regards,

Miguel Ángel Rodríguez Muíños
Dirección Xeral de Innovación e Xestión da Saúde Pública
Consellería de Sanidade
Xunta de Galicia
http://dxsp.sergas.es







-Original Message-
From: r-help-es-boun...@r-project.org [mailto:r-help-es-boun...@r-project.org]
On behalf of ERNESTO JIMENEZ GARRIDO
Sent: Wednesday, 9 July 2014 8:01
To: r-help-es@r-project.org
Subject: Re: [R-es] Resumen de R-help-es, Vol 65, Envío 13

Dear all,
I signed up to this mailing list in order to make progress in learning R and
R-Commander. My limited knowledge makes me see that this is not the forum I
need; its level is too advanced for me.
If you could point me to a suitable forum, one where I could discuss problems
with the first steps in R and R-Commander, I would be grateful. Thank you in
advance. I wish you well in your slow progress with this common, universal
and free program.

Ernest Jiménez Garrido
Professor Associat UB




___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


--


José Antonio Palazón Ferrando
Profesor Titular. Departamento de Ecología e Hidrología.
Facultad de Biología. Universidad de Murcia.
Campus Universitario de Espinardo
30100 MURCIA-SPAIN
Telf: +34 868 88 49 80
Fax : +34 868 88 39 63
Email: pala...@um.es



Re: [R-es] error com un archivo

2014-07-09 Thread Carlos J. Gil Bellosta
Hello, how are you?

Are the conditions that define the four data sets mutually exclusive? I
would try to check whether there are rows that satisfy more than one of
them.

You could create a vector of labels at the start that defines which rows go
to each data set. That would guarantee that each row ends up in one and
only one of them. You would also get a cleaner design that is easier to
read and maintain.
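
The label-vector idea can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the time of day is reduced to a numeric hour column, and the thresholds and the direction of each comparison are guesses, since the operators in the original script did not survive in the archive:

```r
# Simulated stand-in for the VMS data; column names follow the original script
set.seed(42)
n <- 1000
DBx <- data.frame(
  calcSpeed = runif(n, 0, 10),   # speed
  distHb    = runif(n, 0, 20),   # distance to harbour
  hour      = runif(n, 0, 24)    # hypothetical numeric time of day
)

# Assumed limits
speedFishing <- 2.0
speedHarb    <- 1.0
distHbRule   <- 3.0
minHour      <- 6
maxHour      <- 21

in_harbour    <- DBx$distHb <= distHbRule & DBx$calcSpeed <= speedHarb
slow_offshore <- DBx$calcSpeed <= speedFishing & DBx$distHb > distHbRule
daytime       <- DBx$hour > minHour & DBx$hour <= maxHour

# Apply the rules in priority order; a row that already has a label is
# never relabelled, so every row ends up with exactly one state.
state <- rep(NA_character_, n)
state[in_harbour] <- "Harbour"
state[is.na(state) & slow_offshore & daytime]  <- "Fishing"
state[is.na(state) & slow_offshore & !daytime] <- "Night"
state[is.na(state)] <- "Steaming"

DBx$State <- factor(state,
                    levels = c("Harbour", "Steaming", "Fishing", "Night"))
counts <- table(DBx$State)
print(counts)
print(sum(counts) == nrow(DBx))
```

With a single State column, the per-state subsets can be obtained with split(DBx, DBx$State), and their sizes necessarily add up to the number of rows.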

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com




2014-07-09 13:56 GMT+02:00 Marta valdes lopez martavalde...@gmail.com:

 Hello everyone,

 I would like to ask for your help in finding an error that I cannot track
 down in this file. I have checked and tried everything a thousand times and
 cannot find it. I am sharing the file through Google Drive because it is
 very large.

  monicap_50.csv
 https://docs.google.com/file/d/0B8o2KrPEgG7ATlBMc19lTVk1d3M/edit?usp=drive_web

 This is the script. What I do not understand is the following: I have
 592044 records, and after removing the NAs 586561 remain. When I run the
 script, the sum z2+z4+z5+z6 (the states) should equal z1, the total number
 of records, but because of some error I cannot find I get more records than
 there are. I have compared with the file in Excel and the NA counts are
 correct.

 library(chron)
 library(xlsx)
 filename <- "monicap_50.csv"
 DBxy <- read.csv(filename, sep = ";", header = TRUE, dec = ",")
 DBx <- na.omit(DBxy)
 names(DBx) <- c("Boat", "DateTime", "TimeDiff", "Latitude", "Longitude",
                 "Course", "Speed", "distNm", "calcSpeed", "calcCourse", "distHb",
                 "Harbour", "idTrip", "vmsAngle", "calcAngle", "vmsLeg", "calcLeg",
                 "Trip_vmsLeg", "Trip_calcLeg", "lengthTrip", "lengthTrip_vmsLeg",
                 "lengthTrip_calcLeg", "Time", "Date")
 # Formatting date and time variables
 DBx$Date <- strptime(DBx$Date, "%d-%m-%Y")
 DBx$Year <- as.POSIXlt(DBx$Date)$year + 1900
 if (filename != "monicap_50.csv") {DBx$Time <- paste(DBx$Time, ":00", sep = "")}   # NOT necessary for Monicap and Univerest_50
 DBx$Time <- times(DBx$Time)   # Works for Monicap AND UNIVEREST_50 ONLY
 DBx$Boat <- gsub("^\\s+|\\s+$", "", DBx$Boat)
 # Read file with boat codes and gears
 codeBoats <- read.csv("CODES_2002-2010New.csv", sep = ",", header = TRUE)   # Laptop
 codeBoats$CODIGO <- gsub("^\\s+|\\s+$", "", codeBoats$CODIGO)
 # Assigning a fishing license based on Boat and Year
 DBx$gear <- codeBoats$Lic[match(paste(DBx$Boat, DBx$Year),
                                 paste(codeBoats$CODIGO, codeBoats$Year))]
 z0 <- length(DBx$gear)
 z1 <- length(DBx$gear)
 z1
 # Defining speed and distance limits
 speedFishing <- 2.0
 speedHarb <- 1.0
 distHbRule <- 3.0
 speedSteam <- 2.0
 minTime <- times(c("05:59:59"))   # usual beginning of fishing operations
 maxTime <- times(c("20:59:59"))   # usual finishing of fishing operations
 # Selecting Harbour
 DBharbour <- na.omit(DBx[DBx$distHb <= distHbRule &
                          DBx$calcSpeed <= speedHarb, ])
 DBharbour$State <- "Harbour"   # MONICAP= 10618; UNIVER1= ; UNIVER2= ; UNIVEREST= 1028
 z2 <- length(DBharbour$State)
 # Selecting Steaming
 DBsteaming <- na.omit(DBx[(DBx$calcSpeed > speedFishing) |
                           (DBx$distHb <= distHbRule & DBx$calcSpeed > speedHarb), ])
 DBsteaming$State <- "Steaming"   # MONICAP= 88398; UNIVER1= ; UNIVER2= ; UNIVEREST= 53748
 DBsteaming$Harbour <- ""
 z4 <- length(DBsteaming$State)
 # Selecting Fishing
 DBfishing <- na.omit(DBx[(DBx$calcSpeed <= speedFishing &
                           DBx$distHb > distHbRule & DBx$Time > minTime & DBx$Time <= maxTime), ])
 DBfishing$State <- "Fishing"
 DBfishing$Harbour <- ""
 z5 <- length(DBfishing$State)
 # Selecting night
 DBnight <- na.omit(DBx[(DBx$calcSpeed <= speedFishing &
                         DBx$distHb > distHbRule & (DBx$Time <= minTime | DBx$Time > maxTime)), ])
 DBnight$State <- "Night"   # MONICAP=10434; UNIVER1= 16677; UNIVER2= 25789
 DBnight$Harbour <- ""
 z6 <- length(DBnight$State)

 If anyone spots the error and can lend me a hand I would be grateful; if
 not, I will keep wrestling with the file!

 Many thanks, regards




[R-es] Conversion date a numeric y vuelta a date

2014-07-09 Thread Alberto Soria
Hello everyone:

It must be something silly, but I cannot work out why the following line
does not return the current date:

as.Date(as.numeric(Sys.time()))

I tried this because I cannot manage to turn a number, obtained from a date
and then modified, back into a date.

Thanks in advance.

Regards,
Alberto.



Re: [R-es] Conversion date a numeric y vuelta a date

2014-07-09 Thread Carlos Ortega
Hello,

You get an error because you do not specify the origin, just as the
documentation states:

> as.Date(as.numeric(Sys.time()))
Error in as.Date.numeric(as.numeric(Sys.time())) :
  'origin' must be supplied



-
Details

The usual vector re-cycling rules are applied to x and format so the answer
will be of length that of the longer of the vectors.

Locale-specific conversions to and from character strings are used where
appropriate and available. This affects the names of the days and months.

The as.Date methods accept character strings, factors, logical NA and
objects of classes "POSIXlt" and "POSIXct". (The last is converted to days by
ignoring the time after midnight in the representation of the time in
specified time zone, default UTC.) Also objects of class "date" (from package
date) and "dates" (from package chron). Character strings are processed as
far as necessary for the format specified: any trailing characters are
ignored.

*as.Date will accept numeric data (the number of days since an epoch),
but only if origin is supplied.*

The format and as.character methods ignore any fractional part of the date.
-

Regards,
Carlos Ortega
www.qualityexcellence.es





On 9 July 2014, 14:25, Alberto Soria alberto.so...@ari-solar.es
wrote:

 Hello everyone:

 It must be something silly, but I cannot work out why the following line
 does not return the current date:

 as.Date(as.numeric(Sys.time()))

 I tried this because I cannot manage to turn a number, obtained from a date
 and then modified, back into a date.

 Thanks in advance.

 Regards,
 Alberto.



Re: [R-es] Conversion date a numeric y vuelta a date

2014-07-09 Thread Jorge I Velez
Hello Alberto,

You need

as.Date(as.numeric(as.Date(Sys.time())), origin = '1970-01-01')

This part

as.numeric(as.Date(Sys.time()))
#  16260

gives you the number of days elapsed since 1 Jan 1970.  Then, using that
date as the origin, you obtain the current date.
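
The original line also fails for a second reason: as.numeric(Sys.time()) counts seconds, whereas as.Date() interprets a number as days. A minimal sketch showing both round trips, using a fixed timestamp (an assumption, chosen so that the result is reproducible) instead of Sys.time():

```r
# Fixed timestamp standing in for Sys.time()
now <- as.POSIXct("2014-07-09 12:25:00", tz = "UTC")

secs <- as.numeric(now)            # seconds since 1970-01-01 00:00:00 UTC
days <- as.numeric(as.Date(now))   # whole days since 1970-01-01

# Modify the numbers, then convert each back with the matching function
next_week <- as.Date(days + 7, origin = "1970-01-01")
next_hour <- as.POSIXct(secs + 3600, origin = "1970-01-01", tz = "UTC")

print(days)
print(next_week)
print(format(next_hour, "%Y-%m-%d %H:%M:%S"))
```

A number of days goes back through as.Date(), a number of seconds through as.POSIXct(); mixing the two produces a date far in the future.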

Regards,
Jorge.-


2014-07-09 22:25 GMT+10:00 Alberto Soria alberto.so...@ari-solar.es:

 Hello everyone:

 It must be something silly, but I cannot work out why the following line
 does not return the current date:

 as.Date(as.numeric(Sys.time()))

 I tried this because I cannot manage to turn a number, obtained from a date
 and then modified, back into a date.

 Thanks in advance.

 Regards,
 Alberto.



Re: [R-es] Conversion date a numeric y vuelta a date

2014-07-09 Thread Alex
Hello Alberto,

the following link may give you some pointers towards the solution:
http://www.noamross.net/blog/2014/2/10/using-times-and-dates-in-r---presentation-code.html

I hope it is useful.
Regards,

-- 

Alexandre Alonso Fernández
Instituto de Investigaciones Marinas, IIM-CSIC
Departamento de Recursos y Ecología Marina
Grupo de Ecología Pesquera

http://www.iim.csic.es/pesquerias


https://twitter.com/FisheriesIIM
