from:"Matthew"

[R] datatable using dt not able to print background colors

2022-12-16 Thread Matthew Pirritano

Hey, all!

I've got a report that uses datatable from DT to create an rmarkdown html that 
looks great as an html but when I try to print it, to a printer, or to a pdf 
the colors I've assigned to cells are not displaying. I'm using chrome and I've 
clicked on the Background graphics button there, but that doesn't help print 
the colors. I have tried to run the datatable section of the code using results 
= 'asis' and eliminating results = 'asis'. Neither seems to help with the

My css style at the top of the rmarkdown is


.main-container {
  max-width: 1500px;
  margin-left: auto;
  margin-right: auto;
  table.display td { white-space: wrap; }

}
td{
  -webkit-print-color-adjust:exact !important;
  print-color-adjust:exact !important;
}


I added the webkit bit based on what I've found online. Maybe I have something 
set up incorrectly there? Any ideas or thoughts on how to get this to print the 
background colors?

Thanks
matt






This communication is intended for the use of the individual or entity to which 
it is addressed and may contain information that is privileged, confidential or 
otherwise exempt from disclosure under applicable law. If you are not the 
intended recipient, you are hereby notified that any dissemination, 
distribution or copying of this communication is strictly prohibited. If you 
have received this communication in error, please notify the sender and delete 
any copies. Thank you.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Differential Gene Expression in R

2021-08-22 Thread Matthew McCormack

  You can look into the edgeR vignette. To get the vignette type 
'vignette("edgeR")' in the R command line. Also, just type 'vignette()' 
and R will list all the vignette's for your loaded packages.  Vignettes 
often have a model analysis that you can follow along and try to adjust 
to your specific data. There is also Biostars,  
https://www.biostars.org/ . However, I doubt you will find anyone on an 
online forum that will walk you through the whole analysis. Although, 
there is probably only 10 plus or minus 4 commands for the whole analysis.


    Alternatively, if you click on the URL you provided below, and at 
the bottom of that page click on 'SRA Run Selector', scroll down a 
little on the page you get to and select the runs you want to analyze by 
checking the appropriate boxes, then click on the grey box on the right 
that has the word 'Galaxy' in it, and it will load your selected runs 
into an instance of Galaxy in which it is a little easier to analyze 
data than on the R command line.


   In the leftmost column of the galaxy page, scroll down to Genomics 
Analysis and then click RNA-seq and scroll down a little and you will 
see that edgeR is available. You will still have to learn a little about 
edgeR analysis, so reading the vignette will be very helpful.


   Also, for the comparisons you want to do, statistical help is 
recommended.


Matthew

On 8/22/21 2:13 PM, Anas Jamshed wrote:

 External Email - Use Caution

  I have downloaded data from:
https://secure-web.cisco.com/11QZcUaPohN9T-S3dXC_GmXle9LtWOwH3EZzb3DhLTvve9_5ltt1RpGGssjgmLGBrEaZGEhesLze6XzCJazVRBgu4xc8kHortjlXtfoXyWlsSXouXicfjhSkh_t-WWivcXHpnTvUtVtq9wEKnxWPCPFNu9hprFt91ho02_8XiRAYDkVLcT76BhLbTleUjEezCPbuh9ieLGA6MVW9oiqYERXpYc2dL-KmvVBER3bd-7KiXJJngxji9kbJDDmm-Irysc8aUWDHZZpWkIB8yT_HFAg/https%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fgeo%2Fquery%2Facc.cgi%3Facc%3DGSE162562%26fbclid%3DIwAR0iZQhttG8HzGhFIIMWbFgNszQrVDgiyVChYzQ_ypCx_d-1pn_tm7STjGs

and now I want to compare:
healthy vs Mild healthy vs Highly exposed seronegative (ishgl) Healthy vs
Asymptomatic covid19 patient healthy vs Highly exposed seronegative (non
ishgl)  from this data.

I started  like  :

library(edgeR)
library(limma)
library(GEOquery)
library(Biobase)

Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2)

setwd("D:\\")

untar("GSE162562_RAW.tar")

filelist = list.files(pattern = ".*.txt.gz")



But after getting text files I don't know how to proceed further. I want to
find degs from these files  *Plz help me *

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://secure-web.cisco.com/1tH9oPhtwGwMZPdSa6iYRfgPKcDjB0RbwLvAsZlByhBsnZOnWMGyfAJedegd7zgzjhBGoJR4l667r5yELyZUobz_rb-7cCszSEx-M4al0kObEUewwS1-66OaSN7ZHYe8OS9Oz6xG6KzS1XBqB5GDyXiA8FMoIEfaq49EamqyjBtwwgsNKpMdy2IyCTZ2dSL_cdkkD5dacTj5gg4PLprBua7uc32IM4bJmSXSAMxd31lqPP9m3V83kjORuTO61SZzQOeTSf8g8HwY6bDJLlOATxA/https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help
PLEASE do read the posting guide 
http://secure-web.cisco.com/1qNZPXyZ9T-DwVY58dRhW-s2KI0g8PKqYBjd8eU1WX1DwW8TqASTq2NkdBNjUHF6T9QiEWRKhGinSfo78D3RrHq9hc9HVXYF7t9KAzK-sUNE0Y0IB62wcBJrH8Gd0LS7aus-36dSfndVD9CShsOMfwyMj5KIVQI8sppBOu5xbWhJEYfH3MgGhC_TVJIkQ126GdEuG4wK7xnnBh90fF4tdTJbHmaIWBn4yxPbhSdrYqs7GCgf_Gp4kee0aSyzxk_0WBkd2fPtnz5Ecbqkb1P8C6g/http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Joining data frames

2021-06-29 Thread Matthew McCormack

    I think, but I'm not sure, that when you use merge it basically 
attaches one data frame to the other. I do not think it matches up 
entries from a particular column in each data frame (and I know 
biologists frequently want to match entries from a particular column in 
each data frame). For that, I think you need a join from the dplyr package.


   If you do a right join, then it will use only the entries from the 
second df (the data frame to the right, df1). Entries in df, that are 
not in df1 will not be in the final (in your example the final is df). 
So, from you code, you took df and then joined it to where it had 
entries in df1 and changed df to contain only entries in df that were in 
df1. Had you done a left_join, then your final data frame, df, would 
contain only those entries found originally in df and df1 (entries in 
df1, but not in df would be excluded in the final df).


   You could do a full_join and then all entries (entries in both data 
frames, entries in df but not in df1, and entries in df1 but not in df) 
will be in the final. Maybe something like : (In this case I have 
created a new data frame, df_final, but you could still go with just 
changing df.)


df_final<- full_join(df, df1, by = c(“Sample”, "Plot"))

Matthew


On 6/29/21 7:15 PM, Jim Lemon wrote:

 External Email - Use Caution

Hi Esthi,
Have you tried something like:

df2<-merge(df,df1,by.x="Sample",by.y="Plot",all.y=TRUE)

This will get you a right join in "df2", not overwriting "df".

Jim

On Wed, Jun 30, 2021 at 1:13 AM Esthi Erickson  wrote:

Hi and thank you in advance,

If I have a dataframe, df:

Sample

Plot

Biomass

1

1

1024

1

2

32

2

3

223

2

4

456

3

1


3

2

331

3

3

22151

3

4

1441

And another one, df1:

Sample

Plot

% cover of plant1

% cover of plant2

3

1

32

63

3

2

3


3

3


3

3

4

5

23

I want to join these tables where the columns Sample and Plot are the same.

Currently trying:

df<- right_join(df, df1, by = c(“Sample”, "Plot"))

I am working with a much larger dataset, but it will cut off the data
starting at Sample 3 instead of joining the tables while retaining the
information from df. Any ideas how I could join them this way?


Esthi

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://secure-web.cisco.com/1NdijE3bwtTnJ0kSEgtJU1NlrtOK9zEfac9zyeZv87EuBW5RBFz3d1rdtVoxuWjjEZm2ILfmP1KOs1kEsAOECi2THQ-_HKB9EOJWeI57gQdy8H3UbdNo5_jjkMLPJ7OWuokUT-FJwD84kR0uptsG7XUn_xN9NkAZ4ESV6jXCMs_vWVuqkvXkPRfDV0BBMBQWLKxiQKz-9GYTrcqzWGsCc_A1LB3p6YBnMcOeElnau9pAicwrSrzqbNayjDWgW75J91dn1Bpb7rhV4xLELl_KS0g/https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help
PLEASE do read the posting guide 
http://secure-web.cisco.com/1yHZkzpQUOKRg8cDG22MQDxPOC13uXEOgchugGyn3LgrkzeHEY3bJmUM7BdgniFNPIUlVK9c26rAxELBoKzCk3QtR375fxo8PTFptWSOByZg9wWZw8ounbb3NvkgZApJHaDn6KCFRf4ym05BIQUG039oUDbsdBh6fa5LNBsdgTIGVetQokelOMdncVxIv_g233z1CF1xfAozJ9-8eetgqhSIh1lRMlheHhpVRDzkSbxAij8APSko49XhpHmsqwOevGN0c3vHgLT2dLLAzvO_ZLA/http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://secure-web.cisco.com/1NdijE3bwtTnJ0kSEgtJU1NlrtOK9zEfac9zyeZv87EuBW5RBFz3d1rdtVoxuWjjEZm2ILfmP1KOs1kEsAOECi2THQ-_HKB9EOJWeI57gQdy8H3UbdNo5_jjkMLPJ7OWuokUT-FJwD84kR0uptsG7XUn_xN9NkAZ4ESV6jXCMs_vWVuqkvXkPRfDV0BBMBQWLKxiQKz-9GYTrcqzWGsCc_A1LB3p6YBnMcOeElnau9pAicwrSrzqbNayjDWgW75J91dn1Bpb7rhV4xLELl_KS0g/https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help
PLEASE do read the posting guide 
http://secure-web.cisco.com/1yHZkzpQUOKRg8cDG22MQDxPOC13uXEOgchugGyn3LgrkzeHEY3bJmUM7BdgniFNPIUlVK9c26rAxELBoKzCk3QtR375fxo8PTFptWSOByZg9wWZw8ounbb3NvkgZApJHaDn6KCFRf4ym05BIQUG039oUDbsdBh6fa5LNBsdgTIGVetQokelOMdncVxIv_g233z1CF1xfAozJ9-8eetgqhSIh1lRMlheHhpVRDzkSbxAij8APSko49XhpHmsqwOevGN0c3vHgLT2dLLAzvO_ZLA/http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] analyzing results from Tuesday's US elections

2020-11-16 Thread Matthew McCormack

Bye the way, I thought I had checked my e-mail before sending it, but my
last e-mail had an unfortunate typo with an 'I' that originally belonged
to the beginning of a deleted sentence.

Matthew

On 11/17/20 1:54 AM, Matthew McCormack wrote:

External Email - Use Caution
No reason to apologize. It's a timely and very interesting topic
that provides a glimpse into the application of statistics in
forensics. I had never heard of Benford's Law before and I think it is
really fascinating. One of those very counter intuitive rules that
show up in statistics and probability; like the Monty Hall problem.
Why in the world does Benford's Law work ? I have been wondering if
it could in any way be applied to biological data analysis. (Also, I
discovered Stand-up-maths !).

Often things are not as easy to figure out as we may first
estimate. I think you would have to start with how you would envision
a fraud to be committed and then figure out if there is a statistical
analysis that could detect it, or develop an anlalysis. For example,
if a voting machine were weighting votes and giving 8/10ths of a vote
to 'yes' and 10/10ths vote to a 'no'. Is there some statistical
analysis that could detect this ? I, Or if someone dumped a couple of
thousand fraudulent ballots in a vote counting center, is there some
statistical analysis that could detect this ? Who knows, maybe a
whole new field waiting to be explored. A oncee-in-a-while dive into a
practical application of statistics that has current interest can be
fun and enlightening for those interested.

Matthew

On 11/16/20 9:01 PM, Abby Spurdle wrote:

External Email - Use Caution

I've come to the conclusion this whole thing was a waste of time.
This is after evaluating much of the relevant information.

The main problem is a large number of red herrings (some in the data,
some in the context), leading pointless data analysis and pointless
data collection.
It's unlikely that sophisticated software, or sophisticated
statistical modelling tools will make any difference.
Although pretty plots, and pretty web-graphics are achievable.

Sorry list, for encouraging this discussion...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://secure-web.cisco.com/1icMQVewwCL4P0r0nMcvTG7cQoLGA8vrClXS_7PuCMhfAP5EDlSYNlGppDKYtdY57R0Pqq_TLC4uyH7CSQjzrxbWonQqTR0d7Owzt1oJUshxqjBaYybtXPytcEKTyGL0Wj0aNw-lMCtbQG1wHYe2Gw8r8h0LpQfFihvpv8gyl3L3VpdCfL2GdiuVFUHGynOFY8Lu5fZwQDVdp1bN_ZAAbRHhoQEipiM-vRiK0kf20oD1N3CXQfqyS4O2r9kRmArVLk8RiqyHI0rj_I1iVq5m-bQ/https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help

PLEASE do read the posting guide
http://secure-web.cisco.com/1K7htkVeCfn5qRcheVmtA1IibcAUehTMiQa-HWmOXY4aZKKdTMqGoB7oWO4dEEBc1qJDtaTeaodidutGZhJexhH2C4c_FpLR_XA-z7GOvfq77dIwhWfnGcvj_31a6y-SXgu5nPP4AdpguRqwR433dZOUMo5MtP5xwtOUGO-EcWd4AvW_7NUFljEFGuAMs06pzQoK4BPfSavqq_QAj-R_mHJ4-AgaKn2Fmh2BOhustujXNyeeWi6KXg3oXtQzqi6BL4HMEK7iWvT21SPXOEJZlMg/http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html

and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] analyzing results from Tuesday's US elections

2020-11-16 Thread Matthew McCormack

 No reason to apologize. It's a timely and very interesting topic 
that provides a glimpse into the application of statistics in forensics. 
I had never heard of Benford's Law before and I think it is really 
fascinating. One of those very counter intuitive rules that show up in 
statistics and probability; like the Monty Hall problem. Why in the 
world does Benford's Law work ?  I have been wondering if it could in 
any way be applied to biological data analysis. (Also, I discovered 
Stand-up-maths !).


   Often things are not as easy to figure out as we may first estimate. 
I think you would have to start with how you would envision a fraud to 
be committed and then figure out if there is a statistical analysis that 
could detect it, or develop an anlalysis. For example, if a voting 
machine were weighting votes and giving 8/10ths of a vote to 'yes' and 
10/10ths vote to a 'no'. Is there some statistical analysis that could 
detect this ?  I, Or if someone dumped a couple of thousand fraudulent 
ballots in a vote counting center, is there some statistical analysis 
that could detect this ?  Who knows, maybe a whole new field waiting to 
be explored. A oncee-in-a-while dive into a practical application of 
statistics that has current interest can be fun and enlightening for 
those interested.


Matthew

On 11/16/20 9:01 PM, Abby Spurdle wrote:

 External Email - Use Caution

I've come to the conclusion this whole thing was a waste of time.
This is after evaluating much of the relevant information.

The main problem is a large number of red herrings (some in the data,
some in the context), leading pointless data analysis and pointless
data collection.
It's unlikely that sophisticated software, or sophisticated
statistical modelling tools will make any difference.
Although pretty plots, and pretty web-graphics are achievable.

Sorry list, for encouraging this discussion...


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] analyzing results from Tuesday's US elections

2020-11-15 Thread Matthew McCormack

  I really like this guy's video as well. (He also has another nice 
video critiquing a statistical analysis of vote results from Kent 
county, Michigan that was presented by a Massachusetts Senate candidate, 
who has some impressive academic credentials. )


  And continuing in this same vein of the complexities of statistical 
analysis by intelligent people here is a video by Mark Nigrini using 
Benfords analysis on Maricopa County vote results.


https://www.youtube.com/watch?v=FrJui5d7BrI_channel=MarkNigrini

    If you search for Mark Nigrini on Amazon you will see that he has 
written a major text on Forensic Analysis, specifically forensic 
accounting investigations, that is now in its second edition as well as 
an additional two books on analysis with Benford's Law for accounting, 
auditing, and fraud detection (He plugs the text in the last part of the 
video). All four books have 4-5 star reviews with 2-48 reviewers. From 
the tiny amount of reading I have done on Benford's Law, it seems that 
Nigirini is a leading figure in the use of Benford's Law. In the video 
he shows that voting results for both Trump and Biden from Maricopa 
county AZ both agree with Benfords Law. However, he uses the last digit 
and not the first. A word of caution before you click on that link: he 
uses Excel !


Matthew

On 11/13/20 9:59 PM, Rolf Turner wrote:

 External Email - Use Caution

On Thu, 12 Nov 2020 01:23:06 +0100
Martin Møller Skarbiniks Pedersen  wrote:


Please watch this video if you wrongly believe that Benford's law
easily can be applied to elections results.

https://secure-web.cisco.com/1nXQfJ050onRLM1UOwgj-z0o0L3Hj6hd0rCZ7zMpqnBfCDuZcCkxAJZnj7o7Z8ZAUVxYBTf5FBjL2Y-Ca8T_ecO-N54S0KhgRtLoVDgxiEKX9N7eqzuxO0k0HloVcc2lXrXFNAiansI8zHgyUS4gTdKtRsJCHttTn5bwmV8J7d0_6iqrjee_toWiGnTsDSFaKVkev7tKKV3ERLFwzTPtNf2Rm99EBbdA75FvsXfBk3WXuVop4GZbN3ZGkd2SssFJaw9AgTHmM1k3C2bnB_STO_w/https%3A%2F%2Fyoutu.be%2Fetx0k1nLn78

Just watched this video and found it to be delightfully enlightening
and entertaining.  (Thank you Martin for posting the link.)

However a question springs to mind:  why is it the case that Trump's
vote counts in Chicago *do* seem to follow Benford's law (at least
roughly) when, as is apparently to be expected, Biden's don't?

Has anyone any explanation for this?  Any ideas?

cheers,

Rolf Turner



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] analyzing results from Tuesday's US elections

2020-11-09 Thread Matthew McCormack



Benford Analysis for Data Validation and Forensic Analytics

Provides tools that make it easier to validate data using Benford's Law.

https://www.rdocumentation.org/packages/benford.analysis/versions/0.1.5


Matthew

On 11/9/20 9:23 AM, Alexandra Thorn wrote:
>  External Email - Use Caution
>
> This thread strikes me as pretty far off-topic for a forum dedicated to
> software support on R.
>
> https://secure-web.cisco.com/15MzwKoUQfDzeGBDx9gweXKgiYtAPv1UlnW2dg9CuDtSNWgxy3ffTf_uuPizbjoJnovoOD6lrPDluOgGvIUTEF1d_rOTfaF3nUKLvNiZa3fHZ_IHD-SjKotr4lurHjmNPlSrljLipPsrDk2aoo63-GLwvaw64By_MnLST7lt4FgA2pYXgE3x15Xn-kRZ85m29f0BxhHJMVfilvVUoUEBPrw/https%3A%2F%2Fwww.r-project.org%2Fmail.html%23instructions
> "The ‘main’ R mailing list, for discussion about problems and solutions
> using R, announcements (not covered by ‘R-announce’ or ‘R-packages’,
> see above), about the availability of new functionality for R and
> documentation of R, comparison and compatibility with S-plus, and for
> the posting of nice examples and benchmarks. Do read the posting guide
> before sending anything!"
>
> https://secure-web.cisco.com/1V05G8mWSPHU-YvLbL-UQMy49XX7n7-EivE-gTOlh2nZ3P0oxp6DGUUZQ_Q5VIkE3J0qmhrrSXxJaqZjv-Tllghba8lQrbkazuAHTcltsfo3I-C-SMqhb-CDdFbeEgIsr7py_gKW9BqumTZacywhHVnzhGGR2s1A-2akqQLYSYpYeX5EcVJAYvX1KPCs9kJbOEveOr5yYjetokaZpLTzdMA/https%3A%2F%2Fwww.r-project.org%2Fposting-guide.html
> "The R mailing lists are primarily intended for questions and
> discussion about the R software. However, questions about statistical
> methodology are sometimes posted. If the question is well-asked and of
> interest to someone on the list, it may elicit an informative
> up-to-date answer. See also the Usenet groups sci.stat.consult (applied
> statistics and consulting) and sci.stat.math (mathematical stat and
> probability)."
>
> On Mon, 9 Nov 2020 00:53:46 -0500
> Matthew McCormack  wrote:
>
>> You can try here: 
>> https://secure-web.cisco.com/17WRivozTB0Frts23cTlTBd3SYWzVXQsLa_jDRN8SldAl35F0SYXRMZczzIXrQFTzbfRV4YfPOVhMSwopcdTU9Sva396s3bX3-KM7-51KjSnY0aXxlADYaHdvs4y4YXrUfk1GT2801rVL26MCEEn2E1azdQ8ECllu1roS_Z8MIj8d6kyCtUYVdOYN1i9DuWBSXPlEi-iOtrQsBp6ELRXNFw/https%3A%2F%2Fdecisiondeskhq.com%2F
>>
>> I think they have what you are looking for. From their website:
>>
>> "Create a FREE account to access up to the minute election results
>> and insights on all U.S. Federal elections. Decision Desk HQ &
>> Øptimus provide live election night coverage, race-specific results
>> including county-level returns, and exclusive race probabilities for
>> key battleground races."
>>
>>      Also, this article provides a little, emphasis on little, of
>> statistical analysis of election results, but it may be a place to
>> start.
>>
>> https://secure-web.cisco.com/1JA34S9tw27K78g7scwo2aGe4lPpV7HThBE81hhJjb4Ban7fxqbnOZqx7HxfcyqKrcB5BX7oJFHhMPumrxjm6aQJ0trW1Jgk0h9s2mNhZg4T_gTUls8y4l0KZ-AstUtw0eC0TtR9mHblU7KWid-7OO4mg0TfsxWyNpcqkA8MBuGftOEgUF7WtakShYgmCNYJkEfQJHK5_vjwK0taJeUheVw/https%3A%2F%2Fwww.theepochtimes.com%2Fstatistical-anomalies-in-biden-votes-analyses-indicate_3570518.html%3Futm_source%3Dnewsnoe%26utm_medium%3Demail%26utm_campaign%3Dbreaking-2020-11-08-5
>>
>> Matthew
>>
>> On 11/8/20 11:25 PM, Bert Gunter wrote:
>>>   External Email - Use Caution
>>>
>>> NYT  had interactive maps that reported  votes by county. So try
>>> contacting them.
>>>
>>>
>>> Bert
>>>
>>> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle 
>>> wrote:
>>>>> such a repository already exists -- the NY Times, AP, CNN, etc.
>>>>> etc.
>>>> already have interactive web pages that did this
>>>>
>>>> I've been looking for presidential election results, by
>>>> ***county***. I've found historic results, including results for
>>>> 2016.
>>>>
>>>> However, I can't find such a dataset, for 2020.
>>>> (Even though this seems like an obvious thing to publish).
>>>>
>>>> I suspect that the NY Times has the data, but I haven't been able
>>>> to work where the data is on their website, or how to access it.
>>>>
>>>> More ***specific*** suggestions would be appreciated...?
>>>>   
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://secure-web.cisco.com/1C8m4dUQtDXEQdbAFTH153ehiJcvHuL_FkvDGeJBHhMRYZauAp6gdevfmLIh2MLpRjBx7LXAG9QpagRV63oMY5AyQF6uOkNa7JGw-0zGZKIFHoSuZtjpcIokATDMxqoJlVfCiktqIYXEiJcro

Re: [R] analyzing results from Tuesday's US elections

2020-11-08 Thread Matthew McCormack

You can try here: https://decisiondeskhq.com/

I think they have what you are looking for. From their website:

"Create a FREE account to access up to the minute election results and 
insights on all U.S. Federal elections. Decision Desk HQ & Øptimus 
provide live election night coverage, race-specific results including 
county-level returns, and exclusive race probabilities for key 
battleground races."

    Also, this article provides a little, emphasis on little, of 
statistical analysis of election results, but it may be a place to start.

https://www.theepochtimes.com/statistical-anomalies-in-biden-votes-analyses-indicate_3570518.html?utm_source=newsnoe_medium=email_campaign=breaking-2020-11-08-5

Matthew

On 11/8/20 11:25 PM, Bert Gunter wrote:
>  External Email - Use Caution
>
> NYT  had interactive maps that reported  votes by county. So try contacting
> them.
>
>
> Bert
>
> On Sun, Nov 8, 2020, 8:10 PM Abby Spurdle  wrote:
>
>>> such a repository already exists -- the NY Times, AP, CNN, etc. etc.
>> already have interactive web pages that did this
>>
>> I've been looking for presidential election results, by ***county***.
>> I've found historic results, including results for 2016.
>>
>> However, I can't find such a dataset, for 2020.
>> (Even though this seems like an obvious thing to publish).
>>
>> I suspect that the NY Times has the data, but I haven't been able to
>> work where the data is on their website, or how to access it.
>>
>> More ***specific*** suggestions would be appreciated...?
>>
>   [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://secure-web.cisco.com/1C8m4dUQtDXEQdbAFTH153ehiJcvHuL_FkvDGeJBHhMRYZauAp6gdevfmLIh2MLpRjBx7LXAG9QpagRV63oMY5AyQF6uOkNa7JGw-0zGZKIFHoSuZtjpcIokATDMxqoJlVfCiktqIYXEiJcrovbnxo-DAgLEiREocQrn0yMbLc2A-gwR3CN9XurWkU21TUD1CLJ-3gpiCLKKe9BdHWdaeEA/https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help
> PLEASE do read the posting guide 
> http://secure-web.cisco.com/1ppZyk8SO6U25PKNDKtGQ-VIADLxXgKvnHc8QlV3cUMNPzLQvS8E0i9cg05EyzUyHnFjj2QWDjvAjyuduvE1P8Nr0TogQweiuBysM9a1rXjQn1EOaypHdqwa2_inODK1icu0Ff33AZDB00N4x-nYxZ2e16nArVuaMEddaLXBhtBYMn2LAcPYJ8s2wGN10heiFWywn-r8--Hw77GJx1hkTgg/http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RNA Seq Analysis in R

2020-08-01 Thread Matthew McCormack

As with the previous post, I agree that Bioconductor will be a better 
place to ask this question.

As a quick thought you also might try to adjust the p-value in the last 
line:

DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2). You could play 
around with the P.Value, 0.01 is pretty low, you could try 0.05 and 
maybe abs(logFC) > 1.

But, first you should try to print out tT with something like 
write.table(tT, file = TopTable.txt, sep = "\t").

This will write out tT to a tab-delimited text file (in the directory 
that you are working in) that you can import into Excel and then inspect 
the logFC and p-values for the top 1250 genes.

Matthew

On 8/1/20 1:13 PM, Jeff Newmiller wrote:
>  External Email - Use Caution
>
> https://www.bioconductor.org/help/
>
> On August 1, 2020 4:01:08 AM PDT, Anas Jamshed  
> wrote:
>> I choose microarray data GSE75693 of 30 patients with stable kidney
>> transplantation and 15 with BKVN to identify differentially expressed
>> genes
>> (DEGs). I performed this in GEO2R and find R script there and Runs R
>> script
>> Successfully on R studio as well. The R script is :
>>
>> # Differential expression analysis with limma
>>
>> library(Biobase)
>> library(GEOquery)
>> library(limma)
>> # load series and platform data from GEO
>>
>> gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)if
>> (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
>> <- 1
>> gset <- gset[[idx]]
>> # make proper column names to match toptable
>> fvarLabels(gset) <- make.names(fvarLabels(gset))
>> # group names for all samples
>> gsms <- paste0("00XXX1",
>> "11XXX")
>> sml <- c()for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
>> # eliminate samples marked as "X"
>> sel <- which(sml != "X")
>> sml <- sml[sel]
>> gset <- gset[ ,sel]
>> # log2 transform
>> exprs(gset) <- log2(exprs(gset))
>> # set up the data and proceed with analysis
>> sml <- paste("G", sml, sep="")# set group names
>> fl <- as.factor(sml)
>> gset$description <- fl
>> design <- model.matrix(~ description + 0, gset)
>> colnames(design) <- levels(fl)
>> fit <- lmFit(gset, design)
>> cont.matrix <- makeContrasts(G1-G0, levels=design)
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2, 0.01)
>> tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)
>>
>> tT <- subset(tT,
>> select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
>> DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2)
>>
>> After running this no genes are found plz help me
>>
>>  [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fwd: Re: transpose and split dataframe

2019-05-06 Thread Matthew

Thank you very much Jim and David for your scripts and accompanying 
explanations.

I was intrigued at the results that came from David's script.  As seen 
below where I have taken a small piece of his DataTable:

AT1G69490 AT1G29860    AT4G18170 *AT5G46350*
AT1G01560    0    0    0    1
*AT1G02920*    1    2    2    4
AT1G02930    1    2    2    4
AT1G05675    1    1    1    2

    There are numbers other than 1 or 0, which was not what I was 
expecting. The data I am working with come from downloading results of 
an analysis done at a particular web site. I looked at Jim's solution, 
and the equivalent of the above would be:

     AT1G69490 _AT1G29860_ _AT1G29860_ AT4G18170    AT4G18170 
*AT5G46350    AT5G46350 AT5G46350    AT5G46350    AT5G46350*
AT1G01560    NA    NA    NA    NA    NA    NA NA    NA    AT1G01560    NA
*AT1G02920*    AT1G02920    AT1G02920 AT1G02920    AT1G02920    
AT1G02920    AT1G02920    AT1G02920 AT1G02920    AT1G02920    NA
AT1G02930    AT1G02930    AT1G02930    AT1G02930 AT1G02930    
AT1G02930    AT1G02930    AT1G02930    AT1G02930 AT1G02930    NA
AT1G05675    AT1G05675    AT1G05675    NA AT1G05675    NA    
AT1G05675    AT1G05675    NA    NA    NA

   The above is the format that I was desiring, but I was not expecting 
that a single ATG number would be the name of multiple columns. As shown 
above, _AT1G2960_ is the name of two columns and *AT5G46350* is the name 
of 5 columns (You may have to widen the e-mail across the screen to see 
it clearly). When a single ATG number, such as AT5G46350, names multiple 
columns, then the contents of each of those columns may or may not be 
the same. For example, going across a single row looking at *AT1G02920*, 
it occurs in the first column, hence the 1 in David's DataTable. It 
occurs in both AT1G29860 columns, hence the 2 in the DataTable. It again 
occurs in both AT4G18170 columns, so another 2 in the DataTable, and 
finally it occurs in only 4 of the 5 AT5G46350 columns, so the 4 in the 
DataTable.

     When the same ATG number names multiple columns it is because 
different methods were used to determine the content of each column. So, 
if an ATG number such as AT1G05675 occurs in all columns with the same 
name, I then know that it was by multiple methods that this has been 
shown, and if it only occurs in some of the columns, I know that all 
methods did not associate it with the column name ATG.  David's result 
complements Jim's, and both end up being very helpful to me.

   Thanks again to both of you for your time and help.

Matthew

On 5/2/2019 8:40 PM, Jim Lemon wrote:
>  External Email - Use Caution
>
> Hi again,
> Just noticed that the NA fill in the original solution is unnecessary, thus:
>
> # split the second column at the commas
> hitsplit<-strsplit(mmdf$hits,",")
> # get all the sorted hits
> allhits<-sort(unique(unlist(hitsplit)))
> tmmdf<-as.data.frame(matrix(NA,ncol=length(hitsplit),nrow=length(allhits)))
> # change the names of the list
> names(tmmdf)<-mmdf$Regulator
> for(column in 1:length(hitsplit)) {
>   hitmatches<-match(hitsplit[[column]],allhits)
>   hitmatches<-hitmatches[!is.na(hitmatches)]
>   tmmdf[hitmatches,column]<-allhits[hitmatches]
> }
>
> Jim
>
> On Fri, May 3, 2019 at 10:32 AM Jim Lemon  wrote:
>> Hi Matthew,
>> I'm not sure whether you want something like your initial request or
>> David's solution. The result of this can be transformed into the
>> latter:
>>
>> mmdf<-read.table(text="Regulator hits
>> AT1G69490 
>> AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
>> AT1G29860 
>> AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
>> AT1G2986 
>> AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620

Re: [R] Fwd: Re: transpose and split dataframe

2019-05-01 Thread Matthew

Thank you very much, David and Jim for your work and solutions.

I have been working through both of them to better learn R. They both 
proceed through a similar logic except David's starts with a character 
matrix and Jim's with a dataframe, and both end with equivalent 
dataframes (  identical(tmmdf, TF2list2)) returns TRUE  ). They have 
both been very helpful. However, there is one attribute of my intended 
final dataframe that is missing.

Looking at part of the final dataframe:

  head(tmmdf)
   AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350
1 *AT4G31950* *AT4G31950*   AT5G64905 *AT4G31950* AT5G64905 *AT4G31950*
2 AT5G24110 AT5G24110   AT1G21120 AT5G24110   AT1G14540 AT5G24110
3 AT1G26380 AT1G05675   AT1G07160 AT1G05675   AT1G21120 AT1G05675

Row 1 has *AT4G31950* in columns 1,2,4 and 6, but AT4G31950 in columns 3 
and 5. What I was aiming at would be that each row would have a unique 
entry so that AT4G31950 is row 1 columns 1,2,4 and 6, and NA is row 1 
columns 3 and 5. AT4G31950 is row 2 columns 3 and 5 and NA is row 2 
columns 1,2,4 and 6. So, it would look like this:

  head(intended_df)
   AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350
1 AT4G31950 AT4G31950 NA                AT4G31950       NA         
AT4G31950

2      NA                NA           AT4G31950       NA            
AT4G31950      NA

I have been trying to adjust the code to get my intended result 
basically by trying to build a dataframe one column at a time from each 
entry in the character matrix, but have not got anything near working yet.

Matthew

On 4/30/2019 6:29 PM, David L Carlson wrote
> If you read the data frame with read.csv() or one of the other read() 
> functions, use the asis=TRUE argument to prevent conversion to factors. If 
> not do the conversion first:
>
> # Convert factors to characters
> DataMatrix <- sapply(TF2list, as.character)
> # Split the vector of hits
> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
> # Use the values in Regulator to name the parts of the list
> names(DataList) <- DataMatrix[,"Regulator"]
>
> # Now create a data frame
> # How long is the longest list of hits?
> mx <- max(sapply(DataList, length))
> # Now add NAs to vectors shorter than mx
> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x
> # Finally convert back to a data frame
> TF2list2 <- do.call(data.frame, DataList2)
>
> Try this on a portion of the list, say 25 lines and print each object to see 
> what is happening.
>
> 
> David L Carlson
> Department of Anthropology
> Texas A University
> College Station, TX 77843-4352
>
>
>
>
>
> -Original Message-
> From: R-help  On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 4:31 PM
> To: r-help@r-project.org
> Subject: [R] Fwd: Re: transpose and split dataframe
>
> Thanks for your reply. I was trying to simplify it a little, but must
> have got it wrong. Here is the real dataframe, TF2list:
>
>    str(TF2list)
> 'data.frame':    152 obs. of  2 variables:
>    $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54
> 54 82 82 82 82 82 ...
>    $ hits : Factor w/ 97 levels
> "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>
>      And the first few lines resulting from dput(head(TF2list)):
>
> dput(head(TF2list))
> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
> 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380",
> "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300",
> "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600",
> "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...
>
> This is another way of looking at the first 4 entries (Regulator is
> tab-separated from hits):
>
> Regulator
>     hits
> 1
> AT1G69490
>    
> AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
> 2
> AT1G29860
>    
> AT4G31950,AT5G24110,AT1G0567

[R] Fwd: Re: transpose and split dataframe

2019-04-30 Thread Matthew

Thanks for your reply. I was trying to simplify it a little, but must 
have got it wrong. Here is the real dataframe, TF2list:

  str(TF2list)
'data.frame':    152 obs. of  2 variables:
  $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54 
54 82 82 82 82 82 ...
  $ hits : Factor w/ 97 levels 
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|

__truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...

    And the first few lines resulting from dput(head(TF2list)):

dput(head(TF2list))
structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380",
"AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300",
"AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600",
"AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...

This is another way of looking at the first 4 entries (Regulator is 
tab-separated from hits):

Regulator
   hits
1
AT1G69490

AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
2
AT1G29860

AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135

3
AT1G2986

AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830

    So, the goal would be to

first: Transpose the existing dataframe so that the factor Regulator 
becomes a column name (column 1 name = AT1G69490, column2 name 
AT1G29860, etc.) and the hits associated with each Regulator become 
rows. Hits is a comma separated 'list' ( I do not not know if 
technically it is an R list.), so it would have to be comma 
'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950, 
col 1 row 2 - AT5G24410, etc); like this :

AT1G69490
AT4G31950
AT5G24110
AT1G05675
AT5G64905

... I did not include all the rows)

I think it would be best to actually make the first entry a separate 
dataframe ( 1 column with name = AT1G69490 and number of rows depending 
on the number of hits), then make the second column (column name = 
AT1G29860, and number of rows depending on the number of hits) into a 
new dataframe and do a full join of of the two dataframes; continue by 
making the third column (column name = AT1G2986) into a dataframe and 
full join it with the previous; continue for the 152 observations so 
that then end result is a dataframe with 152 columns and number of rows 
depending on the entry with the greatest number of hits. The full joins 
I can do with dplyr, but getting up to that point seems rather difficult.

This would get me what my ultimate goal would be; each Regulator is a 
column name (152 columns) and a given row has either NA or the same hit.

    This seems very difficult to me, but I appreciate any attempt.

Matthew

On 4/30/2019 4:34 PM, David L Carlson wrote:
>  External Email - Use Caution
>
> I think we need more information. Can you give us the structure of the data 
> with str(YourDataFrame). Alternatively you could copy a small piece into your 
> email message by copying and pasting the results of the following code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say "hits" is 
> a factor with a variable number of elements. If each value of "hits" was a 
> single character string, it would only have 2 factor levels not 6 and your 
> efforts to parse the string would make more sense. Transposing to a data 
> frame would only be possible if each column was padded

[R] transpose and split dataframe

2019-04-30 Thread Matthew

I have a data frame that is a lot bigger but for simplicity sake we can 
say it looks like this:


Regulator    hits
AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
AT2G55980    AT2G85403,AT4G89223

   In other words:

data.frame : 2 obs. of 2 variables
$Regulator: Factor w/ 2 levels
$hits : Factor w/ 6 levels

  I want to transpose it so that Regulator is now the column headings 
and each of the AGI numbers now separated by commas is a row. So, 
AT1G69490 is now the header of the first column and AT4G31950 is row 1 
of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of 
column 2 and AT2G85403 is row 1 of column 2, etc.


  I have tried playing around with strsplit(TF2list[2:2]) and 
strsplit(as.character(TF2list[2:2]), but I am getting nowhere.


Matthew

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Define pch and color based on two different columns

2019-04-09 Thread Matthew Snyder

You are not late to the party. And you solved it!

Thank you very much. You just made my PhD a little closer to reality!

Matt



*Matthew R. Snyder*
*~*
PhD Candidate
University Fellow
University of Toledo
Computational biologist, ecologist, and bioinformatician
Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
matthew.snyd...@rockets.utoledo.edu
msnyder...@gmail.com



[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
04/09/19,
10:01:53 PM

On Tue, Apr 9, 2019 at 9:37 PM Peter Langfelder 
wrote:

> Sorry for being late to the party, but has anyone suggested a minor
> but important modification of the code from stack exchange?
>
> xyplot(mpg ~ wt | cyl,
>   panel = function(x, y, ..., groups, subscripts) {
>   pch <- mypch[factor(carb)[subscripts]]
>   col <- mycol[factor(gear)[subscripts]]
>   grp <- c(gear,carb)
>   panel.xyplot(x, y, pch = pch, col = col)
>   }
> )
>
> From the little I understand about what you're trying to do, this may
> just do the trick.
>
> Peter
>
> On Tue, Apr 9, 2019 at 2:43 PM Matthew Snyder 
> wrote:
> >
> > I am making a lattice plot and I would like to use the value in one
> column
> > to define the pch and another column to define color of points. Something
> > like:
> >
> > xyplot(mpg ~ wt | cyl,
> >data=mtcars,
> >col = gear,
> >pch = carb
> > )
> >
> > There are unique pch points in the second and third panels, but these
> > points are only unique within the plots, not among all the plots (as they
> > should be). You can see this if you use the following code:
> >
> > xyplot(mpg ~ wt | cyl,
> >data=mtcars,
> >groups = carb
> > )
> >
> > This plot looks great for one group, but if you try to invoke two groups
> > using c(gear, carb) I think it simply takes unique combinations of those
> > two variables and plots them as unique colors.
> >
> > Another solution given by a StackExchange user:
> >
> > mypch <- 1:6
> > mycol <- 1:3
> >
> > xyplot(mpg ~ wt | cyl,
> >   panel = function(x, y, ..., groups, subscripts) {
> >   pch <- mypch[factor(carb[subscripts])]
> >   col <- mycol[factor(gear[subscripts])]
> >   grp <- c(gear,carb)
> >   panel.xyplot(x, y, pch = pch, col = col)
> >   }
> > )
> >
> > This solution has the same problems as the code at the top. I think the
> > issue causing problems with both solutions is that not every value for
> each
> > group is present in each panel, and they are almost never in the same
> > order. I think R is just interpreting the appearance of unique values as
> a
> > signal to change to the next pch or color. My actual data file is very
> > large, and it's not possible to sort my way out of this mess. It would be
> > best if I could just use the value in two columns to actually define a
> > color or pch for each point on an entire plot. Is there a way to do this?
> >
> > Ps, I had to post this via email because the Nabble site kept sending me
> an
> > error message: "Message rejected by filter rule match"
> >
> > Thanks,
> > Matt
> >
> >
> >
> > *Matthew R. Snyder*
> > *~*
> > PhD Candidate
> > University Fellow
> > University of Toledo
> > Computational biologist, ecologist, and bioinformatician
> > Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
> > matthew.snyd...@rockets.utoledo.edu
> > msnyder...@gmail.com
> >
> >
> >
> > [image: Mailtrack]
> > <
> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
> >
> > Sender
> > notified by
> > Mailtrack
> > <
> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
> >
> > 04/09/19,
> > 1:49:27 PM
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Define pch and color based on two different columns

2019-04-09 Thread Matthew Snyder

I tried this too:

xyplot(mpg ~ wt | cyl, data=mtcars,
   # groups = carb,
   subscripts = TRUE,
   col = as.factor(mtcars$gear),
   pch = as.factor(mtcars$carb)
)

Same problem...


*Matthew R. Snyder*
*~*
PhD Candidate
University Fellow
University of Toledo
Computational biologist, ecologist, and bioinformatician
Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
matthew.snyd...@rockets.utoledo.edu
msnyder...@gmail.com




[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
04/09/19,
9:28:11 PM

On Tue, Apr 9, 2019 at 8:18 PM Jeff Newmiller 
wrote:

> Maybe you should use factors rather than character columns.
>
> On April 9, 2019 8:09:43 PM PDT, Matthew Snyder 
> wrote:
> >Thanks, Jim.
> >
> >I appreciate your contributed answer, but neither of those make the
> >desired
> >plot either. I'm actually kind of shocked this isn't an easier more
> >straightforward thing. It seems like this would be something that a
> >user
> >would want to do frequently. I can actually do this for single plots in
> >ggplot. Maybe I should contact the authors of lattice and see if this
> >is
> >something they can help me with or if they would like to add this as a
> >feature in the future...
> >
> >Matt
> >
> >
> >
> >*Matthew R. Snyder*
> >*~*
> >PhD Candidate
> >University Fellow
> >University of Toledo
> >Computational biologist, ecologist, and bioinformatician
> >Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
> >matthew.snyd...@rockets.utoledo.edu
> >msnyder...@gmail.com
> >
> >
> >
> >[image: Mailtrack]
> ><
> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
> >
> >Sender
> >notified by
> >Mailtrack
> ><
> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
> >
> >04/09/19,
> >7:52:27 PM
> >
> >On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon  wrote:
> >
> >> Hi Matthew,
> >> How about this?
> >>
> >> library(lattice)
> >> xyplot(mpg ~ wt | cyl,
> >>data=mtcars,
> >>col = mtcars$gear,
> >>pch = mtcars$carb
> >> )
> >> library(plotrix)
> >> grange<-range(mtcars$gear)
> >> xyplot(mpg ~ wt | cyl,
> >>data=mtcars,
> >>col =
> >> color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange),
> >>pch = as.character(mtcars$carb)
> >> )
> >>
> >> Jim
> >>
> >> On Wed, Apr 10, 2019 at 7:43 AM Matthew Snyder 
> >> wrote:
> >> >
> >> > I am making a lattice plot and I would like to use the value in one
> >> column
> >> > to define the pch and another column to define color of points.
> >Something
> >> > like:
> >> >
> >> > xyplot(mpg ~ wt | cyl,
> >> >data=mtcars,
> >> >col = gear,
> >> >pch = carb
> >> > )
> >> >
> >> > There are unique pch points in the second and third panels, but
> >these
> >> > points are only unique within the plots, not among all the plots
> >(as they
> >> > should be). You can see this if you use the following code:
> >> >
> >> > xyplot(mpg ~ wt | cyl,
> >> >data=mtcars,
> >> >groups = carb
> >> > )
> >> >
> >> > This plot looks great for one group, but if you try to invoke two
> >groups
> >> > using c(gear, carb) I think it simply takes unique combinations of
> >those
> >> > two variables and plots them as unique colors.
> >> >
> >> > Another solution given by a StackExchange user:
> >> >
> >> > mypch <- 1:6
> >> > mycol <- 1:3
> >> >
> >> > xyplot(mpg ~ wt | cyl,
> >> >   panel = function(x, y, ..., groups, subscripts) {
> >> >   pch <- mypch[factor(carb[subscripts])]
> >> >   col <- mycol[factor(gear[subscripts])]
> >> >   grp <- c(gear,carb)
> >> >   panel.xyplot(x, y, pch = pch, col = col)
> >> >   }
> >> > )
> >> >
> >> > This solution has t

Re: [R] Define pch and color based on two different columns

2019-04-09 Thread Matthew Snyder

I want to have one column in a dataframe define the color and another
define the pch.

This can be done easily with a single panel:

xyplot(mpg ~ wt,
   data=mtcars,
   col = mtcars$gear,
   pch = mtcars$carb
)

This produces the expected result: two pch that are the same color are
unique in the whole plot. But when you add cyl as a factor. Those two
points are only unique within their respective panels, and not across the
whole plot.

Matt



*Matthew R. Snyder*
*~*
PhD Candidate
University Fellow
University of Toledo
Computational biologist, ecologist, and bioinformatician
Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
matthew.snyd...@rockets.utoledo.edu
msnyder...@gmail.com




[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
04/09/19,
9:26:09 PM

On Tue, Apr 9, 2019 at 9:23 PM Bert Gunter  wrote:

> 1. I am quite sure that whatever it is that you want to do can be done.
> Probably straightforwardly. The various R graphics systems are mature and
> extensive.
>
> 2. But I, for one, do not understand from your post what it is that you
> want to do.  Nor does anyone else apparently.
>
> Cheers,
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Apr 9, 2019 at 8:10 PM Matthew Snyder 
> wrote:
>
>> Thanks, Jim.
>>
>> I appreciate your contributed answer, but neither of those make the
>> desired
>> plot either. I'm actually kind of shocked this isn't an easier more
>> straightforward thing. It seems like this would be something that a user
>> would want to do frequently. I can actually do this for single plots in
>> ggplot. Maybe I should contact the authors of lattice and see if this is
>> something they can help me with or if they would like to add this as a
>> feature in the future...
>>
>> Matt
>>
>>
>>
>> *Matthew R. Snyder*
>> *~*
>> PhD Candidate
>> University Fellow
>> University of Toledo
>> Computational biologist, ecologist, and bioinformatician
>> Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
>> matthew.snyd...@rockets.utoledo.edu
>> msnyder...@gmail.com
>>
>>
>>
>> [image: Mailtrack]
>> <
>> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
>> >
>> Sender
>> notified by
>> Mailtrack
>> <
>> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
>> >
>> 04/09/19,
>> 7:52:27 PM
>>
>> On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon  wrote:
>>
>> > Hi Matthew,
>> > How about this?
>> >
>> > library(lattice)
>> > xyplot(mpg ~ wt | cyl,
>> >data=mtcars,
>> >col = mtcars$gear,
>> >pch = mtcars$carb
>> > )
>> > library(plotrix)
>> > grange<-range(mtcars$gear)
>> > xyplot(mpg ~ wt | cyl,
>> >data=mtcars,
>> >col =
>> > color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange),
>> >pch = as.character(mtcars$carb)
>> > )
>> >
>> > Jim
>> >
>> > On Wed, Apr 10, 2019 at 7:43 AM Matthew Snyder 
>> > wrote:
>> > >
>> > > I am making a lattice plot and I would like to use the value in one
>> > column
>> > > to define the pch and another column to define color of points.
>> Something
>> > > like:
>> > >
>> > > xyplot(mpg ~ wt | cyl,
>> > >data=mtcars,
>> > >col = gear,
>> > >pch = carb
>> > > )
>> > >
>> > > There are unique pch points in the second and third panels, but these
>> > > points are only unique within the plots, not among all the plots (as
>> they
>> > > should be). You can see this if you use the following code:
>> > >
>> > > xyplot(mpg ~ wt | cyl,
>> > >data=mtcars,
>> > >groups = carb
>> > > )
>> > >
>> > > This plot looks great for one group, but if you try to invoke two
>> groups
>> > > using c(gear, carb) I think it simply takes unique combinations of
>> those
>> > > two variables and plots them as uniq

Re: [R] Define pch and color based on two different columns

2019-04-09 Thread Matthew Snyder

Thanks, Jim.

I appreciate your contributed answer, but neither of those make the desired
plot either. I'm actually kind of shocked this isn't an easier more
straightforward thing. It seems like this would be something that a user
would want to do frequently. I can actually do this for single plots in
ggplot. Maybe I should contact the authors of lattice and see if this is
something they can help me with or if they would like to add this as a
feature in the future...

Matt



*Matthew R. Snyder*
*~*
PhD Candidate
University Fellow
University of Toledo
Computational biologist, ecologist, and bioinformatician
Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
matthew.snyd...@rockets.utoledo.edu
msnyder...@gmail.com



[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
04/09/19,
7:52:27 PM

On Tue, Apr 9, 2019 at 4:53 PM Jim Lemon  wrote:

> Hi Matthew,
> How about this?
>
> library(lattice)
> xyplot(mpg ~ wt | cyl,
>data=mtcars,
>col = mtcars$gear,
>pch = mtcars$carb
> )
> library(plotrix)
> grange<-range(mtcars$gear)
> xyplot(mpg ~ wt | cyl,
>data=mtcars,
>col =
> color.scale(mtcars$gear,extremes=c("blue","red"),xrange=grange),
>    pch = as.character(mtcars$carb)
> )
>
> Jim
>
> On Wed, Apr 10, 2019 at 7:43 AM Matthew Snyder 
> wrote:
> >
> > I am making a lattice plot and I would like to use the value in one
> column
> > to define the pch and another column to define color of points. Something
> > like:
> >
> > xyplot(mpg ~ wt | cyl,
> >data=mtcars,
> >col = gear,
> >pch = carb
> > )
> >
> > There are unique pch points in the second and third panels, but these
> > points are only unique within the plots, not among all the plots (as they
> > should be). You can see this if you use the following code:
> >
> > xyplot(mpg ~ wt | cyl,
> >data=mtcars,
> >groups = carb
> > )
> >
> > This plot looks great for one group, but if you try to invoke two groups
> > using c(gear, carb) I think it simply takes unique combinations of those
> > two variables and plots them as unique colors.
> >
> > Another solution given by a StackExchange user:
> >
> > mypch <- 1:6
> > mycol <- 1:3
> >
> > xyplot(mpg ~ wt | cyl,
> >   panel = function(x, y, ..., groups, subscripts) {
> >   pch <- mypch[factor(carb[subscripts])]
> >   col <- mycol[factor(gear[subscripts])]
> >   grp <- c(gear,carb)
> >   panel.xyplot(x, y, pch = pch, col = col)
> >   }
> > )
> >
> > This solution has the same problems as the code at the top. I think the
> > issue causing problems with both solutions is that not every value for
> each
> > group is present in each panel, and they are almost never in the same
> > order. I think R is just interpreting the appearance of unique values as
> a
> > signal to change to the next pch or color. My actual data file is very
> > large, and it's not possible to sort my way out of this mess. It would be
> > best if I could just use the value in two columns to actually define a
> > color or pch for each point on an entire plot. Is there a way to do this?
> >
> > Ps, I had to post this via email because the Nabble site kept sending me
> an
> > error message: "Message rejected by filter rule match"
> >
> > Thanks,
> > Matt
> >
> >
> >
> > *Matthew R. Snyder*
> > *~*
> > PhD Candidate
> > University Fellow
> > University of Toledo
> > Computational biologist, ecologist, and bioinformatician
> > Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
> > matthew.snyd...@rockets.utoledo.edu
> > msnyder...@gmail.com
> >
> >
> >
> > [image: Mailtrack]
> > <
> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
> >
> > Sender
> > notified by
> > Mailtrack
> > <
> https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;
> >
> > 04/09/19,
> > 1:49:27 PM
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Define pch and color based on two different columns

2019-04-09 Thread Matthew Snyder

I am making a lattice plot and I would like to use the value in one column
to define the pch and another column to define color of points. Something
like:

xyplot(mpg ~ wt | cyl,
   data=mtcars,
   col = gear,
   pch = carb
)

There are unique pch points in the second and third panels, but these
points are only unique within the plots, not among all the plots (as they
should be). You can see this if you use the following code:

xyplot(mpg ~ wt | cyl,
   data=mtcars,
   groups = carb
)

This plot looks great for one group, but if you try to invoke two groups
using c(gear, carb) I think it simply takes unique combinations of those
two variables and plots them as unique colors.

Another solution given by a StackExchange user:

mypch <- 1:6
mycol <- 1:3

xyplot(mpg ~ wt | cyl,
  panel = function(x, y, ..., groups, subscripts) {
  pch <- mypch[factor(carb[subscripts])]
  col <- mycol[factor(gear[subscripts])]
  grp <- c(gear,carb)
  panel.xyplot(x, y, pch = pch, col = col)
  }
)

This solution has the same problems as the code at the top. I think the
issue causing problems with both solutions is that not every value for each
group is present in each panel, and they are almost never in the same
order. I think R is just interpreting the appearance of unique values as a
signal to change to the next pch or color. My actual data file is very
large, and it's not possible to sort my way out of this mess. It would be
best if I could just use the value in two columns to actually define a
color or pch for each point on an entire plot. Is there a way to do this?

Ps, I had to post this via email because the Nabble site kept sending me an
error message: "Message rejected by filter rule match"

Thanks,
Matt



*Matthew R. Snyder*
*~*
PhD Candidate
University Fellow
University of Toledo
Computational biologist, ecologist, and bioinformatician
Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
matthew.snyd...@rockets.utoledo.edu
msnyder...@gmail.com



[image: Mailtrack]
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
Sender
notified by
Mailtrack
<https://mailtrack.io?utm_source=gmail_medium=signature_campaign=signaturevirality5;>
04/09/19,
1:49:27 PM

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] creating a dataframe with full_join and looping over a list of lists.

2019-03-25 Thread Matthew

This is fantastic !  It was exactly what I was looking for. It is part 
of a larger Shiny app, so difficult to provide a working example as part 
of the post, and after figuring out how your code works ( I am an R 
novice), I made a couple of small tweaks and it works great !  Thank you 
very much, Jim, for the work you put into this.


Matthew

On 3/21/2019 11:01 PM, Jim Lemon wrote:

 External Email - Use Caution

Hi Matthew,
Remember, keep it on the list so that people know the status of the request.
I couldn't get this to work with the "_source_info_" variable. It
seems to be unreadable as a variable name. So, this _may_ be what you
want. I don't know if it can be done with "merge" and I don't know the
function "full_join".

WRKY8_colamp_a<-as.character(
  c("AT1G02920","AT1G06135","AT1G07160","AT1G11925","AT1G14540","AT1G16150",
  "AT1G21120","AT1G26380","AT1G26410","AT1G35210","AT1G49000","AT1G51920",
  "AT1G56250","AT1G66090","AT1G72520","AT1G80840","AT2G02010","AT2G18690",
  "AT2G30750","AT2G39200","AT2G43620","AT3G01830","AT3G54150","AT3G55840",
  "AT4G03460","AT4G11470","AT4G11890","AT4G14370","AT4G15417","AT4G15975",
  "AT4G31940","AT4G35180","AT5G01540","AT5G05300","AT5G11140","AT5G24110",
  "AT5G25250","AT5G36925","AT5G46295","AT5G64750","AT5G64905","AT5G66020"))

bHLH10_col_a<-as.character(c("AT1G72520","AT3G55840","AT5G20230","AT5G64750"))

bHLH10_colamp_a<-as.character(
  c("AT1G01560","AT1G02920","AT1G16420","AT1G17147","AT1G35210","AT1G51620",
  "AT1G57630","AT1G72520","AT2G18690","AT2G19190","AT2G40180","AT2G44370",
  "AT3G23250","AT3G55840","AT4G03460","AT4G04480","AT4G04540","AT4G08555",
  "AT4G11470","AT4G11890","AT4G16820","AT4G23280","AT4G35180","AT5G01540",
  "AT5G05300","AT5G20230","AT5G22530","AT5G24110","AT5G56960","AT5G57010",
  "AT5G57220","AT5G64750","AT5G66020"))

# let myenter be the sorted superset
myenter<-
  sort(unique(c(WRKY8_colamp_a,bHLH10_col_a,bHLH10_colamp_a)))

splice<-function(x,y) {
  nx<-length(x)
  ny<-length(y)
  newy<-rep(NA,nx)
  if(ny) {
   yi<-1
   for(xi in 1:nx) {
if(x[xi] == y[yi]) {
 newy[xi]<-y[yi]
 yi<-yi+1
}
if(yi>ny) break()
   }
  }
  return(newy)
}

comatgs<-list(WRKY8_colamp_a=WRKY8_colamp_a,
  bHLH10_col_a=bHLH10_col_a,bHLH10_colamp_a=bHLH10_colamp_a)
mydf3<-data.frame(myenter,stringsAsFactors=FALSE)
for(j in 1:length(comatgs)) {
  tmp<-data.frame(splice(myenter,sort(comatgs[[j]])))
  names(tmp)<-names(comatgs)[j]
  mydf3<-cbind(mydf3,tmp)
}

Jim

On Fri, Mar 22, 2019 at 10:29 AM Matthew
 wrote:

Hi Jim,

 Thanks for the reply.  That was pretty dumb of me.  I took that out of the 
loop.

comatgs is longer than this but here is a sample of 4 of 569 elements:

$WRKY8_colamp_a
  [1] "AT1G02920" "AT1G06135" "AT1G07160" "AT1G11925" "AT1G14540" "AT1G16150" 
"AT1G21120"
  [8] "AT1G26380" "AT1G26410" "AT1G35210" "AT1G49000" "AT1G51920" "AT1G56250" 
"AT1G66090"
[15] "AT1G72520" "AT1G80840" "AT2G02010" "AT2G18690" "AT2G30750" "AT2G39200" 
"AT2G43620"
[22] "AT3G01830" "AT3G54150" "AT3G55840" "AT4G03460" "AT4G11470" "AT4G11890" 
"AT4G14370"
[29] "AT4G15417" "AT4G15975" "AT4G31940" "AT4G35180" "AT5G01540" "AT5G05300" 
"AT5G11140"
[36] "AT5G24110" "AT5G25250" "AT5G36925" "AT5G46295" "AT5G64750" "AT5G64905" 
"AT5G66020"

$`_source_info_`
character(0)

$bHLH10_col_a
[1] "AT1G72520" "AT3G55840" "AT5G20230" "AT5G64750"

$bHLH10_colamp_a
  [1] "AT1G01560" "AT1G02920" "AT1G16420" "AT1G17147" "AT1G35210" "AT1G51620" 
"AT1G57630"
  [8] "AT1G72520" "AT2G18690" "AT2G19190" "AT2G40180" "AT2G44370" "AT3G2325

[R] creating a dataframe with full_join and looping over a list of lists.

2019-03-21 Thread Matthew

   My apologies, my first e-mail formatted very poorly when sent, so I am 
trying again with something I hope will be less confusing.

I have been trying create a dataframe by looping through a list of lists,
and using dplyr's full_join so as to keep common elements on the same row.
But, I have a couple of problems.

1) The lists have different numbers of elements.

2) In the final dataframe, I would like the column names to be the names
of the lists.

Is it possible ?

Code: *for(j in avector){mydf3 <- data.frame(myenter) atglsts <- 
as.data.frame(comatgs[j]) mydf3 <- full_join(mydf3, atglsts) }* 
Explanation: # Start out with a list, myenter, to dataframe. mydf3 now 
has 1 column. # This first column will be the longest column in the 
final mydf3. # Loop through a list of lists, comatgs, and with each loop 
a particular list # is made into a dataframe of one column, atglsts. # 
The name of the column is the name of the list. # Each atglsts dataframe 
has a different number of elements. # What I want to do, is to add the 
newly made dataframe, atglsts, as a # new column of the data frame, 
mydf3 using full_join # in order to keep common elements on the same 
row. # I could rename the colname to 'AGI' so that I can join by 'AGI', 
# but then I would lose the name of the list. # In the final dataframe, 
I want to know the name of the original list # the column was made from. Matthew




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] creating a dataframe with full_join and looping over a list of lists

2019-03-21 Thread Matthew

I have been trying create a dataframe by looping through a list of lists,

and using dplyr's full_join so as to keep common elements on the same row.

But, I have a couple of problems.

1) The lists have different numbers of elements.

2) In the final dataframe, I would like the column names to be the names
of the lists.

Is it possible ?

for(j in avector){

mydf3 <- data.frame(myenter) # Start out with a list,
myenter, to dataframe. mydf3 now has 1 column.
# This
first column will be the longest column in the final mydf3.
atglsts <- as.data.frame(comatgs[j]) # Loop through a list of
lists, comatgs, and with each loop a particular list
# is made
into a dataframe of one column, atglsts.
# The
name of the column is the name of the list.
# Each
atglsts dataframe has a different number of elements.
mydf3 <- full_join(mydf3, atglsts) # What I want to do, is to
add the newly made dataframe, atglsts, as a
} # new
column of the data frame, mydf3 using full_join
# in order
to keep common elements on the same row.
# I could
rename the colname to 'AGI' so that I can join by 'AGI',
# but then
I would lose the name of the list.
# In the
final dataframe, I want to know the name of the original list

# the column was made from.

Matthew

[[alternative HTML version deleted]]

[R] Defining Variables from a Matrix for 10-Fold Cross Validation

2018-10-09 Thread matthew campbell

Good afternoon,

I am trying to run a 10-fold CV, using a matrix as my data set.
Essentially, I want "y" to be the first column of the matrix, and my "x" to
be all remaining columns (2-257). I've posted some of the code I used
below, and the data set (called "zip.train") is in the "ElemStatLearn"
package. The error message is highlighted in red, and the corresponding
section of code is bolded. (I am not concerned with the warning message,
just the error message).

The issue I am experiencing is the error message below the code: I haven't
come across that specific message before, and am not exactly sure how to
interpret its meaning. What exactly is this error message trying to tell
me?  Any suggestions or insights are appreciated!

Thank you all,

Matthew Campbell


> library (ElemStatLearn)
> library(kknn)
> data(zip.train)
> train=zip.train[which(zip.train[,1] %in% c(2,3)),]
> test=zip.test[which(zip.test[,1] %in% c(2,3)),]
> nfold = 10
> infold = sample(rep(1:10, length.out = (x)))
Warning message:
In rep(1:10, length.out = (x)) :
  first element used of 'length.out' argument
>
*> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])*
>
> K = 20
> errorMatrix = matrix(NA, K, 10)
>
> for (l in nfold)
+ {
+   for (k in 1:20)
+   {
+ knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test =
mydata[infold == l, ], k = k)
+ errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold ==
l])^2)
+   }
+ }
Error in model.frame.default(formula, data = train) :
  variable lengths differ (found for 'x')

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] security using R at work

2018-08-09 Thread john matthew via R-help

Hi Katherina.
Good point you make. What makes your IT department happy with the use of R
studio server? What are the safe packages?

Can I trust your answer? :)
John.



On 9 Aug 2018 10:38, "Fritsch, Katharina (NNL) via R-help" <
r-help@r-project.org> wrote:

> Hiya,
> I work in a very security conscious organisation and we happily use R. The
> average user can only use R via RStudio Server, with a limited number of
> packages available, so that adds an additional level of control.
> That said, are you sure that the sentence 'a few people on a mailing list
> said it would be alright' is going to convince your IT department of the
> harmlessness of R?
> Cheers,
> Katharina.
>
> --
>
> Dr Katharina Fritsch B.Sc. M.Sc. MRSC
> Chemical Modeller, Chemical and Process Modelling
>
>
> E.
> katharina.frit...@nnl.co.uk
> T.
> +44 (0)1925 289387
> @uknnl
>
> National Nuclear Laboratory Limited, 5th Floor, Chadwick House,
> Birchwood Park, Warrington, WA3 6AE, UK
>
> www.nnl.co.uk
>
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Laurence
> Clark
> Sent: 08 August 2018 16:10
> To: 'r-help@r-project.org'
> Subject: [R] security using R at work
>
> Hello all,
>
> I want to download R and use it for work purposes. I hope to use it to
> analyse very sensitive data from our clients.
>
> My question is:
>
> If I install R on my work network computer, will the data ever leave our
> network? I need to know if the data goes anywhere other than our network,
> because this could compromise it's security. Is there is any chance the
> data could go to a server owned by 'R' or anything else that's not
> immediately obvious, but constitutes the data leaving our network?
>
> Thank you
>
> Laurence
>
>
> 
> 
> --
> Laurence Clark
> Business Data Analyst
> Account Management
> Health Management Ltd
>
> Mobile: 07584 556498
> Switchboard:0845 504 1000
> Email:  laurence.cl...@healthmanltd.com
> Web:BLOCKEDhealthmanagement[.]co[.]ukBLOCKED
>
> 
> 
> --
> CONFIDENTIALITY NOTICE: This email, including attachments, is for the sole
> use of the intended recipients and may contain confidential and privileged
> information or otherwise be protected by law. Any unauthorised review, use,
> disclosure or distribution is prohibited. If you are not the intended
> recipient, please contact the sender, and destroy all copies and the
> original message.MAXIMUS People Services Limited is registered in
> England and Wales (registered number: 03752300); registered office: 202 -
> 206 Union Street, London, SE1 0LX, United Kingdom. The Centre for Health
> and Disability Assessments Ltd (registered number: 9072343) and Health
> Management Ltd (registered number: 4369949) are registered in England and
> Wales. The registered office for each is Ash House, The Broyle, Ringmer,
> East Sussex, BN8 5NN, United Kingdom. Remploy Limited is registered in
> England and Wales (registered number: 09457025); registered office: 18c
> Meridian East, Meridian Business Park, Leicester, L
>  eicestershire, LE19 1WZ, United Kingdom.
> 
> 
> --
>
>
> 
> 
> --
>
>
> 
> #
> Scanned by MailMarshal - M86 Security's comprehensive email content
> security solution.
> Download a free evaluation of MailMarshal at BLOCKEDm86security[.]
> comBLOCKED
> 
> #
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> BLOCKEDstat[.]ethz[.]ch/mailman/listinfo/r-helpBLOCKED
> PLEASE do read the posting guide BLOCKEDR-project[.]org/
> posting-guide[.]htmlBLOCKED
> and provide commented, minimal, self-contained, reproducible code.
> 
> *
> This message was received by the Cloud Security Email Gateway
>
> and was checked for Viruses and SPAM by the Cloud Security Email
> Management Service.
> Please forward any suspicious or unwanted emails to "Spam Helpdesk"
> 
> *
>
>
> This e-mail is from National Nuclear Laboratory Limited ("NNL"). This

Re: [R] sub/grep question: extract year

2018-08-09 Thread john matthew via R-help

So there is probably a command that resets the capture variables as I call
them. No doubt someone will write what it is.

On 9 Aug 2018 10:36, "john matthew"  wrote:

> Hi Marc.
> For question 1.
> I know in Perl that regular expressions when captured can be saved if not
> overwritten. \\1 is the capture variable in your R examples.
>
> So the 2nd regular expression does not match but \\1 still has 1980
> captured from the previous expression, hence the result.
>
> Maybe if you restart R and try your 2nd expression first, \\1 will be
> empty or no match result.
>
> Just speculation :)
>
> John
>
>
> On 9 Aug 2018 08:58, "Marc Girondot via R-help" 
> wrote:
>
>> Hi everybody,
>>
>> I have some questions about the way that sub is working. I hope that
>> someone has the answer:
>>
>> 1/ Why the second example does not return an empty string ? There is no
>> match.
>>
>> subtext <- "-1980-"
>> sub(".*(1980).*", "\\1", subtext) # return 1980
>> sub(".*(1981).*", "\\1", subtext) # return -1980-
>>
>> 2/ Based on sub documentation, it replaces the first occurence of a
>> pattern: why it does not return 1980 ?
>>
>> subtext <- " 1980 1981 "
>> sub(".*(198[01]).*", "\\1", subtext) # return 1981
>>
>> 3/ I want extract year from text; I use:
>>
>> subtext <- "bla 1980 bla"
>> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) #
>> return 1980
>> subtext <- "bla 2010 bla"
>> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) #
>> return 2010
>>
>> but
>>
>> subtext <- "bla 1010 bla"
>> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) #
>> return 1010
>>
>> I would like exclude the case 1010 and other like this.
>>
>> The solution would be:
>>
>> 18[0-9][0-9] or 19[0-9][0-9] or 200[0-9] or 201[0-9]
>>
>> Is there a solution to write such a pattern in grep ?
>>
>> Thanks a lot
>>
>> Marc
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sub/grep question: extract year

2018-08-09 Thread john matthew via R-help

Hi Marc.
For question 1.
I know in Perl that regular expressions when captured can be saved if not
overwritten. \\1 is the capture variable in your R examples.

So the 2nd regular expression does not match but \\1 still has 1980
captured from the previous expression, hence the result.

Maybe if you restart R and try your 2nd expression first, \\1 will be empty
or no match result.

Just speculation :)

John

On 9 Aug 2018 08:58, "Marc Girondot via R-help" 
wrote:

> Hi everybody,
>
> I have some questions about the way that sub is working. I hope that
> someone has the answer:
>
> 1/ Why the second example does not return an empty string ? There is no
> match.
>
> subtext <- "-1980-"
> sub(".*(1980).*", "\\1", subtext) # return 1980
> sub(".*(1981).*", "\\1", subtext) # return -1980-
>
> 2/ Based on sub documentation, it replaces the first occurence of a
> pattern: why it does not return 1980 ?
>
> subtext <- " 1980 1981 "
> sub(".*(198[01]).*", "\\1", subtext) # return 1981
>
> 3/ I want extract year from text; I use:
>
> subtext <- "bla 1980 bla"
> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) #
> return 1980
> subtext <- "bla 2010 bla"
> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) #
> return 2010
>
> but
>
> subtext <- "bla 1010 bla"
> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) #
> return 1010
>
> I would like exclude the case 1010 and other like this.
>
> The solution would be:
>
> 18[0-9][0-9] or 19[0-9][0-9] or 200[0-9] or 201[0-9]
>
> Is there a solution to write such a pattern in grep ?
>
> Thanks a lot
>
> Marc
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] security using R at work

2018-08-09 Thread john matthew via R-help

Hello Laurence.
Taking a pragmatic approach.

If the data is so valuable and secret but also needs some analysis in R,
here is suggested steps to minimise security risks.

1. Plan the analysis up front, what exactly what you want and the outcomes.
2. Take a laptop with Internet, install R and all packages needed for the
planned analysis.
3. Unplug ethernet and turn off blue tooth and wifi. So no internet access
at all.
4. Bring your secret data via USB or cd.
5. Perform the R analysis and export reports and figures etc to safe place.
6. Delete R, the data and all packages from laptop before using online
again.

A bit extreme and may still be some risk but its minimal as the analysis
was done offline, and you removed R etc after. But now have a set of R
results.

Just an idea.

John.


On 8 Aug 2018 16:53, "Laurence Clark" 
wrote:

> Hello all,
>
> I want to download R and use it for work purposes. I hope to use it to
> analyse very sensitive data from our clients.
>
> My question is:
>
> If I install R on my work network computer, will the data ever leave our
> network? I need to know if the data goes anywhere other than our network,
> because this could compromise it's security. Is there is any chance the
> data could go to a server owned by 'R' or anything else that's not
> immediately obvious, but constitutes the data leaving our network?
>
> Thank you
>
> Laurence
>
>
> 
> 
> --
> Laurence Clark
> Business Data Analyst
> Account Management
> Health Management Ltd
>
> Mobile: 07584 556498
> Switchboard:0845 504 1000
> Email:  laurence.cl...@healthmanltd.com
> Web:www.healthmanagement.co.uk
>
> 
> 
> --
> CONFIDENTIALITY NOTICE: This email, including attachments, is for the sole
> use of the intended recipients and may contain confidential and privileged
> information or otherwise be protected by law. Any unauthorised review, use,
> disclosure or distribution is prohibited. If you are not the intended
> recipient, please contact the sender, and destroy all copies and the
> original message.MAXIMUS People Services Limited is registered in
> England and Wales (registered number: 03752300); registered office: 202 -
> 206 Union Street, London, SE1 0LX, United Kingdom. The Centre for Health
> and Disability Assessments Ltd (registered number: 9072343) and Health
> Management Ltd (registered number: 4369949) are registered in England and
> Wales. The registered office for each is Ash House, The Broyle, Ringmer,
> East Sussex, BN8 5NN, United Kingdom. Remploy Limited is registered in
> England and Wales (registered number: 09457025); registered office: 18c
> Meridian East, Meridian Business Park, Leicester, Leicestershire, LE19 1WZ,
> United Kingdom.
> 
> 
> --
>
>
> 
> 
> --
>
>
> 
> #
> Scanned by MailMarshal - M86 Security's comprehensive email content
> security solution.
> Download a free evaluation of MailMarshal at www.m86security.com
> 
> #
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Breaking the samplesize package from CRAN

2018-07-27 Thread john matthew via R-help

Dear Bert,
Thanks for your answer, I already wrote to the maintainer/author of
samplesize, Ralph Scherer, on Thu, Apr 19, 2018 but still have no
answer.

Does anyone have any ideas? Thank you.

John.

On 26 July 2018 at 20:18, Bert Gunter  wrote:
> Suggest you contact the package maintainer.
>
> ?maintainer
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, Jul 26, 2018 at 9:49 AM, john matthew via R-help
>  wrote:
>>
>> Hello all,
>>
>> I am using the samplesize package (n.ttest function) to calculate
>> number of samples per group power analysis (t-tests with unequal
>> variance).
>> I can break this n.ttest function from the samplesize package,
>> depending on the standard deviations I input.
>>
>> This works very good.
>>
>> n.ttest(sd1 = 0.35, sd2 = 0.22 , variance = "unequal")
>> # outputs
>> $`Total sample size`
>> [1] 8
>>
>> $`Sample size group 1`
>> [1] 5
>>
>> $`sample size group 2`
>> [1] 3
>>
>> Warning message:
>> In n.ttest(sd1 = 0.35, sd2 = 0.22, variance = "unequal") :
>>   Arguments -fraction- and -k- are not used, when variances are unequal
>> The warnings are fine and all is good.
>>
>>
>> But if I run it again with.
>> n.ttest(sd1 = 1.68, sd2 = 0.28 , variance = "unequal")
>> # outputs
>> Error in while (n.start <= n.temp) { :
>>   missing value where TRUE/FALSE needed
>> In addition: Warning messages:
>> 1: In n.ttest(sd1 = 1.68, sd2 = 0.28, variance = "unequal") :
>>   Arguments -fraction- and -k- are not used, when variances are unequal
>> 2: In qt(conf.level, df = df_approx) : NaNs produced
>> 3: In qt(power, df = df_approx) : NaNs produced
>>
>> It breaks.
>> The first obvious thing is that the standard deviations are a lot
>> different in the 2nd example that breaks, compared with the first run.
>>
>> Checking the code myself, I can see it breaks down when the variable
>> "df_approx" becomes a negative number, in a while loop from the
>> n.ttest function.
>> Exert of the code I am talking about.
>>
>> while (n.start <= n.temp) {
>> n.start <- n1 + n2 + 1
>> n1 <- n.start/(1 + k)
>> n2 <- (k * n.start)/(1 + k)
>> df_approx <- 1/((gamma)^2/(n1 - 1) + (1 - gamma)^2/(n2 - 1))   #
>> this calculation becomes negative and breaks subsequently
>> tkrit.alpha <- qt(conf.level, df = df_approx)
>> tkrit.beta <- qt(power, df = df_approx)
>> n.temp <- ((tkrit.alpha + tkrit.beta)^2)/(c^2)
>> }
>>
>> I can hard code df_approx to be an absolute value but I don't know if
>> that messes up the statistics.
>>
>> Can anyone help or any ideas? How to fix?
>>
>> John.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Breaking the samplesize package from CRAN

2018-07-26 Thread john matthew via R-help

Hello all,

I am using the samplesize package (n.ttest function) to calculate
number of samples per group power analysis (t-tests with unequal
variance).
I can break this n.ttest function from the samplesize package,
depending on the standard deviations I input.

This works very good.

n.ttest(sd1 = 0.35, sd2 = 0.22 , variance = "unequal")
# outputs
$`Total sample size`
[1] 8

$`Sample size group 1`
[1] 5

$`sample size group 2`
[1] 3

Warning message:
In n.ttest(sd1 = 0.35, sd2 = 0.22, variance = "unequal") :
  Arguments -fraction- and -k- are not used, when variances are unequal
The warnings are fine and all is good.


But if I run it again with.
n.ttest(sd1 = 1.68, sd2 = 0.28 , variance = "unequal")
# outputs
Error in while (n.start <= n.temp) { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In n.ttest(sd1 = 1.68, sd2 = 0.28, variance = "unequal") :
  Arguments -fraction- and -k- are not used, when variances are unequal
2: In qt(conf.level, df = df_approx) : NaNs produced
3: In qt(power, df = df_approx) : NaNs produced

It breaks.
The first obvious thing is that the standard deviations are a lot
different in the 2nd example that breaks, compared with the first run.

Checking the code myself, I can see it breaks down when the variable
"df_approx" becomes a negative number, in a while loop from the
n.ttest function.
Exert of the code I am talking about.

while (n.start <= n.temp) {
n.start <- n1 + n2 + 1
n1 <- n.start/(1 + k)
n2 <- (k * n.start)/(1 + k)
df_approx <- 1/((gamma)^2/(n1 - 1) + (1 - gamma)^2/(n2 - 1))   #
this calculation becomes negative and breaks subsequently
tkrit.alpha <- qt(conf.level, df = df_approx)
tkrit.beta <- qt(power, df = df_approx)
n.temp <- ((tkrit.alpha + tkrit.beta)^2)/(c^2)
}

I can hard code df_approx to be an absolute value but I don't know if
that messes up the statistics.

Can anyone help or any ideas? How to fix?

John.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fwrite() not found in data.table package

2017-10-02 Thread Matthew Keller

Thanks Jeff!

It turns out that my problem was that I tried to install the newest
data.table package while the old data.table package was loaded in R. Full
instructions for installing data.table are here:
https://github.com/Rdatatable/data.table/wiki/Installation

On Mon, Oct 2, 2017 at 10:55 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> You are asking about (a) a contributed package (b) for a package version
> that is not in CRAN and (c) an R version that is outdated, which stretches
> the definition of "on topic" here. Since that function does not appear to
> have been removed from that package (I am not installing a development
> version to test if it is broken for your benefit), I will throw out a guess
> that if you update R to 3.4.1 or 3.4.2 then things might start working. If
> not, I suggest you use the CRAN version of the package and create a
> reproducible example (check it with package reprex) and try again here, or
> ask one of the maintainers of that package.
> --
> Sent from my phone. Please excuse my brevity.
>
> On October 2, 2017 8:56:46 AM PDT, Matthew Keller <mckellerc...@gmail.com>
> wrote:
> >Hi all,
> >
> >I used to use fwrite() function in data.table but I cannot get it to
> >work
> >now. The function is not in the data.table package, even though a help
> >page
> >exists for it. My session info is below. Any ideas on how to get
> >fwrite()
> >to work would be much appreciated. Thanks!
> >
> >> sessionInfo()
> >R version 3.2.0 (2015-04-16)
> >Platform: x86_64-unknown-linux-gnu (64-bit)
> >Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago)
> >
> >locale:
> > [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> >LC_PAPER=en_US.UTF-8
> > [8] LC_NAME=C  LC_ADDRESS=C
> >LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
> >LC_IDENTIFICATION=C
> >
> >attached base packages:
> >[1] stats graphics  grDevices utils datasets  methods   base
> >
> >other attached packages:
> >[1] data.table_1.10.5
> >
> >loaded via a namespace (and not attached):
> >[1] tools_3.2.0  chron_2.3-47 tcltk_3.2.0
>



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] fwrite() not found in data.table package

2017-10-02 Thread Matthew Keller

Hi all,

I used to use fwrite() function in data.table but I cannot get it to work
now. The function is not in the data.table package, even though a help page
exists for it. My session info is below. Any ideas on how to get fwrite()
to work would be much appreciated. Thanks!

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C  LC_ADDRESS=C
LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] tools_3.2.0  chron_2.3-47 tcltk_3.2.0

-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Why does residuals.coxph use naive.var?

2017-03-02 Thread Matthew Burgess

Hi all,

I noticed that the scaled Schoenfeld residuals produced by
residuals.coxph(fit, type="scaledsch") were different from those returned
by cox.zph for a model where robust standard errors have been estimated.
Looking at the source code for both functions suggests this is because
residuals.coxph uses the naive variance to scale the Schoenfeld residuals
whereas cox.zph uses the robust version when it is available.

Lines 20-21 of the version of residuals.coxph currently on github:

vv <- drop(object$naive.var)
if (is.null(vv)) vv <- drop(object$var)

i.e. the naive variance is used even when a robust version is available.

Why is this the case? Have I missed something? Am I right in thinking that
using the robust variance is the better choice if the intention is to check
the proportional hazards assumption?

Here is a reproducible example using the heart data:

data(heart)
fit <- coxph(Surv(start, stop, event) ~ year + age + surgery + cluster(id),
data=jasa1)
# Should return True since both produce the scaled Schoenfeld residuals
all(residuals(fit, type='scaledsch') == cox.zph(fit)$y)

Thanks for your help.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] use value in variable to be name of another variable

2016-07-11 Thread Matthew


Hi Rolf,

Thanks for the warning. I think because my initial efforts used the 
assign function, that Jim provided his solution using it.


Any suggestions for how it could be done without assign() ?

Matthew

On 7/11/2016 6:31 PM, Rolf Turner wrote:

On 12/07/16 10:13, Matthew wrote:

Hi Jim,

   Wow ! And it does exactly what I was looking for.  Thank you very 
much.


That assign function is pretty nice. I should become more familiar 
with it.


Indeed you should, and assign() is indeed nice and useful and handy. 
But it should be used with care and circumspection.  It *alters the 
global environment* which is fraught with peril. Generally speaking 
most things that can be done with assign() (and its companion function 
get()) are better and more safely done using lists and functions and 
other "natural" R-ish constructs. Resist the temptation to turn R into 
a macro language.


cheers,

Rolf Turner



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] use value in variable to be name of another variable

2016-07-11 Thread Matthew


Hi Jim,

   Wow ! And it does exactly what I was looking for.  Thank you very much.

That assign function is pretty nice. I should become more familiar with it.

Matthew


On 7/11/2016 5:59 PM, Jim Lemon wrote:

Hi Matthew,
This question is a bit mysterious as we don't know what the object
"chr" is. However, have a look at this and see if it is close to what
you want to do.

# set up a little matrix of character values
tTargTFS<-matrix(paste("A",rep(1:4,each=4),"B",rep(1:4,4),sep=""),ncol=4)
# try the assignment on the first row and column
assign(tTargTFS[1,1],tTargTFS[-1,1])
# see what it looks like - okay
A1B1
# run the assignment over the matrix
for(i in 1:4) assign(tTargTFS[1,i],tTargTFS[-1,i])
# see what the variables look like
A1B1
A2B1
A3B1
A4B1

It does what I would expect.

Jim


On Tue, Jul 12, 2016 at 6:01 AM, Matthew
<mccorm...@molbio.mgh.harvard.edu> wrote:

I want to get a value that has been assigned to a variable, and then use
that value to be the name of a variable.

For example,

tTargTFS[1,1]
# returns:
 V1
"AT1G01010"

Now, I want to make AT1G01010 the name of a variable:
AT1G01010 <- tTargTFS[-1,1]

Then, go to the next tTargTFS[1,2]. Which produces
V1
"AT1G01030"
And then,
AT1G01030 <- tTargTFS[-1,2]

I want to do this up to tTargTFS[1, 2666], so I want to do this in a script
and not manually.
tTargTFS is a list of 2: chr [1:265, 1:2666], but I also have the data in a
data frame of 265 observations of 2666 variables, if this data structure
makes things easier.

My initial attempts are not working. Starting with a test data structure
that is a little simpler I have tried:
for (i in 1:4)
{ ATG <- tTargTFS[1, i]
assign(cat(ATG), tTargTFS[-1, i]) }

Matthew

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] use value in variable to be name of another variable

2016-07-11 Thread Matthew

I want to get a value that has been assigned to a variable, and then use 
that value to be the name of a variable.


For example,

tTargTFS[1,1]
# returns:
V1
"AT1G01010"

Now, I want to make AT1G01010 the name of a variable:
AT1G01010 <- tTargTFS[-1,1]

Then, go to the next tTargTFS[1,2]. Which produces
   V1
"AT1G01030"
And then,
AT1G01030 <- tTargTFS[-1,2]

I want to do this up to tTargTFS[1, 2666], so I want to do this in a 
script and not manually.
tTargTFS is a list of 2: chr [1:265, 1:2666], but I also have the data 
in a data frame of 265 observations of 2666 variables, if this data 
structure makes things easier.


My initial attempts are not working. Starting with a test data structure 
that is a little simpler I have tried:

for (i in 1:4)
{ ATG <- tTargTFS[1, i]
assign(cat(ATG), tTargTFS[-1, i]) }

Matthew

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] identify duplicate entries in data frame and calculate mean

2016-05-24 Thread Matthew


Thank you very much, Dan.

These work great. Two more great answers to my question.

Matthew

On 5/24/2016 4:15 PM, Nordlund, Dan (DSHS/RDA) wrote:

You have several  options.

1.  You could use the aggregate function.  If your data frame is called DF, you 
could do something like

with(DF, aggregate(Length, list(Identifier), mean))

2.  You could use the dplyr package like this

library(dplyr)
summarize(group_by(DF, Identifier), mean(Length))


Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew
Sent: Tuesday, May 24, 2016 12:47 PM
To: r-help@r-project.org
Subject: [R] identify duplicate entries in data frame and calculate mean

I have a data frame with 10 columns.
In the last column is an alphaneumaric identifier.
For most rows, this alphaneumaric identifier is unique to the file, however
some of these alphanemeric idenitifiers occur in duplicate, triplicate or more.
When they do occur more than once they are in consecutive rows, so when
there is a duplicate or triplicate or quadruplicate (let's call them 
multiplicates),
they are in consecutive rows.

In column 7 there is an integer number (may or may not be unique. does not
matter).

I want to identify each multiple entries (multiplicates) occurring in column 10
and then for each multiplicate calculate the mean of the integers column 7.

As an example, I will show just two columns:
Length  Identifier
321 A234
350 A234
340 A234
180 B123
198 B225

What I want to do (in the above example) is collapse all the A234's and report
the mean to get this:
Length  Identifier
337 A234
180 B123
198 B225


Matthew

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] identify duplicate entries in data frame and calculate mean

2016-05-24 Thread Matthew

Thanks, Tom.  I was making a mistake looking at your example and that's 
what my problem was.

Cool answer, works great. Thank you very much.

Matthew

On 5/24/2016 4:23 PM, Tom Wright wrote:
> Don't see that as being a big problem. If your data grows then dplyr 
> supports connections to external databases. Alternately if you just 
> want a mean, most databases can do that directly in SQL.
>
> On Tue, May 24, 2016 at 4:17 PM, Matthew 
> <mccorm...@molbio.mgh.harvard.edu 
> <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote:
>
> Thank you very much, Tom.
> This gets me thinking in the right direction.
> One thing I should have mentioned that I did not is that the
> number of rows in the data frame will be a little over 40,000 rows.
>
>
> On 5/24/2016 4:08 PM, Tom Wright wrote:
>> Using dplyr
>>
>> $ library(dplyr)
>> $ x<-data.frame(Length=c(321,350,340,180,198),
>> ID=c(rep('A234',3),'B123','B225') )
>> $ x %>% group_by(ID) %>% summarise(m=mean(Length))
>>
>>
>>
>> On Tue, May 24, 2016 at 3:46 PM, Matthew
>> <mccorm...@molbio.mgh.harvard.edu
>> <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote:
>>
>> I have a data frame with 10 columns.
>> In the last column is an alphaneumaric identifier.
>> For most rows, this alphaneumaric identifier is unique to the
>> file, however some of these alphanemeric idenitifiers occur
>> in duplicate, triplicate or more. When they do occur more
>> than once they are in consecutive rows, so when there is a
>> duplicate or triplicate or quadruplicate (let's call them
>> multiplicates), they are in consecutive rows.
>>
>> In column 7 there is an integer number (may or may not be
>> unique. does not matter).
>>
>> I want to identify each multiple entries (multiplicates)
>> occurring in column 10 and then for each multiplicate
>> calculate the mean of the integers column 7.
>>
>> As an example, I will show just two columns:
>> Length  Identifier
>> 321 A234
>> 350 A234
>> 340 A234
>>     180 B123
>> 198 B225
>>
>> What I want to do (in the above example) is collapse all the
>> A234's and report the mean to get this:
>> Length  Identifier
>> 337 A234
>> 180 B123
>> 198 B225
>>
>>
>> Matthew
>>
>> __
>> R-help@r-project.org <mailto:R-help@r-project.org> mailing
>> list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible
>> code.
>>
>>
>
>


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] identify duplicate entries in data frame and calculate mean

2016-05-24 Thread Matthew

Thank you very much, Tom.
This gets me thinking in the right direction.
One thing I should have mentioned that I did not is that the number of 
rows in the data frame will be a little over 40,000 rows.

On 5/24/2016 4:08 PM, Tom Wright wrote:
> Using dplyr
>
> $ library(dplyr)
> $ x<-data.frame(Length=c(321,350,340,180,198),
> ID=c(rep('A234',3),'B123','B225') )
> $ x %>% group_by(ID) %>% summarise(m=mean(Length))
>
>
>
> On Tue, May 24, 2016 at 3:46 PM, Matthew 
> <mccorm...@molbio.mgh.harvard.edu 
> <mailto:mccorm...@molbio.mgh.harvard.edu>> wrote:
>
> I have a data frame with 10 columns.
> In the last column is an alphaneumaric identifier.
> For most rows, this alphaneumaric identifier is unique to the
> file, however some of these alphanemeric idenitifiers occur in
> duplicate, triplicate or more. When they do occur more than once
> they are in consecutive rows, so when there is a duplicate or
> triplicate or quadruplicate (let's call them multiplicates), they
> are in consecutive rows.
>
> In column 7 there is an integer number (may or may not be unique.
> does not matter).
>
> I want to identify each multiple entries (multiplicates) occurring
> in column 10 and then for each multiplicate calculate the mean of
> the integers column 7.
>
> As an example, I will show just two columns:
> Length  Identifier
> 321 A234
> 350 A234
> 340 A234
> 180 B123
> 198 B225
>
> What I want to do (in the above example) is collapse all the
> A234's and report the mean to get this:
> Length  Identifier
> 337 A234
> 180 B123
> 198 B225
>
>
> Matthew
>
> __
> R-help@r-project.org <mailto:R-help@r-project.org> mailing list --
> To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] identify duplicate entries in data frame and calculate mean

2016-05-24 Thread Matthew


I have a data frame with 10 columns.
In the last column is an alphaneumaric identifier.
For most rows, this alphaneumaric identifier is unique to the file, 
however some of these alphanemeric idenitifiers occur in duplicate, 
triplicate or more. When they do occur more than once they are in 
consecutive rows, so when there is a duplicate or triplicate or 
quadruplicate (let's call them multiplicates), they are in consecutive rows.


In column 7 there is an integer number (may or may not be unique. does 
not matter).


I want to identify each multiple entries (multiplicates) occurring in 
column 10 and then for each multiplicate calculate the mean of the 
integers column 7.


As an example, I will show just two columns:
Length  Identifier
321 A234
350 A234
340 A234
180 B123
198 B225

What I want to do (in the above example) is collapse all the A234's and 
report the mean to get this:

Length  Identifier
337 A234
180 B123
198 B225


Matthew

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fast way to create composite matrix based on mixed indices?

2015-09-18 Thread Matthew Keller

Brilliant Denes. Thank you for your help. This worked and is obviously much
faster than a loop...

On Thu, Sep 17, 2015 at 3:22 PM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote:

> Hi Matt,
>
> you could use matrix indexing. Here is a possible solution, which could be
> optimized further (probably).
>
> # The old matrix
> (old.mat <- matrix(1:30,nrow=3,byrow=TRUE))
> # matrix of indices
> index <- matrix(c(1,1,1,4,
>   1,3,5,10,
>   2,2,1,3,
>   2,1,4,8,
>   2,3,9,10),
> nrow=5,byrow=TRUE,
> dimnames=list(NULL,
>   c('new.mat.row','old.mat.row',
> 'old.mat.col.start','old.mat.col.end')))
> # expected result
> new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),
>   byrow=TRUE, nrow=2)
> #
> # column indices
> ind <- mapply(seq, index[, 3], index[,4],
>   SIMPLIFY = FALSE, USE.NAMES = FALSE)
> ind_len <- vapply(ind, length, integer(1))
> ind <- unlist(ind)
>
> #
> # old indices
> old.ind <- cbind(rep(index[,2], ind_len), ind)
> #
> # new indices
> new.ind <- cbind(rep(index[,1], ind_len), ind)
> #
> # create the new matrix
> result <- matrix(NA_integer_, max(index[,1]), max(index[,4]))
> #
> # fill the new matrix
> result[new.ind] <- old.mat[old.ind]
> #
> # check the results
> identical(result, new.mat)
>
>
> HTH,
>   Denes
>
>
>
>
>
> On 09/17/2015 10:36 PM, Matthew Keller wrote:
>
>> HI all,
>>
>> Sorry for the title here but I find this difficult to describe succinctly.
>> Here's the problem.
>>
>> I want to create a new matrix where each row is a composite of an old
>> matrix, but where the row & column indexes of the old matrix change for
>> different parts of the new matrix. For example, the second row of new
>> matrix (which has , e.g., 10 columns) might be columns 1 to 3 of row 2 of
>> old matrix, columns 4 to 8 of row 1 of old matrix, and columns 9 to 10 of
>> row 3 of old matrix.
>>
>> Here's an example in code:
>>
>> #The old matrix
>> (old.mat <- matrix(1:30,nrow=3,byrow=TRUE))
>>
>> #matrix of indices to create the new matrix from the old one.
>> #The 1st column gives the row number of the new matrix
>> #the 2nd gives the row of the old matrix that we're going to copy into the
>> new matrix
>> #the 3rd gives the starting column of the old matrix for the row in col 2
>> #the 4th gives the end column of the old matrix for the row in col 2
>> index <- matrix(c(1,1,1,4,
>>1,3,5,10,
>>2,2,1,3,
>>2,1,4,8,
>>2,3,9,10),
>>  nrow=5,byrow=TRUE,
>>
>>
>> dimnames=list(NULL,c('new.mat.row','old.mat.row','old.mat.col.start','old.mat.col.end')))
>>
>> I will be given old.mat and index and want to create new.mat from them.
>>
>> I want to create a new.matrix of two rows that looks like this:
>> new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),byrow=TRUE,nrow=2)
>>
>> So here, the first row of new.mat is columns 1 to 4 of row 1 of the
>> old.mat
>> and columns 5 to 10 of row 3 of old.mat.
>>
>> new.mat and old.mat will always have the same number of columns but the
>> number of rows could differ.
>>
>> I could accomplish this in a loop, but the real problem is quite large
>> (new.mat might have 1e8 elements), and so a for loop would be
>> prohibitively
>> slow.
>>
>> I may resort to unix tools and use a shell script, but wanted to first see
>> if this is doable in R in a fast way.
>>
>> Thanks in advance!
>>
>> Matt
>>
>>
>>


-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] fast way to create composite matrix based on mixed indices?

2015-09-17 Thread Matthew Keller

HI all,

Sorry for the title here but I find this difficult to describe succinctly.
Here's the problem.

I want to create a new matrix where each row is a composite of an old
matrix, but where the row & column indexes of the old matrix change for
different parts of the new matrix. For example, the second row of new
matrix (which has , e.g., 10 columns) might be columns 1 to 3 of row 2 of
old matrix, columns 4 to 8 of row 1 of old matrix, and columns 9 to 10 of
row 3 of old matrix.

Here's an example in code:

#The old matrix
(old.mat <- matrix(1:30,nrow=3,byrow=TRUE))

#matrix of indices to create the new matrix from the old one.
#The 1st column gives the row number of the new matrix
#the 2nd gives the row of the old matrix that we're going to copy into the
new matrix
#the 3rd gives the starting column of the old matrix for the row in col 2
#the 4th gives the end column of the old matrix for the row in col 2
index <- matrix(c(1,1,1,4,
  1,3,5,10,
  2,2,1,3,
  2,1,4,8,
  2,3,9,10),
nrow=5,byrow=TRUE,

dimnames=list(NULL,c('new.mat.row','old.mat.row','old.mat.col.start','old.mat.col.end')))

I will be given old.mat and index and want to create new.mat from them.

I want to create a new.matrix of two rows that looks like this:
new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),byrow=TRUE,nrow=2)

So here, the first row of new.mat is columns 1 to 4 of row 1 of the old.mat
and columns 5 to 10 of row 3 of old.mat.

new.mat and old.mat will always have the same number of columns but the
number of rows could differ.

I could accomplish this in a loop, but the real problem is quite large
(new.mat might have 1e8 elements), and so a for loop would be prohibitively
slow.

I may resort to unix tools and use a shell script, but wanted to first see
if this is doable in R in a fast way.

Thanks in advance!

Matt


-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reshape: melt and cast

2015-09-01 Thread Matthew Pickard

Yep, that works. Thanks, Stephen. I should have drawn the parallel with
Excel Pivot tables sooner.

On Tue, Sep 1, 2015 at 9:36 AM, stephen sefick <ssef...@gmail.com> wrote:

> I would make this minimal. In other words, use an example data set, dput,
> and use output of dput in a block of reproducible code. I don't understand
> exactly what you want, but does sum work? If there is more than one record
> for a given set of factors the sum is the sum of the counts. If only one
> record, then the sum is the same as the original number.
>
> On Tue, Sep 1, 2015 at 10:00 AM, Matthew Pickard <
> matthew.david.pick...@gmail.com> wrote:
>
>> Thanks, Stephen. I've looked into the fun.aggregate argument. I don't
>> want to aggregate, so I thought leaving it blank (allowing it to default to
>> NULL) would do that.
>>
>>
>> Here's a corrected post (with further explanation):
>>
>> Hi,
>>
>> I have data that looks like this:
>>
>> >dput(head(ratings))
>> structure(list(QCode = structure(c(5L, 7L, 5L, 7L, 5L, 7L), .Label =
>> c("APPEAR",
>> "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class =
>> "factor"),
>> PID = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("1123",
>> "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152",
>> "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182",
>> "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201",
>> "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219",
>> "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236",
>> "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248",
>> "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268",
>> "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287",
>> "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329",
>> "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355",
>> "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400",
>> "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424",
>> "1426", "1428", "1432", "1433", "1435", "1436", "1438", "1439",
>> "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449",
>> "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462",
>> "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487",
>> "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508",
>> "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525",
>> "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538",
>> "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555",
>> "1556", "1558", "1559"), class = "factor"), RaterName =
>> structure(c(1L,
>> 1L, 1L, 1L, 1L, 1L), .Label = c("cwormhoudt", "zspeidel"), class =
>> "factor"),
>> SI1 = c(2L, 1L, 1L, 1L, 2L, 1L), SI2 = c(2L, 2L, 2L, 2L,
>> 2L, 3L), SI3 = c(3L, 3L, 3L, 3L, 2L, 4L), SI4 = c(1L, 2L,
>> 1L,

[R] setting up R -- VM Fusion, WIndows7

2015-07-30 Thread Matthew Johnson

Hi,

As i need R to speak to Bloomberg (and big only runs on windows), i'm
running windows 7 via VM Fusion on my mac.

I think i am having permission problems, as i cannot use install.packages,
and cannot change .libPaths via either a .Rprofile, or Profile.site.

I've posted more detail in this super-user question --
http://superuser.com/questions/948083/how-to-set-environment-variables-in-vm-fusion-windows-7

Throwing it over to this list as well, as I've spent about half the time i
had allowed for my project on (not getting) set up.

I realise this is a very niche problem - hoping that someone else has had a
similar problem, and can offer pointers.

best

mj

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] tcltk2 entry box

2015-07-10 Thread Matthew


Thank you very much, Greg, for the tkwait commands.

 I am just starting to try out examples on the sciviews web page to get 
a feel for tcltk in R and the tkwait.variable and tkwait.window seem 
like they could be very useful to me. I will add these in to my practice 
scripts and see what I can do with them.


Matthew

On 7/9/2015 5:31 PM, Greg Snow wrote:

If you want you script to wait until you have a value entered then you
can use the tkwait.variable or tkwait.window commands to make the
script wait before continuing (or you can bind the code to a button so
that you enter the value, then click on the button to run the code).

On Wed, Jul 8, 2015 at 7:58 PM, Matthew McCormack
mccorm...@molbio.mgh.harvard.edu wrote:

Wow !  Very nice.  Thank you very much, John.  This is very helpful and just
what I need.
Yes, I can see that I should have paid attention to tcltk before going to
tcltk2.

Matthew


On 7/8/2015 8:37 PM, John Fox wrote:

Dear Matthew,

For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile .

You could enter a number in a tk entry widget, but, depending upon the
nature of the number, a slider or other widget might be a better choice.

For a variety of helpful tcltk examples see
http://www.sciviews.org/_rgui/tcltk/, originally by James Wettenhall but
now maintained by Philippe Grosjean (the author of the tcltk2 package).
(You
probably don't need tcltk2 for the simple operations that you mention, but
see ?tk2spinbox for an alternative to a slider.)

Best,
   John

---
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/





-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew
Sent: July-08-15 8:01 PM
To: r-help
Subject: [R] tcltk2 entry box

Is anyone familiar enough with the tcltk2 package to know if it is
possible to have an entry box where a user can enter information (such
as a path to a file or a number) and then be able to use the entered
information downstream in a R script ?

The idea is for someone unfamiliar with R to just start an R script that
would take care of all the commands for them so all they have to do is
get the script started. However, there is always a couple of pieces of
information that will change each time the script is used (for example,
a different file will be processed by the script). So, I would like a
way for the user to input that information as the script ran.

Matthew McCormack

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] tcltk2 entry box

2015-07-08 Thread Matthew

Is anyone familiar enough with the tcltk2 package to know if it is 
possible to have an entry box where a user can enter information (such 
as a path to a file or a number) and then be able to use the entered 
information downstream in a R script ?


The idea is for someone unfamiliar with R to just start an R script that 
would take care of all the commands for them so all they have to do is 
get the script started. However, there is always a couple of pieces of 
information that will change each time the script is used (for example, 
a different file will be processed by the script). So, I would like a 
way for the user to input that information as the script ran.


Matthew McCormack

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] tcltk2 entry box

2015-07-08 Thread Matthew McCormack

Wow !  Very nice.  Thank you very much, John.  This is very helpful and 
just what I need.
Yes, I can see that I should have paid attention to tcltk before going 
to tcltk2.


Matthew

On 7/8/2015 8:37 PM, John Fox wrote:

Dear Matthew,

For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile .

You could enter a number in a tk entry widget, but, depending upon the
nature of the number, a slider or other widget might be a better choice.

For a variety of helpful tcltk examples see
http://www.sciviews.org/_rgui/tcltk/, originally by James Wettenhall but
now maintained by Philippe Grosjean (the author of the tcltk2 package). (You
probably don't need tcltk2 for the simple operations that you mention, but
see ?tk2spinbox for an alternative to a slider.)

Best,
  John

---
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/





-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew
Sent: July-08-15 8:01 PM
To: r-help
Subject: [R] tcltk2 entry box

Is anyone familiar enough with the tcltk2 package to know if it is
possible to have an entry box where a user can enter information (such
as a path to a file or a number) and then be able to use the entered
information downstream in a R script ?

The idea is for someone unfamiliar with R to just start an R script that
would take care of all the commands for them so all they have to do is
get the script started. However, there is always a couple of pieces of
information that will change each time the script is used (for example,
a different file will be processed by the script). So, I would like a
way for the user to input that information as the script ran.

Matthew McCormack

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] simple question - mean of a row of a data.frame

2015-02-11 Thread Matthew Keller

Hi all,

Simple question I should know: I'm unclear on the logic of why the sum of a
row of a data.frame returns a valid sum but the mean of a row of a
data.frame returns NA:

sum(rock[2,])
[1] 10901.05

mean(rock[2,],trim=0)
[1] NA
Warning message:
In mean.default(rock[2, ], trim = 0) :
  argument is not numeric or logical: returning NA

I get that rock[2,] is itself a data.frame of mode list, but why the
inconsistency between functions? How can you figure this out from, e.g.,
?mean
?sum

Thanks in advance,

Matt


-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] need help with excel data

2015-01-21 Thread Matthew

Try asap utilities (Home and Student edition), 
http://www.asap-utilities.com/index.php. When installed it will look 
like this in Excel,

Select Columns  Rows and then #18.


If that is not helpful, then DigDB, http://www.digdb.com/, but this one 
requires a subscription. It will also split columns.
You may have to do some 'cleaning' of individual cells, such as removing 
leading and/or trainling spaces. A lot of this can be one with the ASAP 
Utilities 'Text' pull down menu.


Matthew

On 1/21/2015 3:31 PM, Dr Polanski wrote:

Hi all!

Sorry to bother you, I am trying to learn some R via coursera courses and other 
internet sources yet haven’t managed to go far

And now I need to do some, I hope, not too difficult things, which I think R 
can do, yet have no idea how to make it do so

I have a big set of data (empirical) which was obtained by my colleagues and 
store at not convenient  way - all of the data in two cells of an excel table
an example of the data is in the attached file (the link)

https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing

so the first column has a number and the second has a whole vector (I guess it 
is) which looks like
«some words in Cyrillic(the length varies)» and then the set of numbers «12*23 
34*45» (another problem that some times it is «12*23, 34*56»

And the number of raws is about 3000 so it is impossible to do manually

what I need to have at the end is to have it separately in different excel cells
- what is written in words - |  12  | 23 | 34 | 45 |

Do you think it is possible to do so using R (or something else?)

Thank you very much in advance and sorry for asking for help and so stupid 
question, the problem is - I am trying and yet haven’t even managed to install 
openSUSE onto my laptop - only Ubuntu! :)


Thank you very much!
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] change default installation of R

2014-10-30 Thread Matthew

I have R version 2.15.0 installed in /usr/local/bin, and this is the 
default; in other words when I type which R this is the path I get.

I also have installed R into/usr/local/R-3.1.1/.   I used ./configure 
and then make to install this version. After make, I get the following 
error messages:

../unix/sys-std.o: In function `initialize_rlcompletion':
/usr/local/R-3.1.1/src/unix/sys-std.c:689: undefined reference to 
`rl_sort_completion_matches'
collect2: ld returned 1 exit status
make[3]: *** [R.bin] Error 1
make[3]: Leaving directory `/usr/local/R-3.1.1/src/main'
make[2]: *** [R] Error 2
make[2]: Leaving directory `/usr/local/R-3.1.1/src/main'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/usr/local/R-3.1.1/src'
make: *** [R] Error 1

I want to change R-3.1.1 to the default, so that when I type which R, I 
get /usr/local/R-3.1.1

To do this I first cd'd into /usr/local/bin and renamed R to R-old_10-30-14
then created a symlink by 'ln -s /usr/local/R-3.1.1/bin  R'
but when I type which R, I get 'no R in ... , where ' . . . ' is my PATH 
variable.

If I remove the symlink and then create another one with ln -s 
/usr/local/R-3.1.1/bin/R  R,
then after typing 'which R', I get
/usr/local/bin/R: line 259: /usr/local/R-3.1.1/bin/exe
c/R: No such file or directory
/usr/local/bin/R: line 259: exec: /usr/local/R-3.1.1/bin/exec/R: cannot 
execute: No such file or directory

This is the same message I get if I just type at the command line: 
/usr/local/R-3.1.1/bin/R.

Matthew

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] find the data frames in list of objects and make a list of them

2014-09-04 Thread Matthew

Thank you very much, Bill !

 It has taken my a while to figure out, but yes, what I need is a 
list (the R object, list) of data frames and not a character vector 
containing the names of the data frames.

   Thank you very much. This works well and is getting me in the 
direction I want to go.

Matthew

On 8/13/2014 7:40 PM, William Dunlap wrote:
 Previously you asked
  A second question: is this the best way to make a list
 of data frames without having to manually type c(dataframe1, dataframe2, 
 ...)  ?
 If you use 'c' there you will not get a list of data.frames - you will
 get a list of all the columns in the data.frame you supplied.  Use
 'list' instead of 'c' if you are taking that route.

 The *apply functions are helpful  here.  To make list of all
 data.frames in an environment you can use the following function,
 which takes the environment to search as an argument.

 f - function(envir = globalenv()) {
  tmp - eapply(envir,
 all.names=TRUE,
 FUN=function(obj) if (is.data.frame(obj))
 obj else NULL)
  # remove NULL's now
  tmp[!vapply(tmp, is.null, TRUE)]
 }

 Use is as
allDataFrames - f(globalenv()) # or just f()






 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com


 On Wed, Aug 13, 2014 at 3:49 PM, Matthew
 mccorm...@molbio.mgh.harvard.edu  wrote:
 Hi Richard,

  Thank you very much for your reply and your code.
 Your code is doing just what I asked for, but does not seem to be what I
 need.

 I will need to review some basic R before I can continue.

 I am trying to list data frames in order to bind them into 1 single data
 frame with something like: dplyr::rbind_all(list of data frames), but when I
 try dplyr::rbind_all(lsDataFrame(ls())), I get the error: object at index 1
 not a data.frame. So, I am going to have to learn some more about lists in R
 before proceding.

 Thank you for your help and code.

 Matthew





 Matthew

 On 8/13/2014 3:12 PM, Richard M. Heiberger wrote:
 I would do something like this

 lsDataFrame - function(xx=ls()) xx[sapply(xx, function(x)
 is.data.frame(get(x)))]
 ls(package:datasets)
 lsDataFrame(ls(package:datasets))

 On Wed, Aug 13, 2014 at 2:56 PM, Matthew
 mccorm...@molbio.mgh.harvard.edu  wrote:
 Hi everyone,

  I would like the find which objects are data frames in all the
 objects I
 have created ( in other words in what you get when you type: ls()  ),
 then I
 would like to make a list of these data frames.

 Explained in other words; after typing ls(), you get the names of
 objects.
 Which objects are data frames ?  How to then make a list of these data
 frames.

  A second question: is this the best way to make a list of data frames
 without having to manually type c(dataframe1, dataframe2, ...)  ?

 Matthew

 __
 R-help@r-project.org  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 __
 R-help@r-project.org  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] find the data frames in list of objects and make a list of them

2014-08-13 Thread Matthew


Hi everyone,

   I would like the find which objects are data frames in all the 
objects I have created ( in other words in what you get when you type: 
ls()  ), then I would like to make a list of these data frames.


Explained in other words; after typing ls(), you get the names of 
objects. Which objects are data frames ?  How to then make a list of 
these data frames.


   A second question: is this the best way to make a list of data 
frames without having to manually type c(dataframe1, dataframe2, ...)  ?


Matthew

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] find the data frames in list of objects and make a list of them

2014-08-13 Thread Matthew

Jim, Wow that was cl !   This function is *really* useful.  Thank 
you very much !  (It is also way beyond my capability).

I need to make a list of data frames because then I am going to bind 
them with plyr using 'dplyr::rbind_all(listOfDataFrames)'. This will 
make a single data frame, and from that single data frame I can make a 
heat map of all the data.

   For example, when I use your fantastic function, my.ls(), I get:

my.ls()
  Size  Class  
Length  Dim
.Random.seed2,544integer
 626
cpl28,664  character
 512
filenames   2,120  character
  19
filepath  216  character
   1
i 152  character
   1
Mer7_1-1_160-226A_1_gene_exp_diff_filt_hc_log2.txt 81,152 data.frame
   3  529 x 3
Mer7_1-1_Mer7_1-2_gene_exp_diff_filt_hc_log2.txt   31,624 data.frame
   3  199 x 3
Mer7_1-1_S150-160-226A_1_gene_exp_diff_filt_hc_log2.txt81,152 data.frame
   3  529 x 3
Mer7_1-1_W29_1_gene_exp_diff_filt_hc_log2.txt 129,376 data.frame
   3  849 x 3
Mer7_1-1_W29_S150-226A_1_gene_exp_diff_filt_hc_log2.txt   126,816 data.frame
   3  835 x 3
Mer7_1-1_W29_S160-162A_1_gene_exp_diff_filt_hc_log2.txt82,792 data.frame
   3  537 x 3
Mer7_1-1_W29_S226A_1_gene_exp_diff_filt_hc_log2.txt   115,008 data.frame
   3  756 x 3
Mer7_1-2_160-226A_1_gene_exp_diff_filt_hc_log2.txt 79,936 data.frame
   3  519 x 3
Mer7_1-2_S150-160-226A_1_gene_exp_diff_filt_hc_log2.txt84,512 data.frame
   3  548 x 3
Mer7_1-2_W29_1_gene_exp_diff_filt_hc_log2.txt 130,568 data.frame
   3  857 x 3
Mer7_1-2_W29_S160-162A_1_gene_exp_diff_filt_hc_log2.txt83,768 data.frame
   3  542 x 3
Mer7_1-2_W29_S226A_1_gene_exp_diff_filt_hc_log2.txt   119,008 data.frame
   3  783 x 3
Mer7_2-1_160-226A_2_gene_exp_diff_filt_hc_log2.txt105,344 data.frame
   3  685 x 3
Mer7_2-1_Mer7_2-2_gene_exp_diff_filt_hc_log2.txt   26,216 data.frame
   3  166 x 3
Mer7_2-1_S150-160-226A_2_gene_exp_diff_filt_hc_log2.txt   106,368 data.frame
   3  693 x 3
Mer7_2-1_W29_2_gene_exp_diff_filt_hc_log2.txt 160,200 data.frame
   3 1053 x 3
Mer7_2-1_W29_S150-226A_2_gene_exp_diff_filt_hc_log2.txt   152,696 data.frame
   3 1005 x 3
Mer7_2-1_W29_S160-162A_2_gene_exp_diff_filt_hc_log2.txt   113,992 data.frame
   3  743 x 3
Mer7_2-1_W29_S226A_2_gene_exp_diff_filt_hc_log2.txt   138,944 data.frame
   3  914 x 3
my.ls  35,624   function
   1
myfiles 2,120  character
  19
names   2,424   list
  19
test  680  character
   5
whatisthis  2,424   list
  19
**Total 2,026,440--- 
---  ---



   What I need is make the list of data frames for the dplyr command, 
dplyr::rbind_all(listOfDataFrames). Ideally, this would also be a 
specific subset of all the data frames, say the data frames with W29 in 
the name. This is something we, our lab, would be doing routinely and at 
various times of the day, so I want to automate the process so it does 
not need anyone to manually sit at the computer and type the list of 
data frames.

Matthew


On 8/13/2014 3:06 PM, jim holtman wrote:
 Here is a function that I use that might give you the results you want:

 =
 my.ls()
 Size  Class  Length Dim
 .Random.seed  2,544integer 626
 .remapHeaderFile 40,440 data.frame   2 373 x 2
 colID   216  character   3
 delDate 104  character   1
 deliv15,752 data.table   7 164 x 7
 f_drawPallet 36,896   function   1
 i96  character   1
 indx168,816  character1782
 pallet  172,696 data.table   31782 x 3
 pallets 405,736 data.table  14   1782 x 14
 picks26,572,856 data.table  19 154247 x 19
 wb  656   Workbook   1
 wSplit   68,043,136   list1782
 x56numeric   2
 **Total  95,460,000--- --- ---

 
 my.ls
 function (pos = 1, sorted = FALSE, envir = as.environment(pos))
 {
  .result - sapply(ls(envir = envir, all.names = TRUE),
 function(..x) object.size(eval(as.symbol(..x),
  envir = envir

Re: [R] find the data frames in list of objects and make a list of them

2014-08-13 Thread Matthew


Hi Richard,

Thank you very much for your reply and your code.
Your code is doing just what I asked for, but does not seem to be what I 
need.


I will need to review some basic R before I can continue.

I am trying to list data frames in order to bind them into 1 single data 
frame with something like: dplyr::rbind_all(list of data frames), but 
when I try dplyr::rbind_all(lsDataFrame(ls())), I get the error: object 
at index 1 not a data.frame. So, I am going to have to learn some more 
about lists in R before proceding.


Thank you for your help and code.

Matthew





Matthew

On 8/13/2014 3:12 PM, Richard M. Heiberger wrote:

I would do something like this

lsDataFrame - function(xx=ls()) xx[sapply(xx, function(x)
is.data.frame(get(x)))]
ls(package:datasets)
lsDataFrame(ls(package:datasets))

On Wed, Aug 13, 2014 at 2:56 PM, Matthew
mccorm...@molbio.mgh.harvard.edu wrote:

Hi everyone,

I would like the find which objects are data frames in all the objects I
have created ( in other words in what you get when you type: ls()  ), then I
would like to make a list of these data frames.

Explained in other words; after typing ls(), you get the names of objects.
Which objects are data frames ?  How to then make a list of these data
frames.

A second question: is this the best way to make a list of data frames
without having to manually type c(dataframe1, dataframe2, ...)  ?

Matthew

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] working on a data frame

2014-07-28 Thread Matthew

Thank you very much Peter, Bill and Petr for some great and quite 
elegant solutions. There is a lot I can learn from these.


Yes to your question Bill about the raw numbers, they are counts 
and they can not be negatives. The data is RNA Sequencing data where 
there are approximately 32,000 genes being measured for changes between 
two conditions. There are some genes that are not present (can not be 
measured) initially, but are present in the second condition, and the 
reverse is true also of some genes that are present initially and then 
not be present in the second condition (these are often the most 
interesting genes). This makes it difficult to compare mathematically 
the changes of all genes, so it is common practice to change the 0's to 
1's and then redo the log2. 1 is considered sufficiently small, actually 
anywhere up to 3 or 5 could be just do to 'background noise' in the 
measurement process, but it is somewhat arbitrary.


Matthew

On 7/28/2014 2:43 AM, PIKAL Petr wrote:

Hi

I like to use logical values directly in computations if possible.

yourData[,10] - yourData[,9]/(yourData[,8]+(yourData[,8]==0))

Logical values are automagicaly considered FALSE=0 and TRUE=1 and can be used 
in computations. If you really want to change 0 to 1 in column 8 you can use

yourData[,8]  -  yourData[,8]+(yourData[,8]==0)

without ifelse stuff.

Regards
Petr



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
project.org] On Behalf Of William Dunlap
Sent: Friday, July 25, 2014 8:07 PM
To: Matthew
Cc: r-help@r-project.org
Subject: Re: [R] working on a data frame


if
yourData[,8]==0,
then
yourData[,8]==1, yourData[,10] - yourData[,9]/yourData[,8]

You could do express this in R as
is8Zero - yourData[,8] == 0
yourData[is8Zero, 8] - 1
yourData[is8Zero, 10] - yourData[is8Zero,9] / yourData[is8Zero,8]
Note how logical (Boolean) values are used as subscripts - read the '['
as 'such that' when using logical subscripts.

There are many more ways to express the same thing.

(I am tempted to change the algorithm to avoid the divide by zero
problem by making the quotient (numerator + epsilon)/(denominator +
epsilon) where epsilon is a very small number.  I am assuming that the
raw numbers are counts or at least cannot be negative.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Jul 25, 2014 at 10:44 AM, Matthew
mccorm...@molbio.mgh.harvard.edu wrote:

Thank you for your comments, Peter.

A couple of questions.  Can I do something like the following ?

if
yourData[,8]==0,
then
yourData[,8]==1, yourData[,10] - yourData[,9]/yourData[,8]


I think I am just going to have to learn more about R. I thought
getting into R would be like going from Perl to Python or Java etc.,
but it seems like R programming works differently.

Matthew


On 7/25/2014 12:06 AM, Peter Alspach wrote:

Tena koe Matthew

 Column 10 contains the result of the value in column 9 divided by
the value in column 8. If the value in column 8==0, then the

division

can not be done, so  I want to change the zero to a one in order to

do the division..

That being the case, think in terms of vectors, as Sarah says.  Try:

yourData[,10] - yourData[,9]/yourData[,8]
yourData[yourData[,8]==0,10] - yourData[yourData[,8]==0,9]

This doesn't change the 0 to 1 in column 8, but it doesn't appear

you

actually need to do that.

HTH 

Peter Alspach

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Matthew McCormack
Sent: Friday, 25 July 2014 3:16 p.m.
To: Sarah Goslee
Cc: r-help@r-project.org
Subject: Re: [R] working on a data frame


On 7/24/2014 8:52 PM, Sarah Goslee wrote:

Hi,

Your description isn't clear:

On Thursday, July 24, 2014, Matthew
mccorm...@molbio.mgh.harvard.edu

mailto:mccorm...@molbio.mgh.harvard.edu wrote:

  I am coming from the perspective of Excel and VBA scripts, but

I

  would like to do the following in R.

   I have a data frame with 14 columns and 32,795 rows.

  I want to check the value in column 8 (row 1) to see if it is

a 0.

  If it is not a zero, proceed to the next row and check the

value

  for column 8.
  If it is a zero, then
  a) change the zero to a 1,
  b) divide the value in column 9 (row 1) by 1,


Row 1, or the row in which column 8 == 0?

All rows in which the value in column 8==0.

Why do you want to divide by 1?

Column 10 contains the result of the value in column 9 divided by

the

value in column 8. If the value in column 8==0, then the division

can

not be done, so  I want to change the zero to a one in order to do

the division.

This is a fairly standard thing to do with this data. (The data are
measurements of amounts at two time points. Sometimes a thing will
not be present in the beginning (0), but very present at the later
time. Column 10 is the log2 of the change. Infinite is not an easy
number to work with, so it is common to change

Re: [R] working on a data frame

2014-07-25 Thread Matthew


Thank you for your comments, Peter.

A couple of questions.  Can I do something like the following ?

if
yourData[,8]==0,
then
yourData[,8]==1, yourData[,10] - yourData[,9]/yourData[,8]


I think I am just going to have to learn more about R. I thought getting 
into R would be like going from Perl to Python or Java etc., but it 
seems like R programming works differently.


Matthew


On 7/25/2014 12:06 AM, Peter Alspach wrote:

Tena koe Matthew

 Column 10 contains the result of the value in column 9 divided by the value in 
column 8. If the value in column 8==0, then the division can not be done, so  I want to 
change the zero to a one in order to do the division..  That being the case, think 
in terms of vectors, as Sarah says.  Try:

yourData[,10] - yourData[,9]/yourData[,8]
yourData[yourData[,8]==0,10] - yourData[yourData[,8]==0,9]

This doesn't change the 0 to 1 in column 8, but it doesn't appear you actually 
need to do that.

HTH 

Peter Alspach

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Matthew McCormack
Sent: Friday, 25 July 2014 3:16 p.m.
To: Sarah Goslee
Cc: r-help@r-project.org
Subject: Re: [R] working on a data frame


On 7/24/2014 8:52 PM, Sarah Goslee wrote:

Hi,

Your description isn't clear:

On Thursday, July 24, 2014, Matthew mccorm...@molbio.mgh.harvard.edu
mailto:mccorm...@molbio.mgh.harvard.edu wrote:

 I am coming from the perspective of Excel and VBA scripts, but I
 would like to do the following in R.

  I have a data frame with 14 columns and 32,795 rows.

 I want to check the value in column 8 (row 1) to see if it is a 0.
 If it is not a zero, proceed to the next row and check the value
 for column 8.
 If it is a zero, then
 a) change the zero to a 1,
 b) divide the value in column 9 (row 1) by 1,


Row 1, or the row in which column 8 == 0?

All rows in which the value in column 8==0.

Why do you want to divide by 1?

Column 10 contains the result of the value in column 9 divided by the value in 
column 8. If the value in column 8==0, then the division can not be done, so  I 
want to change the zero to a one in order to do the division. This is a fairly 
standard thing to do with this data. (The data are measurements of amounts at 
two time points. Sometimes a thing will not be present in the beginning (0), 
but very present at the later time. Column 10 is the log2 of the change. 
Infinite is not an easy number to work with, so it is common to change the 0 to 
a 1. On the other hand, something may be present at time 1, but not at the 
later time. In this case column 10 would be taking the log2 of a number divided 
by 0, so again the zero is commonly changed to a one in order to get a useable 
value in column 10. In both the preceding cases there was a real change, but 
Inf and NaN are not helpful.)

 c) place the result in column 10 (row 1) and


Ditto on the row 1 question.

I want to work on all rows where column 8 (and column 9) contain a zero.
Column 10 contains the result of the value in column 9 divided by the value in 
column 8. So, for row 1, column 10 row 1 contains the ratio column 9 row 1 
divided by column 8 row 1, and so on through the whole
32,000 or so rows.

Most rows do not have a zero in columns 8 or 9. Some rows have  zero in column 
8 only, and some rows have a zero in column 9 only. I want to get rid of the 
zeros in these two columns and then do the division to get a manageable value 
in column 10. Division by zero and Inf are not considered 'manageable' by me.

What do you want column 10 to be if column 8 isn't 0? Does it already
have a value. I suppose it must.

Yes column 10 does have something, but this something can be Inf or NaN, which 
I want to get rid of.

 d) repeat this for each of the other 32,794 rows.

 Is this possible with an R script, and is this the way to go about
 it. If it is, could anyone get me started ?


Assuming you want to put the new values in the rows where column 8 ==
0, you can do it in two steps:

mydata[,10] - ifelse(mydata[,8] == 0, mydata[,9]/whatever,
mydata[,10]) #where whatever is the thing you want to divide by that
probably isn't 1 mydata[,8] - ifelse(mydata[,8] == 0, 1, mydata[,8])

R programming is best done by thinking about vectorizing things,
rather than doing them in loops. Reading the Intro to R that comes
with your installation is a good place to start.

Would it be better to change the data frame into a matrix, or something else ?
Thanks for your help.

Sarah


 Matthew




--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained

[R] working on a data frame

2014-07-24 Thread Matthew

I am coming from the perspective of Excel and VBA scripts, but I would 
like to do the following in R.


 I have a data frame with 14 columns and 32,795 rows.

I want to check the value in column 8 (row 1) to see if it is a 0.
If it is not a zero, proceed to the next row and check the value for 
column 8.

If it is a zero, then
a) change the zero to a 1,
b) divide the value in column 9 (row 1) by 1,
c) place the result in column 10 (row 1) and
d) repeat this for each of the other 32,794 rows.

Is this possible with an R script, and is this the way to go about it. 
If it is, could anyone get me started ?


Matthew

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] working on a data frame

2014-07-24 Thread Matthew McCormack


On 7/24/2014 8:52 PM, Sarah Goslee wrote:
 Hi,

 Your description isn't clear:

 On Thursday, July 24, 2014, Matthew mccorm...@molbio.mgh.harvard.edu 
 mailto:mccorm...@molbio.mgh.harvard.edu wrote:

 I am coming from the perspective of Excel and VBA scripts, but I
 would like to do the following in R.

  I have a data frame with 14 columns and 32,795 rows.

 I want to check the value in column 8 (row 1) to see if it is a 0.
 If it is not a zero, proceed to the next row and check the value
 for column 8.
 If it is a zero, then
 a) change the zero to a 1,
 b) divide the value in column 9 (row 1) by 1,


 Row 1, or the row in which column 8 == 0?
All rows in which the value in column 8==0.
 Why do you want to divide by 1?
Column 10 contains the result of the value in column 9 divided by the 
value in column 8. If the value in column 8==0, then the division can 
not be done, so  I want to change the zero to a one in order to do the 
division. This is a fairly standard thing to do with this data. (The 
data are measurements of amounts at two time points. Sometimes a thing 
will not be present in the beginning (0), but very present at the later 
time. Column 10 is the log2 of the change. Infinite is not an easy 
number to work with, so it is common to change the 0 to a 1. On the 
other hand, something may be present at time 1, but not at the later 
time. In this case column 10 would be taking the log2 of a number 
divided by 0, so again the zero is commonly changed to a one in order to 
get a useable value in column 10. In both the preceding cases there was 
a real change, but Inf and NaN are not helpful.)

 c) place the result in column 10 (row 1) and


 Ditto on the row 1 question.
I want to work on all rows where column 8 (and column 9) contain a zero.
Column 10 contains the result of the value in column 9 divided by the 
value in column 8. So, for row 1, column 10 row 1 contains the ratio 
column 9 row 1 divided by column 8 row 1, and so on through the whole 
32,000 or so rows.

Most rows do not have a zero in columns 8 or 9. Some rows have  zero in 
column 8 only, and some rows have a zero in column 9 only. I want to get 
rid of the zeros in these two columns and then do the division to get a 
manageable value in column 10. Division by zero and Inf are not 
considered 'manageable' by me.
 What do you want column 10 to be if column 8 isn't 0? Does it already 
 have a value. I suppose it must.
Yes column 10 does have something, but this something can be Inf or NaN, 
which I want to get rid of.

 d) repeat this for each of the other 32,794 rows.

 Is this possible with an R script, and is this the way to go about
 it. If it is, could anyone get me started ?


 Assuming you want to put the new values in the rows where column 8 == 
 0, you can do it in two steps:

 mydata[,10] - ifelse(mydata[,8] == 0, mydata[,9]/whatever, mydata[,10])
 #where whatever is the thing you want to divide by that probably isn't 1
 mydata[,8] - ifelse(mydata[,8] == 0, 1, mydata[,8])

 R programming is best done by thinking about vectorizing things, 
 rather than doing them in loops. Reading the Intro to R that comes 
 with your installation is a good place to start.
Would it be better to change the data frame into a matrix, or something 
else ?
Thanks for your help.

 Sarah


 Matthew




 -- 
 Sarah Goslee
 http://www.stringpage.com
 http://www.sarahgoslee.com
 http://www.functionaldiversity.org


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] odd behavior of seq()

2014-07-03 Thread Matthew Keller

Hi all,

A bit stumped here.

z - seq(.05,.85,by=.1)
z==.05 #good
[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

z==.15  #huh
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

More generally:
 sum(z==.25)
[1] 1
 sum(z==.35)
[1] 0
 sum(z==.45)
[1] 1
 sum(z==.55)
[1] 1
 sum(z==.65)
[1] 0
 sum(z==.75)
[1] 0
 sum(z==.85)
[1] 1

Does anyone have any ideas what is going on here?

 R.Version()
$platform
[1] x86_64-apple-darwin9.8.0

$arch
[1] x86_64

$os
[1] darwin9.8.0

$system
[1] x86_64, darwin9.8.0

$status
[1] 

$major
[1] 2

$minor
[1] 13.1

$year
[1] 2011

$month
[1] 07

$day
[1] 08

$`svn rev`
[1] 56322

$language
[1] R

$version.string
[1] R version 2.13.1 (2011-07-08)

-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] odd behavior of seq()

2014-07-03 Thread Matthew Keller

thanks all!


On Thu, Jul 3, 2014 at 12:38 PM, Peter Langfelder 
peter.langfel...@gmail.com wrote:

 Precision, precision, precision...

  z[2]-0.15
 [1] 2.775558e-17

 My solution:

  z - signif(seq(.05,.85,by=.1), 5)
  z[2] - 0.15
 [1] 0
  z[2]==0.15
 [1] TRUE

 Peter

 On Thu, Jul 3, 2014 at 11:28 AM, Matthew Keller mckellerc...@gmail.com
 wrote:
  Hi all,
 
  A bit stumped here.
 
  z - seq(.05,.85,by=.1)
  z==.05 #good
  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
  z==.15  #huh
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
  More generally:
  sum(z==.25)
  [1] 1
  sum(z==.35)
  [1] 0
  sum(z==.45)
  [1] 1
  sum(z==.55)
  [1] 1
  sum(z==.65)
  [1] 0
  sum(z==.75)
  [1] 0
  sum(z==.85)
  [1] 1
 
  Does anyone have any ideas what is going on here?
 
  R.Version()
  $platform
  [1] x86_64-apple-darwin9.8.0
 
  $arch
  [1] x86_64
 
  $os
  [1] darwin9.8.0
 
  $system
  [1] x86_64, darwin9.8.0
 
  $status
  [1] 
 
  $major
  [1] 2
 
  $minor
  [1] 13.1
 
  $year
  [1] 2011
 
  $month
  [1] 07
 
  $day
  [1] 08
 
  $`svn rev`
  [1] 56322
 
  $language
  [1] R
 
  $version.string
  [1] R version 2.13.1 (2011-07-08)
 
  --
  Matthew C Keller
  Asst. Professor of Psychology
  University of Colorado at Boulder
  www.matthewckeller.com
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] lm models over all possible pairwise combinations of the columns of two matrices

2014-04-22 Thread Matthew Robinson

Dear all,

I am working through a problem at the moment and have got stuck. I have 
searched around on the help list for assistance but could not find anything - 
but apologies if I have missed something. A dummy example of my problem is 
below. I will continue to work on it, but any help would be greatly appreciated.

Thanks in advance for your time.

Best wishes,
Matt


I have a matrix of response variables:

p-matrix(c(rnorm(120,1),
rnorm(120,1),
rnorm(120,1)),
120,3)

and two matrices of covariates:

g-matrix(c(rep(1:3, each=40),
rep(3:1, each=40),
rep(1:3, 40)),
120,3)
m-matrix(c(rep(1:2, 60),
rep(2:1, 60),
rep(1:2, each=60)),
120,3)

For all combinations of the columns of the covariate matrices g and m I want to 
run these two models:

test - function(uniq_m, uniq_g, p = p) {


full - lm(p ~ factor(uniq_m) * factor(uniq_g))
null - lm(p ~ factor(uniq_m) + factor(uniq_g))
return(list('f'=full, 'n'=null))
}

So I want to test for an interaction between column 1 of m and column 1 of g, 
then column 2 of m and column 1 of g, then column 2 of m and column 2 of 
g...and so forth across all possible pairwise interactions. The response 
variable is the same each time and is a matrix containing multiple columns.


So far, I can do this for a single combination of columns:

test_1 - test(m[ ,1], g[ ,1], p)

And I can also run the model over all columns of m and one coloumn of g:

test_2 - apply(m, 2, function(uniq_m) {
test(uniq_m, g[ ,1], p = p)
})


I can then get the F statistics for each response variable of each model:

sapply(summary(test_2[[1]]$f), function(x) x$fstatistic)
sapply(summary(test_2[[1]]$n), function(x) x$fstatistic)

And I can compare models for each response variable using an F-test:

d1-colSums(matrix(residuals(test_2[[1]]$n),nrow(g),ncol(p))^2)
d2-colSums(matrix(residuals(test_2[[2]]$f),nrow(g),ncol(p))^2)
F-((d1-d2) / (d2/114))


My question is how do I run the lm models over all combinations of columns from 
the m and the g matrix, and get the F-statistics? While this is a dummy 
example, the real analysis will have a response matrix that is 700 x 8000, and 
the covariate matrices will be 700 x 4000 and 700 x 100 so I need something 
that is as fast as possible.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] [e1071] Features that are factors when exporting a model with write.svm

2014-02-21 Thread Matthew Wood

I have a trained SVM that I want to export with write.svm and
eventually use in libSVM. Some of my features are factors. Standard
libSVM only works with features that are doubles, so I need to figure
out how my features should be represented and used.

How does e1071 treat factors in an SVM? For feature foo with values
a and b I'm assuming it's something like foo_a (0 or 1) and foo_b
(0 or 1). Is that right?

Do factors get treated differently in an SVM? If I convert the factors
to intergers for libSVM, I'll lose the information that a feature
doesn't take on a range of values. Is that going to cause problems? I
don't know if the model takes that into account.

When using write.svm a scale file is also output. My scale file is
missing the same number of rows as I have features that are factors.
That's another indication to me that the factors are causing issues.

Thanks.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [e1071] Features that are factors when exporting a model with write.svm

2014-02-21 Thread Matthew Wood

I may have been able to answer my own questions by reading the e1071
source. It looks like the features are just converted to doubles with
as.double(x). And, I haven't found where in the code yet, but it looks
like it's not scaling the factors which explains why I'm missing rows
in the scale file.

On Fri, Feb 21, 2014 at 1:50 PM, Matthew Wood doowt...@gmail.com wrote:
 I have a trained SVM that I want to export with write.svm and
 eventually use in libSVM. Some of my features are factors. Standard
 libSVM only works with features that are doubles, so I need to figure
 out how my features should be represented and used.

 How does e1071 treat factors in an SVM? For feature foo with values
 a and b I'm assuming it's something like foo_a (0 or 1) and foo_b
 (0 or 1). Is that right?

 Do factors get treated differently in an SVM? If I convert the factors
 to intergers for libSVM, I'll lose the information that a feature
 doesn't take on a range of values. Is that going to cause problems? I
 don't know if the model takes that into account.

 When using write.svm a scale file is also output. My scale file is
 missing the same number of rows as I have features that are factors.
 That's another indication to me that the factors are causing issues.

 Thanks.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Biomod model access

2014-02-13 Thread Matthew Bayly

Hi great that was easy I feel like a bit of a fool for not figuring this 
out.


TO LOAD ALL SAVED MODELS AT ONCE:

library(biomod2)
# change directory to where you stored you’re original models (my documents 
is default if you did not specify). Go into the file models

*# TO LOAD ALL SAVED MODELS FROM PREVIOUS RUN*

rm(list=ls()) #will remove ALL objects currently stored in R

# open old models with the load command load()
list.files() # check to see if all your files appeared correctly

f - as.list(list.files()) # get ready to load

for(i in f) {

  load(i)

}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SDM using BIOMOD2 error message

2014-02-12 Thread Matthew Bayly



 I think it is a problem with your directory setting  changing your 
 directory. 



When you make your enviro stack you set your directory to: 
setwd(V:/BIOCLIM) 

Then when you import your species coordinates and presence/absence status 
you change your directory to:
setwd(C:/Users/Lindsie/Documents/R) 

Then you try to run your model  R says  'pred' is missing ~ because it 
cannot find your raster stacks. 

CHANGE YOUR  R DIRECTORY BACK TO WHERE YOU STORED YOUR ENVIRONMENTAL LAYERS 
PRIOR TO RUNNING THE MODEL!
myBiomodModelOut - BIOMOD_Modeling(myBiomodDa...
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Biomod model access

2014-02-12 Thread Matthew Bayly

I have been struggling with this same problem. I always have to re-run. 
PLEASE HELP!!

I have however figured out the whole data-format issue  am now able to 
save grid files for use in other GIS programs after they are re-exported.

On Thursday, August 15, 2013 1:32:31 AM UTC-7, Jenny Williams wrote:

 I am still trying to get my head around biomod2. I have run through the 
 tutorial a few times, which works really well in a linear format. 

 But, I want to see the models and assess them at every part of the 
 process. So, I need to: 

 1: be able to re-access all the files from /.BIOMOD_DATA/ once R is closed 
 and all the file links are lost. 
 e.g myBiomodModelOut 

 2: call the summary parameters for the models e.g GLM, I can see the files 
 but not sure how to access them. 
 e.g 
 myGLMs - BIOMOD_LoadModels(myBiomodModelOut, models='GLM') 
 #just produces a list 
 summary(myGLMs[1]) 
Length Class  Mode 
1 character character 
 #summary(GLM) doesn't work, but is the output that I am looking to find. 

 3. find the split datasets used for each of the iterations BIOMOD_Modeling 
 options; NbRunEval for DataSplit 

 Any help or pointers in the right direction would be greatly appreciated. 
 FYI the vignette does not seem to work: 
 http://127.0.0.1:15505/library/biomod2/doc/index.html 


 ** 
 Jenny Williams 
 Spatial Information Scientist, GIS Unit 
 Herbarium, Library, Art  Archives Directorate 
 Royal Botanic Gardens, Kew 
 Richmond, TW9 3AB, UK 

 Tel: +44 (0)208 332 5277 
 email: jenny.w...@kew.org 
 javascript:mailto:jenny.willi...@kew.orgjavascript: 

 ** 

 Film: The Forgotten Home of Coffee - Beyond the Gardens
 http://www.youtube.com/watch?v=-uDtytKMKpAsns=tw 
 Stories: Coffee Expedition - Ethiopia
 http://storify.com/KewGIS/coffee-expedition-ethiopia 
  Kew in Harapan Rainforest Sumatra
 http://storify.com/KewGIS/kew-in-harapan-rainforest 
 Articles: Seeing the wood for the trees
 http://www.kew.org/ucm/groups/public/documents/document/kppcont_060602.pdf 

 How Kew's GIS team and South East Asia botanists are working to help 
 conserve and restore a rainforest in Sumatra. Download a pdf of this 
 article here.
 http://www.kew.org/ucm/groups/public/documents/document/kppcont_060602.pdf 



  
 The Royal Botanic Gardens, Kew is a non-departmental public body with 
 exempt charitable status, whose principal place of business is at Royal 
 Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, United Kingdom. 

 The information contained in this email and any attachments is intended 
 solely for the addressee(s) and may contain confidential or legally 
 privileged information. If you have received this message in error, please 
 return it immediately and permanently delete it. Do not use, copy or 
 disclose the information contained in this email or in any attachment. 

 Any views expressed in this email do not necessarily reflect the opinions 
 of RBG Kew. 

 Any files attached to this email have been inspected with virus detection 
 software by RBG Kew before transmission, however you should carry out your 
 own virus checks before opening any attachments. RBG Kew accepts no 
 liability for any loss or damage which may be caused by software viruses. 

 [[alternative HTML version deleted]] 

 __ 
 r-h...@r-project.org javascript: mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code. 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Handling large SAS file in R

2014-02-05 Thread Matthew Shotwell

Completing the reverse engineering effort is the principle barrier to fully
incorporating the sas7bdat file format. Of course, SAS may change the
format specification at any time, and without our knowledge. The sas7bdat
package is a repository for the results of our (myself, Clint Cummins, and
several others) experiments with the file format, most notably the
'sas7bdat' vignette, which lays out our current understanding of the
structure of sas7bdat files. While others have reverse-engineered the file
format, this is the ONLY publicly available specification. Hence, my
feeling is that the vignette is the package's most important contribution.
A prototype reader is also included; the read.sas7bdat function. Some have
found it useful for routine work. But there are issues, as you have found.
Fortunately, there are ongoing efforts by others to implement more
efficient readers, using the data that we have compiled.

Best,
Matt

P.S. There is a read loop in the read.sas7bdat function, indexed by rows of
the tabular data, that you might use to indicate progress reading the file.

Colleagues

 Frank Harrell wrote that you need to purchase Stat/Transfer, which I did
 many years ago and continue to use.

 But I don't understand why the sas7bdat package (or something equivalent)
 cannot reverse engineer the SAS procedures so that R users can read
 sas7bdat files as well as StatTransfer.  I have been in contact with the
 maintainer, Matt Shotwell, regarding bugs in the present version (0.4) and
 he wrote:
 it tends to languish just one or two items from the top of my
 TODO... I hope to get back to it soon.
 I have also written to this bulletin board about the foreign package not
 being able to process certain SAS XPT files (which StatTransfer handled
 without any problem).

 I am a strong advocate of R and I have arranged work-arounds (using
 StatTransfer) in these cases.  However, R users would benefit from the
 ability of R to read any SAS file without intermediate software.   I
 would offer to participate in any efforts to accomplish this but I think
 that it is beyond my capabilities.

 Dennis


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] geo_bar x= and y= warnings and error help

2014-01-22 Thread Matthew Henn

Any insight on issues leading to the following error modes would be 
appreciated.

#Version_1 CALL
alphaDivOTU - ggplot(data=alphaDivOTU_pt1to5, aes(y = Num.OTUs,x = 
Patient,fill = Timepoint)) +
 geom_bar(position = position_dodge) +
 theme(text = element_text(family = 'Helvetica-Narrow',size = 18.0)) +
 scale_fill_manual(guide = guide_legend(),values = 
c(forestgreen,gray44,dodgerblue2,royalblue2,royalblue4,blue3)) +
 scale_y_continuous(breaks = pretty_breaks(n = 10.0,min.n = 5.0))

ggsave(plot=alphaDivOTU, filename='alphaDivOTU.png', scale=1, dpi=300, 
width=10, height=10, units=c(cm))

#Version_1 Error modes
Mapping a variable to y and also using stat=bin.
   With stat=bin, it will attempt to set the y value to the count of 
cases in each group.
   This can result in unexpected behavior and will not be allowed in a 
future version of ggplot2.
   If you want y to represent counts of cases, use stat=bin and don't 
map a variable to y.
   If you want y to represent values in the data, use stat=identity.
   See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
Error in .$position$adjust : object of type 'closure' is not subsettable

#Version_2 CALL
alphaDivOTU - ggplot(data=alphaDivOTU_pt1to5, aes(y = Num.OTUs,x = 
Patient,fill = Timepoint)) +
 geom_bar(position = position_dodge, stat = identity) +
 theme(text = element_text(family = 'Helvetica-Narrow',size = 18.0)) +
 scale_fill_manual(guide = guide_legend(),values = 
c(forestgreen,gray44,dodgerblue2,royalblue2,royalblue4,blue3)) +
 scale_y_continuous(breaks = pretty_breaks(n = 10.0,min.n = 5.0))

ggsave(plot=alphaDivOTU, filename='alphaDivOTU.png', scale=1, dpi=300, 
width=10, height=10, units=c(cm))

#For Version_2 I get the error:
Error in stat$parameters : object of type 'closure' is not subsettable

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Help with removing extra legend elements in ggplot

2013-11-19 Thread Matthew Van Scoyoc

I can't get the fine tuning right with my legend. I get an extra legend
element 10 which is the point size in my plot. Can someone help me get rid
of this extra element? Additionally I would also like to reduce the size of
the legend.

If you want to reproduce my figure you can  download my data in csv format
here
https://github.com/scoyoc/EcoSiteDelineation/blob/master/VegNMDS_scores.csv

.

Here is my code...

veg.nmds.sc = read.csv(VegNMDS_scores.csv, header = T)

nmds.fig = ggplot(data = veg.nmds.sc, aes(x = NMDS1, y = NMDS2))
nmds.fig + geom_point(aes(color = VegType, shape = VegType, size = 10)) +
  scale_colour_manual(name = Vegetation Type,
  values = c(blue, magenta, gray50, red,
 cyan3,
 green4, gold)) +
  scale_shape_manual(name = Vegetation Type, values = c(15, 16, 17, 18,
 15, 16, 17)) +
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid.major =
 element_blank(),
panel.grid.minor = element_blank(),
legend.key = element_rect(color = white)
)

I have been messing around with
  theme(..., legend.key.size = unit(1, cm))
but I keep getting the error could not find function unit. I'm not sure
why, isn't unit supposed to be part of the legend.key argument?

... and the resulting figure...
http://r.789695.n4.nabble.com/file/n4680764/VegNMDS.jpeg


Thanks for the help.
MVS
=
Matthew Van Scoyoc
Graduate Research Assistant, Ecology
Wildland Resources Department http://www.cnr.usu.edu/wild/  Ecology
Center http://www.usu.edu/ecology/
Quinney College of Natural Resources http://cnr.usu.edu/
Utah State University
Logan, UT
=
Think SNOW!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with removing extra legend elements in ggplot

2013-11-19 Thread Matthew Van Scoyoc

No dice. I still get the 10 legend element.
Thanks for the quick reply.
Cheers,

MVS
=
Matthew Van Scoyoc
Graduate Research Assistant, Ecology
Wildland Resources Department http://www.cnr.usu.edu/wild/  Ecology
Center http://www.usu.edu/ecology/
Quinney College of Natural Resources http://cnr.usu.edu/
Utah State University
Logan, UT

mvansco...@aggiemail.usu.eduhttps://sites.google.com/site/scoyoc/
=
Think SNOW!


On Tue, Nov 19, 2013 at 5:12 PM, David Winsemius dwinsem...@comcast.netwrote:


 On Nov 19, 2013, at 3:44 PM, Matthew Van Scoyoc wrote:

  I can't get the fine tuning right with my legend. I get an extra legend
  element 10 which is the point size in my plot. Can someone help me get
 rid
  of this extra element? Additionally I would also like to reduce the size
 of
  the legend.
 
  If you want to reproduce my figure you can  download my data in csv
 format
  here
  
 https://github.com/scoyoc/EcoSiteDelineation/blob/master/VegNMDS_scores.csv
 
  .
 
  Here is my code...
 
  veg.nmds.sc = read.csv(VegNMDS_scores.csv, header = T)
 
  nmds.fig = ggplot(data = veg.nmds.sc, aes(x = NMDS1, y = NMDS2))
  nmds.fig + geom_point(aes(color = VegType, shape = VegType, size = 10))
 +
  scale_colour_manual(name = Vegetation Type,
  values = c(blue, magenta, gray50, red,
  cyan3,
 green4, gold)) +
  scale_shape_manual(name = Vegetation Type, values = c(15, 16, 17, 18,
  15, 16, 17)) +
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid.major =
  element_blank(),
panel.grid.minor = element_blank(),
legend.key = element_rect(color = white)
)
 
  I have been messing around with
  theme(..., legend.key.size = unit(1, cm))
  but I keep getting the error could not find function unit. I'm not sure
  why, isn't unit supposed to be part of the legend.key argument?

 Try this workaround to what sounds like a bug:

 library(grid)

 # then repeat the call.

 --

 David Winsemius
 Alameda, CA, USA



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with removing extra legend elements in ggplot

2013-11-19 Thread Matthew Van Scoyoc

Awesome! Thanks for the fix Dennis, and thanks for clearing up aes() too.
It makes sense now.
Cheers,

MVS
=
Matthew Van Scoyoc
Graduate Research Assistant, Ecology
Wildland Resources Department http://www.cnr.usu.edu/wild/  Ecology
Center http://www.usu.edu/ecology/
Quinney College of Natural Resources http://cnr.usu.edu/
Utah State University
Logan, UT

mvansco...@aggiemail.usu.eduhttps://sites.google.com/site/scoyoc/
=
Think SNOW!


On Tue, Nov 19, 2013 at 5:52 PM, Dennis Murphy djmu...@gmail.com wrote:

 The additional element comes from this code:

 geom_point(aes(color = VegType, shape = VegType, size = 10))

 Take the size argument outside the aes() statement and the legend will
 disappear:

 geom_point(aes(color = VegType, shape = VegType), size = 10)

 The aes() statement maps a variable to a plot aesthetic. In this case
 you're mapping VegType to color and shape. You want to *set* the size
 aesthetic to a constant value, and that is done by assigning the value
 10 to the size aesthetic outside of aes().

 Dennis

 On Tue, Nov 19, 2013 at 4:35 PM, Matthew Van Scoyoc sco...@gmail.com
 wrote:
  No dice. I still get the 10 legend element.
  Thanks for the quick reply.
  Cheers,
 
  MVS
  =
  Matthew Van Scoyoc
  Graduate Research Assistant, Ecology
  Wildland Resources Department http://www.cnr.usu.edu/wild/  Ecology
  Center http://www.usu.edu/ecology/
  Quinney College of Natural Resources http://cnr.usu.edu/
  Utah State University
  Logan, UT
 
  mvansco...@aggiemail.usu.eduhttps://sites.google.com/site/scoyoc/
  =
  Think SNOW!
 
 
  On Tue, Nov 19, 2013 at 5:12 PM, David Winsemius dwinsem...@comcast.net
 wrote:
 
 
  On Nov 19, 2013, at 3:44 PM, Matthew Van Scoyoc wrote:
 
   I can't get the fine tuning right with my legend. I get an extra
 legend
   element 10 which is the point size in my plot. Can someone help me
 get
  rid
   of this extra element? Additionally I would also like to reduce the
 size
  of
   the legend.
  
   If you want to reproduce my figure you can  download my data in csv
  format
   here
   
 
 https://github.com/scoyoc/EcoSiteDelineation/blob/master/VegNMDS_scores.csv
  
   .
  
   Here is my code...
  
   veg.nmds.sc = read.csv(VegNMDS_scores.csv, header = T)
  
   nmds.fig = ggplot(data = veg.nmds.sc, aes(x = NMDS1, y = NMDS2))
   nmds.fig + geom_point(aes(color = VegType, shape = VegType, size =
 10))
  +
   scale_colour_manual(name = Vegetation Type,
   values = c(blue, magenta, gray50, red,
   cyan3,
  green4, gold)) +
   scale_shape_manual(name = Vegetation Type, values = c(15, 16, 17,
 18,
   15, 16, 17)) +
   theme_bw() +
   theme(panel.background = element_blank(), panel.grid.major =
   element_blank(),
 panel.grid.minor = element_blank(),
 legend.key = element_rect(color = white)
 )
  
   I have been messing around with
   theme(..., legend.key.size = unit(1, cm))
   but I keep getting the error could not find function unit. I'm not
 sure
   why, isn't unit supposed to be part of the legend.key argument?
 
  Try this workaround to what sounds like a bug:
 
  library(grid)
 
  # then repeat the call.
 
  --
 
  David Winsemius
  Alameda, CA, USA
 
 
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Save intermediate result in a same file

2013-10-01 Thread Matthew

Hello everybody,

i have to save a 100 iteration computation in a file every 5 iterations
until the end.
I first give a vector A  of 100 elements for the 100 iterations and i want
to update A every 5 iterations.

I use save but it doesn't work. 
Someone has an idea,  i need a help

Cheers.





--
View this message in context: 
http://r.789695.n4.nabble.com/Save-intermediate-result-in-a-same-file-tp4677350.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confusing behaviour in data.table: unexpectedly changing variable

2013-09-25 Thread Matthew Dowle



Very sorry to hear this bit you.  If you need a copy of names before 
changing them by reference :


oldnames - copy(names(DT))

This will be documented and it's on the bug list to do so. copy is 
needed in other circumstances too, see ?copy.


More details here :

http://stackoverflow.com/questions/18662715/colnames-being-dropped-in-data-table-in-r
http://stackoverflow.com/questions/15913417/why-does-data-table-update-namesdt-by-reference-even-if-i-assign-to-another-v

Btw, the r-help posting guide says (last time I looked) you should only 
post to r-help about packages if you have tried the maintainer first but 
didn't hear from them; i.e., r-help isn't for support about packages.


I don't follow r-help, so please continue to cc me if you reply.

Matthew

On 25/09/13 00:47, Jonathan Dushoff wrote:

I got bitten badly when a variable I created for the purpose of
recording an old set of names changed when I didn't think I was going
near it.

I'm not sure if this is a desired behaviour, or documented, or warned
about.  I read the data.table intro and the FAQ, and also ?setnames.

Ben Bolker created a minimal reproducible example:

library(data.table)
DT = data.table(x=rep(c(a,b,c),each=3), y=c(1,3,6), v=1:9)
names(DT)
## [1] x y v

oldnames - names(DT)
print(oldnames)
## [1] x y v

setnames(DT, LETTERS[1:3])
print(oldnames)
## [1] A B C



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Regularized Discriminant Analysis scores, anyone?

2013-06-02 Thread Matthew Fagan

Thank you Dr. Ligges, i very much appreciate the quick reply.  i 
wondered if that was the case, based on the math as I (poorly) 
understood it.  However i remain confused.   page 107 from the rrcov 
package PDF makes me think I can derive LDA-style discriminant scores 
for a QDA:


library(rrcov)
data(iris)
qda1-QdaClassic(x=iris[,1:4], grouping=iris[,5])
pred_qda-predict(qda1, iris[,1:4])
head(pred_qda@x)
plotdat-pred_qda@x
plot(plotdat[,1], plotdat[,2])
plot(plotdat[,2], plotdat[,3])

pred_qda$x looks like QDA discriminant scores.   No doubt you are right, 
but if you have a moment, I'd love to know what these scores are and 
what they summarize.


In addition, I have run into this nice set of lengthy R code to manually 
calculate discriminant scores for a QDA:

https://cs.uwaterloo.ca/~a2curtis/courses/2005/ML-classification.pdf

None of this means i can calculate discriminant scores for a RDA, of 
course, but QDA is my back-up choice.


Bottom line: am i am completely misinterpreting what I am seeing here, 
mathematically?  Or is this just the result of different ways of 
implementing QDA in R?


Regards, and thanks again,
Matt


On 6/2/2013 10:39 AM, Uwe Ligges wrote:



On 02.06.2013 05:01, Matthew Fagan wrote:

Hi all,

I am attempting to do Regularized Discriminant Analysis (RDA) on a large
dataset, and I want to extract the RDA  discriminant score matrix.  But
the predict function in the klaR package, unlike the predict function
for LDA in the MASS package, doesn't seem to give me an option to
extract the scores.  Any suggestions?


There are no such scores:

same as for qda, you do not follow the Fisher idea of the linear 
discriminant components any more: Your space is now partitioned by 
ellipsoid like structures based on the estimation of the inner-class 
covariance matrices.


rda as implemented in klaR (see the reference given on the help page) 
is a regularization that helps to overcome problems when estimating 
non-singular covariance matrices for the separate classes.




i have already tried (and failed; ran out of 16 GB of memory) to do this
with the rda package: don't know why, but the klaR package seems to be
much more efficient with memory.  I have included an example below:


The rda package provides a completely different regularization 
technique, see the reference given on the help page.


Best,
Uwe Ligges





library(klaR)
library(MASS)

data(iris)

x - rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
rda1-predict(x, iris[, 1:4])
str(rda1)

#  This gets you an object with posterior probabilities and classes, but
no discriminant scores!

#  if you run lda

y - lda(Species ~ ., data = iris)
lda1-predict(y, iris[, 1:4])
str(lda1)

head(lda1$x)  #  gets you the discriminant scores for the LDA. But how
to do this for RDA?

#  curiously, the QDA function in MASS has this same problem, although
you can get around it using the rrcov package.

Regards, and thank very much for any help,
Matt

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Matthew Fagan
Columbia University
Department of Ecology, Evolution, and Environmental Biology
512-569-1417 (cell/home)
(212) 854-9987 (office)
(212) 854-8188 (fax)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Regularized Discriminant Analysis scores, anyone?

2013-06-01 Thread Matthew Fagan


Hi all,

I am attempting to do Regularized Discriminant Analysis (RDA) on a large 
dataset, and I want to extract the RDA  discriminant score matrix.  But 
the predict function in the klaR package, unlike the predict function 
for LDA in the MASS package, doesn't seem to give me an option to 
extract the scores.  Any suggestions?


i have already tried (and failed; ran out of 16 GB of memory) to do this 
with the rda package: don't know why, but the klaR package seems to be 
much more efficient with memory.  I have included an example below:


library(klaR)
library(MASS)

data(iris)

x - rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
rda1-predict(x, iris[, 1:4])
str(rda1)

#  This gets you an object with posterior probabilities and classes, but 
no discriminant scores!


#  if you run lda

y - lda(Species ~ ., data = iris)
lda1-predict(y, iris[, 1:4])
str(lda1)

head(lda1$x)  #  gets you the discriminant scores for the LDA.  But how 
to do this for RDA?


#  curiously, the QDA function in MASS has this same problem, although 
you can get around it using the rrcov package.


Regards, and thank very much for any help,
Matt

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Contour lines in a persp plot

2013-05-17 Thread Matthew

Thanks a lot, that is all i want. If someone is interessed, see the code
below

panel.3d.contour - 
function(x, y, z, rot.mat, distance, 
 nlevels = 20, zlim.scaled, ...) # les3 points de suspension
pour dire les autres paramètres sont ceux données par défaut
{
add.line - trellis.par.get(add.line)
panel.3dwire(x, y, z, rot.mat, distance, 
 zlim.scaled = zlim.scaled, ...)
clines - 
contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE),
 nlevels = nlevels)
for (ll in clines) {
m - ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[2]), 
  rot.mat, distance)
panel.lines(m[1,], m[2,], col = add.line$col,
lty = add.line$lty, lwd = add.line$lwd)
}
}


fn-function(x,y){sin(x)+2*y} #this looks like a corrugated tin roof

x-seq(from=1,to=100,by=2) #generates a list of x values to sample
y-seq(from=1,to=100,by=2) #generates a list of y values to sample

z-outer(x,y,FUN=fn) #applies the funct. across the combos of x and y


wireframe(z,zlim = c(1, 300), nlevels = 10,
  aspect = c(1, 0.5), panel.aspect = 0.6,
  panel.3d.wireframe = panel.3d.contour,
   shade = FALSE ,
  screen = list(z = 20, x = -60))



--
View this message in context: 
http://r.789695.n4.nabble.com/Contour-lines-in-a-persp-plot-tp4667220p4667309.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] expanding a presence only dataset into presence/absence

2013-04-29 Thread Matthew Venesky

Hello,

I'm working with a very large dataset (250,000+ lines in its' current form)
that includes presence only data on various species (which is nested within
different sites and sampling dates). I need to convert this into a dataset
with presence/absence for each species. For example, I would like to expand
My current data to Desired data:

My current data

Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3

Desired data

Species Present Site Date
a 1 1 1
b 1 1 1
c 0 1 1
a 0 2 2
b 1 2 2
C 0 2 2
a 0 3 3
b 0 3 3
c 1 3 3

I've scoured the web, including Rseek and haven't found a resolution (and
note that a similar question was asked sometime in 2011 without an answer).
Does anyone have any thoughts? Thank you in advance.

--

Matthew D. Venesky, Ph.D.

Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Factor to numeric conversion - as.numeric(as.character(f))[f] - Language definition seems to say to not use this.

2013-04-01 Thread Matthew Lundberg

These two seem to be at odds.  Is this the case?

From help(factor) - section Warning:

To transform a factor f to approximately its original numeric values,
as.numeric(levels(f))[f] is recommended and slightly more efficient than
as.numeric(as.character(f)).

From the language definition - section 2.3.1:

Factors are currently implemented using an integer array to specify the
actual levels and
a second array of names that are mapped to the integers. Rather
unfortunately users often
make use of the implementation in order to make some calculations easier.
This, however,
is an implementation issue and is not guaranteed to hold in all
implementations of R.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Factor to numeric conversion - as.numeric(as.character(f))[f] - Language definition seems to say to not use this.

2013-04-01 Thread Matthew Lundberg

When used as an index, the factor is implicitly converted to integer.  In
the expression as.numeric(levels(f))[f], the vector as.numeric(levels(f))
is indexed by as.integer(f).

This appears to rely on the current implementation, as mentioned in section
2.3.1 of the language definition.


On Mon, Apr 1, 2013 at 1:49 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2013-04-01 10:48, Matthew Lundberg wrote:

 These two seem to be at odds.  Is this the case?

  From help(factor) - section Warning:


 To transform a factor f to approximately its original numeric values,
 as.numeric(levels(f))[f] is recommended and slightly more efficient than
 as.numeric(as.character(f)).

  From the language definition - section 2.3.1:


 Factors are currently implemented using an integer array to specify the
 actual levels and
 a second array of names that are mapped to the integers. Rather
 unfortunately users often
 make use of the implementation in order to make some calculations easier.
 This, however,
 is an implementation issue and is not guaranteed to hold in all
 implementations of R.


 Hint:

  f - factor(sample(5, 10, TRUE))
  as.numeric(levels(f))[f]

  g - factor(sample(letters[1:5], 10, TRUE))
  as.numeric(levels(g))[g]

 Peter Ehlers



 [[alternative HTML version deleted]]

 __**
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Factor to numeric conversion - as.numeric(levels(f))[f] - Language definition seems to say to not use this.

2013-04-01 Thread Matthew Lundberg

Note the edited subject line!  I don't know why I typed it as it was before.

This says that as.numeric(as.character(f)) will work regardless of the
implementation, and I agree.

It's the recommendation to use as.numeric(levels(f))[f] that has me
wondering about section 2.3.1 of the language definition.  I expect that
this idiom is in widespread use, and perhaps the language definition should
be changed.


On Mon, Apr 1, 2013 at 2:58 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Yup. Note also:

  as.character.factor
 function (x, ...)
 levels(x)[x]

 But of course this is OK, since this can change if the implementation
 does. Which is the whole point, of course.

 -- Bert



 On Mon, Apr 1, 2013 at 12:16 PM, Matthew Lundberg
 matthew.k.lundb...@gmail.com wrote:
 
  When used as an index, the factor is implicitly converted to integer.  In
  the expression as.numeric(levels(f))[f], the vector as.numeric(levels(f))
  is indexed by as.integer(f).
 
  This appears to rely on the current implementation, as mentioned in
 section
  2.3.1 of the language definition.
 
 
  On Mon, Apr 1, 2013 at 1:49 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
 
   On 2013-04-01 10:48, Matthew Lundberg wrote:
  
   These two seem to be at odds.  Is this the case?
  
From help(factor) - section Warning:
  
  
   To transform a factor f to approximately its original numeric values,
   as.numeric(levels(f))[f] is recommended and slightly more efficient
 than
   as.numeric(as.character(f)).
  
From the language definition - section 2.3.1:
  
  
   Factors are currently implemented using an integer array to specify
 the
   actual levels and
   a second array of names that are mapped to the integers. Rather
   unfortunately users often
   make use of the implementation in order to make some calculations
 easier.
   This, however,
   is an implementation issue and is not guaranteed to hold in all
   implementations of R.
  
  
   Hint:
  
f - factor(sample(5, 10, TRUE))
as.numeric(levels(f))[f]
  
g - factor(sample(letters[1:5], 10, TRUE))
as.numeric(levels(g))[g]
  
   Peter Ehlers
  
  
  
   [[alternative HTML version deleted]]
  
   __**
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/**listinfo/r-help
 https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide http://www.R-project.org/**
   posting-guide.html http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
  
  
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem with R CMD check and the inconsolata font business

2013-03-05 Thread Matthew Dowle



On 11/3/2011 3:30 PM, Brian Diggs wrote:


Well, I figured it out.  Or at least got it working.  I had to run

initexmf --mkmaps

because apparently there was something wrong with my font mappings.  
I
don't know why; I don't know how.  But it works now.  I think 
installing

the font into the Windows Font directory was not necessary.  I'm
including the solution in case anyone else has this problem.


Many thanks Brian Diggs! I just had the same problem and that fixed it.

Matthew

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] low pass filter analysis in R

2013-02-07 Thread Shotwell, Matthew Stephen

Janesh,

This might help get you started:

http://biostatmatt.com/archives/78

(apologies for linking to my own blog)

Regards,

Matt

--

Message: 51
Date: Wed, 6 Feb 2013 18:50:43 -0600
From: Janesh Devkota janesh.devk...@gmail.com
To: r-help@r-project.org
Subject: [R] low pass filter analysis in R
Message-ID:
CAPTbr1rrSmUgmjjKL54u2KZzzEAFLUXALCuH=wofrbttaky...@mail.gmail.com
Content-Type: text/plain

Hello R users,

I am trying to use R to do the low pass filter analysis for the tidal data.
I am a novice in R and so far been doing only simple stuffs on R. I found a
package called signal but couldn't find the proper tutorial for the low
pass filter.

Could anyone point me to the proper tutorial or starting point on how to do
low pass filter analysis in R ?

Thank you so much.

Janesh

[[alternative HTML version deleted]]

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Rscript on Mac : specify R64 over R (32-bit version)

2013-01-16 Thread Matthew Pettis

Hi,

I have both R and R64 installed on Mac OSX 10.8 Mountain Lion (64-bit).
 When I run the command

sessionInfo()

from within Rscript, I get:

R version 2.15.2 (2012-10-26)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

Is there a way to make Rscript point at the R64 rather than R (32-bit)?

Thanks,
Matt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] SQLDF column errors

2013-01-15 Thread Matthew Liebers

I am trying to exclude integer values from a small data frame 1, d1 that
have matching hits in data frame 2, d2 (Very big) which involves matching
those hits first.  I am trying to use sqldf on the df's in the following
fashion:

df1:
V1
12675
14753
16222
18765

df2: head(df2)
V1  V2
13647 rd1500
14753 rd1580
15987 rd1590
16222 rd2020.

df1_new-sqldf(select df1.V1, df2.V2 where rs10.V1 = d10.pos) - Ideally I
would like to try to use delete or not equal to != though I can only
find that delete works with sqldf.
but it returns this error:
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: no such column: df1.V1)
I am also trying this:
df1_new-sqldf(select V1 from df1, V2 from df2 where df1.V1 = df2.V1)
which returns this error:
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near from: syntax error)

If anyone with sqldf knowledge could lend me a hand that would be great.
Thanks!

Matt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] sqldf merging with subset in specific range

2012-12-26 Thread Matthew Liebers

Hi all:

I have two data sets.  Set A includes a long list of hits in a single
column, say:
m$V1
10
15
36
37
38
44
45
57
61
62
69 ...and so on

Set B includes just a few key ranges set up by way of a minimum in column X
and a maximum in column Y.  Say,
n$X n$Y
30   38   # range from 30 to 38
52   62   # range from 52 to 62

I would like the output to be the rows containing the following columns:
m$V1
36
37
38
57
61
62

I am interested in isolating the hits in data set A that correspond to any
of the hotspot ranges in data set B.  I have downloaded sqldf and tried a
couple things but I cannot do a traditional merge since set B is based on a
range.  I can always do a manual subset but I am trying to figure out if
there is anything more expedient since these df's will be quite large.

Thanks!

Matt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] confirming a formula for use with lmer

2012-12-19 Thread Matthew Panichello


Hello,

I recently began using R and the lme4 package to carry out linear mixed 
effects analyses.


I am interested in the effects of variables 'prime','time', and 'mood' 
on 'reaction_time' while taking into account the random effect 
'subjects.' I've read through documentation on lme4 and came up with the 
following formula for use with lmer:


reaction_time ~ (mood*prime*soa) + (1|subject)

Prime and soa were repeated measures within subjects, while mood was 
manipulated between subjects. As I understand it, however, this 
distinction does not affect how the formula should be written.


While I've done my background reading and think this formula is correct, 
I'd appreciate an expert with more experience than I to double check my 
work.


Thanks in advance for any help,

Matt



The information in this e-mail is intended only for the ...{{dropped:11}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Updating Tom Short's R Reference Card, contacting him?

2012-11-21 Thread Matthew Baggott

I am uncertain about how to acknowledge the fact that $ can do partial
matching in the space of about 30 characters.  One option is this:

x[[name]]   column named name
x$name   same as above (almost always)

Is that better or worse than ignoring this issue, or is there an even
better phrasing?

As per the other suggestions, I fixed the matrices indexing info,
pkg::foo() now has not usually required; and -  now is explained
as Left assignment in outer lexical scope; not for beginners

Plus, I've been able to get in touch with Tom Short. :-)

Thanks to Jeff Newmiller,Dennis Murphy, and Peter Dalgaard for these
helpful suggestions and corrections!

regards,
m@

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Updating Tom Short's R Reference Card, contacting him?

2012-11-20 Thread Matthew Baggott

I made an update/reboot of Tom Short's classic and public domain R
Reference Card.  His is from late 2004 and I've found myself giving it to
new R users with additional notes about packages.

If anyone knows how to reach Tom, that would be great.  I am titling this
reboot Short R Reference, in a play on his name, but I would like to know
he wants his name (and/or email) on this version.

Also if anyone feels like providing corrections or comments, the release
candidate is here.  To view it in full resolution, you may need to download
it:

https://docs.google.com/open?id=0B8NgE2q8ITzTQnhPTFVjVXlOaHM

 regards,
m@

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Clustering groups according to multiple variables

2012-10-31 Thread Matthew Ouellette

Dear R help,


I am trying to cluster my data according to group in a data frame such as
the following:

df=data.frame(group=rep(c(a,b,c,d),10),(replicate(100,rnorm(40


I'm not sure how to tell hclust() that I want to cluster according to the
group variable.  For example:

dfclust=hclust(dist(df),ave)

plot(dfclust)

Clusters according to each individual row.  What I'm looking for is an
unrooted tree that will show similarity/dissimilarity among groups
according to the data set as a whole.

I appreciate the help,


MO

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Clustering groups according to multiple variables

2012-10-31 Thread Matthew Ouellette

Dear R help,


I am trying to cluster my data according to group in a data frame such as
the following:

df=data.frame(group=rep(c(a,b,c,d),10),(replicate(100,rnorm(40


I'm not sure how to tell hclust() that I want to cluster according to the
group variable.  For example:

dfclust=hclust(dist(df),ave)

plot(dfclust)

Clusters according to each individual row.  What I'm looking for is an
unrooted tree that will show similarity/dissimilarity among groups
according to the data set as a whole.

I appreciate the help,


MO

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] optimize and mcparallel problem

2012-10-03 Thread Matthew Wolak

Dear list,

I am running into 2 problems when using the optimize function from the
stats package (note: I've also tried unsuccessfully to  use optim,
nlm,  nlminb).  The second problem is caused by my solution to the
first, so I am asking if anyone has a better solution to the first
question, or if there exists a solution to the second problem.  I
should also mention that what I am
working on is a function for a package - so I need the code
to be applicable to all platforms (I understand that 'multicore'
doesn't really work on Windows, but for the second problem I mean all
platforms, except windows)

The first problem:
I have a function that runs a linear mixed model with a constrained
variance for one of the random effects, computes a loglikelihood ratio
test statistic (LRT), and returns the absolute value of the difference
between the LRT and some pre-defined value (e.g., 2).  I have made a
dummy function, called foo below that has the same inputs and
outputs without the complicated inner workings of my actual function.
My first problem, is that I don't just want to know the end value
(x) that minimizes the output to foo (i.e., diff), but every x
and the corresponding diff used by the optimize function.  My
solution to this is to create an object (vals) outside of foo and
write to this object.

   foo - function(x){
  vals - c(vals, x)
  diff - abs(x - 2)
  diff
   }


This works well so far:

   vals - NULL
   out1 - optimize(foo, interval = seq(0, 4, 0.2))
   vals


However, the second problem arises if I want to use the parallel
function in the multicore package:

   library(multicore)
   vals - NULL
   out2_tmp - mcparallel(optimize(foo, interval = seq(0, 4, 0.2)))
   out2 - collect(out2_tmp, wait = TRUE)
   vals

Predictably, the child process does not return the vals object when
I use the collect function.

To summarize, my first question is whether or not there is a better
way to return all of the values over which optimize evaluates my
function.  The second question is if I do use my solution to the first
question, how can I get the vals object returned from the child
process?

Thanks anyone very much for any and all help!!

Sincerely,
Matthew

-- 

Matthew Wolak
PhD Candidate
Evolution, Ecology, and Organismal Biology Graduate Program
University of California Riverside
http://student.ucr.edu/~mwola001/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Hosted R

2012-08-24 Thread Matthew K. Hettinger

I looked in the archives and couldn't find anything that really 
addressed my question so here it is -


Does anyone know of any web sites/environments that hosts R for free, 
web-based, multi-user access to the R engine. My apologies if the 
question is too simplistic for this forum. The reason I ask is that I'm 
looking at the possibility of establishing an R grid if one doesn't 
already exist, and if one does, then I'm looking for interfaces, 
protocols, and guidelines for adding an R node.


--
Matthew K. Hettinger, Enterprise Architect and Systemist
Mathet Consulting, Inc.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to sort huge ( 2^31 row) dataframes quickly

2012-07-30 Thread Matthew Keller

Hello all,

I have some genetic datasets (gzipped) that contain 6 columns and
upwards of 10s of billions of rows. The largest dataset is about 16 GB
on file, gzipped (!). I need to sort them according to columns 1, 2,
and 3. The setkey() function in the data.table package does this
quickly, but of course we're limited by R not being able to index
vectors with  2^31 elements, and bringing in only the parts of the
dataset we need is not applicable here.

I'm asking for practical advice from people who've done this or who
have ideas. We'd like to be able to sort the biggest datasets in hours
rather than days (or weeks!). We cannot have any process take over 50
GB RAM max (we'd prefer smaller so we can parallelize). .

Relational databases seem too slow, but maybe I am wrong. A quick look
at the bigmemory package doesn't turn up an ability to sort like this,
but again, maybe I'm wrong. My computer programmer writes in C++, so
if you have ideas in C++, that works too.

Any help would be much appreciated... Thanks!

Matt


-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Extracting standard errors for adjusted fixed effect sizes in lmer

2012-07-22 Thread Matthew Ouellette

Dear R help,

Does no one have an idea of where I might find information that could help
me with this problem?  I apologize for re-posting - I have half a suspicion
that my original message did not make it through.

I hope you all had a good weekend and look forward to your reply,
MO


On Fri, Jul 20, 2012 at 11:56 AM, MO wrote:

 Dear R help list,

 I have done a lot of searching but have not been able to find an answer to
 my problem.  I apologize in advance if this has been asked before.

 I am applying a mixed model to my data using lmer.  I will use sample data
 to illustrate my question:

 library(lme4)
 library(arm)
 data(HR, package = SASmixed)
  str(HR)
 'data.frame': 120 obs. of  5 variables:
  $ Patient: Factor w/ 24 levels 201,202,203,..: 1 1 1 1 1 2 2 2 2 2
 ...
  $ Drug   : Factor w/ 3 levels a,b,p: 3 3 3 3 3 2 2 2 2 2 ...
  $ baseHR : num  92 92 92 92 92 54 54 54 54 54 ...
  $ HR : num  76 84 88 96 84 58 60 60 60 64 ...
  $ Time   : num  0.0167 0.0833 0.25 0.5 1 ...

  fm1 - lmer(HR ~ baseHR + Time + Drug + (1 | Patient), HR)

  fixef(fm1)  ##Extract estimates of fixed effects

 (Intercept)  baseHRTime   Drugb   Drugp

  32.6037923   0.5881895  -7.0272873   4.6795262  -1.0027581

  se.fixef(fm1)  ##Extract standard error of estimates of fixed effects

 (Intercept)  baseHRTime   Drugb   Drugp

   9.9034008   0.1184529   1.4181457   3.5651679   3.5843026

 ##Because the estimate of the fixed effects are displayed as differences
 from the intercept (I think?), I can back calculate the actual effect sizes
 easily enough.  However, how would I do a similar calculation for the
 standard error for these effect sizes (since these error estimates are for
 the difference in means of effects) if my design isn't balanced (which
 confuses things tremendously when working with a data set as large as
 mine)?  It may help to point out that I'm working with microarray data;
 applying the same model for each gene (hundreds of genes total) across
 multiple samples (hundreds of samples total), but as an R beginner I like
 to start with small data samples and work my way up.

 I appreciate the help,

 MO



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Extracting standard errors for adjusted fixed effect sizes in lmer

2012-07-20 Thread Matthew Ouellette

Dear R help list,

I have done a lot of searching but have not been able to find an answer to
my problem.  I apologize in advance if this has been asked before.

I am applying a mixed model to my data using lmer.  I will use sample data
to illustrate my question:

library(lme4)
library(arm)
data(HR, package = SASmixed)
 str(HR)
'data.frame': 120 obs. of  5 variables:
 $ Patient: Factor w/ 24 levels 201,202,203,..: 1 1 1 1 1 2 2 2 2 2
...
 $ Drug   : Factor w/ 3 levels a,b,p: 3 3 3 3 3 2 2 2 2 2 ...
 $ baseHR : num  92 92 92 92 92 54 54 54 54 54 ...
 $ HR : num  76 84 88 96 84 58 60 60 60 64 ...
 $ Time   : num  0.0167 0.0833 0.25 0.5 1 ...

 fm1 - lmer(HR ~ baseHR + Time + Drug + (1 | Patient), HR)

 fixef(fm1)  ##Extract estimates of fixed effects

(Intercept)  baseHRTime   Drugb   Drugp

 32.6037923   0.5881895  -7.0272873   4.6795262  -1.0027581

 se.fixef(fm1)  ##Extract standard error of estimates of fixed effects

(Intercept)  baseHRTime   Drugb   Drugp

  9.9034008   0.1184529   1.4181457   3.5651679   3.5843026

##Because the estimate of the fixed effects are displayed as differences
from the intercept (I think?), I can back calculate the actual effect sizes
easily enough.  However, how would I do a similar calculation for the
standard error for these effect sizes (since these error estimates are for
the difference in means of effects) if my design isn't balanced (which
confuses things tremendously when working with a data set as large as
mine)?  It may help to point out that I'm working with microarray data;
applying the same model for each gene (hundreds of genes total) across
multiple samples (hundreds of samples total), but as an R beginner I like
to start with small data samples and work my way up.

I appreciate the help,

MO

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data.table vs plyr reg output

2012-06-29 Thread Matthew Dowle

Hi Geoff,

Please see this part of the r-help posting guide :

  For questions about functions in standard packages distributed with R
(see the FAQ Add-on packages in R), ask questions on R-help.  If the
question relates to a contributed package , e.g., one downloaded from CRAN,
try contacting the package maintainer first. You can also use
find(functionname) and packageDescription(packagename) to find this
information. ONLY send such questions to R-help or R-devel if you get no
reply or need further assistance. This applies to both requests for help and
to bug reports. 

Where I've capitalised ONLY since it is bold in the original HTML.  I only
saw your post thanks to Google Alerts.

maintainer(data.table) returns the email address of the datatable-help
list, with the posting guide in mind. However, for questions like this, I'd
suggest the data.table tag on Stack Overflow (which I subscribe to) :

http://stackoverflow.com/questions/tagged/data.table

Btw, I recently presented at LondonR.  Here's a link to the slides :

http://datatable.r-forge.r-project.org/LondonR_2012.pdf

Matthew



--
View this message in context: 
http://r.789695.n4.nabble.com/data-table-vs-plyr-reg-output-tp4634518p4634865.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to convert list of matrix (raster:extract o/p) to data table with additional colums (polygon Id, class)

2012-06-29 Thread Matthew Dowle


AKJ,

Please see this recent answer :

http://r.789695.n4.nabble.com/data-table-vs-plyr-reg-output-tp4634518p4634865.html

Matthew



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-convert-list-of-matrix-raster-extract-o-p-to-data-table-with-additional-colums-polygon-Id-cla-tp4634579p4634868.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] templated use of aggregate

2012-06-13 Thread Matthew Johnson

Sorry, i'll try and put more flesh on the bones.

please note, i changed the data in the example, as fiddling has raised
another question that's best illustrated with a slightly different
data set.

first of all, when i do as you suggest, i obtain the following error:

 PxMat - aggregate(mm[,-1] ~ mm[,1], data=mm, sum)

Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) :
  'names' attribute [3] must be the same length as the vector [1]

my data.frame is an xts, and it looks like this:

                    px_ym1 vol_ym1
2012-06-01 09:30:00  97.90       9
2012-06-01 09:30:00  97.90      60
2012-06-01 09:30:00  97.90      71
2012-06-01 09:30:00  97.90       5
2012-06-01 09:30:00  97.90       3
2012-06-01 09:30:00  97.90      21
2012-06-01 09:31:00  97.90       5
2012-06-01 09:31:00  97.89     192
2012-06-01 09:31:00  97.89      65
2012-06-01 09:31:00  97.89      73
2012-06-01 09:31:00  97.89       1
2012-06-01 09:31:00  97.89       1
2012-06-01 09:31:00  97.89      39
2012-06-01 09:31:00  97.90      15
2012-06-01 09:31:00  97.90       1
2012-06-01 09:31:00  97.89       1
2012-06-01 09:31:00  97.90      18
2012-06-01 09:31:00  97.89       1
2012-06-01 09:32:00  97.89      33
2012-06-01 09:34:00  97.89       1
2012-06-01 09:34:00  97.89       1

dput(mn) returns:

 dput(mn)
structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89,
97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89,
97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1,
1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c(POSIXct,
POSIXt), .indexTZ = GMT, class = c(xts, zoo), index =
structure(c(1338543000,
1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543060,
1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060,
1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543120,
1338543240, 1338543240), tzone = GMT, tclass = c(POSIXct,
POSIXt)), .Dim = c(21L, 2L), .Dimnames = list(NULL, c(px_ym1,
vol_ym1)))

as you can see, the xts data.frame xts data.frame that contains dates,
prices and volumes. There is much more data over a long time period,
and i'm interested in various sub-setting and then aggregate
operations.

I would like to split the data by time period and aggregate the data,
such that i obtain a table which reports the volume traded at each
price, for each of the time-period splits that i have chosen.

I have employed the following approach:

PxMat - aggregate(.~px_ym1, data=mn, sum)


which yields:

  px_ym1 vol_ym1
1  97.89     408
2  97.90     208

and for subsets, i use the following grouping:

PxMat30 - aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum)

Which yields:

  px_ym1 vol_ym1
1   97.9     169

and

 PxMat31 - aggregate(.~px_ym1, data=mn[.indexmin(mn) == '31'], sum)

which yields:

  px_ym1 vol_ym1
1  97.89 373
2  97.90  39

and so on and so forth for each minute.

when i try and sub-set using general notation, as follows:

PxMat - aggregate(.~mn[,1], data=mn, sum)

this yields a different form of output:

px_ym1  px_ym1 vol_ym1
1  97.90 1076.79 408
2  97.89  979.00 208

the problem is that i now have the sum of the px_ym1 data (the sum of mn[,1])

hopefully things are now clearer - sorry to have wasted your time up
until now.

assuming that i have now made my situation clear, i am hope you can
help with four specific questions.

1/ My data-sets are HUGE, so speed is an issue - is this the fastest
way to sub-set and aggregate an xts?

2/ is there a way to do this for multiple splits? say a table for each
minute, day, week, or month? the return would potentially be a list
with a table for each day / minute etc showing volume traded at each
price -- but it doesn't have to be a list ...

i am writing a function with loops that would generate a table that
reports volume traded at each price for each case of a specified time
split (say for four tables, one for each minute in the example data,
returned as a list). my solution is slow, it seems like something that
someone would have done better already. is this the case?

3/ is there a way to do the sub-setting with templated variables? i
would like to obtain the table i get with the named aggregate
functions (reproduced above) with multiple data frames, as the column
names will differ from time to time. i cannot figure out how to stop
the command from summing the mn[,1] column when i stop using variable
names.

4/ on a related note, is it possible to apply different functions to
different columns of data? It would be nice, for example, if the table
returned from an aggregate command could be made to be:

px_ym1  count vol_ym1
1  97.90  11 408
2  97.89  10 208

where we have the price traded, the number of trades (a count of
px_ym1 / mn[,1], and the sum of vol_ym1 (mn[,2]).

thanks and best regards

matt johnson

On 13 June 2012 15:06, David Winsemius dwinsem...@comcast.net wrote:


 On Jun 12, 2012, at 11:32 PM, Matthew Johnson wrote:

 Dear R-help,

 I have an xts data

Re: [R] templated use of aggregate

2012-06-13 Thread Matthew Johnson

Sorry about the cross posting - i didn't realise it was bad etiquette.

my sessioninfo was as follows:

 sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

other attached packages:
[1] xts_0.8-2 zoo_1.7-6

loaded via a namespace (and not attached):
[1] grid_2.14.1lattice_0.20-0


i have now updated to R 2.15, and my session info is:

 sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

other attached packages:
[1] xts_0.8-6 zoo_1.7-7

loaded via a namespace (and not attached):
[1] grid_2.15.0lattice_0.20-6 tools_2.15.0

For the XTS object mn your suggestion still fails with an error:

 adf - aggregate(mn[,-1]~mn[,1], data=mn, sum); adf
Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) :
  'names' attribute [3] must be the same length as the vector [1]

however when i convert to a zoo with

 mnz - as.zoo(mn)

I get some errors, but it works

 adf - aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf
Warning messages:
1: In zoo(rval, index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
2: In zoo(rval, index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
3: In zoo(rval[i], index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
4: In zoo(rval[i], index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
5: In zoo(xc[ind], ix[ind]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
6: In zoo(xc[ind], ix[ind]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
  mnz[, 1] mnz[, -1]
197.90   408
297.89   208

So is this a bug in XTS?

thanks for your patience

mj

On 13 June 2012 15:53, David Winsemius dwinsem...@comcast.net wrote:

 On Jun 13, 2012, at 9:38 AM, Matthew Johnson wrote:

 Sorry, i'll try and put more flesh on the bones.

 please note, i changed the data in the example, as fiddling has raised
 another question that's best illustrated with a slightly different
 data set.

 first of all, when i do as you suggest, i obtain the following error:

 PxMat - aggregate(mm[,-1] ~ mm[,1], data=mm, sum)


 Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) :
  'names' attribute [3] must be the same length as the vector [1]


 Very strange. When I just did it with the structure you (cross-) posted on
 SO I got:

 adf - aggregate(mm[,-1]~mm[,1], data=mm, sum); adf
 snipped warning messages
  mm[, 1] mm[, -1]

 1   97.91      538
 2   97.92      918

 I had earlier tested it with a zoo object I had constructed and did it again
 with the structure below.

  mm[, 1] mm[, -1]

 1   97.91      538
 2   97.92      918

 I'm using zoo_1.7-6 and R version 2.14.2 on a Mac. I do not remember you
 posting the requested information about your versions.

 --
 David.



 my data.frame is an xts, and it looks like this:

                    px_ym1 vol_ym1
 2012-06-01 09:30:00  97.90       9
 2012-06-01 09:30:00  97.90      60
 2012-06-01 09:30:00  97.90      71
 2012-06-01 09:30:00  97.90       5
 2012-06-01 09:30:00  97.90       3
 2012-06-01 09:30:00  97.90      21
 2012-06-01 09:31:00  97.90       5
 2012-06-01 09:31:00  97.89     192
 2012-06-01 09:31:00  97.89      65
 2012-06-01 09:31:00  97.89      73
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:31:00  97.89      39
 2012-06-01 09:31:00  97.90      15
 2012-06-01 09:31:00  97.90       1
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:31:00  97.90      18
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:32:00  97.89      33
 2012-06-01 09:34:00  97.89       1
 2012-06-01 09:34:00  97.89       1

 dput(mn) returns:

 dput(mn)

 structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89,
 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89,
 97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1,
 1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c(POSIXct,
 POSIXt), .indexTZ = GMT, class = c(xts, zoo), index =
 structure(c(1338543000,
 1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543060,
 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060,
 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543120,
 1338543240, 1338543240), tzone = GMT, tclass = c(POSIXct,
 POSIXt)), .Dim = c(21L, 2L), .Dimnames = list(NULL, c(px_ym1,
 vol_ym1)))

 as you can see, the xts data.frame xts data.frame

Re: [R] templated use of aggregate

2012-06-13 Thread Matthew Johnson

thank you for your patience. i assure you i will get better with the
appropriate etiquette - and hopefully eventually contribute.

On 13 June 2012 16:18, David Winsemius dwinsem...@comcast.net wrote:

 On Jun 13, 2012, at 10:09 AM, Matthew Johnson wrote:

 my sessioninfo was as follows:

 sessionInfo()

 R version 2.14.1 (2011-12-22)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

 locale:
 [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 [7] base

 other attached packages:
 [1] xts_0.8-2 zoo_1.7-6

 loaded via a namespace (and not attached):
 [1] grid_2.14.1    lattice_0.20-0


 i have now updated to R 2.15, and my session info is:

 sessionInfo()

 R version 2.15.0 (2012-03-30)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

 locale:
 [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods
 [7] base

 other attached packages:
 [1] xts_0.8-6 zoo_1.7-7

 loaded via a namespace (and not attached):
 [1] grid_2.15.0    lattice_0.20-6 tools_2.15.0

 For the XTS object mn your suggestion still fails with an error:

 adf - aggregate(mn[,-1]~mn[,1], data=mn, sum); adf

 Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) :
  'names' attribute [3] must be the same length as the vector [1]

 however when i convert to a zoo with

 mnz - as.zoo(mn)


 I get some errors, but it works


 Those are warnings, ... not errors.



 adf - aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf

 Warning messages:
 1: In zoo(rval, index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 2: In zoo(rval, index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 3: In zoo(rval[i], index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 4: In zoo(rval[i], index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 5: In zoo(xc[ind], ix[ind]) :
  some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 6: In zoo(xc[ind], ix[ind]) :
  some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
  mnz[, 1] mnz[, -1]
 1    97.90       408
 2    97.89       208

 So is this a bug in XTS?


 It does look that way to me. The correct way to report this is to contact
 the package maintainer (copied on this message) , (although I did notice
 that Joshua Ulrich already looked at this posting in SO and he is on the xts
 development team). You should have put in this at the beginning of your code
 :

 library(xts)

 --
 David.


 thanks for your patience

 mj

 On 13 June 2012 15:53, David Winsemius dwinsem...@comcast.net wrote:


 On Jun 13, 2012, at 9:38 AM, Matthew Johnson wrote:

 Sorry, i'll try and put more flesh on the bones.

 please note, i changed the data in the example, as fiddling has raised
 another question that's best illustrated with a slightly different
 data set.

 first of all, when i do as you suggest, i obtain the following error:

 PxMat - aggregate(mm[,-1] ~ mm[,1], data=mm, sum)



 Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) :
  'names' attribute [3] must be the same length as the vector [1]



 Very strange. When I just did it with the structure you (cross-) posted
 on
 SO I got:

 adf - aggregate(mm[,-1]~mm[,1], data=mm, sum); adf

 snipped warning messages
  mm[, 1] mm[, -1]

 1   97.91      538
 2   97.92      918

 I had earlier tested it with a zoo object I had constructed and did it
 again
 with the structure below.

  mm[, 1] mm[, -1]

 1   97.91      538
 2   97.92      918

 I'm using zoo_1.7-6 and R version 2.14.2 on a Mac. I do not remember you
 posting the requested information about your versions.

 --
 David.



 my data.frame is an xts, and it looks like this:

                   px_ym1 vol_ym1
 2012-06-01 09:30:00  97.90       9
 2012-06-01 09:30:00  97.90      60
 2012-06-01 09:30:00  97.90      71
 2012-06-01 09:30:00  97.90       5
 2012-06-01 09:30:00  97.90       3
 2012-06-01 09:30:00  97.90      21
 2012-06-01 09:31:00  97.90       5
 2012-06-01 09:31:00  97.89     192
 2012-06-01 09:31:00  97.89      65
 2012-06-01 09:31:00  97.89      73
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:31:00  97.89      39
 2012-06-01 09:31:00  97.90      15
 2012-06-01 09:31:00  97.90       1
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:31:00  97.90      18
 2012-06-01 09:31:00  97.89       1
 2012-06-01 09:32:00  97.89      33
 2012-06-01 09:34:00  97.89       1
 2012-06-01 09:34:00  97.89       1

 dput(mn) returns:

 dput(mn)


 structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89,
 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89

[R] what does .indexDate() do - R::xts

2012-06-12 Thread Matthew Johnson

Dear R experts,

I am learning the very useful XTS package, but cannot figure out the
purpose of some commands.

in particular, the .indexDate() command does not work as expected.

say:

x - timeBasedSeq('2010-01-01/2010-01-02 12:00')
x - xts(1:length(x), x)

then i can subset on date as follows:

x['2010-01-01']

however the .indexDate() command does not work as expected; in
particular the following does not return anything.

x[.indexDate(x) == '2010-01-01']


I am sure i am missing something - what is .indexDate() supposed to do?


thanks and best regards


matt johnson

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] what does .indexDate() do - R::xts

2012-06-12 Thread Matthew Johnson

thanks. i think i understand: the difference is that the first command
converts my 'searched-for' date to a number and matches it, but the second
does not?

On 13 June 2012 12:58, Joshua Ulrich josh.m.ulr...@gmail.com wrote:

 On Tue, Jun 12, 2012 at 9:48 PM, Matthew Johnson mcoog...@gmail.com
 wrote:
  Dear R experts,
 
  I am learning the very useful XTS package, but cannot figure out the
  purpose of some commands.
 
  in particular, the .indexDate() command does not work as expected.
 
  say:
 
  x - timeBasedSeq('2010-01-01/2010-01-02 12:00')
  x - xts(1:length(x), x)
 
  then i can subset on date as follows:
 
  x['2010-01-01']
 
  however the .indexDate() command does not work as expected; in
  particular the following does not return anything.
 
  x[.indexDate(x) == '2010-01-01']
 
 That's because all comparisons are FALSE.  .indexDate() returns the
 index of x, converted to the numeric representation of the Date class
 (i.e. as.Date(.indexDate(x), origin=1970-01-01) will be the Date of
 the index values).  '2010-01-01' is a character string.

 
  I am sure i am missing something - what is .indexDate() supposed to do?
 
 Though it's not well documented, what it's doing is pretty clear from
 the source:
 R .indexDate
 function (x)
 {
.index(x)%/%86400L
 }
 environment: namespace:xts

 
  thanks and best regards
 
 
  matt johnson
 

 Best,
 --
 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 2 3 4 5 >

1 - 100 of 450 matches

Mail list logo