Re: [R] Finding unique terms

2018-10-15 Thread Bert Gunter
Here is a base R solution:
"dat" is the data frame as in Robert's solution.

> aggregate(dat[,3:6], by= dat[1], FUN = sum, na.rm = TRUE)
  STUDENT_ID   PO1M PO1T PO2M PO2T
1    AA15285 287.80  350   37   50
2    AA15286 240.45  330   41   50
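
For anyone who wants to run this without the rest of the thread, here is a self-contained sketch. It rebuilds only the ID and the four score columns of the posted data (COURSE_CODE, X and X.1 are left out, so the column indices differ from the call above) and applies the same aggregate() idea. The name "totals" is just for illustration.

```r
# Reduced copy of the posted data: student ID plus the four score columns.
dat <- data.frame(
  STUDENT_ID = rep(c("AA15285", "AA15286"), times = c(6, 5)),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65, 82.35, NA),
  PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60, 100, NA),
  PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 41),
  PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50)
)

# Sum every score column within each student.
# na.rm = TRUE is passed through aggregate() to sum().
totals <- aggregate(dat[, -1], by = dat["STUDENT_ID"], FUN = sum, na.rm = TRUE)
print(totals)
```

Note that with na.rm = TRUE a student with no non-missing values in a column would get 0 rather than NA, which may or may not be what is wanted.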

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Oct 15, 2018 at 6:42 PM Robert Baer wrote:

>
>
> On 10/11/2018 5:12 PM, roslinazairimah zakaria wrote:
> > Dear r-users,
> >
> > I have this data:
> >
> > structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> > 2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
> >  COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
> >  4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
> >  "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
> > "factor"),
> >  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
> >  82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
> >  100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
> >  41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
> >  X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
> >  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
> > "COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
> > "data.frame", row.names = c(NA,
> > -11L))
> >
> > I want to combine the same Student ID and add up all the values for PO1M,
> > PO1T,...,PO2T obtained by the same ID.
> >
> > How do I do that?
> > Thank you for any help given.
> >
> oops!  Forgot to clean up after my cut and paste. Solution with dplyr
> looks like this:
> # Create sums by student ID
> library(dplyr)
> dat %>%
>   group_by(STUDENT_ID) %>%
>   summarize(sum.PO1M = sum(PO1M, na.rm = TRUE),
>     sum.PO1T = sum(PO1T, na.rm = TRUE),
>     sum.PO2M = sum(PO2M, na.rm = TRUE),
>     sum.PO2T = sum(PO2T, na.rm = TRUE))
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] Unlisting a nested dataset

2018-10-15 Thread Nathan Parsons
I’m attempting to do some content analysis on a few million tweets, but I can’t 
seem to get them cleaned correctly.

I’m trying to replicate the process outlined here: 
https://stackoverflow.com/questions/46734501/opposite-of-unnest-tokens

My code:

tweets %>%
 unnest_tokens(word, text, token = 'tweets') %>%
 filter(!word %in% stop_words$word) %>%
 nest(word) %>%
 mutate(text = map(data, unlist),
           text = map_chr(text, paste, collapse = " ")) -> tweets

Unfortunately, I keep getting:

 Error in mutate_impl(.data, dots) :
 Evaluation error: cannot coerce type 'closure' to vector of type 'character'.

What am I doing wrong?

Here’s what the dataset looks like:

> glimpse(tweets)
Observations: 389,253
Variables: 12
$ status_id "x1047841705729306624", "x1046966595610927105", "x104709...
$ created_at "2018-10-04T13:31:45Z", "2018-10-02T03:34:22Z", "2018-10...
$ text "Technique is everything with olympic lifts ! @ Body By ...
$ lat 43.68359, 40.28412, 37.77066, 40.43139, 31.16889, 33.937...
$ lng -70.32841, -83.07859, -122.43598, -79.98069, -100.07689,...
$ county_name "Cumberland County", "Delaware County", "San Francisco C...
$ fips 23005, 39041, 6075, 42003, 48095, 6037, 6037, 55073, 482...
$ state_name "Maine", "Ohio", "California", "Pennsylvania", "Texas", ...
$ state_abb "ME", "OH", "CA", "PA", "TX", "CA", "CA", "WI", "TX", "A...
$ urban_level "Medium Metro", "Large Fringe Metro", "Large Central Met...
$ urban_code 3, 2, 1, 1, 6, 1, 1, 4, 1, 3, 2, 2, 1, 3, 6, 1, 1, 2, 3,...
$ population 277308, 184029, 830781, 1160433, 4160, 9509611, 9509611,...
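
The regrouping step (tokenize, drop stop words, paste the remaining words back into one string per tweet) can also be sketched without nest() and map(). The following is a minimal base-R illustration on a made-up, already tokenized data frame. The names "tokens", "stop_words_demo" and "cleaned" are hypothetical, and this is not the pipeline from the linked Stack Overflow answer, nor a diagnosis of the error above.

```r
# Hypothetical tokenized data: one row per (tweet, word) pair.
tokens <- data.frame(
  status_id = c("x1", "x1", "x1", "x2", "x2"),
  word      = c("technique", "is", "everything", "the", "lifts"),
  stringsAsFactors = FALSE
)

# Hypothetical stop word list (stands in for stop_words$word).
stop_words_demo <- c("is", "the")

# Drop the stop words, then paste the surviving words back together per tweet.
kept <- tokens[!tokens$word %in% stop_words_demo, ]
cleaned <- aggregate(word ~ status_id, data = kept, FUN = paste, collapse = " ")
names(cleaned)[2] <- "text"
print(cleaned)
```

A tweet whose words are all stop words would disappear from "cleaned" in this sketch, so the result may need to be merged back onto the original IDs.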

--

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

503-725-9025
503-725-3957 FAX



Re: [R] Finding unique terms

2018-10-15 Thread Robert Baer




On 10/11/2018 5:12 PM, roslinazairimah zakaria wrote:

Dear r-users,

I have this data:

structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
 COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
 4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
 "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
 PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
 82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
 100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
 41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
 X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

I want to combine the same Student ID and add up all the values for PO1M,
PO1T,...,PO2T obtained by the same ID.

How do I do that?
Thank you for any help given.

Oops! I forgot to clean up after my cut and paste. The solution with dplyr
looks like this:

# Create sums by student ID
library(dplyr)
dat %>%
  group_by(STUDENT_ID) %>%
  summarize(sum.PO1M = sum(PO1M, na.rm = TRUE),
    sum.PO1T = sum(PO1T, na.rm = TRUE),
    sum.PO2M = sum(PO2M, na.rm = TRUE),
    sum.PO2T = sum(PO2T, na.rm = TRUE))



Re: [R] Finding unique terms

2018-10-15 Thread Robert Baer




Dear r-users,

I have this data:

structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
 COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
 4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
 "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
 PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
 82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
 100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
 41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
 X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

I want to combine the same Student ID and add up all the values for PO1M,
PO1T,...,PO2T obtained by the same ID.

How do I do that?
Thank you for any help given


# load data

# Enter dataframe by hand
dat <- structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
    COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
    4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
    "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
    PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
    82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
    100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
    41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
    X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

# Create sums by student ID

library(dplyr)
dat %>%
  group_by(STUDENT_ID) %>%
  summarize(sum.PO1M = sum(PO1M, na.rm = TRUE),
    sum.PO1T = sum(PO1T, na.rm = TRUE),
    sum.PO2M = sum(PO2M, na.rm = TRUE),
    sum.PO2T = sum(PO2T, na.rm = TRUE))



Re: [R] RTools and previous Cygwin installation--conflict?

2018-10-15 Thread Brent via R-help
I sent a query on this subject over 4 years ago.
https://www.mail-archive.com/r-help@r-project.org/msg212269.html

I used to use Duncan Murdoch's suggestion to install only cygwin 32 bit and 
then put RTools first in my Windows path, so that its versions of cygwin 
commands would get picked up first.
https://www.mail-archive.com/r-help@r-project.org/msg212281.html

That approach, however, fails as of RTools 3.5, since RTools switched from 
cygwin commands to msys2.
https://cran.r-project.org/bin/windows/Rtools/Rtools.txt
The msys2 commands are incompatible with cygwin; you will get strange errors in 
cygwin if it tries to use msys2 commands.

This forced me to reconsider this problem, and I think that I came up with a 
much better approach that I want to share for the benefit of others.

Gabor Grothendieck had already suggested using a DOS batch file to temporarily 
alter the PATH env var "to R and to Rtools only when R is being used in order 
to avoid conflicts"
https://www.mail-archive.com/r-help@r-project.org/msg212275.html

My solution is similar, except that instead of doing this in a DOS batch file, 
I do it as the first action within R itself.

In particular, I edited my R .Rprofile file so that its .First function calls 
a new function, add_RTools_toPATH().

That function is:
# Modifies the PATH environmental variable to have the path to the RTools
# bin directory as its first element.
# This change only affects the current R session (and any processes it
# spawns?); it does not permanently alter the Windows system env var.
add_RTools_toPATH = function() {
    # get the existing PATH env var's value:
    pathEnvVar = Sys.getenv("PATH")

    # check that it does not already have an Rtools element:
    if (grepl("Rtools", pathEnvVar, fixed = TRUE)) {
        stop("Your existing Windows PATH env var (printed below) already contains the substring \"Rtools\", which is unexpected:", "\n", pathEnvVar)
    }

    # add C:\Rtools\bin to the front of PATH:
    pathNew = paste( "C:\\Rtools\\bin", Sys.getenv("PATH"), sep = ";" )
    Sys.setenv(PATH = pathNew)
}
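
For illustration, here is a hedged sketch of how the .Rprofile hook described above might be wired up. The helper name "prepend_path" is hypothetical and not part of the post. Unlike the function above, it silently leaves PATH unchanged when the directory is already listed instead of stopping with an error, and it takes the PATH string as an argument so it can be tried out without touching the real environment.

```r
# Hypothetical helper: return a PATH-style string with `dir` as its first
# element, unless `dir` is already one of the elements.
prepend_path <- function(dir, path = Sys.getenv("PATH"),
                         sep = .Platform$path.sep) {
  parts <- strsplit(path, sep, fixed = TRUE)[[1]]
  if (dir %in% parts) {
    return(path)  # already present: leave PATH as it is
  }
  paste(c(dir, parts), collapse = sep)
}

# In .Rprofile: R calls .First() at the start of each session, so the change
# applies to that session only. The Rtools location is an assumption taken
# from the function above; adjust it to the actual install directory.
.First <- function() {
  Sys.setenv(PATH = prepend_path("C:\\Rtools\\bin"))
}

# Trying the helper on a made-up PATH string:
prepend_path("C:\\Rtools\\bin", "C:\\Windows;C:\\cygwin64\\bin", sep = ";")
```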


This new solution allows me to use whatever version of cygwin I want (e.g. 64 
bit) with no worries about collision as before.  It also lets me exactly 
control the PATH for my R sessions.

Here are some background links if you do not fully understand the above:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.setenv.html

https://stackoverflow.com/questions/24622725/how-to-set-path-on-windows-through-r-shell-command



Re: [R] limit bar graph output

2018-10-15 Thread Jeff Reichman
Bert,

I just re-sorted, took the top 30, and then reordered again in the geom_bar() 
call, as below:

ggplot(data=st.cnt)+
  geom_bar(aes(x=reorder(CourseName, -n), y=n), fill = "dark blue", stat="identity")+
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
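
The re-sort and truncate step itself is not shown above. A minimal base-R sketch of it could look like the following, assuming a counts data frame shaped like st.cnt with columns CourseName and n. The data here are made up, and "top_n_rows" and "st.top" are illustrative names only.

```r
set.seed(1)  # reproducible made-up counts

# Hypothetical stand-in for st.cnt: one row per course with its count n.
st.cnt <- data.frame(
  CourseName = paste0("Course", 1:100),
  n = sample(1:500, 100),
  stringsAsFactors = FALSE
)

# Order by count, largest first, and keep only the first 30 rows.
top_n_rows <- 30
st.top <- head(st.cnt[order(-st.cnt$n), ], top_n_rows)

nrow(st.top)
```

Passing st.top rather than the full data frame as the data argument of the ggplot() call should then draw only those bars; reorder() inside aes() still controls the order in which they appear on the axis.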

 

Jeff

 

From: Bert Gunter  
Sent: Sunday, October 14, 2018 10:51 PM
To: reichm...@sbcglobal.net
Cc: R-help 
Subject: Re: [R] limit bar graph output

 

If I understand correctly, just subset your sorted data, e.g.:

x <- runif(50)
##  50 unsorted values

sort(x, decreasing = TRUE)[1:10]
## the 10 biggest

-- Bert

 

 




Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

 

 

On Sun, Oct 14, 2018 at 7:13 PM Jeff Reichman <reichm...@sbcglobal.net> wrote:

R-Help Forum

I'm using the following code to reorder (from highest to lowest) my miRNA
counts.  But there are 500 plus and I only need the first (say) 15-20.  How
do I limit ggplot to only the first 20 miRNA counts?

ggplot(data = corr.m, aes(x = reorder(miRNA, -value), y = value, fill =
variable)) + 
  geom_bar(stat = "identity")

Jeff





[R] new edition of An R Companion to Applied Regression

2018-10-15 Thread Fox, John
Dear r-help list members,

Sandy Weisberg and I would like to announce a new (third) edition of our book 
An R Companion to Applied Regression, which has recently been published by Sage 
Publications. The book provides a broad introduction to R in the general 
context of applied regression analysis, including linear models, generalized 
linear models, and, new to the third edition, mixed-effects models.

The R Companion is associated with two widely used CRAN packages, the car and 
effects packages. In anticipation of the new edition of the R Companion we 
contributed substantially revised versions of these packages to CRAN: version 
3.0-x of the car package and version 4.0-x of the effects package.

More information about the book, including a variety of on-line resources 
(chapter R scripts, on-line appendices, etc.), is available at 
.

Best,
 John

-
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/
