Hi Andrew,

On Apr 12, 2014, at 6:36 PM, Andrew Hoerner <ahoer...@rprogress.org> wrote:
> Thanks Sarah! That worked!
>
> And you are quite right about the absence of parentheses and "EC07_A1$" 's.
> I apologize for sending that code snip -- I am not quite sure how I managed
> to do it, since I had already fixed those problems and changed the code in
> order to get the error message I posted.
>
> Apropos of nothing in particular, before I could successfully implement
> your fix, I also had to learn another new thing. When saving a CSV file
> with write.table, if you use sep=", " (that's double-quote comma space
> double-quote) R puts the space _inside_ the quotation marks around
> character variables. I'm not sure I would call that a bug, but I bet more
> people are surprised by it than expect it.

It shouldn’t; that’s incorrect. Can you provide a reproducible example?

When I look at your code & my reply, I notice that the quote marks are wrong too; could that be the actual problem?

Sarah

>
> Again, many thanks!
>
> Andrew
>
>
> On Sat, Apr 12, 2014 at 6:04 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>
>> You need %in% instead.
>>
>> This is untested, but something like this should work:
>>
>>
>> ECwork <- EC07_A1[ EC07_A1$GEO_ID %in% c("01000US", "04000US06",
>> "33000US488",
>> "31000US41860", "31400US4186036084" "05000US06001", "E6000US0600153000") &
>> EC07_A1$SECTOR %in% c("32", "33", "42", 44", 45", 51", 54", 61",
>> "71",
>> "81"), ]
>>
>> (Note that your original code snippet had a shortage of ) and didn't
>> specify the data frame from which to take the columns.)
>>
>> Sarah
>>
>> On Sat, Apr 12, 2014 at 8:36 AM, Andrew Hoerner <ahoer...@rprogress.org>
>> wrote:
>>> Dear Folks--
>>> I have a file with 3 million-odd rows of data from the 2007 U.S. Economic
>>> Census. I am trying to pare it down to a subset of rows that both (1) has
>>> any one of a vector of NAICS economic sector codes, and (2) also has any
>>> one of a vector of geographic ID codes.
>>>
>>> Here is the code I am trying to use.
>>>
>>> ECwork <- EC07_A1[ any(GEO_ID == c("01000US", "04000US06", "33000US488",
>>> "31000US41860", "31400US4186036084" "05000US06001", "E6000US0600153000") &
>>> any(SECTOR == c("32", "33", "42", 44", 45", 51", 54", 61", "71",
>>> "81"), ]
>>>
>>> I get back the following warning:
>>>
>>> Warning message:
>>> In EC07_A1$SECTOR == c("32", "33", "42", "44", "45", "51", "54", :
>>>   longer object length is not a multiple of shorter object length
>>>
>>> I see what R is doing. Instead of comparing each element of the column
>>> SECTOR to the row vector of codes, and returning a logical vector of the
>>> length of SECTOR with rows marked as TRUE that match any of the codes, it
>>> is lining my code list up with SECTOR as a column vector and doing
>>> element-by-element testing, and then recycling the code list over three
>>> million rows. But I am not sure how to make it do what I want -- test the
>>> sector code in each row against the vector of codes I am looking for. I
>>> would be grateful if anyone could suggest an alternative that would
>>> achieve my ends.
>>>
>>> Oh, and I would add, if there is a way of doing this correctly with
>>> the extract function [], I would like to know what it is. If not, I guess
>>> I'd like to know that too.
>>>
>>> Sincerely, Andrew Hoerner
>>>
>>> --
>>> J. Andrew Hoerner
>>> Director, Sustainable Economics Program
>>> Redefining Progress
>>> (510) 507-4820
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
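The difference between `==` and Sarah's `%in%` suggestion can be seen in a small, self-contained example. This is a sketch, not code from the thread: the data frame `ec` and its six rows are made up, and only the column names GEO_ID and SECTOR and the two code lists come from the messages above. It shows the position-by-position recycling Andrew describes, what `any()` does to it, and that the `%in%` result works directly with the `[` extract operator.

```r
# Made-up stand-in for EC07_A1; only the column names and code lists
# are taken from the thread.
ec <- data.frame(
  GEO_ID = c("01000US", "04000US06", "04000US36", "05000US06001", "01000US", "33000US488"),
  SECTOR = c("33", "42", "44", "23", "99", "71"),
  stringsAsFactors = FALSE
)

geo_codes <- c("01000US", "04000US06", "33000US488", "31000US41860",
               "31400US4186036084", "05000US06001", "E6000US0600153000")
sector_codes <- c("32", "33", "42", "44", "45", "51", "54", "61", "71", "81")

# `==` pairs the vectors up position by position and recycles the shorter one
# (6 rows vs. 10 codes, hence the "not a multiple" warning, suppressed here).
# The result follows the longer vector, and row 1's "33" is never compared
# with the "33" in the code list.
eq <- suppressWarnings(ec$SECTOR == sector_codes)
print(length(eq))  # [1] 10
print(any(eq))     # [1] FALSE

# Wrapping that in any(), as in the original attempt, collapses the test to a
# single TRUE or FALSE, which `[` would recycle to keep every row or none.

# `%in%` asks, for each element on the left, whether it appears anywhere on
# the right, so it returns exactly one logical per row.
keep <- ec$GEO_ID %in% geo_codes & ec$SECTOR %in% sector_codes
print(keep)        # [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE

# The logical vector goes straight into `[`; the empty column slot keeps all columns.
ecwork <- ec[keep, ]
print(ecwork)

# subset() selects the same rows without repeating the `ec$` prefix.
ecwork2 <- subset(ec, GEO_ID %in% geo_codes & SECTOR %in% sector_codes)
stopifnot(isTRUE(all.equal(ecwork, ecwork2)))
```

On the real table the same pattern should carry over with `ec` replaced by `EC07_A1`. `%in%` is documented as `match(x, table, nomatch = 0L) > 0L`, so it does one lookup per row rather than relying on recycling.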
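One way to check the write.table() behaviour discussed in the thread is to write a tiny data frame with sep = ", " and look at the raw text. This is a sketch under stated assumptions: the one-row data frame `df` is made up, and it relies on the default quote = TRUE. With those settings write.table() places sep between fields that are already quoted, so the space should land after one closing quote and before the next opening quote, which agrees with Sarah's reply.

```r
# Made-up one-row data frame with two character columns.
df <- data.frame(a = "x", b = "y", stringsAsFactors = FALSE)

# file = "" (the default) writes to the console; capture.output() collects
# those lines as a character vector so they can be inspected.
out <- capture.output(write.table(df, sep = ", ", row.names = FALSE))

# writeLines() prints the raw text without R's string escaping.
writeLines(out)
# "a", "b"
# "x", "y"
```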