Re: [R] Cleaning up messy Excel data

2012-03-03 Thread John Kane
Seconded 

John Kane
Kingston ON Canada


> -Original Message-
> From: rolf.tur...@xtra.co.nz
> Sent: Sat, 03 Mar 2012 13:46:42 +1300
> To: 538...@gmail.com
> Subject: Re: [R] Cleaning up messy Excel data
> 
> On 03/03/12 12:41, Greg Snow wrote:
> 
> 
>> It is possible to do the right thing in
>> Excel, but Excel does not encourage (let alone force) you to do the
>> right thing, but makes it easy to do the wrong thing.
> 
> 
> Fortune!
> 
>  cheers,
> 
>  Rolf Turner
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up messy Excel data

2012-03-03 Thread Greg Snow
Sometimes we adapt to our environment, sometimes we adapt our
environment to us. I like fortune(108).

I actually was suggesting that you add a tool to your toolbox, not limit it.

In my experience (and I don't expect everyone else's to match) data
manipulation that seems easier in Excel than R is only easier until
the client comes back and wants me to redo the whole analysis with one
typo fixed.  Then rerunning the script in R (or Perl or other tool) is
a lot easier than trying to remember where all I clicked, dragged,
selected, etc.
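
A minimal sketch of that point, assuming the cleaning steps are kept in a small function: when a corrected file arrives, the same script is simply run again. The function name clean_measurements, the column names, and the inline CSV text (including the corrected typo) are hypothetical and only stand in for a real delivery.

```r
# Hypothetical cleaning step kept in a script so it can be rerun unchanged.
clean_measurements <- function(csv_text) {
  # as.is = TRUE keeps text columns as character rather than factor
  raw <- read.csv(text = csv_text, as.is = TRUE)
  # Recode censored entries such as "<2.0" to "0", then convert to numeric
  raw$value <- as.numeric(sub("^<.*", "0", raw$value))
  raw
}

# Made-up first delivery, then the same data with one typo fixed
first_delivery  <- "id,value\n1,118\n2,5.7\n3,<2.0\n4,3.7\n"
second_delivery <- "id,value\n1,11.8\n2,5.7\n3,<2.0\n4,3.7\n"

clean_measurements(first_delivery)$value
clean_measurements(second_delivery)$value
```

Nothing has to be remembered between the two runs; only the input changes.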

I do use Excel for some things (though I would be happy to find other
tools for that if it were possible to expunge Excel from the earth)
and Word (I actually like using R2wd to send tables and graphs to Word
that I can then give to clients who just want to be able to copy and
paste them to something else). I just think that many of the tasks
that many people use Excel for would be better served with a better
tool.

If someone reading this decides to put some more thought into a
project and actually design a database up front rather than
letting it evolve into some monstrosity in Excel, and that decision
saves them some later grief, then the world will be a slightly
better place.

On Fri, Mar 2, 2012 at 6:04 PM, jim holtman  wrote:
> Unfortunately they only know how to use Excel and Word.  They are not
> folks who use a computer every day.  Many of them run factories or
> warehouses and asking them to use something like Access would not
> happen in my lifetime (I have retired twice already).
>
> I don't have any problems with them "messing" up the data that I send
> them; they are pretty good about making changes within the context of
> the spreadsheet.  The other issue is that I am working with people in
> twenty different locations spread across the US, so I might be able to
> get one of them to use Access (there is one I know that uses it), but that
> leaves 19 other people I would not be able to communicate with.
>
> The other thing is that I use Excel myself to slice/dice data
> since there are things that are easier in Excel than R (believe it or
> not).  There are a number of tools I keep in my toolkit, and R is
> probably the most important, but I have not thrown the rest of them
> away since they still serve a purpose.
>
> So if you can come up with a way to get 20 diverse groups, who are not
> computer literate, to change over in a couple of days from Excel to
> Access let me know.  BTW, I tried to use Access once and gave it up
> because it was not as intuitive as some other tools and did not give
> me any more capability than the ones I was using.  So I know I would
> have a problem in convincing others to make the change just so they
> could communicate with me, while they still had to use Excel for most
> of their other interfaces.
>
> This is the real world where you have to learn how to adapt to your
> environment and make the best of it.  So you just have to learn that
> Excel can be your friend (or at least not your enemy) and can serve a
> very useful purpose in getting your ideas across to other people.
>
> On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow <538...@gmail.com> wrote:
>> Try sending your clients a data set (data frame, table, etc) as an MS
>> Access data table instead.  They can still view the data as a table,
>> but will have to go to much more effort to mess up the data, more
>> likely they will do proper edits without messing anything up (mixing
>> characters in with numbers, have more sexes than your biology teacher
>> told you about, add extra lines at top or bottom that makes reading
>> back into R more difficult, etc.)
>>
>> I have had a few clients that I talked into using MS Access from the
>> start to enter their data, there was often a bit of resistance at
>> first, but once they tried it and went through the process of
>> designing the database up front they ended up thanking me and believed
>> that the entire data entry process was easier and quicker than had they
>> used Excel as they originally planned.
>>
>> Access is still part of MS Office, so they don't need to learn R or in
>> any way break their chains from being prisoners of Bill, but they will
>> be more productive in more ways than just interfacing with you.
>>
>> Access (databases in general) force you to plan things out and do the
>> correct thing from the start.  It is possible to do the right thing in
>> Excel, but Excel does not encourage (let alone force) you to do the
>> right thing, but makes it easy to do the wrong thing.
>>
>> On Thu, Mar 1, 2012 at 6:15 AM, jim holtman  wrote:
>>> But there are some important reasons to use Excel.  In my work there
>>> are a lot of people that I have to send the equivalent of a data.frame
>>> to who want to look at the data and possibly slice/dice the data
>>> differently and then send back to me updates.  These folks do not know
>>> how to use R, but do have Microsoft Office installed on their
>>> computers and know how to use the

Re: [R] Cleaning up messy Excel data

2012-03-03 Thread John C Nash
When I was still teaching undergraduate intro biz-stat (among that community it is always
abbreviated), we needed to control the spreadsheet behaviour of TAs who entered marks into
a spreadsheet. We came up with TellTable (the Sourceforge site is still around with refs
at http://telltable-s.sourceforge.net/), which put OpenOffice Calc on a server and made
sure change recording was on and the menu to switch off change recording was removed. It
is used over a web browser with a VNC client. Neil Smith wrote a Java application to view
all the changes by who, what, when etc., and we discovered the infrastructure was quite
nice for running any single user app in a shared mode with version control. However, with
Google Docs, we realized we could try to make money or enjoy life, and so the project is
now moribund. However, the ideas are there, and if anyone gets interested, I'll be happy
to try to dig up materials, though I suspect that it would be easier to work with the
ideas and more modern tools.

The key idea is that there is just ONE master file, and that there is some discipline over
keeping that file OK. My opinion is that this concept could be exploited much more for
lots of different situations, but it seems that cloud technology is being used to create
lots of versions of files rather than consolidate and control such files.

JN


On 03/03/2012 06:00 AM, r-help-requ...@r-project.org wrote:
> Message: 76
> Date: Fri, 2 Mar 2012 20:04:05 -0500
> From: jim holtman 
> To: Greg Snow <538...@gmail.com>
> Cc: r-help 
> Subject: Re: [R] Cleaning up messy Excel data
> Message-ID:
>   
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Unfortunately they only know how to use Excel and Word.  They are not
> folks who use a computer every day.  Many of them run factories or
> warehouses and asking them to use something like Access would not
> happen in my lifetime (I have retired twice already).
> 
> I don't have any problems with them "messing" up the data that I send
> them; they are pretty good about making changes within the context of
> the spreadsheet.  The other issue is that I am working with people in
> twenty different locations spread across the US, so I might be able to
> get one of them to use Access (there is one I know that uses it), but that
> leaves 19 other people I would not be able to communicate with.



Re: [R] Cleaning up messy Excel data

2012-03-02 Thread jim holtman
Unfortunately they only know how to use Excel and Word.  They are not
folks who use a computer every day.  Many of them run factories or
warehouses and asking them to use something like Access would not
happen in my lifetime (I have retired twice already).

I don't have any problems with them "messing" up the data that I send
them; they are pretty good about making changes within the context of
the spreadsheet.  The other issue is that I am working with people in
twenty different locations spread across the US, so I might be able to
get one of them to use Access (there is one I know that uses it), but that
leaves 19 other people I would not be able to communicate with.

The other thing is that I use Excel myself to slice/dice data
since there are things that are easier in Excel than R (believe it or
not).  There are a number of tools I keep in my toolkit, and R is
probably the most important, but I have not thrown the rest of them
away since they still serve a purpose.

So if you can come up with a way to get 20 diverse groups, who are not
computer literate, to change over in a couple of days from Excel to
Access let me know.  BTW, I tried to use Access once and gave it up
because it was not as intuitive as some other tools and did not give
me any more capability than the ones I was using.  So I know I would
have a problem in convincing others to make the change just so they
could communicate with me, while they still had to use Excel for most
of their other interfaces.

This is the real world where you have to learn how to adapt to your
environment and make the best of it.  So you just have to learn that
Excel can be your friend (or at least not your enemy) and can serve a
very useful purpose in getting your ideas across to other people.

On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow <538...@gmail.com> wrote:
> Try sending your clients a data set (data frame, table, etc) as an MS
> Access data table instead.  They can still view the data as a table,
> but will have to go to much more effort to mess up the data, more
> likely they will do proper edits without messing anything up (mixing
> characters in with numbers, have more sexes than your biology teacher
> told you about, add extra lines at top or bottom that makes reading
> back into R more difficult, etc.)
>
> I have had a few clients that I talked into using MS Access from the
> start to enter their data, there was often a bit of resistance at
> first, but once they tried it and went through the process of
> designing the database up front they ended up thanking me and believed
> that the entire data entry process was easier and quicker than had they
> used Excel as they originally planned.
>
> Access is still part of MS Office, so they don't need to learn R or in
> any way break their chains from being prisoners of Bill, but they will
> be more productive in more ways than just interfacing with you.
>
> Access (databases in general) force you to plan things out and do the
> correct thing from the start.  It is possible to do the right thing in
> Excel, but Excel does not encourage (let alone force) you to do the
> right thing, but makes it easy to do the wrong thing.
>
> On Thu, Mar 1, 2012 at 6:15 AM, jim holtman  wrote:
>> But there are some important reasons to use Excel.  In my work there
>> are a lot of people that I have to send the equivalent of a data.frame
>> to who want to look at the data and possibly slice/dice the data
>> differently and then send back to me updates.  These folks do not know
>> how to use R, but do have Microsoft Office installed on their
>> computers and know how to use the different products.
>>
>> I have been very successful in conveying what I am doing for them by
>> communicating via Excel spreadsheets.  It is also an important medium
>> in dealing with some international companies who provide data via
>> Excel and expect responses back via Excel.
>>
>> When dealing with data in a tabular form, Excel does provide a way for
>> a majority of the people I work with to understand the data.  Yes,
>> there are problems with some of the ways that people use Excel, and
>> yes I have had to invest time in scrubbing some of the data that I get
>> from them, but if I did not, then I would probably not have a job
>> working for them.  I use R exclusively for the analysis that I do, but
>> find it convenient to use Excel to provide a communication mechanism
>> to the majority of the non-R users that I have to deal with.  It is a
>> convenient "work-around" because I would never get them to invest the
>> time to learn R.
>>
>> So in the real world there is a need for Excel and we are not going to
>> cause it to go away; we have to learn how to live with it, and from my
>> standpoint, it has definitely benefited me in being able to
>> communicate with my users and continuing to provide them with results
>> that they are happy with.  They refer to letting me work my "magic" on
>> the data; all they know is they see the result via Excel and in th

Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Rolf Turner

On 03/03/12 12:41, Greg Snow wrote:



It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.



Fortune!

cheers,

Rolf Turner



Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Jim Lemon
Unfortunately, a lot of people who use MS Office don't have or know how 
to use MS Access. Where I work now (as in the past) I have to tie 
someone to their chair, give them a few pokes with the cattle prod and 
then show them that a CSV file will load straight into Excel before I 
can convince them that they can use such a heretical data format. You 
don't want to know what I have to do to convince them that they can view 
my listings in HTML.


Jim

PS - Always give them a _copy_ of the CSV file.
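
A rough sketch of that habit in R, assuming the results already sit in a data frame: write the master once, then hand out a separate copy. The data frame and the temporary file paths here are made up for illustration.

```r
# Hypothetical data frame standing in for the results to be shared
results <- data.frame(site = c("A", "B", "C"),
                      total = c(12.5, 7, 3.25),
                      stringsAsFactors = FALSE)

master_file <- tempfile(fileext = ".csv")  # the copy you keep
client_file <- tempfile(fileext = ".csv")  # the copy you send

# row.names = FALSE avoids an unnamed first column when opened in Excel
write.csv(results, master_file, row.names = FALSE)
file.copy(master_file, client_file)

# The master can always be read back, whatever happens to the client copy
check <- read.csv(master_file, as.is = TRUE)
```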

On 03/03/2012 10:41 AM, Greg Snow wrote:

Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than had they
used Excel as they originally planned.

Access is still part of MS Office, so they don't need to learn R or in
any way break their chains from being prisoners of Bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman  wrote:

But there are some important reasons to use Excel.  In my work there
are a lot of people that I have to send the equivalent of a data.frame
to who want to look at the data and possibly slice/dice the data
differently and then send back to me updates.  These folks do not know
how to use R, but do have Microsoft Office installed on their
computers and know how to use the different products.

I have been very successful in conveying what I am doing for them by
communicating via Excel spreadsheets.  It is also an important medium
in dealing with some international companies who provide data via
Excel and expect responses back via Excel.

When dealing with data in a tabular form, Excel does provide a way for
a majority of the people I work with to understand the data.  Yes,
there are problems with some of the ways that people use Excel, and
yes I have had to invest time in scrubbing some of the data that I get
from them, but if I did not, then I would probably not have a job
working for them.  I use R exclusively for the analysis that I do, but
find it convenient to use Excel to provide a communication mechanism
to the majority of the non-R users that I have to deal with.  It is a
convenient "work-around" because I would never get them to invest the
time to learn R.

So in the real world there is a need for Excel and we are not going to
cause it to go away; we have to learn how to live with it, and from my
standpoint, it has definitely benefited me in being able to
communicate with my users and continuing to provide them with results
that they are happy with.  They refer to letting me work my "magic" on
the data; all they know is they see the result via Excel and in the
background R is doing the heavy lifting that they do not have to know
about.

On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner  wrote:

On 01/03/12 04:43, John Kane wrote:


(mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
str(mydata)

newdata<- as.character(mydata)

newdata[newdata==">2"]<- 0
newdata<- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's
hands.



Amen, bro'!!!

cheers,

Rolf Turner





Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Greg Snow
Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than had they
used Excel as they originally planned.

Access is still part of MS Office, so they don't need to learn R or in
any way break their chains from being prisoners of Bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman  wrote:
> But there are some important reasons to use Excel.  In my work there
> are a lot of people that I have to send the equivalent of a data.frame
> to who want to look at the data and possibly slice/dice the data
> differently and then send back to me updates.  These folks do not know
> how to use R, but do have Microsoft Office installed on their
> computers and know how to use the different products.
>
> I have been very successful in conveying what I am doing for them by
> communicating via Excel spreadsheets.  It is also an important medium
> in dealing with some international companies who provide data via
> Excel and expect responses back via Excel.
>
> When dealing with data in a tabular form, Excel does provide a way for
> a majority of the people I work with to understand the data.  Yes,
> there are problems with some of the ways that people use Excel, and
> yes I have had to invest time in scrubbing some of the data that I get
> from them, but if I did not, then I would probably not have a job
> working for them.  I use R exclusively for the analysis that I do, but
> find it convenient to use Excel to provide a communication mechanism
> to the majority of the non-R users that I have to deal with.  It is a
> convenient "work-around" because I would never get them to invest the
> time to learn R.
>
> So in the real world there is a need for Excel and we are not going to
> cause it to go away; we have to learn how to live with it, and from my
> standpoint, it has definitely benefited me in being able to
> communicate with my users and continuing to provide them with results
> that they are happy with.  They refer to letting me work my "magic" on
> the data; all they know is they see the result via Excel and in the
> background R is doing the heavy lifting that they do not have to know
> about.
>
> On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner  wrote:
>> On 01/03/12 04:43, John Kane wrote:
>>>
>>> (mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
>>> str(mydata)
>>>
>>> newdata<- as.character(mydata)
>>>
>>> newdata[newdata==">2"]<- 0
>>> newdata<- as.numeric(newdata)
>>> str(newdata)
>>>
>>> We really need to keep Excel (and other spreadsheets) out of people's
>>> hands.
>>
>>
>> Amen, bro'!!!
>>
>>    cheers,
>>
>>        Rolf Turner
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Cleaning up messy Excel data

2012-03-01 Thread jim holtman
But there are some important reasons to use Excel.  In my work there
are a lot of people that I have to send the equivalent of a data.frame
to who want to look at the data and possibly slice/dice the data
differently and then send back to me updates.  These folks do not know
how to use R, but do have Microsoft Office installed on their
computers and know how to use the different products.

I have been very successful in conveying what I am doing for them by
communicating via Excel spreadsheets.  It is also an important medium
in dealing with some international companies who provide data via
Excel and expect responses back via Excel.

When dealing with data in a tabular form, Excel does provide a way for
a majority of the people I work with to understand the data.  Yes,
there are problems with some of the ways that people use Excel, and
yes I have had to invest time in scrubbing some of the data that I get
from them, but if I did not, then I would probably not have a job
working for them.  I use R exclusively for the analysis that I do, but
find it convenient to use Excel to provide a communication mechanism
to the majority of the non-R users that I have to deal with.  It is a
convenient "work-around" because I would never get them to invest the
time to learn R.

So in the real world there is a need for Excel and we are not going to
cause it to go away; we have to learn how to live with it, and from my
standpoint, it has definitely benefited me in being able to
communicate with my users and continuing to provide them with results
that they are happy with.  They refer to letting me work my "magic" on
the data; all they know is they see the result via Excel and in the
background R is doing the heavy lifting that they do not have to know
about.

On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner  wrote:
> On 01/03/12 04:43, John Kane wrote:
>>
>> (mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
>> str(mydata)
>>
>> newdata<- as.character(mydata)
>>
>> newdata[newdata==">2"]<- 0
>> newdata<- as.numeric(newdata)
>> str(newdata)
>>
>> We really need to keep Excel (and other spreadsheets) out of people's
>> hands.
>
>
> Amen, bro'!!!
>
>    cheers,
>
>        Rolf Turner
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Cleaning up messy Excel data

2012-02-29 Thread Rolf Turner

On 01/03/12 04:43, John Kane wrote:

(mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
str(mydata)

newdata<- as.character(mydata)

newdata[newdata==">2"]<- 0
newdata<- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's hands.


Amen, bro'!!!

cheers,

Rolf Turner



Re: [R] Cleaning up messy Excel data

2012-02-29 Thread John Kane

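# Sample factor with text entries mixed into numeric values
(mydata <- as.factor(c("1","2","3", ">2", "5", ">2")))
str(mydata)

# Work on the character representation of the factor
newdata <- as.character(mydata)

# Recode the text entries as "0", then convert to numeric
newdata[newdata==">2"] <- "0"
newdata <- as.numeric(newdata)
str(newdata)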

We really need to keep Excel (and other spreadsheets) out of people's hands.
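
The reason for going through as.character() can be seen in a small sketch. The values below are modeled on the "<2.0" example from the original question; the variable names are only illustrative. Calling as.numeric() directly on a factor returns its internal level codes, not the numbers printed on screen.

```r
# Sample values modeled on the question: one censored entry, "<2.0"
x <- factor(c("118", "5.7", "<2.0", "3.7"))

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(x)

# Convert to character first, recode the censored entry, then to numeric
chr <- as.character(x)
chr[chr == "<2.0"] <- "0"
values <- as.numeric(chr)

codes
values
```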

John Kane
Kingston ON Canada


> -Original Message-
> From: noahsilver...@ucla.edu
> Sent: Tue, 28 Feb 2012 13:27:13 -0800
> To: r-help@r-project.org
> Subject: [R] Cleaning up messy Excel data
> 
> Unfortunately, some data I need to work with was delivered in a rather
> messy Excel file.  I want to import into R and clean up some things so
> that I can do my analysis.  Pulling in a CSV from Excel is the easy part.
> 
> My current challenge is dealing with some text mixed in the values.
> i.e.   118   5.7   <2.0  3.7
> 
> Since this column in Excel has a "<2.0" value, then R reads the column as
> a factor with levels.  Ideally, I want to convert it to a normal vector of
> scalars and code the "<2.0" as 0.
> 
> Can anyone suggest an easy way to do this?
> 
> Thanks!
> 
> 
> --
> Noah Silverman
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
> 
> 




Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Stephen Sefick
Just replace that value with zero.  If you provide some reproducible 
code I could probably give you a solution.

?dput
good luck,

Stephen

On 02/28/2012 03:27 PM, Noah Silverman wrote:

Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can do 
my analysis.  Pulling in a CSV from Excel is the easy part.

My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, then R reads the column as a factor with 
levels.  Ideally, I want to convert it to a normal vector of scalars and code the "<2.0" 
as 0.

Can anyone suggest an easy way to do this?

Thanks!


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095




--
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are so little 
or so large that all they really do for us is puff us up and make us feel like 
gods.  We are mammals, and have not exhausted the annoying little problems of 
being mammals.

-K. Mullis

"A big computer, a complex algorithm and a long time does not equal science."

  -Robert Gentleman



Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Noah Silverman
That's exactly what I need.

Thank You!!


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

On Feb 28, 2012, at 1:42 PM, jim holtman wrote:

> First of all when reading in the CSV file, use 'as.is = TRUE' to
> prevent the changing to factors.
> 
> Now that things are character in that column, you can use some pattern
> expressions (gsub, regex, ...) to search for and change your data.
> E.g.,
> 
> sub("<.*", "0", yourCol)
> 
> should do it for you.
> 
> On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman  
> wrote:
>> Unfortunately, some data I need to work with was delivered in a rather messy 
>> Excel file.  I want to import into R and clean up some things so that I can 
>> do my analysis.  Pulling in a CSV from Excel is the easy part.
>> 
>> My current challenge is dealing with some text mixed in the values.
>> i.e.   118   5.7   <2.0  3.7
>> 
>> Since this column in Excel has a "<2.0" value, then R reads the column as a 
>> factor with levels.  Ideally, I want to convert it to a normal vector of 
>> scalars and code the "<2.0" as 0.
>> 
>> Can anyone suggest an easy way to do this?
>> 
>> Thanks!
>> 
>> 
>> --
>> Noah Silverman
>> UCLA Department of Statistics
>> 8117 Math Sciences Building
>> Los Angeles, CA 90095
>> 
>> 
> 
> 
> 
> -- 
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.




Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Robert Baer
-Original Message- 
From: Noah Silverman

Sent: Tuesday, February 28, 2012 3:27 PM
To: r-help
Subject: [R] Cleaning up messy Excel data

Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can 
do my analysis.  Pulling in a CSV from Excel is the easy part.


My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, then R reads the column as a 
factor with levels.  Ideally, I want to convert it to a normal vector of 
scalars and code the "<2.0" as 0.


Can anyone suggest an easy way to do this?
--
?as.character
will show you how to change the "factor" column into a character column. 
Then, you can replace text using any of a number of procedures.

see for example
?gsub

Finally, you can use as.numeric if you want numbers.  "Coding" is best done 
in the context of factors, so you might want to consider whether replacing <2 
with NA is more appropriate than replacing it with 0.  In the end, the choice 
might be context sensitive.
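
A minimal sketch of those three steps, assuming a made-up factor column (the names x, x_chr, x_zero and x_na are only for illustration, not from the original data). It shows both the 0 coding and the NA coding so the two can be compared:

```r
# Hypothetical factor column, like the one read.csv produced by default
x <- factor(c("118", "5.7", "<2.0", "3.7"))

# Step 1: factor -> character. Calling as.numeric() directly on a factor
# would return the internal level codes, not the printed values.
x_chr <- as.character(x)

# Step 2 and 3, option A: replace values starting with "<" by "0",
# then convert to numeric
x_zero <- as.numeric(sub("^<.*", "0", x_chr))

# Step 2 and 3, option B: treat values starting with "<" as missing
x_na <- as.numeric(ifelse(grepl("^<", x_chr), NA_character_, x_chr))
```

Which of the two codings is appropriate depends on how the "below detection limit" values should enter the analysis.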


Rob

--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965



Re: [R] Cleaning up messy Excel data

2012-02-28 Thread jim holtman
First of all when reading in the CSV file, use 'as.is = TRUE' to
prevent the changing to factors.

Now that things are character in that column, you can use some pattern
expressions (gsub, regex, ...) to search for and change your data.
E.g.,

sub("<.*", "0", yourCol)

should do it for you.
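
Put together, this might look like the sketch below. The CSV text, the data frame name dat and the column name conc are invented for illustration; with a real file you would pass the file name to read.csv instead of text =. Note that sub() returns character strings, so as.numeric() is still needed at the end.

```r
# Hypothetical stand-in for the CSV exported from Excel
csv_text <- "id,conc\n1,118\n2,5.7\n3,<2.0\n4,3.7"

# as.is = TRUE keeps text columns as character instead of factor
dat <- read.csv(text = csv_text, as.is = TRUE)

# Replace everything from "<" onward by "0", then convert to numeric
dat$conc <- as.numeric(sub("<.*", "0", dat$conc))

str(dat$conc)
```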

On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman wrote:
> Unfortunately, some data I need to work with was delivered in a rather messy 
> Excel file.  I want to import into R and clean up some things so that I can 
> do my analysis.  Pulling in a CSV from Excel is the easy part.
>
> My current challenge is dealing with some text mixed in the values.
> i.e.   118   5.7   <2.0  3.7
>
> Since this column in Excel has a "<2.0" value, then R reads the column as a 
> factor with levels.  Ideally, I want to convert it to a normal vector of scalars 
> and code the "<2.0" as 0.
>
> Can anyone suggest an easy way to do this?
>
> Thanks!
>
>
> --
> Noah Silverman
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



[R] Cleaning up messy Excel data

2012-02-28 Thread Noah Silverman
Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can do 
my analysis.  Pulling in a CSV from Excel is the easy part.

My current challenge is dealing with some text mixed in the values.  
i.e.   118   5.7   <2.0  3.7 

Since this column in Excel has a "<2.0" value, then R reads the column as a 
factor with levels.  Ideally, I want to convert it to a normal vector of scalars 
and code the "<2.0" as 0.  

Can anyone suggest an easy way to do this?

Thanks!


--
Noah Silverman
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

