Re: [R] Working with Data Frames

2015-11-03 Thread Joshua Ulrich
On Tue, Nov 3, 2015 at 6:26 PM, Robert Sherry  wrote:
> I have created what I believe to be a data frame. It is called env1$SPY.

It's not a data.frame.  You can use str() to look at the *str*ucture
of an object:

R> str(env1$SPY)
An ‘xts’ object on 1995-01-03/2015-11-02 containing:
  Data: num [1:5247, 1:6] 45.7 46 46 46.1 46 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close" ...
  Indexed by objects of class: [Date] TZ: UTC
  xts Attributes:
List of 2
 $ src: chr "yahoo"
 $ updated: POSIXct[1:1], format: "2015-11-03 18:51:48"

> The r statement head( env1$SPY ) produces the following output:
>
>            SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
> 1995-01-03  45.7031  45.8437 45.6875   45.7812     324300     31.55312
> 1995-01-04  45.9843      46. 45.7500       46.     351800     31.70392
> 1995-01-05  46.0312  46.1093 45.9531       46.      89800     31.70392
> 1995-01-06  46.0937  46.2500 45.9062   46.0468     448400     31.73617
> 1995-01-09  46.0312  46.0937     46.   46.0937      36800     31.76850
> 1995-01-10  46.2031  46.3906 46.1406   46.1406     229800     31.80082
>
> The above data frame was created by the following commands:
> library( quantmod )
> env1 <- new.env()
> getSymbols("SPY", src = 'yahoo', from = '1995-01-01', env = env1,
>            auto.assign = TRUE)
>
> Now, what I want to do is to loop through the data and look for when the
> month changes. What is the proper way of writing a for loop in
> R and accessing the date field?
>
Since the above is an xts object, there is no "date field".  There's
an index attribute.  It would probably help you a lot to read the xts
and zoo vignettes/FAQ.

You would also get better help if you were more specific about what
you're trying to do.  There's probably an easier way to do what you
intend to do with a loop.
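
As a rough sketch of a loop-free route (not part of the original reply): the dates of an xts object are returned by index(), so with the real data one would presumably start from index(env1$SPY). The sample dates below are made up so that the snippet runs without downloading anything, and the names dates, ym, month_changed and first_of_month are purely illustrative.

```r
# Hypothetical stand-in for index(env1$SPY): a plain Date vector
dates <- as.Date(c("1995-01-30", "1995-01-31", "1995-02-01",
                   "1995-02-02", "1995-03-01"))

# Year-month label for every row, e.g. "1995-01"
ym <- format(dates, "%Y-%m")

# TRUE where the label differs from the previous row's label
month_changed <- c(FALSE, ym[-1] != ym[-length(ym)])

# First observation of each new month
first_of_month <- dates[month_changed]
print(first_of_month)
```

xts also provides endpoints(x, on = "months"), which returns the row positions of the last observation in each month (preceded by 0); that may be all that is needed.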

> Bob
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


Re: [R] Working with Data Frames

2015-11-03 Thread Bert Gunter
Have you gone through any R tutorials? There are innumerable good ones
on the web -- and one that ships with R (An Intro to R). Don't you
think you should make an effort to learn some basics on your own
before posting here?

... or do I misinterpret your question? (And if so, my apologies --
feel free to chastise me appropriately).

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll




Re: [R] Working with Data Frames

2015-11-03 Thread Peter Alspach
Tena koe Robert

Many times in R one can do things without a loop.  In this case, see ?rle.  You
might also need to check ?substring or ?months, depending on how your dates are
stored.
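
A minimal sketch of the rle() idea, assuming the dates are available as a Date vector (the sample dates are invented). format(dates, "%Y-%m") is used here in place of months() so that the labels include the year and do not depend on the locale; that substitution is a choice made for this sketch.

```r
# Hypothetical dates standing in for the index of the data
dates <- as.Date(c("1995-01-30", "1995-01-31", "1995-02-01",
                   "1995-02-02", "1995-03-01"))

# Run-length encode the year-month labels: one run per month
runs <- rle(format(dates, "%Y-%m"))

# Row position of the last and of the first observation in each month
last_row  <- cumsum(runs$lengths)
first_row <- last_row - runs$lengths + 1

print(data.frame(month = runs$values,
                 first = dates[first_row],
                 last  = dates[last_row]))
```

The month changes wherever a new run starts, that is, at the rows listed in first_row after the first entry.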

HTH 

Peter Alspach



[R] Working with Data Frames

2015-11-03 Thread Robert Sherry
I have created what I believe to be a data frame. It is called
env1$SPY.  The R statement head( env1$SPY ) produces the following output:


           SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1995-01-03  45.7031  45.8437 45.6875   45.7812     324300     31.55312
1995-01-04  45.9843      46. 45.7500       46.     351800     31.70392
1995-01-05  46.0312  46.1093 45.9531       46.      89800     31.70392
1995-01-06  46.0937  46.2500 45.9062   46.0468     448400     31.73617
1995-01-09  46.0312  46.0937     46.   46.0937      36800     31.76850
1995-01-10  46.2031  46.3906 46.1406   46.1406     229800     31.80082

The above data frame was created by the following commands:
library( quantmod )
env1 <- new.env()
getSymbols("SPY", src = 'yahoo', from = '1995-01-01', env = env1,
           auto.assign = TRUE)

Now, what I want to do is to loop through the data and look for when the
month changes. What is the proper way of writing a for loop in R and
accessing the date field?

Bob



Re: [R] Working with data frames

2014-12-11 Thread William Dunlap
Sun Shine wrote:
> > with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE,
> xlab='Area') axis(side=2) axis(side=1, at=seq_along(levels(MHP.def$Names)),
> lab=levels(MHP.def$Names))})
>
> Error: unexpected symbol in "with(MHP.def, {plot(as.integer(MHP.def$Names),
> MHP.def$cH.E, axes=FALSE, xlab='Area') axis"
>
> This may have something to do with the period between cH and E or perhaps
> from the $ to access data from a column?

When you see a syntax error message, the error is usually towards the
end of the quoted text.  In your case, you are missing a newline or
semicolon between "'Area')" and the subsequent "axis".
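
To illustrate the missing separator (a sketch only; the toy data frame d is the one from the reproducible example elsewhere in this thread, and stringsAsFactors = TRUE is set explicitly because its default depends on the R version): the parser rejects two calls on one line inside braces unless a semicolon or newline separates them. Neither the period in a name such as cH.E nor the use of $ is involved.

```r
# Two calls on one line with nothing between them: a syntax error
bad <- try(parse(text = "{plot(1) axis(side=2)}"), silent = TRUE)
print(inherits(bad, "try-error"))

# The same calls separated by a semicolon parse without complaint
good <- parse(text = "{plot(1); axis(side=2)}")

# Applied to a one-line with() call
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)
pdf(NULL)  # null graphics device, so nothing is written to disk
with(d, {plot(as.integer(Name), Age, axes=FALSE, xlab="Name"); axis(side=2); axis(side=1, at=seq_along(levels(Name)), labels=levels(Name))})
invisible(dev.off())
```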



Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] Working with data frames

2014-12-11 Thread Jeff Newmiller
ggplot2 also depends on factors, so learn about them as soon as you can. It does
have some support for automatically converting strings to factors in some cases,
but it doesn't always work the way you want it to.
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.


Re: [R] Working with data frames

2014-12-11 Thread Sun Shine
Hello William, Ivan and Jim

I appreciate your replies.

I did suppress the factors using stringsAsFactors=FALSE and in that way 
was able to progress some more on getting a sense of the data set, so 
thanks for that suggestion. I had previously overlooked it.

Also thanks William, I never understood what those thick line segs were 
- now I do. That had been about the best I could get by that point and 
still not with the names on the x axis.

Unfortunately using William's suggestion of 'with' gave me errors:

 > with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE, 
xlab='Area') axis(side=2) axis(side=1, 
at=seq_along(levels(MHP.def$Names)), lab=levels(MHP.def$Names))})

Error: unexpected symbol in "with(MHP.def, 
{plot(as.integer(MHP.def$Names), MHP.def$cH.E, axes=FALSE, xlab='Area') 
axis"

This may have something to do with the period between cH and E or 
perhaps from the $ to access data from a column?

I have now installed ggplot2 and with the help of the graphics cookbook 
will see if I can make some headway like this, at least for now. I think 
William's suggestion about learning to work with factors is 
fundamentally sound and something I will need to get my head around. For 
now though, I think I'll stick to exploring ggplot2 so that I can 
visualise this data set more easily.

Thanks again.

Best

Sun


Re: [R] Working with data frames

2014-12-11 Thread William Dunlap
Here is a reproducible example
  > d <- read.csv(text="Name,Age\nBob,2\nXavier,25\nAdam,1")
  > str(d)
  'data.frame':   3 obs. of  2 variables:
   $ Name: Factor w/ 3 levels "Adam","Bob","Xavier": 2 3 1
   $ Age : int  2 25 1

Do you get something similar?  If not, show us what you have (you
could trim it down to a few columns).
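
A note for readers on newer R: since R 4.0.0 the default of stringsAsFactors in read.csv() is FALSE, so the str() output above will show chr rather than Factor unless factors are requested explicitly. A small sketch of both settings (csv_text, d_chr and d_fac are illustrative names):

```r
csv_text <- "Name,Age\nBob,2\nXavier,25\nAdam,1"

# Strings kept as character vectors
d_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Strings converted to factors, as in the 2014 output shown above
d_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(class(d_chr$Name))
print(class(d_fac$Name))
print(levels(d_fac$Name))
```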

Let's try some plots.
> plot(d$Age)
This shows a plot of d$Age (on y axis) vs "Index", where Index is
1:length(d$Age).  The points are at (1,2), (2,25), and (3,1). You gave
plot() no information about what should be on the x axis so it gave
you the index numbers.

Now asking for d$Name on the x axis and d$Age on the y.
> plot(d$Name, d$Age)
This puts the names, in alphabetical order, on the x axis.  The y axis
ranges from about 0 to 25 and neither axis is labelled.  There are
thick horizontal line segments where you expect the points to
be.  These are degenerate boxplots - when you ask to plot a
'factor' variable on the x axis and numbers on the y you get such
a plot.
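
What happens here can be sketched as follows, assuming Name is a factor: with a factor on the x axis the data are drawn as a boxplot, and with one observation per level each box collapses to a single line. If individual points are wanted, stripchart() is one base-graphics option (suggested here for illustration, not taken from the thread).

```r
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)
pdf(NULL)  # null graphics device, so nothing is written to disk

# Box statistics without drawing: five numbers per level
b <- boxplot(Age ~ Name, data = d, plot = FALSE)
print(b$names)
print(b$stats)  # each column repeats one value, hence the flat "boxes"

# One point per observation instead of a box
stripchart(Age ~ Name, data = d, vertical = TRUE, pch = 19)
invisible(dev.off())
```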

Some folks suggested you avoid factors by adding stringsAsFactors=FALSE
(or as.is=TRUE) to your call to read.csv.  Let's try that
  > d2 <- read.csv(stringsAsFactors=FALSE,
                   text="Name,Age\nBob,2\nXavier,25\nAdam,1")
  > plot(d2$Name, d2$Age)
  Error in plot.window(...) : need finite 'xlim' values
  In addition: Warning messages:
  1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
  2: In min(x) : no non-missing arguments to min; returning Inf
  3: In max(x) : no non-missing arguments to max; returning -Inf
You get no plot at all.
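
The warnings hint at the reason, as far as one can tell from the messages: the character names are coerced to numbers for the x axis, which yields only NA values, so there is no finite range to plot. A minimal sketch of that coercion, and of the factor codes that do work:

```r
d2 <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
               stringsAsFactors = FALSE)

# Names are not numbers, so the coercion gives NA (normally with a warning)
x <- suppressWarnings(as.numeric(d2$Name))
print(x)

# Converting to a factor first gives usable integer codes
codes <- as.integer(factor(d2$Name))
print(codes)
```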

You can get closer to what I think you want with
  with(d, {
    plot(as.integer(Name), Age, axes=FALSE, xlab="Name")
    axis(side=2) # draw the usual y axis
    axis(side=1, at=seq_along(levels(Name)), labels=levels(Name))
  })
If you want the names in a different order on the x axis, then reconstruct
the factor object d$Name with a different order of levels.  E.g.,
  d$Name <- factor(d$Name, levels=c("Xavier", "Bob", "Adam"))
and replot.
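
A runnable version of the reordering step, for reference. Rebuilding the factor changes the integer codes, and therefore where each name lands on the x axis. stringsAsFactors = TRUE is spelled out because newer R versions no longer convert strings by default.

```r
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)
codes_before <- as.integer(d$Name)   # codes under alphabetical levels
print(codes_before)

d$Name <- factor(d$Name, levels = c("Xavier", "Bob", "Adam"))
codes_after <- as.integer(d$Name)    # codes under the new level order
print(levels(d$Name))
print(codes_after)

pdf(NULL)  # null graphics device, so nothing is written to disk
with(d, {
  plot(as.integer(Name), Age, axes = FALSE, xlab = "Name")
  axis(side = 2)
  axis(side = 1, at = seq_along(levels(Name)), labels = levels(Name))
})
invisible(dev.off())
```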

There are various plotting packages, e.g., ggplot2, that can make this
sort of thing easier, but I think the recommendation not to use factors
is wrong.  You do need to learn how to use them to your advantage.

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [R] Working with data frames

2014-12-11 Thread jim holtman
If you are using 'read.csv' (or 'read.table') to input, then use the
'as.is = TRUE' parameter to prevent the conversion of the data to factors.

You can also do "as.character(df$col_with_factors)" to get the character
values back.
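
A short sketch of the difference (the data frame df and its values are made up for illustration): as.character() recovers the labels, whereas as.integer() returns the internal level codes, which is the "vector of numbers" effect described in the original question.

```r
# Hypothetical data frame with one factor column
df <- data.frame(col_with_factors = factor(c("MHad", "LBNW", "MHad")))

labels_back <- as.character(df$col_with_factors)  # the text values
codes <- as.integer(df$col_with_factors)          # the level codes

print(labels_back)
print(codes)
```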


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Working with data frames

2014-12-11 Thread Ivan Calandra

Hi Sun,

If I understood correctly (a reproducible example would be of great
help), it seems you're struggling with factors. Read up on this topic to
better understand how they work.

For your plots, you would need to set the labels with the argument
'xlab' for plot(). To access the names of the factor levels, use levels().
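
A brief sketch of levels() on a made-up factor. One caveat, offered with some caution: in base graphics xlab sets the axis title, while the tick labels themselves are drawn with axis() and its labels argument. The values 10 and 20 below are invented purely to have something to plot.

```r
areas <- factor(c("MHad", "LBNW", "MHad"))  # hypothetical values
print(levels(areas))    # the distinct names, in level order
print(nlevels(areas))   # how many distinct names there are

pdf(NULL)  # null graphics device, so nothing is written to disk
plot(seq_along(levels(areas)), c(10, 20), xaxt = "n",
     xlab = "Area", ylab = "Value")          # xlab is the axis title
axis(side = 1, at = seq_along(levels(areas)), labels = levels(areas))
invisible(dev.off())
```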


HTH,
Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENA² - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra



[R] Working with data frames

2014-12-11 Thread Sun Shine

Hello

I am struggling with data frames and would appreciate some help please.

I have a data set of 13 observations and 80 variables. The first column 
is the names of different political area boundaries (e.g. MHad, LBNW, 
etc), the first row is a vector of variable names concerning various 
census data (e.g. age.T, hse.Unk, etc.). The first cell [1,1] is blank.


I have loaded this via read.csv('path.to/data.set.csv'), and now want to 
run some analyses on this data frame. If I want to get a list of the 
names of the political areas (i.e. the first column), the result is a 
vector of numbers which appear to correlate with the factors, but I 
don't get the text names, just the corresponding number. So, if I want 
to plot something basic, like the area that uses the most gas for 
central heating, for example:


> plot(data.set$ch.Gas)

The result is the y-axis gives the gas usage for the areas, but the 
x-axis gives only the numbers of the areas, not the names of the areas 
(which is preferred).


So, two questions:

(1) Have I set up my csv file correctly to be read as a data frame, with
the variable names in the first row and, in each remaining row, the values
for one political area under the corresponding variable name? So far,
looking through tutorials and books seems to suggest yes, but at this
point I'm no longer sure.


(2) How can I access the names of the political areas when plotting so 
that these are given on the x-axis instead of the numbers?
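
One possible approach to both questions, sketched on invented numbers (the area names MHad and LBNW and the column names ch.Gas and age.T come from this post; all values are hypothetical). When the first header cell is blank, read.csv() names that column X; passing row.names = 1 instead turns the first column into row names, which can then label the x axis.

```r
# Hypothetical file contents with a blank first header cell
csv_text <- ",ch.Gas,age.T\nMHad,120,3400\nLBNW,95,2800"

# Default: the unnamed first column is called "X"
ds <- read.csv(text = csv_text)
print(names(ds))

# Alternative: use the first column as row names
ds2 <- read.csv(text = csv_text, row.names = 1)
print(rownames(ds2))

pdf(NULL)  # null graphics device, so nothing is written to disk
plot(ds2$ch.Gas, xaxt = "n", xlab = "Area", ylab = "ch.Gas")
axis(side = 1, at = seq_len(nrow(ds2)), labels = rownames(ds2))
invisible(dev.off())
```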


Thanks for any help.

Cheers
Sun
