[R] Grouped bar plot

2011-05-18 Thread jctoll
Hi,

I am trying to produce a grouped bar plot from a data.frame and I'm
having difficulties figuring out how to do so.  My data is 500 rows by
4 columns and basically looks like so:

 head(x)
    V1        V2        V3        V4
1  XOM 0.2317915 0.1610068 1.6941637
2 AAPL 0.6735488 0.7433611 0.1594102
3   GE 1.2554160 0.9237384 1.6767711
4  IBM 1.6296938 0.3730387 0.5858115
5  CVX 0.9194169 0.4785705 0.1803601
6   PG 0.7768241 1.7622060 0.7640163
 . . .

I would like to produce something similar to what is found at:
http://www.statmethods.net/graphs/bar.html  # the grouped barplot example
or
http://had.co.nz/ggplot2/geom_bar.html  # the Dodged bar charts example

Across the X-axis, for each row of 3 data points (V2, V3, V4)
associated with a symbol (V1), I would like to create a group of 3 bars
reflecting their values.  So the Y-axis will represent the magnitude
of the values in columns V2, V3 and V4, and the X-axis will have 500
groups of 3 bars, for a total of 1500 bars.  I would like the color of
each bar to reflect the column of data it represents, and to label each
group of 3 with the corresponding symbol in column V1.

I was trying to get this to work using ggplot, but the y-axis in the
example is the count, which is not what I'm after.  Any suggestions
to get me started down the right path would be appreciated.  Thank
you.

James
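
A minimal sketch of one way to get grouped bars from data shaped like
this, using base R's barplot() with beside = TRUE.  The six rows are the
ones printed in the question; the colours and the axis label are
arbitrary choices and not part of the original post.

```r
# The six rows shown by head(x) in the question.
x <- data.frame(
  V1 = c("XOM", "AAPL", "GE", "IBM", "CVX", "PG"),
  V2 = c(0.2317915, 0.6735488, 1.2554160, 1.6296938, 0.9194169, 0.7768241),
  V3 = c(0.1610068, 0.7433611, 0.9237384, 0.3730387, 0.4785705, 1.7622060),
  V4 = c(1.6941637, 0.1594102, 1.6767711, 0.5858115, 0.1803601, 0.7640163),
  stringsAsFactors = FALSE
)

# barplot() draws one group per matrix column, so transpose the numeric
# columns: rows become V2/V3/V4, columns become the symbols.
heights <- t(as.matrix(x[, c("V2", "V3", "V4")]))
colnames(heights) <- x$V1

# beside = TRUE puts the three bars of a symbol next to each other
# instead of stacking them.  The return value holds the bar midpoints.
mids <- barplot(heights, beside = TRUE,
                col = c("steelblue", "orange", "grey40"),  # arbitrary colours
                legend.text = rownames(heights),
                las = 2, ylab = "Value")
```

With all 500 symbols the labels will overlap, so plotting a subset at a
time is probably more readable.  In ggplot2, the count on the y-axis
comes from the default statistic of geom_bar(); reshaping the data to
long format and passing stat = "identity" plots the values themselves.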

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] taking rows from data.frames in list to form new data.frame?

2011-04-20 Thread jctoll
Hi,

I am having a problem figuring out how to extract a subset of rows.  I
have a list with 68 similar data.frames.  Each data.frame is 500 rows
by 5 columns.  I want to take one row from each data.frame based upon
the data in a particular column (i.e. it matches a symbol).  For
example:

 str(database)
List of 68
 $ X2011.01.11:'data.frame':    500 obs. of  5 variables:
  ..$ Symbol    : chr [1:500] "MMM" "ACE" "AES" "AFL" ...
  ..$ Price     : num [1:500] 87.7 60.7 13.1 55.7 15.6 ...
  ..$ Shares.Out: num [1:500] 7.15e+08 3.39e+08 7.88e+08 4.71e+08 1.10e+08 ...
  ..$ Float     : num [1:500] 7.13e+08 3.38e+08 6.61e+08 4.60e+08 1.09e+08 ...
  ..$ Market.Cap: num [1:500] 6.27e+10 2.06e+10 1.04e+10 2.62e+10 1.72e+09 ...
 $ X2011.01.12:'data.frame':    500 obs. of  5 variables:
  ..$ Symbol    : chr [1:500] "MMM" "ACE" "AES" "AFL" ...
  ..$ Price     : num [1:500] 88.7 60.9 12.9 57.1 15.2 ...
  ..$ Shares.Out: num [1:500] 7.15e+08 3.39e+08 7.88e+08 4.71e+08 1.10e+08 ...
  ..$ Float     : num [1:500] 7.13e+08 3.38e+08 6.61e+08 4.60e+08 1.09e+08 ...
  ..$ Market.Cap: num [1:500] 6.34e+10 2.06e+10 1.02e+10 2.69e+10 1.68e+09 ...
 . . .

 lapply(database, function(x) which(x == "IBM"))
$X2011.01.11
[1] 234

$X2011.01.12
[1] 234
 . . .

 lapply(database, function(x) x[which(x == "IBM"), ])
$X2011.01.11
    Symbol  Price Shares.Out    Float Market.Cap
234    IBM 147.28   1.24e+09 1.24e+09 1.8297e+11

$X2011.01.12
    Symbol Price Shares.Out    Float Market.Cap
234    IBM 149.1   1.24e+09 1.24e+09 1.8524e+11
 . . .

What I would like to do is create a new data.frame with 68 rows by 5
columns of data, perhaps using the old data.frame names as the new
row.names. I can get to the subset of data that I want, I just can't
get it from list form into one new data.frame.  Any suggestions?
Thank you.

James



Re: [R] taking rows from data.frames in list to form new data.frame?

2011-04-20 Thread jctoll
Dennis,

Thanks, the first example works perfectly.

 do.call(rbind, lapply(database, function(df) subset(df, Symbol == 'IBM')))

I haven't tried the second, but will look into plyr.

Thanks, again,

James
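
A small self-contained sketch of the do.call(rbind, lapply(...)) idiom
from this thread.  The list below is a hypothetical miniature of
`database` (three dates, three symbols, two columns); the helper
make_day(), the GE price and the third date's IBM price are made up
for illustration.

```r
# Hypothetical miniature of `database`: one data.frame per date.
make_day <- function(ibm_price) {
  data.frame(Symbol = c("MMM", "IBM", "GE"),
             Price  = c(87.7, ibm_price, 18.5),  # 18.5 is a made-up value
             stringsAsFactors = FALSE)
}
database <- list(X2011.01.11 = make_day(147.28),
                 X2011.01.12 = make_day(149.10),
                 X2011.01.13 = make_day(148.82))

# lapply() pulls the matching row out of every data.frame;
# do.call(rbind, ...) then stacks the one-row pieces into one data.frame.
ibm <- do.call(rbind,
               lapply(database, function(df) subset(df, Symbol == "IBM")))
ibm
```

Because each piece has exactly one row and the list is named, rbind()
uses the list names as the row names of the result, which is the
layout asked for in the original question.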



On Wed, Apr 20, 2011 at 6:36 PM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Perhaps you're looking for subset()? I'm not sure I understand the
 problem completely, but is

 do.call(rbind, lapply(database, function(df) subset(df, Symbol == 'IBM')))

 or

 library(plyr)
 ldply(lapply(database, function(df) subset(df, Symbol == 'IBM')), rbind)

 in the vicinity of what you're looking for? [Obviously untested for
 the usual reasons...]

 HTH,
 Dennis



Re: [R] Need to abstract changing name of column within loop

2011-03-17 Thread jctoll
On Thu, Mar 17, 2011 at 10:32 AM, Joshua Ulrich josh.m.ulr...@gmail.com wrote:
 On Wed, Mar 16, 2011 at 6:58 PM, jctoll jct...@gmail.com wrote:
 Hi,

 I'm struggling to figure out the way to change the name of a column
 from within a loop.  The problem is I can't refer to the object by its
 actual variable name, since that will change each time through the
 loop.  My xts object is A.

 head(A)
           A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1
 2007-01-03  34.99  35.48 34.05   34.30  2574600      34.30  1186780
 2007-01-04  34.30  34.60 33.46   34.41  2073700      34.41  1190586
 2007-01-05  34.30  34.40 34.00   34.09  2676600      34.09  1179514
 2007-01-08  33.98  34.08 33.68   33.97  1557200      33.97  1175362
 2007-01-09  34.08  34.32 33.63   34.01  1386200      34.01  1176746
 2007-01-10  34.04  34.04 33.37   33.70  2157400      33.70  1166020

 Its column names are:
 colnames(A)
 [1] "A.Open"       "A.High"       "A.Low"        "A.Close"
     "A.Volume"     "A.Adjusted"   "A.Adjusted.1"

 I want to change the 7th column name:
 colnames(A)[7]
 [1] "A.Adjusted.1"

 I need to do that through a reference to i:
 i
 [1] "A"

 This works:
 colnames(get(i))[7]
 [1] "A.Adjusted.1"

 And this is what I want to change the column name to:
 paste(i, ".MarketCap", sep = "")
 [1] "A.MarketCap"

 But how do I make the assignment?  This clearly doesn't work:

 colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "")
 Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") :
   could not find function "get<-"

 Nor does this (it creates a new object A.Adjusted.1 with a value of
 "A.MarketCap"):

 assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = ""))

 How can I change the name of that column within my big loop?  Any
 ideas?  Thanks!

 I usually make a copy of the object, change it, then overwrite the original:

 tmp <- get(i)
 colnames(tmp)[7] <- "foo"
 assign(i, tmp)

 Hope that helps.

 Best regards,

 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com


Thank you, that does help.  It sounds like your way is similar to the
way David suggested.  At least now I can just resign myself to doing
it that way rather than torturing my mind trying to figure out a
shorter way.

Thanks,

James
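
A runnable sketch of the copy, modify, assign pattern described in this
thread, applied inside a loop.  Plain matrices named A and B stand in
for the xts objects (an assumption, to keep the example free of package
dependencies), and the symbol "B" and the zero-filled values are made
up.

```r
# Build hypothetical stand-ins for the per-symbol objects.
fields <- c("Open", "High", "Low", "Close", "Volume", "Adjusted", "Adjusted.1")
for (sym in c("A", "B")) {
  m <- matrix(0, nrow = 2, ncol = length(fields),
              dimnames = list(NULL, paste(sym, fields, sep = ".")))
  assign(sym, m)
}

# Rename the 7th column of each object, referring to it only by name.
for (i in c("A", "B")) {
  tmp <- get(i)                                         # copy the object
  colnames(tmp)[7] <- paste(i, ".MarketCap", sep = "")  # change the copy
  assign(i, tmp)                                        # overwrite the original
}

colnames(A)[7]
colnames(B)[7]
```

If the objects can be kept in a named list instead of as separate
variables, the same renaming can be done with lapply() and no
get()/assign() at all.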



[R] Need to abstract changing name of column within loop

2011-03-16 Thread jctoll
Hi,

I'm struggling to figure out the way to change the name of a column
from within a loop.  The problem is I can't refer to the object by its
actual variable name, since that will change each time through the
loop.  My xts object is A.

 head(A)
           A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1
2007-01-03  34.99  35.48 34.05   34.30  2574600      34.30      1186780
2007-01-04  34.30  34.60 33.46   34.41  2073700      34.41      1190586
2007-01-05  34.30  34.40 34.00   34.09  2676600      34.09      1179514
2007-01-08  33.98  34.08 33.68   33.97  1557200      33.97      1175362
2007-01-09  34.08  34.32 33.63   34.01  1386200      34.01      1176746
2007-01-10  34.04  34.04 33.37   33.70  2157400      33.70      1166020

Its column names are:
 colnames(A)
[1] "A.Open"       "A.High"       "A.Low"        "A.Close"
    "A.Volume"     "A.Adjusted"   "A.Adjusted.1"

I want to change the 7th column name:
 colnames(A)[7]
[1] "A.Adjusted.1"

I need to do that through a reference to i:
 i
[1] "A"

This works:
 colnames(get(i))[7]
[1] "A.Adjusted.1"

And this is what I want to change the column name to:
 paste(i, ".MarketCap", sep = "")
[1] "A.MarketCap"

But how do I make the assignment?  This clearly doesn't work:

 colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "")
Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") :
  could not find function "get<-"

Nor does this (it creates a new object A.Adjusted.1 with a value of
"A.MarketCap"):

assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = ""))

How can I change the name of that column within my big loop?  Any
ideas?  Thanks!

Best regards,


James



Re: [R] Need to abstract changing name of column within loop

2011-03-16 Thread jctoll
On Wed, Mar 16, 2011 at 8:20 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Mar 16, 2011, at 7:58 PM, jctoll wrote:

 Hi,

 I'm struggling to figure out the way to change the name of a column
 from within a loop.  The problem is I can't refer to the object by its
 actual variable name, since that will change each time through the
 loop.  My xts object is A.

 head(A)

          A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1
 2007-01-03  34.99  35.48 34.05   34.30  2574600      34.30  1186780
 2007-01-04  34.30  34.60 33.46   34.41  2073700      34.41  1190586
 2007-01-05  34.30  34.40 34.00   34.09  2676600      34.09  1179514
 2007-01-08  33.98  34.08 33.68   33.97  1557200      33.97  1175362
 2007-01-09  34.08  34.32 33.63   34.01  1386200      34.01  1176746
 2007-01-10  34.04  34.04 33.37   33.70  2157400      33.70  1166020

 Its column names are:

 colnames(A)

 [1] "A.Open"       "A.High"       "A.Low"        "A.Close"
     "A.Volume"     "A.Adjusted"   "A.Adjusted.1"

 I want to change the 7th column name:

 colnames(A)[7]

 [1] "A.Adjusted.1"

 I need to do that through a reference to i:

 i

 [1] "A"


 It's not pretty and there may be a more direct way:
 a
      [,1] [,2]
 [1,] 1e+20 1000
 [2,] 1e+02 1000
 b <- "a"
 assign("tmp", eval(parse(text=b)))
 tmp
      [,1] [,2]
 [1,] 1e+20 1000
 [2,] 1e+02 1000
 colnames(tmp) <- c("one","two")
 assign( b, tmp)
 a
       one  two
 [1,] 1e+20 1000
 [2,] 1e+02 1000


 Trying to short circuit the process without an intermediate temporary
 structure fails:
 colnames(eval(parse(text=b)) ) <- c("two", "three")
 Error in parse(`*tmp*`) : EOF whilst reading MBCS char at line 1
 --
 David


Thank you.  This seems to work, but as you observed it's not exactly elegant.

Thanks,

James
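
A short sketch of why the direct form fails, as far as I understand R's
replacement semantics: an assignment such as colnames(get(i))[7] <- v
is expanded into a chain of replacement calls whose last step would be
a call to a function named `get<-`, and no such function exists.  The
object A below is a made-up 2 x 2 matrix used only to trigger the
error safely.

```r
A <- matrix(0, 2, 2, dimnames = list(NULL, c("a", "b")))
i <- "A"

# Attempt the nested replacement and capture the error message instead
# of stopping the script.
msg <- tryCatch({
  colnames(get(i))[2] <- "z"
  "no error"
}, error = function(e) conditionMessage(e))

failed <- !identical(msg, "no error")
failed        # the assignment did not go through
colnames(A)   # A is left untouched
```

The temporary-copy approach avoids this because every assignment
target there is an ordinary variable name.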







[R] Finding pairs with least magnitude difference from mean

2011-02-25 Thread jctoll
Hi,

I have what I think is some kind of linear programming question.
Basically, what I want to figure out is if I have a vector of numbers,

 x <- rnorm(10)

 x
 [1] -0.44305959 -0.26707077  0.07121266  0.44123714 -1.10323616
-0.19712807  0.20679494 -0.98629992  0.97191659 -0.77561593

 mean(x)
[1] -0.2081249

Using each number only once, I want to find the set of five pairs
where the magnitude of the differences between the mean(x) and each
pair's sum is least.

 y <- outer(x, x, "+") - (2 * mean(x))
 y
             [,1]        [,2]        [,3]        [,4]       [,5]
 [1,] -0.46986936 -0.29388054  0.04440289  0.41442737 -1.1300459
 [2,] -0.29388054 -0.11789173  0.22039171  0.59041619 -0.9540571
 [3,]  0.04440289  0.22039171  0.55867514  0.92869962 -0.6157737
 [4,]  0.41442737  0.59041619  0.92869962  1.29872410 -0.2457492
 [5,] -1.13004593 -0.95405711 -0.61577368 -0.24574920 -1.7902225
 [6,] -0.22393784 -0.04794902  0.29033441  0.66035889 -0.8841144
 [7,]  0.17998518  0.35597399  0.69425742  1.06428191 -0.4801914
 [8,] -1.01310969 -0.83712087 -0.49883744 -0.12881296 -1.6732863
 [9,]  0.94510682  1.12109563  1.45937907  1.82940355  0.2849302
[10,] -0.80242569 -0.62643688 -0.28815345  0.08187104 -1.4626023
             [,6]       [,7]       [,8]      [,9]       [,10]
 [1,] -0.22393784  0.1799852 -1.0131097 0.9451068 -0.80242569
 [2,] -0.04794902  0.3559740 -0.8371209 1.1210956 -0.62643688
 [3,]  0.29033441  0.6942574 -0.4988374 1.4593791 -0.28815345
 [4,]  0.66035889  1.0642819 -0.1288130 1.8294035  0.08187104
 [5,] -0.88411441 -0.4801914 -1.6732863 0.2849302 -1.46260226
 [6,]  0.02199368  0.4259167 -0.7671782 1.1910383 -0.55649417
 [7,]  0.42591670  0.8298397 -0.3632552 1.5949614 -0.15257116
 [8,] -0.76717817 -0.3632552 -1.5563500 0.4018665 -1.34566603
 [9,]  1.19103834  1.5949614  0.4018665 2.3600830  0.61255048
[10,] -0.55649417 -0.1525712 -1.3456660 0.6125505 -1.13498203

With this matrix, if I put together a combination of pairs which uses
each number only once, the sum of the corresponding numbers is 0.

For example, compare the SD between this set of 5 pairs
 y[10,1] + y[9,2] + y[8,3] + y[7,4] + y[6,5]
[1] 0
 sum(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
[1] 5.551115e-17   # basically 0, I assume this is round-off error
 mean(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
[1] 1.111307e-17   # basically 0, I assume this is round-off error
 sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
[1] 1.007960

versus this hand-selected, possibly lowest SD combination of pairs
 sum(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
[1] -1.665335e-16   # basically 0, I assume this is round-off error
 mean(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
[1] -3.330669e-17   # basically 0, I assume this is round-off error
 sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
[1] 0.2367030

I believe that if I could test all the various five pair combinations,
the combination with the lowest SD of values from the table would give
me my answer.  I believe I have 3 questions regarding my problem.

1) How can I find all the 5 pair combinations of my 10 numbers so that
I can perform a brute force test of each set of combinations?  I
believe there are 45 different pairs (i.e. choose(10,2)). I found
combinations from the {Combinations} package but I can't figure out
how to get it to provide pairs.

2) Will my brute force strategy of testing the SD of each of these 5
pair combinations actually give me the answer I'm searching for?

3) Is there a better way of doing this?  Probably something to do with
real linear programming, rather than this method I've concocted.

Thanks for any help you can provide regarding my question.

Best regards,

James
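
A brute-force sketch for question 1, under the assumption that "all 5
pair combinations" means every way of splitting the 10 numbers into 5
disjoint pairs.  There are 45 possible pairs, but 9 * 7 * 5 * 3 * 1 =
945 such splits.  The helper all_pairings() and the seed are
hypothetical, introduced only for this illustration.

```r
# Recursively list every way to split `idx` into unordered pairs.
# Each result is a matrix with one pair of indices per row.
all_pairings <- function(idx) {
  if (length(idx) == 0) return(list(NULL))
  first <- idx[1]
  out <- list()
  for (partner in idx[-1]) {
    rest <- setdiff(idx, c(first, partner))
    for (p in all_pairings(rest)) {
      out[[length(out) + 1]] <- rbind(c(first, partner), p)
    }
  }
  out
}

set.seed(1)   # arbitrary seed so the run is repeatable
x <- rnorm(10)

pairings <- all_pairings(seq_along(x))
length(pairings)

# SD of the five pair sums for each pairing.  Subtracting 2 * mean(x)
# from every sum, as in the post, would not change the SD.
spread <- sapply(pairings, function(p) sd(x[p[, 1]] + x[p[, 2]]))
best <- pairings[[which.min(spread)]]
best
```

On question 2: the five pair sums of any split always average
2 * mean(x), so the split with the lowest SD is the one with the
smallest sum of squared deviations from that value.  Whether that is
the right notion of "least" (rather than, say, the smallest sum of
absolute deviations) depends on what the pairs are for.  The
enumeration grows very quickly with the length of x, so it is only
practical for short vectors.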



Re: [R] Applying multiple functions to one object

2011-02-04 Thread jctoll
On Wed, Feb 2, 2011 at 7:59 AM, Karl Ove Hufthammer k...@huftis.org wrote:
 Dear list members,

 I recall seeing a convenience function for applying multiple functions to
 one object (i.e., almost the opposite of ‘mapply’) somewhere.
 Example: If the function was named ‘fun’, the output of

  fun(3.14, mode, typeof, class)

 would be identical to the output of

  c(mode(3.14), typeof(3.14), class(3.14))

 Is my memory failing me, or does such a function already exist in a
 package? Of course, it’s not difficult to define a summary function and
 apply this to the object, but writing, for example,

 fun(x, mean, median, sd, mad)

 to quickly show the relevant information is much more *convenient*.


 It would be even nicer with a function that could also handle vectors and
 lists of values, and output the result as data frames or matrices. Example:

 x = c("foo", "bar", "foobar")
 fun(x, nchar, function(st) substr(st, 1 ,2) )

 y = list(3, 3L, 3.14, factor(3))
 fun(y, mode, typeof, class)

 --
 Karl Ove Hufthammer



Karl,

Perhaps you're thinking of the Reduce function? There's an example
from the help page that you might be able to adapt to your purpose.

## Iterative function application:
Funcall <- function(f, ...) f(...)
## Compute log(exp(acos(cos(0))))
Reduce(Funcall, list(log, exp, acos, cos), 0, right = TRUE)

HTH,

James
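
Note that the Reduce() example chains the functions, feeding each
result into the next, whereas the question asks for every function to
be applied to the same object.  A minimal sketch of the latter is
below.  The name fun() comes from the question, but this definition is
hypothetical and not taken from any package.

```r
# Hypothetical helper: call each function in `...` on the same object
# and label the results with the function names as they were typed.
fun <- function(x, ...) {
  fs <- list(...)
  labels <- vapply(match.call(expand.dots = FALSE)$...,
                   function(e) deparse(e)[1], character(1))
  names(fs) <- unname(labels)
  sapply(fs, function(f) f(x))
}

fun(3.14, mode, typeof, class)

vals <- c(2, 4, 4, 10)
fun(vals, mean, median)
```

This sketch only covers the simple case where each function returns a
single value for the whole object; handling lists element by element,
or returning data frames, would need more work.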
