Re: [R] Subtracting Data Frame With a Different Number of Rows

2020-04-21 Thread William Michels via R-help
Hi Phillip,

You have two choices here:

1. Manually enter the missing rows into your individual.df using
   rbind(), then cbind() the overall.df and individual.df data frames
   together (assuming the rows line up properly), or
2. Use merge() to perform an SQL-like "left join", then copy values
   from the "overall" columns to fill in the missing values in the
   "indiv" columns (imputation).

Below is code, starting from two .tsv files, showing the second (merge)
method. Note: I've only included the first 4 rows of output after the
merge command (there are 24 rows total):

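> # Read the two tab-separated files
> overall <- read.delim("overall.R", sep="\t")
> indiv <- read.delim("individual.R", sep="\t")
> # Left join: keep every row of 'overall', matching on RunnerCode and Outs
> merge(overall, indiv, all.x=TRUE, by.x=c("RunnerCode", "Outs"),
+       by.y=c("RunnerCode", "Outs"))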

   RunnerCode Outs X.x MeanRuns.x X.y MeanRuns.y
1  BasesEmpty    0   1  0.5137615   1  0.4262295
2  BasesEmpty    1   9  0.3963801   8  0.5238095
3  BasesEmpty    2  17  0.4191011  15  0.3469388
4 BasesLoaded    0   8  3.2173913  NA         NA
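
The imputation and subtraction steps that follow the merge could look
roughly like the sketch below. It is only an illustration: the two small
data frames are hand-typed 4-row and 3-row excerpts of the posted data
(not read from the .tsv files), and the suffixes ".overall" and ".indiv"
and the column name "Diff" are my own choices, not anything from the
original post.

```r
# Hypothetical excerpts of the two data frames (values copied from the post)
overall <- data.frame(
  RunnerCode = c("Bases Empty", "Bases Empty", "Bases Empty", "Bases Loaded"),
  Outs       = c(0L, 1L, 2L, 0L),
  MeanRuns   = c(0.5137615, 0.3963801, 0.4191011, 3.2173913),
  stringsAsFactors = FALSE
)
indiv <- data.frame(
  RunnerCode = c("Bases Empty", "Bases Empty", "Bases Empty"),
  Outs       = c(0L, 1L, 2L),
  MeanRuns   = c(0.4262295, 0.5238095, 0.3469388),
  stringsAsFactors = FALSE
)

# Left join: keep every row of 'overall'; situations the player never
# faced get NA in the individual column
both <- merge(overall, indiv, by = c("RunnerCode", "Outs"),
              all.x = TRUE, suffixes = c(".overall", ".indiv"))

# Impute: where the player has no value, copy the team-wide mean
missing <- is.na(both$MeanRuns.indiv)
both$MeanRuns.indiv[missing] <- both$MeanRuns.overall[missing]

# Subtract the second (individual) from the first (overall)
both$Diff <- both$MeanRuns.overall - both$MeanRuns.indiv
both
```

Note that an imputed row always has a difference of exactly 0, so you may
want to keep the logical vector `missing` around to tell imputed rows
apart from real ones.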

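For completeness, the first (rbind/cbind) option might be sketched as
follows. Again this uses hand-typed excerpts of the posted data rather
than the full 24 rows, and the column names "IndivMeanRuns" and "Diff"
are assumptions made for the example. The sort and the stopifnot() check
are there because cbind() matches rows purely by position.

```r
# Hypothetical excerpts of the two data frames (values copied from the post)
overall <- data.frame(
  RunnerCode = c("Bases Empty", "Bases Empty", "Bases Empty", "Bases Loaded"),
  Outs       = c(0L, 1L, 2L, 0L),
  MeanRuns   = c(0.5137615, 0.3963801, 0.4191011, 3.2173913),
  stringsAsFactors = FALSE
)
indiv <- data.frame(
  RunnerCode = c("Bases Empty", "Bases Empty", "Bases Empty"),
  Outs       = c(0L, 1L, 2L),
  MeanRuns   = c(0.4262295, 0.5238095, 0.3469388),
  stringsAsFactors = FALSE
)

# Append the missing situation by hand, using the overall mean as its value
indiv <- rbind(indiv,
               data.frame(RunnerCode = "Bases Loaded", Outs = 0L,
                          MeanRuns = 3.2173913, stringsAsFactors = FALSE))

# Sort both data frames the same way so that the rows line up
overall <- overall[order(overall$Outs, overall$RunnerCode), ]
indiv   <- indiv[order(indiv$Outs, indiv$RunnerCode), ]
stopifnot(identical(overall$RunnerCode, indiv$RunnerCode),
          identical(overall$Outs, indiv$Outs))

# Bind the individual means alongside the overall ones and subtract
combined <- cbind(overall, IndivMeanRuns = indiv$MeanRuns)
combined$Diff <- combined$MeanRuns - combined$IndivMeanRuns
combined
```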

HTH, Bill.

W. Michels, Ph.D.


On Tue, Apr 21, 2020 at 1:47 PM Phillip Heinrich wrote:
>
> I would like to subtract the second from the first as a way to evaluate the
> player's ability to produce runs. As part of this analysis I would like to
> input the mean number of runs from the overall data frame into the two
> missing cells for the individual player: Bases Loaded no outs and 1st and 3rd
> one out.
>
> Can anyone give me some advice?


[R] Subtracting Data Frame With a Different Number of Rows

2020-04-21 Thread Phillip Heinrich
I have two small data frames of baseball data. The first one is the mean
number of runs that will score in each half inning for the 2018 Arizona
Diamondbacks. The second data frame is the same information but for only
one player. As you will see, the individual player never came up to bat
during the season in either of these situations:

   with the bases loaded and no outs
   runners on first and third with one out

Overall

    RunnerCode         Outs  MeanRuns
1   Bases Empty           0  0.5137615
2   Runner:1st            0  0.8967391
3   Runner:2nd            0  1.3018868
4   Runners:1st & 2nd     0  1.6551724
5   Runner:3rd            0  1.9545455
6   Runners:1st & 3rd     0  2.0571429
7   Runners:2nd & 3rd     0  2.1578947
8   Bases Loaded          0  3.2173913
9   Bases Empty           1  0.3963801
10  Runner:1st            1  0.6952596
11  Runner:2nd            1  0.9580838
12  Runners:1st & 2nd     1  1.4397163
13  Runner:3rd            1  1.5352113
14  Runners:1st & 3rd     1  1.5882353
15  Runners:2nd & 3rd     1  1.9215686
16  Bases Loaded          1  1.9193548
17  Bases Empty           2  0.4191011
18  Runner:1st            2  0.5531915
19  Runner:2nd            2  0.8777293
20  Runners:1st & 2nd     2  0.9553073
21  Runner:3rd            2  1.2783505
22  Runners:1st & 3rd     2  1.5851064
23  Runners:2nd & 3rd     2  1.2794118
24  Bases Loaded          2  1.388235

Individual Player

    RunnerCode         Outs  MeanRuns
1   Bases Empty           0  0.4262295
2   Runner:1st            0  1.320
3   Runner:2nd            0  1.2857143
4   Runners:1st & 2nd     0  0.5714286
5   Runner:3rd            0  2.000
6   Runners:1st & 3rd     0  3.500
7   Runners:2nd & 3rd     0  1.000
8   Bases Empty           1  0.5238095
9   Runner:1st            1  0.6578947
10  Runner:2nd            1  0.375
11  Runners:1st & 2nd     1  1.4285714
12  Runner:3rd            1  1.4285714
13  Runners:2nd & 3rd     1  0.667
14  Bases Loaded          1  3.000
15  Bases Empty           2  0.3469388
16  Runner:1st            2  0.1363636
17  Runner:2nd            2  0.7142857
18  Runners:1st & 2nd     2  1.667
19  Runner:3rd            2  1.250
20  Runners:1st & 3rd     2  2.1428571
21  Runners:2nd & 3rd     2  1.500
22  Bases Loaded          2  2.200

RunnerCode is a factor
Outs are integers
MeanRuns is numerical data

I would like to subtract the second from the first as a way to evaluate the
player's ability to produce runs. As part of this analysis I would like to
input the mean number of runs from the overall data frame into the two
missing cells for the individual player: Bases Loaded no outs and 1st and 3rd
one out.

Can anyone give me some advice?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.