Re: [R] Olympics: 200m Men Final

Daróczi Gergely Fri, 10 Aug 2012 07:59:16 -0700

On Fri, Aug 10, 2012 at 10:23 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:


> Hello,
>
> The main critique, I think, is that we assume a certain type of model
> where the times can decrease until zero, and that they can do so linearly.
> I believe that records can always be beaten, but 40-50 years ago times were
> measured in tenths of a second; now we see a gain in the hundredths as
> extraordinary. So the assumption doesn't seem to be completely reasonable.
> As for your assumption that little variation in the responses results in
> little variation in the predictions, I would add that this is true, but
> only for a given model. The predictions can and do vary from model to model
> (obviously). See the logistic model in the same Gesmann work or Michael's
> ARIMA in a response to my post. Three different predicted values with
> variations from model to model in the tenths of a second. The values are,
> resp., 19.61 (Gesmann) and 19.67 and 19.56 (Weylandt).
> Maybe the linear model performs well because, like you say, the sprinters
> post times very close to each other and a straight line is not far from
> what a more complex model would do. I'm not betting on the marathon times.
>
> Rui Barradas
>
> On 10-08-2012 05:31, Mark Leeds wrote:
>
>> Hi Rui: I hate to sound like a pessimist/cynic, and I should also state
>> that I didn't look at any of the analysis by you or the other person. But
>> my question (for anyone who wants to chime in) is: given that all these
>> Olympic 100-200 meter runners post times that are generally within 0.1-0.3
>> seconds of each other or even less, doesn't it stand to reason that a
>> model, given the historical times, is going to predict well? I don't know
>> what the statistical term is for this, but intuitively, if there's
>> extremely little variation in the responses, then there's going to be
>> extremely little variation in the predictions, and the result is that you
>> won't ever be too far off as long as your predictors (weight, past
>> performances, height, whatever) are not too strange.
>>
>> Anyone can feel free to chime in and tell me I'm wrong, but if you're
>> going to do that, I'd appreciate statistical reasoning, even though I
>> don't have any. Thanks.
>>
>>
>> mark
>>
>> On Thu, Aug 9, 2012 at 4:23 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>>
>>> Hello,
>>>
>>> Have you seen the log-linear prediction of the 100m winning time in R,
>>> mailed to the list yesterday by David Smith under the subject
>>> "Revolutions Blog: July roundup"?
>>>
>>> "A log-linear regression in R predicted the gold-winning Olympic 100m
>>> sprint time to be 9.68 seconds (it was actually 9.63 seconds):
>>> http://bit.ly/QfChUh"
>>>
>>> The original by Markus Gesmann can be found at
>>> http://lamages.blogspot.pt/2012/07/london-olympics-and-prediction-for-100m.html
>>>
>>> I did the same, just changing the address to the 200m historical data,
>>> and the predicted time was 19.27. Usain Bolt has just run 19.32. If you
>>> want to check it, the address and the 'which' argument are:
>>>
>>> url <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=ATH&enum=120"
>>>
>>> Plus a change in the plotting functions' y-axis arguments so that times
>>> around double the 100m values can be plotted and seen.
>>>
>>> #
>>> # Original by Markus Gesmann:
>>> # http://lamages.blogspot.pt/2012/07/london-olympics-and-prediction-for-100m.html
>>> #
>>> library(XML)
>>> library(drc)
>>> url <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=ATH&enum=120"
>>> # Read the third HTML table on the page and keep the gold medal rows
>>> data <- readHTMLTable(readLines(url), which=3, header=TRUE)
>>> golddata <- subset(data, Medal %in% "GOLD")
>>> # Columns come back as factors: convert via character to numeric
>>> golddata$Year <- as.numeric(as.character(golddata$Year))
>>> golddata$Result <- as.numeric(as.character(golddata$Result))
>>> tail(golddata,10)
>>> # Four-parameter logistic and log-linear fits on results from 1900 onwards
>>> logistic <- drm(Result~Year, data=subset(golddata, Year>=1900), fct = L.4())
>>> log.linear <- lm(log(Result)~Year, data=subset(golddata, Year>=1900))
>>> years <- seq(1896,2012, 4)
>>> # Back-transform the log-scale predictions to seconds
>>> predictions <- exp(predict(log.linear, newdata=data.frame(Year=years)))
>>> plot(logistic,  xlim=c(1896,2012),
>>>       ylim=range(golddata$Result) + c(-0.5, 0.5),
>>>       xlab="Year", main="Olympic 200 metre",
>>>       ylab="Winning time for the 200m men's final (s)")
>>> points(golddata$Year, golddata$Result)
>>> lines(years, predictions, col="red")
>>> points(2012, predictions[length(years)], pch=19, col="red")
>>> text(2012 - 0.5, predictions[length(years)] - 0.5,
>>>      round(predictions[length(years)],2))
>>>
>>> Rui Barradas
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>

Dear All,

I would also like to join this fun discussion. Inspired by Markus
Gesmann's blog, we too published some "predictions" about forthcoming
Olympic results in the last few days. They demonstrate some features of our
R packages (`rapport` and `pander`, available on GitHub:
https://github.com/Rapporter), which are integrated in our forthcoming web
application built on R, and may attract the attention of some laymen
interested in statistics :)

We are committed to the view that these kinds of extrapolated estimates
are just for fun, and we also think that any sports fan can give more valid
and even more reliable estimates of the results without any statistical
knowledge. Still, we suppose that it'd be a great opportunity to show
something about statistics to non-specialists: something interesting among
similar news about the Olympics in the yellow press.

So much for our motives; back to the thread.

Our last estimate was for the men's 200m final, just as above. The generated
image can be found on our FB page (http://www.facebook.com/rapporter.net),
if you are interested.

The data was fetched from http://www.databaseolympics.com/ and two models
were built on the previous results (assuming that the winners' performance,
and hence the forthcoming results, also fit the historical data),
similarly to the above code and the cited blog post.

Relevant source code (to be uploaded to GH soon as a commented `brew` file
to be used with `pander` for those who would be interested in e.g. the
function generating the plot):

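## non-linear model: powers of Year up to 4
## (note: `*` in the formula also adds all interactions between the terms)
golddata$year2 <- golddata$Year^2
golddata$year3 <- golddata$Year^3
golddata$year4 <- golddata$Year^4
nonLin         <- lm(Result ~ Year * year2 * year3 * year4, data = golddata)

## log-linear model: the trend is fitted on the log scale
logLin         <- lm(log(Result) ~ Year, data = golddata)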
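
As a side note on the non-linear formula, here is a minimal sketch of what
`*` expands to, next to one common alternative, `poly()`. The data below is
simulated purely for illustration (it is NOT the real Olympic results), and
the alternative is only a suggestion, not the model used for our plot.

```r
# Toy data: simulated winning times, NOT the real Olympic results.
set.seed(1)
golddata <- data.frame(Year = seq(1900, 2008, by = 4))
golddata$Result <- 22 * exp(-0.001 * (golddata$Year - 1900)) +
  rnorm(nrow(golddata), sd = 0.15)

# `*` in a formula means "main effects plus all interactions", so four
# variables expand to 2^4 - 1 = 15 terms rather than 4.
full_terms <- attr(terms(Result ~ Year * year2 * year3 * year4),
                   "term.labels")
length(full_terms)   # 15

# A degree-4 polynomial in Year with an orthogonal basis, which avoids
# the strong collinearity between raw powers of Year.
poly4 <- lm(Result ~ poly(Year, 4), data = golddata)
length(coef(poly4))  # 5: intercept + 4 polynomial terms
```

With raw powers of a variable in the 1900-2012 range, `lm()` may drop some
of the 15 terms as aliased; `poly()` sidesteps that by construction.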


The rounded estimate of both models was 19.3 seconds, which is pretty close
:)
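
For those who want to reproduce this kind of estimate, a minimal sketch of
getting a 2012 prediction in seconds from a log-linear fit follows. It
assumes a `golddata` data frame with `Year` and `Result` columns; the values
here are simulated for illustration only and are NOT the real results, so
the printed numbers will not match the 19.3 above.

```r
# Toy data: simulated winning times, NOT the real Olympic results.
set.seed(1)
golddata <- data.frame(Year = seq(1900, 2008, by = 4))
golddata$Result <- 22 * exp(-0.001 * (golddata$Year - 1900)) +
  rnorm(nrow(golddata), sd = 0.15)

logLin <- lm(log(Result) ~ Year, data = golddata)

# Predict on the log scale with a 95% prediction interval, then
# back-transform with exp() to get seconds.
pred_log <- predict(logLin, newdata = data.frame(Year = 2012),
                    interval = "prediction", level = 0.95)
pred_sec <- exp(pred_log)
round(pred_sec, 2)   # columns: fit, lwr, upr
```

Note that `exp()` of a log-scale prediction estimates the median rather than
the mean of the response; with residuals this small the difference is tiny.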

Of course this has nothing to do with causal models, nor would I encourage
anyone to bet on these estimates, but it is really interesting to see
(visualized with R) the times getting lower and lower, and it seems that
people tend to like this kind of information.
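
On the question raised earlier in the thread (whether such a model says much
when the times vary so little), one rough check is to compare one-step-ahead
forecasts with a naive "same as last Games" guess. This is only a sketch on
simulated data (NOT the real results); `min_train` and the helper `rmse` are
names made up for this example.

```r
# Toy data: simulated winning times, NOT the real Olympic results.
set.seed(1)
golddata <- data.frame(Year = seq(1900, 2008, by = 4))
golddata$Result <- 22 * exp(-0.001 * (golddata$Year - 1900)) +
  rnorm(nrow(golddata), sd = 0.15)

min_train <- 10                          # Games used before the first forecast
idx <- (min_train + 1):nrow(golddata)

# One-step-ahead forecasts: fit on Games 1..(i-1), predict Games i.
model_pred <- sapply(idx, function(i) {
  fit <- lm(log(Result) ~ Year, data = golddata[seq_len(i - 1), ])
  exp(predict(fit, newdata = golddata[i, , drop = FALSE]))
})
naive_pred <- golddata$Result[idx - 1]   # previous winning time

# Root mean squared forecast error, in seconds
rmse <- function(pred) sqrt(mean((golddata$Result[idx] - pred)^2))
c(log_linear = rmse(model_pred), naive = rmse(naive_pred))
```

If the two errors are of similar size, the trend model adds little beyond
the small spread of the times themselves.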

Just to argue even more in favor of the latter, check out a
similar infographic for the 100m:
http://www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html/

It presents much the same as the original blog post, so I really think
that these kinds of models and plots are worth dealing with: not for
academics, of course, but for advertising that math/statistics/R is
comprehensible.

Best,
Gergely

