[Rd] Mentor for GSOC '10: Symbolic Regression in R

2010-03-03 Thread Chidambaram Annamalai
Hi all,

I am looking to extend the regression and data analysis capabilities of R
through Symbolic Regression, which can potentially find implicit equations
relating the variables in the input data. You can find my project proposal at:
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:syrfr

I am looking for a mentor with relevant experience in Symbolic Regression, or
Genetic Programming in general, to guide me through the summer on the project
under the Google Summer of Code program (GSoC 2010).


Chillu


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Mentor for GSOC '10: Symbolic Regression in R

2010-03-04 Thread Chidambaram Annamalai
Thank you for your comments. I was indeed planning to use Rcpp to integrate
the C++ portions, where compiled code would bring performance benefits, while
sticking to R code elsewhere.

I've had bad experiences in the past with SWIG for interfacing C code with
Python, but Rcpp looks much friendlier.

And thanks for the support!

Chillu

On Thu, Mar 4, 2010 at 1:18 PM, Romain Francois wrote:

> Hello,
>
> I can't offer to mentor because I don't know anything about symbolic
> regression.
>
> However, since you have R/C++ as the skills requirements, I would strongly
> recommend that you use Rcpp as an enabling technology, so that you can be
> productive on C++/symbolic regression as opposed to managing R API quirks.
>
> Also, you can count on some implicit support through Rcpp mailing list.
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
> Good luck finding a mentor, this sounds like a cool project.
>
> Romain
>
> --
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://tr.im/OIXN : raster images and RImageJ
> |- http://tr.im/OcQe : Rcpp 0.7.7
> `- http://tr.im/O1wO : highlight 0.1-5
>
>



Re: [Rd] application to mentor syrfr package development for Google Summer of Code 2010

2010-03-07 Thread Chidambaram Annamalai
It's been a while since I proposed syrfr, and although I have been in
constant contact with many people in the R community, I wasn't able to find
a mentor for the project. I later got interested in the Automatic
Differentiation proposal (adinr) and, after consulting a few others within
the R community, I emailed John Nash (who proposed adinr in the first place)
to ask if he'd be willing to take me on for the project. I got a positive
reply only a few hours ago, and it was my mistake not to have removed the
syrfr proposal from the wiki's list of proposals looking for mentors in time.

While I appreciate your interest in the syrfr proposal, I am afraid my
allegiances have shifted towards the adinr proposal, as I became convinced
that it might interest a larger group of people and has a wider scope in
general.

I apologize for having caused this trouble.

Best Regards,
Chillu

On Mon, Mar 8, 2010 at 6:41 AM, James Salsman wrote:

> Per http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010
> -- and
> http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:syrfr
> -- I am applying to mentor the "Symbolic Regression for R" (syrfr)
> package for the Google Summer of Code 2010.
>
> I propose the following test which an applicant would have to pass in
> order to qualify for the topic:
>
> 1. Describe each of the following terms as they relate to statistical
> regression: categorical, periodic, modular, continuous, bimodal,
> log-normal, logistic, Gompertz, and nonlinear.
>
> 2. Explain which parts of http://bit.ly/tablecurve were adopted in
> SigmaPlot and which weren't.
>
> 3. Use the 'outliers' package to improve a regression fit while maintaining
> correct extrapolation confidence intervals, which should lie between those
> computed with and without outlier exclusion, in proportion to the confidence
> that the outliers were reasonably excluded. (Show your R transcript.)
>
> 4. Explain the relationship between degrees of freedom and correlated
> independent variables.
>
> Best regards,
>
> James Salsman
> jsals...@talknicer.com
> http://talknicer.com
>
>



Re: [Rd] application to mentor syrfr package development for Google Summer of Code 2010

2010-03-07 Thread Chidambaram Annamalai
> If I understand your concern, you want to lay the foundation for
> derivatives so that you can implement the search strategies described
> in Schmidt and Lipson (2010) --
> http://www.springerlink.com/content/l79v2183725413w0/ -- is that
> right?


Yes. Basically, traditional "naive" error estimators or fitness functions
fail miserably when used in SR with implicit equations because they
immediately close in on "best" fits like f(x) = x - x and other trivial
solutions. In such cases no amount of regularization or complexity
penalization will help, since x - x is fairly simple by most measures of
complexity and does have zero error. The paper outlines these problems
with "direct" error estimators; instead, the authors infer the
"triviality" of a fit by probing its estimates around nearby points and
checking whether it follows the pattern dictated by the data points -- ergo
derivatives.

As a side benefit, this method also enables us to perform regression on
closed loops and other implicit equations, since the fitness functions are
based only on derivatives. The specific form of the error is equation 1.2,
which, I believe, is what the evaluation procedure in Eureqa uses
internally.
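A minimal sketch of the derivative-based idea, for illustration only: it is not equation 1.2 from the paper, and the helper name implicit_error and the mean log(1 + |error|) score are assumptions. It uses base R's symbolic differentiator D() to get the partial derivatives of a candidate f(x, y), compares the implied slope dy/dx = -f_x / f_y with finite-difference slopes from ordered points on an arc of the unit circle (a closed-loop curve), and shows why the trivial candidate x - x is rejected: both partials vanish, so it implies no slope at all.

```r
# Hypothetical helper (not from the paper): score a candidate implicit
# equation f(x, y) = 0 against ordered data points (x[i], y[i]).
implicit_error <- function(f, x, y) {
  fx <- D(f, "x")  # symbolic partial derivative df/dx
  fy <- D(f, "y")  # symbolic partial derivative df/dy
  # Midpoints between consecutive data points
  env <- list(x = (head(x, -1) + tail(x, -1)) / 2,
              y = (head(y, -1) + tail(y, -1)) / 2)
  # Slope implied by the candidate: dy/dx = -f_x / f_y
  predicted <- -eval(fx, env) / eval(fy, env)
  # Slope estimated from the data by finite differences
  observed <- diff(y) / diff(x)
  # Degenerate candidates such as x - x give 0/0: reject them outright
  if (any(!is.finite(predicted))) return(Inf)
  mean(log1p(abs(observed - predicted)))
}

# Ordered samples from an arc of the unit circle x^2 + y^2 = 1
t <- seq(0.2, 1.4, length.out = 50)
x <- cos(t)
y <- sin(t)

err_circle  <- implicit_error(quote(x^2 + y^2 - 1), x, y)  # close to 0
err_line    <- implicit_error(quote(x + y), x, y)           # clearly positive
err_trivial <- implicit_error(quote(x - x), x, y)           # Inf
```

A plain residual check would give x - x a perfect score on these points; the slope comparison is what separates it from the true circle.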

You are correct in pointing out that there is no reason not to work in
parallel, since GAs generally have a more or less fixed form (an
evaluate-reproduce cycle) that is quite easily parallelized. I have used
OpenMP in the past, which makes it fairly trivial to parallelize
well-formed for loops.
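At the R level, the evaluate step of such a cycle can be sketched with the parallel package, which ships with R releases newer than this thread. This is a minimal sketch under stated assumptions: candidates are stored as quoted expressions in x, and plain mean squared error stands in as a placeholder fitness (it is not the derivative-based error). Fork-based mclapply() is not available on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# Toy data generated from y = x^2 + 1
x <- seq(-2, 2, length.out = 101)
y <- x^2 + 1

# Hypothetical population of candidate expressions in x
population <- list(quote(x^2 + 1), quote(x^2), quote(2 * x), quote(x - x))

# Placeholder fitness: mean squared error of one candidate on the data
mse <- function(expr) mean((eval(expr, list(x = x)) - y)^2)

# Fork-based parallelism is not supported on Windows
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Evaluate step: candidates are scored independently of each other,
# so this loop is embarrassingly parallel
fitness <- unlist(mclapply(population, mse, mc.cores = cores))

# Selection/reproduction would then favour the lowest error
best <- population[[which.min(fitness)]]
```

Only the scoring is parallelized here; selection and reproduction stay sequential, mirroring the OpenMP pattern of parallelizing one well-formed loop.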

Chillu

> It is not clear to me how well this generalized approach will
> work in practice, but there is no reason not to proceed in parallel to
> establish a framework under which you could implement the metrics
> proposed by Schmidt and Lipson in the contemplated syrfr package.
>
> I have expanded the test I proposed with two more questions -- at
> http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:syrfr
> -- specifically:
>
> 5. Critique http://sites.google.com/site/gptips4matlab/
>
> 6. Use anova to compare the goodness-of-fit of an SSfpl nls fit with a
> linear model of your choice. How can you characterize the
> degree-of-freedom-adjusted goodness of fit of nonlinear models?
>
> I believe pairwise anova.nls is the optimal comparison for nonlinear
> models, but there are several good choices for approximations,
> including the residual standard error, which I believe can be adjusted
> for degrees of freedom, as can the F statistic which TableCurve uses;
> see: http://en.wikipedia.org/wiki/F-test#Regression_problems
>
> Best regards,
> James Salsman
>

Re: [Rd] application to mentor syrfr package development for Google Summer of Code 2010

2010-03-08 Thread Chidambaram Annamalai
Oh oops. I clearly embarrassed myself. :D

I believe you are suggesting that, besides the evaluation functions proposed
in the paper, you want to test the model produced by SR using statistical
tests to check its validity? I haven't really given much thought to using
statistical tests in model evaluation. But that seems to me like a hybrid
approach, not purely evolutionary, which somewhat betrays the name SR.
However, I haven't performed any tests myself to conclude which one will
outdo the other.
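A minimal sketch of the kind of degree-of-freedom-aware comparison being discussed, using only base R's stats package and the Chick.1 subset from the SSfpl help page (the data and model choices are illustrative assumptions). The F test from anova() on nls fits is only meaningful for nested models, so the four-parameter logistic (SSfpl) is tested against the three-parameter logistic (SSlogis), which is SSfpl with its lower asymptote fixed at 0. The non-nested straight-line fit is compared through AIC and the residual standard error instead.

```r
# Chick number 1 from the built-in ChickWeight data (as in ?SSfpl)
Chick.1 <- ChickWeight[ChickWeight$Chick == 1, ]

# Self-starting nonlinear fits: 3- and 4-parameter logistic curves
fit_logis <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
fit_fpl   <- nls(weight ~ SSfpl(Time, A, B, xmid, scal), data = Chick.1)

# A straight line for contrast (not nested in either logistic model)
fit_lin <- lm(weight ~ Time, data = Chick.1)

# Extra-sum-of-squares F test for the nested pair
cmp <- anova(fit_logis, fit_fpl)
print(cmp)

# AIC penalizes each extra parameter, so it also works for non-nested fits
aics <- AIC(fit_lin, fit_logis, fit_fpl)
print(aics)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
rse <- sapply(list(lin = fit_lin, logis = fit_logis, fpl = fit_fpl),
              function(m) summary(m)$sigma)
print(rse)
```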

Chillu

On Mon, Mar 8, 2010 at 1:19 PM, James Salsman wrote:

> Chillu, I meant that development of a syrfr R package capable of
> using either F statistics or parametric derivatives should proceed in
> parallel with your work on such a derivatives package. You are right
> that genetic algorithm search (and general best-first search --
> http://en.wikipedia.org/wiki/Best-first_search -- of which genetic
> algorithms are various special cases) can be very effectively
> parallelized, too.
>
> In any case, thank you for pointing out Eureqa --
> http://ccsl.mae.cornell.edu/eureqa -- but I can see no evidence there
> or in the user manual or user forums that Eureqa accounts for
> degrees of freedom in its goodness-of-fit estimation. That is a
> serious problem which will typically result in invalid symbolic
> regression.  I am sending this message also to Michael Schmidt so that
> he might be able to comment on the extent to which Eureqa adjusts for
> degrees of freedom in his fit evaluations.
>
> Best regards,
> James Salsman
>

[Rd] : Operator overloading for custom classes

2010-03-23 Thread Chidambaram Annamalai
Hi,

I need some help porting some of the object orientation in sample C++
code to R, specifically the methods that overload the basic arithmetic
operations. I don't have experience with such advanced language features
in R, so I was wondering if some of you could help me out in this
regard.

I have written a simple demonstration of a forward-mode automatic
differentiator in C++, currently hosted on GitHub:
http://github.com/quantumelixir/ad-demo/blob/master/simple.cpp. It uses
simple operator overloading to redefine the basic arithmetic operations
(+, -, *, /) for a Dual number class (the "derivative" type) that I have
defined. Could you show me how this could be done equivalently in R? I
want to know how to define custom classes and define the meaning of
arithmetic for them.

I looked for operator overloading in R but could only find the
equivalence of a + b and '+'(a, b) in the R Language Definition. Could
you show how I could neatly carry the simple object orientation of the
C++ code over to R?
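A minimal sketch of how Dual number arithmetic could be expressed with S4 classes, for illustration only: the class name Dual, its slots val and der, and the constructor dual() are hypothetical and not taken from simple.cpp. A single method on the Arith group generic covers +, -, *, and / (other members of the group raise an error), and two extra methods promote plain numbers to constants with zero derivative.

```r
library(methods)

# Hypothetical dual number: a value and its derivative
setClass("Dual", slots = c(val = "numeric", der = "numeric"))
dual <- function(val, der = 0) new("Dual", val = val, der = der)

# One method for the Arith group generic (+, -, *, /, ^, %%, %/%);
# .Generic holds the name of the operator actually called
setMethod("Arith", signature("Dual", "Dual"), function(e1, e2) {
  a <- e1@val; da <- e1@der
  b <- e2@val; db <- e2@der
  switch(.Generic,
    "+" = dual(a + b, da + db),
    "-" = dual(a - b, da - db),
    "*" = dual(a * b, da * b + a * db),          # product rule
    "/" = dual(a / b, (da * b - a * db) / b^2),  # quotient rule
    stop("operator not implemented for Dual: ", .Generic))
})

# Mixed arithmetic: treat plain numbers as constants (derivative 0)
setMethod("Arith", signature("Dual", "numeric"),
          function(e1, e2) callGeneric(e1, dual(e2)))
setMethod("Arith", signature("numeric", "Dual"),
          function(e1, e2) callGeneric(dual(e1), e2))

# Forward mode: seed dx/dx = 1, then evaluate f(x) = x * x + 3 * x at x = 2
x <- dual(2, der = 1)
y <- x * x + 3 * x
y@val  # f(2)  = 10
y@der  # f'(2) = 2 * 2 + 3 = 7
```

Using the group generic keeps the four rules in one place, much like a set of overloaded operators in a single C++ class.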

Thanks a bunch!
Chillu



Re: [Rd] : Operator overloading for custom classes

2010-03-27 Thread Chidambaram Annamalai
Thank you for replying! But I have since figured out how to perform the
operator overloading after reading the "Not so short tutorial to S4 classes"
and a bit of Robert Gentleman's deceptively titled book on R.

(I had mistakenly posted to r-devel earlier; when I subsequently posted to
r-help, I got a pointer to the rwiki resources on OOP.)

As you say, many existing R packages use operator overloading, and so does
base R: running methods('+') showed me that "+" is overloaded for the Date
and POSIXt classes. I will have to define methods for the generic "+" for my
classes to get the output I want.
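The Date and POSIXt methods listed by methods('+') are S3 methods. For comparison, here is a minimal S3 sketch of Dual number arithmetic, with hypothetical names dual() and Ops.dual() that are not taken from simple.cpp. Defining one method for the Ops group generic overloads every arithmetic and comparison operator for the class at once; .Generic names the operator, and a missing e2 signals a unary call such as -x.

```r
# Hypothetical S3 dual number: a list with a value and a derivative
dual <- function(val, der = 0) {
  structure(list(val = val, der = der), class = "dual")
}

# Method for the S3 Ops group generic; .Generic is the operator name
Ops.dual <- function(e1, e2) {
  if (missing(e2)) {  # unary operators
    if (.Generic == "-") return(dual(-e1$val, -e1$der))
    if (.Generic == "+") return(e1)
    stop("unary operator not implemented for dual: ", .Generic)
  }
  # Treat plain numbers as constants (derivative 0)
  if (!inherits(e1, "dual")) e1 <- dual(e1)
  if (!inherits(e2, "dual")) e2 <- dual(e2)
  a <- e1$val; da <- e1$der
  b <- e2$val; db <- e2$der
  switch(.Generic,
    "+" = dual(a + b, da + db),
    "-" = dual(a - b, da - db),
    "*" = dual(a * b, da * b + a * db),          # product rule
    "/" = dual(a / b, (da * b - a * db) / b^2),  # quotient rule
    stop("operator not implemented for dual: ", .Generic))
}

x <- dual(3, der = 1)  # seed dx/dx = 1
y <- -x * x + 2 / x    # f(x) = -x^2 + 2/x, parsed as (-x) * x + 2 / x
y$val                  # f(3)  = -9 + 2/3
y$der                  # f'(3) = -6 - 2/9
```

S3 dispatch is looser than S4 (no formal class definition or signature checks), which is one reason the S4 route is often preferred for new numeric classes.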

Regards,
Chillu

On Sat, Mar 27, 2010 at 3:46 AM, Peter Ruckdeschel  wrote:

> First of all you should read about S4 classes, in particular
> the books of John Chambers should be helpful ;-)
> (others on this list might point you to further sources).
>
> The key is to define S4 classes for your operands
> [i.e., if I have understood correctly, you would implement
> your Dual number class as an S4 class].
>
> Then you would define new methods by setMethod().
>
> Not sure whether this gives you the indications you are looking for,
> but we have overloaded arithmetic operators to act on distribution
> classes in package distr (cf. CRAN; developed on R-Forge).
> You might want to look at the code in NormalDistribution.R or
> ContDistribution.R.
>
> HTH Peter Ruckdeschel
>
>
