And yes, by formula I mean R syntax. A possible use case would be to take a Spark DataFrame and a formula (say, `age ~ . -1`) and produce DrmLike[Int] outputs (a distributed matrix type) holding the predictors and the target.
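To make the `age ~ . -1` example concrete, here is a rough sketch in plain Python. It is not Mahout or Spark code, and the names `parse_formula` and `design_matrix` are hypothetical. It handles only `.`, `+ x`, `- x`, `+ 0` and `- 1`, and its factor expansion is a simplification: with an intercept it drops the first level of each categorical column (R's default treatment contrasts), and without one it keeps every level of every categorical column, whereas R, as far as I understand, keeps the full set only for the first factor.

```python
import re


def parse_formula(formula, columns):
    """Parse a tiny subset of R formula syntax: `y ~ terms`."""
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    # Tokenize the right-hand side into (sign, term) pairs;
    # a term with no sign has an implicit '+'.
    tokens = re.findall(r"([+-]?)\s*([A-Za-z_.][\w.]*|\d+)", rhs)
    included, intercept = [], True
    for sign, term in tokens:
        add = sign != "-"
        if term == "1":
            intercept = add            # `- 1` drops the bias column
        elif term == "0" and add:
            intercept = False          # `+ 0` also drops it
        elif term == ".":
            if add:                    # `.` = every column except the target
                for c in columns:
                    if c != lhs and c not in included:
                        included.append(c)
        elif term in columns:
            if add and term not in included:
                included.append(term)
            elif not add and term in included:
                included.remove(term)
        else:
            raise ValueError(f"unsupported term: {sign}{term}")
    return lhs, included, intercept


def design_matrix(rows, formula):
    """Build (column names, X, y) from a list of dict rows."""
    target, predictors, intercept = parse_formula(formula, list(rows[0]))
    # Plan output columns: (name, source column, level or None if numeric).
    plan = []
    for col in predictors:
        if isinstance(rows[0][col], str):
            levels = sorted({r[col] for r in rows})
            if intercept:
                levels = levels[1:]    # drop the reference level
            plan += [(f"{col}{lv}", col, lv) for lv in levels]
        else:
            plan.append((col, col, None))
    names = (["(Intercept)"] if intercept else []) + [p[0] for p in plan]
    X = []
    for r in rows:
        vals = [1.0] if intercept else []
        for _, col, lv in plan:
            vals.append(float(r[col]) if lv is None else float(r[col] == lv))
        X.append(vals)
    y = [float(r[target]) for r in rows]
    return names, X, y


# Made-up toy data, for illustration only.
rows = [
    {"age": 31, "income": 52.0, "sex": "f"},
    {"age": 45, "income": 61.5, "sex": "m"},
]
names, X, y = design_matrix(rows, "age ~ . -1")
print(names)  # ['income', 'sexf', 'sexm']
```

A real implementation would keep the parser generic and put the matrix-building part behind a backend-specific adapter, so the same parsed formula could drive a Spark DataFrame instead of a list of dicts.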
In this particular case, the formula `age ~ . -1` means that the predictor matrix (X) would contain all original variables except `age` (with factor extraction applied to categorical variables) and no bias column. Some knowledge of R and SAS is required to pin down the compatibility nuances there.

Maybe we could make reasonable simplifications or omissions compared to R, if we can be reasonably convinced that the result is actually better than the vanilla R contract. But IMO it would be really useful to retain 100% compatibility, since that is one of the ideas here: retain R-like-ness in these things.

On Fri, Mar 3, 2017 at 12:31 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski <j...@jagunet.com> wrote:
>
>>> > 3) On the feature extraction per R like formula can you elaborate more
>>> > here, are you talking about feature extraction using R like dataframes and
>>> > operators?
>
> Yes. I would start doing generic formula parser and then specific part
> that works with backend-specific data frames. For spark, i don't see any
> reason to write our own; we'd just had an adapter for the Spark native data
> frames.