And by formula, yes, I mean R syntax.

A possible use case would be to take a Spark DataFrame and a formula (say,
`age ~ . -1`) and produce DrmLike[Int] outputs (a distributed matrix type)
holding the predictors and the target.

In this particular case, the formula means that `age` is the target and the
predictor matrix (X) contains all the other original variables (categorical
variables are expanded via factor extraction), with no bias column.
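
To make the semantics concrete, here is a minimal in-memory sketch in plain
Python. It is not Mahout or Spark code: `parse_formula` and `design_matrix`
are hypothetical names, rows are plain dicts rather than a DataFrame, and
only `+`, `-`, `.` and `1` are understood. The factor coding (all levels for
the first factor when there is no intercept, first level dropped otherwise)
is my understanding of R's default treatment contrasts and would need to be
checked against R itself.

```python
def parse_formula(formula):
    """Split 'target ~ terms' into (target, included, excluded, intercept)."""
    lhs, rhs = (s.strip() for s in formula.split("~", 1))
    # Put spaces around the operators so that '-1' tokenises as '-', '1'.
    tokens = rhs.replace("-", " - ").replace("+", " + ").split()
    sign, include, exclude, intercept = "+", [], [], True
    for tok in tokens:
        if tok in ("+", "-"):
            sign = tok
        elif tok == "1":
            intercept = (sign == "+")  # '- 1' drops the bias column
        elif sign == "+":
            include.append(tok)
        else:
            exclude.append(tok)
    return lhs, include, exclude, intercept


def design_matrix(rows, formula):
    """Return (column names, X, y) for a list of dict rows (hypothetical helper)."""
    target, include, exclude, intercept = parse_formula(formula)
    columns = list(rows[0].keys())
    predictors = []
    for term in include:
        # '.' expands to every column except the target.
        expanded = [c for c in columns if c != target] if term == "." else [term]
        predictors += [c for c in expanded if c not in predictors]
    predictors = [c for c in predictors if c not in exclude]

    names = ["(Intercept)"] if intercept else []
    encoders = []  # one (column, level or None) pair per output column
    full_coding = not intercept  # assumption: mimics R's no-intercept coding
    for col in predictors:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            names.append(col)
            encoders.append((col, None))
            continue
        levels = sorted(set(values))
        if full_coding:
            full_coding = False  # only the first factor keeps all its levels
        else:
            levels = levels[1:]  # drop the reference level
        for level in levels:
            names.append(f"{col}{level}")
            encoders.append((col, level))

    X = []
    for r in rows:
        row = [1.0] if intercept else []
        for col, level in encoders:
            row.append(float(r[col]) if level is None else float(r[col] == level))
        X.append(row)
    y = [float(r[target]) for r in rows]
    return names, X, y


rows = [
    {"age": 31, "income": 50.0, "city": "NYC", "sex": "F"},
    {"age": 45, "income": 72.5, "city": "SF", "sex": "M"},
    {"age": 28, "income": 41.0, "city": "NYC", "sex": "M"},
]
names, X, y = design_matrix(rows, "age ~ . -1")
print(names)  # ['income', 'cityNYC', 'citySF', 'sexM']
```

In a real implementation the parser would stay generic, and the part that
walks the rows would be replaced by a backend-specific adapter.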

Some knowledge of R and SAS is required to pin down the compatibility
nuances there.

Maybe we could allow reasonable simplifications or omissions compared to R,
if we can be reasonably convinced that the result is actually better than
the vanilla R contract. But IMO it would be really useful to retain 100%
compatibility, since retaining R-like-ness in these things is one of the
ideas here.


On Fri, Mar 3, 2017 at 12:31 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

>
>
> On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski <j...@jagunet.com> wrote:
>>
>>>
>>>
>>
>>> >
>>> > 3) On the feature extraction per R like formula can you elaborate more
>>> > here, are you talking about feature extraction using R like dataframes
>>> > and operators?
>>>
>>
>>
> Yes. I would start by doing a generic formula parser and then the specific
> part that works with backend-specific data frames. For Spark, I don't see
> any reason to write our own; we'd just have an adapter for the Spark native
> data frames.
>
