Hi,

I've never developed any custom Transformer (or UnaryTransformer in
particular), but I'd be for it if that's the case.
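For anyone who does want to try one, a custom UnaryTransformer is quite small. A minimal sketch, assuming the spark.ml API (the class name UpperCaser and its upper-casing logic are made up for illustration):

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical example: a Transformer that upper-cases a String column.
// UnaryTransformer already mixes in HasInputCol/HasOutputCol, so
// setInputCol/setOutputCol come for free.
class UpperCaser(override val uid: String)
  extends UnaryTransformer[String, String, UpperCaser] {

  def this() = this(Identifiable.randomUID("upperCaser"))

  // The per-row function applied to the input column.
  override protected def createTransformFunc: String => String = _.toUpperCase

  // Type of the generated output column.
  override protected def outputDataType: DataType = StringType
}
```

Usage would then be along the lines of `new UpperCaser().setInputCol("text").setOutputCol("TEXT").transform(df)`, and the instance can be dropped into a Pipeline like any built-in stage.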
Jacek

On 28.03.2016 6:54 AM, "Maciej Szymkiewicz" <mszymkiew...@gmail.com> wrote:

> Hi Jacek,
>
> In this context, don't you think it would be useful if at least some
> traits from org.apache.spark.ml.param.shared.sharedParams were
> public? HasInputCol(s) and HasOutputCol, for example. These are useful
> pretty much every time you create a custom Transformer.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> On 03/26/2016 10:26 AM, Jacek Laskowski wrote:
> > Hi Joseph,
> >
> > Thanks for the response. I'm one who doesn't understand all the
> > hype around/need for Machine Learning... yet, and I'm looking at the
> > ML space through Spark ML(lib) glasses. In the meantime I've got a few
> > assignments (in a project with Spark and Scala) that have required
> > quite extensive dataset manipulation.
> >
> > That's when I sank into using DataFrame/Dataset for data manipulation,
> > not RDD (I remember talking to Brian about how RDD is an "assembly"
> > language compared to the higher-level concept of DataFrames with
> > Catalyst and other optimizations). After a few days with DataFrame I
> > learnt he was so right! (Sorry Brian, it took me longer to understand
> > your point.)
> >
> > I started using DataFrames in far more places than one could ever
> > accept :-) I was so... carried away with DataFrames (esp. show vs
> > foreach(println), and UDFs via the udf() function).
> >
> > And then I moved to the Pipeline API and discovered Transformers,
> > and PipelineStage, which can create pipelines of DataFrame
> > manipulation. They read so well that I'm pretty sure people would love
> > using them more often, but... they belong to MLlib, so they are part
> > of the ML space (which not many devs have tackled yet). I applied the
> > approach of using withColumn to have a better debugging experience (if
> > I ever need it). I learnt it after having watched your presentation
> > about the Pipeline API. It was so helpful in my RDD/DataFrame space.
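[The udf()-plus-withColumn pattern mentioned above is roughly the following; `df` and the column names are hypothetical, each step kept as a separate, debuggable withColumn call:]

```scala
import org.apache.spark.sql.functions.{col, length, udf}

// Hypothetical: derive new columns from an existing DataFrame `df`
// via a user-defined function.
val normalize = udf((s: String) => s.trim.toLowerCase)

val cleaned = df
  .withColumn("name_clean", normalize(col("name")))
  .withColumn("name_len", length(col("name_clean")))

cleaned.show() // tabular preview -- handy compared to foreach(println)
```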
> >
> > So, to promote a more extensive use of Pipelines, PipelineStages, and
> > Transformers, I was thinking about moving that part to the
> > SQL/DataFrame API where they really belong. If not, I think people
> > might miss the beauty of the very fine and so helpful Transformers.
> >
> > Transformers are *not* an ML thing -- they are a DataFrame thing and
> > should be where they really belong (for their greater adoption).
> >
> > What do you think?
> >
> > Best regards,
> > Jacek Laskowski
> > ----
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> > On Sat, Mar 26, 2016 at 3:23 AM, Joseph Bradley <jos...@databricks.com> wrote:
> >> There have been some comments about using Pipelines outside of ML,
> >> but I have not yet seen a real need for it. If a user does want to
> >> use Pipelines for non-ML tasks, they can still use Transformers +
> >> PipelineModels. Will that work?
> >>
> >> On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> >>> Hi,
> >>>
> >>> After a few weeks with spark.ml now, I have come to the conclusion
> >>> that the Transformer concept from the Pipeline API (spark.ml/MLlib)
> >>> should be part of DataFrame (SQL), where it fits better. Are there
> >>> any plans to migrate the Transformer API (ML) to DataFrame (SQL)?
> >>>
> >>> Best regards,
> >>> Jacek Laskowski
> >>> ----
> >>> https://medium.com/@jaceklaskowski/
> >>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>> Follow me at https://twitter.com/jaceklaskowski
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: dev-h...@spark.apache.org
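[To make Joseph's "Transformers + PipelineModels for non-ML tasks" suggestion concrete: pure DataFrame manipulation can already be chained as Transformer stages in a Pipeline, for example with SQLTransformer. A sketch — the SQL statements and column names are made up for illustration:]

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.SQLTransformer

// Two purely relational stages -- no ML anywhere. __THIS__ is
// SQLTransformer's placeholder for the input DataFrame.
val dropNulls = new SQLTransformer()
  .setStatement("SELECT * FROM __THIS__ WHERE value IS NOT NULL")
val addDoubled = new SQLTransformer()
  .setStatement("SELECT *, value * 2 AS doubled FROM __THIS__")

val pipeline = new Pipeline().setStages(Array(dropNulls, addDoubled))

// fit() is effectively a no-op for pure Transformers; it just wraps
// the stages in a reusable PipelineModel.
val model = pipeline.fit(df)
val result = model.transform(df)
```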