Hi,

I've never developed any custom Transformer (or UnaryTransformer in
particular), but I'd be for it if that's the case.
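For anyone who does want to try one, a custom UnaryTransformer is quite small. A minimal sketch, assuming the spark.ml API (the class name UpperCaser and its upper-casing logic are made up for illustration):

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical example: a Transformer that upper-cases a String column.
// UnaryTransformer already mixes in HasInputCol/HasOutputCol, so
// setInputCol/setOutputCol come for free.
class UpperCaser(override val uid: String)
  extends UnaryTransformer[String, String, UpperCaser] {

  def this() = this(Identifiable.randomUID("upperCaser"))

  // The per-row function applied to the input column.
  override protected def createTransformFunc: String => String = _.toUpperCase

  // Type of the generated output column.
  override protected def outputDataType: DataType = StringType
}
```

Usage would then be along the lines of `new UpperCaser().setInputCol("text").setOutputCol("TEXT").transform(df)`, and the instance can be dropped into a Pipeline like any built-in stage.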
Jacek

On 28.03.2016 6:54 AM, "Maciej Szymkiewicz" <mszymkiew...@gmail.com> wrote:

> Hi Jacek,
>
> In this context, don't you think it would be useful if at least some
> traits from org.apache.spark.ml.param.shared.sharedParams were
> public? HasInputCol(s) and HasOutputCol, for example. These are useful
> pretty much every time you create a custom Transformer.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> On 03/26/2016 10:26 AM, Jacek Laskowski wrote:
> > Hi Joseph,
> >
> > Thanks for the response. I'm one who doesn't understand all the
> > hype around/need for Machine Learning... yet, and I'm looking at the
> > ML space through Spark ML(lib) glasses. In the meantime I've got a few
> > assignments (in a project with Spark and Scala) that have required
> > quite extensive dataset manipulation.
> >
> > That's when I sank into using DataFrame/Dataset for data manipulation,
> > not RDD (I remember talking to Brian about how RDD is an "assembly"
> > language compared to the higher-level concept of DataFrames with
> > Catalyst and other optimizations). After a few days with DataFrame I
> > learnt he was so right! (Sorry Brian, it took me longer to understand
> > your point.)
> >
> > I started using DataFrames in far more places than one could ever
> > accept :-) I was so... carried away with DataFrames (esp. show vs
> > foreach(println), and UDFs via the udf() function).
> >
> > And then I moved to the Pipeline API and discovered Transformers,
> > and PipelineStage, which can create pipelines of DataFrame
> > manipulation. They read so well that I'm pretty sure people would love
> > using them more often, but... they belong to MLlib, so they are part
> > of the ML space (which not many devs have tackled yet). I applied the
> > approach of using withColumn to have a better debugging experience (if
> > I ever need it). I learnt it after having watched your presentation
> > about the Pipeline API. It was so helpful in my RDD/DataFrame space.
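[The udf()-plus-withColumn pattern mentioned above is roughly the following; `df` and the column names are hypothetical, each step kept as a separate, debuggable withColumn call:]

```scala
import org.apache.spark.sql.functions.{col, length, udf}

// Hypothetical: derive new columns from an existing DataFrame `df`
// via a user-defined function.
val normalize = udf((s: String) => s.trim.toLowerCase)

val cleaned = df
  .withColumn("name_clean", normalize(col("name")))
  .withColumn("name_len", length(col("name_clean")))

cleaned.show() // tabular preview -- handy compared to foreach(println)
```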
> >
> > So, to promote a more extensive use of Pipelines, PipelineStages, and
> > Transformers, I was thinking about moving that part to the
> > SQL/DataFrame API where they really belong. If not, I think people
> > might miss the beauty of the very fine and so helpful Transformers.
> >
> > Transformers are *not* an ML thing -- they are a DataFrame thing and
> > should be where they really belong (for their greater adoption).
> >
> > What do you think?
> >
> > Best regards,
> > Jacek Laskowski
> > ----
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> > On Sat, Mar 26, 2016 at 3:23 AM, Joseph Bradley <jos...@databricks.com> wrote:
> >> There have been some comments about using Pipelines outside of ML,
> >> but I have not yet seen a real need for it. If a user does want to
> >> use Pipelines for non-ML tasks, they can still use Transformers +
> >> PipelineModels. Will that work?
> >>
> >> On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> >>> Hi,
> >>>
> >>> After a few weeks with spark.ml now, I have come to the conclusion
> >>> that the Transformer concept from the Pipeline API (spark.ml/MLlib)
> >>> should be part of DataFrame (SQL), where it fits better. Are there
> >>> any plans to migrate the Transformer API (ML) to DataFrame (SQL)?
> >>>
> >>> Best regards,
> >>> Jacek Laskowski
> >>> ----
> >>> https://medium.com/@jaceklaskowski/
> >>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>> Follow me at https://twitter.com/jaceklaskowski
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: dev-h...@spark.apache.org
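[To make Joseph's "Transformers + PipelineModels for non-ML tasks" suggestion concrete: pure DataFrame manipulation can already be chained as Transformer stages in a Pipeline, for example with SQLTransformer. A sketch — the SQL statements and column names are made up for illustration:]

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.SQLTransformer

// Two purely relational stages -- no ML anywhere. __THIS__ is
// SQLTransformer's placeholder for the input DataFrame.
val dropNulls = new SQLTransformer()
  .setStatement("SELECT * FROM __THIS__ WHERE value IS NOT NULL")
val addDoubled = new SQLTransformer()
  .setStatement("SELECT *, value * 2 AS doubled FROM __THIS__")

val pipeline = new Pipeline().setStages(Array(dropNulls, addDoubled))

// fit() is effectively a no-op for pure Transformers; it just wraps
// the stages in a reusable PipelineModel.
val model = pipeline.fit(df)
val result = model.transform(df)
```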