It is difficult to give a general answer; we would need to discuss each case. As a rule, it is not a good idea to add things that are trivially doable with existing APIs, unless it is for compatibility with other frameworks (e.g. Pandas).
On Fri, Oct 14, 2016 at 5:38 PM, roehst <ro.stev...@gmail.com> wrote:
> Hi, I sometimes write convenience methods for pre-processing data frames,
> and I wonder if it makes sense to make a contribution -- should this be
> included in Spark or supplied as Spark Packages/3rd-party libraries?
>
> Example: get all fields in a DataFrame schema of a certain type.
>
> I end up writing something like getFieldsByDataType(dataFrame: DataFrame,
> dataType: DataType): List[StructField], and maybe adding that to the Schema
> class with implicits. Something like:
>
> dataFrame.schema.fields.filter(_.dataType == dataType)
>
> Should the fields variable in the Schema class contain a method like
> "filterByDataType" so we can write:
>
> dataFrame.getFieldsByDataType(StringType)?
>
> Is it useful? Is it too bloated? Would that be acceptable? That is a small
> contribution that a junior developer might be able to write, for example.
> This adds more code, but maybe makes the library more user-friendly (not
> that it is not user-friendly already).
>
> Just want to hear your thoughts on this question.
>
> Thanks,
> Rodrigo
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/On-convenience-methods-tp19460.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org