It is very difficult to give a general answer; we would need to discuss
each case. In general, it is not a good idea to provide things that are
trivially doable with the existing APIs, unless it is for compatibility
with other frameworks (e.g. Pandas).
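
For example, the helper below is a one-liner over the existing schema API,
and the method syntax can be recovered entirely in user code with an
implicit class. This is a rough sketch using the names from the question,
not anything that exists in Spark itself:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.{DataType, StringType, StructField}

    // User-side enrichment; the object, class, and method names here are
    // hypothetical, taken from the question rather than from Spark's API.
    object SchemaSyntax {
      implicit class RichDataFrame(private val df: DataFrame) extends AnyVal {
        // All top-level fields of the schema with a matching data type.
        def getFieldsByDataType(dataType: DataType): List[StructField] =
          df.schema.fields.filter(_.dataType == dataType).toList
      }
    }

    // Usage:
    //   import SchemaSyntax._
    //   val stringFields = dataFrame.getFieldsByDataType(StringType)

Since anyone can define this in a few lines next to their own code, putting
it in Spark itself mostly adds API surface that we would then have to maintain.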

On Fri, Oct 14, 2016 at 5:38 PM, roehst <ro.stev...@gmail.com> wrote:

> Hi, I sometimes write convenience methods for pre-processing data frames,
> and I wonder if it makes sense to make a contribution -- should this be
> included in Spark or supplied as Spark Packages / 3rd-party libraries?
>
> Example:
>
> Get all fields in a DataFrame schema of a certain type.
>
> I end up writing something like getFieldsByDataType(dataFrame: DataFrame,
> dataType: DataType): List[StructField], and maybe adding that to the Schema
> class with implicits. Something like:
>
> dataFrame.schema.fields.filter(_.dataType == dataType)
>
> Should the Schema class expose a method like "filterByDataType" over its
> fields, so we can write:
>
> dataFrame.getFieldsByDataType(StringType)?
>
> Is it useful? Is it too bloated? Would it be acceptable? It is a small
> contribution that a junior developer might be able to write, for example.
> It adds more code, but maybe makes the library more user-friendly (not
> that it is not user-friendly already).
>
> Just want to hear your thoughts on this question.
>
> Thanks,
> Rodrigo
>