My feeling is that we should have a handful of namespaces (say 4 or 5). Any
more than that becomes too cumbersome to import and remember, while having
everything in one package makes the Scaladoc hard to read, etc.

Thanks
Shivaram

On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin <r...@databricks.com> wrote:

> To add a little bit more context, some pros/cons I can think of are:
>
> Option 1: Very easy for users to find a function, since they are all in
> org.apache.spark.sql.functions. However, there will be quite a large
> number of them.
>
> Option 2: I can't see why we would want this one over Option 3, since it
> has all the problems of Option 3 without as nice a hierarchy.
>
> Option 3: Opposite of Option 1. Each "package" or static class has a small
> number of functions that are relevant to each other, but for some functions
> it is unclear where they should go (e.g. should "min" go into basic or
> math?)
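>
> For illustration, a rough sketch of what the call sites might look like
> under Option 1 vs. Option 3 (function and object names here are just
> hypothetical examples, not a final list):
>
>   // Option 1: one flat namespace
>   import org.apache.spark.sql.functions._
>   df.select(avg(df("age")), sqrt(df("salary")))
>
>   // Option 3: small grouped objects under functions
>   import org.apache.spark.sql.functions.{basicFuncs, mathFuncs}
>   df.select(basicFuncs.avg(df("age")), mathFuncs.sqrt(df("salary")))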
>
>
>
>
> On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:
>
> > Before we make DataFrame non-alpha, it would be great to decide how we
> > want to namespace all the functions. There are 3 alternatives:
> >
> > 1. Put everything in org.apache.spark.sql.functions. This is how SQL
> > does it, since SQL doesn't have namespaces. I estimate we will
> > eventually have ~200 functions.
> >
> > 2. Have explicit namespaces, which is what the master branch currently
> > looks like:
> >
> > - org.apache.spark.sql.functions
> > - org.apache.spark.sql.mathfunctions
> > - ...
> >
> > 3. Have explicit namespaces, but restructure them slightly so everything
> > is under functions:
> >
> > package org.apache.spark.sql
> >
> > package object functions {
> >   // all the old functions here -- but deprecated so we keep
> >   // source compatibility
> >   def ...
> > }
> >
> > package org.apache.spark.sql.functions
> >
> > object mathFuncs {
> >   ...
> > }
> >
> > object basicFuncs {
> >   ...
> > }
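> >
> > For illustration, one of the deprecated forwarders in the package object
> > might look roughly like this (the names and version string here are
> > hypothetical):
> >
> >   @deprecated("use functions.mathFuncs.sqrt instead", "1.4.0")
> >   def sqrt(e: Column): Column = functions.mathFuncs.sqrt(e)
> >
> > so existing callers of the flat sqrt keep compiling, just with a
> > deprecation warning.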
> >
> >
> >
>
