Re: [discuss] DataFrame function namespacing

Reynold Xin Wed, 29 Apr 2015 22:05:28 -0700

We definitely still have the name collision problem in SQL.

On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal <[email protected]> wrote:


> Do we still have to keep the names of the functions distinct to avoid
> collisions in SQL? Or is there a plan to allow "importing" a namespace into
> SQL somehow?
>
> I ask because if we have to keep worrying about name collisions then I'm
> not sure what the added complexity of #2 and #3 buys us.
>
> Punya
>
> On Wed, Apr 29, 2015 at 3:52 PM Reynold Xin <[email protected]> wrote:
>
>> Scaladoc isn't much of a problem because scaladocs are grouped. Java/Python
>> is the main problem ...
>>
>> See
>>
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>
>> On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman <[email protected]> wrote:
>>
>> > My feeling is that we should have a handful of namespaces (say 4 or 5). It
>> > becomes too cumbersome to import / remember more package names and having
>> > everything in one package makes it hard to read scaladoc etc.
>> >
>> > Thanks
>> > Shivaram
>> >
>> > On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin <[email protected]> wrote:
>> >
>> >> To add a little bit more context, some pros/cons I can think of are:
>> >>
>> >> Option 1: Very easy for users to find the function, since they are all in
>> >> org.apache.spark.sql.functions. However, there will be quite a large number
>> >> of them.
>> >>
>> >> Option 2: I can't tell why we would want this one over Option 3, since it
>> >> has all the problems of Option 3, and not as nice of a hierarchy.
>> >>
>> >> Option 3: Opposite of Option 1. Each "package" or static class has a small
>> >> number of functions that are relevant to each other, but for some functions
>> >> it is unclear where they should go (e.g. should "min" go into basic or
>> >> math?)
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <[email protected]> wrote:
>> >>
>> >> > Before we make DataFrame non-alpha, it would be great to decide how we
>> >> > want to namespace all the functions. There are 3 alternatives:
>> >> >
>> >> > 1. Put all in org.apache.spark.sql.functions. This is how SQL does it,
>> >> > since SQL doesn't have namespaces. I estimate eventually we will have ~200
>> >> > functions.
>> >> >
>> >> > 2. Have explicit namespaces, which is what master branch currently looks
>> >> > like:
>> >> >
>> >> > - org.apache.spark.sql.functions
>> >> > - org.apache.spark.sql.mathfunctions
>> >> > - ...
>> >> >
>> >> > 3. Have explicit namespaces, but restructure them slightly so everything
>> >> > is under functions.
>> >> >
>> >> > package object functions {
>> >> >
>> >> >   // all the old functions here -- but deprecated so we keep source
>> >> >   // compatibility
>> >> >   def ...
>> >> > }
>> >> >
>> >> > package org.apache.spark.sql.functions
>> >> >
>> >> > object mathFunc {
>> >> >   ...
>> >> > }
>> >> >
>> >> > object basicFuncs {
>> >> >   ...
>> >> > }
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
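
For readers following the thread, Option 3 from the original message could be sketched roughly as below. This is only a minimal, self-contained illustration and not Spark's actual code. The names functions, mathFunc and basicFuncs come from the message; the Column case class, the sqrt and lit functions, their bodies, and the "hypothetical-version" string are assumptions made up for the example. Plain nested objects are used here instead of a package object so that the sketch compiles as a single file.

```scala
// Hypothetical stand-in for Spark's Column, only here to keep the sketch self-contained.
final case class Column(expr: String)

object functions {
  // Grouped namespaces, as proposed in Option 3.
  object mathFunc {
    // Illustrative function; the body just builds a string expression.
    def sqrt(c: Column): Column = Column(s"SQRT(${c.expr})")
  }

  object basicFuncs {
    // Illustrative function; wraps a literal value as a column expression.
    def lit(value: Any): Column = Column(value.toString)
  }

  // Old flat entry point, kept as a deprecated forwarder for source compatibility.
  @deprecated("Use functions.mathFunc.sqrt instead", "hypothetical-version")
  def sqrt(c: Column): Column = mathFunc.sqrt(c)
}
```

Under this layout, existing callers of functions.sqrt would keep compiling with a deprecation warning, while new code would import from a specific group, e.g. import functions.mathFunc._. It does not address the SQL name collision point raised at the top of the thread, since SQL function names would still share one flat namespace.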
