After talking with people on this thread and offline, I've decided to go
with option 1, i.e. putting everything in a single functions object.
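For readers skimming the thread, the practical effect of option 1 is that every DataFrame function hangs off one flat object, and a single import point covers all of them (org.apache.spark.sql.functions in Scala, pyspark.sql.functions in Python). A minimal sketch of that flat-object shape, in plain Python so it runs without Spark -- the `upper`/`sqrt`/`count` stand-ins below are illustrative only, not the real Spark signatures:

```python
# Toy model of option 1: one flat "functions" namespace.
# In Spark proper this is org.apache.spark.sql.functions (Scala) /
# pyspark.sql.functions (Python); the bodies below are stand-ins.

class functions:
    """All ~200 functions on one object: easy to discover,
    but the object grows large and mixes every category."""

    @staticmethod
    def upper(s):           # string function
        return s.upper()

    @staticmethod
    def sqrt(x):            # math function
        return x ** 0.5

    @staticmethod
    def count(xs):          # aggregate function
        return len(xs)


# One namespace serves string, math, and aggregate functions alike:
print(functions.upper("spark"))    # SPARK
print(functions.sqrt(16.0))        # 4.0
print(functions.count([1, 2, 3]))  # 3
```

The upside named in the thread (one obvious place to look) and the downside (one very large object) both fall straight out of this shape.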
On Thu, Apr 30, 2015 at 10:04 AM, Ted Yu yuzhih...@gmail.com wrote:
IMHO I would go with choice #1
Cheers
On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin r...@databricks.com wrote:
We definitely still have the name collision problem in SQL.
On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal
punya.bis...@gmail.com wrote:

Do we still have to keep the names of the functions distinct to avoid
collisions in SQL? Or is there a plan to allow importing a namespace into
SQL?

On Wed, Apr 29, 2015, Reynold Xin r...@databricks.com wrote:

Scaladoc isn't much of a problem because scaladocs are grouped. Java/Python
is the main problem ...

See
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:

My feeling is that we should have a handful of namespaces (say 4 or 5). It
becomes too cumbersome to import / remember more package names and having
everything in one package makes it hard to read scaladoc etc.

Thanks
Shivaram

On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin r...@databricks.com wrote:

To add a little bit more context, some pros/cons I can think of are:

Option 1: Very easy for users to find the functions, since they are all in
org.apache.spark.sql.functions. However, there will be quite a large number
of them.

Option 2: I can't tell why we would want this one over Option 3,

Before we make DataFrame non-alpha, it would be great to decide how we want
to namespace all the functions. There are 3 alternatives:

1. Put all in org.apache.spark.sql.functions. This is how SQL does it,
since SQL doesn't have namespaces. I estimate eventually we will have ~200
functions.

2.
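The constraint Punya and Reynold are discussing can be sketched in a few lines: even if Scala or Python split the functions across several namespaces, SQL resolves calls through a single global name table, so every function name must stay distinct anyway. A hypothetical registry illustrating that (the names and structure here are invented for the sketch, not Spark's actual registration code):

```python
# Why function names must stay globally distinct for SQL:
# SQL has no namespaces, so every function lands in one lookup table.
# Registering a second "sum" (say, a math flavour next to an
# aggregate flavour) collides no matter how the host language
# namespaced them.

sql_registry = {}

def register(name, fn):
    """Hypothetical SQL-side registration: one flat table, no packages."""
    if name in sql_registry:
        raise ValueError(f"function name collision in SQL: {name!r}")
    sql_registry[name] = fn

register("sum", lambda xs: sum(xs))         # aggregate flavour: accepted
try:
    register("sum", lambda x, y: x + y)     # second "sum": rejected
except ValueError as e:
    print(e)  # function name collision in SQL: 'sum'
```

This is why Reynold's "we definitely still have the name collision problem in SQL" holds regardless of which of the three namespacing options the Scala/Python side picks.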