Re: [discuss] DataFrame function namespacing

2015-05-04 Thread Reynold Xin
After talking with people on this thread and offline, I've decided to go with option 1, i.e. putting everything in a single functions object.
On Thu, Apr 30, 2015 at 10:04 AM, Ted Yu yuzhih...@gmail.com wrote:
> IMHO I would go with choice #1
> Cheers
> On Wed, Apr 29, 2015 at 10:03 PM, Reynold

Re: [discuss] DataFrame function namespacing

2015-04-30 Thread Ted Yu
IMHO I would go with choice #1
Cheers
On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin r...@databricks.com wrote:
> We definitely still have the name collision problem in SQL.
> On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal punya.bis...@gmail.com wrote:
>> Do we still have to keep the

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
Scaladoc isn't much of a problem because scaladocs are grouped. Java/Python is the main problem ...
See https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
> My feeling

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
To add a little bit more context, some pros/cons I can think of are:
Option 1: Very easy for users to find the function, since they are all in org.apache.spark.sql.functions. However, there will be quite a large number of them.
Option 2: I can't tell why we would want this one over Option 3,

[discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
Before we make DataFrame non-alpha, it would be great to decide how we want to namespace all the functions. There are 3 alternatives:
1. Put all in org.apache.spark.sql.functions. This is how SQL does it, since SQL doesn't have namespaces. I estimate eventually we will have ~200 functions.
2.

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Shivaram Venkataraman
My feeling is that we should have a handful of namespaces (say 4 or 5). It becomes too cumbersome to import / remember more package names and having everything in one package makes it hard to read scaladoc etc.
Thanks
Shivaram
On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin r...@databricks.com
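The trade-off between a single functions object and a handful of namespaces can be sketched in plain Python. This is only an illustration, not Spark's API: the `functions` and `grouped` objects and the `string`/`math` group names are hypothetical, and the four functions stand in for the roughly 200 under discussion.

```python
import math
from types import SimpleNamespace


# Hypothetical stand-ins for DataFrame functions (not Spark's real API).
def upper(s: str) -> str:
    return s.upper()


def lower(s: str) -> str:
    return s.lower()


# Layout A: everything in one flat namespace (the "single functions object").
functions = SimpleNamespace(upper=upper, lower=lower,
                            sqrt=math.sqrt, floor=math.floor)

# Layout B: a handful of namespaces; the group names here are assumptions.
grouped = SimpleNamespace(
    string=SimpleNamespace(upper=upper, lower=lower),
    math=SimpleNamespace(sqrt=math.sqrt, floor=math.floor),
)


def find_group(name: str) -> str:
    """Return the group in layout B that holds `name`."""
    for group_name, group in vars(grouped).items():
        if hasattr(group, name):
            return group_name
    raise KeyError(name)


# Layout A: one place to look, but one long listing.
flat_listing = sorted(vars(functions))

# Layout B: shorter listings, but the caller must know (or search for) the group.
group_of_sqrt = find_group("sqrt")
```

With the flat layout a user only needs to remember one name; with the grouped layout each listing is shorter, at the cost of the extra lookup that `find_group` stands in for.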

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
We definitely still have the name collision problem in SQL.
On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal punya.bis...@gmail.com wrote:
> Do we still have to keep the names of the functions distinct to avoid collisions in SQL? Or is there a plan to allow importing a namespace into SQL
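To illustrate the collision problem: if SQL resolves functions from one flat set of names, two functions that live in different host-language namespaces still cannot share a name once both are exposed to SQL. The sketch below is a toy model under that assumption; `FlatFunctionRegistry`, `string_size`, and `array_size` are hypothetical names, and the case-insensitive lookup is an assumption of the sketch rather than a statement about Spark's behavior.

```python
from typing import Callable, Dict


class FlatFunctionRegistry:
    """Hypothetical flat registry: each name maps to exactly one function."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}

    def register(self, name: str, fn: Callable) -> None:
        key = name.lower()  # assumption: names are matched case-insensitively
        if key in self._functions:
            raise ValueError(f"function name already taken: {key}")
        self._functions[key] = fn

    def lookup(self, name: str) -> Callable:
        return self._functions[name.lower()]


# Two hypothetical functions from different namespaces that want the same name.
def string_size(s: str) -> int:
    return len(s)


def array_size(xs: list) -> int:
    return len(xs)


registry = FlatFunctionRegistry()
registry.register("size", string_size)

try:
    # The second registration under the same bare name is rejected.
    registry.register("size", array_size)
    collided = False
except ValueError:
    collided = True
```

In this model, grouping functions into packages on the Scala/Java/Python side does not remove the need for distinct names unless SQL gains some way to qualify or import a namespace.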