Re: Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-02-03 Thread 大啊
+1 for this work! I agree with some clarifications below: SQL is a somewhat different case. There are functions that aren't _that_ useful in general, kind of niche, but nevertheless exist in other SQL systems, most notably Hive. It's useful to try to expand SQL support to cover those to ease

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-31 Thread 大啊
+1 for this work! But I still don't know how to distinguish common and uncommon functions. It seems that we should decide case by case. This work will cause some confusion. At 2021-01-29 04:23:08, "MrPowers" wrote: > Thank you all for your amazing work on this project. Spark has a great > public

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-30 Thread Kent Yao
Hi, MrPowers. I'm also interested in this idea. I started https://github.com/yaooqinn/spark-func-extras a few months ago. On 2021/01/30 15:45:30, Matthew Powers wrote: > Maciej - I like the idea of a separate library to provide easy access to > functions that the maintainers don't want to

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-30 Thread Matthew Powers
Maciej - I like the idea of a separate library to provide easy access to functions that the maintainers don't want to merge into Spark core. I've seen this model work well in other open source communities. The Rails Active Support library provides the Ruby community with core functionality like

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-30 Thread Maciej
Just thinking out loud ‒ if there is a community need for providing language bindings for less popular SQL functions, could these live outside the main project or even outside the ASF? As long as expressions are already implemented, bindings are trivial after all. It could also allow usage of more

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Hyukjin Kwon
FYI, exposing methods with Column signature only is already documented at the top of functions.scala, and I believe that has been the current dev direction, if I am not mistaken. Another point is that we should rather expose commonly used expressions. It's best if it considers language specific

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Matthew Powers
Thanks for the thoughtful responses. I now understand why adding all the functions across all the APIs isn't the default. To Nick's point, relying on heuristics to gauge user interest, in addition to personal experience, is a good idea. The regexp_extract_all SO thread has 16,000 views

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Reynold Xin
There's another thing that's not mentioned … it's primarily a problem for Scala. Due to static typing, we need a very large number of function overloads for the Scala version of each function, whereas in SQL/Python they are just one. There's a limit on how many functions we can add, and it also

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Maciej
Just my two cents on the R side. On 1/28/21 10:00 PM, Nicholas Chammas wrote: > On Thu, Jan 28, 2021 at 3:40 PM Sean Owen wrote: > > It isn't that regexp_extract_all (for example) is useless outside > SQL, just, where do you draw the line? Supporting 10s of random

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Nicholas Chammas
On Thu, Jan 28, 2021 at 3:40 PM Sean Owen wrote: > It isn't that regexp_extract_all (for example) is useless outside SQL, > just, where do you draw the line? Supporting 10s of random SQL functions > across 3 other languages has a cost, which has to be weighed against > benefit, which we can

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Sean Owen
I think I can articulate the general idea here, though I expect it is not deployed consistently. Yes, there's a general desire to make APIs consistent across languages. Python and Scala should track pretty closely, even if R isn't really that consistent. SQL is a somewhat different case. There

[Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread MrPowers
Thank you all for your amazing work on this project. Spark has a great public interface and the source code is clean. The core team has done a great job building and maintaining this project. My emails / GitHub comments focus on the 1% that we might be able to improve. Pull requests /