Our entire team prefers 'x to $"x" or col("x"). We find this way of addressing
top-level columns to be more readable, especially when Column expressions get
complicated. It is unfortunate that the default Spark implementation requires
importing SparkSession.implicits to use this. That’s not a problem in notebook
code but becomes less convenient for *.scala code where there isn’t an
available session that can be imported at the file level. To fix this, we have:

    import org.apache.spark.sql.ColumnName

    object NoSessionImplicits {
      /**
       * An implicit conversion that turns a Scala `Symbol` into a [[Column]].
       * Useful for when there is no [[org.apache.spark.sql.SparkSession]] to
       * import from.
       */
      implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)
    }

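For illustration, here is a minimal sketch of how a helper like this might be used from a plain *.scala file. It assumes Scala 2.12 (where symbol literals such as 'id are still accepted) and a Spark 2.x/3.x dependency on the classpath. The object UserTransforms, the method adults, and the column names age, id, and name are hypothetical and exist only for this example. The helper is repeated so the sketch compiles on its own.

```scala
import scala.language.implicitConversions

import org.apache.spark.sql.{ColumnName, DataFrame}

// Same helper as above, repeated so this sketch is self-contained.
object NoSessionImplicits {
  implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)
}

// Hypothetical library code: no SparkSession is in scope at the file level,
// so `import spark.implicits._` is not an option here.
object UserTransforms {
  import NoSessionImplicits._

  // 'age, 'id and 'name are converted to ColumnName by symbolToColumn.
  // The column names are assumptions made for this example.
  def adults(df: DataFrame): DataFrame =
    df.filter('age >= 18).select('id, 'name)
}
```

Since the conversion only builds an unresolved column reference from the symbol's name, it does not need a session. Column resolution still happens against whatever DataFrame is passed in.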
A question to the group. I’m less familiar with Zeppelin & Jupyter
environments since we use Databricks. Can people familiar with these
environments opine on the ease of running a migration tool on the Scala code
snippets in notebooks?
Thanks,
Sim
Simeon Simeonov, Founder & CTO, Swoop
@simeons | blog.simeonov.com | 617.299.6746
From: Koert Kuipers <[email protected]>
Date: Sunday, March 31, 2019 at 11:18 AM
To: Rubén Berenguel <[email protected]>
Cc: Sean Owen <[email protected]>, Reynold Xin <[email protected]>, Simeon
Simeonov <[email protected]>, dev <[email protected]>
Subject: Re: Do you use single-quote syntax for the DataFrame API?
i don't care much about the symbol class but i find 'a much easier on the eye
than $"a" or "a" and we use it extensively as such in many DSLs including spark.
so it's the syntax i would like to preserve, not the class, which seems to be the
opposite of what they are suggesting.
On Sun, Mar 31, 2019 at 10:07 AM Rubén Berenguel
<[email protected]> wrote:
I favour using either $"foo" or columnar expressions, but know of several
developers who prefer single quote syntax and consider it a better practice.
R
On 31 March 2019 at 15:15:00, Sean Owen
([email protected]) wrote:
FWIW I use "foo" in PySpark or col("foo") where necessary, and $"foo" in Scala
On Sun, Mar 31, 2019 at 1:58 AM Reynold Xin
<[email protected]> wrote:
As part of evolving the Scala language, the Scala team is considering removing
single-quote syntax for representing symbols. Single-quote syntax is one of the
ways to represent a column in Spark's DataFrame API. While I personally don't
use them (I prefer just using strings for column names, or using expr
function), I see them used quite a lot by other people's code, e.g.
df.select('id, 'name).show()
I want to bring this to more people's attention, in case they are depending on
this. The discussion thread is:
https://contributors.scala-lang.org/t/proposal-to-deprecate-and-remove-symbol-literals/2953