Simeon Simeonov created SPARK-26696:
---------------------------------------
Summary: Dataset encoder should be publicly accessible
Key: SPARK-26696
URL: https://issues.apache.org/jira/browse/SPARK-26696
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Simeon Simeonov

As a platform, Spark should enable framework developers to accomplish outside the Spark codebase much of what can be accomplished inside it. One obstacle to this is a historical pattern of excessive data hiding in Spark, e.g., {{expr}} in {{Column}} not being accessible. This issue is an example of that pattern as it applies to {{Dataset}}.

Consider a transformation with the signature {{def foo[A](ds: Dataset[A]): Dataset[A]}} whose implementation needs to call {{toDF()}}. Getting back to {{Dataset[A]}} then requires calling {{.as[A]}}, which in turn requires an implicit {{Encoder[A]}}. A naive approach would change the signature to {{foo[A : Encoder]}}, but this is poor API design: it forces implicits to be carried unnecessarily from user code into framework code. We know an {{Encoder[A]}} exists because we have an instance of {{Dataset[A]}}, but that instance's {{encoder}} is not accessible.

The solution is simple: make {{encoder}} a {{@transient val}}, just as is the case with {{queryExecution}}.
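To make the argument concrete without a Spark dependency, here is a minimal sketch that models it in plain Scala. {{Enc}}, {{Frame}}, {{TypedSet}} and {{EncoderDemo}} are hypothetical stand-ins for {{Encoder}}, {{DataFrame}} and {{Dataset}}; they are not Spark APIs. The sketch assumes the proposal is adopted, i.e. the encoder is exposed as a {{val}}, and shows that {{foo}} then keeps the context-bound-free signature from the issue.

```scala
// A dependency-free model of the SPARK-26696 argument. Enc, Frame and
// TypedSet are hypothetical stand-ins for Encoder, DataFrame and Dataset;
// they are not Spark APIs.
object EncoderDemo {
  // Stand-in for an encoder: evidence of how to (de)serialize a T.
  trait Enc[T] { def name: String }

  implicit val intEnc: Enc[Int] = new Enc[Int] { def name: String = "int" }

  // Stand-in for an untyped DataFrame: rows with no static element type.
  final case class Frame(rows: Vector[Any]) {
    // Like Dataset.as[U : Encoder]: returning to a typed view needs an encoder.
    def as[T](implicit enc: Enc[T]): TypedSet[T] =
      new TypedSet(rows.map(_.asInstanceOf[T]), enc)
  }

  // Models the proposal: the encoder is a public val. If it were a bare
  // constructor parameter (`enc: Enc[T]`), `ds.encoder` below would not
  // compile and foo would need an `A: Enc` context bound instead.
  final class TypedSet[T](val data: Vector[T], val encoder: Enc[T]) {
    def toFrame: Frame = Frame(data)
  }

  // Framework code with the signature from the issue: no context bound.
  def foo[A](ds: TypedSet[A]): TypedSet[A] = {
    implicit val enc: Enc[A] = ds.encoder // recovered from the dataset itself
    val df = ds.toFrame
    // An arbitrary untyped step (reversing row order), then back to A.
    Frame(df.rows.reverse).as[A]
  }
}
```

For example, {{EncoderDemo.foo(new EncoderDemo.TypedSet(Vector(1, 2, 3), EncoderDemo.intEnc))}} returns a {{TypedSet[Int]}} whose {{data}} is {{Vector(3, 2, 1)}}, and the caller never passes an encoder to {{foo}}. Under the proposal, the same two steps (bind {{ds.encoder}} as an implicit, then {{toDF()}} ... {{.as[A]}}) would apply to a real {{Dataset[A]}}.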