This one has stumped the group here, hoping to get some insight into why this
error is happening.

I'm going through the Databricks DataFrames Scala docs
<https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html#04%20SQL,%20DataFrames%20%26%20Datasets/02%20Introduction%20to%20DataFrames%20-%20scala.html>
 
.  Halfway down is the "Flattening" code, which I've copied below:

*The original code*

>>>
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

implicit class DataFrameFlattener(df: DataFrame) {
  def flattenSchema: DataFrame = {
    df.select(flatten(Nil, df.schema): _*)
  }
  
  protected def flatten(path: Seq[String], schema: DataType): Seq[Column] =
schema match {
    case s: StructType => s.fields.flatMap(f => flatten(path :+ f.name,
f.dataType))
    case other => col(path.map(n =>
s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
  }
}
>>>

*Pasting into spark-shell with right-click (or pasting into Zeppelin)*


On EMR using Spark 2.0.0 (this also happens on Spark 2.0.1), running
"spark-shell", I right-click to paste in the code above.  Here are the
errors I get.  Note that I get the same errors when I paste into Zeppelin on
EMR.

>>>
scala> import org.apache.spark.sql._
import org.apache.spark.sql._

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala>

scala> implicit class DataFrameFlattener(df: DataFrame) {
     |   def flattenSchema: DataFrame = {
     |     df.select(flatten(Nil, df.schema): _*)
     |   }
     |
     |   protected def flatten(path: Seq[String], schema: DataType):
Seq[Column] = schema match {
     |     case s: StructType => s.fields.flatMap(f => flatten(path :+
f.name, f.dataType))
     |     case other => col(path.map(n =>
s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
     |   }
     | }
<console>:11: error: not found: type DataFrame
       implicit class DataFrameFlattener(df: DataFrame) {
                                             ^
<console>:12: error: not found: type DataFrame
         def flattenSchema: DataFrame = {
                            ^
<console>:16: error: not found: type Column
         protected def flatten(path: Seq[String], schema: DataType):
Seq[Column] = schema match {
                                                                         ^
<console>:16: error: not found: type DataType
         protected def flatten(path: Seq[String], schema: DataType):
Seq[Column] = schema match {
                                                          ^
<console>:17: error: not found: type StructType
           case s: StructType => s.fields.flatMap(f => flatten(path :+
f.name, f.dataType))
                   ^
<console>:18: error: not found: value col
           case other => col(path.map(n =>
s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
>>>

*Pasting using :paste in spark-shell*


However when I paste the same code into spark-shell using :paste, the code
succeeds.

>>>
scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

implicit class DataFrameFlattener(df: DataFrame) {
  def flattenSchema: DataFrame = {
    df.select(flatten(Nil, df.schema): _*)
  }

  protected def flatten(path: Seq[String], schema: DataType): Seq[Column] =
schema match {
    case s: StructType => s.fields.flatMap(f => flatten(path :+ f.name,
f.dataType))
    case other => col(path.map(n =>
s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
  }
}

// Exiting paste mode, now interpreting.

import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
defined class DataFrameFlattener
>>>


Any idea what's going on here, and how to get this code working in Zeppelin? 
One thing we've found is that providing the full paths for DataFrame,
StructType, etc. (for example org.apache.spark.sql.DataFrame) does work, but
it's a painful workaround and we don't know why the imports don't seem to be
working as usual.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Pasting-oddity-with-Spark-2-0-scala-tp28071.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Reply via email to