Hi Mich,

That's because each df2 is only in scope inside the if/else branch that declares it.

Try this:

val df = option match {
  case 1 => {
    println("option = 1")
    val df = spark.read.option("header", false).csv("hdfs://rhes564:9000/data/prices/prices.*")
    val df2 = df.map(p => columns(p(0).toString.toInt, p(1).toString, p(2).toString, p(3).toString))
    df2  // the last expression is the value of this case
  }
  case 2 => spark.table("test.marketData").select('TIMECREATED, 'SECURITY, 'PRICE)
  case 3 => spark.table("test.marketDataParquet").select('TIMECREATED, 'SECURITY, 'PRICE)
  case _ => sys.error("no valid option provided")
}

df.printSchema()
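
The same approach works if you prefer to keep if/else, since in Scala it is an expression too. Here is a minimal Spark-free sketch of the idea; the describe function and the string labels are made up for illustration, and in your case each branch would end with the DataFrame instead:

```scala
// A val declared inside a branch is local to that branch, but the branch's
// last expression becomes the value of the whole if/else expression.
def describe(option: Int): String = {
  val source =
    if (option == 1) {
      val label = "text file"  // visible only inside this block
      label                    // last expression is the block's value
    } else if (option == 2) {
      "ORC table"
    } else if (option == 3) {
      "Parquet table"
    } else {
      // sys.error throws a RuntimeException; its type is Nothing,
      // so it is compatible with the String branches above
      sys.error(s"no valid option provided: $option")
    }
  source  // in scope here because it was bound outside the branches
}

println(describe(1))
```

That way there is no need for a stub value or a var declared up front.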


Thanks,
Silvio

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Saturday, September 17, 2016 at 4:18 PM
To: "user @spark" <user@spark.apache.org>
Subject: DataFrame defined within conditional IF ELSE statement

In Spark 2 this gives me an error in a conditional IF ELSE statement.

I recall seeing the same in standard SQL

I am testing different sources (text file, ORC or Parquet); which one is read 
depends on the value of var option.

I wrote this

import org.apache.spark.sql.functions._
import java.util.Calendar
import org.joda.time._
var option = 1
val today = new DateTime()
val minutes = -15
val minutesago = today.plusMinutes(minutes).toString.substring(11,19)
val date = java.time.LocalDate.now.toString
val hour = java.time.LocalTime.now.toString
case class columns(INDEX: Int, TIMECREATED: String, SECURITY: String, PRICE: String)

if (option == 1) {
    println("option = 1")
    val df = spark.read.option("header", false).csv("hdfs://rhes564:9000/data/prices/prices.*")
    val df2 = df.map(p => columns(p(0).toString.toInt, p(1).toString, p(2).toString, p(3).toString))
    df2.printSchema
} else if (option == 2) {
    val df2 = spark.table("test.marketData").select('TIMECREATED, 'SECURITY, 'PRICE)
} else if (option == 3) {
    val df2 = spark.table("test.marketDataParquet").select('TIMECREATED, 'SECURITY, 'PRICE)
} else {
    println("no valid option provided")
    sys.exit(0)
}

With option 1 selected it goes through and shows this

option = 1
root
 |-- INDEX: integer (nullable = true)
 |-- TIMECREATED: string (nullable = true)
 |-- SECURITY: string (nullable = true)
 |-- PRICE: string (nullable = true)

But when I call df2.printSchema OUTSIDE the IF ELSE block, it comes back with 
an error:

scala> df2.printSchema
<console>:31: error: not found: value df2
       df2.printSchema
       ^
I can define a stub df2 before the IF ELSE statement. Is that the best way of 
dealing with it?

Thanks


Dr Mich Talebzadeh




