Miklos Christine created SPARK-10848:
----------------------------------------

             Summary: Applied JSON schema works for JSON RDD but not when loading a JSON file
                 Key: SPARK-10848
                 URL: https://issues.apache.org/jira/browse/SPARK-10848
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Miklos Christine
            Priority: Minor


Loading a JSON RDD with a user-defined schema works as expected. Loading the 
same JSON records from a file does not apply the supplied schema: in 
particular, the nullable flag is ignored, and every field comes back with 
nullable=true regardless of the schema that was supplied. 

Code to reproduce:
{code}
import org.apache.spark.sql.types._

val jsonRdd = sc.parallelize(List(
  """{"OrderID": 1, "CustomerID": 452, "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
  """{"OrderID": 2, "CustomerID": 16, "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount": 0.25, "expressDelivery": true}"""))

val mySchema = StructType(Array(
  StructField("OrderID", LongType, false),
  StructField("CustomerID", IntegerType, false),
  StructField("OrderDate", DateType, false),
  StructField("ProductCode", StringType, false),
  StructField("Qty", IntegerType, false),
  StructField("Discount", FloatType, true),
  StructField("expressDelivery", BooleanType, true)
))

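// Schema applied to an RDD[String] of JSON records: works as expected.
val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
myDF.printSchema()

// Same schema applied when reading from a file: all fields show nullable = true.
val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
dfDFfromFile.printSchema()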
{code}

Orders.json
{code}
{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}
{"OrderID": 2, "CustomerID":16  , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}
{code}

The behavior should be consistent. 
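
To check the mismatch programmatically instead of comparing printSchema output by eye, something like the following sketch might help. The helper name nullableDiff is hypothetical (it is not part of Spark); it only relies on StructType.fields and StructField.nullable, and reports each field whose nullable flag differs from the supplied schema.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): returns (fieldName, expectedNullable, actualNullable)
// for every field in `expected` whose nullable flag differs in `actual`.
// Fields that are missing from `actual` are skipped.
def nullableDiff(expected: StructType, actual: StructType): Seq[(String, Boolean, Boolean)] = {
  // Index the actual schema's nullable flags by field name.
  val actualByName: Map[String, Boolean] =
    actual.fields.map(f => f.name -> f.nullable).toMap

  expected.fields.toSeq.flatMap { f =>
    actualByName.get(f.name)
      .filter(_ != f.nullable)
      .map(actualNullable => (f.name, f.nullable, actualNullable))
  }
}
```

Called with the supplied schema and the schema of the DataFrame loaded from Orders.json, it should list the fields declared with nullable=false if the behavior described above holds; called with the schema of the RDD-based DataFrame, it should return an empty sequence.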


