Miklos Christine created SPARK-10848:
----------------------------------------
Summary: Applied JSON Schema Works for json RDD but not when loading json file
Key: SPARK-10848
URL: https://issues.apache.org/jira/browse/SPARK-10848
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: Miklos Christine
Priority: Minor

Using a defined schema to load a JSON RDD works as expected, but loading the same JSON records from a file does not apply the supplied schema. Specifically, the nullable flag is not honored: reading from a file reports nullable=true on every field regardless of the schema that was passed in.

Code to reproduce:
{code}
import org.apache.spark.sql.types._

val jsonRdd = sc.parallelize(List(
  """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
  """{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}"""))

val mySchema = StructType(Array(
  StructField(name = "OrderID", dataType = LongType, nullable = false),
  StructField("CustomerID", IntegerType, false),
  StructField("OrderDate", DateType, false),
  StructField("ProductCode", StringType, false),
  StructField("Qty", IntegerType, false),
  StructField("Discount", FloatType, true),
  StructField("expressDelivery", BooleanType, true)))

// Schema applied to an RDD of JSON strings: nullability matches mySchema.
val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
myDF.printSchema

// Same schema applied when reading from a file: every field comes back nullable.
val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
dfDFfromFile.printSchema
{code}

Orders.json:
{code}
{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}
{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}
{code}

The behavior should be consistent between the two code paths.
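A quick way to see the discrepancy is to inspect the nullable flag of each field programmatically rather than eyeballing printSchema. The sketch below is not part of the original report: it assumes the myDF, dfDFfromFile, and mySchema values from the reproduction code above and the Spark 1.5 sqlContext API, and it shows one possible workaround of re-wrapping the file-based rows with the intended schema.
{code}
// Sketch: compare the nullable flag of each field in the two DataFrames.
// Assumes myDF, dfDFfromFile, and mySchema from the reproduction code above.
myDF.schema.fields.foreach(f => println(s"rdd  ${f.name}: nullable=${f.nullable}"))
dfDFfromFile.schema.fields.foreach(f => println(s"file ${f.name}: nullable=${f.nullable}"))

// Possible workaround (untested sketch): re-apply the intended schema to the
// rows read from the file so the nullable flags are preserved in the metadata.
val reappliedDF = sqlContext.createDataFrame(dfDFfromFile.rdd, mySchema)
reappliedDF.printSchema
{code}
Note that re-applying the schema this way only changes the reported metadata; it does not itself validate that the non-nullable columns are actually free of nulls.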