[ https://issues.apache.org/jira/browse/SPARK-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-10848.
-----------------------------
    Resolution: Not A Problem

> Applied JSON Schema Works for json RDD but not when loading json file
> ---------------------------------------------------------------------
>
>                 Key: SPARK-10848
>                 URL: https://issues.apache.org/jira/browse/SPARK-10848
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Miklos Christine
>            Priority: Minor
>
> Using a defined schema to load a JSON RDD works as expected. Loading the same
> JSON records from a file does not apply the supplied schema. In particular, the
> nullable flag isn't applied correctly: loading from a file sets nullable=true on
> all fields, regardless of the applied schema.
> Code to reproduce:
> {code}
> import org.apache.spark.sql.types._
>
> val jsonRdd = sc.parallelize(List(
>   """{"OrderID": 1, "CustomerID": 452, "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
>   """{"OrderID": 2, "CustomerID": 16, "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount": 0.25, "expressDelivery": true}"""))
>
> val mySchema = StructType(Array(
>   StructField(name = "OrderID", dataType = LongType, nullable = false),
>   StructField("CustomerID", IntegerType, false),
>   StructField("OrderDate", DateType, false),
>   StructField("ProductCode", StringType, false),
>   StructField("Qty", IntegerType, false),
>   StructField("Discount", FloatType, true),
>   StructField("expressDelivery", BooleanType, true)
> ))
>
> // Schema applied to an in-memory RDD[String]
> val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
> myDF.printSchema()
>
> // Same schema applied when reading the records from a file
> val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
> dfDFfromFile.printSchema()
> {code}
> Orders.json (one JSON object per line):
> {code}
> {"OrderID": 1, "CustomerID": 452, "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}
> {"OrderID": 2, "CustomerID": 16, "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount": 0.25, "expressDelivery": true}
> {code}
> The behavior should be consistent between the RDD and file code paths.
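Comparing the two `printSchema` outputs by eye works, but the difference the report describes can also be checked programmatically. The sketch below is a hypothetical helper (`nullabilityMismatches` is not a Spark API) that uses only the public `StructType`/`StructField` classes to list every field whose `nullable` flag differs between the requested schema and the schema a DataFrame actually ended up with.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: for each field in `requested`,
// report it if the field with the same name in `actual` has a different
// nullable flag. Fields missing from `actual` are ignored.
def nullabilityMismatches(requested: StructType, actual: StructType): Seq[String] = {
  val actualByName: Map[String, Boolean] =
    actual.fields.map(f => f.name -> f.nullable).toMap
  requested.fields.toSeq.collect {
    case f if actualByName.get(f.name).exists(_ != f.nullable) =>
      s"${f.name}: requested nullable=${f.nullable}, got nullable=${actualByName(f.name)}"
  }
}

// Usage in the same spark-shell session as the reproduction above:
// nullabilityMismatches(mySchema, myDF.schema)
// nullabilityMismatches(mySchema, dfDFfromFile.schema)
```

Per the report, the first call should come back empty, while the second should list the five fields declared with `nullable = false` (OrderID, CustomerID, OrderDate, ProductCode, Qty), since the file-based read marks every field nullable.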