[ https://issues.apache.org/jira/browse/SPARK-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263867#comment-16263867 ]
Amit edited comment on SPARK-10848 at 11/23/17 6:22 AM:
--------------------------------------------------------

This issue still persists in Spark 2.1.0. I tried the steps below in Spark 2.1.0, and they give the same result as described in the original report. Please reopen the JIRA so that it is tracked.

{code:java}
import org.apache.spark.sql.types._
{code}

{code:java}
val jsonRdd = sc.parallelize(List(
  """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
  """{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}"""))
{code}

{code:java}
val mySchema = StructType(Array(
  StructField(name="OrderID" , dataType=LongType, nullable=false),
  StructField("CustomerID", IntegerType, false),
  StructField("OrderDate", DateType, false),
  StructField("ProductCode", StringType, false),
  StructField("Qty", IntegerType, false),
  StructField("Discount", FloatType, true),
  StructField("expressDelivery", BooleanType, true)
))

// Schema applied to an in-memory RDD of JSON strings
val myDF = spark.read.schema(mySchema).json(jsonRdd)
val schema1 = myDF.printSchema

// Same schema applied to a JSON file on disk
val dfDFfromFile = spark.read.schema(mySchema).json("csvdatatest/Orders.json")
val schema2 = dfDFfromFile.printSchema
{code}

was (Author: amit1990):
This issue is still persistent in Spark 2.1.0. I tried below steps and in Spark 2.1.0, it giving the same result as in the question, Please reopen the JIRA to get it tracked.
import org.apache.spark.sql.types._

{code:java}
val jsonRdd = sc.parallelize(List(
  """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
  """{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}"""))
{code}

{code:java}
val mySchema = StructType(Array(
  StructField(name="OrderID" , dataType=LongType, nullable=false),
  StructField("CustomerID", IntegerType, false),
  StructField("OrderDate", DateType, false),
  StructField("ProductCode", StringType, false),
  StructField("Qty", IntegerType, false),
  StructField("Discount", FloatType, true),
  StructField("expressDelivery", BooleanType, true)
))
val myDF = spark.read.schema(mySchema).json(jsonRdd)
val schema1 = myDF.printSchema
val dfDFfromFile = spark.read.schema(mySchema).json("csvdatatest/Orders.json")
val schema2 = dfDFfromFile.printSchema
{code}

> Applied JSON Schema Works for json RDD but not when loading json file
> ---------------------------------------------------------------------
>
>                 Key: SPARK-10848
>                 URL: https://issues.apache.org/jira/browse/SPARK-10848
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Miklos Christine
>            Priority: Minor
>
> Using a defined schema to load a json rdd works as expected. Loading the json
> records from a file does not apply the supplied schema. Mainly the nullable
> field isn't applied correctly. Loading from a file uses nullable=true on all
> fields regardless of applied schema.
> Code to reproduce:
> {code}
> import org.apache.spark.sql.types._
> val jsonRdd = sc.parallelize(List(
>   """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
>   """{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}"""))
> val mySchema = StructType(Array(
>   StructField(name="OrderID" , dataType=LongType, nullable=false),
>   StructField("CustomerID", IntegerType, false),
>   StructField("OrderDate", DateType, false),
>   StructField("ProductCode", StringType, false),
>   StructField("Qty", IntegerType, false),
>   StructField("Discount", FloatType, true),
>   StructField("expressDelivery", BooleanType, true)
> ))
> val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
> val schema1 = myDF.printSchema
> val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
> val schema2 = dfDFfromFile.printSchema
> {code}
> Orders.json
> {code}
> {"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}
> {"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}
> {code}
> The behavior should be consistent.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
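
The report above compares the two schemas by reading the printSchema output by eye. A minimal sketch of how the nullable flags could be compared programmatically is given below, assuming Spark 2.x with a local SparkSession. The object name NullableCheck, the helper nullableMismatches, the trimmed four-field schema, and the temporary file are all hypothetical and used only for illustration; they are not part of the report. The last step shows one possible workaround (rebuilding the DataFrame from its RDD with the intended schema); it is not presented as the fix adopted for this issue.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object NullableCheck {

  // Hypothetical helper: names of the fields whose nullable flag in `actual`
  // differs from `requested`. Assumes both schemas list the same fields in
  // the same order, which holds when `actual` came from read.schema(requested).
  def nullableMismatches(requested: StructType, actual: StructType): Seq[String] =
    requested.fields.toSeq.zip(actual.fields.toSeq).collect {
      case (want, got) if want.nullable != got.nullable => want.name
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-10848-check")
      .getOrCreate()

    // Trimmed version of the schema from the report (illustrative subset).
    val mySchema = StructType(Array(
      StructField("OrderID", LongType, nullable = false),
      StructField("CustomerID", IntegerType, nullable = false),
      StructField("Qty", IntegerType, nullable = false),
      StructField("Discount", FloatType, nullable = true)
    ))

    // Write a small JSON-lines file so the example does not depend on Orders.json.
    val path = Files.createTempFile("Orders", ".json")
    val records = Seq(
      """{"OrderID": 1, "CustomerID": 452, "Qty": 5}""",
      """{"OrderID": 2, "CustomerID": 16, "Qty": 10, "Discount": 0.25}"""
    )
    Files.write(path, records.mkString("\n").getBytes("UTF-8"))

    // Load from file with the schema and list fields whose nullability changed.
    val fromFile = spark.read.schema(mySchema).json(path.toString)
    println("Mismatches after file load: " +
      nullableMismatches(mySchema, fromFile.schema))

    // Possible workaround: rebuild the DataFrame from its rows with the
    // intended schema, so that the declared nullable flags are kept.
    val reapplied = spark.createDataFrame(fromFile.rdd, mySchema)
    println("Mismatches after re-applying: " +
      nullableMismatches(mySchema, reapplied.schema))

    spark.stop()
  }
}
```

Note that declaring a field non-nullable this way is a promise about the data rather than a cleaning step: if the file does contain nulls in such a field, runtime errors or incorrect results are to be expected.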