One more thing about question 1. Once you get the SchemaRDD from jsonFile/jsonRDD, you can use CAST(columnName as DATE) in your query to cast the column from StringType to DateType (the strings must be in the format "yyyy-[m]m-[d]d", and you need to use a HiveContext). Here is a code snippet that may help:
// sc is an existing SparkContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val schemaRDD = hiveContext.jsonFile(...)
schemaRDD.registerTempTable("jsonTable")
// replace columnName with the name of your string column
hiveContext.sql("SELECT CAST(columnName as DATE) FROM jsonTable")

Thanks,
Yin

On Tue, Oct 21, 2014 at 8:00 PM, Yin Huai <huaiyin....@gmail.com> wrote:

> Hello Tridib,
>
> I just saw this one.
>
> 1. Right now, jsonFile and jsonRDD do not detect the date type. Currently,
> IntegerType, LongType, DoubleType, DecimalType, StringType, BooleanType,
> StructType and ArrayType are detected automatically.
> 2. Inferring the schema makes one pass over the entire dataset, so you
> will see a job launched. Applying a specific schema to a dataset does
> not have this cost.
> 3. It is hard to comment on it without seeing your implementation. For our
> built-in JSON support, jsonFile and jsonRDD provide a very convenient way
> to work with JSON datasets with SQL. You do not need to define the schema
> in advance, and Spark SQL will automatically create the SchemaRDD for your
> dataset. You can start querying it with SQL simply by registering the
> returned SchemaRDD as a temp table. Regarding the implementation, we use a
> high-performance JSON library (Jackson, https://github.com/FasterXML/jackson)
> to parse JSON records.
>
> Thanks,
>
> Yin
>
> On Mon, Oct 20, 2014 at 10:56 PM, tridib <tridib.sama...@live.com> wrote:
>
>> Hi Spark SQL team,
>> I am trying to explore automatic schema detection for JSON documents. I
>> have a few questions:
>> 1. What should the date format be for fields to be detected as date type?
>> 2. Is automatic schema inference slower than applying a specific schema?
>> 3. At the moment I am parsing the JSON myself using a map function and
>> creating a SchemaRDD from the parsed JavaRDD. Is there any performance
>> impact from not using the built-in jsonFile()?
>>
>> Thanks
>> Tridib
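The "yyyy-[m]m-[d]d" format mentioned for CAST(columnName as DATE) is the same pattern documented for the JDK method java.sql.Date.valueOf. As a minimal sketch, assuming the cast accepts the same strings as that method (this thread does not confirm the exact implementation), the plain Scala program below (no Spark needed) checks which sample strings would parse. The object DateCastCheck and the helper parseJdbcDate are hypothetical names, not part of Spark.

```scala
import java.sql.Date
import scala.util.Try

// Hypothetical helper, not part of Spark: checks whether a string is in the
// "yyyy-[m]m-[d]d" form accepted by java.sql.Date.valueOf.
object DateCastCheck {
  // Some(date) when the string parses; None when valueOf throws
  // IllegalArgumentException (Try catches it as a non-fatal exception).
  def parseJdbcDate(s: String): Option[Date] =
    Try(Date.valueOf(s)).toOption

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "2014-10-21",          // full yyyy-mm-dd
      "2014-1-5",            // leading zeros omitted, still accepted
      "2014/10/21",          // wrong separator, rejected
      "10-21-2014",          // year not first, rejected
      "2014-10-21T20:00:00"  // trailing time component, rejected
    )
    samples.foreach { s =>
      val shown = parseJdbcDate(s).map(_.toString).getOrElse("null")
      println(s"$s -> $shown")
    }
  }
}
```

Running DateCastCheck.main prints each sample next to its parsed value, or "null" when it does not parse. Delegating to the JDK method instead of a hand-written regex keeps the check aligned with the documented JDBC date escape format.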