Re: Spark RuntimeException due to Unsupported datatype NullType
Hi Rafeeq,

I think the following part triggered the bug https://issues.apache.org/jira/browse/SPARK-2908:

[{*href:null*,rel:me}]

It has been fixed. Can you try Spark master and see if the error gets resolved?

Thanks,
Yin

On Mon, Aug 11, 2014 at 3:53 AM, rafeeq s rafeeq.ec...@gmail.com wrote:

Hi,

*Spark RuntimeException due to Unsupported datatype NullType* when saving a *jsonRDD* that contains null primitives with *.saveAsParquetFile()*.

*Code:* I am trying to store a jsonRDD into a Parquet file using *saveAsParquetFile* with the code below.

    JavaRDD<String> javaRDD = ssc.sparkContext().parallelize(jsonData);
    JavaSchemaRDD schemaObject = sqlContext.jsonRDD(javaRDD);
    schemaObject.saveAsParquetFile("tweets/tweet" + time.toString().replace(" ms", "") + ".parquet");

*Input:* The *JSON input* below has some *null values*, which Spark does not support and which cause the error *Unsupported datatype NullType*.

{id:tag:search.twitter.com,2005:11,objectType:activity,actor:{objectType:person,id:id:twitter.com:111,link:http://www.twitter.com/funtubevids,displayName:مشاهد حول العالم,postedTime:2014-05-01T06:14:51.000Z,image:https://pbs.twimg.com/profile_images/111/VORNn-Df_normal.png,*summary:null*,links:[{*href:null*,rel:me}],friendsCount:0,followersCount:49,listedCount:0,statusesCount:61,*twitterTimeZone:null*,verified:false,*utcOffset:null*,preferredUsername:funtubevids,languages:[en],favoritesCount:0},verb:post,postedTime:2014-05-27T17:33:54.000Z,generator:{displayName:web,link:http://twitter.com},provider:{objectType:service,displayName:Twitter,link:http://www.twitter.com},link:http://twitter.com/funtubevids/statuses/1,body:القيادة في مدرج الطيران #مهبط #مدرج #مطار #هبوط #قيادة #سيارة #طائرة #airport #plane #car https://t.co/gnn7LKE6pC,object:urls:[{url:https://t.co/gnn7LKE6pC,expanded_url:https://www.youtube.com/watch?v=J-j6RSRMvRo,expanded_status:200}],klout_score:10,language:{value:ar}}}

*ERROR* scheduler.JobScheduler: Error running job streaming job 140774119 ms.0
*java.lang.RuntimeException: Unsupported datatype NullType*
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:267)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:243)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:235)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:243)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$3.apply(ParquetTypes.scala:287)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$3.apply(ParquetTypes.scala:286)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105
Spark RuntimeException due to Unsupported datatype NullType
Hi,

*Spark RuntimeException due to Unsupported datatype NullType* when saving a *jsonRDD* that contains null primitives with *.saveAsParquetFile()*.

*Code:* I am trying to store a jsonRDD into a Parquet file using *saveAsParquetFile* with the code below.

    JavaRDD<String> javaRDD = ssc.sparkContext().parallelize(jsonData);
    JavaSchemaRDD schemaObject = sqlContext.jsonRDD(javaRDD);
    schemaObject.saveAsParquetFile("tweets/tweet" + time.toString().replace(" ms", "") + ".parquet");

*Input:* The *JSON input* below has some *null values*, which Spark does not support and which cause the error *Unsupported datatype NullType*.

{id:tag:search.twitter.com,2005:11,objectType:activity,actor:{objectType:person,id:id:twitter.com:111,link:http://www.twitter.com/funtubevids,displayName:مشاهد حول العالم,postedTime:2014-05-01T06:14:51.000Z,image:https://pbs.twimg.com/profile_images/111/VORNn-Df_normal.png,*summary:null*,links:[{*href:null*,rel:me}],friendsCount:0,followersCount:49,listedCount:0,statusesCount:61,*twitterTimeZone:null*,verified:false,*utcOffset:null*,preferredUsername:funtubevids,languages:[en],favoritesCount:0},verb:post,postedTime:2014-05-27T17:33:54.000Z,generator:{displayName:web,link:http://twitter.com},provider:{objectType:service,displayName:Twitter,link:http://www.twitter.com},link:http://twitter.com/funtubevids/statuses/1,body:القيادة في مدرج الطيران #مهبط #مدرج #مطار #هبوط #قيادة #سيارة #طائرة #airport #plane #car https://t.co/gnn7LKE6pC,object:urls:[{url:https://t.co/gnn7LKE6pC,expanded_url:https://www.youtube.com/watch?v=J-j6RSRMvRo,expanded_status:200}],klout_score:10,language:{value:ar}}}

*ERROR* scheduler.JobScheduler: Error running job streaming job 140774119 ms.0
*java.lang.RuntimeException: Unsupported datatype NullType*
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:267)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:243)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:235)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$2.apply(ParquetTypes.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:243)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$3.apply(ParquetTypes.scala:287)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$3.apply(ParquetTypes.scala:286)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypes.scala:285)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetTypes.scala:331)
    at org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:133)
    at org.apache.spark.sql.parquet.ParquetRelation$.create
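
For builds that do not yet include the fix for SPARK-2908, one possible workaround (an assumption, not something proposed in this thread) is to drop null-valued fields from each record before it reaches jsonRDD, so that schema inference should not see a field such as href whose only value is null. The sketch below is plain Java with no Spark dependency and works on an already parsed record (nested Map and List values). The class NullFieldStripper and its methods are hypothetical names introduced for illustration, and parsing and re-serializing the JSON is left to whichever JSON library is in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: removes null-valued fields
// from a parsed JSON record represented as nested Maps and Lists.
final class NullFieldStripper {

    private NullFieldStripper() {}

    // Returns a copy of the JSON object without null-valued fields, recursively.
    static Map<String, Object> stripObject(Map<String, Object> record) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            Object value = entry.getValue();
            if (value != null) {
                cleaned.put(entry.getKey(), stripValue(value));
            }
        }
        return cleaned;
    }

    // Returns a copy of the JSON array without null elements, cleaning nested values.
    static List<Object> stripArray(List<?> items) {
        List<Object> cleaned = new ArrayList<>();
        for (Object item : items) {
            if (item != null) {
                cleaned.add(stripValue(item));
            }
        }
        return cleaned;
    }

    // Recurses into objects and arrays; scalars are returned unchanged.
    @SuppressWarnings("unchecked")
    private static Object stripValue(Object value) {
        if (value instanceof Map) {
            return stripObject((Map<String, Object>) value);
        }
        if (value instanceof List) {
            return stripArray((List<?>) value);
        }
        return value;
    }
}
```

Applied to the actor object above, this would remove summary, twitterTimeZone, utcOffset, and the href entry inside links, leaving links as [{rel:me}]. A field that is null in every record then disappears from the inferred schema entirely, which may or may not be acceptable for downstream queries.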