[jira] [Updated] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)
[ https://issues.apache.org/jira/browse/SPARK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-9032:
-----------------------------
    Assignee: Josh Rosen

> scala.MatchError in DataFrameReader.json(String path)
> ------------------------------------------------------
>
>                 Key: SPARK-9032
>                 URL: https://issues.apache.org/jira/browse/SPARK-9032
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, SQL
>    Affects Versions: 1.4.0
>         Environment: Ubuntu 15.04
>            Reporter: Philipp Poetter
>            Assignee: Josh Rosen
>             Fix For: 1.4.1
>
>
> Calling read().json() on an SQLContext (i.e. DataFrameReader.json) raises a
> MatchError while trying to read JSON data. The stack trace is as follows:
> {code}
> 15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
> Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
>     at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
>     at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
>     at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
>     at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
>     at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>     at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
>     at com.hp.sparkdemo.Example.main(Example.java:23)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
> 15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
> 15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
> 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
> 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
> 15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> {code}
> Offending code snippet (around line 23):
> {code}
> JavaSparkContext sctx = new JavaSparkContext(sparkConf);
> SQLContext ctx = new SQLContext(sctx);
> DataFrame frame = ctx.read().json(facebookJSON);
> frame.printSchema();
> {code}
> The exception is reproducible using the following JSON:
> {code}
> {
>   "data": [
>     {
>       "id": "X999_Y999",
>       "from": {
>         "name": "Tom Brady", "id": "X12"
>       },
>       "message": "Looking forward to 2010!",
>       "actions": [
>         {
>           "name": "Comment",
>           "link": "http://www.facebook.com/X999/posts/Y999"
>         },
>         {
>           "name": "Like",
>           "link": "http://www.facebook.com/X999/posts/Y999"
>         }
>       ],
>       "type": "status",
>       "created_time": "2010-08-02T21:27:44+",
>       "updated_time": "2010-08-02T21:27:44+"
>     },
>     {
>       "id": "X998_Y998",
>       "from": {
>         "name": "Peyton Manning", "id": "X18"
>       },
>       "message": "Where's my contract?",
>       "actions": [
>         {
>           "name": "Comment",
>           "link": "http://www.facebook.com/X998/posts/Y998"
>         },
>         {
>           "name": "Like",
>           "link": "http://www.facebook.com/X998/posts/Y998"
>         }
>       ],
>       "type": "status",
>       "created_time": "2010-08-02T21:27:44+",
>       "updated_time": "2010-08-02T21:27:44+"
>     }
>   ]
> }
> {code}

--
This message was sent by Atlassian JIRA
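
A note on the reproduction: the JSON in the report is pretty-printed across many lines, while Spark SQL's JSON source expects each line of the input file to hold one separate, self-contained JSON object. It seems plausible, though the thread does not confirm it, that the MatchError on 1.4.0 is tied to this mismatch. The sketch below is only an illustration of how such a document could be collapsed onto a single line before being written to the file handed to ctx.read().json(...). The class name JsonLineCollapser and the method collapse are hypothetical helpers, not part of Spark, and whether this actually avoids the exception on 1.4.0 is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: collapses a pretty-printed JSON
// document onto one line so that a line-oriented reader sees one record.
final class JsonLineCollapser {

    // A line break together with any whitespace directly around it.
    // Valid JSON cannot contain a raw line break inside a string literal,
    // so every line break matched here lies between tokens.
    private static final Pattern LINE_BREAK = Pattern.compile("\\s*\\R\\s*");

    // Returns the input with all line breaks and surrounding indentation
    // removed. Assumes the input is valid JSON; it is not validated here.
    static String collapse(String prettyJson) {
        return LINE_BREAK.matcher(prettyJson).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the document in the report.
        String pretty =
              "{\n"
            + "  \"data\": [\n"
            + "    {\"id\": \"X999_Y999\"}\n"
            + "  ]\n"
            + "}\n";
        // prints {"data": [{"id": "X999_Y999"}]}
        System.out.println(collapse(pretty));
    }
}
```

The collapsed string would then be written as a single line of the input file. Spaces inside a line, including those inside string values, are left untouched.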
[jira] [Updated] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)
[ https://issues.apache.org/jira/browse/SPARK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-9032:
------------------------------
    Description:
Calling read().json() on an SQLContext (i.e. DataFrameReader.json) raises a MatchError while trying to read JSON data. The stack trace is as follows:
{code}
15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
    at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
    at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
    at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
    at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
    at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
    at com.hp.sparkdemo.Example.main(Example.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
{code}
Offending code snippet (around line 23):
{code}
JavaSparkContext sctx = new JavaSparkContext(sparkConf);
SQLContext ctx = new SQLContext(sctx);
DataFrame frame = ctx.read().json(facebookJSON);
frame.printSchema();
{code}
The exception is reproducible using the following JSON:
{code}
{
  "data": [
    {
      "id": "X999_Y999",
      "from": {
        "name": "Tom Brady", "id": "X12"
      },
      "message": "Looking forward to 2010!",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X999/posts/Y999"
        },
        {
          "name": "Like",
          "link": "http://www.facebook.com/X999/posts/Y999"
        }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+",
      "updated_time": "2010-08-02T21:27:44+"
    },
    {
      "id": "X998_Y998",
      "from": {
        "name": "Peyton Manning", "id": "X18"
      },
      "message": "Where's my contract?",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X998/posts/Y998"
        },
        {
          "name": "Like",
          "link": "http://www.facebook.com/X998/posts/Y998"
        }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+",
      "updated_time": "2010-08-02T21:27:44+"
    }
  ]
}
{code}

  was:
Calling read().json() on an SQLContext (i.e. DataFrameReader.json) raises a MatchError while trying to read JSON data. The stack trace is as follows:
15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
    at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
    at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
    at