[jira] [Updated] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)

2015-09-16 Thread Sean Owen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-9032:
-
Assignee: Josh Rosen

> scala.MatchError in DataFrameReader.json(String path)
> -
>
> Key: SPARK-9032
> URL: https://issues.apache.org/jira/browse/SPARK-9032
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, SQL
>Affects Versions: 1.4.0
> Environment: Ubuntu 15.04
>Reporter: Philipp Poetter
>Assignee: Josh Rosen
> Fix For: 1.4.1
>
>
> Calling read().json() on an SQLContext (i.e. DataFrameReader.json()) raises a
> MatchError with the following stack trace while trying to read JSON data:
> {code}
> 15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
> Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
>   at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
>   at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
>   at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
>   at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
>   at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>   at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
>   at com.hp.sparkdemo.Example.main(Example.java:23)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
> 15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
> 15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
> 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
> 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
> 15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> {code}
> Offending code snippet (around line 23):
> {code}
> JavaSparkContext sctx = new JavaSparkContext(sparkConf);
> SQLContext ctx = new SQLContext(sctx);
> DataFrame frame = ctx.read().json(facebookJSON);
> frame.printSchema();
> {code}
> The exception is reproducible using the following JSON:
> {code}
> {
>   "data": [
>     {
>       "id": "X999_Y999",
>       "from": {
>         "name": "Tom Brady", "id": "X12"
>       },
>       "message": "Looking forward to 2010!",
>       "actions": [
>         {
>           "name": "Comment",
>           "link": "http://www.facebook.com/X999/posts/Y999"
>         },
>         {
>           "name": "Like",
>           "link": "http://www.facebook.com/X999/posts/Y999"
>         }
>       ],
>       "type": "status",
>       "created_time": "2010-08-02T21:27:44+",
>       "updated_time": "2010-08-02T21:27:44+"
>     },
>     {
>       "id": "X998_Y998",
>       "from": {
>         "name": "Peyton Manning", "id": "X18"
>       },
>       "message": "Where's my contract?",
>       "actions": [
>         {
>           "name": "Comment",
>           "link": "http://www.facebook.com/X998/posts/Y998"
>         },
>         {
>           "name": "Like",
>           "link": "http://www.facebook.com/X998/posts/Y998"
>         }
>       ],
>       "type": "status",
>       "created_time": "2010-08-02T21:27:44+",
>       "updated_time": "2010-08-02T21:27:44+"
>     }
>   ]
> }
> {code}
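The Spark SQL programming guide for the 1.x releases notes that a file given to the JSON data source is not a typical JSON file: each line must contain a separate, self-contained JSON object. The reproducing document in this report is pretty-printed over many lines, so it does not meet that requirement. Independently of the fix recorded for 1.4.1, one possible workaround is to collapse such a document onto a single line before passing it to json(). Below is a minimal Java sketch of that idea, assuming the input is one well-formed JSON document; the class JsonCollapse and the method collapseToSingleLine are hypothetical names, not part of Spark. It drops whitespace outside string literals and copies string contents, including escaped quotes, unchanged.

```java
/**
 * Hypothetical helper, not part of Spark: collapses a pretty-printed JSON
 * document onto one line so it can be stored as a single JSON Lines record.
 * Assumes the input is a single well-formed JSON document.
 */
final class JsonCollapse {

    private JsonCollapse() {}

    static String collapseToSingleLine(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false; // currently inside a "..." string literal
        boolean escaped = false;  // previous character was a backslash inside a string

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Copy string contents verbatim, including spaces and escapes.
                out.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                // Whitespace between tokens (space, tab, CR, LF) is insignificant
                // in JSON, so it is dropped; every other character is kept.
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, collapsing the text `{` / `"id": "X12"` / `}` written over three lines returns `{"id":"X12"}`. Note that the collapsed Facebook document becomes a single record whose `data` field is an array of posts; getting one row per post would mean writing each array element on its own line, which needs a real JSON parser and is outside this sketch.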



--
This message was sent by Atlassian JIRA

[jira] [Updated] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)

2015-09-15 Thread Josh Rosen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-9032:
--
Description: 
Calling read().json() on an SQLContext (i.e. DataFrameReader.json()) raises a MatchError
with the following stack trace while trying to read JSON data:

{code}
15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
at com.hp.sparkdemo.Example.main(Example.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
{code}

Offending code snippet (around line 23):

{code}
JavaSparkContext sctx = new JavaSparkContext(sparkConf);
SQLContext ctx = new SQLContext(sctx);
DataFrame frame = ctx.read().json(facebookJSON);
frame.printSchema();
{code}

The exception is reproducible using the following JSON:
{code}
{
  "data": [
    {
      "id": "X999_Y999",
      "from": {
        "name": "Tom Brady", "id": "X12"
      },
      "message": "Looking forward to 2010!",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X999/posts/Y999"
        },
        {
          "name": "Like",
          "link": "http://www.facebook.com/X999/posts/Y999"
        }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+",
      "updated_time": "2010-08-02T21:27:44+"
    },
    {
      "id": "X998_Y998",
      "from": {
        "name": "Peyton Manning", "id": "X18"
      },
      "message": "Where's my contract?",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X998/posts/Y998"
        },
        {
          "name": "Like",
          "link": "http://www.facebook.com/X998/posts/Y998"
        }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+",
      "updated_time": "2010-08-02T21:27:44+"
    }
  ]
}
{code}

  was:
Calling read().json() on an SQLContext (i.e. DataFrameReader.json()) raises a MatchError
with the following stack trace while trying to read JSON data:

15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
at