luochenghui created SPARK-6659:
----------------------------------

             Summary: Spark SQL 1.3 cannot read a JSON file that contains only a single record.
                 Key: SPARK-6659
                 URL: https://issues.apache.org/jira/browse/SPARK-6659
             Project: Spark
          Issue Type: Bug
            Reporter: luochenghui


Dear friends:
 
Spark SQL 1.3 cannot read a JSON file that contains only a single record.
Here is my JSON file's content:
{"name":"milo","age",24}
 
When I run Spark SQL in local mode, it throws an exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'name' given input columns _corrupt_record;
 
What I did:
1. ./spark-shell
2.
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = 
org.apache.spark.sql.SQLContext@5f3be6c8
 
scala> val df = sqlContext.jsonFile("/home/milo/person.json")
15/03/19 22:11:45 INFO MemoryStore: ensureFreeSpace(163705) called with 
curMem=0, maxMem=280248975
15/03/19 22:11:45 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 159.9 KB, free 267.1 MB)
15/03/19 22:11:45 INFO MemoryStore: ensureFreeSpace(22692) called with 
curMem=163705, maxMem=280248975
15/03/19 22:11:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 22.2 KB, free 267.1 MB)
15/03/19 22:11:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:35842 (size: 22.2 KB, free: 267.2 MB)
15/03/19 22:11:45 INFO BlockManagerMaster: Updated info of block 
broadcast_0_piece0
15/03/19 22:11:45 INFO SparkContext: Created broadcast 0 from textFile at 
JSONRelation.scala:98
15/03/19 22:11:47 INFO FileInputFormat: Total input paths to process : 1
15/03/19 22:11:47 INFO SparkContext: Starting job: reduce at JsonRDD.scala:51
15/03/19 22:11:47 INFO DAGScheduler: Got job 0 (reduce at JsonRDD.scala:51) 
with 1 output partitions (allowLocal=false)
15/03/19 22:11:47 INFO DAGScheduler: Final stage: Stage 0(reduce at 
JsonRDD.scala:51)
15/03/19 22:11:47 INFO DAGScheduler: Parents of final stage: List()
15/03/19 22:11:47 INFO DAGScheduler: Missing parents: List()
15/03/19 22:11:47 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[3] at 
map at JsonRDD.scala:51), which has no missing parents
15/03/19 22:11:47 INFO MemoryStore: ensureFreeSpace(3184) called with 
curMem=186397, maxMem=280248975
15/03/19 22:11:47 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 3.1 KB, free 267.1 MB)
15/03/19 22:11:47 INFO MemoryStore: ensureFreeSpace(2251) called with 
curMem=189581, maxMem=280248975
15/03/19 22:11:47 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in 
memory (estimated size 2.2 KB, free 267.1 MB)
15/03/19 22:11:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 
localhost:35842 (size: 2.2 KB, free: 267.2 MB)
15/03/19 22:11:47 INFO BlockManagerMaster: Updated info of block 
broadcast_1_piece0
15/03/19 22:11:47 INFO SparkContext: Created broadcast 1 from broadcast at 
DAGScheduler.scala:839
15/03/19 22:11:48 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 
(MapPartitionsRDD[3] at map at JsonRDD.scala:51)
15/03/19 22:11:48 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/19 22:11:48 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, PROCESS_LOCAL, 1291 bytes)
15/03/19 22:11:48 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/19 22:11:48 INFO HadoopRDD: Input split: file:/home/milo/person.json:0+26
15/03/19 22:11:48 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
mapreduce.task.id
15/03/19 22:11:48 INFO deprecation: mapred.task.id is deprecated. Instead, use 
mapreduce.task.attempt.id
15/03/19 22:11:48 INFO deprecation: mapred.task.is.map is deprecated. Instead, 
use mapreduce.task.ismap
15/03/19 22:11:48 INFO deprecation: mapred.task.partition is deprecated. 
Instead, use mapreduce.task.partition
15/03/19 22:11:48 INFO deprecation: mapred.job.id is deprecated. Instead, use 
mapreduce.job.id
15/03/19 22:11:49 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2023 
bytes result sent to driver
15/03/19 22:11:49 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
in 1209 ms on localhost (1/1)
15/03/19 22:11:49 INFO DAGScheduler: Stage 0 (reduce at JsonRDD.scala:51) 
finished in 1.308 s
15/03/19 22:11:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have 
all completed, from pool 
15/03/19 22:11:49 INFO DAGScheduler: Job 0 finished: reduce at 
JsonRDD.scala:51, took 2.002429 s
df: org.apache.spark.sql.DataFrame = [_corrupt_record: string]
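For reference, the inferred schema can be confirmed with printSchema() (a minimal sketch; the output below is what I would expect given the DataFrame type shown above):

scala> df.printSchema()
root
 |-- _corrupt_record: string (nullable = true)

In other words, schema inference found no parsable record, so the only available column is _corrupt_record, and any reference to 'name' then fails analysis.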
 
3.
scala> df.select("name").show()
15/03/19 22:12:44 INFO BlockManager: Removing broadcast 1
15/03/19 22:12:44 INFO BlockManager: Removing block broadcast_1_piece0
15/03/19 22:12:44 INFO MemoryStore: Block broadcast_1_piece0 of size 2251 
dropped from memory (free 280059394)
15/03/19 22:12:44 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
localhost:35842 in memory (size: 2.2 KB, free: 267.2 MB)
15/03/19 22:12:44 INFO BlockManagerMaster: Updated info of block 
broadcast_1_piece0
15/03/19 22:12:44 INFO BlockManager: Removing block broadcast_1
15/03/19 22:12:44 INFO MemoryStore: Block broadcast_1 of size 3184 dropped from 
memory (free 280062578)
15/03/19 22:12:45 INFO ContextCleaner: Cleaned broadcast 1
15/03/19 22:12:45 INFO BlockManager: Removing broadcast 0
15/03/19 22:12:45 INFO BlockManager: Removing block broadcast_0
15/03/19 22:12:45 INFO MemoryStore: Block broadcast_0 of size 163705 dropped 
from memory (free 280226283)
15/03/19 22:12:45 INFO BlockManager: Removing block broadcast_0_piece0
15/03/19 22:12:45 INFO MemoryStore: Block broadcast_0_piece0 of size 22692 
dropped from memory (free 280248975)
15/03/19 22:12:45 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 
localhost:35842 in memory (size: 22.2 KB, free: 267.3 MB)
15/03/19 22:12:45 INFO BlockManagerMaster: Updated info of block 
broadcast_0_piece0
15/03/19 22:12:45 INFO ContextCleaner: Cleaned broadcast 0
org.apache.spark.sql.AnalysisException: cannot resolve 'name' given input 
columns _corrupt_record;
 at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:48)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:45)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:103)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:117)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:116)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 at scala.collection.AbstractIterator.to(Iterator.scala:1157)
 at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:121)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3.apply(CheckAnalysis.scala:45)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3.apply(CheckAnalysis.scala:43)
 at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:88)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.apply(CheckAnalysis.scala:43)
 at 
org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:1069)
 at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
 at org.apache.spark.sql.DataFrame.logicalPlanToDataFrame(DataFrame.scala:157)
 at org.apache.spark.sql.DataFrame.select(DataFrame.scala:465)
 at org.apache.spark.sql.DataFrame.select(DataFrame.scala:480)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
 at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
 at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
 at $iwC$$iwC$$iwC.<init>(<console>:39)
 at $iwC$$iwC.<init>(<console>:41)
 at $iwC.<init>(<console>:43)
 at <init>(<console>:45)
 at .<init>(<console>:49)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
 at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
 at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
 at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
 at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
 at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
 at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
 at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
 at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
 at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
 at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
 at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
 at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
 at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
 at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
 at org.apache.spark.repl.Main$.main(Main.scala:31)
 at org.apache.spark.repl.Main.main(Main.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 

But when I invoke df.show(), it works:
scala> df.show()
15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(81443) called with 
curMem=0, maxMem=280248975
15/03/19 22:13:32 INFO MemoryStore: Block broadcast_2 stored as values in 
memory (estimated size 79.5 KB, free 267.2 MB)
15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(31262) called with 
curMem=81443, maxMem=280248975
15/03/19 22:13:32 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in 
memory (estimated size 30.5 KB, free 267.2 MB)
15/03/19 22:13:32 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 
localhost:35842 (size: 30.5 KB, free: 267.2 MB)
15/03/19 22:13:32 INFO BlockManagerMaster: Updated info of block 
broadcast_2_piece0
15/03/19 22:13:32 INFO SparkContext: Created broadcast 2 from textFile at 
JSONRelation.scala:98
15/03/19 22:13:32 INFO FileInputFormat: Total input paths to process : 1
15/03/19 22:13:32 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
15/03/19 22:13:32 INFO DAGScheduler: Got job 1 (runJob at SparkPlan.scala:121) 
with 1 output partitions (allowLocal=false)
15/03/19 22:13:32 INFO DAGScheduler: Final stage: Stage 1(runJob at 
SparkPlan.scala:121)
15/03/19 22:13:32 INFO DAGScheduler: Parents of final stage: List()
15/03/19 22:13:32 INFO DAGScheduler: Missing parents: List()
15/03/19 22:13:32 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[8] at 
map at SparkPlan.scala:96), which has no missing parents
15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(3968) called with 
curMem=112705, maxMem=280248975
15/03/19 22:13:32 INFO MemoryStore: Block broadcast_3 stored as values in 
memory (estimated size 3.9 KB, free 267.2 MB)
15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(2724) called with 
curMem=116673, maxMem=280248975
15/03/19 22:13:32 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in 
memory (estimated size 2.7 KB, free 267.2 MB)
15/03/19 22:13:32 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 
localhost:35842 (size: 2.7 KB, free: 267.2 MB)
15/03/19 22:13:32 INFO BlockManagerMaster: Updated info of block 
broadcast_3_piece0
15/03/19 22:13:32 INFO SparkContext: Created broadcast 3 from broadcast at 
DAGScheduler.scala:839
15/03/19 22:13:32 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 
(MapPartitionsRDD[8] at map at SparkPlan.scala:96)
15/03/19 22:13:32 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/03/19 22:13:32 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 
localhost, PROCESS_LOCAL, 1291 bytes)
15/03/19 22:13:32 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/03/19 22:13:32 INFO HadoopRDD: Input split: file:/home/milo/person.json:0+26
15/03/19 22:13:33 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1968 
bytes result sent to driver
15/03/19 22:13:33 INFO DAGScheduler: Stage 1 (runJob at SparkPlan.scala:121) 
finished in 0.249 s
15/03/19 22:13:33 INFO DAGScheduler: Job 1 finished: runJob at 
SparkPlan.scala:121, took 0.381798 s
15/03/19 22:13:33 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) 
in 242 ms on localhost (1/1)
15/03/19 22:13:33 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have 
all completed, from pool 
_corrupt_record     
{"name":"milo","a...
 
I also tested another case, a JSON file with more than one record, and it ran successfully.
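
For comparison, here is a minimal sketch of the same steps with a syntactically valid single-record file (the path /home/milo/person2.json and its content are hypothetical, with the comma after "age" replaced by a colon):

// contents of /home/milo/person2.json (hypothetical):
// {"name":"milo","age":24}

scala> val df2 = sqlContext.jsonFile("/home/milo/person2.json")
scala> df2.printSchema()           // expected: age (bigint) and name (string), not _corrupt_record
scala> df2.select("name").show()   // expected to print "milo" instead of raising AnalysisException

If this sketch behaves as expected, the problem would seem to be the malformed record rather than the record count.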



