Is it possible you have blank lines in your input? Not that this should be an error condition, but it may be what's causing it.
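
As a possible workaround (a sketch, not something from this thread): if your Spark build exposes jsonRDD alongside jsonFile, you could read the file as plain text, drop blank or whitespace-only lines yourself, and hand the remaining strings to jsonRDD for schema inference. The path and variable names below are placeholders.

    import org.apache.spark.sql.SQLContext

    // Assumes an existing SparkContext `sc` (e.g. in spark-shell); the path is a placeholder.
    val sqlContext = new SQLContext(sc)

    // Read the raw lines and drop blank / whitespace-only ones before JSON parsing,
    // since an empty string is what triggers Jackson's
    // "No content to map due to end-of-input".
    val rawLines = sc.textFile("hdfs:///path/to/data.json")
    val nonEmpty = rawLines.filter(_.trim.nonEmpty)

    // Let Spark SQL infer the schema from the cleaned-up RDD of JSON strings.
    val jsonTable = sqlContext.jsonRDD(nonEmpty)
    jsonTable.printSchema()

If some non-blank lines are malformed JSON as well, a stricter per-line filter that attempts a parse first would catch those too, at the cost of an extra parsing pass.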
On Wed, Jun 25, 2014 at 11:57 AM, durin <m...@simon-schaefer.net> wrote:
> Hi Zongheng Yang,
>
> thanks for your response. Reading your answer, I did some more tests and
> realized that analyzing very small parts of the dataset (which is ~130GB in
> ~4.3M lines) works fine.
> The error occurs when I analyze larger parts. Using 5% of the whole data,
> the error is the same as posted before for certain TIDs. However, I get the
> structure determined so far as a result when using 5%.
>
> The Spark WebUI shows the following:
>
> Job aborted due to stage failure: Task 6.0:11 failed 4 times, most recent
> failure: Exception failure in TID 108 on host foo.bar.com:
> com.fasterxml.jackson.databind.JsonMappingException: No content to map due
> to end-of-input at [Source: java.io.StringReader@3697781f; line: 1, column: 1]
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
> com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3029)
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2971)
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2091)
> org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$5.apply(JsonRDD.scala:261)
> org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$5.apply(JsonRDD.scala:261)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:172)
> scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:823)
> org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:821)
> org.apache.spark.SparkContext$$anonfun$24.apply(SparkContext.scala:1132)
> org.apache.spark.SparkContext$$anonfun$24.apply(SparkContext.scala:1132)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:112)
> org.apache.spark.scheduler.Task.run(Task.scala:51)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:662)
> Driver stacktrace:
>
> Is the only possible reason that some of these 4.3 million JSON objects are
> not valid JSON, or could there be another explanation?
> And if that is the reason, is there some way to tell the function to just
> skip faulty lines?
>
> Thanks,
> Durin
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/jsonFile-function-in-SQLContext-does-not-work-tp8273p8278.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.