Is it possible you have blank lines in your input? Not that this should be
an error condition, but it may be what's causing it.
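
If blank lines do turn out to be the culprit, one workaround until this is handled more gracefully is to drop empty lines yourself and hand the cleaned RDD to jsonRDD instead of using jsonFile directly. A rough sketch, assuming the Spark 1.0 SQLContext API and a placeholder input path (not from your setup):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // hypothetical input path -- replace with your own dataset
    val raw = sc.textFile("hdfs:///path/to/data.json")

    // drop blank lines so Jackson never sees an empty document
    val nonEmpty = raw.filter(_.trim.nonEmpty)

    // jsonRDD takes an RDD[String] with one JSON object per line
    val schemaRdd = sqlContext.jsonRDD(nonEmpty)
    schemaRdd.printSchema()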


On Wed, Jun 25, 2014 at 11:57 AM, durin <m...@simon-schaefer.net> wrote:

> Hi Zongheng Yang,
>
> thanks for your response. Reading your answer, I did some more tests and
> realized that analyzing very small parts of the dataset (which is ~130GB in
> ~4.3M lines) works fine.
> The error occurs when I analyze larger parts. Using 5% of the whole data, I
> get the same error as posted before for certain TIDs, but I do still receive
> the structure determined up to that point as a result.
>
> The Spark WebUI shows the following:
>
> Job aborted due to stage failure: Task 6.0:11 failed 4 times, most recent failure: Exception failure in TID 108 on host foo.bar.com: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input at [Source: java.io.StringReader@3697781f; line: 1, column: 1]
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
> com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3029)
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2971)
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2091)
> org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$5.apply(JsonRDD.scala:261)
> org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$5.apply(JsonRDD.scala:261)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:172)
> scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:823)
> org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:821)
> org.apache.spark.SparkContext$$anonfun$24.apply(SparkContext.scala:1132)
> org.apache.spark.SparkContext$$anonfun$24.apply(SparkContext.scala:1132)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:112)
> org.apache.spark.scheduler.Task.run(Task.scala:51)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:662) Driver stacktrace:
>
>
>
> Is the only possible explanation that some of these 4.3 million JSON objects
> are not valid JSON, or could there be another reason?
> And if that is the reason, is there some way to tell the function to simply
> skip faulty lines?
>
>
> Thanks,
> Durin
>
>
>
>
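Regarding the question above about skipping faulty lines: as far as I know, jsonFile/jsonRDD don't take an option for that, but you can pre-filter the input yourself by attempting a Jackson parse per line and keeping only the lines that parse. A rough sketch under the same assumptions as above (one JSON object per line, placeholder path, names are illustrative):

    import com.fasterxml.jackson.databind.ObjectMapper

    val parseable = sc.textFile("hdfs:///path/to/data.json").mapPartitions { lines =>
      // one mapper per partition, not per line, to keep the overhead down
      val mapper = new ObjectMapper()
      lines.filter { line =>
        line.trim.nonEmpty && {
          // keep the line only if Jackson can parse it as a JSON document
          try { mapper.readTree(line); true }
          catch { case _: Exception => false }
        }
      }
    }
    val schemaRdd = sqlContext.jsonRDD(parseable)
    schemaRdd.printSchema()

This does mean every line gets parsed twice (once for the validity check, once by jsonRDD), so it's a workaround rather than something you'd want to keep long-term.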
