Re: debug jsonRDD problem?

2015-05-28 Thread Michael Stone

On Wed, May 27, 2015 at 02:06:16PM -0700, Ted Yu wrote:

Looks like the exception was caused by resolved.get(prefix ++ a) returning None:

        a => StructField(a.head, resolved.get(prefix ++ a).get, nullable = true)

There are three occurrences of resolved.get() in createSchema() - the None
case should be handled better in each of those places.

My two cents.


Here's the simplest test case I've come up with:

sqlContext.jsonRDD(sc.parallelize(Array("{\"'```'\":\"\"}"))).count()
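
If it's the backticks in the key that trip up the schema resolution (just
my guess from this repro), a crude diagnostic along these lines might help
(assuming rdd is the RDD[String] of raw JSON lines):

        // crude sketch: strip backticks before schema inference; note this
        // rewrites values too, so it's a diagnostic rather than a real fix
        val cleaned = rdd.map(_.replace("`", ""))
        sqlContext.jsonRDD(cleaned).count()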

Mike Stone




Re: debug jsonRDD problem?

2015-05-27 Thread Ted Yu
Looks like the exception was caused by resolved.get(prefix ++ a) returning None:

        a => StructField(a.head, resolved.get(prefix ++ a).get, nullable = true)

There are three occurrences of resolved.get() in createSchema() - the None
case should be handled better in each of those places.
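
To illustrate, a minimal sketch of one possible fix (untested; falling back
to StringType here is just my assumption, the real fix may want different
behavior):

        // hypothetical: supply a fallback type rather than calling .get on a None
        a => StructField(a.head, resolved.getOrElse(prefix ++ a, StringType), nullable = true)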

My two cents.

On Wed, May 27, 2015 at 1:46 PM, Michael Stone  wrote:

> On Wed, May 27, 2015 at 01:13:43PM -0700, Ted Yu wrote:
>
>> Can you tell us a bit more about (the schema of) your JSON?
>>
>
> It's fairly simple, consisting of 22 fields whose values are mostly
> strings or integers, except that some of the fields are objects
> containing HTTP header/value pairs. I'd guess it's something in those
> latter fields that's causing the problems. The data is 800M rows that I
> didn't create in the first place, and I'm in the process of making a
> simpler test case. What I was mostly wondering was whether there's an
> obvious mechanism that I'm just missing to get jsonRDD to spit out more
> information about which specific rows it's having problems with.
>
>> You can find sample JSON
>> in sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala
>>
>
> I know jsonRDD works in general; I've used it before without problems.
> It even works on subsets of this data.
> Mike Stone
>


Re: debug jsonRDD problem?

2015-05-27 Thread Michael Stone

On Wed, May 27, 2015 at 01:13:43PM -0700, Ted Yu wrote:

Can you tell us a bit more about (the schema of) your JSON?


It's fairly simple, consisting of 22 fields whose values are mostly
strings or integers, except that some of the fields are objects
containing HTTP header/value pairs. I'd guess it's something in those
latter fields that's causing the problems. The data is 800M rows that I
didn't create in the first place, and I'm in the process of making a
simpler test case. What I was mostly wondering was whether there's an
obvious mechanism that I'm just missing to get jsonRDD to spit out more
information about which specific rows it's having problems with.
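
In the meantime I've been narrowing it down by hand, roughly by bisection.
A sketch of what I mean (assuming rdd is the RDD[String] of raw JSON lines):

        // sketch: try schema inference on a slice [lo, hi) of the data to
        // see whether the offending rows fall inside it
        val indexed = rdd.zipWithIndex()
        def sliceWorks(lo: Long, hi: Long): Boolean =
          try {
            sqlContext.jsonRDD(indexed.filter { case (_, i) => i >= lo && i < hi }.keys).count()
            true
          } catch {
            case _: java.util.NoSuchElementException => false
          }

Repeatedly halving whichever range fails homes in on the bad rows, but
that's a workaround, not the error reporting I was hoping for.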



You can find sample JSON in
sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala


I know jsonRDD works in general; I've used it before without problems.
It even works on subsets of this data.


Mike Stone




Re: debug jsonRDD problem?

2015-05-27 Thread Ted Yu
Can you tell us a bit more about (the schema of) your JSON?

You can find sample JSON
in sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala

Cheers

On Wed, May 27, 2015 at 12:33 PM, Michael Stone  wrote:

> Can anyone provide some suggestions on how to debug this? I'm using Spark
> 1.3.1. The JSON itself seems to be valid (other programs can parse it), and
> the problem seems to lie in jsonRDD trying to infer and use a schema.
>
> scala> sqlContext.jsonRDD(rdd).count()
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:313)
>   at scala.None$.get(Option.scala:311)
>   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:105)
>   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:101)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$makeStruct$1(JsonRDD.scala:101)
>   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:104)
>   at org.apache.spark.sql.json.JsonRDD$$anonfun$14.apply(JsonRDD.scala:101)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.Map$Map2.foreach(Map.scala:130)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$makeStruct$1(JsonRDD.scala:101)
>   at org.apache.spark.sql.json.JsonRDD$.createSchema(JsonRDD.scala:132)
>   at org.apache.spark.sql.json.JsonRDD$.inferSchema(JsonRDD.scala:56)
>   at org.apache.spark.sql.SQLContext.jsonRDD(SQLContext.scala:635)
>   at org.apache.spark.sql.SQLContext.jsonRDD(SQLContext.scala:581)
>   [...]
>
>