There is a PR to fix this: https://github.com/apache/spark/pull/1802
On Tue, Aug 5, 2014 at 10:11 PM, Brad Miller bmill...@eecs.berkeley.edu wrote:
I concur that printSchema works; the trouble seems to happen only in operations
that actually use the data.
Thanks for posting the bug.
-Brad
Nice catch, Brad, and thanks to Yin and Davies for getting on it so quickly.
On Wed, Aug 6, 2014 at 2:45 AM, Davies Liu dav...@databricks.com wrote:
There is a PR to fix this: https://github.com/apache/spark/pull/1802
On Tue, Aug 5, 2014 at 10:11 PM, Brad Miller bmill...@eecs.berkeley.edu
Hi All,
I am interested in using jsonRDD and jsonFile to create a SchemaRDD out of
some JSON data I have, but I've run into some instability involving the
following Java exception:
An error occurred while calling o1326.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure:
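For context, jsonRDD takes an RDD of JSON strings, and jsonFile reads a text file containing one complete JSON object per line (not a single pretty-printed document). A Spark-free sketch of that input format, using only the standard json module (the file path here is hypothetical):

```python
import json
import os
import tempfile

# jsonFile-style input: one complete JSON object per line
# (not a single pretty-printed document).
lines = ['{"foo": [1, 2, 3]}', '{"foo": [4, 5, 6]}']

path = os.path.join(tempfile.mkdtemp(), "records.json")  # hypothetical path
with open(path, "w") as f:
    f.write("\n".join(lines))

# Each line parses independently, which is what Spark's JSON input expects.
with open(path) as f:
    records = [json.loads(line) for line in f]

print(records)  # [{'foo': [1, 2, 3]}, {'foo': [4, 5, 6]}]
```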
I believe this is a known issue in 1.0.1 that's fixed in 1.0.2.
See: SPARK-2376: Selecting list values inside nested JSON objects raises
java.lang.IllegalArgumentException
https://issues.apache.org/jira/browse/SPARK-2376
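SPARK-2376 concerns selecting a list-valued field that is nested inside a JSON object. Outside Spark, the shape of the data in question looks like this (field names are made up for illustration; plain dict access stands in for the SQL SELECT):

```python
import json

# A list value nested inside a JSON object: the pattern SPARK-2376
# reports as failing to select in Spark SQL 1.0.1. Field names are
# made up for illustration.
record = json.loads('{"outer": {"vals": [1, 2, 3]}}')

# Plain-Python equivalent of selecting outer.vals:
vals = record["outer"]["vals"]
print(vals)  # [1, 2, 3]
```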
On Tue, Aug 5, 2014 at 2:55 PM, Brad Miller bmill...@eecs.berkeley.edu
Is this on 1.0.1? I'd suggest running this on master or the 1.1 RC, which
should be coming out this week. PySpark did not have good support for
nested data previously. If you still encounter issues using a more recent
version, please file a JIRA. Thanks!
On Tue, Aug 5, 2014 at 11:55 AM, Brad
Nick: Thanks for both the original JIRA bug report and the link.
Michael: This is on the 1.0.1 release. I'll update to master and follow up
if I have any problems.
best,
-Brad
On Tue, Aug 5, 2014 at 12:04 PM, Michael Armbrust mich...@databricks.com
wrote:
Is this on 1.0.1? I'd suggest
Hi All,
I've built and deployed the current head of branch-1.0, but it seems to
have only partly fixed the bug.
This code now runs as expected with the indicated output:
srdd = sqlCtx.jsonRDD(sc.parallelize(['{"foo":[1,2,3]}',
'{"foo":[4,5,6]}']))
srdd.printSchema()
root
|-- foo:
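As a quick sanity check outside Spark, the two sample records parse cleanly with the standard json module, which suggests the failure lies in Spark's handling rather than in the input:

```python
import json

# The two sample records from above, checked without Spark.
rows = ['{"foo": [1, 2, 3]}', '{"foo": [4, 5, 6]}']
parsed = [json.loads(r) for r in rows]

# Both records share one schema: "foo" is an array of integers,
# matching the array type printSchema reports.
assert all(isinstance(p["foo"], list) for p in parsed)
print(parsed[0]["foo"])  # [1, 2, 3]
```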
This looks to be fixed in master:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
sc.parallelize(['{"foo":[[1,2,3], [4,5,6]]}', '{"foo":[[1,2,3], [4,5,6]]}'])
ParallelCollectionRDD[5] at parallelize at PythonRDD.scala:315
sqlContext.jsonRDD(sc.parallelize(['{"foo":[[1,2,3],
Hi All,
I checked out and built master. Note that Maven had a problem building
Kafka (in my case, at least); I was unable to fix this easily so I moved on
since it seemed unlikely to have any influence on the problem at hand.
Master improves functionality (including the example Nicholas just
I tried jsonRDD(...).printSchema() and it worked. It seems the problem occurs
when we bring the data back to the Python side; SchemaRDD#javaToPython failed
on your cases. I have created https://issues.apache.org/jira/browse/SPARK-2875
to track it.
Thanks,
Yin
On Tue, Aug 5, 2014 at 9:20 PM, Brad
I concur that printSchema works; the trouble seems to happen only in
operations that actually use the data.
Thanks for posting the bug.
-Brad
On Tue, Aug 5, 2014 at 10:05 PM, Yin Huai yh...@databricks.com wrote:
I tried jsonRDD(...).printSchema() and it worked. Seems the problem is
when we