I have a job that is running into intermittent errors with [SparkDriver]
java.lang.OutOfMemoryError: Java heap space. Before I was getting this
error I was getting errors saying the result size exceeded
spark.driver.maxResultSize.
This does not make any sense to me, as there are no actions in
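For reference, a minimal PySpark sketch of the two settings these errors
point at; values are illustrative, not recommendations:

from pyspark import SparkConf, SparkContext

# Raising spark.driver.maxResultSize trades the explicit size check for
# exactly this driver OOM, so driver heap usually has to grow with it.
# In client mode spark.driver.memory must be set before the JVM starts
# (spark-submit --driver-memory), not from SparkConf inside the job.
conf = SparkConf().set("spark.driver.maxResultSize", "4g")
sc = SparkContext(conf=conf)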
Hi,
I have a problem trying to get a fairly simple app working which makes use
of native avro libraries. The app runs fine on my local machine and in
yarn-cluster mode, but when I try to run it in yarn-client mode on EMR I get
the error below. I'm aware this is a version problem, as EMR runs an
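If it helps, one common workaround for this kind of dependency conflict is
asking Spark to prefer the application's jars over the cluster's. A sketch
in PySpark form; the properties (experimental as of Spark 1.3) are what
matter and can equally be passed via spark-submit --conf:

from pyspark import SparkConf, SparkContext

# Assumption: the clash is between the avro the app bundles and an older
# one on the EMR cluster. In yarn-client mode the driver-side property
# may need to go on spark-submit, since the driver JVM is already up.
conf = (SparkConf()
        .set("spark.driver.userClassPathFirst", "true")
        .set("spark.executor.userClassPathFirst", "true"))
sc = SparkContext(conf=conf)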
Hi,
I've worked out how to use explode on my input avro dataset with the
following structure
root
 |-- pageViewId: string (nullable = false)
 |-- components: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- name: string (nullable = false)
 |    |    |--
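For anyone searching later, a minimal sketch of explode on this schema,
assuming df is the DataFrame read from the avro input (e.g. via the
spark-avro package):

from pyspark.sql.functions import explode

# One output row per element of `components`, keyed by pageViewId.
exploded = df.select("pageViewId",
                     explode(df["components"]).alias("component"))

# Struct fields of the exploded element are addressable by dotted path.
names = exploded.select("pageViewId", "component.name")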
Hi,
I've searched but can't seem to find a PySpark example. How do I write
compressed text file output to S3 using PySpark saveAsTextFile?
Thanks,
Tom
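A minimal sketch, assuming rdd holds the output lines; saveAsTextFile takes
an optional Hadoop codec class name, and the bucket path below is a
placeholder:

rdd.saveAsTextFile(
    "s3n://my-bucket/output",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")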
I'm trying to set up a PySpark ETL job that takes in JSON log files and
spits out fact table files for upload to Redshift. Is there an efficient
way to send different event types to different outputs without reading the
same cached RDD twice? I have my first RDD, which is just a
json
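One pattern that avoids re-reading the source files is to cache the parsed
RDD once and run one filter pass per event type over the cached partitions;
paths and event-type names below are hypothetical:

import json

events = sc.textFile("s3n://my-bucket/logs/").map(json.loads).cache()

for event_type in ["pageview", "click"]:
    # Default-arg binding pins the loop variable inside the lambda.
    (events
     .filter(lambda e, t=event_type: e.get("type") == t)
     .map(json.dumps)
     .saveAsTextFile("s3n://my-bucket/facts/%s" % event_type))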
Hi,
Just wondering if anyone has any advice about this issue, as I am
experiencing the same thing. I'm working with multiple broadcast variables
in PySpark, most of which are small, but one of around 4.5GB, using 10
workers at 31GB memory each and a driver with the same spec. It's not running
out of
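For context, a sketch of the broadcast itself; big_lookup stands in for the
~4.5GB structure and is hypothetical. In PySpark the object is pickled on
the driver before shipping, so the driver needs headroom for both the live
object and its serialized form, and each worker holds a deserialized copy:

lookup = sc.broadcast(big_lookup)

def enrich(record):
    # Worker-side access goes through .value
    return (record, lookup.value.get(record))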
Yes, please can you share. I am getting this error after expanding my
application to include a large broadcast variable. Would be good to know if
it can be fixed with configuration.
On 23 October 2014 18:04, Michael Campbell michael.campb...@gmail.com
wrote:
Can you list what your fix was so