Apologies, Outlook for Mac is ridiculous. Copy and paste of the original below:
-
I’m running into a strange issue trying to use a custom Log4j layout for
Spark (1.6.1) on YARN (CDH). The layout is:
https://github.com/michaeltandy/log4j-json
If I use a log4j.properties file (supplied
Thanks for this tip.
I ran it in yarn-client mode with driver-memory = 4G and took a dump once the
heap got close to 4G.
 num     #instances         #bytes  class name
----------------------------------------------
   1:        446169     3661137256  [J
   2:       2032795      222636720
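For reading the first histogram row: [J is the JVM's internal name for long[], so dividing #bytes by #instances gives the average size of each long array on the driver heap. A minimal sketch of that arithmetic in Scala, using only the two figures from row 1 (the object and value names are just illustrative, not from the original job):

```scala
object HistogramRow {
  // Figures copied from row 1 of the histogram ([J = long[]).
  val instances: Long = 446169L
  val bytes: Long = 3661137256L

  // Average heap footprint of one long[] instance, in bytes.
  val avgBytes: Double = bytes.toDouble / instances

  // Rough element count per array: each long is 8 bytes (array header ignored).
  val approxLongs: Double = avgBytes / 8

  def main(args: Array[String]): Unit = {
    println(f"average bytes per long[]: $avgBytes%.1f")
    println(f"approximate longs per array: $approxLongs%.0f")
  }
}
```

That works out to roughly 8 KB per array, i.e. on the order of a thousand longs each. What is allocating them is not something the histogram alone shows.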
I have a Spark 1.4.1 job on YARN where the first stage has ~149,000 tasks
(it’s reading a few TB of data). The job itself is fairly simple: it’s just
getting a list of distinct values:
val days = spark
  .sequenceFile(inputDir, classOf[KeyClass], classOf[ValueClass])
I should have mentioned: yes I am using Kryo and have registered KeyClass and
ValueClass.
I guess it’s not clear to me what is actually taking up space on the driver
heap - I can’t see how it can be data with the code that I have.
On 27/08/2015 12:09, Ewan Leith ewan.le...@realitymine.com
I've found a strange issue when trying to sort a lot of data in HDFS using
Spark 1.2.0 (CDH5.3.0). My data is in SequenceFiles and the key is a class
that derives from BytesWritable (the value is also a BytesWritable). I'm
using a custom KryoSerializer to serialize the underlying byte array