Re: NoSuchMethodError in KafkaReceiver

2014-07-08 Thread Michael Chang
To be honest, I'm a Scala newbie too. I just copied it from createStream; I assume it's the canonical way to convert a Java map (JMap) to a Scala map (Map). On Mon, Jul 7, 2014 at 1:40 PM, mcampbell michael.campb...@gmail.com wrote: xtrahotsauce wrote: I had this same problem as well. I
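For reference, a minimal sketch of that conversion, assuming Scala 2.10-era collections (the JMap alias stands for java.util.Map, as in the thread):

    import java.util.{Map => JMap}
    import scala.collection.JavaConverters._

    // Wrap the Java map in a Scala view, then copy it into an immutable Scala Map.
    def toScalaMap(javaMap: JMap[String, String]): Map[String, String] =
      javaMap.asScala.toMap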

Re: Spilled shuffle files not being cleared

2014-06-13 Thread Michael Chang
old unused shuffle data when it times out. For Spark 1.0, another way is to clean shuffle data using weak references (reference-tracking based; the configuration is spark.cleaner.referenceTracking), and it is enabled by default. Thanks Saisai *From:* Michael Chang [mailto:m
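For reference, a minimal sketch of the two cleanup settings mentioned above, assuming they are set through SparkConf (the TTL value here is illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-job")
      // Spark 0.9-style periodic cleanup: drop metadata/shuffle data older than this many seconds.
      .set("spark.cleaner.ttl", "3600")
      // Spark 1.0 reference-tracking cleanup, as described above (enabled by default).
      .set("spark.cleaner.referenceTracking", "true")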

Re: How to achieve reasonable performance on Spark Streaming?

2014-06-13 Thread Michael Chang
I'm interested in this issue as well. I have Spark Streaming jobs that seem to run well for a while, but slowly degrade and don't recover. On Wed, Jun 11, 2014 at 11:08 PM, Boduo Li onpo...@gmail.com wrote: It seems that the slow reduce tasks are caused by slow shuffling. Here are the logs

Re: Spilled shuffle files not being cleared

2014-06-12 Thread Michael Chang
Bump. On Mon, Jun 9, 2014 at 3:22 PM, Michael Chang m...@tellapart.com wrote: Hi all, I'm seeing exceptions that look like the below in Spark 0.9.1. It looks like I'm running out of inodes on my machines (I have around 300k each in a 12-machine cluster). I took a quick look and I'm seeing

Re: NoSuchMethodError in KafkaReceiver

2014-06-10 Thread Michael Chang
I had this same problem as well. I ended up just adding the necessary code to KafkaUtils and compiling my own Spark jar. Something like this for the raw stream: def createRawStream(jssc: JavaStreamingContext, kafkaParams: JMap[String, String], topics: JMap[String, JInt]
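For reference, a fuller sketch of a helper along those lines, modeled on the Spark 0.9-era KafkaUtils.createStream signature; everything beyond the createRawStream name and the parameters quoted above is an assumption:

    import java.lang.{Integer => JInt}
    import java.util.{Map => JMap}

    import scala.collection.JavaConverters._

    import kafka.serializer.DefaultDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.api.java.{JavaPairDStream, JavaStreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object RawKafkaUtils {
      // Returns the raw (undecoded) byte stream by plugging DefaultDecoder in for key and value.
      def createRawStream(
          jssc: JavaStreamingContext,
          kafkaParams: JMap[String, String],
          topics: JMap[String, JInt]): JavaPairDStream[Array[Byte], Array[Byte]] = {
        // Convert the Java maps to Scala maps, then delegate to the Scala API.
        val stream = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
          jssc.ssc,
          kafkaParams.asScala.toMap,
          topics.asScala.mapValues(_.intValue()).toMap,
          StorageLevel.MEMORY_AND_DISK_SER_2)
        new JavaPairDStream(stream)
      }
    }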

Spilled shuffle files not being cleared

2014-06-09 Thread Michael Chang
Hi all, I'm seeing exceptions that look like the below in Spark 0.9.1. It looks like I'm running out of inodes on my machines (I have around 300k each in a 12-machine cluster). I took a quick look and I'm seeing some shuffle spill files that are still around even 12 minutes after they are

Using log4j.xml

2014-06-04 Thread Michael Chang
Has anyone tried to use a log4j.xml instead of a log4j.properties with Spark 0.9.1? I'm trying to run Spark Streaming on YARN and I've set the environment variable SPARK_LOG4J_CONF to a log4j.xml file instead of a log4j.properties file, but Spark seems to be using the default log4j.properties
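For reference, a minimal log4j.xml of the kind being pointed at here, assuming stock log4j 1.2 (the pattern and level are illustrative):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
      <!-- Console appender mirroring Spark's default log4j.properties layout -->
      <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"/>
        </layout>
      </appender>
      <root>
        <priority value="INFO"/>
        <appender-ref ref="console"/>
      </root>
    </log4j:configuration>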

Re: NoSuchElementException: key not found

2014-06-03 Thread Michael Chang
I have added a JIRA for this. The fix is not trivial though. https://issues.apache.org/jira/browse/SPARK-2002 A not-so-good workaround for now would be to not use coalesced RDDs, which avoids the race condition. TD On Tue, Jun 3, 2014 at 10:09 AM, Michael Chang m...@tellapart.com wrote: I
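For reference, a minimal sketch of that workaround in a plain batch job; every name and path here is illustrative and not from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object CoalesceWorkaround {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("coalesce-workaround"))
        val lines = sc.textFile("hdfs:///input")
        // val compacted = lines.coalesce(8) // coalesced RDDs hit the SPARK-2002 race condition
        lines.filter(_.nonEmpty).saveAsTextFile("hdfs:///output") // stay on the uncoalesced RDD
        sc.stop()
      }
    }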

Re: Failed to remove RDD error

2014-06-02 Thread Michael Chang
@mayur_rustagi https://twitter.com/mayur_rustagi On Sat, May 31, 2014 at 6:52 AM, Michael Chang m...@tellapart.com wrote: I'm running some Kafka streaming Spark contexts (on 0.9.1), and they seem to be dying after 10 or so minutes with a lot of these errors. I can't really tell what's

NoSuchElementException: key not found

2014-06-02 Thread Michael Chang
Hi all, I'm seeing a random exception kill my Spark Streaming job. Here's a stack trace:
java.util.NoSuchElementException: key not found: 32855
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at
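For reference, a minimal illustration (not from the thread) of how a Scala Map raises exactly this exception when a missing key is looked up with apply():

    val m = Map(1 -> "a")
    m.get(32855)  // returns None: the safe lookup
    m(32855)      // throws java.util.NoSuchElementException: key not found: 32855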

Failed to remove RDD error

2014-05-30 Thread Michael Chang
I'm running some Kafka streaming Spark contexts (on 0.9.1), and they seem to be dying after 10 or so minutes with a lot of these errors. I can't really tell what's going on here, except that maybe the driver is unresponsive somehow? Has anyone seen this before? 14/05/31 01:13:30 ERROR