From: "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
Date: Thursday, January 14, 2016 at 4:41 PM
To: Lin Zhao <l...@exabeam.com>
Cc: user <user@spark.apache.org>
Subject: Re: Spark Streaming: custom actor receiver losing vast majority of data

Could you change MEMORY_ONLY_SER to MEMORY_AND_DISK_SER_2 and see if this
still happens? It may be because you don't have enough memory to cache the
events.
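For reference, the storage level is passed when the actor stream is created. A minimal sketch, assuming the Spark 1.x actor-receiver API (`ssc`, `MyReceiverActor`, and the element type are placeholders for your own code):

```scala
import akka.actor.Props
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext

// `ssc` is your StreamingContext; MyReceiverActor is your receiver actor.
// MEMORY_AND_DISK_SER_2 keeps received blocks serialized, spills them to
// disk when executor memory is tight, and replicates them to a second
// executor, rather than dropping blocks as a memory-only level can when
// memory runs out.
val stream = ssc.actorStream[(String, String)](
  Props[MyReceiverActor],
  "custom-receiver",
  StorageLevel.MEMORY_AND_DISK_SER_2)
```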
On Thu, Jan 14, 2016 at 4:06 PM, Lin Zhao wrote:
> Hi,
>
> I'm testing Spark Streaming with an actor receiver. The actor keeps calling
> store() to save a pair to Spark.
>
> Once the job is launched, everything looks good on the UI. Millions of events
> get through every batch. However, I added logging to the first step and found
> that only 20 or 40 events made it through.
> ...storage.BlockManager: Removing RDD 40
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 39
>
> From: "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
> Date: Thursday, January 14, 2016 at 4:13 PM
> To: Lin Zhao <l...@exabeam.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Spark Streaming: custom actor receiver losing vast majority
> of data
>
> MEMORY_AND_DISK_SER_2
>
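The receiver pattern Lin describes above (an actor handing records to Spark via store()) looks roughly like this under the Spark 1.x ActorHelper API; the actor and message shapes are illustrative, not taken from the thread:

```scala
import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

// Each store() call hands one record to Spark's receiver machinery, which
// buffers records into blocks using the stream's configured storage level.
// If that level is memory-only and the executor runs short of memory,
// blocks can be lost, which matches the symptom described in this thread.
class MyReceiverActor extends Actor with ActorHelper {
  def receive = {
    case (key: String, value: String) => store((key, value))
  }
}
```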