Hi Jerry,
Thanks for your response.
Is there a way to get the list of currently registered/live workers? Even to
provide a preferredLocation, it would be safer to know which workers are
active. I guess I only need to provide the hostname, right?
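
A minimal sketch of the kind of assignment in question, assuming the list of
live hosts is already known (which is exactly the open part of the question).
It is plain Scala with no Spark dependency; ReceiverPlacement, hostFor and the
host names are hypothetical and only for illustration. The idea would be for
the i-th receiver to return this value from its preferredLocation override.

```scala
// Hypothetical helper, not part of Spark: maps a receiver index to a host
// by cycling through a fixed host list (round-robin).
object ReceiverPlacement {
  def hostFor(receiverIndex: Int, hosts: IndexedSeq[String]): Option[String] =
    if (hosts.isEmpty) None // no known hosts: express no preference
    else Some(hosts(Math.floorMod(receiverIndex, hosts.length)))

  def main(args: Array[String]): Unit = {
    // Placeholder host names; in practice these would be the live workers.
    val hosts = Vector("host1", "host2", "host3", "host4", "host5")
    val numReceivers = 8
    (0 until numReceivers).foreach { i =>
      println(s"receiver $i -> ${hostFor(i, hosts).getOrElse("any")}")
    }
  }
}
```

With 8 receivers and 5 hosts, this puts two receivers on each of the first
three hosts and one on each of the remaining two. A preferred location is only
a hint, so if a listed host is not alive the placement may still differ.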
Thanks,
Du

     On Wednesday, March 4, 2015 10:08 PM, "Shao, Saisai" 
<saisai.s...@intel.com> wrote:
   

Hi Du,

You could try sleeping for several seconds after creating the streaming
context so that all the executors can register; the receivers can then be
distributed across the nodes more evenly. Setting locality, as you mentioned,
is another way.

Thanks
Jerry

From: Du Li [mailto:l...@yahoo-inc.com.INVALID]
Sent: Thursday, March 5, 2015 1:50 PM
To: User
Subject: Re: distribution of receivers in spark streaming

Figured it out: I need to override method preferredLocation() in MyReceiver
class.

On Wednesday, March 4, 2015 3:35 PM, Du Li <l...@yahoo-inc.com.INVALID> wrote:

Hi,

I have a set of machines (say 5) and want to evenly launch a number (say 8)
of kafka receivers on those machines. In my code I did something like the
following, as suggested in the spark docs:

    val streams = (1 to numReceivers).map(_ => ssc.receiverStream(new MyKafkaReceiver()))
    ssc.union(streams)

However, from the spark UI, I saw that some machines are not running any
instance of the receiver while some get three. The mapping changed every time
the system was restarted. This impacts the receiving and also the processing
speeds.

I wonder if it's possible to control/suggest the distribution so that it
would be more even. How is the decision made in spark?

Thanks,
Du

   
