Hi,

We are reading a whole file (around 5 MB) into memory and sending it through Kafka to Storm. The next bolt performs an operation on the file and emits a tuple to the bolt after it. After profiling we found that the file bytes do not get garbage collected. On further investigation we found that the backtype.storm.coordination.CoordinatedBolt.CoordinatedOutputCollector.emit(String, Collection<Tuple>, List<Object>) API takes the first object of the emitted values and uses it for tracking :(. Can you confirm the reason behind this? Is there any way we can send a different unique id as the first element of the list, or have the tuple's own unique id used as the indicator instead?
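To make the question concrete, here is a minimal sketch of what we would like to do: emit a small unique id as the first value so that whatever the CoordinatedOutputCollector retains for tracking is the id, not the 5 MB byte[]. The class name FileProcessingBolt and the field names are made up for this mail, and we are not sure CoordinatedBolt would actually accept an arbitrary id here (that is the question):

import java.util.Map;
import java.util.UUID;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Made-up bolt to illustrate the question: emit a small unique id as the
// FIRST value, so tracking would hold the id rather than the file bytes.
public class FileProcessingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        byte[] fileBytes = input.getBinary(0);
        // ... perform the operation on fileBytes here ...

        // Small, cheap-to-retain id as the first element instead of the byte[].
        String trackingId = UUID.randomUUID().toString();
        collector.emit(input, new Values(trackingId, fileBytes));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "bytes"));
    }
}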
For the time being, we have worked around this by changing the Scheme assigned to the KafkaSpout so that it parses the file and emits a list of values (a rough sketch of our Scheme is at the end of this mail). Can you also explain why a list of values is used instead of a map, given that we already declare the output fields in the getOutputFields() API?

Thanks,
Sachin
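For reference, here is roughly the shape of the Scheme we now assign to the KafkaSpout. The class name ParsedFileScheme, the field names, and the parsing logic are all simplified for this mail; our real parsing is more involved:

import java.util.List;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Workaround Scheme: parse the file inside deserialize() and emit the
// parsed values, instead of emitting the raw byte[] as a single field.
public class ParsedFileScheme implements Scheme {
    @Override
    public List<Object> deserialize(byte[] ser) {
        String content = new String(ser);
        // Illustrative parsing only: split the file into a header line
        // and the remaining body.
        int newline = content.indexOf('\n');
        String header = newline >= 0 ? content.substring(0, newline) : content;
        String body = newline >= 0 ? content.substring(newline + 1) : "";
        return new Values(header, body);
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("header", "body");
    }
}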
