Hi,

We are reading a whole file (around 5 MB) into memory and sending it through Kafka to Storm. The next bolt performs an operation on the file and emits a tuple to the bolt after it. After profiling we found that the file bytes do not get garbage collected. On further investigation we found that the backtype.storm.coordination.CoordinatedBolt.CoordinatedOutputCollector.emit(String, Collection<Tuple>, List<Object>) API takes the first object of the emitted values and uses it for tracking :(. Can you confirm the reason behind this? Is there any way we can send a different unique id as the first element of the list, or have the tuple's own unique id used as the indicator instead?
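To make the question concrete, here is a minimal sketch of what we would like to do: emit a small unique id as the first value so that whatever the CoordinatedOutputCollector retains for tracking is the id, not the 5 MB byte[]. The class name FileProcessingBolt and the field names are made up for this mail, and we are not sure CoordinatedBolt would actually accept an arbitrary id here (that is the question):

import java.util.Map;
import java.util.UUID;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Made-up bolt to illustrate the question: emit a small unique id as the
// FIRST value, so tracking would hold the id rather than the file bytes.
public class FileProcessingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        byte[] fileBytes = input.getBinary(0);
        // ... perform the operation on fileBytes here ...

        // Small, cheap-to-retain id as the first element instead of the byte[].
        String trackingId = UUID.randomUUID().toString();
        collector.emit(input, new Values(trackingId, fileBytes));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "bytes"));
    }
}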
For the time being, we have worked around this by changing the Scheme assigned to the KafkaSpout so that it parses the file and emits a list of values (a rough sketch of our Scheme is at the end of this mail). Can you also explain why a list of values is used instead of a map, given that we already declare the output fields in the getOutputFields() API?

Thanks,
Sachin
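For reference, here is roughly the shape of the Scheme we now assign to the KafkaSpout. The class name ParsedFileScheme, the field names, and the parsing logic are all simplified for this mail; our real parsing is more involved:

import java.util.List;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Workaround Scheme: parse the file inside deserialize() and emit the
// parsed values, instead of emitting the raw byte[] as a single field.
public class ParsedFileScheme implements Scheme {
    @Override
    public List<Object> deserialize(byte[] ser) {
        String content = new String(ser);
        // Illustrative parsing only: split the file into a header line
        // and the remaining body.
        int newline = content.indexOf('\n');
        String header = newline >= 0 ? content.substring(0, newline) : content;
        String body = newline >= 0 ? content.substring(newline + 1) : "";
        return new Values(header, body);
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("header", "body");
    }
}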
