"At LinkedIn we use the InputFormat provided in contrib/hadoop-consumer to load the data for topics in daily and hourly partitions."
Sorry for my ignorance, but what exactly do you mean by loading the data in daily and hourly partitions?
On 11/6/11 10:26 AM, Neha Narkhede wrote:
There should be no changes to the way you create topics to achieve this kind of HDFS data load for Kafka. At LinkedIn we use the InputFormat provided in contrib/hadoop-consumer to load the data for topics in daily and hourly partitions. These Hadoop jobs run every 10 mins or so, so the maximum delay of data being available from producer->Hadoop is around 10 mins.

Thanks,
Neha

On Sun, Nov 6, 2011 at 8:45 AM, Mark <[email protected]> wrote:

This is more of a general design question, but what is the preferred way of importing logs from Kafka to HDFS when you want your data segmented by hour or day? Is there any way to say "Import only this {hour|day} of logs", or does one need to create their topics around the way they would like to import them, i.e. Topic: "search_logs/2011/11/06"? If it's the latter, is there any documentation/best practices on topic/key design?

Thanks
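(For illustration, here is a minimal sketch of what "daily and hourly partitions" can look like on the HDFS side: all messages stay in one topic, and the Hadoop load job buckets them into date-based output directories by timestamp, so no per-date topics like "search_logs/2011/11/06" are needed. The path layout, topic name, and helper methods below are hypothetical, not the actual contrib/hadoop-consumer code.)

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch of time-based HDFS partitioning for a single Kafka topic.
// Assumption: the load job derives the output directory from each
// message's timestamp rather than from the topic name.
public class PartitionPathSketch {

    private static final SimpleDateFormat DAY = utc("yyyy/MM/dd");
    private static final SimpleDateFormat HOUR = utc("yyyy/MM/dd/HH");

    private static SimpleDateFormat utc(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    /** HDFS directory for the daily partition of a topic (hypothetical layout). */
    static String dailyPath(String topic, long eventTimeMs) {
        return "/data/" + topic + "/daily/" + DAY.format(new Date(eventTimeMs));
    }

    /** HDFS directory for the hourly partition of a topic (hypothetical layout). */
    static String hourlyPath(String topic, long eventTimeMs) {
        return "/data/" + topic + "/hourly/" + HOUR.format(new Date(eventTimeMs));
    }

    public static void main(String[] args) {
        long ts = 1320575400000L; // 2011-11-06 10:30:00 UTC
        System.out.println(dailyPath("search_logs", ts));
        // -> /data/search_logs/daily/2011/11/06
        System.out.println(hourlyPath("search_logs", ts));
        // -> /data/search_logs/hourly/2011/11/06/10
    }
}
```

With a layout like this, "import only this hour of logs" becomes a matter of pointing a downstream job at one hourly directory, and the every-10-minutes load jobs simply append new files into the current hour's bucket.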
