It sounds like your source is in Avro? Or do you want to transform your logs to Avro?

> Or would I use the raw output format, logging serialized AVRO data in
> the message body and analyze it later in Hadoop?

I don't see any problem with that approach.
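For illustration, here is a minimal sketch of that approach with the Avro Java API. The LogEvent schema and its fields are made-up placeholders for your custom schema: you binary-encode each record and ship the resulting bytes as the opaque message body.

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroBodyExample {
    public static byte[] encode() throws IOException {
        // Hypothetical custom schema: substitute your own definition.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"LogEvent\",\"fields\":["
          + "{\"name\":\"timestamp\",\"type\":\"long\"},"
          + "{\"name\":\"message\",\"type\":\"string\"}]}");

        GenericRecord event = new GenericData.Record(schema);
        event.put("timestamp", System.currentTimeMillis());
        event.put("message", "request served");

        // Binary-encode the record; the bytes can travel as an opaque
        // message body and be decoded later on the Hadoop side.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(event, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}

On the Hadoop side you would decode with a GenericDatumReader against the same schema, since raw binary encoding (unlike an Avro data file) does not carry the schema with it.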

> Are there any problems with this? I could imagine that this won't work
> because hadoop is splitting after 64 mb?

The HDFS block size should be transparent to users; you wouldn't normally be aware of it at all. Avro data files also embed sync markers between their own blocks, so an Avro-aware input format can realign each input split at a record boundary no matter where the 64 MB HDFS blocks happen to fall. If you write Avro to HDFS, I can imagine that later on you will parse the Avro file(s) with a map/reduce job and do whatever you want. I don't see why you need to worry about the 64 MB block size. Or did I miss anything?
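As a rough sketch of the read-back side (events.avro is a made-up local file name; a real job would point an Avro input format at the HDFS path instead), iterating a DataFileReader is all it takes, because the writer's schema travels in the file header:

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadAvroFile {
    public static void main(String[] args) throws IOException {
        // No schema argument needed: the writer's schema is read from
        // the file header, so the container file is self-describing.
        DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                new File("events.avro"), new GenericDatumReader<GenericRecord>());
        for (GenericRecord rec : reader) {
            System.out.println(rec.get("timestamp") + " " + rec.get("message"));
        }
        reader.close();
    }
}

The same record iteration is what an Avro-aware map/reduce input format runs inside each map task, one split at a time.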

Is this link helpful?
http://www.datasalt.com/blog/2011/07/hadoop-avro/



On 10/25/2011 09:02 AM, Tobias Schlottke wrote:
Hi there,

sorry for the newbie question.
I really want to write logging data in a custom AVRO schema.
Is it possible to extend the standard schema?
Or would I use the raw output format, logging serialized AVRO data in
the message body and analyze it later in Hadoop?
Are there any problems with this? I could imagine that this won't work
because hadoop is splitting after 64 mb?
Do we have to implement a custom source?
What is the most elegant solution for this?

Best,

Tobias
