[ https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966424#comment-14966424 ]
Gonzalo Herreros commented on FLUME-2818:
-----------------------------------------

That Avro data is generated by the TwitterSource and gets corrupted when the sink tries to store it as text.
I see two solutions:
- You could remove the line TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text and parse the HDFS files as Avro
- Build an interceptor for the source that converts Avro into JSON

> Problems with Avro data and not Json and no data in HDFS
> --------------------------------------------------------
>
>                 Key: FLUME-2818
>                 URL: https://issues.apache.org/jira/browse/FLUME-2818
>             Project: Flume
>          Issue Type: Request
>          Components: Sinks+Sources
>    Affects Versions: v1.5.2
>         Environment: HDP-2.3.0.0-2557 Sandbox
>            Reporter: Kettler Karl
>            Priority: Critical
>             Fix For: v1.5.2
>
>
> Flume delivers Twitter data in Avro format rather than JSON.
> Why?
> Flume Config Agent:
> TwitterAgent.sources = Twitter
> TwitterAgent.channels = MemChannel
> TwitterAgent.sinks = HDFS
> TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
> TwitterAgent.sources.Twitter.channels = MemChannel
> TwitterAgent.sources.Twitter.consumerKey = xxx
> TwitterAgent.sources.Twitter.consumerSecret = xxx
> TwitterAgent.sources.Twitter.accessToken = xxx
> TwitterAgent.sources.Twitter.accessTokenSecret = xxx
> TwitterAgent.sources.Twitter.maxBatchSize = 10
> TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200
> TwitterAgent.sources.Twitter.keywords = United Nations
> TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL
> # HDFS Sink
> TwitterAgent.sinks.HDFS.channel = MemChannel
> TwitterAgent.sinks.HDFS.type = hdfs
> TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S
> TwitterAgent.sinks.HDFS.hdfs.filePrefix = events
> TwitterAgent.sinks.HDFS.hdfs.round = true
> TwitterAgent.sinks.HDFS.hdfs.roundValue = 5
> TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute
> TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
> TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
> TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
> TwitterAgent.channels.MemChannel.type = memory
> TwitterAgent.channels.MemChannel.capacity = 1000
> TwitterAgent.channels.MemChannel.transactionCapacity = 100
> Twitter Data from Flume:
> Obj avro.schema�
> {"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|����$656461386520784896�
> �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ
> yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める
> https://t.co/ZpfDqw4l9g � <a href=" http://twitter.com"
> rel="nofollow">Twitter Web Client</a> ^
> https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
> <Mundo de las sombras (Cc,Extr)�#RP User de un agente del gobierno |20| Que
> no me veais ni noteis mi presencia no quiere decir que no os este observando
> desde las sombras�� � JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z� RT
> @NaiiVicious: @Lisi_Hattori @UserSinCausa https://t.co/M2LTJWwqae � <a href="
> http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> ^
> https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg�
> http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|���
> After loading this Twitter data into an HDFS table, it is not possible to
> convert it to JSON with avro-tools-1.7.7.jar. We get the error message: "No data"
> If we try to read this file, we get the following error message:
> "java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json
> Exception in thread "main" org.apache.avro.AvroRuntimeException:
> java.io.EOFException"
> I hope you can help us.
> Kind regards,
> Karl
>
>
> Details

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
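
For reference, the first suggestion in the comment (dropping the writeFormat line) would leave the sink section of the reporter's config looking roughly like this. This is only a sketch: it assumes every other property stays exactly as posted in the issue, and the thread does not confirm that this change alone produces files avro-tools can read.

```properties
# HDFS Sink, as posted in the issue, with the writeFormat line removed
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S
TwitterAgent.sinks.HDFS.hdfs.filePrefix = events
TwitterAgent.sinks.HDFS.hdfs.round = true
TwitterAgent.sinks.HDFS.hdfs.roundValue = 5
TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
# Removed per the comment's first suggestion:
# TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
```

Files written after restarting the agent could then be checked with the same "java -jar avro-tools-1.7.7.jar tojson" command quoted in the report.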