[jira] [Created] (FLUME-2819) Kafka libs are being bundled into Flume distro

2015-10-21 Thread Roshan Naik (JIRA)
Roshan Naik created FLUME-2819:
--

 Summary: Kafka libs are being bundled into Flume distro
 Key: FLUME-2819
 URL: https://issues.apache.org/jira/browse/FLUME-2819
 Project: Flume
  Issue Type: Bug
Reporter: Roshan Naik


Kafka dependency libs need to be marked as 'provided' in the pom.xml 
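For reference, marking a dependency as provided in Maven looks roughly like this; the coordinates and version below are illustrative assumptions, not copied from Flume's actual pom.xml:

{code}
<!-- Sketch only: artifact coordinates and version are placeholders, not Flume's real entry -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version>
  <!-- 'provided' keeps the jar available at compile time but out of the assembled distro -->
  <scope>provided</scope>
</dependency>
{code}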





[jira] [Comment Edited] (FLUME-2792) Flume Kafka Kerberos Support

2015-10-21 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967495#comment-14967495
 ] 

Roshan Naik edited comment on FLUME-2792 at 10/21/15 5:23 PM:
--

opened FLUME-2819 to track the kafka libs issue noted above


was (Author: roshan_naik):
opened FLUME-2819

> Flume Kafka Kerberos Support
> 
>
> Key: FLUME-2792
> URL: https://issues.apache.org/jira/browse/FLUME-2792
> Project: Flume
>  Issue Type: Bug
>  Components: Configuration, Docs, Sinks+Sources
>Affects Versions: v1.6.0, v1.5.2
> Environment: HDP 2.3 fully kerberized including Kafka 0.8.2.2 + Flume 
> 1.5.2 or Apache Flume 1.6 downloaded from apache.org
>Reporter: Hari Sekhon
>Priority: Blocker
>
> Following on from FLUME-2790, it appears that Flume doesn't yet have 
> support for Kafka + Kerberos: there is no setting documented in the 
> Flume 1.6.0 user guide under the Kafka source section to tell Flume to use 
> plaintextsasl as the connection mechanism to Kafka, and Kafka rejects the 
> unauthenticated plaintext mechanism:
> {code}15/09/10 16:51:22 INFO consumer.ConsumerFetcherManager: 
> [ConsumerFetcherManager-1441903874830] Added fetcher for partitions 
> ArrayBuffer()
> 15/09/10 16:51:22 WARN consumer.ConsumerFetcherManager$LeaderFinderThread: 
> [flume_-1441903874763-abdc98ec-leader-finder-thread], Failed 
> to find leader for Set([,0], [,1])
> kafka.common.BrokerEndPointNotAvailableException: End point PLAINTEXT not 
> found for broker 0
> at kafka.cluster.Broker.getBrokerEndPoint(Broker.scala:140)
> at 
> kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:124)
> at 
> kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:124)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at 
> kafka.utils.ZkUtils$.getAllBrokerEndPointsForChannel(ZkUtils.scala:124)
> at 
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
> at 
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60){code}





[jira] [Commented] (FLUME-2792) Flume Kafka Kerberos Support

2015-10-21 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967495#comment-14967495
 ] 

Roshan Naik commented on FLUME-2792:


opened FLUME-2819

> Flume Kafka Kerberos Support
> 
>
> Key: FLUME-2792
> URL: https://issues.apache.org/jira/browse/FLUME-2792
> Project: Flume
>  Issue Type: Bug
>  Components: Configuration, Docs, Sinks+Sources
>Affects Versions: v1.6.0, v1.5.2
> Environment: HDP 2.3 fully kerberized including Kafka 0.8.2.2 + Flume 
> 1.5.2 or Apache Flume 1.6 downloaded from apache.org
>Reporter: Hari Sekhon
>Priority: Blocker
>
> Following on from FLUME-2790, it appears that Flume doesn't yet have 
> support for Kafka + Kerberos: there is no setting documented in the 
> Flume 1.6.0 user guide under the Kafka source section to tell Flume to use 
> plaintextsasl as the connection mechanism to Kafka, and Kafka rejects the 
> unauthenticated plaintext mechanism:
> {code}15/09/10 16:51:22 INFO consumer.ConsumerFetcherManager: 
> [ConsumerFetcherManager-1441903874830] Added fetcher for partitions 
> ArrayBuffer()
> 15/09/10 16:51:22 WARN consumer.ConsumerFetcherManager$LeaderFinderThread: 
> [flume_-1441903874763-abdc98ec-leader-finder-thread], Failed 
> to find leader for Set([,0], [,1])
> kafka.common.BrokerEndPointNotAvailableException: End point PLAINTEXT not 
> found for broker 0
> at kafka.cluster.Broker.getBrokerEndPoint(Broker.scala:140)
> at 
> kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:124)
> at 
> kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:124)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at 
> kafka.utils.ZkUtils$.getAllBrokerEndPointsForChannel(ZkUtils.scala:124)
> at 
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
> at 
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60){code}





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread Mike Lerch (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967664#comment-14967664
 ] 

Mike Lerch commented on FLUME-2632:
---

We're using the KafkaSink in a larger production environment, and this bug eats 
up 6 cores at 100% on each instance until the first messages arrive. Would be 
nice to see this fixed.

> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: kafka
> Fix For: v1.6.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "
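The change described above is small. A minimal sketch of the idea, assuming Flume's standard Sink/Transaction API (this is not the committed FLUME-2632 patch; sendToKafka and batchSize are stand-ins), could look like:

{code}
// Sketch only: illustrates the proposed Status.BACKOFF behaviour,
// not the actual KafkaSink.process() implementation.
import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.sink.AbstractSink;

public class BackoffKafkaSinkSketch extends AbstractSink {
  private final int batchSize = 100;   // assumed constant; the real sink reads this from its config

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction tx = channel.getTransaction();
    tx.begin();
    try {
      int processed = 0;
      for (int i = 0; i < batchSize; i++) {
        Event event = channel.take();
        if (event == null) {
          break;                      // channel is empty for this round
        }
        sendToKafka(event);           // hypothetical helper wrapping the Kafka producer
        processed++;
      }
      tx.commit();
      // Returning BACKOFF when nothing was taken lets the SinkRunner back off
      // instead of spinning at 100% CPU on an empty channel.
      return processed == 0 ? Status.BACKOFF : Status.READY;
    } catch (Exception e) {
      tx.rollback();
      throw new EventDeliveryException("Failed to deliver events", e);
    } finally {
      tx.close();
    }
  }

  private void sendToKafka(Event event) {
    // Placeholder: the real sink hands the event body to a Kafka producer here.
  }
}
{code}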





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread Johny Rufus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967677#comment-14967677
 ] 

Johny Rufus commented on FLUME-2632:


+1.
Looks like I +1'd this on the review request a long time ago; I will run the tests and 
commit.

> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: kafka
> Fix For: v1.6.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967973#comment-14967973
 ] 

Ralph Goers commented on FLUME-2632:


I am now running this in production in my environment and it works just fine.  
I'd be happy to commit it but that seems a bit strange since I didn't author it 
and wasn't part of the review.

> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: kafka
> Fix For: v1.6.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread Ashish Paliwal (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967982#comment-14967982
 ] 

Ashish Paliwal commented on FLUME-2632:
---

It's a minor fix. Since I authored it, I cannot commit the changes myself. If it's 
working fine, let's get it out of the way :)

> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: kafka
> Fix For: v1.6.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968084#comment-14968084
 ] 

ASF subversion and git services commented on FLUME-2632:


Commit d6bf08b54e467a6bdc6a5fc0edd41c51200e9da1 in flume's branch 
refs/heads/trunk from [~jrufus]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=d6bf08b ]

FLUME-2632: High CPU on KafkaSink

(Ashish Paliwal via Johny Rufus)


> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: kafka
> Fix For: v1.6.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968087#comment-14968087
 ] 

ASF subversion and git services commented on FLUME-2632:


Commit bf2495047f5cebb26f26f94904831c52057d129d in flume's branch 
refs/heads/flume-1.7 from [~jrufus]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=bf24950 ]

FLUME-2632: High CPU on KafkaSink

(Ashish Paliwal via Johny Rufus)


> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>  Labels: kafka
> Fix For: v1.6.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





flume hive sink not work

2015-10-21 Thread lizhenm...@163.com

Hi all,
I use Flume to import data from syslog into Hive, but I encounter the following errors.

2015-10-22 10:05:05,115 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - 
org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:324)] k2 : 
Failed connecting to EndPoint {metaStoreUri='thrift://bigdata1:9083', 
database='dnsdb', table='dns_request', partitionVals=[] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to 
EndPoint {metaStoreUri='thrift://bigdata1:9083', database='dnsdb', 
table='dns_request', partitionVals=[] }
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
at 
org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:296)
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:254)
at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed 
connecting to EndPoint {metaStoreUri='thrift://bigdata1:9083', 
database='dnsdb', table='dns_request', partitionVals=[] }
at 
org.apache.flume.sink.hive.HiveWriter.newConnection(HiveWriter.java:380)
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:86)
... 6 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:201)
at org.apache.flume.sink.hive.HiveWriter.timedCall(HiveWriter.java:431)
at 
org.apache.flume.sink.hive.HiveWriter.newConnection(HiveWriter.java:373)
... 7 more


my configuration is:


a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = syslogudp
a1.sources.r1.port = 514
a1.sources.r1.host = 192.168.55.246

a1.sources.r1.channels = c1 c2
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = Dns(.*)\\[
a1.sources.r1.interceptors.i1.serializers = t1
a1.sources.r1.interceptors.i1.serializers.t1.name = type

a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.Request = c1
a1.sources.r1.selector.mapping.Answer = c2

a1.sinks.k2.type = hive
a1.sinks.k2.channel = c1
a1.sinks.k2.hive.metastore = thrift://bigdata1:9083
a1.sinks.k2.hive.database = dnsdb
a1.sinks.k2.hive.table = dns_request
a1.sinks.k2.hive.partiton = %Y,%m,%d,%H
a1.sinks.k2.hive.txnsPerBatchAsk = 2
a1.sinks.k2.batchSize = 10
a1.sinks.k2.serializer = delimited
a1.sinks.k2.serializer.delimiter = ,
a1.sinks.k2.serializer.fieldnames = timepoint,random,sip,dip,spt,type,name

a1.sinks.k1.type = hive
a1.sinks.k1.channel = c2
a1.sinks.k1.hive.metastore = thrift://bigdata1:9083
a1.sinks.k1.hive.database = Dnsdb
a1.sinks.k1.hive.table = dns_answer
a1.sinks.k1.hive.partiton = %Y,%m,%d,%H
a1.sinks.k1.hive.txnsPerBatchAsk = 2
a1.sinks.k1.batchSize = 10
a1.sinks.k1.serializer = delimited
a1.sinks.k1.serializer.delimiter = ,
a1.sinks.k1.serializer.fieldnames = 
timepoint,random,sip,dip,dpt,name,nosuchname,typemax,typecname,typeaddr,authservername,additionalrecords

Please help me. Thanks.



lizhenm...@163.com


[ANNOUNCE] Change of Apache Flume PMC Chair

2015-10-21 Thread Arvind Prabhakar
Dear Flume Users and Developers,

I have had the pleasure of serving as the PMC Chair of Apache Flume since
its graduation three years ago. I sincerely thank you and the Flume PMC for
this opportunity. However, I have decided to step down from this
responsibility due to personal reasons.

I am very happy to announce that on the request of Flume PMC and with the
approval from the board of directors at The Apache Software Foundation,
Hari Shreedharan is hereby appointed as the new PMC Chair. I am confident
that Hari will do everything possible to help further grow the community
and adoption of Apache Flume.

Please join me in congratulating Hari on his appointment and welcoming him
to this role.

Regards,
Arvind Prabhakar


Re: [ANNOUNCE] Change of Apache Flume PMC Chair

2015-10-21 Thread Johny Rufus
Congrats Hari !! Wonderful news !!

Regards,
Rufus


On Wed, Oct 21, 2015 at 5:50 PM, Arvind Prabhakar  wrote:

> Dear Flume Users and Developers,
>
> I have had the pleasure of serving as the PMC Chair of Apache Flume since
> its graduation three years ago. I sincerely thank you and the Flume PMC for
> this opportunity. However, I have decided to step down from this
> responsibility due to personal reasons.
>
> I am very happy to announce that on the request of Flume PMC and with the
> approval from the board of directors at The Apache Software Foundation,
> Hari Shreedharan is hereby appointed as the new PMC Chair. I am confident
> that Hari will do everything possible to help further grow the community
> and adoption of Apache Flume.
>
> Please join me in congratulating Hari on his appointment and welcoming him
> to this role.
>
> Regards,
> Arvind Prabhakar
>


Re: [ANNOUNCE] Change of Apache Flume PMC Chair

2015-10-21 Thread Ashish
Congrats Hari !

Arvind - Thanks for watching over and taking care of the community.
Hope you would continue to do so in the future as well :)

On Wed, Oct 21, 2015 at 5:50 PM, Arvind Prabhakar  wrote:
> Dear Flume Users and Developers,
>
> I have had the pleasure of serving as the PMC Chair of Apache Flume since
> its graduation three years ago. I sincerely thank you and the Flume PMC for
> this opportunity. However, I have decided to step down from this
> responsibility due to personal reasons.
>
> I am very happy to announce that on the request of Flume PMC and with the
> approval from the board of directors at The Apache Software Foundation,
> Hari Shreedharan is hereby appointed as the new PMC Chair. I am confident
> that Hari will do everything possible to help further grow the community
> and adoption of Apache Flume.
>
> Please join me in congratulating Hari on his appointment and welcoming him
> to this role.
>
> Regards,
> Arvind Prabhakar



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal


[jira] [Commented] (FLUME-944) Implement a Load-balancing channel selector for distributing the load over many channels.

2015-10-21 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968254#comment-14968254
 ] 

Hari Shreedharan commented on FLUME-944:


Sure - if it is something that interests you, there are probably others who'd 
like to see this as well. Please feel free to upload a patch.

> Implement a Load-balancing channel selector for distributing the load over 
> many channels.
> -
>
> Key: FLUME-944
> URL: https://issues.apache.org/jira/browse/FLUME-944
> Project: Flume
>  Issue Type: Improvement
>Reporter: Arvind Prabhakar
> Attachments: FLUME-944-1.patch
>
>
> The load balancing channel selector could distribute load via:
> * round-robin semantics, or
> * using dynamic load measurements from the channel





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968179#comment-14968179
 ] 

Ralph Goers commented on FLUME-2632:


Yeah - I forgot Flume had that rule. Totally different from every other project 
I work on. I guess that's why I never commit.

> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Ashish Paliwal
>  Labels: kafka
> Fix For: v1.7.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





[jira] [Commented] (FLUME-944) Implement a Load-balancing channel selector for distributing the load over many channels.

2015-10-21 Thread Ralph Goers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968184#comment-14968184
 ] 

Ralph Goers commented on FLUME-944:
---

FYI - I took this patch and pulled out the minimum load policy and have it 
running in my local environment where it is working really well. Without the 
minimum load policy the only thing that is actually needed is the new 
LoadBalancingChannelSelector.  I'd be happy to supply that as a patch if it is 
of interest.  I can't believe others haven't asked for this.
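For anyone who wants to experiment before a patch is posted, a minimal round-robin selector along these lines might look like the sketch below. It assumes Flume's AbstractChannelSelector / getAllChannels API and is neither the attached FLUME-944 patch nor Ralph's version.

{code}
// Sketch only: a simple round-robin channel selector in the spirit of FLUME-944.
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.channel.AbstractChannelSelector;

public class RoundRobinChannelSelectorSketch extends AbstractChannelSelector {
  private final AtomicLong counter = new AtomicLong();

  @Override
  public List<Channel> getRequiredChannels(Event event) {
    List<Channel> all = getAllChannels();
    // Rotate through the configured channels, one event per channel.
    int index = (int) (counter.getAndIncrement() % all.size());
    return Collections.singletonList(all.get(index));
  }

  @Override
  public List<Channel> getOptionalChannels(Event event) {
    return Collections.emptyList();
  }

  @Override
  public void configure(Context context) {
    // No options needed for plain round-robin.
  }
}
{code}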

> Implement a Load-balancing channel selector for distributing the load over 
> many channels.
> -
>
> Key: FLUME-944
> URL: https://issues.apache.org/jira/browse/FLUME-944
> Project: Flume
>  Issue Type: Improvement
>Reporter: Arvind Prabhakar
> Attachments: FLUME-944-1.patch
>
>
> The load balancing channel selector could distribute load via:
> * round-robin semantics, or
> * using dynamic load measurements from the channel





[jira] [Commented] (FLUME-2632) High CPU on KafkaSink

2015-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968205#comment-14968205
 ] 

Hudson commented on FLUME-2632:
---

SUCCESS: Integrated in Flume-trunk-hbase-1 #130 (See 
[https://builds.apache.org/job/Flume-trunk-hbase-1/130/])
FLUME-2632: High CPU on KafkaSink (johnyrufus: 
[http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git=commit=d6bf08b54e467a6bdc6a5fc0edd41c51200e9da1])
* 
flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/TestKafkaSink.java
* 
flume-ng-sinks/flume-ng-kafka-sink/src/main/java/org/apache/flume/sink/kafka/KafkaSink.java


> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.1
>Reporter: Gwen Shapira
>Assignee: Ashish Paliwal
>  Labels: kafka
> Fix For: v1.7.0
>
> Attachments: FLUME-2632-0.patch
>
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "





Jenkins build is back to stable : Flume-trunk-hbase-1 #130

2015-10-21 Thread Apache Jenkins Server
See 



[jira] [Commented] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS

2015-10-21 Thread Kettler Karl (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966698#comment-14966698
 ] 

Kettler Karl commented on FLUME-2818:
-

I added this configuration to the Flume agent:
TwitterAgent.sources.Twitter.interceptors = i1 i2
TwitterAgent.sources.Twitter.i1.type = 
org.apache.flume.interceptor.HostInterceptor$Builder
TwitterAgent.sources.Twitter.i1.interceptors.i1.preserveExisting = false
TwitterAgent.sources.Twitter.i1.interceptors.i1.hostHeader = 192.168.235.145
TwitterAgent.sources.Twitter.i2.type = 
org.apache.flume.interceptor.TimestampInterceptor$Builder
No Twitter data was coming through.
What should the configuration in the agent look like?
Thanks,
Karl
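
For comparison, the Flume user guide keys interceptor options under the interceptors.<name> prefix rather than directly under the source. A sketch of the same snippet written that way (keeping Karl's values, which are not verified here) would be:

{code}
# Sketch only: same interceptors, with options keyed under interceptors.<name>.*
TwitterAgent.sources.Twitter.interceptors = i1 i2
TwitterAgent.sources.Twitter.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
TwitterAgent.sources.Twitter.interceptors.i1.preserveExisting = false
TwitterAgent.sources.Twitter.interceptors.i1.hostHeader = 192.168.235.145
TwitterAgent.sources.Twitter.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
{code}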


> Problems with Avro data and not Json and no data in HDFS
> 
>
> Key: FLUME-2818
> URL: https://issues.apache.org/jira/browse/FLUME-2818
> Project: Flume
>  Issue Type: Request
>  Components: Sinks+Sources
>Affects Versions: v1.5.2
> Environment:  HDP-2.3.0.0-2557 Sandbox
>Reporter: Kettler Karl
>Priority: Critical
> Fix For: v1.5.2
>
>
> Flume supplies twitter data in avro format and not in Json. 
> Why? 
> Flume Config Agent: 
> TwitterAgent.sources = Twitter 
> TwitterAgent.channels = MemChannel 
> TwitterAgent.sinks = HDFS 
> TwitterAgent.sources.Twitter.type = 
> org.apache.flume.source.twitter.TwitterSource 
> TwitterAgent.sources.Twitter.channels = MemChannel 
> TwitterAgent.sources.Twitter.consumerKey = xxx 
> TwitterAgent.sources.Twitter.consumerSecret = xxx 
> TwitterAgent.sources.Twitter.accessToken = xxx 
> TwitterAgent.sources.Twitter.accessTokenSecret = xxx 
> TwitterAgent.sources.Twitter.maxBatchSize = 10 
> TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200 
> TwitterAgent.sources.Twitter.keywords = United Nations 
> TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL 
> # HDFS Sink 
> TwitterAgent.sinks.HDFS.channel = MemChannel 
> TwitterAgent.sinks.HDFS.type = hdfs 
> TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S 
> TwitterAgent.sinks.HDFS.hdfs.filePrefix = events 
> TwitterAgent.sinks.HDFS.hdfs.round = true 
> TwitterAgent.sinks.HDFS.hdfs.roundValue = 5 
> TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute 
> TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true 
> TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
> TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
> TwitterAgent.channels.MemChannel.type = memory 
> TwitterAgent.channels.MemChannel.capacity = 1000 
> TwitterAgent.channels.MemChannel.transactionCapacity = 100 
> Twitter Data from Flume: 
> Obj avro.schema� 
> {"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|$656461386520784896�
>  �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ 
> yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める 
> https://t.co/ZpfDqw4l9g � http://twitter.com; 
> rel="nofollow">Twitter Web Client ^ 
> https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
>   no me veais ni noteis mi presencia no quiere decir que no os este observando 
> desde las sombras�� � JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z� RT 
> @NaiiVicious: @Lisi_Hattori @UserSinCausa https://t.co/M2LTJWwqae � http://twitter.com/download/android; rel="nofollow">Twitter for Android ^ 
> https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg� 
> http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|��� 
> We loaded this Twitter data into an HDFS table. It is not possible to convert it 
> into JSON with avro-tools-1.7.7.jar; we get the error message: "No data". 
> If we try to read this file, we get the following error message: 
> "java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json 
> Exception in thread "main" org.apache.avro.AvroRuntimeException: 
> java.io.EOFException" 
> I hope you could help us. 
> Kind regards, 
> Karl 
>  
>  
> Details
>  




[jira] [Created] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS

2015-10-21 Thread Kettler Karl (JIRA)
Kettler Karl created FLUME-2818:
---

 Summary: Problems with Avro data and not Json and no data in HDFS
 Key: FLUME-2818
 URL: https://issues.apache.org/jira/browse/FLUME-2818
 Project: Flume
  Issue Type: Request
  Components: Sinks+Sources
Affects Versions: v1.5.2
 Environment:  HDP-2.3.0.0-2557 Sandbox
Reporter: Kettler Karl
Priority: Critical
 Fix For: v1.5.2


Flume supplies twitter data in avro format and not in Json. 
Why? 
Flume Config Agent: 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS 

TwitterAgent.sources.Twitter.type = 
org.apache.flume.source.twitter.TwitterSource 
TwitterAgent.sources.Twitter.channels = MemChannel 
TwitterAgent.sources.Twitter.consumerKey = xxx 
TwitterAgent.sources.Twitter.consumerSecret = xxx 
TwitterAgent.sources.Twitter.accessToken = xxx 
TwitterAgent.sources.Twitter.accessTokenSecret = xxx 
TwitterAgent.sources.Twitter.maxBatchSize = 10 
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200 
TwitterAgent.sources.Twitter.keywords = United Nations 
TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL 
# HDFS Sink 
TwitterAgent.sinks.HDFS.channel = MemChannel 
TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S 
TwitterAgent.sinks.HDFS.hdfs.filePrefix = events 
TwitterAgent.sinks.HDFS.hdfs.round = true 
TwitterAgent.sinks.HDFS.hdfs.roundValue = 5 
TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute 
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true 
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 

TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 1000 
TwitterAgent.channels.MemChannel.transactionCapacity = 100 

Twitter Data from Flume: 
Obj avro.schema� 
{"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|$656461386520784896�
 �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ 
yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める 
https://t.co/ZpfDqw4l9g � http://twitter.com; rel="nofollow">Twitter 
Web Client ^ 
https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
 https://t.co/M2LTJWwqae � http://twitter.com/download/android; rel="nofollow">Twitter for Android ^ 
https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg� 
http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|��� 

We loaded this Twitter data into an HDFS table. It is not possible to convert it 
into JSON with avro-tools-1.7.7.jar; we get the error message: "No data". 
If we try to read this file, we get the following error message: 
"java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json 
Exception in thread "main" org.apache.avro.AvroRuntimeException: 
java.io.EOFException" 

I hope you could help us. 

Kind regards, 
Karl 
 
 
Details
 






[jira] [Commented] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS

2015-10-21 Thread Gonzalo Herreros (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966424#comment-14966424
 ] 

Gonzalo Herreros commented on FLUME-2818:
-

That avro is generated by the TwitterSource and gets corrupted when you try to 
store it as text in the sink.

I see two solutions:
- You could remove the line TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text  
and parse the hdfs files as avro
- Build an interceptor for the source that converts avro into json

> Problems with Avro data and not Json and no data in HDFS
> 
>
> Key: FLUME-2818
> URL: https://issues.apache.org/jira/browse/FLUME-2818
> Project: Flume
>  Issue Type: Request
>  Components: Sinks+Sources
>Affects Versions: v1.5.2
> Environment:  HDP-2.3.0.0-2557 Sandbox
>Reporter: Kettler Karl
>Priority: Critical
> Fix For: v1.5.2
>
>
> Flume supplies twitter data in avro format and not in Json. 
> Why? 
> Flume Config Agent: 
> TwitterAgent.sources = Twitter 
> TwitterAgent.channels = MemChannel 
> TwitterAgent.sinks = HDFS 
> TwitterAgent.sources.Twitter.type = 
> org.apache.flume.source.twitter.TwitterSource 
> TwitterAgent.sources.Twitter.channels = MemChannel 
> TwitterAgent.sources.Twitter.consumerKey = xxx 
> TwitterAgent.sources.Twitter.consumerSecret = xxx 
> TwitterAgent.sources.Twitter.accessToken = xxx 
> TwitterAgent.sources.Twitter.accessTokenSecret = xxx 
> TwitterAgent.sources.Twitter.maxBatchSize = 10 
> TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200 
> TwitterAgent.sources.Twitter.keywords = United Nations 
> TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL 
> # HDFS Sink 
> TwitterAgent.sinks.HDFS.channel = MemChannel 
> TwitterAgent.sinks.HDFS.type = hdfs 
> TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S 
> TwitterAgent.sinks.HDFS.hdfs.filePrefix = events 
> TwitterAgent.sinks.HDFS.hdfs.round = true 
> TwitterAgent.sinks.HDFS.hdfs.roundValue = 5 
> TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute 
> TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true 
> TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
> TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
> TwitterAgent.channels.MemChannel.type = memory 
> TwitterAgent.channels.MemChannel.capacity = 1000 
> TwitterAgent.channels.MemChannel.transactionCapacity = 100 
> Twitter Data from Flume: 
> Obj avro.schema� 
> {"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|$656461386520784896�
>  �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ 
> yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める 
> https://t.co/ZpfDqw4l9g � http://twitter.com; 
> rel="nofollow">Twitter Web Client ^ 
> https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
>   no me veais ni noteis mi presencia no quiere decir que no os este observando 
> desde las sombras�� � JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z� RT 
> @NaiiVicious: @Lisi_Hattori @UserSinCausa https://t.co/M2LTJWwqae � http://twitter.com/download/android; rel="nofollow">Twitter for Android ^ 
> https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg� 
> http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|��� 
> We loaded this Twitter data into an HDFS table. It is not possible to convert it 
> into JSON with avro-tools-1.7.7.jar; we get the error message: "No data". 
> If we try to read this file, we get the following error message: 
> "java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json 
> Exception in thread "main" org.apache.avro.AvroRuntimeException: 
> java.io.EOFException" 
> I hope you could help us. 
> Kind regards, 
> Karl 
>  
>  
> Details
>  





[jira] [Commented] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS

2015-10-21 Thread Kettler Karl (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966462#comment-14966462
 ] 

Kettler Karl commented on FLUME-2818:
-

Hello Gonzalo,

I removed this sink setting but I got the same file structure.

Do you have an idea of what this interceptor should look like?

Thanks

Kind regards,
Karl

> Problems with Avro data and not Json and no data in HDFS
> 
>
> Key: FLUME-2818
> URL: https://issues.apache.org/jira/browse/FLUME-2818
> Project: Flume
>  Issue Type: Request
>  Components: Sinks+Sources
>Affects Versions: v1.5.2
> Environment:  HDP-2.3.0.0-2557 Sandbox
>Reporter: Kettler Karl
>Priority: Critical
> Fix For: v1.5.2
>
>
> Flume supplies twitter data in avro format and not in Json. 
> Why? 
> Flume Config Agent: 
> TwitterAgent.sources = Twitter 
> TwitterAgent.channels = MemChannel 
> TwitterAgent.sinks = HDFS 
> TwitterAgent.sources.Twitter.type = 
> org.apache.flume.source.twitter.TwitterSource 
> TwitterAgent.sources.Twitter.channels = MemChannel 
> TwitterAgent.sources.Twitter.consumerKey = xxx 
> TwitterAgent.sources.Twitter.consumerSecret = xxx 
> TwitterAgent.sources.Twitter.accessToken = xxx 
> TwitterAgent.sources.Twitter.accessTokenSecret = xxx 
> TwitterAgent.sources.Twitter.maxBatchSize = 10 
> TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200 
> TwitterAgent.sources.Twitter.keywords = United Nations 
> TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL 
> # HDFS Sink 
> TwitterAgent.sinks.HDFS.channel = MemChannel 
> TwitterAgent.sinks.HDFS.type = hdfs 
> TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S 
> TwitterAgent.sinks.HDFS.hdfs.filePrefix = events 
> TwitterAgent.sinks.HDFS.hdfs.round = true 
> TwitterAgent.sinks.HDFS.hdfs.roundValue = 5 
> TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute 
> TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true 
> TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
> TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
> TwitterAgent.channels.MemChannel.type = memory 
> TwitterAgent.channels.MemChannel.capacity = 1000 
> TwitterAgent.channels.MemChannel.transactionCapacity = 100 
> Twitter Data from Flume: 
> Obj avro.schema� 
> {"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|$656461386520784896�
>  �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ 
> yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める 
> https://t.co/ZpfDqw4l9g � http://twitter.com; 
> rel="nofollow">Twitter Web Client ^ 
> https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
>   no me veais ni noteis mi presencia no quiere decir que no os este observando 
> desde las sombras�� � JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z� RT 
> @NaiiVicious: @Lisi_Hattori @UserSinCausa https://t.co/M2LTJWwqae � http://twitter.com/download/android; rel="nofollow">Twitter for Android ^ 
> https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg� 
> http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|��� 
> We loaded this Twitter data into an HDFS table. It is not possible to convert it 
> into JSON with avro-tools-1.7.7.jar; we get the error message: "No data". 
> If we try to read this file, we get the following error message: 
> "java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json 
> Exception in thread "main" org.apache.avro.AvroRuntimeException: 
> java.io.EOFException" 
> I hope you could help us. 
> Kind regards, 
> Karl 
>  
>  
> Details
>  





[jira] [Commented] (FLUME-2818) Problems with Avro data and not Json and no data in HDFS

2015-10-21 Thread Gonzalo Herreros (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966485#comment-14966485
 ] 

Gonzalo Herreros commented on FLUME-2818:
-

It would look something like this:

import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class AvroToJsonInterceptor implements Interceptor {
    ...
    public Event intercept(Event event) {
        // Swap the Avro-encoded body for its JSON rendering.
        byte[] avro = event.getBody();
        byte[] json = avroToJson(avro);  // conversion helper, left abstract here
        event.setBody(json);
        return event;
    }
    ...
}

See documentation on how to use interceptors
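
Fleshing that out into a self-contained skeleton (with the actual Avro-to-JSON conversion left as a placeholder, since it depends on how the TwitterSource encodes the events) might look like:

{code}
// Sketch only: a complete Interceptor skeleton around the idea above.
// avroToJson() is a placeholder; a real implementation would decode the body
// with the Avro APIs and re-encode it as JSON.
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class AvroToJsonInterceptor implements Interceptor {

  @Override
  public void initialize() {
    // Nothing to set up for this sketch.
  }

  @Override
  public Event intercept(Event event) {
    byte[] avro = event.getBody();
    byte[] json = avroToJson(avro);
    event.setBody(json);
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    for (Event event : events) {
      intercept(event);
    }
    return events;
  }

  @Override
  public void close() {
    // Nothing to release.
  }

  private byte[] avroToJson(byte[] avro) {
    // Placeholder conversion; replace with real Avro decoding / JSON encoding.
    return new String(avro, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
  }

  /** Builder so the class can be referenced from a Flume agent configuration. */
  public static class Builder implements Interceptor.Builder {
    @Override
    public Interceptor build() {
      return new AvroToJsonInterceptor();
    }

    @Override
    public void configure(Context context) {
      // No options for this sketch.
    }
  }
}
{code}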

> Problems with Avro data and not Json and no data in HDFS
> 
>
> Key: FLUME-2818
> URL: https://issues.apache.org/jira/browse/FLUME-2818
> Project: Flume
>  Issue Type: Request
>  Components: Sinks+Sources
>Affects Versions: v1.5.2
> Environment:  HDP-2.3.0.0-2557 Sandbox
>Reporter: Kettler Karl
>Priority: Critical
> Fix For: v1.5.2
>
>
> Flume supplies twitter data in avro format and not in Json. 
> Why? 
> Flume Config Agent: 
> TwitterAgent.sources = Twitter 
> TwitterAgent.channels = MemChannel 
> TwitterAgent.sinks = HDFS 
> TwitterAgent.sources.Twitter.type = 
> org.apache.flume.source.twitter.TwitterSource 
> TwitterAgent.sources.Twitter.channels = MemChannel 
> TwitterAgent.sources.Twitter.consumerKey = xxx 
> TwitterAgent.sources.Twitter.consumerSecret = xxx 
> TwitterAgent.sources.Twitter.accessToken = xxx 
> TwitterAgent.sources.Twitter.accessTokenSecret = xxx 
> TwitterAgent.sources.Twitter.maxBatchSize = 10 
> TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200 
> TwitterAgent.sources.Twitter.keywords = United Nations 
> TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL 
> # HDFS Sink 
> TwitterAgent.sinks.HDFS.channel = MemChannel 
> TwitterAgent.sinks.HDFS.type = hdfs 
> TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S 
> TwitterAgent.sinks.HDFS.hdfs.filePrefix = events 
> TwitterAgent.sinks.HDFS.hdfs.round = true 
> TwitterAgent.sinks.HDFS.hdfs.roundValue = 5 
> TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute 
> TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true 
> TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
> TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
> TwitterAgent.channels.MemChannel.type = memory 
> TwitterAgent.channels.MemChannel.capacity = 1000 
> TwitterAgent.channels.MemChannel.transactionCapacity = 100 
> Twitter Data from Flume: 
> Obj avro.schema� 
> {"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|$656461386520784896�
>  �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ 
> yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める 
> https://t.co/ZpfDqw4l9g � http://twitter.com; 
> rel="nofollow">Twitter Web Client ^ 
> https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
>   no me veais ni noteis mi presencia no quiere decir que no os este observando 
> desde las sombras�� � JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z� RT 
> @NaiiVicious: @Lisi_Hattori @UserSinCausa https://t.co/M2LTJWwqae � http://twitter.com/download/android; rel="nofollow">Twitter for Android ^ 
> https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg� 
> http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|��� 
> We loaded this Twitter data into an HDFS table. It is not possible to convert it 
> into JSON with avro-tools-1.7.7.jar; we get the error message: "No data". 
> If we try to read this file, we get the following error message: 
> "java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json 
> Exception in thread "main" org.apache.avro.AvroRuntimeException: 
> java.io.EOFException" 
> I hope you could help us. 
> Kind regards, 
> Karl 
>  
>  
> Details
>  


