Increasing the batch size to 10k didn't make any difference: 24.6k events/sec.
Subsequently increasing the number of sources from 1 to 8 improved it
a bit, to 33k events/sec.
Yes, I will try to take a deeper look using a profiler.
Here is another issue that comes up occasionally with the HDFS sink. Any
thoughts?
15 Dec 2013 11:13:26,689 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:82) - Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:147)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:68)
    at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Lease mismatch on /flume/bidder/_flume_tx_agent01_sink01_web06-east.manage.com_2013121422.1387093314310.snappy.tmp owned by DFSClient_NONMAPREDUCE_-1742177356_32 but is accessed by DFSClient_NONMAPREDUCE_643393114_25
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2770)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2567)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2480)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)