[jira] [Resolved] (FLUME-2920) Kafka Channel Should Not Commit Offsets When Stopping
[ https://issues.apache.org/jira/browse/FLUME-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2920. --- Resolution: Fixed Fix Version/s: v1.7.0 Thank you for your contribution [~kevinconaway] and for the review [~jholoman] / [~granthenke]! > Kafka Channel Should Not Commit Offsets When Stopping > - > > Key: FLUME-2920 > URL: https://issues.apache.org/jira/browse/FLUME-2920 > Project: Flume > Issue Type: Bug > Components: Channel >Affects Versions: v1.6.0 >Reporter: Kevin Conaway >Assignee: Kevin Conaway >Priority: Critical > Fix For: v1.7.0 > > Attachments: FLUME-2920.patch > > > The Kafka channel commits the consumer offsets when shutting down (via stop() > -> decommissionConsumerAndRecords()) > This can lead to data loss if the channel is shut down while messages have > been fetched in a transaction but the transaction has not yet been committed. > The only time that the offsets should be committed is when a transaction is > complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
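The data-loss window described in this issue can be illustrated with a small stand-alone sketch. This is not the actual FLUME-2920 patch; the {{OffsetCommitter}} interface and all class/method names below are illustrative stand-ins for the real Kafka consumer and channel code. The point it demonstrates: offsets are committed only when a transaction completes, never from the stop() path, so a batch fetched inside an uncommitted transaction is re-delivered after a restart instead of being silently lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the fix idea: commit consumer offsets only on
// transaction commit, never during shutdown. OffsetCommitter stands in
// for the real Kafka consumer's commit call.
class KafkaChannelSketch {
    interface OffsetCommitter { void commitOffsets(); }

    private final OffsetCommitter committer;
    private boolean inTransaction = false;

    KafkaChannelSketch(OffsetCommitter committer) { this.committer = committer; }

    void beginTransaction() { inTransaction = true; }

    void commitTransaction() {
        if (inTransaction) {
            committer.commitOffsets();  // the only place offsets are committed
            inTransaction = false;
        }
    }

    // Before the fix, stop() also committed offsets, losing any events
    // fetched in a still-open transaction. Here it just discards state,
    // so the uncommitted batch is re-fetched after restart.
    void stop() {
        inTransaction = false;
    }

    public static void main(String[] args) {
        AtomicInteger commits = new AtomicInteger();
        KafkaChannelSketch channel = new KafkaChannelSketch(commits::incrementAndGet);
        channel.beginTransaction();
        channel.stop();                    // shut down mid-transaction
        System.out.println(commits.get()); // nothing committed: batch will be re-delivered
    }
}
```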
[jira] [Assigned] (FLUME-2920) Kafka Channel Should Not Commit Offsets When Stopping
[ https://issues.apache.org/jira/browse/FLUME-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho reassigned FLUME-2920: - Assignee: Kevin Conaway > Kafka Channel Should Not Commit Offsets When Stopping > - > > Key: FLUME-2920 > URL: https://issues.apache.org/jira/browse/FLUME-2920 > Project: Flume > Issue Type: Bug > Components: Channel >Affects Versions: v1.6.0 >Reporter: Kevin Conaway >Assignee: Kevin Conaway >Priority: Critical > Attachments: FLUME-2920.patch > > > The Kafka channel commits the consumer offsets when shutting down (via stop() > -> decommissionConsumerAndRecords()) > This can lead to data loss if the channel is shut down while messages have > been fetched in a transaction but the transaction has not yet been committed. > The only time that the offsets should be committed is when a transaction is > complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289528#comment-15289528 ] Jarek Jarcec Cecho commented on FLUME-2905: --- My local build is failing with: {code} Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.481 sec <<< FAILURE! testSourceStoppedOnFlumeException(org.apache.flume.source.TestNetcatSource) Time elapsed: 6 sec <<< ERROR! java.lang.IllegalStateException: No channel processor configured at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.flume.source.AbstractSource.start(AbstractSource.java:45) at org.apache.flume.source.NetcatSource.start(NetcatSource.java:192) at org.apache.flume.source.TestNetcatSource.startSource(TestNetcatSource.java:344) at org.apache.flume.source.TestNetcatSource.testSourceStoppedOnFlumeException(TestNetcatSource.java:314) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) {code} Wondering if you saw that in your environment as well? > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. 
If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.
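The {{IllegalStateException: No channel processor configured}} in the failing test above comes from the precondition in {{AbstractSource.start()}}, which requires a channel processor to be set before the source may start. A minimal stand-alone stand-in (the class below is an illustration, not the Flume {{AbstractSource}} itself) reproduces the failure mode:

```java
// Stand-in for the AbstractSource.start() precondition that makes the
// test fail: starting without a configured channel processor throws
// IllegalStateException, exactly as in the stack trace above.
class StartPreconditionSketch {
    private Object channelProcessor;  // stand-in for the real ChannelProcessor

    void setChannelProcessor(Object cp) { this.channelProcessor = cp; }

    void start() {
        // Equivalent of Guava's Preconditions.checkState(...) in AbstractSource
        if (channelProcessor == null) {
            throw new IllegalStateException("No channel processor configured");
        }
    }

    public static void main(String[] args) {
        StartPreconditionSketch source = new StartPreconditionSketch();
        try {
            source.start();               // reproduces the test failure
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        source.setChannelProcessor(new Object());
        source.start();                   // succeeds once configured
    }
}
```

A test that constructs a source directly (as {{TestNetcatSource.startSource}} does) must configure a channel processor before calling start(), or it hits this check.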
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281953#comment-15281953 ] Jarek Jarcec Cecho commented on FLUME-2905: --- Sure, my pleasure. Quickly looking at the patch, can you please remove the trailing white space warnings [~sahuja]? {code} jarcec@arlene flume % git apply FLUME-2905-0.patch [trunk ✗] (1.8.0_77-b03) (ruby-2.2.3) [12:54:00] FLUME-2905-0.patch:25: space before tab in indent. .setNameFormat("netcat-handler-%d").build()); FLUME-2905-0.patch:50: trailing whitespace. FLUME-2905-0.patch:52: trailing whitespace. * Tests that the source is stopped when an exception is thrown FLUME-2905-0.patch:54: trailing whitespace. * clean up the sockets opened during source.start(). FLUME-2905-0.patch:55: trailing whitespace. * warning: squelched 5 whitespace errors warning: 10 lines add whitespace errors. {code} > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. 
> org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". 
Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
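The cleanup pattern the patch is after can be sketched independently of Flume (method and class names here are illustrative, not the actual FLUME-2905 diff): if configuring or binding the freshly opened server socket throws, close it before rethrowing, so a supervisor that keeps retrying start() does not leak one file descriptor per attempt.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Illustrative sketch of the fd-leak fix: close the just-opened socket
// on the failure path before propagating the exception.
class NetcatStartSketch {
    static ServerSocketChannel openAndBind(String host, int port) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        try {
            server.bind(new InetSocketAddress(host, port));
            return server;
        } catch (IOException e) {
            server.close();  // without this, each failed start() leaks a descriptor
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Force the failure path by binding the same port twice.
        ServerSocketChannel first = openAndBind("127.0.0.1", 0);
        int port = first.socket().getLocalPort();
        try {
            openAndBind("127.0.0.1", port);
        } catch (IOException expected) {
            // BindException: Address already in use -- but no socket leaked
            System.out.println("bind failed, socket closed");
        } finally {
            first.close();
        }
    }
}
```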
[jira] [Commented] (FLUME-2620) File channel throws NullPointerException if a header value is null
[ https://issues.apache.org/jira/browse/FLUME-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272559#comment-15272559 ] Jarek Jarcec Cecho commented on FLUME-2620: --- I have two comments on the [latest patch|https://issues.apache.org/jira/secure/attachment/12802350/FLUME-2620.patch]: * Our code style is to put the opening bracket next to the previous statement, e.g. {{if(headerVal.getValue()==null) {}} on a single line rather than two. Can you please fix the method {{checkAndReplaceNullInHeaders}} to look like that? * Can we introduce a configuration option that allows the user to configure the default "replacement" value for {{null}}? I'm concerned that someone might need to distinguish the case of a value being {{null}} from it being empty. > File channel throws NullPointerException if a header value is null > -- > > Key: FLUME-2620 > URL: https://issues.apache.org/jira/browse/FLUME-2620 > Project: Flume > Issue Type: Bug > Components: File Channel >Reporter: Santiago M. Mola >Assignee: Neerja Khattar > Attachments: FLUME-2620.patch, FLUME-2620.patch > > > File channel throws NullPointerException if a header value is null. > If this is intended, it should be reported correctly in the logs. 
> Sample trace: > org.apache.flume.ChannelException: Unable to put batch on required channel: > FileChannel chan { dataDirs: [/var/lib/ingestion-csv/chan/data] } > at > org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200) > at > org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:236) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.flume.channel.file.proto.ProtosFactory$FlumeEventHeader$Builder.setValue(ProtosFactory.java:7415) > at org.apache.flume.channel.file.Put.writeProtos(Put.java:85) > at > org.apache.flume.channel.file.TransactionEventRecord.toByteBuffer(TransactionEventRecord.java:174) > at org.apache.flume.channel.file.Log.put(Log.java:622) > at > org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:469) > at > org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93) > at > org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80) > at > org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
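The idea behind {{checkAndReplaceNullInHeaders}}, including the configurable replacement value requested in the comment above, can be sketched as follows. The method signature and the default value are assumptions for illustration, not the committed patch: the goal is simply that no {{null}} value ever reaches the protobuf {{setValue()}} call that throws the NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: replace null header values before the event is
// serialized, so ProtosFactory...setValue() cannot hit a null.
class NullHeaderSketch {
    static Map<String, String> checkAndReplaceNullInHeaders(
            Map<String, String> headers, String nullReplacement) {
        Map<String, String> cleaned = new HashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            cleaned.put(e.getKey(),
                    e.getValue() == null ? nullReplacement : e.getValue());
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("file", "test.csv");
        headers.put("basename", null);
        // A configurable replacement (e.g. "NULL") keeps a null header
        // distinguishable from a genuinely empty string "".
        System.out.println(checkAndReplaceNullInHeaders(headers, "NULL"));
    }
}
```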
[jira] [Created] (FLUME-2898) Kafka source test will freeze on JDK8
Jarek Jarcec Cecho created FLUME-2898: - Summary: Kafka source test will freeze on JDK8 Key: FLUME-2898 URL: https://issues.apache.org/jira/browse/FLUME-2898 Project: Flume Issue Type: Bug Reporter: Jarek Jarcec Cecho After committing FLUME-2821, I've noticed that the test case doesn't work properly on JDK8. It gets stuck: {code} "main" #1 prio=5 os_prio=31 tid=0x7fa662000800 nid=0x1703 runnable [0x70217000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:103) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00076bc0dd78> (a sun.nio.ch.Util$2) - locked <0x00076bc0dd68> (a java.util.Collections$UnmodifiableSet) - locked <0x00076bc0dc48> (a sun.nio.ch.KQueueSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at org.apache.kafka.common.network.Selector.select(Selector.java:425) at org.apache.kafka.common.network.Selector.poll(Selector.java:254) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorKnown(AbstractCoordinator.java:184) at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:886) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) at org.apache.flume.source.kafka.KafkaSource.doStart(KafkaSource.java:352) at 
org.apache.flume.source.BasicSourceSemantics.start(BasicSourceSemantics.java:83) - locked <0x0006c81d1698> (a org.apache.flume.source.kafka.KafkaSource) at org.apache.flume.source.kafka.TestKafkaSource.testOffsets(TestKafkaSource.java:97) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderPro
[jira] [Resolved] (FLUME-2820) Support New Kafka APIs
[ https://issues.apache.org/jira/browse/FLUME-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2820. --- Resolution: Fixed Fix Version/s: v1.7.0 I'm resolving the umbrella JIRA as all subtasks have been committed. > Support New Kafka APIs > -- > > Key: FLUME-2820 > URL: https://issues.apache.org/jira/browse/FLUME-2820 > Project: Flume > Issue Type: Improvement >Reporter: Jeff Holoman >Assignee: Jeff Holoman > Fix For: v1.7.0 > > > Flume utilizes the older Kafka producer and consumer APIs. We should add > implementations of the Source, Channel and Sink that utilize the new APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-2823) Flume-Kafka-Channel with new APIs
[ https://issues.apache.org/jira/browse/FLUME-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2823. --- Resolution: Fixed Fix Version/s: v1.7.0 Thank you for your contribution [~jholoman]! > Flume-Kafka-Channel with new APIs > - > > Key: FLUME-2823 > URL: https://issues.apache.org/jira/browse/FLUME-2823 > Project: Flume > Issue Type: Sub-task >Reporter: Jeff Holoman >Assignee: Jeff Holoman > Fix For: v1.7.0 > > Attachments: FLUME-2823v4.patch, FLUME-2823v6.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-2822) Flume-Kafka-Sink with new Producer
[ https://issues.apache.org/jira/browse/FLUME-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2822. --- Resolution: Fixed Fix Version/s: v1.7.0 Thank you for your contribution [~jholoman]! > Flume-Kafka-Sink with new Producer > -- > > Key: FLUME-2822 > URL: https://issues.apache.org/jira/browse/FLUME-2822 > Project: Flume > Issue Type: Sub-task >Reporter: Jeff Holoman >Assignee: Jeff Holoman > Fix For: v1.7.0 > > Attachments: FLUME-2822.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2897) AsyncHBase sink NPE when Channel.getTransaction() fails
[ https://issues.apache.org/jira/browse/FLUME-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216136#comment-15216136 ] Jarek Jarcec Cecho commented on FLUME-2897: --- Thanks [~mpercy]. I've quickly looked into other Sinks to see how they are handling this and both [AvroSink|https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/AbstractRpcSink.java#L336] and [HdfsSink|https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java#L368] are using the idiom that you've mentioned. Some of the newer sinks ([KafkaSink|https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-ng-kafka-sink/src/main/java/org/apache/flume/sink/kafka/KafkaSink.java#L83]) are getting the transaction inside the {{try-catch}}, but then they are properly guarding the transaction object against being {{null}}. This seems a bit simpler, so I'm +1. > AsyncHBase sink NPE when Channel.getTransaction() fails > --- > > Key: FLUME-2897 > URL: https://issues.apache.org/jira/browse/FLUME-2897 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Mike Percy >Assignee: Mike Percy > Attachments: FLUME-2897-1.patch > > > There is a possibility for a NPE in the AsyncHBaseSink when a channel > getTransaction() call fails. This is possible when the FileChannel is out of > disk space. > Patch forthcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
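The null-guard idiom endorsed in the comment above can be sketched with minimal stand-ins for the Flume {{Channel}} and {{Transaction}} interfaces (this is an illustration of the pattern, not the AsyncHBase sink code): {{getTransaction()}} is called inside the {{try}}, and the rollback/close path checks for {{null}}, so a failing {{getTransaction()}} cannot turn into an NPE.

```java
// Sketch of the KafkaSink-style idiom: getTransaction() inside the try,
// with null guards on rollback/close. Channel and Transaction are
// simplified stand-ins for the Flume interfaces.
class SinkProcessSketch {
    interface Transaction { void begin(); void commit(); void rollback(); void close(); }
    interface Channel { Transaction getTransaction(); }

    static void process(Channel channel) {
        Transaction txn = null;
        try {
            txn = channel.getTransaction();  // may throw, e.g. file channel out of disk
            txn.begin();
            // ... take events from the channel, write to the sink ...
            txn.commit();
        } catch (Exception e) {
            if (txn != null) {               // the guard that prevents the NPE
                txn.rollback();
            }
        } finally {
            if (txn != null) {
                txn.close();
            }
        }
    }

    public static void main(String[] args) {
        // Simulate the FLUME-2897 failure: getTransaction() itself throws.
        Channel failing = () -> { throw new IllegalStateException("out of disk"); };
        process(failing);                    // completes without an NPE
        System.out.println("no NPE");
    }
}
```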
[jira] [Commented] (FLUME-2821) Flume-Kafka Source with new Consumer
[ https://issues.apache.org/jira/browse/FLUME-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214994#comment-15214994 ] Jarek Jarcec Cecho commented on FLUME-2821: --- Can you please upload the latest patch from the review board to this JIRA as well [~groghkov]? I need that because by attaching the patch here, you're implicitly giving the ASF rights to include it (which is not the case on the review board). After that, I'll go ahead and commit it as discussed on FLUME-2820. > Flume-Kafka Source with new Consumer > > > Key: FLUME-2821 > URL: https://issues.apache.org/jira/browse/FLUME-2821 > Project: Flume > Issue Type: Sub-task >Reporter: Jeff Holoman >Assignee: Grigoriy Rozhkov > Attachments: FLUME-2821.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2820) Support New Kafka APIs
[ https://issues.apache.org/jira/browse/FLUME-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209030#comment-15209030 ] Jarek Jarcec Cecho commented on FLUME-2820: --- The updated channel and sink got several +1s, so I believe they are ready to be committed. I've asked [~singhashish] from the Kafka community to take a look at the source patch. Once that's done, I'll go ahead and commit all those patches at once. As they obviously overlap a bit (overriding the root {{pom.xml}} the same way), I'll remove the overlap myself and commit that change. > Support New Kafka APIs > -- > > Key: FLUME-2820 > URL: https://issues.apache.org/jira/browse/FLUME-2820 > Project: Flume > Issue Type: Improvement >Reporter: Jeff Holoman >Assignee: Jeff Holoman > > Flume utilizes the older Kafka producer and consumer APIs. We should add > implementations of the Source, Channel and Sink that utilize the new APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2891) Revert FLUME-2712 and FLUME-2886
[ https://issues.apache.org/jira/browse/FLUME-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187678#comment-15187678 ] Jarek Jarcec Cecho commented on FLUME-2891: --- +1, seems reasonable to me > Revert FLUME-2712 and FLUME-2886 > > > Key: FLUME-2891 > URL: https://issues.apache.org/jira/browse/FLUME-2891 > Project: Flume > Issue Type: Bug >Reporter: Hari Shreedharan >Assignee: Hari Shreedharan > Attachments: FLUME-2891.patch > > > FLUME-2712 can be fixed by simply setting keep-alive to 0. I think it added > additional complexity, which we can probably avoid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-2886) Optional Channels can cause OOMs
[ https://issues.apache.org/jira/browse/FLUME-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2886. --- Resolution: Fixed Fix Version/s: v1.7.0 Thank you for your contribution [~hshreedharan]! > Optional Channels can cause OOMs > > > Key: FLUME-2886 > URL: https://issues.apache.org/jira/browse/FLUME-2886 > Project: Flume > Issue Type: Bug >Reporter: Hari Shreedharan >Assignee: Hari Shreedharan > Fix For: v1.7.0 > > Attachments: FLUME-2886.patch > > > If an optional channel is full, the queue backing the executor that is > asynchronously submitting the events to the channel can grow indefinitely in > size leading to a huge number of events on the heap and causing OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2886) Optional Channels can cause OOMs
[ https://issues.apache.org/jira/browse/FLUME-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159118#comment-15159118 ] Jarek Jarcec Cecho commented on FLUME-2886: --- +1 > Optional Channels can cause OOMs > > > Key: FLUME-2886 > URL: https://issues.apache.org/jira/browse/FLUME-2886 > Project: Flume > Issue Type: Bug >Reporter: Hari Shreedharan >Assignee: Hari Shreedharan > Attachments: FLUME-2886.patch > > > If an optional channel is full, the queue backing the executor that is > asynchronously submitting the events to the channel can grow indefinitely in > size leading to a huge number of events on the heap and causing OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLUME-2712) Optional channel errors slows down the Source to Main channel event rate
[ https://issues.apache.org/jira/browse/FLUME-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2712: -- Fix Version/s: v1.7.0 > Optional channel errors slows down the Source to Main channel event rate > > > Key: FLUME-2712 > URL: https://issues.apache.org/jira/browse/FLUME-2712 > Project: Flume > Issue Type: Bug >Reporter: Johny Rufus >Assignee: Johny Rufus > Fix For: v1.7.0 > > Attachments: FLUME-2712-1.patch, FLUME-2712-2.patch, > FLUME-2712.patch, FLUME-2712.patch > > > When we have a source configured to deliver events to a main channel and an > optional channel, and if the delivery to optional channel fails, this > significantly slows down the rate at which events are delivered to the main > channel by the source. > We need to evaluate async means of delivering events to the optional channel > and isolate the errors happening in optional channel from the delivery to the > main channel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLUME-2718) HTTP Source to support generic Stream Handler
[ https://issues.apache.org/jira/browse/FLUME-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho reassigned FLUME-2718: - Assignee: Hari The patch is in, thank you for your contribution Hari! > HTTP Source to support generic Stream Handler > - > > Key: FLUME-2718 > URL: https://issues.apache.org/jira/browse/FLUME-2718 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Reporter: Hari >Assignee: Hari > Attachments: > 0001-FLUME-2718-HTTP-Source-to-support-generic-Stream-Han.patch, > 0002-FLUME-2718-HTTP-Source-to-support-generic-Stream-Han.patch > > > Currently the HTTP Source supports JSONHandler as the default implementation. > A more generic approach will be having a BLOBHandler which accepts any > request input stream (that loads the stream as Event payload). Furthermore, > this Handler lets you define mandatory request parameters and maps those > parameters into Event Headers. > This way HTTPSource can be used as a generic Data Ingress endpoint for any > sink, where one can specify attributes run like basepath, filename & > timestamp as request parameters and access those values via HEADER values in > sink properties. > All this can be done without developing any custom Handler code. > For e.g. > With the below agent configuration, you can send any type of data > (JSON/CSV/TSV) and store it in any sink, HDFS in this case. 
> {code:title=sample command|borderStyle=solid} > curl -v -X POST > "http://testHost:8080/?basepath=/data/&filename=test.json&timestamp=1434101498275" > --data @test.json > {code} > {code:title=HDFS data path |borderStyle=solid} > /data/2015/06/12/test.json.1434101498275.lzo > {code} > {code:title=agent.conf|borderStyle=solid} > #Agent configuration > #HTTP Source configuration > agent.sources = httpSrc > agent.channels = memChannel > agent.sources.httpSrc.type = http > agent.sources.httpSrc.channels = memChannel > agent.sources.httpSrc.bind = testHost > agent.sources.httpSrc.port = 8080 > agent.sources.httpSrc.handler = org.apache.flume.source.http.BLOBHandler > agent.sources.httpSrc.handler.mandatoryParameters = basepath, filename > #Memory channel with default configuration > agent.channels.memChannel.type = memory > agent.channels.memChannel.capacity = 10 > agent.channels.memChannel.transactionCapacity = 1000 > #HDFS Sink configuration > agent.sinks.hdfsSink.type = hdfs > agent.sinks.hdfsSink.hdfs.path = %{basepath}/%Y/%m/%d > agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true > agent.sinks.hdfsSink.hdfs.filePrefix = %(unknown) > agent.sinks.hdfsSink.hdfs.fileType = CompressedStream > agent.sinks.hdfsSink.hdfs.codeC = lzop > agent.sinks.hdfsSink.channel = memChannel > #Finally, activate. > agent.channels = memChannel > agent.sources = httpSrc > agent.sinks = hdfsSink > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-2718) HTTP Source to support generic Stream Handler
[ https://issues.apache.org/jira/browse/FLUME-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2718. --- Resolution: Fixed Fix Version/s: v1.7.0 > HTTP Source to support generic Stream Handler > - > > Key: FLUME-2718 > URL: https://issues.apache.org/jira/browse/FLUME-2718 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Reporter: Hari >Assignee: Hari > Fix For: v1.7.0 > > Attachments: > 0001-FLUME-2718-HTTP-Source-to-support-generic-Stream-Han.patch, > 0002-FLUME-2718-HTTP-Source-to-support-generic-Stream-Han.patch > > > Currently the HTTP Source supports JSONHandler as the default implementation. > A more generic approach will be having a BLOBHandler which accepts any > request input stream (that loads the stream as Event payload). Furthermore, > this Handler lets you define mandatory request parameters and maps those > parameters into Event Headers. > This way HTTPSource can be used as a generic Data Ingress endpoint for any > sink, where one can specify attributes run like basepath, filename & > timestamp as request parameters and access those values via HEADER values in > sink properties. > All this can be done without developing any custom Handler code. > For e.g. > With the below agent configuration, you can send any type of data > (JSON/CSV/TSV) and store it in any sink, HDFS in this case. 
> {code:title=sample command|borderStyle=solid} > curl -v -X POST > "http://testHost:8080/?basepath=/data/&filename=test.json&timestamp=1434101498275" > --data @test.json > {code} > {code:title=HDFS data path |borderStyle=solid} > /data/2015/06/12/test.json.1434101498275.lzo > {code} > {code:title=agent.conf|borderStyle=solid} > #Agent configuration > #HTTP Source configuration > agent.sources = httpSrc > agent.channels = memChannel > agent.sources.httpSrc.type = http > agent.sources.httpSrc.channels = memChannel > agent.sources.httpSrc.bind = testHost > agent.sources.httpSrc.port = 8080 > agent.sources.httpSrc.handler = org.apache.flume.source.http.BLOBHandler > agent.sources.httpSrc.handler.mandatoryParameters = basepath, filename > #Memory channel with default configuration > agent.channels.memChannel.type = memory > agent.channels.memChannel.capacity = 10 > agent.channels.memChannel.transactionCapacity = 1000 > #HDFS Sink configuration > agent.sinks.hdfsSink.type = hdfs > agent.sinks.hdfsSink.hdfs.path = %{basepath}/%Y/%m/%d > agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true > agent.sinks.hdfsSink.hdfs.filePrefix = %(unknown) > agent.sinks.hdfsSink.hdfs.fileType = CompressedStream > agent.sinks.hdfsSink.hdfs.codeC = lzop > agent.sinks.hdfsSink.channel = memChannel > #Finally, activate. > agent.channels = memChannel > agent.sources = httpSrc > agent.sinks = hdfsSink > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2718) HTTP Source to support generic Stream Handler
[ https://issues.apache.org/jira/browse/FLUME-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098323#comment-15098323 ] Jarek Jarcec Cecho commented on FLUME-2718: --- I left a few comments on the review board. > HTTP Source to support generic Stream Handler > - > > Key: FLUME-2718 > URL: https://issues.apache.org/jira/browse/FLUME-2718 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Reporter: Hari > Attachments: > 0001-FLUME-2718-HTTP-Source-to-support-generic-Stream-Han.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2703) HDFS sink: Ability to exclude time counter in fileName via sink configuration
[ https://issues.apache.org/jira/browse/FLUME-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097170#comment-15097170 ] Jarek Jarcec Cecho commented on FLUME-2703: --- I'm a bit concerned about introducing this functionality to Flume - Flume has been designed as an event-based system, not necessarily a file-based one. Trying to preserve the original filename might make it seem like we're transferring whole files, which is not the case. Even with SpoolDirectorySource, which reads whole files, we can change order or generate duplicates at the end, so the resulting file on HDFS might not end up having the same checksum. This can also lead to a lot of issues when two independent Flume agents try to write to the same output file on HDFS. > HDFS sink: Ability to exclude time counter in fileName via sink configuration > -- > > Key: FLUME-2703 > URL: https://issues.apache.org/jira/browse/FLUME-2703 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Affects Versions: v1.7.0 >Reporter: Hari >Priority: Minor > Attachments: FLUME-2703-0.patch > > > HDFS sinks always append a time counter to filenames, which is not configurable. > In some use cases, it is desirable to retain the original filename. > For example, while ingesting a blob using the Spool directory source, it's desirable > to retain the original filename (basename) in HDFS. > This patch allows configuring an HDFS sink to override this behavior, > retaining the backward-compatible file naming convention by default, i.e., > hdfs.appendTimeCounter = false -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2718) HTTP Source to support generic Stream Handler
[ https://issues.apache.org/jira/browse/FLUME-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097166#comment-15097166 ] Jarek Jarcec Cecho commented on FLUME-2718: --- Thanks for the patch [~hariprasad kuppuswamy]! Would you mind uploading it to [review board|https://reviews.apache.org/dashboard/] for deeper review? > HTTP Source to support generic Stream Handler > - > > Key: FLUME-2718 > URL: https://issues.apache.org/jira/browse/FLUME-2718 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Reporter: Hari > Attachments: > 0001-FLUME-2718-HTTP-Source-to-support-generic-Stream-Han.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2720) HDFS Sink: Autogenerate LZO indexes while creating LZO files
[ https://issues.apache.org/jira/browse/FLUME-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097155#comment-15097155 ] Jarek Jarcec Cecho commented on FLUME-2720: --- Thanks for the patch [~hariprasad kuppuswamy]. I like the idea of making it simpler and auto-generating the LZO indexes. Sadly the LZO libraries are licensed under the GPL, which [is not compatible with ASLv2|http://www.apache.org/legal/resolved.html], and therefore we can't accept this improvement :( > HDFS Sink: Autogenerate LZO indexes while creating LZO files > > > Key: FLUME-2720 > URL: https://issues.apache.org/jira/browse/FLUME-2720 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Affects Versions: v1.7.0 >Reporter: Hari >Priority: Minor > Attachments: > 0001-FLUME-2720-Autogenerate-LZO-indexes-while-creating-L.patch > > > The LZO indexes are currently generated offline using DistributedLZOIndexer. > It would be nice to auto-generate these index files during ingestion > itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSSION] Flume-Kafka 0.9 Support
It’s unfortunate that in order to support the new features in Kafka 0.9 (primarily the security additions), one has to lose support for the previous version (0.8). I do believe that the security additions that have been added recently are important enough for us to migrate to the new version of Kafka and use it for the next Flume release. If some people need to continue using a future Flume version with Kafka 0.8, they should be able to simply take the 1.6.0 versions of the Kafka Channel/Source/Sink jars and use them with the new agent, so we do have a mitigation plan if needed. Jarcec > On Dec 22, 2015, at 3:26 PM, Jeff Holoman wrote: > > With the new release of Kafka I wanted to start the discussion on how best > to handle updating Flume to be able to make use of some of the new features > available in 0.9. > > First, it is important for Flume to adopt the 0.9 Kafka clients, as the new > Consumer / Producer APIs are the only APIs that support the new security > features put into the latest Kafka release, such as SSL. > > If we agree that this is important, then we need to consider how best to > make this switch. With many projects, we could just update the jars/clients > and move along happily; however, the Kafka compatibility story complicates > this. > > > - > > Kafka promises to be backward compatible with clients > - > > i.e. A 0.8.x client can talk to a 0.9.x broker > - > > Kafka does not promise to be forward compatible (at all) from the client > perspective: > - > > i.e. A 0.9.x client can not talk to a 0.8.x broker > - > > If it works, it is by luck and is not reliable, even for old > functionality > - > > This is due to protocol changes and no way for the client to know the > version of Kafka it’s talking to. Hopefully KIP-35 (Retrieving protocol > version) will move this in the right direction. > > > > - > > Integrations that utilize Kafka 0.9.x clients will not be able to talk > to Kafka 0.8.x brokers at all and may get cryptic error messages when doing > so.
> - > > Integrations will only be able to support one major version of Kafka at > a time without more complex class-loading > - > > Note: The kafka_2.10 artifact depends on the kafka-clients artifact > so you cannot have both kafka-clients & kafka_2.10 of different > versions at > the same time without collision > - > > However older clients (0.8.x) will work when talking to 0.9.x server. > - > > But that is pretty much useless as the benefits of 0.9.x (security > features) won’t be available. > > > Given these constraints, and after careful consideration, I propose that we > do the following. > > 1) Update the Kafka libraries to the latest 0.9/0.9+ release and update the > Source, Sink and Channel Implementations to make use of the new Kafka > Clients > 2) Document that Flume no longer supports Kafka Brokers < 0.9 > > Given that both producer and clients will be updated, there will need to be > changes in agent configurations to support the new clients. > > This means if upgrading Flume, existing agent configurations will break. I > don't see a clean way around this, unfortunately. This seems to be a > situation where we break things, and document this to be the case. > > > > > -- > Jeff Holoman > Systems Engineer
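[Editorial note] To make the configuration impact Jeff describes concrete, a migrated agent fragment might look roughly like the sketch below. The property names here are assumptions modeled on the 0.9 new-client API (a bootstrap broker list plus pass-through consumer properties), not the final keys a future Flume release would ship:

```properties
# Hypothetical Kafka-0.9-style channel configuration
# (property names are assumptions, not Flume's released configuration keys)
agent.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
# New clients talk to brokers directly instead of a ZooKeeper quorum
agent.channels.kafkaCh.kafka.bootstrap.servers = broker1:9093,broker2:9093
agent.channels.kafkaCh.kafka.topic = flume-channel
# SSL is one of the 0.9 security features motivating the upgrade
agent.channels.kafkaCh.kafka.consumer.security.protocol = SSL
agent.channels.kafkaCh.kafka.consumer.ssl.truststore.location = /path/to/truststore.jks
```

Since 0.8-era configurations pointed consumers at a ZooKeeper quorum rather than a broker list, this is exactly the kind of key change that would break existing agent configurations on upgrade.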
[jira] [Assigned] (FLUME-2858) Add better exception message for malformed zookeeper, also add a default of 2181 if the port is not given
[ https://issues.apache.org/jira/browse/FLUME-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho reassigned FLUME-2858: - Assignee: Samuel David Glover (was: Ted Malaska) Assigning to [~samglover] per the JIRA comments. > Add better exception message for malformed zookeeper, also add a default of > 2181 if the port is not given > --- > > Key: FLUME-2858 > URL: https://issues.apache.org/jira/browse/FLUME-2858 > Project: Flume > Issue Type: Bug >Reporter: Ted Malaska >Assignee: Samuel David Glover >Priority: Minor > Attachments: FLUME-2858-0.patch, FLUME-2858-1.patch > > > I lost a lot of time in my last setup because my zookeeperQuorum didn't have ports in > it. The error message required me to go into the code to figure it out. > So this JIRA is to give a better exception message for this case. > Also, we should add a default port of 2181 if the port is not given, since > HBase has this default port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Potential Gerrit support in review/commit flow
I like the gerrit flow a lot. I can’t speak for the flume community, but I would be in favor of that :) You might consider sending a similar email to the Sqoop community, where I would support gerrit as well. Jarcec > On Nov 6, 2015, at 10:27 AM, Zhe Zhang wrote: > > Hi Flume contributors, > > The Hadoop community is considering adding Gerrit as a review / commit > tool. Since this will require support from the Apache Infra team, it makes > more sense if multiple projects can benefit from the effort. > > The main benefit of Gerrit over ReviewBoard is better integration with git > and Jenkins. A Gerrit review request is created through a simple "git push" > instead of manually creating and uploading a diff file. Conflict detection > and rebase can be done on the review UI as well. When the programmed commit > criteria are met (e.g. a code review +1 and a Jenkins verification), > committing can also be done with a single button click, or even > automatically. > > The main benefit of Gerrit over Github pull requests is the rebase workflow > (rather than git merge), which avoids merge commits. The rebase workflow > also enables a clear interdiff view, rather than reviewing every patch rev > from scratch. > > This also just augments, rather than replaces, the current review / commit > flow. Every task will still start as a JIRA. Review comments can be made on > both JIRA and Gerrit and will be bi-directionally mirrored. Patches can > also be directly committed through the git command line (Gerrit will recognize > a direct commit and close the review request as long as a simple git hook > is installed: > https://gerrit.googlecode.com/svn/documentation/2.0/user-changeid.html). > > I wonder if the Flume community would be interested in moving in this > direction as well. Any feedback is much appreciated. > > Thanks, > Zhe Zhang
[jira] [Updated] (FLUME-2781) A Kafka Channel defined as parseAsFlumeEvent=false cannot be correctly used by a Flume source
[ https://issues.apache.org/jira/browse/FLUME-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2781: -- Fix Version/s: v1.7.0 > A Kafka Channel defined as parseAsFlumeEvent=false cannot be correctly used > by a Flume source > - > > Key: FLUME-2781 > URL: https://issues.apache.org/jira/browse/FLUME-2781 > Project: Flume > Issue Type: Improvement >Affects Versions: v1.6.0 >Reporter: Gonzalo Herreros >Assignee: Gonzalo Herreros > Labels: easyfix, patch > Fix For: v1.7.0 > > Attachments: FLUME-2781.patch > > > When a Kafka channel is configured as parseAsFlumeEvent=false, the channel > will read events from the topic as text instead of serialized Avro Flume > events. > This is useful so Flume can read from an existing Kafka topic, where other > Kafka clients publish as text. > However, if you use a Flume source on that channel, it will still write the > events as Avro so it will create an inconsistency and those events will fail > to be read correctly. > Also, this would allow a Flume source to write to a Kafka channel and any > Kafka subscriber to listen to Flume events passing through without binary > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] Change of Apache Flume PMC Chair
Congratulations Hari! Jarcec > On Oct 21, 2015, at 5:50 PM, Arvind Prabhakar wrote: > > Dear Flume Users and Developers, > > I have had the pleasure of serving as the PMC Chair of Apache Flume since its > graduation three years ago. I sincerely thank you and the Flume PMC for this > opportunity. However, I have decided to step down from this responsibility > due to personal reasons. > > I am very happy to announce that on the request of the Flume PMC and with the > approval of the board of directors at The Apache Software Foundation, Hari > Shreedharan is hereby appointed as the new PMC Chair. I am confident that > Hari will do everything possible to help further grow the community and > adoption of Apache Flume. > > Please join me in congratulating Hari on his appointment and welcoming him to > this role. > > Regards, > Arvind Prabhakar
[jira] [Commented] (FLUME-2734) Kafka Channel timeout property is overridden by default value
[ https://issues.apache.org/jira/browse/FLUME-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937481#comment-14937481 ] Jarek Jarcec Cecho commented on FLUME-2734: --- +1 > Kafka Channel timeout property is overridden by default value > - > > Key: FLUME-2734 > URL: https://issues.apache.org/jira/browse/FLUME-2734 > Project: Flume > Issue Type: Bug >Reporter: Johny Rufus >Assignee: Johny Rufus > Attachments: FLUME-2734.patch > > > Kafka Channel timeout property which will be passed to the Kafka Consumer > internally, does not work as expected, as in the code, it is overridden by > default value (or value specified by .timeout property which is undocumented) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2751) Upgrade Derby version to 10.11.1.1
[ https://issues.apache.org/jira/browse/FLUME-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937464#comment-14937464 ] Jarek Jarcec Cecho commented on FLUME-2751: --- +1 > Upgrade Derby version to 10.11.1.1 > -- > > Key: FLUME-2751 > URL: https://issues.apache.org/jira/browse/FLUME-2751 > Project: Flume > Issue Type: Bug >Reporter: Johny Rufus >Assignee: Johny Rufus > Attachments: FLUME-2751.patch > > > Derby has had a lot of bug fixes in past releases over the last 3 years; we should > upgrade to the latest stable release, 10.11.1.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-2737) Documentation for Pollable Source config parameters introduced in FLUME-2729
[ https://issues.apache.org/jira/browse/FLUME-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2737. --- Resolution: Fixed Fix Version/s: v1.7.0 Thank you for your contribution [~malaskat]! Documentation for Pollable Source config parameters introduced in FLUME-2729 Key: FLUME-2737 URL: https://issues.apache.org/jira/browse/FLUME-2737 Project: Flume Issue Type: Documentation Reporter: Johny Rufus Assignee: Ted Malaska Fix For: v1.7.0 Attachments: FLUME-2737.patch, FLUME-2737.patch.1 Documentation needs to be updated in Flume User Guide, for all Pollable Sources, that would use the new config parameters introduced in FLUME-2729 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] New Flume committer - Johny Rufus
Congratulations Johny! Jarcec On Jun 19, 2015, at 1:38 PM, Hari Shreedharan hshreedha...@apache.org wrote: On behalf of the Apache Flume PMC, I am excited to welcome Johny Rufus as a committer on the Apache Flume project. Johny has actively contributed several patches to the Flume project, including bug fixes, authentication and other new features. Congratulations and Welcome, Johny! Cheers, Hari Shreedharan
Re: The error I meet When I compile the source of apache-flume-1.5.2-src with eclipse with m2
Hi Maojun, those sorts of questions are best asked on the flume user mailing list. You can find instructions on how to sign up on the following web page: http://flume.apache.org/mailinglists.html Jarcec On May 28, 2015, at 11:16 PM, Thanks 864157...@qq.com wrote: Dear Jaroslav Cecho, This is Maojun Wang, from JJWorld (Beijing) Network Technology Co., Ltd. When I compile the source of apache-flume-1.5.2-src with Eclipse with m2, I meet an error as follows: 1. The error I meet in my Eclipse: The import com.cloudera cannot be resolved 2. When I use mvn install -DskipTests to install apache-flume-1.5.2-src, everything is OK. 3. When I unpack flume-avro-source-1.5.2.jar from 2, I find the missing classes (as 1 describes) 4. Where is the problem? Help me please! Thank you very much! Best regards Sincerely, Maojun Wang
[jira] [Commented] (FLUME-1934) Spoolingdir source exception when reading multiple zero size files
[ https://issues.apache.org/jira/browse/FLUME-1934?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14394766#comment-14394766 ] Jarek Jarcec Cecho commented on FLUME-1934: --- I've granted you the contributor role on JIRA [~granthenke]. You should be able to assign the JIRA to yourself. Spoolingdir source exception when reading multiple zero size files - Key: FLUME-1934 URL: https://issues.apache.org/jira/browse/FLUME-1934 Project: Flume Issue Type: Bug Affects Versions: v1.3.1 Environment: windows 7, flume 1.3.1, spooling dir source. Reporter: andy zhou Attachments: FLUME-1934.patch Move more than one file to the spool dir, with each file size 0, and the flume agent will throw IllegalStateException forever and never work again; its main cause is that the committed flag is never set to true. logs: 08 March 2013 08:00:14,406 ERROR [pool-5-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:148) - Uncaught exception in Runnable java.lang.IllegalStateException: File should not roll when commit is outstanding.
at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:164) at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Is 1.6 a good time to switch to hadoop2 ?
Sqoop2's method is very simple - we have different profiles for Hadoop 1 and 2 and simply use classifiers to mark the appropriate hadoop “profile”. No need to generate a different pom.xml file. Check it out here: https://github.com/apache/sqoop/blob/branch-1.99.4/execution/mapreduce/pom.xml#L65 Jarcec On Apr 1, 2015, at 2:56 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: We should choose whichever is the easiest to do, I guess (the Sqoop method or the HBase method). Thanks, Hari On Wed, Apr 1, 2015 at 1:33 PM, Roshan Naik ros...@hortonworks.com wrote: FWIW… From what I recollect, HBase did this when they started supporting hadoop2. Not sure if it's still the same method they use... - their default pom had the artifacts named for hadoop1 binaries - they had a shell script to produce a modified pom, with a -hadoop2 suffix for the artifact names - they built and published the hadoop2 binaries using the modified pom, and hadoop1 binaries from the original pom -roshan On 4/1/15 1:24 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: I know sqoop2 does it. Maybe Jarcec can help? Thanks, Hari On Wed, Apr 1, 2015 at 1:13 PM, Roshan Naik ros...@hortonworks.com wrote: To push artifacts for both hadoop2 and hadoop1 we will need to name the hadoop2 and hadoop1 artifacts differently. Tricky to do that with one pom, I think. -roshan On 4/1/15 1:05 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: I am +1 for doing this, since this is the last JDK 6 release. Maybe we should push artifacts for both the Hadoop-1 and hbase-98 profiles this time and switch to Hadoop 2 exclusively from 1.7? Thanks, Hari On Wed, Apr 1, 2015 at 1:04 PM, Roshan Naik ros...@hortonworks.com wrote: Don't recall if this has been discussed before. It's been some time since hadoop2 came out. Sooner or later Flume will switch to hadoop2-based builds (hbase98 profile?) as the default. Not sure if 1.6 is the time or worth waiting longer. -roshan
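[Editorial note] The classifier approach Jarcec points to can be sketched roughly as below - a single pom with per-Hadoop profiles that attach a classifier to the published artifact. Profile ids, versions, and the property name here are illustrative, not Flume's (or Sqoop's) actual build:

```xml
<!-- Illustrative sketch of classifier-based Hadoop profiles (hypothetical, not Flume's actual pom) -->
<profiles>
  <profile>
    <id>hadoop-1</id>
    <properties>
      <hadoop.classifier>hadoop1</hadoop.classifier>
    </properties>
  </profile>
  <profile>
    <id>hadoop-2</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <hadoop.classifier>hadoop2</hadoop.classifier>
    </properties>
  </profile>
</profiles>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <configuration>
        <!-- one pom, two published jars: foo-X.Y.Z-hadoop1.jar / foo-X.Y.Z-hadoop2.jar -->
        <classifier>${hadoop.classifier}</classifier>
      </configuration>
    </plugin>
  </plugins>
</build>
```

This avoids the HBase-style pom-rewriting script: the same pom produces differently-named artifacts depending on the active profile.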
30 tech skills that will get you a $110,000-plus salary
It seems that being a Flume developer is a well-paying job :) Jarcec
[jira] [Resolved] (FLUME-2630) Update documentation for Thrift SRc/Sink SSL support
[ https://issues.apache.org/jira/browse/FLUME-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2630. --- Resolution: Fixed Thank you for the fix [~jrufus]! Update documentation for Thrift SRc/Sink SSL support Key: FLUME-2630 URL: https://issues.apache.org/jira/browse/FLUME-2630 Project: Flume Issue Type: Documentation Components: Sinks+Sources Affects Versions: v1.5.2 Reporter: Johny Rufus Assignee: Johny Rufus Fix For: v1.6.0 Attachments: FLUME-2630-1.patch, FLUME-2630-2.patch, FLUME-2630-3.patch, FLUME-2630.patch Update User guide to document the SSL related input parameters for Thrift Src and Thrift Sink -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2641) Drop Java 6 support post Flume 1.6
[ https://issues.apache.org/jira/browse/FLUME-2641?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14357161#comment-14357161 ] Jarek Jarcec Cecho commented on FLUME-2641: --- I'm +1 on dropping JDK 6 support as this version has not been supported for more than two years. I don't mind dropping Hadoop 1 support either, as Hadoop 2 is stable and used in all major distributions. Drop Java 6 support post Flume 1.6 -- Key: FLUME-2641 URL: https://issues.apache.org/jira/browse/FLUME-2641 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan We should also test Java 7 + Hadoop 1. We will not target Java 6 at all, so we might have to drop Hadoop 1 support if Hadoop 1 jars won't run on Java 7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2630) Update documentation for Thrift SRc/Sink SSL support
[ https://issues.apache.org/jira/browse/FLUME-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359831#comment-14359831 ] Jarek Jarcec Cecho commented on FLUME-2630: --- I still see some formatting issue [~jrufus]: {code} System Message: ERROR/3 (/Users/jarcec/apache/flume/flume-ng-doc/sphinx/FlumeUserGuide.rst, line 1960) Malformed table. Text in column margin at line offset 17. {code} Update documentation for Thrift SRc/Sink SSL support Key: FLUME-2630 URL: https://issues.apache.org/jira/browse/FLUME-2630 Project: Flume Issue Type: Documentation Components: Sinks+Sources Affects Versions: v1.5.2 Reporter: Johny Rufus Assignee: Johny Rufus Fix For: v1.6.0 Attachments: FLUME-2630-1.patch, FLUME-2630.patch Update User guide to document the SSL related input parameters for Thrift Src and Thrift Sink -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field
[ https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2126: -- Fix Version/s: 1.6 Problem in elasticsearch sink when the event body is a complex field Key: FLUME-2126 URL: https://issues.apache.org/jira/browse/FLUME-2126 Project: Flume Issue Type: Bug Components: Sinks+Sources Environment: 1.3.1 and 1.4 Reporter: Massimo Paladin Fix For: 1.6 Attachments: FLUME-2126-0.patch I have found a bug in the elasticsearch sink; the problem is in the {{ContentBuilderUtil.addComplexField}} method: when it does {{builder.field(fieldName, tmp);}}, the {{tmp}} object is taken as an {{Object}}, with the result that it is serialized via the {{toString}} method in the {{XContentBuilder}}. In the end you get the object reference as content. The following change works around the problem for me; the bad point is that it has to parse the content twice. I guess there is a better way to solve the problem, but I am not an elasticsearch API expert.
{code}
--- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
+++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
@@ -61,7 +61,12 @@ public class ContentBuilderUtil {
       parser = XContentFactory.xContent(contentType).createParser(data);
       parser.nextToken();
       tmp.copyCurrentStructure(parser);
-      builder.field(fieldName, tmp);
+
+      // if it is a valid structure then we include it
+      parser = XContentFactory.xContent(contentType).createParser(data);
+      parser.nextToken();
+      builder.field(fieldName);
+      builder.copyCurrentStructure(parser);
     } catch (JsonParseException ex) {
       // If we get an exception here the most likely cause is nested JSON that
       // can't be figured out in the body. At this point just push it through
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
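[Editorial note] The failure mode Massimo describes - a structured value falling back to the default {{Object.toString()}}, so the stored content is an object reference rather than the JSON - can be reproduced in isolation without the Elasticsearch APIs. The class below is a standalone analogy (the {{JsonHolder}} type is hypothetical; {{XContentBuilder}} itself is an Elasticsearch type and is not used here):

```java
// Standalone illustration of the pitfall: passing a structured value as a
// plain Object and relying on default serialization stores the object
// reference (ClassName@hexHashCode), not the JSON content it wraps.
public class ToStringPitfall {
    // Analogy for the builder's tmp object: it holds JSON bytes but does
    // not override toString(), so it inherits Object.toString().
    static class JsonHolder {
        final byte[] data;
        JsonHolder(String json) { this.data = json.getBytes(); }
    }

    public static void main(String[] args) {
        JsonHolder tmp = new JsonHolder("{\"user\":\"massimo\"}");
        // A generic field(name, Object) call ends up storing this string:
        String stored = String.valueOf(tmp);
        // "ToStringPitfall$JsonHolder@1b6d3586"-style reference, no payload:
        System.out.println(stored.contains("@") && !stored.contains("user"));
    }
}
```

The workaround in the patch sidesteps this by re-parsing the raw bytes and copying the parsed structure into the builder, so the builder serializes real JSON tokens instead of calling {{toString}} on an opaque object.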
[jira] [Resolved] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field
[ https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2126. --- Resolution: Fixed I didn't get any response on the JIRA, so I'll return it back to resolved state. Let's take any subsequent discussion to separate JIRA. Problem in elasticsearch sink when the event body is a complex field Key: FLUME-2126 URL: https://issues.apache.org/jira/browse/FLUME-2126 Project: Flume Issue Type: Bug Components: Sinks+Sources Environment: 1.3.1 and 1.4 Reporter: Massimo Paladin Attachments: FLUME-2126-0.patch I have found a bug in the elasticsearch sink, the problem is in the {{ContentBuilderUtil.addComplexField}} method, when it does {{builder.field(fieldName, tmp);}} the {{tmp}} object is taken as {{Object}} with the result of being serialized with the {{toString}} method in the {{XContentBuilder}}. In the end you get the object reference as content. The following change workaround the problem for me, the bad point is that it has to parse the content twice, I guess there is a better way to solve the problem but I am not an elasticsearch api expert. {code} --- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java +++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java @@ -61,7 +61,12 @@ public class ContentBuilderUtil { parser = XContentFactory.xContent(contentType).createParser(data); parser.nextToken(); tmp.copyCurrentStructure(parser); - builder.field(fieldName, tmp); + + // if it is a valid structure then we include it + parser = XContentFactory.xContent(contentType).createParser(data); + parser.nextToken(); + builder.field(fieldName); + builder.copyCurrentStructure(parser); } catch (JsonParseException ex) { // If we get an exception here the most likely cause is nested JSON that // can't be figured out in the body. 
At this point just push it through {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
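The root cause described above — passing the builder as a plain {{Object}} so that serialization falls back to {{toString}} — can be reproduced in plain Java. {{OpaqueBuilder}} below is an illustrative stand-in for {{XContentBuilder}}, not the Elasticsearch API:

```java
// Illustrates why serializing an arbitrary Object falls back to
// Object.toString(), which yields "ClassName@hexHashCode" rather than the
// object's contents -- the symptom reported in FLUME-2126.
public class ToStringFallback {
    // Stand-in for a builder (such as XContentBuilder) that does not
    // override toString() to return its serialized content.
    static class OpaqueBuilder {
    }

    static String serialize(Object tmp) {
        // Mimics what happens when the builder is handled as a plain Object:
        // java.lang.Object's default toString() is used.
        return String.valueOf(tmp);
    }

    public static void main(String[] args) {
        // Prints something like "ToStringFallback$OpaqueBuilder@1b6d3586":
        // the object reference, not the structure the builder holds.
        System.out.println(serialize(new OpaqueBuilder()));
    }
}
```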
[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field
[ https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350602#comment-14350602 ] Jarek Jarcec Cecho commented on FLUME-2126: --- Since the patch on this JIRA has been committed and not reverted, might I suggest that we either: 1) resolve this JIRA and move the open questions to a follow-up discussion, or 2) revert the committed patch Problem in elasticsearch sink when the event body is a complex field Key: FLUME-2126 URL: https://issues.apache.org/jira/browse/FLUME-2126 Project: Flume Issue Type: Bug Components: Sinks+Sources Environment: 1.3.1 and 1.4 Reporter: Massimo Paladin Attachments: FLUME-2126-0.patch I have found a bug in the elasticsearch sink; the problem is in the {{ContentBuilderUtil.addComplexField}} method: when it does {{builder.field(fieldName, tmp);}} the {{tmp}} object is taken as {{Object}}, with the result of being serialized with the {{toString}} method in the {{XContentBuilder}}. In the end you get the object reference as content. The following change works around the problem for me; the downside is that it has to parse the content twice. I guess there is a better way to solve the problem, but I am not an Elasticsearch API expert. 
{code} --- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java +++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java @@ -61,7 +61,12 @@ public class ContentBuilderUtil { parser = XContentFactory.xContent(contentType).createParser(data); parser.nextToken(); tmp.copyCurrentStructure(parser); - builder.field(fieldName, tmp); + + // if it is a valid structure then we include it + parser = XContentFactory.xContent(contentType).createParser(data); + parser.nextToken(); + builder.field(fieldName); + builder.copyCurrentStructure(parser); } catch (JsonParseException ex) { // If we get an exception here the most likely cause is nested JSON that // can't be figured out in the body. At this point just push it through {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2633) Update Kite dependency to 1.0.0
[ https://issues.apache.org/jira/browse/FLUME-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339490#comment-14339490 ] Jarek Jarcec Cecho commented on FLUME-2633: --- Since this was already committed, could you open a new JIRA for it [~whoschek]? Update Kite dependency to 1.0.0 --- Key: FLUME-2633 URL: https://issues.apache.org/jira/browse/FLUME-2633 Project: Flume Issue Type: Bug Components: Sinks+Sources Reporter: Tom White Assignee: Tom White Fix For: v1.6.0 Attachments: FLUME-2633.patch Update the dataset sink to use Kite 1.0.0 which has a stable API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2631) End to End authentication in Flume
[ https://issues.apache.org/jira/browse/FLUME-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336646#comment-14336646 ] Jarek Jarcec Cecho commented on FLUME-2631: --- We've recently added a dependency on {{hadoop-common}} on the Sqoop client side because {{hadoop-auth}} didn't contain all required code. [~abec] has more context there and perhaps can help to see if we can limit the hadoop dependency here? End to End authentication in Flume --- Key: FLUME-2631 URL: https://issues.apache.org/jira/browse/FLUME-2631 Project: Flume Issue Type: New Feature Components: Sinks+Sources Reporter: Johny Rufus Assignee: Johny Rufus Fix For: v1.6.0 Attachments: FLUME-2631.patch 1. The idea is to enable authentication primarily by using SASL/GSSAPI/Kerberos with Thrift RPC. [Thrift already has support for a SASL API that supports Kerberos, so implementing right now for Thrift. For Avro RPC Kerberos support, Avro needs to support SASL first for its Netty Server, before we can use it in Flume] 2. Authentication will happen hop to hop [client to source, intermediate sources to sinks, final sink to destination]. 3. As per the initial model, the user principals won’t be carried forward. The Flume client [ThriftRpcClient] will authenticate itself to the KDC. All the intermediate agents [Thrift Sources/Sinks] will authenticate as principal ‘flume’ (typically, but this can be any valid principal that the KDC can authenticate) to each other, and the final agent will authenticate to the destination as the principal it wishes to identify to the destination -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2631) End to End authentication in Flume
[ https://issues.apache.org/jira/browse/FLUME-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337038#comment-14337038 ] Jarek Jarcec Cecho commented on FLUME-2631: --- Sqoop client is actually a dependency of the shell that exposes public Java APIs that users can use to talk to Sqoop 2 server from their Java applications :) I don't think that we have any active users yet, but considering how many requests we've seen for public Java API, I'm assuming that it will be used a lot. End to End authentication in Flume --- Key: FLUME-2631 URL: https://issues.apache.org/jira/browse/FLUME-2631 Project: Flume Issue Type: New Feature Components: Sinks+Sources Reporter: Johny Rufus Assignee: Johny Rufus Fix For: v1.6.0 Attachments: FLUME-2631.patch 1. The idea is to enable authentication primarily by using SASL/GSSAPI/Kerberos with Thrift RPC. [Thrift already has support for SASL api that supports kerberos, so implementing right now for Thrift. For Avro RPC kerberos support, Avro needs to support SASL first for its Netty Server, before we can use it in flume] 2. Authentication will happen hop to hop[Client to source, intermediate sources to sinks, final sink to destination]. 3. As per the initial model, the user principals won’t be carried forward. The flume client[ThriftRpcClient] will authenticate itself to the KDC. All the intermediate agents [Thrift Sources/Sinks] will authenticate as principal ‘flume’ (typically, but this can be any valid principal that KDC can autenticate) to each other and the final agent will authenticate to the destination as the principal it wishes to identify to the destination -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2631) End to End authentication in Flume
[ https://issues.apache.org/jira/browse/FLUME-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337151#comment-14337151 ] Jarek Jarcec Cecho commented on FLUME-2631: --- (For the record, I don't particularly like depending on Hadoop common in Sqoop client either, I'm just saying that we did that) End to End authentication in Flume --- Key: FLUME-2631 URL: https://issues.apache.org/jira/browse/FLUME-2631 Project: Flume Issue Type: New Feature Components: Sinks+Sources Reporter: Johny Rufus Assignee: Johny Rufus Fix For: v1.6.0 Attachments: FLUME-2631.patch 1. The idea is to enable authentication primarily by using SASL/GSSAPI/Kerberos with Thrift RPC. [Thrift already has support for SASL api that supports kerberos, so implementing right now for Thrift. For Avro RPC kerberos support, Avro needs to support SASL first for its Netty Server, before we can use it in flume] 2. Authentication will happen hop to hop[Client to source, intermediate sources to sinks, final sink to destination]. 3. As per the initial model, the user principals won’t be carried forward. The flume client[ThriftRpcClient] will authenticate itself to the KDC. All the intermediate agents [Thrift Sources/Sinks] will authenticate as principal ‘flume’ (typically, but this can be any valid principal that KDC can autenticate) to each other and the final agent will authenticate to the destination as the principal it wishes to identify to the destination -- This message was sent by Atlassian JIRA (v6.3.4#6332)
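For reference, hop-to-hop Kerberos logins of the kind discussed above are typically wired up through a JAAS entry using the JDK's Krb5LoginModule. The entry name, keytab path, and principal below are placeholders for illustration, not Flume's actual configuration:

```
FlumeAgent {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/flume/conf/flume.keytab"
  principal="flume/agent1.example.com@EXAMPLE.COM";
};
```

The agent process would point at such a file via {{-Djava.security.auth.login.config}} so each hop can obtain its own Kerberos credentials from the keytab rather than an interactive login.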
[jira] [Commented] (FLUME-2580) Sink side interceptors
[ https://issues.apache.org/jira/browse/FLUME-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307566#comment-14307566 ] Jarek Jarcec Cecho commented on FLUME-2580: --- Thanks for the document [~hshreedharan] - I agree with your assessment that it looks like a hack, but it seems as a reasonable solution for now. Quick question - are we planning to support event dropping as well? Sink side interceptors -- Key: FLUME-2580 URL: https://issues.apache.org/jira/browse/FLUME-2580 Project: Flume Issue Type: New Feature Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: SinkSideInterceptors.pdf Currently, we only have source-side interceptors that help routing. But if we use something like Kafka Channel, having a sink-side interceptor can help us modify events as they come in. We could also do validation on event schemas and drop them before they hit the sink, rather than have an infinite loop due to such events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-1207) Add sink-side decorators
[ https://issues.apache.org/jira/browse/FLUME-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-1207. --- Resolution: Duplicate Add sink-side decorators Key: FLUME-1207 URL: https://issues.apache.org/jira/browse/FLUME-1207 Project: Flume Issue Type: New Feature Reporter: Joey Echeverria Priority: Minor FLUME-1157 added support for interceptors (source-side decorators) which enables a number of the use cases that decorators were used for in Flume 0.xx. It would be nice to have a sink-side equivalent so that the same source can feed multiple channels/sinks with some getting decorated event, and others getting the original. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2443) org.apache.hadoop.fs.FSDataOutputStream.sync() is deprecated in hadoop 2.4
[ https://issues.apache.org/jira/browse/FLUME-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304044#comment-14304044 ] Jarek Jarcec Cecho commented on FLUME-2443: --- +1 org.apache.hadoop.fs.FSDataOutputStream.sync() is deprecated in hadoop 2.4 -- Key: FLUME-2443 URL: https://issues.apache.org/jira/browse/FLUME-2443 Project: Flume Issue Type: Dependency upgrade Components: Sinks+Sources, Technical Debt Affects Versions: v1.5.0, v1.5.0.1, v1.6.0 Environment: Linux PPC / x86 Reporter: Corentin Baron Assignee: Hari Shreedharan Fix For: v1.6.0 Attachments: FLUME-2443.patch HDFS sink uses this method, which is deprecated in hadoop 2.4.1, and no longer present in the current hadoop developments: java.lang.NoSuchMethodError: org/apache/hadoop/fs/FSDataOutputStream.sync()V at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:131) at org.apache.flume.sink.hdfs.BucketWriter$6.call(BucketWriter.java:502) at org.apache.flume.sink.hdfs.BucketWriter$6.call(BucketWriter.java:499) at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718) at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183) at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59) at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715) at java.util.concurrent.FutureTask.run(FutureTask.java:273) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) at java.lang.Thread.run(Thread.java:853) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2594) Close Async HBase Client if there are large number of consecutive timeouts
[ https://issues.apache.org/jira/browse/FLUME-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286348#comment-14286348 ] Jarek Jarcec Cecho commented on FLUME-2594: --- +1 Close Async HBase Client if there are large number of consecutive timeouts -- Key: FLUME-2594 URL: https://issues.apache.org/jira/browse/FLUME-2594 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2594.patch A large number of timeouts can lead to several thousands of messages being buffered but never removed from the HBaseClient's buffers. This can cause a major heap spike. So we should close and dereference the HBaseClient if we hit many failures from the HBase side to clear up the buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLUME-2594) Close Async HBase Client if there are large number of consecutive timeouts
[ https://issues.apache.org/jira/browse/FLUME-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2594. --- Resolution: Fixed Fix Version/s: v1.6.0 Thank you for the contribution [~hshreedharan]! Close Async HBase Client if there are large number of consecutive timeouts -- Key: FLUME-2594 URL: https://issues.apache.org/jira/browse/FLUME-2594 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.6.0 Attachments: FLUME-2594.patch A large number of timeouts can lead to several thousands of messages being buffered but never removed from the HBaseClient's buffers. This can cause a major heap spike. So we should close and dereference the HBaseClient if we hit many failures from the HBase side to clear up the buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
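The pattern described above — close and dereference the client after many consecutive failures so its buffers become collectable — can be sketched generically. {{FailureTracker}} is an illustrative helper, not the actual AsyncHBaseSink code:

```java
// Sketch of the "close after N consecutive timeouts" idea from FLUME-2594.
// A success resets the streak; hitting the threshold closes the client so
// its buffered messages stop pinning the heap.
public class FailureTracker {
    private final int maxConsecutiveFails;
    private int consecutiveFails = 0;
    private boolean clientOpen = true;

    public FailureTracker(int maxConsecutiveFails) {
        this.maxConsecutiveFails = maxConsecutiveFails;
    }

    public void onSuccess() {
        consecutiveFails = 0;           // any success resets the streak
    }

    public void onTimeout() {
        consecutiveFails++;
        if (clientOpen && consecutiveFails >= maxConsecutiveFails) {
            // In the real sink this is where the HBaseClient would be
            // shut down and dereferenced to release its buffers.
            clientOpen = false;
        }
    }

    public boolean isClientOpen() {
        return clientOpen;
    }
}
```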
[jira] [Updated] (FLUME-2562) Metrics for Flafka components
[ https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2562: -- Fix Version/s: v1.6.0 Metrics for Flafka components - Key: FLUME-2562 URL: https://issues.apache.org/jira/browse/FLUME-2562 Project: Flume Issue Type: Improvement Reporter: Gwen Shapira Assignee: Gwen Shapira Fix For: v1.6.0 Attachments: FLUME-2562.1.patch, FLUME-2562.2.patch Kafka source, sink and channel should have metrics. This will help us track down possible issues or performance problems. Here are the metrics I came up with: kafka.next.time - Time spent waiting for events from Kafka (source and channel) kafka.send.time - Time spent sending events (channel and sink) kafka.commit.time - Time spent committing (source and channel) events.sent - Number of events sent to Kafka (sink and channel) events.read - Number of events read from Kafka (channel and source) events.rollback - Number of event rolled back (channel) or number of rollback calls (sink) kafka.empty - Number of times backing off due to empty kafka topic (source) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
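The timing counters listed above (e.g. {{kafka.next.time}}) boil down to measuring the elapsed time around each Kafka call and accumulating it under a named counter. A minimal sketch of that pattern, assuming made-up counter names and a stubbed fetch — Flume's real implementation uses its MonitoredCounterGroup machinery:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative counter group: accumulate named long counters and time a
// (stubbed) fetch from Kafka under "kafka.next.time".
public class ChannelCounters {
    private final ConcurrentHashMap<String, AtomicLong> counters =
        new ConcurrentHashMap<>();

    void addTo(String name, long delta) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    long get(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0L : c.get();
    }

    // Time one fetch and record both the latency and the event count.
    long timedFetch(Runnable fetch) {
        long start = System.nanoTime();
        fetch.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        addTo("kafka.next.time", elapsedMs);
        addTo("events.read", 1);
        return elapsedMs;
    }
}
```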
Fwd: [ NOTICE ] Service Downtime Notification - R/W git repos
FYI: Our git repo will be down on Thursday. Jarcec Begin forwarded message: Reply-To: priv...@sqoop.apache.org Date: January 13, 2015 at 6:48:51 AM PST Subject: [ NOTICE ] Service Downtime Notification - R/W git repos From: Tony Stevenson pct...@apache.org To: undisclosed-recipients:; Folks, Please note that on Thursday 15th at 20:00 UTC the Infrastructure team will be taking the read/write git repositories offline. We expect this migration to last about 4 hours. During the outage the service will be migrated from an old host to a new one. We intend to keep the URL the same for access to the repos after the migration, but an alternate name is already in place in case DNS updates take too long. Please be aware it might take some hours after the completion of the downtime for GitHub to update and reflect any changes. The Infrastructure team have been trialling the new host for about a week now, and [touch wood] have not had any problems with it. The service is currently available by accessing repos via: https://git-wip-us.apache.org If you have any questions please address them to infrastruct...@apache.org -- Cheers, Tony On behalf of the Apache Infrastructure Team -- Tony Stevenson t...@pc-tony.com pct...@apache.org http://www.pc-tony.com GPG - 1024D/51047D66 --
[jira] [Assigned] (FLUME-2488) TestElasticSearchRestClient fails on Oracle JDK 8
[ https://issues.apache.org/jira/browse/FLUME-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho reassigned FLUME-2488: - Assignee: Johny Rufus (was: Santiago M. Mola) TestElasticSearchRestClient fails on Oracle JDK 8 - Key: FLUME-2488 URL: https://issues.apache.org/jira/browse/FLUME-2488 Project: Flume Issue Type: Bug Components: Sinks+Sources Affects Versions: v1.5.0.1 Reporter: Santiago M. Mola Assignee: Johny Rufus Labels: jdk8 Fix For: v1.6.0 Attachments: FLUME-2488-0.patch, FLUME-2488-1.patch The JSON comparison should be performed independently of the keys order: https://travis-ci.org/Stratio/flume/jobs/36847693#L6735 shouldAddNewEventWithoutTTL(org.apache.flume.sink.elasticsearch.client.TestElasticSearchRestClient) Time elapsed: 493 sec FAILURE! junit.framework.ComparisonFailure: expected:{index:{_[type:bar_type,_index:foo_index]}} {body:test} but was:{index:{_[index:foo_index,_type:bar_type]}} {body:test} at junit.framework.Assert.assertEquals(Assert.java:85) at junit.framework.Assert.assertEquals(Assert.java:91) at org.apache.flume.sink.elasticsearch.client.TestElasticSearchRestClient.shouldAddNewEventWithoutTTL(TestElasticSearchRestClient.java:105) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
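The JDK 8 failure above is a key-ordering problem: comparing serialized JSON strings depends on the order keys happen to be emitted, while comparing parsed structures does not. The plain-Java sketch below mirrors the failing assertion's keys with maps instead of a JSON library; {{KeyOrder}} is illustrative, not the actual test fix:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Demonstrates order-sensitive vs order-independent comparison: string
// forms of the two maps differ, but Map.equals ignores insertion order,
// just like a structural JSON-object comparison would.
public class KeyOrder {
    static Map<String, String> index(String k1, String v1,
                                     String k2, String v2) {
        Map<String, String> m = new LinkedHashMap<>();   // preserves order
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> expected =
            index("_type", "bar_type", "_index", "foo_index");
        Map<String, String> actual =
            index("_index", "foo_index", "_type", "bar_type");

        System.out.println(expected.toString().equals(actual.toString())); // false
        System.out.println(expected.equals(actual));                       // true
    }
}
```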
Re: SSLv2Hello is required on Java 6
Hi Hari, I’ve just reviewed and committed FLUME-2547 and its subtasks. As we are supporting JDK6, I would be in favor of doing another quick release. Jarcec On Nov 11, 2014, at 2:03 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: Hi, Right after we pushed 1.5.1 out (I have not even sent the announce email), we discovered that Java 6 requires SSLv2Hello on the server side for negotiation even if TLS is used (unless the client code is also changed to disable SSLv2Hello). So: - For HTTP Source, any clients running on Java 6 would need code changes to also disable SSLv2Hello to be able to send data via TLSv1. - For Avro Source, any clients running Flume SDK 1.5.1 on Java 6 would break and require the client application to upgrade to 1.5.1. I filed FLUME-2547 to fix this. My question to the community here is whether we want a new release bringing SSLv2Hello back or if we are willing to just document this and move forward? I am willing to put together an RC if required. Thanks, Hari
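The negotiation issue above comes down to which protocol names are enabled on the JSSE engine or socket at either end. A minimal sketch of the standard API shape (the protocol names available depend on the JVM; this is not the FLUME-2547 patch itself):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Restricting (or re-enabling) handshake protocols on a JSSE engine.
// On Java 6, adding "SSLv2Hello" to the server's enabled set is what
// allows old clients to complete the hello phase before TLS takes over.
public class Protocols {
    static SSLEngine newEngine() {
        try {
            return SSLContext.getDefault().createSSLEngine();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String[] restrictTo(SSLEngine engine, String... protocols) {
        engine.setEnabledProtocols(protocols);   // throws if unsupported
        return engine.getEnabledProtocols();
    }

    public static void main(String[] args) {
        for (String p : restrictTo(newEngine(), "TLSv1.2")) {
            System.out.println(p);
        }
    }
}
```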
Re: [VOTE] Apache Flume 1.5.2 RC1
+1 * Verified checksums and signature files * Verified that each jar in binary tarball is in the license * Checked top level files (NOTICE, ...) * Run tests (pretty much the same email I’ve sent for 1.5.1 :)) Jarcec On Nov 12, 2014, at 1:15 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: This is the eighth release for Apache Flume as a top-level project, version 1.5.2. We are voting on release candidate RC1. This release fixes an incompatibility with Java 6-based clients found in the Apache Flume 1.5.1 release. It fixes the following issues: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=blob;f=CHANGELOG;h=cc7321361d0b702ba870de20d6a3d2106987186a;hb=229442aa6835ee0faa17e3034bcab42754c460f5 *** Please cast your vote within the next 72 hours *** The tarball (*.tar.gz), signature (*.asc), and checksums (*.md5, *.sha1) for the source and binary artifacts can be found here: https://people.apache.org/~hshreedharan/apache-flume-1.5.2-rc1/ Maven staging repo: https://repository.apache.org/content/repositories/orgapacheflume-1008/ The tag to be voted on: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=229442aa6835ee0faa17e3034bcab42754c460f5 Flume's KEYS file containing PGP keys we use to sign the release: http://www.apache.org/dist/flume/KEYS Thanks, Hari
Re: [VOTE] Release Apache Flume 1.5.1 RC1
+1 * Verified checksums and signature files * Verified that each jar in binary tarball is in the license * Checked top level files (NOTICE, ...) * Run tests Jarcec On Nov 6, 2014, at 3:17 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: This is the seventh release for Apache Flume as a top-level project, version 1.5.1. We are voting on release candidate RC1. It fixes the following issues: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=blob_plain;f=CHANGELOG;hb=c74804226bcee59823c0cbc09cdf803a3d9e6920 *** Please cast your vote within the next 72 hours *** The tarball (*.tar.gz), signature (*.asc), and checksums (*.md5, *.sha1) for the source and binary artifacts can be found here: https://people.apache.org/~hshreedharan/apache-flume-1.5.1-rc1/ Maven staging repo: https://repository.apache.org/content/repositories/orgapacheflume-1006/ The tag to be voted on: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=c74804226bcee59823c0cbc09cdf803a3d9e6920 Flume's KEYS file containing PGP keys we use to sign the release: http://www.apache.org/dist/flume/KEYS Thanks, Hari
[jira] [Commented] (FLUME-2505) Test added in FLUME-2502 is flaky
[ https://issues.apache.org/jira/browse/FLUME-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203574#comment-14203574 ] Jarek Jarcec Cecho commented on FLUME-2505: --- +1 Test added in FLUME-2502 is flaky - Key: FLUME-2505 URL: https://issues.apache.org/jira/browse/FLUME-2505 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2505.patch, FLUME-2505.patch I added a test to Prateek's patch - which is flaky on Jenkins (not locally) - probably due to slower machines. I think we should make the test a bit more tolerant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Flume 1.5.1 Release
+1 go for 1.5.1 release Hari Jarcec On Nov 6, 2014, at 1:39 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: As some of you may have noticed, I have been pushing some commits to trunk, 1.5 and 1.6 branches. This is primarily meant for a 1.5.1 maintenance release that includes the following patches: FLUME-2520: HTTP Source should be able to block a prefixed set of protocols. FLUME-2511. Allow configuration of enabled protocols in Avro source and RpcClient. FLUME-2441. HTTP Source Unit tests fail on IBM JDK 7 FLUME-2533: HTTPS tests fail on Java 6 I will try to spin up an RC today so we can vote through the weekend. Let me know if you have any concerns. Thanks, Hari
[jira] [Commented] (FLUME-2533) HTTPS tests fail on Java 6
[ https://issues.apache.org/jira/browse/FLUME-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199490#comment-14199490 ] Jarek Jarcec Cecho commented on FLUME-2533: --- +1 HTTPS tests fail on Java 6 -- Key: FLUME-2533 URL: https://issues.apache.org/jira/browse/FLUME-2533 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2533.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] New Flume PMC Member - Roshan Naik
Congratulations Roshan! Jarcec On Nov 4, 2014, at 2:12 PM, Arvind Prabhakar arv...@apache.org wrote: On behalf of Apache Flume PMC, it is my pleasure to announce that Roshan Naik has been elected to the Flume Project Management Committee. Roshan has been active with the project for many years and has been a committer on the project since September of 2013. Please join me in congratulating Roshan and welcoming him to the Flume PMC. Regards, Arvind Prabhakar
[jira] [Commented] (FLUME-2500) Add a channel that uses Kafka
[ https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173787#comment-14173787 ] Jarek Jarcec Cecho commented on FLUME-2500: --- Could you also post the patch to the review board [~hshreedharan]? Add a channel that uses Kafka -- Key: FLUME-2500 URL: https://issues.apache.org/jira/browse/FLUME-2500 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2500.patch Here is the rationale: - Kafka gives an HA channel, which means a dead agent does not affect the data in the channel - thus reducing delay of delivery. - Kafka is used by many companies - it would be a good idea to use Flume to pull data from Kafka and write it to HDFS/HBase etc. This channel is not going to be useful for cases where Kafka is not already used, since it brings in the operational overhead of maintaining two systems, but if there is Kafka in use - this is a good way to integrate Kafka and Flume. Here is a scratch implementation: https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2500) Add a channel that uses Kafka
[ https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169427#comment-14169427 ] Jarek Jarcec Cecho commented on FLUME-2500: --- I like the idea of a Kafka channel, so I'm +1 on the idea! Add a channel that uses Kafka -- Key: FLUME-2500 URL: https://issues.apache.org/jira/browse/FLUME-2500 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Here is the rationale: - Kafka gives an HA channel, which means a dead agent does not affect the data in the channel - thus reducing delay of delivery. - Kafka is used by many companies - it would be a good idea to use Flume to pull data from Kafka and write it to HDFS/HBase etc. This channel is not going to be useful for cases where Kafka is not already used, since it brings in the operational overhead of maintaining two systems, but if there is Kafka in use - this is a good way to integrate Kafka and Flume. Here is a scratch implementation: https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2250) Add support for Kafka Source
[ https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133601#comment-14133601 ] Jarek Jarcec Cecho commented on FLUME-2250: --- Good point, thank you for raising the concern about legality [~hshreedharan]! I can second [~gwenshap] sentiment. Contributors are agreeing with including their work in ASF software by attaching their patches on JIRA - commit to a repository is not necessary from a legal perspective. Hence, this situation when one contributor took uploaded patch of another contributor to finish the work is completely fine and in fact has happened several times through my time at Apache. Add support for Kafka Source Key: FLUME-2250 URL: https://issues.apache.org/jira/browse/FLUME-2250 Project: Flume Issue Type: Sub-task Components: Sinks+Sources Affects Versions: v1.5.0 Reporter: Ashish Paliwal Priority: Minor Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, FLUME-2250-2.patch, FLUME-2250.patch Add support for Kafka Source -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Interest in a Flume Meetup?
Yeah, I don’t even remember when we had last Flume meet up, so I’m definitely up for doing it. I don’t have preference to either place or time at the moment. Jarcec On Jul 28, 2014, at 4:34 PM, Jonathan Natkins na...@streamsets.com wrote: Hi Flume folks, I've noticed that it's been quite a while since the last Flume meetup, and was curious if there would be interest in doing one sometime in August around SF or somewhere else in the bay area. Would anybody on this list be interested in attending if one were scheduled? Are there preferences as to whether or not it is in San Francisco or somewhere in the South Bay (most likely Palo Alto)? Feel free to respond to the list or me directly with questions or comments. Thanks! Natty
[jira] [Commented] (FLUME-2416) Use CodecPool in compressed stream to prevent leak of direct buffers
[ https://issues.apache.org/jira/browse/FLUME-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045493#comment-14045493 ] Jarek Jarcec Cecho commented on FLUME-2416: --- +1 Use CodecPool in compressed stream to prevent leak of direct buffers Key: FLUME-2416 URL: https://issues.apache.org/jira/browse/FLUME-2416 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2416.patch Even though they may no longer be references, Java only cleans up direct buffers on full gc. If there is enough heap available, a full GC is never hit and these buffers are leaked. Hadoop keeps creating new compressors instead of using the pools causing a leak - which is a bug in itself which is being addressed by HADOOP-10591 -- This message was sent by Atlassian JIRA (v6.2#6252)
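The pooling approach behind Hadoop's CodecPool — reuse expensive objects (compressors holding direct buffers) instead of allocating a fresh one per stream — can be sketched generically. {{SimplePool}} is an illustrative simplification, not Hadoop's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Generic object pool: borrow() reuses an idle instance when one exists
// and only allocates when the pool is empty; release() returns the
// instance for reuse instead of leaving it to accumulate off-heap state.
public class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    public SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    public synchronized T borrow() {
        T t = idle.poll();
        if (t == null) {
            created++;              // allocate only when nothing is idle
            t = factory.get();
        }
        return t;
    }

    public synchronized void release(T t) {
        idle.push(t);               // returned objects are reused, not leaked
    }

    public synchronized int createdCount() {
        return created;
    }
}
```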
Re: [VOTE] Apache Flume 1.5.0.1 RC1
+1. Looks good to me. Jarcec On Tue, Jun 10, 2014 at 03:40:25PM -0700, Hari Shreedharan wrote: This is a vote for the next release of Apache Flume, version 1.5.0.1. We are voting on release candidate RC1. It fixes the following issues: http://s.apache.org/v7X *** Please cast your vote within the next 72 hours *** The tarball (*.tar.gz), signature (*.asc), and checksums (*.md5, *.sha1) for the source and binary artifacts can be found here: https://people.apache.org/~hshreedharan/apache-flume-1.5.0.1-rc1/ Maven staging repo: https://repository.apache.org/content/repositories/orgapacheflume-1004/ The tag to be voted on: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=ceda6aa1126a01370641caf729d8b1dd6d80aa61 Flume's KEYS file containing PGP keys we use to sign the release: http://www.apache.org/dist/flume/KEYS Thanks, Hari
Re: Preparing For Flume 1.5.0.1
+1 thank you for working on it Hari! Jarcec On Mon, Jun 09, 2014 at 11:25:37AM -0700, Hari Shreedharan wrote: Hi all, Flume 1.5.0 did not have a build compatible with HBase-98, which is blocking the next BigTop release. I expect the new release vote to be out early this week and the release in the later part of the week. This release will not have any new features, and is only a build-related release to unblock BigTop Thanks, Hari
[jira] [Commented] (FLUME-2397) HBase-98 compatibility
[ https://issues.apache.org/jira/browse/FLUME-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018920#comment-14018920 ] Jarek Jarcec Cecho commented on FLUME-2397: --- +1 HBase-98 compatibility -- Key: FLUME-2397 URL: https://issues.apache.org/jira/browse/FLUME-2397 Project: Flume Issue Type: Bug Affects Versions: v1.5.0 Reporter: Hari Shreedharan Assignee: Hari Shreedharan Priority: Blocker Fix For: v1.5.0.1 Attachments: FLUME-2397.patch In addition to adding a new profile in the POM, it looks like the new version of asynchbase uses the same executor to create a channel selector boss and worker threads. We need to pass in two different single threaded executors to ensure that the test does not fail. This is *required* for HBase 98 support -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (FLUME-2397) HBase-98 compatibility
[ https://issues.apache.org/jira/browse/FLUME-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho reassigned FLUME-2397: - Assignee: Jarek Jarcec Cecho (was: Hari Shreedharan) HBase-98 compatibility -- Key: FLUME-2397 URL: https://issues.apache.org/jira/browse/FLUME-2397 Project: Flume Issue Type: Bug Affects Versions: v1.5.0 Reporter: Hari Shreedharan Assignee: Jarek Jarcec Cecho Priority: Blocker Fix For: v1.5.0.1 Attachments: FLUME-2397.patch In addition to adding a new profile in the POM, it looks like the new version of asynchbase uses the same executor to create a channel selector boss and worker threads. We need to pass in two different single threaded executors to ensure that the test does not fail. This is *required* for HBase 98 support -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Flume 1.6.0 - 80% issues with patches
Hi Otis, we've just released flume 1.5.0, so I'm wondering what is the motivation for 1.6.0? Is there any patch you need to get in? Jarcec On Wed, May 28, 2014 at 05:10:25PM -0400, Otis Gospodnetic wrote: Btw. I had a peek at https://issues.apache.org/jira/browse/FLUME/fixforversion/12327047/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-paneland noticed that in version 1.6.0: * there are a total of 19 open issues * 80% of issues already have a patch available - nice :) Is it thus realistic to see Flume 1.6.0 towards the end of June? Thanks, Otis signature.asc Description: Digital signature
Re: Flume 1.5.0.1 branching
Hi Roshan, I believe that we are using branches to encode the minor line (e.g. 1.4 or 1.5) and then for the particular releases we have tags (1.5.0, 1.4.0). Hence I would assume that we will release from branch flume-1.5. Jarcec On Wed, May 28, 2014 at 01:47:51PM -0700, Roshan Naik wrote: Based on discussion on FLUME-1618 there appears to be an intent to release Flume 1.5.0.1. Wondering if this release would have a branch of its own? -roshan signature.asc Description: Digital signature
Re: [VOTE] Apache Flume 1.5.0 RC1
+1 * Verified checksums and signature files * Verified that each jar in the binary tarball is in the LICENSE * NOTICE has copyright from 2012, we should update it (not a blocker in my mind) * Checked top level files (NOTICE, ...) * Ran tests Jarcec On Wed, May 07, 2014 at 03:28:36PM -0700, Hari Shreedharan wrote: This is a vote for the next release of Apache Flume, version 1.5.0. We are voting on release candidate RC1. It fixes the following issues: http://s.apache.org/4eQ *** Please cast your vote within the next 72 hours *** The tarball (*.tar.gz), signature (*.asc), and checksums (*.md5, *.sha1) for the source and binary artifacts can be found here: https://people.apache.org/~hshreedharan/apache-flume-1.5.0-rc1/ Maven staging repo: https://repository.apache.org/content/repositories/orgapacheflume-1001/ The tag to be voted on: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=8633220df808c4cd0c13d1cf0320454a94f1ea97 Flume's KEYS file containing PGP keys we use to sign the release: http://www.apache.org/dist/flume/KEYS Thanks, Hari signature.asc Description: Digital signature
[jira] [Commented] (FLUME-2381) Upgrade Hadoop version in Hadoop 2 profile to 2.4.0
[ https://issues.apache.org/jira/browse/FLUME-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988910#comment-13988910 ] Jarek Jarcec Cecho commented on FLUME-2381: --- +1 Upgrade Hadoop version in Hadoop 2 profile to 2.4.0 --- Key: FLUME-2381 URL: https://issues.apache.org/jira/browse/FLUME-2381 Project: Flume Issue Type: Bug Affects Versions: v1.4.0 Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2381.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (FLUME-2381) Upgrade Hadoop version in Hadoop 2 profile to 2.4.0
[ https://issues.apache.org/jira/browse/FLUME-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2381. --- Resolution: Fixed Thank you [~hshreedharan]! Upgrade Hadoop version in Hadoop 2 profile to 2.4.0 --- Key: FLUME-2381 URL: https://issues.apache.org/jira/browse/FLUME-2381 Project: Flume Issue Type: Bug Affects Versions: v1.4.0 Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2381.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (FLUME-2347) Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line
[ https://issues.apache.org/jira/browse/FLUME-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945615#comment-13945615 ] Jarek Jarcec Cecho commented on FLUME-2347: --- +1 Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line -- Key: FLUME-2347 URL: https://issues.apache.org/jira/browse/FLUME-2347 Project: Flume Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: FLUME-2347.patch In order to set java properties such as -X, -D, and -javaagent we have the following: * flume-ng takes -X and -D as native properties * JAVA_OPTS can be placed in the flume-env.sh file However, there is no way to set properties on the command line which do not start with -X or -D. eg. env JAVA_OPTS=-javaagent flume-ng Therefore I suggest we introduce FLUME_JAVA_OPTS which sets properties from the env starting the flume-ng command. This will not impact users who use JAVA_OPTS outside of Flume in ways incompatible with Flume. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (FLUME-2347) Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line
[ https://issues.apache.org/jira/browse/FLUME-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2347. --- Resolution: Fixed Fix Version/s: v1.5.0 Thank you for your contribution [~brocknoland]! Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line -- Key: FLUME-2347 URL: https://issues.apache.org/jira/browse/FLUME-2347 Project: Flume Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: v1.5.0 Attachments: FLUME-2347.patch In order to set java properties such as -X, -D, and -javaagent we have the following: * flume-ng takes -X and -D as native properties * JAVA_OPTS can be placed in the flume-env.sh file However, there is no way to set properties on the command line which do not start with -X or -D. eg. env JAVA_OPTS=-javaagent flume-ng Therefore I suggest we introduce FLUME_JAVA_OPTS which sets properties from the env starting the flume-ng command. This will not impact users who use JAVA_OPTS outside of Flume in ways incompatible with Flume. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (FLUME-2336) HBase tests that pass in ZK configs must use a new context object
[ https://issues.apache.org/jira/browse/FLUME-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917229#comment-13917229 ] Jarek Jarcec Cecho commented on FLUME-2336: --- +1 HBase tests that pass in ZK configs must use a new context object - Key: FLUME-2336 URL: https://issues.apache.org/jira/browse/FLUME-2336 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2336.patch In java 7, since the test order is undefined, calling testZkIncorrectPorts followed by other tests causes a series of failures. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (FLUME-2336) HBase tests that pass in ZK configs must use a new context object
[ https://issues.apache.org/jira/browse/FLUME-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2336: -- Fix Version/s: (was: v0.9.5) v1.5.0 HBase tests that pass in ZK configs must use a new context object - Key: FLUME-2336 URL: https://issues.apache.org/jira/browse/FLUME-2336 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2336.patch In java 7, since the test order is undefined, calling testZkIncorrectPorts followed by other tests causes a series of failures. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (FLUME-2324) Support writing to multiple HBase clusters using HBaseSink
[ https://issues.apache.org/jira/browse/FLUME-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916538#comment-13916538 ] Jarek Jarcec Cecho commented on FLUME-2324: --- Couple of nits: * I've noticed that this fragment is in the code twice, perhaps we should have it only once? {code} logger.info("Using ZK Quorum: " + zkQuorum); {code} * Do you think that you could update the user guide with this argument? It seems that we have currently documented the {{zookeeperQuorum}} only for AsyncHBaseSink. Support writing to multiple HBase clusters using HBaseSink -- Key: FLUME-2324 URL: https://issues.apache.org/jira/browse/FLUME-2324 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2324.patch The AsyncHBaseSink can already write to multiple HBase clusters, but HBaseSink cannot. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (FLUME-2328) FileChannel Dual Checkpoint Backup Thread not released on Application stop
[ https://issues.apache.org/jira/browse/FLUME-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916552#comment-13916552 ] Jarek Jarcec Cecho commented on FLUME-2328: --- +1 FileChannel Dual Checkpoint Backup Thread not released on Application stop -- Key: FLUME-2328 URL: https://issues.apache.org/jira/browse/FLUME-2328 Project: Flume Issue Type: Bug Components: File Channel Affects Versions: v1.4.0 Reporter: Arun Assignee: Hari Shreedharan Attachments: FLUME-2328.patch In my application I wired the FileChannel with dual checkpoint enabled. Even after calling application.stop() I can see the checkpoint backup thread is still in waiting state: "[channel=c1] - CheckpointBackUpThread" prio=6 tid=0x3a069400 nid=0x8a4 waiting on condition [0x3b17f000] Since I am using the Java Service Wrapper to run the application, and before stopping the service I wait for all user threads to be released, the service does not stop gracefully even after waiting for 5 mins. In the code I can see checkpointBackUpExecutor is started: if (shouldBackup) { checkpointBackUpExecutor = Executors.newSingleThreadExecutor( new ThreadFactoryBuilder().setNameFormat( getName() + " - CheckpointBackUpThread").build()); } else { checkpointBackUpExecutor = null; } There is no shutdown call for checkpointBackUpExecutor anywhere in EventQueueBackingStoreFile. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
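The leak described in FLUME-2328 can be illustrated with a minimal sketch (class and method names are illustrative, not Flume's actual EventQueueBackingStoreFile code): an optional single-thread executor whose non-daemon worker keeps the JVM alive unless stop() shuts it down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the leak and its fix: an optional single-thread
// executor that must be shut down in stop(); otherwise its non-daemon
// worker thread keeps running after the component stops.
public class BackupExecutorSketch {
    private final ExecutorService checkpointBackUpExecutor;

    public BackupExecutorSketch(boolean shouldBackup) {
        checkpointBackUpExecutor =
            shouldBackup ? Executors.newSingleThreadExecutor() : null;
    }

    // The fix: stop() must release the executor's thread.
    public boolean stop() {
        if (checkpointBackUpExecutor == null) {
            return true;
        }
        checkpointBackUpExecutor.shutdown();
        try {
            return checkpointBackUpExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without the `shutdown()` call, the idle worker thread parks in `WAITING` state forever, which is exactly the thread-dump symptom the reporter saw.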
[jira] [Commented] (FLUME-2323) Morphline sink must increment eventDrainAttemptCount when it takes event from channel
[ https://issues.apache.org/jira/browse/FLUME-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916580#comment-13916580 ] Jarek Jarcec Cecho commented on FLUME-2323: --- +1 Morphline sink must increment eventDrainAttemptCount when it takes event from channel - Key: FLUME-2323 URL: https://issues.apache.org/jira/browse/FLUME-2323 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2323.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (FLUME-2283) Spool Dir source must check interrupt flag before writing to channel
[ https://issues.apache.org/jira/browse/FLUME-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2283. --- Resolution: Fixed Fix Version/s: v1.5.0 Your contribution is appreciated [~hshreedharan]! Spool Dir source must check interrupt flag before writing to channel Key: FLUME-2283 URL: https://issues.apache.org/jira/browse/FLUME-2283 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2283.patch FLUME-2282 was likely caused by the Spool Dir Source continuing to attempt writing the channel even though the stop method was called, because it never checks the interrupt flag. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2307: -- Description: I've observed Flume failing to clean up old log data in FileChannels. The amount of old log data can range anywhere from tens to hundreds of GB. I was able to confirm that the channels were in fact empty. This behavior always occurs after lock timeouts when attempting to put, take, rollback, or commit to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old files. I was able to confirm that the Log's writeCheckpoint method was still being called and successfully obtaining a lock from tryLockExclusive(), but I was not able to confirm removeOldLogs being called. The application log did not include Removing old file: log-xyz for the old files which the Log class would output if they were correctly being removed. I suspect the lock timeouts were due to high I/O load at the time. Some stack traces: {code} org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel] at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478) at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93) at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80) at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189) org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. 
[channel=fileChannel] at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594) at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151) at dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548) at dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33) at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) at java.lang.Thread.run(Thread.java:619) org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel] at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621) at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168) at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194) at dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91) at org.apache.avro.ipc.Responder.respond(Responder.java:151) at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75) at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792) at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321) at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303) at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364
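The failure mode behind these stack traces follows from a timed-lock pattern: each channel operation tries the shared log lock for a bounded time and fails the transaction if it cannot acquire it. The sketch below is an assumption-laden illustration of that pattern (names and the exception handling are hypothetical, not Flume's actual FileChannel code); FLUME-2307's resolution was to remove the write timeout so operations block instead of failing under I/O load.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a timed-lock pattern: under heavy I/O, tryLock times out
// and the operation fails, which surfaced in Flume as
// "Failed to obtain lock for writing to the log."
public class TimedLockSketch {
    private final ReentrantLock logLock = new ReentrantLock();

    public boolean tryOperation(long timeout, TimeUnit unit) {
        boolean locked = false;
        try {
            locked = logLock.tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!locked) {
            // In the real channel this became a ChannelException.
            return false;
        }
        try {
            return true; // do the put/take/commit/rollback here
        } finally {
            logLock.unlock();
        }
    }
}
```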
[jira] [Commented] (FLUME-2314) Upgrade to Mapdb 0.9.9
[ https://issues.apache.org/jira/browse/FLUME-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895931#comment-13895931 ] Jarek Jarcec Cecho commented on FLUME-2314: --- +1 Upgrade to Mapdb 0.9.9 -- Key: FLUME-2314 URL: https://issues.apache.org/jira/browse/FLUME-2314 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2314.patch This affects classes which we use though I am not sure it affects our use-case: https://github.com/jankotek/MapDB/issues/259. This is fixed in the latest release -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (FLUME-2312) Add utility for adorning HTTP contexts in Jetty
[ https://issues.apache.org/jira/browse/FLUME-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892927#comment-13892927 ] Jarek Jarcec Cecho commented on FLUME-2312: --- Thank you [~hshreedharan]! Add utility for adorning HTTP contexts in Jetty --- Key: FLUME-2312 URL: https://issues.apache.org/jira/browse/FLUME-2312 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2312.patch, FLUME-2312.patch The idea is to accomplish the same as HBASE-10473 to disallow TRACE and OPTIONS -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: [DISCUSS] Release Flume 1.5.0
Strong +1 on creating a new release. Jarcec On Thu, Jan 30, 2014 at 09:17:43AM -0800, Hari Shreedharan wrote: Hi folks, It has been about 6 months since we did a release. We have added several new features and fixed a lot of bugs. What do you guys think about releasing Flume 1.5.0? Thanks Hari signature.asc Description: Digital signature
[jira] [Commented] (FLUME-2305) BucketWriter#close must cancel idleFuture
[ https://issues.apache.org/jira/browse/FLUME-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884924#comment-13884924 ] Jarek Jarcec Cecho commented on FLUME-2305: --- +1 BucketWriter#close must cancel idleFuture - Key: FLUME-2305 URL: https://issues.apache.org/jira/browse/FLUME-2305 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2305.patch We use a ScheduledExecutorService to close the files after a time period. If the task has been submitted, the service does not cancel that task on shutdown by default. In the ScheduledThreadPoolExecutor class, the field executeExistingDelayedTasksAfterShutdown is set to true by default, which causes the task to be executed even if the executor was shut down. If we don't cancel it, the agent does not die till idleTimeout even though every other component is stopped. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
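The JDK behavior described above can be demonstrated directly (the class and method names in this sketch are illustrative; only `ScheduledThreadPoolExecutor` and its default delayed-task policy are real JDK API): a delayed task survives `shutdown()` unless it is cancelled first.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demonstrates the JDK default: with
// executeExistingDelayedTasksAfterShutdown == true, a pending delayed
// task stays alive across shutdown() unless explicitly cancelled.
public class IdleFutureSketch {
    public static boolean taskSurvivesShutdown(boolean cancelFirst) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        ScheduledFuture<?> idleFuture =
            pool.schedule(() -> {}, 1, TimeUnit.HOURS);
        if (cancelFirst) {
            idleFuture.cancel(false); // the fix: cancel before shutdown
        }
        pool.shutdown();
        boolean survives = !idleFuture.isCancelled() && !idleFuture.isDone();
        pool.shutdownNow(); // clean up for the demo
        return survives;
    }
}
```

An alternative to cancelling each future is `setExecuteExistingDelayedTasksAfterShutdownPolicy(false)` on the pool, which drops pending delayed tasks at shutdown.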
[jira] [Resolved] (FLUME-2302) TestHDFS Sink fails with Can't get Kerberos realm
[ https://issues.apache.org/jira/browse/FLUME-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2302. --- Resolution: Fixed Fix Version/s: v1.5.0 Assignee: Hari Shreedharan Thank you for the fix [~hshreedharan]! TestHDFS Sink fails with Can't get Kerberos realm - Key: FLUME-2302 URL: https://issues.apache.org/jira/browse/FLUME-2302 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2302.patch Adding this to the test fixes it: {code} static { System.setProperty("java.security.krb5.realm", "flume"); System.setProperty("java.security.krb5.kdc", "blah"); } {code} The same issue caused HBASE-8842, and the fix is pretty much the same. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (FLUME-2303) HBaseSink tests can fail based on order of execution
[ https://issues.apache.org/jira/browse/FLUME-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13875819#comment-13875819 ] Jarek Jarcec Cecho commented on FLUME-2303: --- +1 HBaseSink tests can fail based on order of execution Key: FLUME-2303 URL: https://issues.apache.org/jira/browse/FLUME-2303 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2303.patch Some tests do not remove all events from the channel because of batch sizes getting reset in other tests -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (FLUME-2301) Update HBaseSink tests to reflect sink returning backoff only on empty batches
[ https://issues.apache.org/jira/browse/FLUME-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873806#comment-13873806 ] Jarek Jarcec Cecho commented on FLUME-2301: --- +1 Update HBaseSink tests to reflect sink returning backoff only on empty batches -- Key: FLUME-2301 URL: https://issues.apache.org/jira/browse/FLUME-2301 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2301.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (FLUME-2301) Update HBaseSink tests to reflect sink returning backoff only on empty batches
[ https://issues.apache.org/jira/browse/FLUME-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2301. --- Resolution: Fixed Fix Version/s: v1.5.0 Committed to [trunk|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=9a4f047668eb56895fb4bf5c1ee3e5dd6add8601] and [flume-1.5|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=4bb2658348f836e0e03b14f5a949a05663b54bfe]. Thank you Hari for your contribution! Update HBaseSink tests to reflect sink returning backoff only on empty batches -- Key: FLUME-2301 URL: https://issues.apache.org/jira/browse/FLUME-2301 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2301.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Git commit confirmation on jira
We are using svngit2jira on other projects and we are quite happy with that. Thank you for enabling that Hari! Jarcec On Tue, Jan 14, 2014 at 01:45:17PM -0800, Hari Shreedharan wrote: Hi folks, I requested ASF Infra to enable svngit2jira to put in a message on each jira immediately when it is committed. Infra has already enabled it, but if you do not agree with this change and think this will inconvenience you in any way, please reply to this email. Thanks, Hari signature.asc Description: Digital signature
[jira] [Commented] (FLUME-2289) Disable maxUnderReplication test which is extremely flakey
[ https://issues.apache.org/jira/browse/FLUME-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864088#comment-13864088 ] Jarek Jarcec Cecho commented on FLUME-2289: --- +1 Disable maxUnderReplication test which is extremely flakey -- Key: FLUME-2289 URL: https://issues.apache.org/jira/browse/FLUME-2289 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2289.patch There is a test that already tests that under replication causes file rolls. The max under replication test is inherently a bit flakey because it depends on HDFS client API noticing a bunch of things. I think we should disable it -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (FLUME-2265) Closed bucket writers should be removed from sfwriters map
[ https://issues.apache.org/jira/browse/FLUME-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved FLUME-2265. --- Resolution: Fixed Fix Version/s: v1.5.0 Thank you for your contribution [~hshreedharan]! I've committed the patch to both [trunk|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=3b1034e8229eb9ad3e27ed0faab77c3f68f708c6] and [flume-1.5|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=c6117b50d998d8989890bd1dca911cdf402ba1a2] branch. Closed bucket writers should be removed from sfwriters map -- Key: FLUME-2265 URL: https://issues.apache.org/jira/browse/FLUME-2265 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2265.patch If we don't remove the bucket writer from the map, the reference is kept in the map and not removed till the maxOpenFiles limit is hit, even though the bucket writer is closed. This leads to HDFS buffers sticking around for a long time after the bucket writer is closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
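The FLUME-2265 fix can be sketched with a plain cache map (names such as `sfWriters` follow the issue text, but the classes here are simplified stand-ins, not the real HDFS sink code): a closed writer must also be removed from the map, otherwise the entry, and the buffers it references, stays reachable until the maxOpenFiles eviction limit is hit.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the fix: remove a writer from the cache map when it
// is closed, so its buffers become garbage-collectable immediately.
public class WriterCacheSketch {
    static class BucketWriter {
        boolean closed;
        void close() { closed = true; }
    }

    private final ConcurrentHashMap<String, BucketWriter> sfWriters =
        new ConcurrentHashMap<>();

    public void open(String path) {
        sfWriters.put(path, new BucketWriter());
    }

    public void close(String path) {
        BucketWriter w = sfWriters.remove(path); // the fix: remove, then close
        if (w != null) {
            w.close();
        }
    }

    public int openCount() {
        return sfWriters.size();
    }
}
```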
Re: [DISCUSS] Feature bloat and contrib module
Hi Hari, thank you very much for starting the discussion about a contrib module. Flume is a very pluggable project where users have the opportunity to plug in a variety of different components (sink, source, channel, interceptor, ...). I believe that having a place where users can easily exchange their own plugins would be extremely useful and would help with further adoption of the project. Having a contrib module seems like one way to accomplish this; another way would perhaps be to have a special wiki page (which we already do have [1]). I do see benefits of a repository-based contrib module in having the ability to track the authorship of changes and their history as time passes. As I see the contrib space as user driven, I would set the bar for including a new component very low. It should be very easy for one user to submit a component that works in his particular environment and let other users tweak it for yet another purpose. With the bar kept low we can't expect that the components will always work with the current Flume version nor that they will be backward compatible. Having said that, I don't think that such a contrib module should be an official part of a Flume release. We can do our best to make a contrib release as a part of our usual release process and offer it as a separate download. I would not consider broken or removed components in contrib as a blocker for a Flume release though. I'm happy to hear others' opinions! Jarcec Links: 1: https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Plugins On Fri, Dec 13, 2013 at 04:21:29PM -0800, Hari Shreedharan wrote: Hi all Over the past few weeks, there were some discussions about including additional features in core flume itself. This led to a discussion about adding a contrib module. I thought I’d start an official discussion regarding this. The argument for a contrib module was that there are too many components without generic use which are getting submitted/committed.
The use-cases for such components are limited, and hence they should not be part of core flume itself. First, we should answer the question of whether we want to separate components into a contrib module and why. What components go into contrib and what into core Flume? What does it mean to make a component part of the contrib module? Do contrib components get released with Flume? Can they break compatibility with older versions (and what does this mean if they are not getting released?), etc. How supported are these? Please respond and let us know your view! Thanks, Hari signature.asc Description: Digital signature
[jira] [Updated] (FLUME-2262) Log4j Appender should use timeStamp field not getTimestamp
[ https://issues.apache.org/jira/browse/FLUME-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated FLUME-2262: -- Fix Version/s: v1.5.0 Log4j Appender should use timeStamp field not getTimestamp -- Key: FLUME-2262 URL: https://issues.apache.org/jira/browse/FLUME-2262 Project: Flume Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: v1.5.0 Attachments: FLUME-2262.patch getTimestamp was added in log4j 1.2.15, we should use the timestamp field instead: 1.2.14: https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124 trunk: https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569 -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (FLUME-2235) idleFuture should be cancelled at the start of append
[ https://issues.apache.org/jira/browse/FLUME-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816747#comment-13816747 ] Jarek Jarcec Cecho commented on FLUME-2235: --- Thank you for the patch [~hshreedharan], it looks good to me - +1. idleFuture should be cancelled at the start of append - Key: FLUME-2235 URL: https://issues.apache.org/jira/browse/FLUME-2235 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2235.patch, FLUME-2235.patch It seems like if idleTimeout is reached in the middle of an append call, it will rotate the file while a write is happening, since the idleFuture is not cancelled before the append starts. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (FLUME-2235) idleFuture should be cancelled at the start of append
[ https://issues.apache.org/jira/browse/FLUME-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816748#comment-13816748 ] Jarek Jarcec Cecho commented on FLUME-2235: --- Committed to [trunk|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=705abaf00fbf8ee69ac88cbccae47c1a33f4b4b2] and [flume-1.5|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=ad612c28bec3845775e12d6bc51ef724e6f78f06]. idleFuture should be cancelled at the start of append - Key: FLUME-2235 URL: https://issues.apache.org/jira/browse/FLUME-2235 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Fix For: v1.5.0 Attachments: FLUME-2235.patch, FLUME-2235.patch It seems like if idleTimeout is reached in the middle of an append call, it will rotate the file while a write is happening, since the idleFuture is not cancelled before the append starts. -- This message was sent by Atlassian JIRA (v6.1#6144)
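The race FLUME-2235 describes, and the shape of the fix, can be sketched as follows (a simplified stand-in for BucketWriter, with assumed names; not the actual patch): cancel any pending idle-close future before appending, then re-arm it after the write, so an idle timeout can never rotate the file mid-write.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the race and fix: cancel the pending idle-close task at the
// start of append(), so an idle timeout firing mid-write cannot rotate
// the file underneath the writer.
public class AppendSketch {
    private final ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
    private ScheduledFuture<?> idleFuture;
    private int appended;

    public synchronized void append() {
        if (idleFuture != null) {
            idleFuture.cancel(false); // the fix: no rotation during a write
            idleFuture = null;
        }
        appended++; // the actual write happens here
        // re-arm the idle close only after the write completes
        idleFuture = pool.schedule(this::closeIdle, 1, TimeUnit.HOURS);
    }

    private synchronized void closeIdle() { /* rotate the file */ }

    public synchronized int appendedCount() { return appended; }

    public void stop() { pool.shutdownNow(); }
}
```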
[jira] [Commented] (FLUME-2229) Backoff period gets reset too often in OrderSelector
[ https://issues.apache.org/jira/browse/FLUME-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810626#comment-13810626 ] Jarek Jarcec Cecho commented on FLUME-2229: --- +1 Backoff period gets reset too often in OrderSelector Key: FLUME-2229 URL: https://issues.apache.org/jira/browse/FLUME-2229 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2229.patch In OrderSelector.java, the backoff period is calculated by using the number of sequential failures seen to the same host. Since CONSIDER_SEQUENTIAL_RANGE is set to 2 seconds, if the host is not selected for connection within 2 seconds of the backoff period ending, the sequential failures variable is reset and backoff period is not increased. We must increase the value of CONSIDER_SEQUENTIAL_RANGE so that the backoff is not reset too often. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (FLUME-2229) Backoff period gets reset too often in OrderSelector
[ https://issues.apache.org/jira/browse/FLUME-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810630#comment-13810630 ] Jarek Jarcec Cecho commented on FLUME-2229: --- Committed to [trunk|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=a89897bec4e7d6f3342ed966c61668e8a8139af5] and [cherry-picked|https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=7fc23d7d41757ce75058cead963b5bd54c395727] to flume-1.5. Thank you for your contribution [~hshreedharan]! Backoff period gets reset too often in OrderSelector Key: FLUME-2229 URL: https://issues.apache.org/jira/browse/FLUME-2229 Project: Flume Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Attachments: FLUME-2229.patch In OrderSelector.java, the backoff period is calculated by using the number of sequential failures seen to the same host. Since CONSIDER_SEQUENTIAL_RANGE is set to 2 seconds, if the host is not selected for connection within 2 seconds of the backoff period ending, the sequential failures variable is reset and backoff period is not increased. We must increase the value of CONSIDER_SEQUENTIAL_RANGE so that the backoff is not reset too often. -- This message was sent by Atlassian JIRA (v6.1#6144)
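The OrderSelector bookkeeping described above can be sketched as exponential backoff with a "sequential failure" window (constants and field names below are assumptions for illustration, not the actual OrderSelector code): a failure only extends the backoff if it lands within the window after the previous backoff ends; otherwise the counter resets, which with a 2-second window happened far too often.

```java
// Sketch of backoff bookkeeping: failures count as sequential only when
// they fall within CONSIDER_SEQUENTIAL_RANGE_MS of the previous backoff
// ending; a missed window resets the counter and shortens the backoff.
public class BackoffSketch {
    private static final long CONSIDER_SEQUENTIAL_RANGE_MS = 2000; // the old, too-small window
    private static final long BASE_BACKOFF_MS = 1000;
    private static final int MAX_EXPONENT = 16;

    private int sequentialFails;
    private long restoreTimeMs; // when the current backoff ends

    public void onFailure(long nowMs) {
        if (nowMs > restoreTimeMs + CONSIDER_SEQUENTIAL_RANGE_MS) {
            sequentialFails = 1; // window missed: counter resets
        } else {
            sequentialFails = Math.min(sequentialFails + 1, MAX_EXPONENT);
        }
        restoreTimeMs = nowMs + (BASE_BACKOFF_MS << (sequentialFails - 1));
    }

    public long backoffMs(long nowMs) {
        return Math.max(0, restoreTimeMs - nowMs);
    }
}
```

With this shape, enlarging CONSIDER_SEQUENTIAL_RANGE_MS is exactly the fix the issue proposes: repeated failures to the same host keep hitting the window, so the exponent keeps growing instead of snapping back to the base backoff.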