[jira] [Commented] (FLUME-2799) Kafka Source - Message Offset and Partition add to headers

2016-01-13 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097445#comment-15097445
 ] 

Gwen Shapira commented on FLUME-2799:
-

Do we really want Sources to have their own "pre-source"  pluggable interceptor?

I sympathize with the requirement, but perhaps something more "configurable" 
and less "coding" to manage the headers (a bit like what HDFS sink does for 
directory names, only the reverse). After all, simple and configurable sources 
are part of the big value in Flume. 

> Kafka Source - Message Offset and Partition add to headers
> --
>
> Key: FLUME-2799
> URL: https://issues.apache.org/jira/browse/FLUME-2799
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Michael Andre Pearce (IG)
>Priority: Minor
>  Labels: easyfix, patch
> Fix For: v1.7.0
>
> Attachments: FLUME-2799-0.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently Kafka source only persists the original kafka message's topic into 
> the Flume event headers.
> For downstream interceptors and sinks that may want to have available to them 
> the partition and the offset , we need to add these.
> Also it is noted that the conversion from MessageAndMetaData to FlumeEvent is 
> not configurable unlike other sources such as JMS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New Flume Committer - Ashish Paliwal

2015-05-11 Thread Gwen Shapira
Congrats Ashish and thanks for all the reviews :)

On Fri, May 8, 2015 at 8:42 PM, Hari Shreedharan 
wrote:

> On behalf of the Apache Flume PMC, I am excited to welcome Ashish Paliwal as
> a committer on the Apache Flume project. Ashish has actively contributed
> several patches to the Flume project, including bug fixes, configuration
> improvements and other new features.
>
> Congratulations and Welcome, Ashish!
>
>
> Cheers,
> Hari Shreedharan
>


[jira] [Commented] (FLUME-2667) Add new Kafka Java API in flume-ng-kafka-sink which keep compatibility with current API

2015-04-24 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511477#comment-14511477
 ] 

Gwen Shapira commented on FLUME-2667:
-

I have a major concern here:
I'm not sure we want to support both Scala and Java clients as options.  Since 
the new producer will work with old Kafka brokers, which producer we use is an 
internal implementation detail of KafkaSink. If we are not sure the new 
Producer works, we shouldn't have this patch. If we are sure it works, why 
support the old one?

The decision to support both complicates the implementation and testing quite a 
bit, adding layers of abstraction, factories, etc. I don't think we gain enough 
to justify the new maintenance complexity.

Do I miss a good reason to support both producers?



> Add new Kafka Java API in flume-ng-kafka-sink which keep compatibility with 
> current API
> ---
>
> Key: FLUME-2667
> URL: https://issues.apache.org/jira/browse/FLUME-2667
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Frank Yao
>Assignee: Frank Yao
>Priority: Minor
> Attachments: FLUME-2667-0.patch
>
>
> Kafka has released 0.8.2 and meanwhile starts to using new producer API which 
> is written in Java originally. Currently, we use javaapi written in scala in 
> flume-ng-kafka-sink. I've added new Java API into current sink and re-write 
> the tests with mockito. 
> As far as I know by reading the source code of kafka 0.8.2, new Java API of 
> Consumer would seen be released soon. I think in the very soon, kafka API 
> will all change into Java API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-1934) Spoolingdir source exception when reading multiple zero size files

2015-04-03 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395044#comment-14395044
 ] 

Gwen Shapira commented on FLUME-1934:
-

+1 - simple and clean fix, nice tests and nice log message.

> Spoolingdir source exception when reading multiple   zero size files 
> -
>
> Key: FLUME-1934
> URL: https://issues.apache.org/jira/browse/FLUME-1934
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.3.1
> Environment: windows 7, flume 1.3.1, spooling dir source.
>Reporter: andy zhou
>Assignee: Grant Henke
> Attachments: FLUME-1934.patch
>
>
> move more than one files to the spool dir, and each file size is 0, then 
> flume agent will throw IllegalStateException forever, and never work 
> again,its' main cause is commited flag will not set to true.
> logs:
> 08 三月 2013 08:00:14,406 ERROR [pool-5-thread-1] 
> (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:148) 
>  - Uncaught exception in Runnable
> java.lang.IllegalStateException: File should not roll when  commit is 
> outstanding.
>   at 
> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:164)
>   at 
> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (FLUME-1934) Spoolingdir source exception when reading multiple zero size files

2015-04-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reopened FLUME-1934:
-

> Spoolingdir source exception when reading multiple   zero size files 
> -
>
> Key: FLUME-1934
> URL: https://issues.apache.org/jira/browse/FLUME-1934
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.3.1
> Environment: windows 7, flume 1.3.1, spooling dir source.
>Reporter: andy zhou
> Attachments: FLUME-1934.patch
>
>
> move more than one files to the spool dir, and each file size is 0, then 
> flume agent will throw IllegalStateException forever, and never work 
> again,its' main cause is commited flag will not set to true.
> logs:
> 08 三月 2013 08:00:14,406 ERROR [pool-5-thread-1] 
> (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:148) 
>  - Uncaught exception in Runnable
> java.lang.IllegalStateException: File should not roll when  commit is 
> outstanding.
>   at 
> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:164)
>   at 
> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2632) High CPU on KafkaSink

2015-02-23 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2632:
---

 Summary: High CPU on KafkaSink
 Key: FLUME-2632
 URL: https://issues.apache.org/jira/browse/FLUME-2632
 Project: Flume
  Issue Type: Bug
Reporter: Gwen Shapira


Reported here: https://github.com/harishreedharan/flume/issues/1

"I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
utilization stay at 100% and never dropped down all the time even at the time 
the channel is empty.

I looked into the source code and found that "process" function in KafkaSink 
always return Status.READY even if no events available in channel. That causes 
the SinkRunner keep running achieving event from channel and get nothing.

Do we need to change to return Status.BACKOFF in "process" function in 
KafkaSink when it notices that there is no events processed in current round? 
So that the SinkRunner has a chance to take a rest when there is no event in 
channel. If this proposal feasible, function "testEmptyChannel" in 
TestKafkaSink also need to be changed. "



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLUME-2632) High CPU on KafkaSink

2015-02-23 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned FLUME-2632:
---

Assignee: Gwen Shapira

> High CPU on KafkaSink
> -
>
> Key: FLUME-2632
> URL: https://issues.apache.org/jira/browse/FLUME-2632
> Project: Flume
>  Issue Type: Bug
>        Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
>
> Reported here: https://github.com/harishreedharan/flume/issues/1
> "I tried flume-ng-kafka-sink and it worked fine. But I noticed that the cpu 
> utilization stay at 100% and never dropped down all the time even at the time 
> the channel is empty.
> I looked into the source code and found that "process" function in KafkaSink 
> always return Status.READY even if no events available in channel. That 
> causes the SinkRunner keep running achieving event from channel and get 
> nothing.
> Do we need to change to return Status.BACKOFF in "process" function in 
> KafkaSink when it notices that there is no events processed in current round? 
> So that the SinkRunner has a chance to take a rest when there is no event in 
> channel. If this proposal feasible, function "testEmptyChannel" in 
> TestKafkaSink also need to be changed. "



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLUME-2223) is NettyAvroRpcClient methods threadsafe

2015-01-20 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved FLUME-2223.
-
Resolution: Not a Problem

> is NettyAvroRpcClient methods threadsafe
> 
>
> Key: FLUME-2223
> URL: https://issues.apache.org/jira/browse/FLUME-2223
> Project: Flume
>  Issue Type: Question
>  Components: Client SDK
>Reporter: Praveen Nair
>
> Hi, 
> I would like to know if  NettyAvroRpcClient methods are thread safe , so that 
> same client instance can be used by multiple threads to post log events from 
> a java application. 
> Could you provide an example code snippet that mentions the usage of  
> NettyAvroRpcClient methods in multithreaded situation?
> Also the append method in org.apache.flume.clients.log4jappender.Log4jAppender
> seems to be synchronised. Does that create a bottleneck if Log4jAppender
> is used from application for appending log messages to Flume?
> Thanks & Regards,
> Praveen



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2562) Metrics for Flafka components

2015-01-16 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281041#comment-14281041
 ] 

Gwen Shapira commented on FLUME-2562:
-

Thanks :)

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>    Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
> Attachments: FLUME-2562.1.patch, FLUME-2562.2.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2562) Metrics for Flafka components

2015-01-14 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2562:

Attachment: FLUME-2562.2.patch

Attaching patch after rebase

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>        Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
> Attachments: FLUME-2562.1.patch, FLUME-2562.2.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2583) Flafka unit tests should randomize ports

2015-01-04 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264125#comment-14264125
 ] 

Gwen Shapira commented on FLUME-2583:
-

btw. Looks like we have a bunch of duplication between unit tests code for the 
Flafka components.

How important is it to keep sinks, sources and channels independent? Can I 
create a test-utils library that will be used by all 3?

> Flafka unit tests should randomize ports
> 
>
> Key: FLUME-2583
> URL: https://issues.apache.org/jira/browse/FLUME-2583
> Project: Flume
>  Issue Type: Improvement
>    Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
>
> Flafka (i.e. Kafka source, sink and channel) unit tests don't randomize 
> ports, therefore they can fail (after creating a small mess) when running on 
> machines that already have Kafka or Zookeeper running.
> Lets fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2583) Flafka unit tests should randomize ports

2015-01-04 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2583:
---

 Summary: Flafka unit tests should randomize ports
 Key: FLUME-2583
 URL: https://issues.apache.org/jira/browse/FLUME-2583
 Project: Flume
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Gwen Shapira


Flafka (i.e. Kafka source, sink and channel) unit tests don't randomize ports, 
therefore they can fail (after creating a small mess) when running on machines 
that already have Kafka or Zookeeper running.

Lets fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2578) Kafka source throws NPE if Kafka record has null key

2014-12-30 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261346#comment-14261346
 ] 

Gwen Shapira commented on FLUME-2578:
-

Thanks [~skeltoac]!

[~hshreedharan] - come on, review this one already :)

> Kafka source throws NPE if Kafka record has null key
> 
>
> Key: FLUME-2578
> URL: https://issues.apache.org/jira/browse/FLUME-2578
> Project: Flume
>  Issue Type: Bug
>    Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
> Attachments: FLUME-2578.0.patch
>
>
> When the Kafka topics contain messages with no key (such as the messages sent 
> by default from kafka-console-producer), Kafka Source throws NPE:
> 2014-12-20 12:58:59,604 ERROR
> org.apache.flume.source.kafka.KafkaSource: KafkaSource EXCEPTION, {}
> java.lang.NullPointerException
> at java.lang.String.(String.java:556)
> at org.apache.flume.source.kafka.KafkaSource.process(KafkaSource.java:105)
> at 
> org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
> at java.lang.Thread.run(Thread.java:745)
> )



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2578) Kafka source throws NPE if Kafka record has null key

2014-12-20 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2578:

Attachment: FLUME-2578.0.patch

simple fix + test

> Kafka source throws NPE if Kafka record has null key
> 
>
> Key: FLUME-2578
> URL: https://issues.apache.org/jira/browse/FLUME-2578
> Project: Flume
>  Issue Type: Bug
>        Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
> Attachments: FLUME-2578.0.patch
>
>
> When the Kafka topics contain messages with no key (such as the messages sent 
> by default from kafka-console-producer), Kafka Source throws NPE:
> 2014-12-20 12:58:59,604 ERROR
> org.apache.flume.source.kafka.KafkaSource: KafkaSource EXCEPTION, {}
> java.lang.NullPointerException
> at java.lang.String.(String.java:556)
> at org.apache.flume.source.kafka.KafkaSource.process(KafkaSource.java:105)
> at 
> org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
> at java.lang.Thread.run(Thread.java:745)
> )



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2578) Kafka source throws NPE if Kafka record has null key

2014-12-20 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2578:
---

 Summary: Kafka source throws NPE if Kafka record has null key
 Key: FLUME-2578
 URL: https://issues.apache.org/jira/browse/FLUME-2578
 Project: Flume
  Issue Type: Bug
Reporter: Gwen Shapira
Assignee: Gwen Shapira


When the Kafka topics contain messages with no key (such as the messages sent 
by default from kafka-console-producer), Kafka Source throws NPE:

2014-12-20 12:58:59,604 ERROR
org.apache.flume.source.kafka.KafkaSource: KafkaSource EXCEPTION, {}
java.lang.NullPointerException
at java.lang.String.(String.java:556)
at org.apache.flume.source.kafka.KafkaSource.process(KafkaSource.java:105)
at 
org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
at java.lang.Thread.run(Thread.java:745)
)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2570) Add option to not pad date fields

2014-12-14 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246212#comment-14246212
 ] 

Gwen Shapira commented on FLUME-2570:
-

[~petel]: If I understand the request correctly, you'd like to add a flag for 
unpadded-day and unpadded-month?

> Add option to not pad date fields
> -
>
> Key: FLUME-2570
> URL: https://issues.apache.org/jira/browse/FLUME-2570
> Project: Flume
>  Issue Type: New Feature
>  Components: Configuration
>Affects Versions: v1.5.1
>Reporter: Peter Leckie
>Assignee: Johny Rufus
>
> Although technically dates are padded, it would be valuable if Flume was able 
> to format the date components such that they were expressed like integers, eg 
> not padded.
> For example using the %y, %d or %m alias to create output directories 
> referencing today's date like the following:
> /output/2014/3/5/
> The reason this would be so helpful is when importing the data into either 
> Hive or Impala.
> First of all, Impala does not have an ability to pad partitions, so currently 
> the only way to do this is to import the data with hive, then use Impala to 
> access the data(well you could write custom code, however).
> Second, padding partitions in hive or impala causes issues for example 
> pruning of padded partitions is not possible.
> The following is an example of a typical work flow:
> Data is imported into HDFS using flume with sink as follows:
> agent.sinks.snk_avro_snappy.hdfs.path = 
> hdfs://hdfs/avro/year=%Y/month=%m/day=%d
> IMPALA reads the data as follows:
> create external table TestAvro (.)
> partitioned by (Year int, Month int, Day int) stored as avro
> location '/avro';
> alter table TestAvro add if not exists 
> partition(Year=cast(year(to_date(now())) as int), 
> Month=cast(month(to_date(now())) as int), Day=cast(day(to_date(now())) as 
> int));
> Flume saves the output as
> hdfs://hdfs/avro/year=2014/month=12/day=01
> And Impala reads it as:
> hdfs://hdfs/avro/year=2014/month=12/day=1
> So this feature request is to add an ability to Flume to write data into a 
> directory using today's date with no padding on the day or month field.
> Implementation details are not important, for example could add a macro which 
> simply removes padding, instead of futzing with the date aliases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2562) Metrics for Flafka components

2014-11-26 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226754#comment-14226754
 ] 

Gwen Shapira commented on FLUME-2562:
-

[~paliwalashish] - I am afraid I do not fully understand your comment. Perhaps 
I'm missing some context.

Are you asking to wait with this patch because you are working on a change that 
will make this patch simpler? If so, can you point me at where the rest of the 
work is proceeding?






> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2562.1.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2562) Metrics for Flafka components

2014-11-25 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225457#comment-14225457
 ] 

Gwen Shapira commented on FLUME-2562:
-

Example of new metrics:

"CHANNEL.channel1":{"EventPutSuccessCount":"98","ChannelFillPercentage":"1.7976931348623157E308","*KafkaEventGetTimer*":"2","Type":"CHANNEL","*KafkaEventSendTimer*":"565","*KafkaCommitTimer*":"1142","EventTakeSuccessCount":"95","StopTime":"0","ChannelSize":"0","EventPutAttemptCount":"0","StartTime":"1416893965625","*RollbackCount*":"0","ChannelCapacity":"0","EventTakeAttemptCount":"0”}}

"SOURCE.source1":{"*KafkaEventGetTimer*":"7377","OpenConnectionCount":"0","Type":"SOURCE","AppendBatchAcceptedCount":"0","AppendBatchReceivedCount":"0","EventAcceptedCount":"30","AppendReceivedCount":"0","StopTime":"0","StartTime":"1416953795734","EventReceivedCount":"30","*KafkaCommitTimer*":"1129","AppendAcceptedCount":"0”},

{"SINK.sink1":{"Type":"SINK","ConnectionClosedCount":"0","EventDrainSuccessCount":"437","*KafkaEventSendTimer*":"2144","ConnectionFailedCount":"0","BatchCompleteCount":"0","EventDrainAttemptCount":"0","ConnectionCreatedCount":"0","BatchEmptyCount":"0","StopTime":"0","*RollbackCount*":"0","StartTime":"1416960016259","BatchUnderflowCount":"0”},


> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2562.1.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2562) Metrics for Flafka components

2014-11-25 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2562:

Attachment: FLUME-2562.1.patch

Version of the patch that uses MonitoredCounterGroup hierarchy.

I verified that all components (source, sink, channel) work and that they 
report metrics through the http port.

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>        Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
> Attachments: FLUME-2562.1.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221252#comment-14221252
 ] 

Gwen Shapira commented on FLUME-2562:
-

Removed the patch since we want one that uses the MonitoredCounterGroup

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>    Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2562:

Attachment: (was: FLUME-2562.0.patch)

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>        Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221243#comment-14221243
 ] 

Gwen Shapira commented on FLUME-2562:
-

Looks like I got the wrong counters here :)

I used CounterGroup where I probably should have used SourceCounter, 
SinkCounter and ChannelCounter which are exposed via http. 

I assume that extending those classes to add some Kafka-specific metrics is 
acceptable? 
I'm asking because it looks like no one extended them before...



> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2562.0.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2562:

Attachment: FLUME-2562.0.patch

I tested that unit-tests are passing.

I'm not sure how to tests that metrics are indeed collected as expected. Advice 
will be appreciated :)

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>    Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
> Attachments: FLUME-2562.0.patch
>
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned FLUME-2562:
---

Assignee: Gwen Shapira

> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>        Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2562:

Description: 
Kafka source, sink and channel should have metrics. This will help us track 
down possible issues or performance problems.

Here are the metrics I came up with:

kafka.next.time - Time spent waiting for events from Kafka (source and channel)
kafka.send.time - Time spent sending events (channel and sink) 
kafka.commit.time - Time spent committing (source and channel)
events.sent - Number of events sent to Kafka (sink and channel)
events.read - Number of events read from Kafka (channel and source)
events.rollback - Number of event rolled back (channel) or number of rollback 
calls (sink)
kafka.empty - Number of times backing off due to empty kafka topic (source)


  was:
Kafka source, sink and channel should have metrics. This will help us track 
down possible issues or performance problems.

Here are the metrics I came up with:

kafka.next.time - Time spent waiting for events from Kafka (source and channel)
kafka.send.time - Time spent sending events (channel and sink) 
kafka.commit.time - Time spent committing (source and channel)
events.sent - Number of events sent to Kafka (source and channel)
events.read - Number of events read from Kafka (channel and sink)
events.rollback - Number of event rolled back (channel) or number of rollback 
calls (sink)
kafka.empty - Number of times backing off due to empty kafka topic (source)



> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>        Reporter: Gwen Shapira
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (sink and channel)
> events.read - Number of events read from Kafka (channel and source)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2562:

Description: 
Kafka source, sink and channel should have metrics. This will help us track 
down possible issues or performance problems.

Here are the metrics I came up with:

kafka.next.time - Time spent waiting for events from Kafka (source and channel)
kafka.send.time - Time spent sending events (channel and sink) 
kafka.commit.time - Time spent committing (source and channel)
events.sent - Number of events sent to Kafka (source and channel)
events.read - Number of events read from Kafka (channel and sink)
events.rollback - Number of event rolled back (channel) or number of rollback 
calls (sink)
kafka.empty - Number of times backing off due to empty kafka topic (source)


  was:
Kafka source, sink and channel should have metrics. This will help us track 
down possible issues or performance problems.

Here are the metrics I came up with:

kafka.next.time - Time spent waiting for events from Kafka (source and channel)
kafka.send.time - Time spent sending events (channel and sink) 
kafka.commit.time - Time spent committing (source and channel)
events.sent - Number of events sent
events.rollback - Number of event rolled back (channel) or number of rollback 
calls (sink)
kafka.empty - Number of times backing off due to empty kafka topic (source)



> Metrics for Flafka components
> -
>
> Key: FLUME-2562
> URL: https://issues.apache.org/jira/browse/FLUME-2562
> Project: Flume
>  Issue Type: Improvement
>        Reporter: Gwen Shapira
>
> Kafka source, sink and channel should have metrics. This will help us track 
> down possible issues or performance problems.
> Here are the metrics I came up with:
> kafka.next.time - Time spent waiting for events from Kafka (source and 
> channel)
> kafka.send.time - Time spent sending events (channel and sink) 
> kafka.commit.time - Time spent committing (source and channel)
> events.sent - Number of events sent to Kafka (source and channel)
> events.read - Number of events read from Kafka (channel and sink)
> events.rollback - Number of event rolled back (channel) or number of rollback 
> calls (sink)
> kafka.empty - Number of times backing off due to empty kafka topic (source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2562) Metrics for Flafka components

2014-11-21 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2562:
---

 Summary: Metrics for Flafka components
 Key: FLUME-2562
 URL: https://issues.apache.org/jira/browse/FLUME-2562
 Project: Flume
  Issue Type: Improvement
Reporter: Gwen Shapira


Kafka source, sink and channel should have metrics. This will help us track 
down possible issues or performance problems.

Here are the metrics I came up with:

kafka.next.time - Time spent waiting for events from Kafka (source and channel)
kafka.send.time - Time spent sending events (channel and sink) 
kafka.commit.time - Time spent committing (source and channel)
events.sent - Number of events sent
events.rollback - Number of event rolled back (channel) or number of rollback 
calls (sink)
kafka.empty - Number of times backing off due to empty kafka topic (source)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New Flume PMC Member - Roshan Naik

2014-11-05 Thread Gwen Shapira
Congrats, Roshan :)

Very much deserved.

On Tue, Nov 4, 2014 at 2:12 PM, Arvind Prabhakar  wrote:

> On behalf of Apache Flume PMC, it is my pleasure to announce that Roshan
> Naik has been elected to the Flume Project Management Committee. Roshan has
> been active with the project for many years and has been a committer on the
> project since September of 2013.
>
> Please join me in congratulating Roshan and welcoming him to the Flume PMC.
>
> Regards,
> Arvind Prabhakar
>


[jira] [Updated] (FLUME-2523) Document Kafka channel

2014-10-31 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2523:

Attachment: FLUME-2523.1.patch

Thanks for the detailed review! 

Fixed most issues raised by Ashish and Hari. 

Exceptions:
1) any hostname that resolves is valid for zookeeper and broker. I didn't 
include FQD for brevity and clarity
2) I'd rather not document properties that we don't want users to override.
Lets view it as internal implementation detail? I feel that documenting invites 
fiddling, and I'd rather they won't fiddle here.


> Document Kafka channel
> --
>
> Key: FLUME-2523
> URL: https://issues.apache.org/jira/browse/FLUME-2523
> Project: Flume
>  Issue Type: Task
>  Components: Docs
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2523.0.patch, FLUME-2523.1.patch
>
>
> FLUME-2500 adds a Kafka channel. We need to document its usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2523) Document Kafka channel

2014-10-30 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2523:

Attachment: FLUME-2523.0.patch

Documentation for Kafka Channel

> Document Kafka channel
> --
>
> Key: FLUME-2523
> URL: https://issues.apache.org/jira/browse/FLUME-2523
> Project: Flume
>  Issue Type: Task
>  Components: Docs
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2523.0.patch
>
>
> FLUME-2500 adds a Kafka channel. We need to document its usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2484) NullPointerException in Kafka Sink test

2014-10-27 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186225#comment-14186225
 ] 

Gwen Shapira commented on FLUME-2484:
-

I can't reproduce the issue or figure out what went wrong just by looking at 
the test.

I take it that other KafkaSink tests are successful? 
Can you attach logs or console output from the test execution? I need more 
information on what's going on there.

If you are trying to debug yourself, this is the line thats failing:
String fetchedMsg = new String((byte[]) testUtil.getNextMessageFromConsumer(
  TestConstants.STATIC_TOPIC).message())

testUtil.getNextMessageFromConsumer can return null if there was nothing to 
consume within 1s. Looking at the time the test took, I suspect this is the 
issue. 
Which probably means kafkaSink.process() in "prepareAndSend" failed without 
returning a BACKOFF status.

Logs or any other extra insights on the failure will be appreciated. I'd be 
more helpful, but I can't get this to reproduce.

> NullPointerException in Kafka Sink test
> ---
>
> Key: FLUME-2484
> URL: https://issues.apache.org/jira/browse/FLUME-2484
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Santiago M. Mola
>Priority: Blocker
> Fix For: v1.6.0
>
>
> Kafka Sink test fails on Travis with NullPointerException:
> https://travis-ci.org/Stratio/flume/jobs/36814710#L6560
> Running org.apache.flume.sink.kafka.TestKafkaSink
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 17.061 sec 
> <<< FAILURE!
> testStaticTopic(org.apache.flume.sink.kafka.TestKafkaSink)  Time elapsed: 
> 1823 sec  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.flume.sink.kafka.TestKafkaSink.testStaticTopic(TestKafkaSink.java:113)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>   at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>   at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
> Results :
> Tests in error: 
>   testStaticTopic(org.apache.flume.sink.kafka.TestKafkaSink)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26820: FLUME-2500. Kafka Channel

2014-10-27 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26820/#review58751
---

Ship it!


Created FLUME-2523 for the docs. 

I think this is ready :)

- Gwen Shapira


On Oct. 23, 2014, 6:58 p.m., Hari Shreedharan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26820/
> ---
> 
> (Updated Oct. 23, 2014, 6:58 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2500
> https://issues.apache.org/jira/browse/FLUME-2500
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Add a channel that uses Kafka
> 
> 
> Diffs
> -
> 
>   flume-ng-channels/flume-kafka-channel/pom.xml PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannelConfiguration.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/java/org/apache/flume/channel/kafka/TestKafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/kafka-server.properties
>  PRE-CREATION 
>   flume-ng-channels/flume-kafka-channel/src/test/resources/log4j.properties 
> PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/zookeeper.properties 
> PRE-CREATION 
>   flume-ng-channels/pom.xml dc8dbc6 
>   flume-ng-sinks/flume-ng-kafka-sink/pom.xml 746a395 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/KafkaConsumer.java
>  1c98922 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/TestUtil.java
>  8855c53 
>   pom.xml 4f550d3 
> 
> Diff: https://reviews.apache.org/r/26820/diff/
> 
> 
> Testing
> ---
> 
> Added tests that simulate a Kafka cluster.
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>



[jira] [Created] (FLUME-2523) Document Kafka channel

2014-10-27 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2523:
---

 Summary: Document Kafka channel
 Key: FLUME-2523
 URL: https://issues.apache.org/jira/browse/FLUME-2523
 Project: Flume
  Issue Type: Task
  Components: Docs
Reporter: Gwen Shapira
Assignee: Gwen Shapira


FLUME-2500 adds a Kafka channel. We need to document its usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26820: FLUME-2500. Kafka Channel

2014-10-27 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26820/#review58632
---


Looks good! Do you want to do the docs in this patch or a separate one?

- Gwen Shapira


On Oct. 23, 2014, 6:58 p.m., Hari Shreedharan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26820/
> ---
> 
> (Updated Oct. 23, 2014, 6:58 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2500
> https://issues.apache.org/jira/browse/FLUME-2500
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Add a channel that uses Kafka
> 
> 
> Diffs
> -
> 
>   flume-ng-channels/flume-kafka-channel/pom.xml PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannelConfiguration.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/java/org/apache/flume/channel/kafka/TestKafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/kafka-server.properties
>  PRE-CREATION 
>   flume-ng-channels/flume-kafka-channel/src/test/resources/log4j.properties 
> PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/zookeeper.properties 
> PRE-CREATION 
>   flume-ng-channels/pom.xml dc8dbc6 
>   flume-ng-sinks/flume-ng-kafka-sink/pom.xml 746a395 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/KafkaConsumer.java
>  1c98922 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/TestUtil.java
>  8855c53 
>   pom.xml 4f550d3 
> 
> Diff: https://reviews.apache.org/r/26820/diff/
> 
> 
> Testing
> ---
> 
> Added tests that simulate a Kafka cluster.
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>



Re: Review Request 26820: FLUME-2500. Kafka Channel

2014-10-21 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26820/#review57706
---


Done reviewing. Great stuff :)


flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannelConfiguration.java
<https://reviews.apache.org/r/26820/#comment98525>

It looks like these are not used, and I can't see how the channel will work 
at all without a separate serializer for keys and messages. Any idea?



flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/KafkaConsumer.java
<https://reviews.apache.org/r/26820/#comment98527>

    thanks :)


- Gwen Shapira


On Oct. 16, 2014, 8:22 p.m., Hari Shreedharan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26820/
> ---
> 
> (Updated Oct. 16, 2014, 8:22 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2500
> https://issues.apache.org/jira/browse/FLUME-2500
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Add a channel that uses Kafka
> 
> 
> Diffs
> -
> 
>   flume-ng-channels/flume-kafka-channel/pom.xml PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannelConfiguration.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/java/org/apache/flume/channel/kafka/TestKafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/kafka-server.properties
>  PRE-CREATION 
>   flume-ng-channels/flume-kafka-channel/src/test/resources/log4j.properties 
> PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/zookeeper.properties 
> PRE-CREATION 
>   flume-ng-channels/pom.xml dc8dbc6 
>   flume-ng-sinks/flume-ng-kafka-sink/pom.xml 746a395 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/KafkaConsumer.java
>  1c98922 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/TestUtil.java
>  8855c53 
>   pom.xml 4f550d3 
> 
> Diff: https://reviews.apache.org/r/26820/diff/
> 
> 
> Testing
> ---
> 
> Added tests that simulate a Kafka cluster.
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>



Re: Review Request 26820: FLUME-2500. Kafka Channel

2014-10-21 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26820/#review57692
---


more comments...


flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
<https://reviews.apache.org/r/26820/#comment98509>

Wouldn't it be cleaner to instantiate in the constructor? Even if we never 
do puts, the overhead is fairly minimal?



flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
<https://reviews.apache.org/r/26820/#comment98514>

Shouldn't we create a new consumer here too?



flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
<https://reviews.apache.org/r/26820/#comment98512>

LOL



flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
<https://reviews.apache.org/r/26820/#comment98515>

Constructor?


- Gwen Shapira


On Oct. 16, 2014, 8:22 p.m., Hari Shreedharan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26820/
> ---
> 
> (Updated Oct. 16, 2014, 8:22 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2500
> https://issues.apache.org/jira/browse/FLUME-2500
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Add a channel that uses Kafka
> 
> 
> Diffs
> -
> 
>   flume-ng-channels/flume-kafka-channel/pom.xml PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannelConfiguration.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/java/org/apache/flume/channel/kafka/TestKafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/kafka-server.properties
>  PRE-CREATION 
>   flume-ng-channels/flume-kafka-channel/src/test/resources/log4j.properties 
> PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/zookeeper.properties 
> PRE-CREATION 
>   flume-ng-channels/pom.xml dc8dbc6 
>   flume-ng-sinks/flume-ng-kafka-sink/pom.xml 746a395 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/KafkaConsumer.java
>  1c98922 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/TestUtil.java
>  8855c53 
>   pom.xml 4f550d3 
> 
> Diff: https://reviews.apache.org/r/26820/diff/
> 
> 
> Testing
> ---
> 
> Added tests that simulate a Kafka cluster.
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>



Re: Review Request 26820: FLUME-2500. Kafka Channel

2014-10-21 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26820/#review57638
---


Still reviewing... will update the review in parts :)


flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
<https://reviews.apache.org/r/26820/#comment98440>

Producers don't need to access ZK directly, so the error should be "Check 
whether Kafka broker is up..."



flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
<https://reviews.apache.org/r/26820/#comment98441>

Are you sure we want "smallest"? This will cause the channel to read 
everything in the topic when we first start. 
I can see why this would be good when Flume Source is the only one writing 
to the topic, but it can be a disaster if we attach the channel to a 
pre-existing topic. Perhaps make it user-configurable with "smallest" as the 
default and explain when to change it?


- Gwen Shapira


On Oct. 16, 2014, 8:22 p.m., Hari Shreedharan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26820/
> ---
> 
> (Updated Oct. 16, 2014, 8:22 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2500
> https://issues.apache.org/jira/browse/FLUME-2500
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Add a channel that uses Kafka
> 
> 
> Diffs
> -
> 
>   flume-ng-channels/flume-kafka-channel/pom.xml PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannelConfiguration.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/java/org/apache/flume/channel/kafka/TestKafkaChannel.java
>  PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/kafka-server.properties
>  PRE-CREATION 
>   flume-ng-channels/flume-kafka-channel/src/test/resources/log4j.properties 
> PRE-CREATION 
>   
> flume-ng-channels/flume-kafka-channel/src/test/resources/zookeeper.properties 
> PRE-CREATION 
>   flume-ng-channels/pom.xml dc8dbc6 
>   flume-ng-sinks/flume-ng-kafka-sink/pom.xml 746a395 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/KafkaConsumer.java
>  1c98922 
>   
> flume-ng-sinks/flume-ng-kafka-sink/src/test/java/org/apache/flume/sink/kafka/util/TestUtil.java
>  8855c53 
>   pom.xml 4f550d3 
> 
> Diff: https://reviews.apache.org/r/26820/diff/
> 
> 
> Testing
> ---
> 
> Added tests that simulate a Kafka cluster.
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>



[jira] [Updated] (FLUME-2495) Kafka Source may miss events when channel is not available

2014-10-07 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2495:

Attachment: FLUME-2495.2.patch

> Kafka Source may miss events when channel is not available
> --
>
> Key: FLUME-2495
> URL: https://issues.apache.org/jira/browse/FLUME-2495
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2495.0.patch, FLUME-2495.1.patch, 
> FLUME-2495.2.patch
>
>
> Because the Kafka consumer itself tracks offsets, and we don't restart the 
> consumer when we get channel errors, the consumer will skip messages that we 
> couldn't write to channel, even though we did not advance offset in Zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2495) Kafka Source may miss events when channel is not available

2014-10-07 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162393#comment-14162393
 ] 

Gwen Shapira commented on FLUME-2495:
-

ok, I think I see what you mean. Basically, clear list immediately after 
writing to channel. Then if we fail to commit to Kafka, most of the time we 
will not have duplicates (unless the source actually dies) 

> Kafka Source may miss events when channel is not available
> --
>
> Key: FLUME-2495
> URL: https://issues.apache.org/jira/browse/FLUME-2495
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2495.0.patch, FLUME-2495.1.patch
>
>
> Because the Kafka consumer itself tracks offsets, and we don't restart the 
> consumer when we get channel errors, the consumer will skip messages that we 
> couldn't write to channel, even though we did not advance offset in Zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2495) Kafka Source may miss events when channel is not available

2014-10-07 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2495:

Attachment: FLUME-2495.1.patch

Thanks a lot for the catch, [~hshreedharan]. 
Attaching a fixed version + additional tests.

> Kafka Source may miss events when channel is not available
> --
>
> Key: FLUME-2495
> URL: https://issues.apache.org/jira/browse/FLUME-2495
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2495.0.patch, FLUME-2495.1.patch
>
>
> Because the Kafka consumer itself tracks offsets, and we don't restart the 
> consumer when we get channel errors, the consumer will skip messages that we 
> couldn't write to channel, even though we did not advance offset in Zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2495) Kafka Source may miss events when channel is not available

2014-10-06 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2495:

Attachment: FLUME-2495.0.patch

> Kafka Source may miss events when channel is not available
> --
>
> Key: FLUME-2495
> URL: https://issues.apache.org/jira/browse/FLUME-2495
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2495.0.patch
>
>
> Because the Kafka consumer itself tracks offsets, and we don't restart the 
> consumer when we get channel errors, the consumer will skip messages that we 
> couldn't write to channel, even though we did not advance offset in Zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2495) Kafka Source may miss events when channel is not available

2014-10-06 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2495:
---

 Summary: Kafka Source may miss events when channel is not available
 Key: FLUME-2495
 URL: https://issues.apache.org/jira/browse/FLUME-2495
 Project: Flume
  Issue Type: Bug
  Components: Sinks+Sources
Reporter: Gwen Shapira
Assignee: Gwen Shapira


Because the Kafka consumer itself tracks offsets, and we don't restart the 
consumer when we get channel errors, the consumer will skip messages that we 
couldn't write to channel, even though we did not advance offset in Zookeeper.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2479) Kafka property auto.commit.enable is incorrect for KafkaSource

2014-10-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2479:

Attachment: FLUME-2479.1.patch

Fixed failing tests and renamed test classes.

> Kafka property auto.commit.enable is incorrect for KafkaSource
> --
>
> Key: FLUME-2479
> URL: https://issues.apache.org/jira/browse/FLUME-2479
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>    Assignee: Gwen Shapira
> Attachments: FLUME-2479.0.patch, FLUME-2479.1.patch
>
>
> The KafkaSource uses auto.commit.enabled (the d should be removed)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLUME-2493) Kafka source doesn't disable auto-commit as it should

2014-10-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved FLUME-2493.
-
  Resolution: Duplicate
Release Note: FLUME-2479 is the same issue

> Kafka source doesn't disable auto-commit as it should
> -
>
> Key: FLUME-2493
> URL: https://issues.apache.org/jira/browse/FLUME-2493
> Project: Flume
>  Issue Type: Bug
>    Reporter: Gwen Shapira
>    Assignee: Gwen Shapira
>Priority: Critical
>
> auto-commit configuration is misspelled and of wrong type. Kafka never gets 
> it and therefore continues committing when it shouldn't. This can lead to 
> data loss if auto-commit happens before writing to channel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2479) Kafka property auto.commit.enable is incorrect for KafkaSource

2014-10-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2479:

Attachment: FLUME-2479.0.patch

Fixed typos.

> Kafka property auto.commit.enable is incorrect for KafkaSource
> --
>
> Key: FLUME-2479
> URL: https://issues.apache.org/jira/browse/FLUME-2479
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>    Assignee: Gwen Shapira
> Attachments: FLUME-2479.0.patch
>
>
> The KafkaSource uses auto.commit.enabled (the d should be removed)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLUME-2479) Kafka property auto.commit.enable is incorrect for KafkaSource

2014-10-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned FLUME-2479:
---

Assignee: Gwen Shapira

> Kafka property auto.commit.enable is incorrect for KafkaSource
> --
>
> Key: FLUME-2479
> URL: https://issues.apache.org/jira/browse/FLUME-2479
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>    Assignee: Gwen Shapira
>
> The KafkaSource uses auto.commit.enabled (the d should be removed)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2493) Kafka source doesn't disable auto-commit as it should

2014-10-03 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2493:
---

 Summary: Kafka source doesn't disable auto-commit as it should
 Key: FLUME-2493
 URL: https://issues.apache.org/jira/browse/FLUME-2493
 Project: Flume
  Issue Type: Bug
Reporter: Gwen Shapira
Assignee: Gwen Shapira
Priority: Critical


auto-commit configuration is misspelled and of wrong type. Kafka never gets it 
and therefore continues committing when it shouldn't. This can lead to data 
loss if auto-commit happens before writing to channel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2492) Flume's Kafka Source doesn't account time correctly

2014-10-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2492:

Attachment: FLUME-2492.0.patch

proposed fix:

Don't try to count time in loops, the inaccuracies in 
System.getCurrentTimeMillis accumulate.
Instead calculate the limit and compare getCurrentTimeMillis to that.

> Flume's Kafka Source doesn't account time correctly
> ---
>
> Key: FLUME-2492
> URL: https://issues.apache.org/jira/browse/FLUME-2492
> Project: Flume
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: FLUME-2492.0.patch
>
>
> When receiving events, Flume's Kafka source does not account correctly for 
> passage of time. 1s batches can take significantly longer (i.e 10s or more).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2492) Flume's Kafka Source doesn't account time correctly

2014-10-03 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2492:
---

 Summary: Flume's Kafka Source doesn't account time correctly
 Key: FLUME-2492
 URL: https://issues.apache.org/jira/browse/FLUME-2492
 Project: Flume
  Issue Type: Bug
Reporter: Gwen Shapira
Assignee: Gwen Shapira


When receiving events, Flume's Kafka source does not account correctly for 
passage of time. 1s batches can take significantly longer (i.e 10s or more).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2470) Kafka Sink and Source must use camel case for all configs.

2014-09-23 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145873#comment-14145873
 ] 

Gwen Shapira commented on FLUME-2470:
-

It is. Since the new parameters change documentation, it made sense to assume 
the docs are already there.


> Kafka Sink and Source must use camel case for all configs.
> --
>
> Key: FLUME-2470
> URL: https://issues.apache.org/jira/browse/FLUME-2470
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>    Assignee: Gwen Shapira
> Attachments: FLUME-2470.0.patch
>
>
> Some configs are of the form kafka.* while others are camel cased. This is 
> for ease of implementation as these get passed to Kafka, but this make config 
> params not uniform. We must change this so all configs are camel-cased. We 
> should translate these camel-cased params to the kafka properties to 
> configure the kafka API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2470) Kafka Sink and Source must use camel case for all configs.

2014-09-18 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2470:

Attachment: FLUME-2470.0.patch

Patch standardizing the parameters.

Mandatory parameters for Kafka source/sink are now camelCase.
Any Kafka parameter can still be passed along using the 
"kafka."+kafka.parameter.name convention.

I modified the documentation to match, cleaned up the configuration code a bit 
to support the new logic and added a test to validate.

> Kafka Sink and Source must use camel case for all configs.
> --
>
> Key: FLUME-2470
> URL: https://issues.apache.org/jira/browse/FLUME-2470
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Gwen Shapira
> Attachments: FLUME-2470.0.patch
>
>
> Some configs are of the form kafka.* while others are camel cased. This is 
> for ease of implementation as these get passed to Kafka, but this make config 
> params not uniform. We must change this so all configs are camel-cased. We 
> should translate these camel-cased params to the kafka properties to 
> configure the kafka API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-09-18 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139892#comment-14139892
 ] 

Gwen Shapira commented on FLUME-2250:
-

Thank you [~ybaniu] - this is a great contribution for Kafka and all thanks to 
you :)

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Assignee: Gwen Shapira
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250-3.patch, FLUME-2250-4.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLUME-2470) Kafka Sink and Source must use camel case for all configs.

2014-09-17 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned FLUME-2470:
---

Assignee: Gwen Shapira

> Kafka Sink and Source must use camel case for all configs.
> --
>
> Key: FLUME-2470
> URL: https://issues.apache.org/jira/browse/FLUME-2470
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>    Assignee: Gwen Shapira
>
> Some configs are of the form kafka.* while others are camel cased. This is 
> for ease of implementation as these get passed to Kafka, but this make config 
> params not uniform. We must change this so all configs are camel-cased. We 
> should translate these camel-cased params to the kafka properties to 
> configure the kafka API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2455) Documentation update for Kafka Sink

2014-09-17 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2455:

Attachment: FLUME-2455-1.patch

Modified docs to match the patch version that was committed.

Basically removed the section about pre-processors and replaced with a note 
about how event headers will be used.

> Documentation update for Kafka Sink
> ---
>
> Key: FLUME-2455
> URL: https://issues.apache.org/jira/browse/FLUME-2455
> Project: Flume
>  Issue Type: Task
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Johny Rufus
>Assignee: Johny Rufus
>Priority: Minor
> Attachments: FLUME-2455-0.patch, FLUME-2455-1.patch
>
>
> The Flume user guide and Flume developer guide needs to be updated with the 
> new Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2455) Documentation update for Kafka Sink

2014-09-17 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137764#comment-14137764
 ] 

Gwen Shapira commented on FLUME-2455:
-

pinging [~hshreedharan] for review :)

> Documentation update for Kafka Sink
> ---
>
> Key: FLUME-2455
> URL: https://issues.apache.org/jira/browse/FLUME-2455
> Project: Flume
>  Issue Type: Task
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Johny Rufus
>Assignee: Johny Rufus
>Priority: Minor
> Attachments: FLUME-2455-0.patch, FLUME-2455-1.patch
>
>
> The Flume user guide and Flume developer guide needs to be updated with the 
> new Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLUME-2471) Add default value of kafka.request.required.acks for Kafka sink

2014-09-17 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved FLUME-2471.
-
  Resolution: Won't Fix
Release Note: My mistake. The parameter is not mandatory, and if not 
specified, Kafka's default of 0 (leader ack only) is applied. I think this 
makes sense for now. I'll mention it in the docs and recommend setting -1 for 
extra safety.

> Add default value of kafka.request.required.acks for Kafka sink
> ---
>
> Key: FLUME-2471
> URL: https://issues.apache.org/jira/browse/FLUME-2471
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Reporter: Gwen Shapira
>
> Currently kafka.request.required.acks is a mandatory configuration for 
> setting the number of Kafka brokers that must acknowledge a message before 
> its considered a successful write by the Kafka Sink.
> We should  make it optional configuration, and default to -1 (all replicas 
> must acknowledge writes, safest option).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLUME-2471) Add default value of kafka.request.required.acks for Kafka sink

2014-09-17 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned FLUME-2471:
---

Assignee: Gwen Shapira

> Add default value of kafka.request.required.acks for Kafka sink
> ---
>
> Key: FLUME-2471
> URL: https://issues.apache.org/jira/browse/FLUME-2471
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>    Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>
> Currently kafka.request.required.acks is a mandatory configuration for 
> setting the number of Kafka brokers that must acknowledge a message before 
> its considered a successful write by the Kafka Sink.
> We should  make it optional configuration, and default to -1 (all replicas 
> must acknowledge writes, safest option).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2471) Add default value of kafka.request.required.acks for Kafka sink

2014-09-17 Thread Gwen Shapira (JIRA)
Gwen Shapira created FLUME-2471:
---

 Summary: Add default value of kafka.request.required.acks for 
Kafka sink
 Key: FLUME-2471
 URL: https://issues.apache.org/jira/browse/FLUME-2471
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Reporter: Gwen Shapira


Currently kafka.request.required.acks is a mandatory configuration for setting 
the number of Kafka brokers that must acknowledge a message before its 
considered a successful write by the Kafka Sink.

We should  make it optional configuration, and default to -1 (all replicas must 
acknowledge writes, safest option).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLUME-2454) Support batchSize to allow multiple events per transaction to the Kafka Sink

2014-09-17 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved FLUME-2454.
-
  Resolution: Fixed
Assignee: Gwen Shapira  (was: Johny Rufus)
Release Note: Done as part of FLUME-2251

> Support batchSize to allow multiple events per transaction to the Kafka Sink
> 
>
> Key: FLUME-2454
> URL: https://issues.apache.org/jira/browse/FLUME-2454
> Project: Flume
>  Issue Type: Task
>  Components: Sinks+Sources
>Affects Versions: v1.6.0
>Reporter: Johny Rufus
>    Assignee: Gwen Shapira
>
> The current kafka sink processes one event/transaction. We need to support 
> batchSize and allow the sink to process multiple events per transaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2250) Add support for Kafka Source

2014-09-16 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2250:

Attachment: FLUME-2250-4.patch

Latest version - modified the sink pom and removed Kafka version. Both sink and 
source are getting version from parent pom now.

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250-3.patch, FLUME-2250-4.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2250) Add support for Kafka Source

2014-09-16 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2250:

Attachment: FLUME-2250-3.patch

Many fixes according to [~hshreedharan] comments and few extra:

* Using EventBuilder
* Using isDebugEnabled
* Skipping call to channel in case of no events.
* Moved the creation of Kafka Consumer to start() method. Note that this means 
that if Flume can't connect to ZooKeeper (for any reason - from firewalls and 
typos), start will retry to connect every 3 second.
* Tried to clean up the exceptions. 
One exception ("Failed to create message iterator") is still not very clear. As 
far as I can tell, creating iterator should never fail (and I've never seen it 
fail). The exception is there "just in case", so I don't have useful details to 
add.
* fixed issue with hasData
* chopped up lines
* Improved comments, especially around the Kafka bits.
* Fixed versions in pom
* Parameters that will be passed to Kafka should start with "kafka" (i.e. 
"kafka.zookeeper.connect" rather than "zookeeper.connect"), this makes the 
behavior of Kafka source and sink identical, which is easier for users.
* Added tests for non-existing topics (no errors, just lack of messages) and 
non-existing zookeeper (exception thrown by start() )


> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250-3.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-09-16 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136408#comment-14136408
 ] 

Gwen Shapira commented on FLUME-2250:
-

Kafka's AutoCommit:

If autocommit is set to true, Kafka will commit offset for messages in a 
background thread every 10 seconds (IIRC the default).
This means that if Flume is unable to write to channel - messages that we 
consumed may be lost when autocommit happens.
It also means that if Flume agent crashes, we don't know if we lost messages in 
the buffer that were not written to channel yet, or if we will read the same 
messages twice because autocommit did not happen.

If autocommit is disabled, the Kafka Source will commit on every batch. This 
can slow down ingest rates if the batches are small, but is far safer option.

We recommend autocommit=false. I'll make sure we document this.

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-09-16 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136376#comment-14136376
 ] 

Gwen Shapira commented on FLUME-2250:
-

Regarding timedHasNext():

Kafka's API is slightly wonky.

You set a "consumer timeout". hasNext() will block until the timeout 
(indefinitely by default), and will throw an exception when the timeout is 
reached. So the only possible return value is "true". 
This is fine when we consume an event at a time, but a problem when trying to 
consume batches (where we may block on each event).
So I'm wrapping the Kafka API with behavior that we can use for batching.

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2251) Add support for Kafka Sink

2014-09-12 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2251:

Attachment: FLUME-2251-6.patch

Fixed the nits :)

Thanks for noticing the list pre-allocation bit. Also added a test for handling 
an empty channel.

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251-2.patch, 
> FLUME-2251-3.patch, FLUME-2251-4.patch, FLUME-2251-5.patch, 
> FLUME-2251-6.patch, FLUME-2251.patch, FLUME-2251.patch, Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-09-12 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132086#comment-14132086
 ] 

Gwen Shapira commented on FLUME-2250:
-

[~hshreedharan]

The first 3 attachments to this patch (dating Nov/Dec 2013) are contributions 
to the apache project by [~baniu.yao].
My patch modified the existing patches to support a new version, different 
method of unit testing (miniclusters instead of mocks), batching and manual 
commits. 

To best of my understanding, the transfer of ownership to ASF happened when the 
patch was attached to the Jira. Since the original patch was uploaded here, it 
was contributed to the ASF, and therefore I can make changes on top and 
contribute it again.

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2251) Add support for Kafka Sink

2014-09-12 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2251:

Attachment: FLUME-2251-5.patch

Removed the "preprocessor" code and added support for optional headers. Fixed 
unittests to match.

[~hshreedharan] please review :)

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251-2.patch, 
> FLUME-2251-3.patch, FLUME-2251-4.patch, FLUME-2251-5.patch, FLUME-2251.patch, 
> FLUME-2251.patch, Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2251) Add support for Kafka Sink

2014-09-12 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2251:

Attachment: FLUME-2251-4.patch

Attaching version of the sink that fixes the try-catch issues and adds batching 
(but still has the pre-processor in place).

I'm going to go ahead and remove the pre-processor and replace it with event 
headers (for use with interceptors), but I wanted to include this version in 
case [~thilinamb] has a requirement that the interceptors don't solve.

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251-2.patch, 
> FLUME-2251-3.patch, FLUME-2251-4.patch, FLUME-2251.patch, FLUME-2251.patch, 
> Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-09-12 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131762#comment-14131762
 ] 

Gwen Shapira commented on FLUME-2250:
-

pinging [~hshreedharan] for review :)

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-09-11 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130890#comment-14130890
 ] 

Gwen Shapira commented on FLUME-2251:
-

Also, since events are byte[] and Kafka take byte[] values, removing the 
preProcessor (which takes strings) can save us few conversions.

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251-2.patch, 
> FLUME-2251-3.patch, FLUME-2251.patch, FLUME-2251.patch, Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-09-11 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130887#comment-14130887
 ] 

Gwen Shapira commented on FLUME-2251:
-

[~hshreedharan] - agree on most comments on working to fix them.
However, regarding the last one, the preProcessor is most definitely a 
preProcessor and not a serializer. 

It actually does actions that may be a better fit for an interceptor - 
modifying the message, processing it to decide on a topic and partition into 
which the message will be published. 

Any advice on whether to remove this functionality and instead check event 
header for topic and partition information (so the interceptor can be used for 
this functionality)? 

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251-2.patch, 
> FLUME-2251-3.patch, FLUME-2251.patch, FLUME-2251.patch, Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2250) Add support for Kafka Source

2014-09-11 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated FLUME-2250:

Attachment: FLUME-2250-2.patch

New Kafka-Source patch. Features:
- Tested with 0.8.1.1
- Can batch writes to channel, for improved performance
- Supports both Kafka's auto commit, or commit for each batch, to eliminate 
risk for data loss 

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, 
> FLUME-2250-2.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-09-08 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126183#comment-14126183
 ] 

Gwen Shapira commented on FLUME-2250:
-

Hi [~ybaniu],

I created an 0.8.1.1 version of the patch, tested it and added a unit-test.
Mind if I upload it here? 

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-09-08 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125718#comment-14125718
 ] 

Gwen Shapira commented on FLUME-2251:
-

I recommend committing FLUME-2251-2.patch to trunk, since we tested that 
version.

Pinging [~hshreedharan] :)



> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251-2.patch, 
> FLUME-2251-3.patch, FLUME-2251.patch, FLUME-2251.patch, Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-08-26 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111276#comment-14111276
 ] 

Gwen Shapira commented on FLUME-2251:
-

Thank you [~thilinamb]! I'm testing it out :)

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
>  Labels: feature, patch
> Attachments: FLUME-2251-0.patch, FLUME-2251.patch, FLUME-2251.patch, 
> Flume-2251-1.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-08-19 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103289#comment-14103289
 ] 

Gwen Shapira commented on FLUME-2251:
-

[~thilinamb]
Saturday is perfect. Looking forward for your patch!

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2251-0.patch, FLUME-2251.patch, FLUME-2251.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-08-19 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103253#comment-14103253
 ] 

Gwen Shapira commented on FLUME-2250:
-

Hey [~ybaniu],

We really want this patch in trunk :)
If you are busy, is this ok if I'll create the patch, and you'll just submit it 
to satisfy Apache's copyright assignment requirement?

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-08-19 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103256#comment-14103256
 ] 

Gwen Shapira commented on FLUME-2251:
-

Hey [~thilinamb],

We really want this patch in trunk 
If you are busy, is this ok if I'll create the patch, and you'll just submit it 
to satisfy Apache's copyright assignment requirement?

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2251-0.patch, FLUME-2251.patch, FLUME-2251.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-08-14 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097204#comment-14097204
 ] 

Gwen Shapira commented on FLUME-2251:
-

Thanks [~ybaniu] for contributing. We are looking forward to the patch so we 
can get it into trunk and people can stop confusing the different forks :)

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2251-0.patch, FLUME-2251.patch, FLUME-2251.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2250) Add support for Kafka Source

2014-08-11 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093097#comment-14093097
 ] 

Gwen Shapira commented on FLUME-2250:
-

[~baniu.yao] can you submit your recent 0.8 branch of flume-kafka source as a 
patch to this JIRA?

> Add support for Kafka Source
> 
>
> Key: FLUME-2250
> URL: https://issues.apache.org/jira/browse/FLUME-2250
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2250-0.patch, FLUME-2250-1.patch, FLUME-2250.patch
>
>
> Add support for Kafka Source



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-08-11 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093082#comment-14093082
 ] 

Gwen Shapira commented on FLUME-2251:
-

[~baniu.yao] the link you posted here is for kafka source. This JIRA is for 
kafka sink...

I found the following repository in the parent JIRA (FLUME-2242) with a kafka 
sink implementation for kafka 0.8: 
https://github.com/thilinamb/flume-ng-kafka-sink.

I hope [~thilinamb] will create a patch and submit in this JIRA. 

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2251-0.patch, FLUME-2251.patch, FLUME-2251.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.2#6252)