Re: Guarantees of the memory channel for delivering to sink

2012-11-07 Thread Rahul Ravindran
Ping on the below questions about the new Spool Directory source:

If we choose to use the memory channel with this source, feeding an Avro sink on a
remote box, do we risk data loss in the event of a network partition, a slow
network, or the flume-agent on the source box dying?
If we choose to use the file channel with this source, we will end up with double
writes to disk, correct? (one write for the legacy log files that will be ingested
by the Spool Directory source, and the other for the file channel's WAL)





From: Rahul Ravindran rahu...@yahoo.com
To: user@flume.apache.org
Sent: Tuesday, November 6, 2012 3:40 PM
Subject: Re: Guarantees of the memory channel for delivering to sink

This is awesome.
This may be perfect for our use case :)

When is the 1.3 release expected?

A couple of questions on the choice of channel for the new source:

If we choose to use the memory channel with this source, feeding an Avro sink on a
remote box, do we risk data loss in the event of a network partition, a slow
network, or the flume-agent on the source box dying?
If we choose to use the file channel with this source, we will end up with double
writes to disk, correct? (one write for the legacy log files that will be ingested
by the Spool Directory source, and the other for the file channel's WAL)

Thanks,
~Rahul.




Re: Guarantees of the memory channel for delivering to sink

2012-11-07 Thread Brock Noland
Hi,

Yes, if you use the memory channel, you can lose data. To not lose data, the
file channel needs to write to disk...
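
For reference, a minimal file channel definition looks something like this (a
sketch only; the agent/channel names and paths are made up, and the full
property list is in the user guide):

  agent.channels = c1
  agent.channels.c1.type = file
  # the channel's WAL lives here -- these are the extra disk writes in question
  agent.channels.c1.checkpointDir = /var/flume/checkpoint
  agent.channels.c1.dataDirs = /var/flume/data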

Brock


Re: Guarantees of the memory channel for delivering to sink

2012-11-07 Thread Rahul Ravindran
Hi,

Thanks for the response.

Does the memory channel provide transactional guarantees? In the event of a
network packet loss, does it retry sending the packet? If we ensure that we do
not exceed the memory channel's capacity, does it keep retrying to send an
event to the remote source on failure?

Thanks,
~Rahul.




Re: Guarantees of the memory channel for delivering to sink

2012-11-07 Thread Brock Noland
The memory channel doesn't know about networks. Components like the Avro
source/Avro sink do. They operate over TCP/IP, and when there is an error
sending data downstream they roll the transaction back so that no data is
lost. I believe the docs cover this here:
http://flume.apache.org/FlumeUserGuide.html
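
Roughly, every sink drains its channel inside a transaction, along these lines
(a stripped-down sketch of the pattern described in the developer guide;
SketchSink and sendDownstream are placeholder names, not the real AvroSink
code):

  import org.apache.flume.Channel;
  import org.apache.flume.Event;
  import org.apache.flume.EventDeliveryException;
  import org.apache.flume.Transaction;
  import org.apache.flume.sink.AbstractSink;

  public class SketchSink extends AbstractSink {
    @Override
    public Status process() throws EventDeliveryException {
      Channel channel = getChannel();
      Transaction txn = channel.getTransaction();
      txn.begin();
      try {
        Event event = channel.take();   // taken, but not yet removed for good
        if (event == null) {
          txn.commit();
          return Status.BACKOFF;        // channel is empty right now
        }
        sendDownstream(event);          // e.g. the RPC to the remote Avro source
        txn.commit();                   // only now is the event gone from the channel
        return Status.READY;
      } catch (Exception e) {
        txn.rollback();                 // send failed: the event stays in the channel
        throw new EventDeliveryException("Failed to deliver event", e);
      } finally {
        txn.close();
      }
    }

    // placeholder for the actual network send
    private void sendDownstream(Event event) throws Exception {}
  }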

Brock


Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Rahul Ravindran
Hi,
   I am very new to Flume and we are hoping to use it for our log aggregation 
into HDFS. I have a few questions below:

The FileChannel will double our disk IO, which will affect IO performance on
certain performance-sensitive machines. Hence, I was hoping to write a custom
Flume source which will use a memory channel and perform checkpointing. The
checkpoint will be updated after each successful insertion into the memory
channel. (I realize that this results in a risk of data loss, bounded by the
capacity of the memory channel.)

   As long as there is capacity in the memory channel buffers, does the memory 
channel guarantee delivery to a sink (does it wait for acknowledgements, and 
retry failed packets)? This would mean that we need to ensure that we do not 
exceed the channel capacity.

I am writing a custom source which will use the memory channel, and which will
catch a ChannelException to identify any channel capacity issues (i.e., the
memory channel's buffer is full because of lagging sinks, network issues,
etc.). Is that a reasonable assumption to make?
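
Something along these lines is what I have in mind (a rough sketch only;
LegacyLogSource, readNextRecord, and advanceCheckpoint are placeholders for
our legacy-log tailing and checkpoint logic):

  import org.apache.flume.ChannelException;
  import org.apache.flume.Event;
  import org.apache.flume.PollableSource;
  import org.apache.flume.event.EventBuilder;
  import org.apache.flume.source.AbstractSource;

  public class LegacyLogSource extends AbstractSource implements PollableSource {
    @Override
    public Status process() {
      byte[] record = readNextRecord();            // next record from the legacy log
      if (record == null) {
        return Status.BACKOFF;                     // nothing new yet
      }
      Event event = EventBuilder.withBody(record);
      try {
        getChannelProcessor().processEvent(event); // does the put() in a transaction
        advanceCheckpoint();                       // advance only after a successful put
        return Status.READY;
      } catch (ChannelException e) {
        // Channel is full (lagging sink, downstream network trouble, ...):
        // back off and retry the same record later rather than dropping it.
        return Status.BACKOFF;
      }
    }

    private byte[] readNextRecord() { return null; } // placeholder
    private void advanceCheckpoint() {}              // placeholder
  }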

Thanks,
~Rahul.

Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Brock Noland
You're still going to be writing out all events, no? So how would the file
channel do more IO than that?



Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Brock Noland
But in your architecture you are going to write the contents of the
memory channel out? Or did I miss something?

 The checkpoint will be updated after each successful insertion into the
 memory channel.

On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 We have a legacy system which writes events to a file (the existing log file).
 This will continue. If I used a file channel, I would double the number of
 IO operations (writes to the legacy log file, and writes to the WAL).

 


Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Rahul Ravindran
We will update the checkpoint each time (we may tune this to be periodic), but
the contents of the memory channel will be in the legacy logs which are
currently being generated.

Additionally, the sink draining the memory channel will send to an Avro source
on another machine.

Does that clear things up?
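
Concretely, the checkpoint we have in mind is just the byte offset into the
legacy log that has safely made it into the channel, persisted after each
successful put (a hypothetical helper; the class name and path are made up):

  import java.io.IOException;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import java.nio.file.StandardCopyOption;

  public class OffsetCheckpoint {
    // made-up location for the checkpoint file
    private final Path file = Paths.get("/var/flume/legacy.offset");

    public void save(long offset) throws IOException {
      Path tmp = Paths.get(file + ".tmp");
      Files.write(tmp, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
      // rename over the old file so a crash never leaves a half-written checkpoint
      Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
          StandardCopyOption.ATOMIC_MOVE);
    }

    public long load() throws IOException {
      if (!Files.exists(file)) return 0L;  // no checkpoint yet: start at the top
      return Long.parseLong(
          new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
    }
  }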




Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Brock Noland
This use case sounds like a perfect use of the Spool Directory source,
which will be in the upcoming 1.3 release.
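
Something like the following wiring is what I mean (a sketch only; the
agent/component names, host, port, and paths are made up, and the exact
property names are in the 1.3 user guide):

  agent.sources = spool
  agent.channels = mem
  agent.sinks = avro

  # spooling directory source: ingests completed legacy log files
  agent.sources.spool.type = spooldir
  agent.sources.spool.spoolDir = /var/log/legacy/spool
  agent.sources.spool.channels = mem

  # memory channel: fast, but in-flight events are lost if the agent dies
  agent.channels.mem.type = memory
  agent.channels.mem.capacity = 10000
  agent.channels.mem.transactionCapacity = 100

  # avro sink: forwards to the Avro source on the remote box
  agent.sinks.avro.type = avro
  agent.sinks.avro.hostname = collector.example.com
  agent.sinks.avro.port = 4141
  agent.sinks.avro.channel = mem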

Brock
