[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422202#comment-15422202 ]

Laxman commented on FLUME-1227:
-------------------------------

[~roshan_naik], we are planning to use this channel, but found that it does not persist its in-memory data on shutdown. FLUME-2396 has been filed for the same. IMHO, data loss in a channel with persistence may not be acceptable. I can work with you if you feel this should be fixed.

> Introduce some sort of SpillableChannel
> ---------------------------------------
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>    Affects Versions: v1.4.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>             Fix For: v1.5.0
>
>         Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory Channel Design.pdf
>
> I would like to introduce a new channel that would behave similarly to scribe (https://github.com/facebook/scribe). It would be something between the memory and file channels. Input events would be saved directly to memory (only) and served from there. In case the memory becomes full, we would spill the events to file.
> Let me describe the use case behind this request. We have plenty of frontend servers that generate events. We want to send all events to just a limited number of machines, from which we would send the data to HDFS (some sort of staging layer). The reason for this second layer is our need to decouple event aggregation and frontend code onto separate machines. Using the memory channel is fully sufficient, as we can survive the loss of some portion of the events. However, in order to sustain maintenance windows or networking issues, we would end up with a lot of memory assigned to those "staging" machines.
> The referenced scribe deals with this problem by implementing the following logic: events are saved in memory, similarly to our MemoryChannel. However, in case the memory gets full (because of maintenance, networking issues, ...), it spills data to disk, where it sits until everything starts working again.
> I would like to introduce a channel that implements similar logic. Its durability guarantees would be the same as MemoryChannel's: in case someone removes the power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to keep the implementation completely independent of any other channel's internal code.
> Jarcec

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
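[Editor's note] The channel proposed above shipped in Flume 1.5.0 as the Spillable Memory Channel. A minimal agent configuration might look like the sketch below; property names follow the Flume 1.5 user guide, while the agent/channel names, capacities, and paths are illustrative:

```properties
# Sketch of a SpillableMemoryChannel configuration (illustrative values).
agent.channels = c1
agent.channels.c1.type = SPILLABLEMEMORY
# Maximum events held in memory before overflowing to disk
agent.channels.c1.memoryCapacity = 10000
# Maximum events allowed in the on-disk overflow (0 disables spilling)
agent.channels.c1.overflowCapacity = 1000000
# The overflow reuses the file channel's on-disk layout
agent.channels.c1.checkpointDir = /var/lib/flume/checkpoint
agent.channels.c1.dataDirs = /var/lib/flume/data
```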
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916097#comment-13916097 ]

Hari Shreedharan commented on FLUME-1227:
-----------------------------------------

[~roshan_naik] - When we roll 1.5, jiras with no fix versions will be updated.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915146#comment-13915146 ]

Hari Shreedharan commented on FLUME-1227:
-----------------------------------------

+1. I am going to run tests and commit this one. Since this is being marked as experimental, I made a change in the user guide to clarify that it is not recommended for production use. I also made some minor indentation changes in SpillableMemoryChannel.java.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915184#comment-13915184 ]

ASF subversion and git services commented on FLUME-1227:
--------------------------------------------------------

Commit d5805c8598be4eec85de8973b4c98ecdd7ffe6d3 in flume's branch refs/heads/flume-1.5 from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=d5805c8 ]
FLUME-1227. Introduce Spillable Channel. (Roshan Naik via Hari Shreedharan)
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915183#comment-13915183 ]

ASF subversion and git services commented on FLUME-1227:
--------------------------------------------------------

Commit 6a50ec2ad33b8cbd057907c67030d855520c5f13 in flume's branch refs/heads/trunk from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=6a50ec2 ]
FLUME-1227. Introduce Spillable Channel. (Roshan Naik via Hari Shreedharan)
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915232#comment-13915232 ]

Roshan Naik commented on FLUME-1227:
------------------------------------

Should we set the 'fix version' to 1.5?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913186#comment-13913186 ]

Otis Gospodnetic commented on FLUME-1227:
-----------------------------------------

Was just about to write to the ML asking about this functionality. Looks like all known issues have been fixed, plus this is new functionality, so it should go in and get some real-world action, which we'd love to give it as soon as 1.5.0 is out! +10 for committing this. Any chance of this going in before 1.5.0 is cut?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913190#comment-13913190 ]

Thilo Seidel commented on FLUME-1227:
-------------------------------------

Good day, I am out of the office today. Until my return, your mail will neither be read nor automatically forwarded. Best regards, Thilo Seidel
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880852#comment-13880852 ]

Roshan Naik commented on FLUME-1227:
------------------------------------

[~hshreedharan], if there are no other comments, could you look into committing this?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872552#comment-13872552 ]

Hari Shreedharan commented on FLUME-1227:
-----------------------------------------

[~roshan_naik] - Is this ready for review (since you have not hit Submit Patch)?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857070#comment-13857070 ]

Brock Noland commented on FLUME-1227:
-------------------------------------

Thank you for addressing the feedback! I am OK with your reasoning regarding adding dual checkpointing to the example. I haven't looked at this code and review in detail. It looks like Hari has, so I think he'll have to make the call on when to commit. Thank you for your hard work, Roshan!
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853707#comment-13853707 ]

Roshan Naik commented on FLUME-1227:
------------------------------------

Thanks for the feedback, [~brocknoland]. I will incorporate your feedback and update the patch soon. WRT adding notes on file channel best practices to the Spillable Channel section, I am not too hot on that unless it specifically has to do with its coupling with the Spillable channel. In FLUME-2239 I recently made a note about multiple data dirs helping file channel perf. Also, the dual checkpoint feature is broken on Windows (FLUME-2224). Let me know if you feel otherwise.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851768#comment-13851768 ] Brock Noland commented on FLUME-1227:

Hey, I have not participated in the review till now, so sorry about this... but I just noticed the following items, which are mostly nits and improvements.

SpillableMemoryChannel
1. Static stuff should be at the top.
2. The constructor should be directly below the fields.
3. String constants should be static final fields with a javadoc description.
4. Stuff can be final:
{noformat}
private Object queueLock = new Object();
{noformat}

TestSpillableMemoryChannel
1. Take null has a commented-out assertion.
2. There are locations where we expect Exception that should be a specific type of exception.
3. Let's not use e.printStackTrace();
4. Places where we assert a boolean should have a message.
5. Many missing spaces, such as:
{noformat}
for (int i=0; i<count; ++i) {
{noformat}
and
{noformat}
nullsFound=count;
{noformat}

Docs
1. Please specify multiple data directories in the examples, and add a note that file channel performance will increase dramatically with multiple disks.
2. Add dual checkpoint to the examples, as that is a good practice.
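The class-layout nits above (static constants first with javadoc, final fields, constructor directly below the fields) can be sketched as follows. This is a hypothetical illustration of the suggested style, not the actual SpillableMemoryChannel code; all member names are invented.

```java
// Hypothetical layout sketch following the review suggestions: static
// constants at the top (with javadoc), final fields next, constructor
// directly below the fields. Names are illustrative only.
public class SpillableMemoryChannelSketch {

  /** Configuration key for the in-memory capacity (illustrative name). */
  private static final String MEMORY_CAPACITY_KEY = "memoryCapacity";

  /** Default in-memory capacity used when none is configured. */
  private static final int DEFAULT_MEMORY_CAPACITY = 10000;

  // Never reassigned, so it can (and should) be final.
  private final Object queueLock = new Object();

  private final int memoryCapacity;

  public SpillableMemoryChannelSketch(int memoryCapacity) {
    this.memoryCapacity = memoryCapacity;
  }

  public int getMemoryCapacity() {
    return memoryCapacity;
  }
}
```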
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850319#comment-13850319 ] Roshan Naik commented on FLUME-1227:

Hi [~hshreedharan], I have addressed most of your comments locally, but will need another day to address your comments on the incorrect counter test issue; it needs some thinking through on my part. Thanks for catching them.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849852#comment-13849852 ] Hari Shreedharan commented on FLUME-1227:

Hey [~roshan_naik] - Any updates here?
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843403#comment-13843403 ] Hari Shreedharan commented on FLUME-1227:

Hi Roshan,

In the takePrimary and takeOverflow methods there is a Preconditions.checkArgument call where, as you mentioned in the takePrimary method comments, there is an int-Integer-String conversion in a hot path (this is handled with an if in the takePrimary method, but not in takeOverflow). Can you get rid of the Preconditions call and just do: if (...) { throw new IllegalStateException(...); }? For one, this is cleaner, since the if already checks for the issue and we avoid an unneeded method call.

Is this because rolling back the overflow txn will ensure that the event goes back into the file channel and you don't need to handle it?
{code}
if (!useOverflow) {
  takeList.offer(event); // takeList is thread-private, so no need to do this in a synchronized block
}
{code}
If that is the case, the counters are incorrect when the committed transaction is an overflow transaction, since this is how they are updated:
{code}
channelCounter.addToEventTakeSuccessCount(takeList.size());
{code}
Even this is not accurate:
{code}
if (takeList.size() > largestTakeTxSize)
  largestTakeTxSize = takeList.size();
{code}
There are also a couple of issues with regard to failed transactions when writing to the primary (granted, it is a queue and it should not fail, but if a lock acquire gets interrupted, it can still fail). The memQueueRemaining semaphore has already been updated before pushing the events to the queue (that is definitely the right thing to do), but if a queue.offer fails, memQueueRemaining is not updated. This might be an issue with the current channels too, and is sufficiently rare that we can revisit it later. Also, there is a possibility of partially successful transactions right now (if the queue inserts fail). That, I guess, is true for all channels right now, so we can live with it; just mentioning it to ensure we know it is a possibility and can revisit if needed.
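The if/throw guard suggested in the review above can be sketched as below. With Preconditions.checkArgument(boolean, String, Object...), the int message argument is boxed and a varargs array allocated on every call, even when the check passes; the explicit if pays the message-building cost only on failure. All names here are illustrative, not the actual channel code.

```java
// Minimal sketch of the plain if/throw hot-path guard suggested in the
// review, in place of Preconditions.checkArgument. Names are hypothetical.
public class TakeGuardSketch {
  private final int capacity;
  private int queueSize;

  public TakeGuardSketch(int capacity) {
    this.capacity = capacity;
  }

  public void put() {
    // No boxing and no message formatting unless the check actually fails.
    if (queueSize >= capacity) {
      throw new IllegalStateException(
          "queue full: size=" + queueSize + ", capacity=" + capacity);
    }
    ++queueSize;
  }

  public int size() {
    return queueSize;
  }
}
```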
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843421#comment-13843421 ] Hari Shreedharan commented on FLUME-1227:

Also, there are several lines over 80 characters. Can you make sure that you fix this too? For comments, please put the comment before the relevant line if it is expected to be long.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843442#comment-13843442 ] Hari Shreedharan commented on FLUME-1227:

The patch seems to be failing tests:
{code}
Picked up _JAVA_OPTIONS: -Djava.awt.headless=true
Running org.apache.flume.channel.TestSpillableMemoryChannel
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 103.657 sec <<< FAILURE!
testTotalStoredSemaphore(org.apache.flume.channel.TestSpillableMemoryChannel)  Time elapsed: 2923 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<4500>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.flume.channel.TestSpillableMemoryChannel.testTotalStoredSemaphore(TestSpillableMemoryChannel.java:735)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Results :

Failed tests:   testTotalStoredSemaphore(org.apache.flume.channel.TestSpillableMemoryChannel): expected:<0> but was:<4500>
{code}
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843787#comment-13843787 ] Roshan Naik commented on FLUME-1227:

- Will fix the 80-character length issue you noted.
- I will need to review the code more closely wrt your other comments related to txn correctness; let me get back to you on them.
- [~hshreedharan], could you please confirm that the test failure was noticed in patch v7?
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843789#comment-13843789 ] Hari Shreedharan commented on FLUME-1227:

Yes, it was v7.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814292#comment-13814292 ] Roshan Naik commented on FLUME-1227:

[~hshreedharan], all the review comments should be addressed now. If there are no other concerns, could you commit this?
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795589#comment-13795589 ] Hari Shreedharan commented on FLUME-1227:

[~roshan_naik] - Could you please update the patch on rb?
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795614#comment-13795614 ] Roshan Naik commented on FLUME-1227:

[~hshreedharan] just updated it.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747758#comment-13747758 ] Roshan Naik commented on FLUME-1227:

[~hshreedharan] and others interested: could you take a stab at reviewing this code?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726702#comment-13726702 ] Roshan Naik commented on FLUME-1227: Appreciate your feedback Hari.

HARI: It looks like the channel can actually return fewer events than are available in the case where there are only n events in the primary queue and an (n+1)-th take would happen - since the events in a particular txn will always come from one queue. I think we should be able to pull events from the other store if it turns out to be required - else we expect the sink to come back and poll immediately - and also cause sink-side transactions to be smaller than they have to be - which can cause Avro/HDFS batch sizes to be smaller than configured, causing perf issues.

Yes, that is correct. The sink's transaction batch size would be smaller in that case. The case would only occur when the take transaction transitions between overflow and primary. The alternative, as you suggest, is to pull from both overflow and primary, but that opens up some fundamental problems similar to distributed transactions. Essentially the sink needs to have two transactions open (one each on overflow and primary) which need to be atomically committed/rolled back. Thoughts?

HARI: How the channel recovers from an overflow situation.

I have updated the design doc (section 2.1.2) to elaborate on this. The short version is: new incoming events will go into the primary if the sinks have drained older events from the primary, even if the overflow is not empty. Let me know if the description addresses your question sufficiently.
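The single-store-per-transaction behavior debated in this exchange can be illustrated with a small sketch of the drain-order idea: a queue of signed run lengths records which store each run of events went to, and a take transaction draws only from the store at the head of the queue. This is an illustrative sketch only, not Flume's actual SpillableMemoryChannel code; all names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch (not Flume's real code) of a drain-order queue: signed counts record,
// in arrival order, how many events went to the primary (+n) or overflow (-n).
// A take transaction draws only from the store the head run points at, which is
// why one transaction can return fewer events than are available overall.
public class DrainOrderSketch {
    private final Deque<Integer> drainOrder = new ArrayDeque<>();

    // Record a put of n events into the primary (memory) store.
    public void putPrimary(int n) {
        Integer tail = drainOrder.peekLast();
        if (tail != null && tail > 0) {
            drainOrder.addLast(drainOrder.pollLast() + n); // merge adjacent runs
        } else {
            drainOrder.addLast(n);
        }
    }

    // Record a spill of n events into the overflow (disk) store.
    public void putOverflow(int n) {
        Integer tail = drainOrder.peekLast();
        if (tail != null && tail < 0) {
            drainOrder.addLast(drainOrder.pollLast() - n); // merge adjacent runs
        } else {
            drainOrder.addLast(-n);
        }
    }

    // Size and source of the next take batch: positive => primary, negative =>
    // overflow. Capped at the requested batch size, but never crossing into the
    // other store within a single transaction.
    public int nextTakeBatch(int requested) {
        Integer head = drainOrder.peekFirst();
        if (head == null) return 0;
        int run = Math.abs(head);
        int taken = Math.min(run, requested);
        int remaining = run - taken;
        drainOrder.pollFirst();
        if (remaining > 0) drainOrder.addFirst(head > 0 ? remaining : -remaining);
        return head > 0 ? taken : -taken;
    }
}
```

The sketch also shows why pulling from both stores in one transaction would need two atomically coordinated sub-transactions, as the comment above notes.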
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721312#comment-13721312 ] Hari Shreedharan commented on FLUME-1227: - Hi Roshan, thanks for the updated design doc and patch. I looked at the design doc and this approach looks good. I like the fact that there are no dependencies (at least as mentioned in the doc) on the file channel's implicit behavior. I have one question though. The drain order queue seems to keep a count of how many events are written to which store each time a write happens (using the -ve and +ve numbers). It looks like the channel can actually return fewer events than are available in the case where there are only n events in the primary queue and an (n+1)-th take would happen - since the events in a particular txn will always come from one queue. I think we should be able to pull events from the other store if it turns out to be required - else we expect the sink to come back and poll immediately - and also cause sink-side transactions to be smaller than they have to be - which can cause Avro/HDFS batch sizes to be smaller than configured, causing perf issues. Also, I am not clear on how the channel recovers from an overflow situation. Assume that the primary has a capacity of n and we are currently overflowing. When do we decide to go back to the primary? Is it when all n from the primary have been removed, or do we not go back to it until restart? (Sorry, I didn't look at the code yet - this does not seem to have gotten a mention in the design doc.)
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628201#comment-13628201 ] Hari Shreedharan commented on FLUME-1227: - Thanks for your patience with this, Roshan. This approach seems fine. It is a good idea to explicitly do the instantiation inside the SC. You can go ahead with that for now, I guess. But here is some food for thought: the fundamental difference between this channel and the File Channel is the way the transactions get written out. Have you considered inheriting the File Channel, adding a 2nd data structure (your primary memory channel), and having the decision making happen in the transaction code? I am not sure how feasible it is or even how smart an idea it is, but it might be worth considering.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628260#comment-13628260 ] Roshan Naik commented on FLUME-1227: That's a very interesting suggestion. Thanks. I shall play with that idea also.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625573#comment-13625573 ] Roshan Naik commented on FLUME-1227: Hari, Juhani, if there are no additional concerns then I shall proceed with this approach. Settling on the general approach now will help us avoid pouring effort into an unacceptable direction. I shall wait for another day before proceeding.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626167#comment-13626167 ] Juhani Connolly commented on FLUME-1227: Seems like a reasonable compromise to me. I think any approach will have issues. Option 3 would probably be preferable to option 4, if it's doable.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621057#comment-13621057 ] Mike Percy commented on FLUME-1227: --- Roshan, that sounds good to me. Hari, Juhani, do you guys have any additional feedback on this proposal? Thanks, Mike
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614991#comment-13614991 ] Roshan Naik commented on FLUME-1227: I am not particularly wedded to the current approach. My first attempt, based on your suggestion, was to inline the config of the overflow channel in the SC itself. I discovered some [serious issues|https://issues.apache.org/jira/browse/FLUME-1227?focusedCommentId=13540116&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13540116] with it, and so I pursued the alternative that had been discussed (but w/o consensus). The intent was to get the less contentious core logic working and return quickly to this phase of getting feedback on these shaky parts.
- Since you mention it, explicitly depending on FC (I assume by invoking 'new FileChannel()' inside SC) has not been discussed. It might be worth considering.
- Forking FC / creating yet another durable channel: this has been talked about, and the concern has been duplication of code (perhaps the most complex piece of Flume code). I think Juhani also noted the same. I too am concerned about that. If forked, each FC bug would have to be fixed in 2 places. FC seems to keep evolving, and the fork will likely become stale. I wonder if it makes sense to derive a class from FC and use it as overflow instead.
- Your unresolved code review question: we spoke about this when we met at the Flume meetup. On restart the overflow is drained completely first. It is addressed in the design doc under 'recovery from failures', but perhaps not very clearly.
- Yes, if SC does not have to guarantee strict ordering, then as long as the counts in the DOQ are correct, things will work fine. Ordering guarantees from the overflow are needed only if SC is required to provide an ordering guarantee. We already have a consensus that SC will not rely on any non-explicit FC guarantees.
- I totally agree with Hari and yourself on the transactionCapacity issue.
It makes total sense to expose channel size and capacity at the channel interface. I didn't do it in the first patch as I was afraid it might become a big point of contention. Perhaps a misplaced fear. MemC, FC and JdbcC may need minor tweaks for it. If there are no objections I can go ahead and make this change. I think now the only remaining open issue is how to deal with the overflow. Let me list the options that have been put forward so far, and some more:
1) User specifies in config which channel to use as overflow: the current approach, and it has given me all the grief that I anticipated :)
2) Fork FC / create yet another durable FC-like store, then embed it into SC. Some comments have been made on this already.
3) Explicitly instantiate FC directly inside SC.
4) Derive another class from FC and embed it into SC.
5) Based on Mike's comment about SinkProcessors... does it make sense to experiment with the notion of ChannelProcessors?
6) Any other ideas? Now would be THE time to speak.
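Option 3 from the list above (the SC instantiating its overflow directly) could be sketched roughly as follows. This is a self-contained illustration with stub types, not Flume's real Channel API; all class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch of option 3: the SpillableChannel constructs and owns
// its overflow store internally, rather than being handed a separately
// configured channel by the user. Stub types only, not Flume's API.
public class SpillableSketch {
    interface Store {
        void put(String event);
        String take(); // null when empty
    }

    static class QueueStore implements Store {
        private final Queue<String> q = new ArrayDeque<>();
        private final int capacity;
        QueueStore(int capacity) { this.capacity = capacity; }
        public void put(String e) { q.add(e); }
        public String take() { return q.poll(); }
        boolean isFull() { return q.size() >= capacity; }
    }

    private final QueueStore primary;
    private final Store overflow;

    SpillableSketch(int memoryCapacity) {
        this.primary = new QueueStore(memoryCapacity);
        // Option 3: the overflow is instantiated here, inside the channel,
        // instead of being looked up from the agent's channel map.
        this.overflow = new QueueStore(Integer.MAX_VALUE);
    }

    void put(String event) {
        // Spill only when the primary is full, as in the design doc.
        if (primary.isFull()) overflow.put(event);
        else primary.put(event);
    }

    String take() {
        // Simplified: prefer the primary. The real design consults a
        // drain-order queue so events come out in arrival order.
        String e = primary.take();
        return e != null ? e : overflow.take();
    }
}
```

With direct instantiation the config subsystem never has to introduce two channels to each other, which is the separation-of-responsibilities concern raised earlier in the thread.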
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613418#comment-13613418 ] Mike Percy commented on FLUME-1227: --- Roshan, thanks a lot for this design documentation. Guys, based on my prior [reviewboard comment|https://reviews.apache.org/r/9544/], one big problem I have with this implementation is the way that the channels are allowed to know about each other. I am completely against this because it violates separation of responsibilities and encourages unmaintainable spaghetti dependencies between components. What's next, sinks? That is why we have SinkProcessors (so sinks don't have to know about each other). We simply cannot afford to open that Pandora's box. Let the SpillableChannel instantiate its own dependencies and govern their lifecycle. If explicitly depending on the file channel is a problem, then let's talk about ways to mitigate that... either forking a copy of the FC code into SC so that FC can evolve separately, or explicitly not relying on ordering in SC, if that is the issue. Therefore SC would not have ordering guarantees. Can the Drain Order Queue survive that situation? It makes me a little nervous that the DOQ even exists, to be honest... I don't really like it. It seems like a somewhat complex and brittle mechanism for achieving this spill functionality. But I would not block this patch because I'm not in love with the DOQ. And I think if the SC doesn't have to guarantee order, then as long as its counts are correct it should still work. Correct me if I'm wrong. If specific non-explicit guarantees of the FC are being relied on, then an alternative is to consider a different design that relies on different invariants than the DOQ does. I'm not necessarily advocating for that, I'm just throwing it out there as an option.
But I'd be happy with forking the FC and getting this checked in without a total redesign to make progress, if that addresses others' concerns. My other as-yet unresolved item of code review feedback involved what happens when the agent is stopped and then restarted while the channel has events in both the primary and secondary channels. Can this please be addressed as well? Additionally, I agree with Hari on the use of transactionCapacity as a poor substitute for a reservation amount on the underlying channels. We need a better way, and if exposing channel size and capacity via an interface will help, then I'm all for it. Regards, Mike
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611899#comment-13611899 ] Roshan Naik commented on FLUME-1227: - I concur that unspecified guarantees should not be depended upon. I can drop that assumption from the tests.
- I think it's very important to not continue to leave the guarantees unspecified. But that's for another Jira.
- WRT deferring the decision to commit() time: let me revisit that issue.
*Instantiation and config*: For discussion, I would like to treat instantiation (newing up the object) separately from life cycle (start/stop), since an existing instance may get reused during reconfiguration. The overflow does not need to be instantiated or configured before SC! Just like sources and sinks, channels can be instantiated and configured independently in any order. Only start/stop needs to be co-ordinated between the two. Also, we need to ensure that SC is not able to get a reference to the overflow if the overflow had configuration errors. All components (sinks/sources/channels) get introduced to each other after they are correctly configured. There is already a step to introduce configured sinks and sources to their channels. I have extended that step to introduce channels to each other. The current implementation is a bit permissive and could be tightened up so that SC is limited to obtaining a handle only to its overflow (not other channels).
*Life cycle*: Hari, correct me if you think it's not the case, but I think the current design is in tune with your desire that the SC owns the lifecycle (start/stop) of the overflow. The config subsystem merely instantiates, configures and introduces the two channels to each other. Thereafter it disowns the lifecycle of the overflow and lets the SC manage it. It retains ownership of SC's lifecycle, however. This is nice because we don't have to replicate solutions to some of the config-related aspects in SC.
We don not have to worry about the order in which channels are instantiated and configured, and at the same time gain control over the order in which the start/stop is called on the SC and its overflow. *Scribe*: Juhani, I think spilling policy can we definitely tweaked. Right now I spill into overflow only when primary is full. I like the idea that we can take a cue from the fact that takes() have begun to fail and start spilling early to minimize data loss. There is a throughput concern that I have with Scribe's operating mode where it switches exclusively to using either memory or disk. In SC's design we do not need to wait for the overflow to completely drain before resuming the use of the faster primary. I'll look more into scribe and see what we can leverage. - The fsync experiment is something i would like to defer and resolve other open items. It does not look like a blocker and more of a perf tuning thing. does that sound reasonable ? Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. 
Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. Based on the discussion
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13610111#comment-13610111 ] Juhani Connolly commented on FLUME-1227: I would personally prefer seeing a dependence on existing channels than another implementation of something like the file channel and something like the memory channel. The code-base is already getting pretty big, and the interfaces are fixed. The spillable channel shouldn't even know or care about what type the main/sub channel are, just feed them data. While it might not be the most optimal solution performance-wise, I think the cost would be small and it would give us less code to maintain overall. Either approach certainly has its merits. Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. 
Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608700#comment-13608700 ] Roshan Naik commented on FLUME-1227: Thanks Hari. 1) WRT the concern on not depending on another channel, i went down this path since it looked like there was some consensus when i started. What alternative design do you have in mind ? 2) WRT change in memory/file channel breaking the Spillable channel: Could you expand a bit ? I am not familiar with replay order issue and how it can impact. I dont think there is any intrinsic assumption being made wrt to any specific channel's behavior. Just to be doubly sure, i made sure not to rely on a single type of overflow channel in all the tests. The only material dependency (as far as I can tell) that Spillable Channel has on the overflow is the interface level guarantee that is expected from all channels: that order is maintained in case of single source/sink. Do you see any other assumptions/dependencies hiding there ? 3) WRT reserving capacity on both channels. If you mean that each txn should not reserve capacity on both channels. I agree. And the current implementation does not do that. Or were you by any chance referring to the issue of upfront reservation (at put() time) versus commit() time ? 4) WRT to testing with fsyncs removed, i have not pursued it since i felt that would be compromising the durability guarantees. Do you think its useful to do that ? 5) WRT we should make the configuration change. Can you elaborate ? I am not certain which change specifically you are referring to. Or are you referring to the whole config approach ? 6) WRT lifecycle management and dependencies : After configuration, any channel that is found to be not connected with a source/sink is automatically discarded from the list of Life cycle system managed components. Consequently the Spillable Channel becomes the sole life cycle manager of the overflow channel. 
Otherwise, yes there would be havoc. Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609360#comment-13609360 ] Hari Shreedharan commented on FLUME-1227: - {quote} 1) WRT the concern on not depending on another channel, i went down this path since it looked like there was some consensus when i started. What alternative design do you have in mind ? 2) WRT change in memory/file channel breaking the Spillable channel: Could you expand a bit ? I am not familiar with replay order issue and how it can impact. I dont think there is any intrinsic assumption being made wrt to any specific channel's behavior. Just to be doubly sure, i made sure not to rely on a single type of overflow channel in all the tests. The only material dependency (as far as I can tell) that Spillable Channel has on the overflow is the interface level guarantee that is expected from all channels: that order is maintained in case of single source/sink. Do you see any other assumptions/dependencies hiding there ? {quote} I am sorry, I was not part of the initial discussions - so I was not aware of the consensus aspect. What I am saying is that being dependent on another channel creates an undesired strong coupling between this channel and the other channels. An if there are unit tests in this channel which can break if one of the other channels' behavior is changed, then it is not something that is acceptable. If you look at all our other components, none of them have a dependence on each other (except the RPCClients - that is because the sinks are just glorified RPCClients). The reason I would not agree with even the single source/sink replay order is that our interfaces do not really enforce this. This is not really even enforced anywhere in the documentation either. The FileChannel did not even conform to that single source/sink replay order until FLUME-1432. 
In fact, conforming to that order even in FLUME-1432 was a side-effect of fixing a race condition, and not specifically because it was meant to be handled. At some point, if it is decided this can change again to some other order (maybe a thread based ordering, or or an order in which events in a transaction will all get written out together on commit, rather than getting written out on put and fsynced on commit), then if this channels' tests break, the onus will be on the contributor who submitted the file channel change to fix it - which I do not agree with. In summary, I am ok with depending on other channels. What I am not ok with is depending on the behavior of those channels, which are not explicitly guaranteed through interfaces (or even documentation). bq. 3) WRT reserving capacity on both channels. If you mean that each txn should not reserve capacity on both channels. I agree. And the current implementation does not do that. Or were you by any chance referring to the issue of upfront reservation (at put() time) versus commit() time ? I am talking about put v/s commit time. In most cases, transaction capacity is often configured to be much higher than the the max expected in most cases. I would suggest doing a full implementation where there is a transaction outside, and a backing store inside. Once the transaction is about to get committed, then decide where the events go. (It is going to be tricky to do this and avoid doing all the writes at once - the File Channel fsyncs on commit, but writes to OS buffers on every write - so it is possible some data is flushed to disk before explicit fsyncs). This is not a blocker anyway, we can work on it later as well. bq. 4) WRT to testing with fsyncs removed, i have not pursued it since i felt that would be compromising the durability guarantees. Do you think its useful to do that ? 
I was wondering whether simply adding a config param to change the fsyncs (fsync all files before checkpoint in parallel or something) to optional will give comparable performance to what is being proposed in this jira. I have a feeling it might, since fsyncs are the most expensive part of the file channel, and removing the fsyncs just writes to the in-memory OS buffer and the fsyncs will be taken care of in the background. {quote} 5) WRT we should make the configuration change. Can you elaborate ? I am not certain which change specifically you are referring to. Or are you referring to the whole config approach ? 6) WRT lifecycle management and dependencies : After configuration, any channel that is found to be not connected with a source/sink is automatically discarded from the list of Life cycle system managed components. Consequently the Spillable Channel becomes the sole life cycle manager of the overflow channel. Otherwise, yes there would be havoc. {quote} I just think we should not allow one component to pull a reference to another component in the system. This explicitly breaks the
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609824#comment-13609824 ] Juhani Connolly commented on FLUME-1227: I had a look at the design doc and comments so just thought I'd chip in. So long as we're only depending on the Channel interface for behaviors, I think we're good, I believe this was the intention in an earlier proposal of this feature. I agree with Hari about ordering. It's not a guarantee we enforce in flume, and while nice, I think that it over-complicates things. As to lifecycle management, I don't necessary feel that having a channel own it's sub-channels is a particularly good precedent. I think it would be preferable that we allow the lifecycle manager to return interfaces rather than having components creating other components explicitly. Configuration would have to have some grasp of dependencies though... Sub-channels would need to be instantiated before the owner As to the fsync thing: definitely should be an option. Separate issue though. Making it possible to disable it would be great. Since this depends on in memory data, durability really shouldn't be an issue. If you have data in memory, it doesn't really matter if it's in the memory channel or in the OS file buffer One thing you may want to consider is the approach taken by scribed(which has other problems, but the buffer store implementation is very nice): - Default to using the main channel - Upon a next hop failure(roll back of take transaction in our case), switch to a buffering mode. All data is sent to the buffer channel until recovery. One may want to move the contents of the primary channel to the buffer if maintaining ordering is an objective. This could also reduce loss of data. - During buffering mode, puts and takes go to the buffer channel, until it has been drained. Once it has been drained, return to streaming mode where operations are performed against the primary channel. 
Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609911#comment-13609911 ] Hari Shreedharan commented on FLUME-1227: - Hi Juhani, Thanks for you comments. I agree with most of what you have mentioned. {quote} As to lifecycle management, I don't necessary feel that having a channel own it's sub-channels is a particularly good precedent. I think it would be preferable that we allow the lifecycle manager to return interfaces rather than having components creating other components explicitly. Configuration would have to have some grasp of dependencies though... Sub-channels would need to be instantiated before the owner {quote} I agree with your last statement. Configuration will also need to detect cycles etc so that you don't have a cycle of interdependent components. I don't particularly like the idea of passing references of existing channels to others to use as sub-channels - something that I don't like, but won't block since there seems to have been some consensus regarding this earlier. I frankly think 2 channels within the same one is overkill. I think this channel can be easily implemented by using a mmap-ed file which is never specifically fsync-ed. This might cause some page faults etc., but the page cache management is usually smart enough to not cause this to affect performance a whole lot - this implementation is likely to be faster too (in fact, this is very similar to the File Channel checkpoint class). Using this as a cyclic buffer would probably be as good, and gives the same guarantees as the memory channel (which is what we are targeting in this jira, I suppose?). Also, I like the implementation you have mentioned above, though this can be quite tricky to get right. 
Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606137#comment-13606137 ] Hari Shreedharan commented on FLUME-1227: - Roshan, Sorry it took me this long to get to this one. I reviewed the design document and I have a couple of relatively major concerns: #. This channel implicitly depends on the behavior of current channels - the File Channel and Memory Channel. As one of the people who maintain the file channel, I strongly feel this is not the correct thing to do. It is possible that behavior of the File Channel or the Memory Channel could change (This is not without precedent. In FLUME-1437, we did change the replay order). At that point, a change in the behavior of the File Channel or Memory Channel would break unit/integration tests for this channel - which could delay a commit. #. I don't think we should make the configuration change. The idea of the Lifecycle manager is to handle all the components and make them independent of each other. Dependencies on other components managed by the Lifecycle system is a bad idea. This also sets a bad precedent. This can lead to patches that make component inter-dependent and depend on the other component being a particular one (example a source using this hook to figure out if it is operating on Memory Channel or File Channel). I believe the current design is a bit more complex than it needs to be - due to the handling of more than one transaction. Also reserving transaction capacity on both channels is a bad indicator of where the txn should go. In my experience, people do set the transaction capacity to a value much higher than the average transaction. Also, have you tested this against a slightly modified File Channel with all of the fsyncs removed (or commented out)? I'd be interested in seeing the difference in performance at that point. 
Also, see FLUME-1423 where Denny removed the fsyncs for performance (the performance of the channel has improved even more since then though). Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. 
Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603141#comment-13603141 ] Roshan Naik commented on FLUME-1227: Looking to revive attention on this one. Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. 
Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588425#comment-13588425 ] Brock Noland commented on FLUME-1227: - Same as Mike. [~hshreedharan] any time for a review? Introduce some sort of SpillableChannel --- Key: FLUME-1227 URL: https://issues.apache.org/jira/browse/FLUME-1227 Project: Flume Issue Type: New Feature Components: Channel Reporter: Jarek Jarcec Cecho Assignee: Roshan Naik Attachments: 1227.patch.1 I would like to introduce new channel that would behave similarly as scribe (https://github.com/facebook/scribe). It would be something between memory and file channel. Input events would be saved directly to the memory (only) and would be served from there. In case that the memory would be full, we would outsource the events to file. Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just limited number of machines from where we would send the data to HDFS (some sort of staging layer). Reason for this second layer is our need to decouple event aggregation and front end code to separate machines. Using memory channel is fully sufficient as we can survive lost of some portion of the events. However in order to sustain maintenance windows or networking issues we would have to end up with a lot of memory assigned to those staging machines. Referenced scribe is dealing with this problem by implementing following logic - events are saved in memory similarly as our MemoryChannel. However in case that the memory gets full (because of maintenance, networking issues, ...) it will spill data to disk where they will be sitting until everything start working again. I would like to introduce channel that would implement similar logic. It's durability guarantees would be same as MemoryChannel - in case that someone would remove power cord, this channel would lose data. 
Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent on any other channel internal code. Jarcec -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588624#comment-13588624 ] Hari Shreedharan commented on FLUME-1227: --- I can take a quick look later today, though I can't promise when I can do a full review.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540116#comment-13540116 ] Roshan Naik commented on FLUME-1227: Seeking input. The current configuration system does not look conducive to chaining channels. Here are the config techniques that have been discussed previously:

1) Out-of-line:

  agent1.channels = channel1 channel2
  agent1.channels.channel1.type = SPILLABLE
  agent1.channels.channel1.overflow = channel2
  agent1.channels.channel2.type = FILE
  agent1.channels.channel2.checkpointDir = /path1
  ...

The problems here are:
- At the time channel1 is configured, channel2 may not have been instantiated yet, so it is not possible to latch on to an instance of channel2. It may be better to defer obtaining a reference to the overflow channel until start time.
- There is no mechanism to get a reference to one channel from another (in this case, at start time).

2) Inline (as suggested by Mike):

  agent1.channels = channel1
  agent1.channels.channel1.type = SPILLABLE
  agent1.channels.channel1.overflowChannel.type = FILE
  agent1.channels.channel1.overflowChannel.checkpointDir = /path1
  agent1.channels.channel1.overflowChannel.dataDirs = /path2
  ...

The issue here is that the instantiation and configuration of the overflow channel would now have to reside inside SpillableChannel::configure(), and that method is not a very suitable place for doing such things.

3) Hard coding: hard-code the file channel to be the overflow channel. This allows the file channel to be easily instantiated and configured; the downside is that it still duplicates the channel instantiation/config logic from AbstractConfigurationProvider.loadChannels().

Any thoughts?
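The main wrinkle with the inline technique (2) is pulling the overflow channel's settings out of the parent channel's namespace inside configure(). A minimal sketch of that extraction follows; the class and helper names are hypothetical, and a plain java.util.Map stands in for Flume's Context, so this is illustrative only, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: how SpillableChannel.configure() might collect the
// "overflowChannel.*" properties so a nested channel can be built from them.
public class InlineOverflowConfigSketch {

    // Collect every property under the given prefix, with the prefix
    // stripped, similar in spirit to Context.getSubProperties(prefix).
    static Map<String, String> subProperties(Map<String, String> props, String prefix) {
        Map<String, String> sub = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                sub.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return sub;
    }

    public static void main(String[] args) {
        // Properties as they would appear under agent1.channels.channel1.*
        Map<String, String> channelProps = new HashMap<>();
        channelProps.put("type", "SPILLABLE");
        channelProps.put("overflowChannel.type", "FILE");
        channelProps.put("overflowChannel.checkpointDir", "/path1");
        channelProps.put("overflowChannel.dataDirs", "/path2");

        Map<String, String> overflow = subProperties(channelProps, "overflowChannel.");
        // configure() would then instantiate a channel of this type and hand
        // it the stripped-down properties.
        System.out.println(overflow.get("type"));          // FILE
        System.out.println(overflow.get("checkpointDir")); // /path1
    }
}
```

The extraction itself is trivial; the concern raised above is that instantiating and configuring a whole second channel from inside another channel's configure() duplicates provider logic and happens at an awkward point in the lifecycle.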
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510888#comment-13510888 ] Mike Percy commented on FLUME-1227: --- Hey Roshan, sounds good to me, except I'd recommend trying this out with a brand new channel that delegates to a memory channel, in order to minimize the risk of destabilizing what is a very solid and important core component.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510988#comment-13510988 ] Roshan Naik commented on FLUME-1227: You mean we conceptually create a new MemChannel++, where the ++ part is basically the overflow ability?
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511024#comment-13511024 ] Mike Percy commented on FLUME-1227: --- Right. Or we could call it SpillableChannel, I guess. :) I don't have a strong opinion on the name, personally.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506841#comment-13506841 ] Roshan Naik commented on FLUME-1227: Hi Mike, yes, you are right. I think it is a downside of that algorithm; I realized the same after posting that comment.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504893#comment-13504893 ] Roshan Naik commented on FLUME-1227: Thanks for those valuable thoughts, Mike. I have described an algorithm for puts/takes [here|https://issues.apache.org/jira/browse/FLUME-1227?focusedCommentId=13493481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13493481]. It should solve the ordering problem, handle transactions correctly, and maximize throughput.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505021#comment-13505021 ] Brock Noland commented on FLUME-1227: --- If we move forward with this proposal, I think it'd be great to see a design document.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504291#comment-13504291 ] Roshan Naik commented on FLUME-1227: Continuing the discussion... I spent some time studying the discussions in the jiras related to solving the problem of spilling over (and/or failover). I think failover and spillover should not be conflated into the same problem, even though it may be possible to address both in the same solution. There is a consensus that the problem is worth addressing. The concerns hover around these dimensions:

1) Complexity of implementation and configuration, and potentially [enhancements|https://issues.apache.org/jira/browse/FLUME-1045?focusedCommentId=13430529&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13430529] to existing interfaces
2) Complexity of testing
3) Ensuring transaction guarantees are preserved, and how weak or strong those guarantees are
4) Defining the durability level (durable or not) of the final solution - this is simple IMHO
5) Efficiency of the solution (batching requests when spilling over)
6) Flexibility

The solutions discussed so far, along with their concerns:

1) Failover sink processor - has issues with retaining transaction guarantees ([Reference|https://issues.apache.org/jira/browse/FLUME-1045?focusedCommentId=13235705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235705])
2) Mechanisms for composing existing channels ([1201|https://issues.apache.org/jira/browse/FLUME-1201] and [my proposal|https://issues.apache.org/jira/browse/FLUME-1227?focusedCommentId=13492828&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13492828]) - flexible, but has complexities with regard to testing ([mixed opinions here|https://issues.apache.org/jira/browse/FLUME-1201?focusedCommentId=13282018&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13282018]) and with the implementation determining durability ([See|https://issues.apache.org/jira/browse/FLUME-1045?focusedCommentId=13235705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235705])
3) Spillable channel - limited functionality, but easier to test and to pin down transaction+durability semantics for

My thoughts... The concerns about mechanisms for composing channels are largely centered around complexity, and I feel some of them are unfounded. Testing a composition mechanism is not as complex as has been feared, for reasons stated [here|https://issues.apache.org/jira/browse/FLUME-1201?focusedCommentId=13282018&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13282018]. In a pluggable system (like the rest of Flume) we rely on guarantees from the interface itself; there is no need to test every combination of channels, just as it does not make sense to test all combinations of sink/channel/source/interceptors/sink-processors in Flume. Implementation of a composition mechanism would also be simpler: it would be focused only on the issues involved in stitching channels together, not on actually providing a robust backing store.

A spillover channel (Mem + File) seems a little too specialized - for instance, it does not provide durability for users who need it. It would be nice to allow the primary channel to be on a fast, smaller durable store (like SSDs) and overflow into another, slower durable store (like hard disk/jdbc). The following general strategy for compounding channels seems worth discussing:

  agent1.channels.compoundChannel.type = compound
  agent1.channels.compoundChannel.1 = memChannel1
  agent1.channels.compoundChannel.2 = fileChannel1
  agent1.channels.compoundChannel.3 = jdbcChannel1
  agent1.channels.compoundChannel.1.overflowBatchSize = 100   # batch size when spilling into fileChannel1
  agent1.channels.compoundChannel.2.overflowBatchSize = 1000  # batch size when spilling into jdbcChannel1

Any thoughts?
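The spillover behaviour the thread keeps returning to can be sketched as a toy, in-memory model: puts land in a bounded memory queue and spill to a secondary store only once that queue is full. Everything here is invented for illustration (the class, the naive drain-memory-first policy), and it deliberately ignores transactions and batching, so it is a conceptual sketch rather than the actual SpillableChannel design:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a spillable channel: a bounded in-memory queue with a
// secondary queue standing in for the durable overflow store.
public class SpilloverSketch {
    private final Deque<String> memory = new ArrayDeque<>();
    private final Deque<String> overflow = new ArrayDeque<>(); // stand-in for file/jdbc channel
    private final int capacity;

    public SpilloverSketch(int capacity) { this.capacity = capacity; }

    public void put(String event) {
        if (memory.size() < capacity) {
            memory.addLast(event);   // fast path: memory only
        } else {
            overflow.addLast(event); // memory full: spill to secondary store
        }
    }

    // Drain memory first, then overflow. Note that once spilling has
    // happened, this simple policy can reorder events relative to arrival,
    // which is exactly the ordering problem the puts/takes algorithm
    // referenced earlier in the thread has to address.
    public String take() {
        if (!memory.isEmpty()) return memory.pollFirst();
        return overflow.pollFirst();
    }

    public int memorySize() { return memory.size(); }
    public int overflowSize() { return overflow.size(); }

    public static void main(String[] args) {
        SpilloverSketch ch = new SpilloverSketch(2);
        ch.put("e1");
        ch.put("e2");
        ch.put("e3"); // third put spills
        System.out.println(ch.memorySize() + " in memory, " + ch.overflowSize() + " spilled");
    }
}
```

A real implementation must additionally make put/take transactional across both stores and batch the spilled writes (the role the overflowBatchSize settings above would play), which is where most of the complexity discussed in this thread lives.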
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495737#comment-13495737 ] Roshan Naik commented on FLUME-1227: Looks like this jira is up for grabs? If there is agreement that my proposal is a good way forward, I would like to pick it up. Thoughts?
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495778#comment-13495778 ] Roshan Naik commented on FLUME-1227: Actually, I think this proposal, if acceptable, would have to be a different jira, since the current jira is about introducing a new channel.
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495785#comment-13495785 ] Hari Shreedharan commented on FLUME-1227: --- Roshan - that might be a good thing to do, but there was a discussion about a compound channel several months ago, and I believe the consensus was that it would be too complex to write and even more complex to test. But feel free to file a jira - I am sure there will be a healthy discussion.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495798#comment-13495798 ] Bernardo de Seabra commented on FLUME-1227:

I like this approach (quite popular with Scribe), but my only concern is around performance. You would get an unexpected/unpredictable performance impact from disk IO, which could be (in our case, it would be) impacting your application if Flume and the app share the same disk. It's a tradeoff.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493481#comment-13493481 ] Roshan Naik commented on FLUME-1227:

I agree Scribe's policy is suboptimal. It is better to prioritize the parent channel whenever it has spare capacity and still maintain order. To achieve this I have a simple algorithm in mind. The parent channel maintains a 'drain order' queue of signed numbers which indicates, at any time, the order in which the items in it and its overflow channel should be drained. For instance, the numbers [3, -2, 6, -1] in that queue indicate the following drain order:

- drain 3 from self
- then drain 2 from overflow
- then drain 6 from self
- then drain 1 from overflow

The channel's put() will update its drain order queue (DOQ) as follows:

    if (I have capacity) {
        add event to my own queue
        if last element in DOQ is +ve then increment it
        else push +1 to DOQ
    } else {
        call put() on overflow
        if last element in DOQ is -ve then decrement it
        else push -1 to DOQ
    }

I think the take() should be obvious. Obviously, corner cases like an empty self and an empty overflow need to be handled appropriately, but this is just capturing the idea.
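The drain-order bookkeeping described above can be sketched in a few lines of Java. This is a minimal, standalone illustration of the idea only - the class and field names are hypothetical, plain in-memory deques stand in for both the memory and file channels, and none of this is Flume code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the signed "drain order queue" (DOQ) idea:
// a positive entry means "drain that many from the primary queue",
// a negative entry means "drain that many from the overflow queue".
class SpillableQueueSketch<E> {
    private final int primaryCapacity;
    private final Deque<E> primary = new ArrayDeque<>();   // stands in for the memory channel
    private final Deque<E> overflow = new ArrayDeque<>();  // stands in for the file channel
    private final Deque<Integer> drainOrder = new ArrayDeque<>();

    SpillableQueueSketch(int primaryCapacity) {
        this.primaryCapacity = primaryCapacity;
    }

    void put(E event) {
        if (primary.size() < primaryCapacity) {
            primary.addLast(event);
            // extend the trailing +ve run, or start a new one
            if (!drainOrder.isEmpty() && drainOrder.peekLast() > 0) {
                drainOrder.addLast(drainOrder.pollLast() + 1);
            } else {
                drainOrder.addLast(1);
            }
        } else {
            overflow.addLast(event);
            // extend the trailing -ve run, or start a new one
            if (!drainOrder.isEmpty() && drainOrder.peekLast() < 0) {
                drainOrder.addLast(drainOrder.pollLast() - 1);
            } else {
                drainOrder.addLast(-1);
            }
        }
    }

    E take() {
        if (drainOrder.isEmpty()) {
            return null; // both queues empty
        }
        int head = drainOrder.pollFirst();
        E event;
        if (head > 0) {
            event = primary.pollFirst();
            if (head > 1) drainOrder.addFirst(head - 1); // part of the run remains
        } else {
            event = overflow.pollFirst();
            if (head < -1) drainOrder.addFirst(head + 1);
        }
        return event;
    }
}
```

Feeding five events through a capacity-2 instance leaves a drain order of [+2, -3], and take() then returns the events in their original arrival order, which is the property the algorithm is after.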
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493500#comment-13493500 ] Roshan Naik commented on FLUME-1227:

Apologies for the email storm created by multiple edits to my previous comment.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492828#comment-13492828 ] Roshan Naik commented on FLUME-1227:

I don't see this option discussed, but it seems interesting (and IMO avoids some of the issues in sink-triggered spooling as discussed in FLUME-1045). Basically, instead of adding another Spillable channel which is logically a composite of memory and file channels, we could add a config directive to Memory Channel such as:

    agent1.channels.memChannel1.overflow = fileChannel1

That is, there would be a preconfigured file channel (or jdbc or some custom channel) into which the memory channel would simply spill over events when capacity has been reached. There should be no other sources or sinks tied to an overflow channel. Ideally, any channel should be able to use another channel for overflow.
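In a full agent configuration, the proposed directive might look like the sketch below. Note that the `overflow` property is hypothetical - it is the proposal being discussed here, not an actual Memory Channel setting - and the channel names and paths are illustrative:

```properties
agent1.channels = memChannel1 fileChannel1

agent1.channels.memChannel1.type = memory
agent1.channels.memChannel1.capacity = 10000
# hypothetical directive: spill to fileChannel1 once capacity is reached
agent1.channels.memChannel1.overflow = fileChannel1

# overflow channel: no sources or sinks attach to it directly
agent1.channels.fileChannel1.type = file
agent1.channels.fileChannel1.checkpointDir = /var/flume/checkpoint
agent1.channels.fileChannel1.dataDirs = /var/flume/data
```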
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492876#comment-13492876 ] Juhani Connolly commented on FLUME-1227:

Interesting suggestion... When would you suggest that the overflow channel's contents be read, and by what component?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492923#comment-13492923 ] Roshan Naik commented on FLUME-1227:

The parent channel's put()/take() will be the source/sink for its overflow channel. For the special case of just supporting it in the memory channel, I think it could easily employ whatever policy the SpillableChannel would have used. For the more general case of making this a cross-cutting feature available to all channels, with the ability to chain, I would conjecture it may be possible to use the same policy at each level of the chain. So this policy could be pushed into the common base class for channels.
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491154#comment-13491154 ] Rahul Ravindran commented on FLUME-1227:

Is there a timeline on when this new channel would be out?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491209#comment-13491209 ] Mike Percy commented on FLUME-1227:

I don't know of anyone actively working on this...
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431609#comment-13431609 ] Juhani Connolly commented on FLUME-1227:

Since the channel is not aware of the state of sinks, I think Jarek's proposed method sounds good. In another place, it was pointed out that we cannot just change the interface, as it will break people's custom components. However, I think you can get away with a method similar to what Configurable uses now: add a CapacityPollable interface or something, and check whether the channel implements it, polling if it does. If it doesn't, you will just have to rely on catching exceptions as an indicator of problems.
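The opt-in interface pattern described in the comment above could be sketched as follows. The interface name CapacityPollable comes from the comment; everything else (the method name, the router class) is hypothetical illustration, not Flume API:

```java
// Hypothetical opt-in interface: channels that implement it can be polled
// for spare capacity instead of forcing callers to catch exceptions.
interface CapacityPollable {
    /** Number of additional events the channel can currently accept. */
    int remainingCapacity();
}

final class OverflowRouter {
    // Spill to overflow only when the channel reports itself full. Legacy
    // channels that don't implement the interface are left alone here; the
    // caller would fall back to catching exceptions from put().
    static boolean shouldSpill(Object channel) {
        if (channel instanceof CapacityPollable) {
            return ((CapacityPollable) channel).remainingCapacity() == 0;
        }
        return false;
    }
}
```

This mirrors how Flume handles Configurable: an instanceof check at wiring time, so existing custom components compile and run unchanged.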
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430470#comment-13430470 ] Seetharam Venkatesh commented on FLUME-1227:

Does this mean there is no effort going into FLUME-1045?
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428531#comment-13428531 ] Denny Ye commented on FLUME-1227:

That's great and useful when Flume cannot reach HDFS or another destination. It's also the same concept as Scribe's 'primary store' and 'secondary store'. Looking forward to any implementation.