Re: Concerns about File endpoint

Claus Ibsen Fri, 05 Dec 2008 12:37:15 -0800

Hi Chris

I am doing the final testing of the pre move on the 1.5.1 branch. It
has been committed to the trunk.
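
The pre move idea from this thread (CAMEL-1148) is to move a file into a
separate location before routing starts, so that any path handed to a
downstream consumer already points at the moved file. A minimal sketch of
that idea in plain Java follows. It is not the Camel implementation: the
class name PreMoveSketch, the method claim and the "inprogress" directory
are hypothetical, and it assumes the source and target are on the same
filesystem so that an atomic rename is possible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: claims a file by renaming it before it is processed. */
class PreMoveSketch {

    /**
     * Moves the file into inProgressDir and returns its new path.
     * The returned path is the one to publish (for example on a queue).
     */
    public static Path claim(Path file, Path inProgressDir) {
        try {
            Files.createDirectories(inProgressDir);
            Path target = inProgressDir.resolve(file.getFileName());
            // ATOMIC_MOVE: the file is either fully at the old or the new path
            return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path inbox = Files.createTempDirectory("inbox");
        Path file = Files.createFile(inbox.resolve("order-1.xml"));
        Path claimed = claim(file, inbox.resolve("inprogress"));
        System.out.println(Files.exists(file));    // original path
        System.out.println(Files.exists(claimed)); // moved path
    }
}
```

Because the rename happens before anything is published, a consumer that
receives the returned path should not hit the race described further down
in this thread. If the file has already been moved by someone else, the
sketch throws an UncheckedIOException, which a caller could treat as
"already claimed".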


I also updated the wiki page for the file component to explain a bit
more on the file repository for the idempotent thing we have in Camel
2.0.
The file repo is basic as it's just a persistence store for a 1st
level cache - but it survives server restarts.
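
To illustrate what "a persistence store for a 1st level cache" can look
like, here is a minimal sketch in plain Java. It is not the Camel file
repository: the class name FileBackedKeyStore and its methods are
hypothetical, it uses java.nio.file (Java 7+), and it assumes keys never
contain line breaks. All keys are held in memory, so it shares the
scalability limit discussed further down in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: in-memory key set backed by an append-only text file. */
class FileBackedKeyStore {
    private final Path store;
    private final Set<String> cache = new LinkedHashSet<>();

    /** Loads previously stored keys (one per line) if the file exists. */
    public FileBackedKeyStore(Path store) {
        this.store = store;
        try {
            if (Files.exists(store)) {
                cache.addAll(Files.readAllLines(store, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the key is new, false if it was seen before. */
    public synchronized boolean add(String key) {
        if (!cache.add(key)) {
            return false;
        }
        try {
            // Append the new key so it is still known after a restart
            Files.write(store, Collections.singletonList(key), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            cache.remove(key); // keep memory and disk consistent
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public synchronized boolean contains(String key) {
        return cache.contains(key);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("idempotent", ".dat");
        FileBackedKeyStore first = new FileBackedKeyStore(file);
        System.out.println(first.add("report.csv")); // true
        System.out.println(first.add("report.csv")); // false
        // A new instance on the same file simulates a restart
        System.out.println(new FileBackedKeyStore(file).contains("report.csv")); // true
    }
}
```

Lookups only touch the in-memory set, and the file is written on each new
key and read once at startup.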

We can always improve later if needed.

But if you need a really large idempotent repo then we have the JPA one you can use.

Yeah, I am hoping we get some time soon to improve the JMS component
in 2.0 so it has this exchange transfer, as it has been
requested a few times and is good for internal routing using the JMS
queues to survive restarts.



/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/



On Thu, Dec 4, 2008 at 4:46 PM, Christopher Hammack
<[EMAIL PROTECTED]> wrote:
>
> I opened CAMEL-1148 on the preMove concept.
>
> I would be willing to test the transferExchange when you get that going.
>
> It's a good question on the file-based persistence.  I suppose you could do
> something simple like putting a hidden file that is a java serialized object
> of the list but I think, at least for our situation, scalability would be an
> issue.  To perform well you'd need a datastructure that you could either
> natively search without it being in memory, or accept the limitation that
> all of the metadata must be in memory (e.g. a hashmap)--but that probably
> won't scale well.  I suppose you could use derby (but that's basically just
> jpa anyhow) or something from hadoop or one of the persistence technologies
> used in activemq.
>
> For us, we bring in on the order of several hundred thousand to a million
> files per day, and so keeping track of that much metadata and trying to find
> the new subset will probably be untenable on anything but a full blown
> database store.   That's why we came up with the preMove concept as it
> allows us to use the actual filesystem for marking which files have been
> scanned, and by using persistence in JMS we survive restarts.  I can see
> situations where there are far fewer files and that could work though.
>
>
> Claus Ibsen-2 wrote:
>>
>> Hi
>>
>> Glad I could help. The preMove command is a very good idea. Please
>> feel free to create a ticket about it in JIRA.
>>
>> Yeah the jms-component auto deciding the javax.jms.XXX message type is
>> an issue I also would like to remedy for Camel 2.0.
>> There is a ticket about it.
>>
>> However you want to only send the java.io.File over the JMS and not
>> the actual payload of the file (kinda like sending the pointer of the
>> file).
>> I would envision that java.io.File by default would load the file
>> content and send that over JMS. So you want to keep it as a
>> java.io.File object and that's it.
>>
>> We have a ticket for sending the exchange itself, like we have for
>> camel-mina: transferExchange=true. That one would resolve the issue
>> you have, since you are using Camel as both sender and receiver of the
>> JMS queues. There are some tickets in JIRA for this. Feel free to
>> comment and vote for them.
>>
>> We might get started on them pretty soon, then you could help test it
>> on your system.
>>
>>
>> The idempotent repository for the file consumer can be changed to a
>> jpa version, or you can implement your own. The jpa will persist in a
>> DB and thus survive restarts. We are also planning on a file based repo
>> as well. In fact it's on my next todo list. Do you have any
>> suggestions / requirements for such a file based repo?
>>
>>
>>
>> /Claus Ibsen
>> Apache Camel Committer
>> Blog: http://davsclaus.blogspot.com/
>>
>>
>>
>> On Wed, Dec 3, 2008 at 4:27 PM, Christopher Hammack
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for the clarification.  This does completely explain the situation,
>>> and applying a variant of your solution "a" seems to have things working
>>> much more reliably.
>>>
>>> However, I'd like to suggest that you add a "preMove" option as it seems to
>>> be pretty much a requirement for doing clustered seda-style processing from
>>> a file endpoint.  I suppose you could use the new noop/idempotent
>>> capabilities that you have added in 2.0, but it's also nice to have
>>> separation between files that have yet to be discovered and files that are
>>> in process and also I'm a little leery of keeping state information like
>>> that around as the file endpoint idempotent data presumably (maybe I'm
>>> wrong?) doesn't persist across restarts, etc.
>>>
>>> Also, is there a way to disable the implicit conversion of java.io.File to a
>>> jms bytes message on JMS?  This is also not desirable in this situation.
>>> Putting gigabytes of byte data on jms does not work out very well.  To get
>>> around this we have to convert the java.io.File to a String prior to putting
>>> it on JMS, and then convert it back to a java.io.File so we can give our
>>> users the ability to use the camel built in transformers to transform to the
>>> input method of their choice (byte[], InputStream, FileReader, java.io.File,
>>> etc.).  This works, but it kind of takes away from the "cleanliness" of the
>>> routes.
>>>
>>> Thanks for your help--we've found camel to be much superior in many ways to
>>> another animal-named esb product that we're attempting to migrate away from.
>>>
>>>
>>>
>>> Claus Ibsen-2 wrote:
>>>>
>>>> Hi
>>>>
>>>> #1
>>>> The move is executed *AFTER* the routing.
>>>> The idea is that you process the file while it's in the target folder
>>>> (where it's dropped) and after processing you can move it to a backup
>>>> folder.
>>>>
>>>> #2
>>>> Ah I speculate that the JMS consumer is faster than the file consumer
>>>> so when you drop a JMS message with the filename pointing at the move
>>>> folder then the JMS consumer can in some circumstances be ahead of the
>>>> file consumer and trying to get the file before it's actually moved
>>>> there. Hence #1
>>>>
>>>>
>>>> You could fix this in one of two ways
>>>> =====================================
>>>> b) use camel to route and move the file using pipes and filters
>>>> from("file://inbox?delete=true").to("file://moved").to("jms:queue:pleasegetthefilenow");
>>>> Note: However this will read the file content and save it as a new
>>>> file, it's not a native File IO move operation
>>>>
>>>> a) move the file yourself and then afterwards send the JMS message.
>>>> from("file://inbox").to("bean:myFileMover").to("jms:queue:pleasegetthefilenow");
>>>>
>>>> Using a POJO bean you can move the file yourself using File rename.
>>>>
>>>>
>>>>
>>>> /Claus Ibsen
>>>> Apache Camel Committer
>>>> Blog: http://davsclaus.blogspot.com/
>>>>
>>>>
>>>>
>>>> On Tue, Dec 2, 2008 at 11:51 PM, Christopher Hammack
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> In addition to the memory leak issue in Camel 1.5.0, I have a few other
>>>>> concerns about the file consumer endpoint--some of which could be a
>>>>> misunderstanding on my part:
>>>>>
>>>>> 1.  When using the default move capability (moving a file to .camel) after
>>>>> it has been picked up, the java.io.File object refers to the path BEFORE the
>>>>> move, not after.  So in order to actually read the file, my processing code
>>>>> must have knowledge of which path the file was moved to.  Is this
>>>>> intentional?
>>>>>
>>>>> 2.  Occasionally, especially when the system is under considerable load, the
>>>>> java.io.File object that I get is not available in the moved location, which
>>>>> generates a FileNotFoundException.  When I check that location later on, the
>>>>> file is in the correct location.  Looking at the code, it seems like the
>>>>> message should not be propagated back to me prior to the rename
>>>>> occurring, but it is apparently happening.  Any thoughts?
>>>>>
>>>>> The use case for this is a very large number of small files is being dropped
>>>>> into a directory.  This directory is then being scanned by camel's file
>>>>> endpoint.  The files as they are discovered are then moved to the .camel
>>>>> directory, and the filename is put onto a jms endpoint.  A clustered set of
>>>>> camel processors then pull the filename off the endpoint and process the
>>>>> file, and then delete it.
>>>>>
>>>>> Any suggestions would be appreciated as right now I'm "stranding" about one
>>>>> out of every 1000 files because the processing checks for the file prior to
>>>>> the file actually being in the moved location.
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Concerns-about-File-endpoint-tp20802855s22882p20802855.html
>>>>> Sent from the Camel - Users mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Concerns-about-File-endpoint-tp20802855s22882p20815087.html
>>> Sent from the Camel - Users mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Concerns-about-File-endpoint-tp20802855s22882p20836110.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>
