Hi Chris

I am doing the final testing of the preMove option on the 1.5.1 branch. It has been committed to the trunk.
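
The preMove concept in this thread (CAMEL-1148) and option (a) further down (moving the file yourself with File rename) both come down to claiming a file. You rename it into an in-progress folder before it is routed, so the filesystem itself records which files have been picked up. Below is a minimal plain-JDK sketch of that claim step. The `PreMoveClaim` class and the in-progress folder are hypothetical illustrations, not Camel's implementation or API. Note that `File.renameTo` behavior is platform dependent.

```java
import java.io.File;

// Hypothetical helper (not part of Camel) sketching the preMove idea:
// claim a file by renaming it into an in-progress folder before routing it.
class PreMoveClaim {

    private final File inProgressDir;

    PreMoveClaim(File inProgressDir) {
        this.inProgressDir = inProgressDir;
        // mkdirs() simply returns false if the folder already exists.
        inProgressDir.mkdirs();
    }

    // Returns the file at its new location, or null if the rename failed,
    // e.g. because another consumer already moved the source file away.
    File claim(File candidate) {
        File target = new File(inProgressDir, candidate.getName());
        // File.renameTo is platform dependent; within one local volume it is
        // normally a plain rename, so a given source file can be moved only once.
        return candidate.renameTo(target) ? target : null;
    }
}
```

A second consumer racing for the same file gets `null` back because the source is already gone. Only the winner would then put the claimed file's path (not its bytes) on the JMS queue.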

I also updated the wiki page for the file component to explain a bit more
about the file-based repository for the idempotent support we have in
Camel 2.0. The file repository is basic, as it's just a persistent store for
a first-level cache, but it survives server restarts. We can always improve
it later if needed. If you need a really large idempotent repository, you
can use the JPA one instead.

Yeah, I am hoping we get some time soon to improve the JMS component in 2.0
with this exchange transfer, as it has been requested a few times and is
good for internal routing over JMS queues that survive restarts.

/Claus Ibsen
Apache Camel Committer
Blog: http://davsclaus.blogspot.com/

On Thu, Dec 4, 2008 at 4:46 PM, Christopher Hammack <[EMAIL PROTECTED]> wrote:
>
> I opened CAMEL-1148 on the preMove concept.
>
> I would be willing to test the transferExchange when you get that going.
>
> It's a good question about file-based persistence. I suppose you could do
> something simple like putting a hidden file that is a Java-serialized object
> of the list, but I think, at least for our situation, scalability would be
> an issue. To perform well you'd need a data structure that you could either
> search natively without it being in memory, or accept the limitation that
> all of the metadata must be in memory (e.g. a hashmap), but that probably
> won't scale well. I suppose you could use Derby (but that's basically just
> JPA anyhow) or something from Hadoop or one of the persistence technologies
> used in ActiveMQ.
>
> For us, we bring in on the order of several hundred thousand to a million
> files per day, so keeping track of that much metadata and trying to find
> the new subset will probably be untenable on anything but a full-blown
> database store. That's why we came up with the preMove concept: it
> allows us to use the actual filesystem for marking which files have been
> scanned, and by using persistence in JMS we survive restarts.
> I can see situations with far fewer files where that could work, though.
>
>
> Claus Ibsen-2 wrote:
>>
>> Hi
>>
>> Glad I could help. The preMove option is a very good idea. Please
>> feel free to create a ticket about it in JIRA.
>>
>> Yeah, the JMS component automatically deciding the javax.jms.XXX message
>> type is an issue I would also like to remedy for Camel 2.0.
>> There is a ticket about it.
>>
>> However, you only want to send the java.io.File over JMS and not
>> the actual payload of the file (kind of like sending a pointer to the
>> file). I would expect a java.io.File by default to have its content
>> loaded and sent over JMS, whereas you want to keep it as a
>> java.io.File object and that's it.
>>
>> We have a ticket for sending the exchange itself, like we have for
>> camel-mina: transferExchange=true. That one would resolve the issue
>> you have, since you are using Camel as both sender and receiver on the
>> JMS queues. There are some tickets in JIRA for this. Feel free to
>> comment on and vote for them.
>>
>> We might get started on them pretty soon, and then you could help test
>> it on your system.
>>
>>
>> The idempotent repository for the file consumer can be changed to a
>> JPA version, or you can implement your own. The JPA version persists to
>> a DB and thus survives restarts. We are also planning a file-based repo.
>> In fact, it's next on my todo list. Do you have any
>> suggestions / requirements for such a file-based repo?
>>
>>
>> On Wed, Dec 3, 2008 at 4:27 PM, Christopher Hammack
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for the clarification. This does completely explain the
>>> situation, and applying a variant of your solution "a" seems to have
>>> things working much more reliably.
>>>
>>> However, I'd like to suggest that you add a "preMove" option, as it seems
>>> to be pretty much a requirement for doing clustered SEDA-style processing
>>> from a file endpoint. I suppose you could use the new noop/idempotent
>>> capabilities that you have added in 2.0, but it's also nice to have
>>> separation between files that have yet to be discovered and files that
>>> are in process. Also, I'm a little leery of keeping state information
>>> like that around, as the file endpoint idempotent data presumably (maybe
>>> I'm wrong?) doesn't persist across restarts, etc.
>>>
>>> Also, is there a way to disable the implicit conversion of java.io.File
>>> to a JMS bytes message? This is also not desirable in this situation:
>>> putting gigabytes of byte data on JMS does not work out very well. To get
>>> around this we have to convert the java.io.File to a String prior to
>>> putting it on JMS, and then convert it back to a java.io.File so we can
>>> give our users the ability to use the Camel built-in transformers to
>>> transform to the input method of their choice (byte[], InputStream,
>>> FileReader, java.io.File, etc.). This works, but it kind of takes away
>>> from the "cleanliness" of the routes.
>>>
>>> Thanks for your help. We've found Camel to be much superior in many ways
>>> to another animal-named ESB product that we're attempting to migrate
>>> away from.
>>>
>>>
>>> Claus Ibsen-2 wrote:
>>>>
>>>> Hi
>>>>
>>>> #1
>>>> The move is executed *AFTER* the routing.
>>>> The idea is that you process the file while it's in the target folder
>>>> (where it's dropped), and after processing you can move it to a backup
>>>> folder.
>>>>
>>>> #2
>>>> Ah, I speculate that the JMS consumer is faster than the file consumer,
>>>> so when you drop a JMS message with the filename pointing at the move
>>>> folder, the JMS consumer can in some circumstances be ahead of the
>>>> file consumer and try to get the file before it's actually been moved
>>>> there. Hence #1.
>>>>
>>>>
>>>> What you could do to fix it
>>>> ===========================
>>>> b) use Camel to route and move the file using pipes and filters:
>>>> from("file://inbox?delete=true").to("file://moved").to("jms:queue:pleasegetthefilenow");
>>>> Note: this will read the file content and save it as a new
>>>> file; it's not a native File IO move operation.
>>>>
>>>> a) move the file yourself and then send the JMS message afterwards:
>>>> from("file://inbox").to("bean:myFileMover").to("jms:queue:pleasegetthefilenow");
>>>>
>>>> Using a POJO bean you can move the file yourself using File rename.
>>>>
>>>>
>>>> On Tue, Dec 2, 2008 at 11:51 PM, Christopher Hammack
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> In addition to the memory leak issue in Camel 1.5.0, I have a few other
>>>>> concerns about the file consumer endpoint, some of which could be a
>>>>> misunderstanding on my part:
>>>>>
>>>>> 1. When using the default move capability (moving a file to .camel)
>>>>> after it has been picked up, the java.io.File object refers to the path
>>>>> BEFORE the move, not after. So in order to actually read the file, my
>>>>> processing code must know which path the file was moved to. Is this
>>>>> intentional?
>>>>>
>>>>> 2. Occasionally, especially when the system is under considerable load,
>>>>> the java.io.File object that I get is not available in the moved
>>>>> location, which generates a FileNotFoundException. When I check that
>>>>> location later on, the file is in the correct location.
>>>>> Looking at the code, it seems like the message should not be
>>>>> propagated back to me before the rename occurs, but it is apparently
>>>>> happening. Any thoughts?
>>>>>
>>>>> The use case for this is that a very large number of small files is
>>>>> being dropped into a directory. This directory is then scanned by
>>>>> Camel's file endpoint. As the files are discovered, they are moved to
>>>>> the .camel directory, and the filename is put onto a JMS endpoint. A
>>>>> clustered set of Camel processors then pulls the filename off the
>>>>> endpoint, processes the file, and then deletes it.
>>>>>
>>>>> Any suggestions would be appreciated, as right now I'm "stranding"
>>>>> about one out of every 1000 files because the processing checks for
>>>>> the file before it is actually in the moved location.
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Concerns-about-File-endpoint-tp20802855s22882p20802855.html
>>>>> Sent from the Camel - Users mailing list archive at Nabble.com.
>>>>>
>>>>
>>>
>>
>
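
In the top reply, Claus describes the Camel 2.0 file-based idempotent repository as a persistent store for a first-level cache that survives restarts. Christopher points out that keeping all of the metadata in memory (e.g. a hashmap) may not scale to his volumes. The plain-JDK sketch below shows that trade-off under stated assumptions. The hypothetical `SimpleFileIdempotentStore` keeps every key in an in-memory set and appends each new key to a text file that is reloaded on startup. It is not Camel's actual repository code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical file-backed idempotent store (not Camel's API): an in-memory
// set acts as the cache, and every newly seen key is appended to a text file
// so the set can be rebuilt after a restart.
class SimpleFileIdempotentStore {

    private final File storeFile;
    private final Set<String> seen = new HashSet<>();

    SimpleFileIdempotentStore(File storeFile) throws IOException {
        this.storeFile = storeFile;
        load();
    }

    // Rebuilds the in-memory set from the store file, one key per line.
    private void load() throws IOException {
        if (!storeFile.exists()) {
            return; // first start: nothing persisted yet
        }
        try (BufferedReader reader = new BufferedReader(new FileReader(storeFile))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    seen.add(line);
                }
            }
        }
    }

    // Returns true and persists the key if it has not been seen before,
    // false for a duplicate. Assumes keys (e.g. file paths) contain no line breaks.
    synchronized boolean add(String key) throws IOException {
        if (!seen.add(key)) {
            return false;
        }
        // Append mode keeps the keys already written to the file.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(storeFile, true))) {
            writer.write(key);
            writer.newLine();
        } catch (IOException e) {
            seen.remove(key); // keep memory and file consistent if the write fails
            throw e;
        }
        return true;
    }

    synchronized boolean contains(String key) {
        return seen.contains(key);
    }
}
```

Every key stays in memory, so at several hundred thousand to a million files per day the set keeps growing. That is the scalability concern raised in the thread, and the reason the JPA repository is suggested for really large volumes.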
