Re: Status, Release Candidates and Derby
On 19/12/05, Daniel John Debrunner <[EMAIL PROTECTED]> wrote:

> The use I was suggesting CLOB for was the storage for mail recipients
> which currently is a LONG VARCHAR, not the body of the message.

Oh, my mistake.

> Though I was surprised to see BLOB for mail body storage, I'd naively
> assumed it would be CLOB.

Yeah, I thought you would, hence I jumped to the wrong conclusion!

> For the reasons you give BLOB is probably the
> correct storage, from the javax.mail classes it seems that the body is
> transported as a set of bytes.

Yeah, mail involves a lot of layers of encoding (defined largely in the MIME specs) to transport rich "modern" content on a protocol which dates back to the days when US-ASCII was the only character set there was. You could say that MIME effectively specifies a binary "file format" for any old thing, which is also backwards compatible with ASCII screen readers. It's also future-proofed against bizarre new content types and IMHO is an unsung triumph of the kind of big-brain thinking that made the net great.

Back to the subject at hand though, the safe way not to break the encodings during transport is to ignore the whole issue and treat it as bytes.

> Just FYI for Derby CLOBs, the character set is always Unicode and the
> stream is available through the standard ResultSet/Clob methods as a
> Java Reader or (not very useful) Ascii stream.

Sounds safe enough, but for simplicity I'd still rather just duck the issue and stick with the bytes :-)

d.
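The point about treating the body as bytes can be illustrated with a small sketch. It does not use Derby or JAMES at all: the two helper methods are hypothetical stand-ins for a character-typed column (decode on the way in, encode on the way out) and a byte-typed column, and the sample message is invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BytesVsChars {
    // Hypothetical stand-in for a character-typed column: the bytes are
    // decoded to characters when stored and encoded again when read back.
    static byte[] roundTripAsText(byte[] raw, Charset assumed) {
        return new String(raw, assumed).getBytes(assumed);
    }

    // Hypothetical stand-in for a byte-typed column: nothing is interpreted.
    static byte[] roundTripAsBytes(byte[] raw) {
        return raw.clone();
    }

    // Builds an invented two-part "message": the first part is ISO-8859-1
    // (é is the single byte 0xE9), the second part is UTF-8.
    static byte[] mixedEncodingSample() {
        byte[] latin1Part = "caf\u00e9\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Part = "na\u00efve\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] raw = Arrays.copyOf(latin1Part, latin1Part.length + utf8Part.length);
        System.arraycopy(utf8Part, 0, raw, latin1Part.length, utf8Part.length);
        return raw;
    }

    public static void main(String[] args) {
        byte[] raw = mixedEncodingSample();
        System.out.println("bytes preserved: "
                + Arrays.equals(raw, roundTripAsBytes(raw)));
        System.out.println("text preserved:  "
                + Arrays.equals(raw, roundTripAsText(raw, StandardCharsets.UTF_8)));
    }
}
```

Decoding the whole sample as UTF-8 replaces the lone 0xE9 byte with a replacement character, so the text path no longer returns the original bytes, while the byte path does. Whether a given database and driver would behave like the text path depends on how the CLOB is written and read; this only shows why ducking the issue is the conservative choice.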
Re: Status, Release Candidates and Derby
Danny Angus wrote:

>> A better datatype would be CLOB, then you could have up to 2Gb character limit.
>> I don't believe this would require any application changes (moving to a CLOB).
>> We've seen some odd behavior dealing with CLOBs with some drivers, hence the
>> useBlob vs useBytes attribute in sqlResources. Would that apply here?
>
> IIRC there could be issues relating to the character encoding used in the
> CLOB or the stream it is exposed as, mail is notorious in its ability to
> mix encodings within a single message, a BLOB is more "raw" than a CLOB
> which presupposes that it contains characters. This may explain our
> historic choice of BLOB vs CLOB, or it may be wrong.

The use I was suggesting CLOB for was the storage for mail recipients, which currently is a LONG VARCHAR, not the body of the message.

Though I was surprised to see BLOB for mail body storage, I'd naively assumed it would be CLOB. For the reasons you give, BLOB is probably the correct storage; from the javax.mail classes it seems that the body is transported as a set of bytes.

Just FYI for Derby CLOBs, the character set is always Unicode and the stream is available through the standard ResultSet/Clob methods as a Java Reader or (not very useful) Ascii stream.

Dan.
RE: Status, Release Candidates and Derby
> A better datatype would be CLOB, then you could have up to 2Gb character limit.
> I don't believe this would require any application changes (moving to a CLOB).
> We've seen some odd behavior dealing with CLOBs with some drivers, hence the
> useBlob vs useBytes attribute in sqlResources. Would that apply here?

IIRC there could be issues relating to the character encoding used in the CLOB or the stream it is exposed as. Mail is notorious in its ability to mix encodings within a single message, and a BLOB is more "raw" than a CLOB, which presupposes that it contains characters. This may explain our historic choice of BLOB vs CLOB, or it may be wrong.

Field size for this is worth knowing about though: James would need to have db field size >= max allowed message size.

d.
RE: Status, Release Candidates and Derby
Daniel John Debrunner wrote:

> I'm looking at how James uses Derby

Your observations aside, JAMES can be far more efficient in its management of data than it is at present. Too much copying. dbfile is more efficient, and I'd like to see us separate mail (which associates state, attributes and recipients with a message) from message, so that we have fewer copies and don't need to move the message when mail switches between processors.

Back to your comments ...

> I'm curious about the recipients and {message_body, message_attributes}
> fields.
> recipients long varchar - In Derby LONG VARCHAR has a limit of 32,700
> characters. Does this limit James in any way?

As per RFC 2821, an e-mail address is limited to 320 characters, so we'd be good for 100 addresses, which is the minimum number of recipients the RFC requires a server to accept. Practically, we would be good for far more than the minimum.

> A better datatype would be CLOB, then you could have up to 2Gb character limit.
> I don't believe this would require any application changes (moving to a CLOB).

We've seen some odd behavior dealing with CLOBs with some drivers, hence the useBlob vs useBytes attribute in sqlResources. Would that apply here?

> message_body, message_attributes are defined as BLOB, this means
> BLOB(1M), a one megabyte blob. Is this a concern for James?
> (http://db.apache.org/derby/docs/10.1/ref/rrefblob.html)

Possibly, yes, thanks. Other databases have similar limits. Personally, I'd suggest that until the aforementioned changes are made, anyone accepting large e-mail (a maximum message size can be configured in the SMTP handler) be using dbfile, rather than db.

--- Noel
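The recipients arithmetic above can be checked with a few lines. This is only a sketch: the 32,700 and 320 figures are the ones quoted in the thread, while the separator length is an assumption, since the thread does not say how addresses are delimited in the recipients column.

```java
class RecipientCapacity {
    // Figures quoted in the thread.
    static final int LONG_VARCHAR_LIMIT = 32_700; // Derby LONG VARCHAR, in characters
    static final int MAX_ADDRESS_LENGTH = 320;    // worst-case e-mail address

    // Largest n such that n addresses plus (n - 1) separators fit in the column.
    // separatorLength is a hypothetical parameter, not taken from the JAMES code.
    static int maxAddresses(int columnLimit, int addressLength, int separatorLength) {
        return (columnLimit + separatorLength) / (addressLength + separatorLength);
    }

    public static void main(String[] args) {
        System.out.println("no separator:     "
                + maxAddresses(LONG_VARCHAR_LIMIT, MAX_ADDRESS_LENGTH, 0));
        System.out.println("2-char separator: "
                + maxAddresses(LONG_VARCHAR_LIMIT, MAX_ADDRESS_LENGTH, 2));
    }
}
```

With worst-case addresses the column holds 102 of them with no separator and 101 with a two-character separator, so the 100-recipient minimum is met, but only just. Real addresses are much shorter, which is why the practical limit is far higher.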
Re: Status, Release Candidates and Derby
Noel J. Bergman wrote:

> I have spent much of ApacheCon working on testing JAMES. Ran into some
> little bits, but generally OK. Ran with Derby for about 48 hours stably,

I'm looking at how James uses Derby and I see for the message repository the create table statement is:

    CREATE TABLE ${table} (
        message_name       varchar (200) NOT NULL,
        repository_name    varchar (255) NOT NULL,
        message_state      varchar (30)  NOT NULL,
        error_message      varchar (200),
        sender             varchar (255),
        recipients         long varchar  NOT NULL,
        remote_host        varchar (255) NOT NULL,
        remote_addr        varchar (20)  NOT NULL,
        message_body       blob          NOT NULL,
        message_attributes blob,
        last_updated       timestamp     NOT NULL,
        PRIMARY KEY (repository_name, message_name)
    )

I'm curious about the recipients and {message_body, message_attributes} fields.

recipients long varchar - In Derby LONG VARCHAR has a limit of 32,700 characters. Does this limit James in any way? A better datatype would be CLOB, then you could have up to 2Gb character limit.
(http://db.apache.org/derby/docs/10.1/ref/rrefsqlj15147.html)
(http://db.apache.org/derby/docs/10.1/ref/rrefclob.html)
I don't believe this would require any application changes (moving to a CLOB).

message_body, message_attributes are defined as BLOB, this means BLOB(1M), a one megabyte blob. Is this a concern for James? If you want to have the maximum size of a BLOB then you would need to define the type using BLOB(2G).
(http://db.apache.org/derby/docs/10.1/ref/rrefblob.html)

I'm not sure what your intended limits are in these cases, but wanted to ensure that you are not surprised by them. DB2 will also have the same limits.

Dan.
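To make the BLOB(1M) versus BLOB(2G) concern concrete, here is a rough sketch of the check an administrator might do by hand: does the configured maximum message size fit in the declared column length? The class and method names are hypothetical and nothing here talks to Derby. It assumes the K, M and G suffixes are multiples of 1024 and that the ceiling is 2,147,483,647 bytes, which should be verified against the Derby reference pages linked above.

```java
import java.util.Locale;

class BlobLimit {
    // Assumed ceiling for a Derby LOB length (2G minus one byte).
    static final long LOB_MAX_BYTES = 2_147_483_647L;

    // Converts a length such as "1M" or "2G" to bytes, capped at the ceiling.
    static long parseLength(String spec) {
        String s = spec.trim().toUpperCase(Locale.ROOT);
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1024L; break;
            case 'M': multiplier = 1024L * 1024L; break;
            case 'G': multiplier = 1024L * 1024L * 1024L; break;
            default:  multiplier = 1L;
        }
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1);
        return Math.min(Long.parseLong(digits.trim()) * multiplier, LOB_MAX_BYTES);
    }

    // True when a message of the given size fits in a column of that length.
    static boolean fits(long maxMessageBytes, String declaredLength) {
        return maxMessageBytes <= parseLength(declaredLength);
    }

    public static void main(String[] args) {
        long fiveMegabytes = 5L * 1024L * 1024L; // invented example limit
        System.out.println("fits BLOB(1M): " + fits(fiveMegabytes, "1M"));
        System.out.println("fits BLOB(2G): " + fits(fiveMegabytes, "2G"));
    }
}
```

A 5 MB message limit, for example, would not fit the default one-megabyte BLOB described above but would fit a column declared as BLOB(2G).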
RE: Status, Release Candidates and Derby
Whoa, guys. I didn't say that there were bugs in Derby. I always look for any memory leaks, and was more concerned about finding them (and other bugs) in *our* code. :-)

I did a test that pushed over 32K incoming messages (total messages would be over 64k, due to SMTP relay and POP3) through JAMES with Derby and heapdump enabled, and am not seeing any memory leak. I have tar'd up the heapdump (multiple snapshots during the run), SAR-INF/ and logs/ for review. It is available as memleak-test.tar.gz from people.apache.org/~noel. I'll delete it within the next week or so.

As an aside to Danny: I'd like to encourage you to follow up your comment "I'm not going into the whole world of memory tuning here (that would be a half-day tutorial I could give!)" with a response to the next ApacheCon CFP. :-)

We can post the next drop as a beta. That's fine. I'd already made Derby the default repository, to get it tested. Hopefully those of you who use Bayesian will consider testing against Derby for that, too, even if you change to your current database for mail repositories.

--- Noel
Re: Status, Release Candidates and Derby
> I'm quite neutral about Derby, but I don't remember the reasons not to
> continue to keep file repositories as the default. By definition it is
> the simplest and safest.

The reason is that there are known bugs in the lock management of file repositories. I reported one a few months ago, and in stress tests I'm able to make them fail every time. I never had these problems with db repositories.

Here is one: http://issues.apache.org/jira/browse/JAMES-397

Stefano
Re: Status, Release Candidates and Derby
I'm quite neutral about Derby, but I don't remember the reasons not to continue to keep file repositories as the default. By definition it is the simplest and safest.

Vincenzo

Stefano Bagnara wrote:

>> If we're unsure that we can shake out the Derby bugs quickly we should pull
>> this feature so that we can cycle through releases quickly to get to a final
>> release soon. It would be better to add Derby in a subsequent 2.4.0 release
>> in the not too distant future than hold up this release.
>
> I think we could release a 2.3.0b1 with default derby repositories.
> If the tests reveal any REAL problem we can decide to remove it and to
> move it to 2.4.0.
>
> Stefano
Re: Status, Release Candidates and Derby
> If we're unsure that we can shake out the Derby bugs quickly we should pull
> this feature so that we can cycle through releases quickly to get to a final
> release soon. It would be better to add Derby in a subsequent 2.4.0 release
> in the not too distant future than hold up this release.

I think we could release a 2.3.0b1 with default derby repositories. If the tests reveal any REAL problem we can decide to remove it and to move it to 2.4.0.

Stefano
Re: Status, Release Candidates and Derby
+1 for 2.3.0, +1 for Beta, +1 for Derby support in 2.4.0 if the alternative is a delay.

Vincenzo

Steve Brewin wrote:

> Hi All,
>
> I would prefer 2.3.0 for this release for the reasons stated by Stefano and
> Søren.
>
> Beta or RC# I don't much care about, they mean different things to different
> people. It's more important to make our intent clear - a feature stable
> release, just shaking out the bugs.
>
> If we're unsure that we can shake out the Derby bugs quickly we should pull
> this feature so that we can cycle through releases quickly to get to a final
> release soon. It would be better to add Derby in a subsequent 2.4.0 release
> in the not too distant future than hold up this release.
>
> I'd say just my 2 cents, but as I've contributed so little to this release,
> maybe 1 cent.
>
> Cheers and seasonal greetings to you all,
>
> -- Steve
RE: Status, Release Candidates and Derby
Hi All,

I would prefer 2.3.0 for this release for the reasons stated by Stefano and Søren.

Beta or RC# I don't much care about, they mean different things to different people. It's more important to make our intent clear - a feature stable release, just shaking out the bugs.

If we're unsure that we can shake out the Derby bugs quickly we should pull this feature so that we can cycle through releases quickly to get to a final release soon. It would be better to add Derby in a subsequent 2.4.0 release in the not too distant future than hold up this release.

I'd say just my 2 cents, but as I've contributed so little to this release, maybe 1 cent.

Cheers and seasonal greetings to you all,

-- Steve

> -----Original Message-----
> From: Søren Hilmer [mailto:[EMAIL PROTECTED]
> Sent: 15 December 2005 11:05
> To: James Developers List
> Subject: Re: Status, Release Candidates and Derby
>
> I am on par with Stefano regarding the number scheme, and also on a beta
> release before an RC release.
>
> IMO we should save 3.0 for the more drastic upcoming changes, and
> continue with 2.3.0 for this release.
>
> --Søren
Re: Status, Release Candidates and Derby
I am on par with Stefano regarding the number scheme, and also on a beta release before an RC release.

IMO we should save 3.0 for the more drastic upcoming changes, and continue with 2.3.0 for this release.

--Søren

--
Søren Hilmer, M.Sc., M.Crypt.
wideTrail            Phone: +45 25481225
Pilevænget 41        Email: [EMAIL PROTECTED]
DK-8961 Allingåbro   Web: www.widetrail.dk

On Thu, December 15, 2005 11:20, Stefano Bagnara wrote:
> Noel J. Bergman wrote:
>> Would people take a look at the current code and see if they feel
>> comfortable with a release candidate? Unless I encounter a definitive
>> memory leak or other problem, I'd like to call a vote on a release
>> candidate. And since there are both configuration and functional
>> changes, I'd suggest that v3 is perhaps the more appropriate designation
>> than v2.3.
>
> I would prefer a 2.3b1 or 3.0b1 (beta release) before the 3.0rc1.
>
> I'm not sure James is ready for a release candidate: changes I've made in
> the past months need to be tested by a wider audience to understand
> whether the users understand them or we need to change some behaviour.
>
> Furthermore, I think that we could change our opinion (I hope it doesn't
> happen but I'm not sure) about the "default" configuration (for mail
> stores, or anything else) after a beta cycle and I would not be happy to
> change similar things in release candidate cycles.
>
> Anyway I'm +1 for the release, soon!
>
> About the version numbers I'm +1 for a 2.3.0 and +0 for 3.0.
>
> We didn't publish a numbering rule, so it's a personal feeling.
> I like the 2.3.0 because current trunk has fewer changes than the 2.1 to
> 2.2 step and it's not a "revolution".
>
> I would prefer to keep the 3.0 for the "next generation" (pojo,
> different container, different configuration style, much different
> behaviours).
>
> My preference is for an early 2.3.0 release and a fast move to the 3.0.
> 2.3.0 (current trunk) fixes a lot of bugs from 2.2.0 and is a due upgrade.
>
> Stefano
Re: Status, Release Candidates and Derby
Noel J. Bergman wrote:

> Would people take a look at the current code and see if they feel
> comfortable with a release candidate? Unless I encounter a definitive
> memory leak or other problem, I'd like to call a vote on a release
> candidate. And since there are both configuration and functional changes,
> I'd suggest that v3 is perhaps the more appropriate designation than v2.3.

I would prefer a 2.3b1 or 3.0b1 (beta release) before the 3.0rc1.

I'm not sure James is ready for a release candidate: changes I've made in the past months need to be tested by a wider audience to understand whether the users understand them or we need to change some behaviour.

Furthermore, I think that we could change our opinion (I hope it doesn't happen but I'm not sure) about the "default" configuration (for mail stores, or anything else) after a beta cycle and I would not be happy to change similar things in release candidate cycles.

Anyway I'm +1 for the release, soon!

About the version numbers I'm +1 for a 2.3.0 and +0 for 3.0.

We didn't publish a numbering rule, so it's a personal feeling. I like the 2.3.0 because current trunk has fewer changes than the 2.1 to 2.2 step and it's not a "revolution".

I would prefer to keep the 3.0 for the "next generation" (pojo, different container, different configuration style, much different behaviours).

My preference is for an early 2.3.0 release and a fast move to the 3.0. 2.3.0 (current trunk) fixes a lot of bugs from 2.2.0 and is a due upgrade.

Stefano
Re: Status, Release Candidates and Derby
Noel,

At work I found one of the best ways to detect memory problems is to bring down the -Xmx to a reasonably low level consistent with proper operation, make -Xms the same (to save wasting time watching it ramp up) and use garbage collection logging. Add these switches:

    -Xloggc:./logs/gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

You should look for a nice even sawtooth pattern in the young generation, and a less frequent sawtooth in the tenured generation, ten (or more) young collections per one old (full or compacting) collection.

I'm not going into the whole world of memory tuning here (that would be a half-day tutorial I could give!) but in a healthy system, after several full collections to let it bed down, the use of the tenured space should vary around a finite value. The symptom of a leak in the Java heap (as opposed to the native parts) is that it tends to infinity, however slowly.

If the young space in use after a young collection rises consistently, the heap sizing is wrong enough to make the data we're interested in invalid. If it is high but stable, it is probably wrong for operation, but the data we care about will be valid.

This check is all about patterns, not absolute values. You can graph the log output patterns and get some really good insights (using either Sun's GC portal, a spreadsheet macro, or a wee script) but I find that you can do this check pretty easily using grep and mental arithmetic ("Mental Math" I think you yankees call it ;-)

It is a very useful and unobtrusive way to instrument any JVM; I think of it as analogous to a doctor's stethoscope. At work we log GC all the time on our production systems, and after a bit of practice you can tail the log file and derive some good information about your JVM's state of mind.

One last gotcha on memory is that in long-running systems, particularly ones which have a lot of classes and other statics, the permanent space can be an issue. There is a defect in Sun's handling of this: their docs say that when an allocation of permanent space is required which would exceed the size of the largest free block of permanent space, the JVM will make the allocation from tenured space. In practice what happens is that it tries to perform a compacting collection of the permanent space, then retries the allocation. If there is *still* not enough space it will try the compacting collection again, and so a loop is born and we see the process become unresponsive while consuming 100% CPU.

d.

On 14/12/05, Noel J. Bergman <[EMAIL PROTECTED]> wrote:
> I have spent much of ApacheCon working on testing JAMES. Ran into some
> little bits, but generally OK. Ran with Derby for about 48 hours stably,
> but was not sure if there was any memory leak or not (TOP showed a slow,
> consistent, memory increase, but that's not a conclusive indicator for our
> Java code), so I am running another test with heapdump enabled.
>
> Generally, things look good. I will add a derby.properties file to bin/
> with a statement in it to control the cache size, and to provide a
> placeholder for any users who want to control Derby properties.
>
> Would people take a look at the current code and see if they feel
> comfortable with a release candidate? Unless I encounter a definitive
> memory leak or other problem, I'd like to call a vote on a release
> candidate. And since there are both configuration and functional changes,
> I'd suggest that v3 is perhaps the more appropriate designation than v2.3.
>
> --- Noel
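The "varies around a finite value versus tends to infinity" check on the tenured space can be sketched as a simple trend test. This is an illustration under assumptions, not a tool from the thread: it assumes the tenured occupancy after each full collection has already been extracted from the GC log (the log layout differs between JVM versions, so no parsing is attempted), and the warm-up count and tolerance are hypothetical knobs to be chosen by eye.

```java
import java.util.Arrays;

class TenuredTrend {
    // Least-squares slope of the samples against their index (0, 1, 2, ...),
    // i.e. the average growth per full collection.
    static double slope(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double s : samples) {
            meanY += s;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (samples[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }

    // Skips the first 'warmup' full collections (letting the heap bed down),
    // then reports whether occupancy grows faster than 'tolerance' per collection.
    static boolean tendsUpward(double[] afterFullGc, int warmup, double tolerance) {
        double[] settled = Arrays.copyOfRange(afterFullGc, warmup, afterFullGc.length);
        return slope(settled) > tolerance;
    }

    public static void main(String[] args) {
        // Invented occupancy figures (KB after each full collection).
        double[] stable = {100, 104, 98, 102, 99, 101};
        double[] rising = {100, 110, 120, 130, 140, 150};
        System.out.println("stable looks leaky: " + tendsUpward(stable, 0, 1.0));
        System.out.println("rising looks leaky: " + tendsUpward(rising, 2, 1.0));
    }
}
```

As the message says, this is about patterns rather than absolute values: a positive slope over a short run proves nothing, and the sketch is only a way of doing the same by-eye judgement on a longer log.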