RE: new InputStream class for mail data

2003-07-18 Thread Sandiep U. Sharma

--- "Noel J. Bergman" <[EMAIL PROTECTED]> wrote:
> > Since we are talking about improving SMTP data handler
> > class I want to draw attention to some performance
> > issues I had found earlier while analyzing
> > CharTerminatedInputStream class.
> 
> > The point of major concern was the use of read method
> > to read data from Socket InputStream. for example if
> > we consider an average mail size of 40KB, read method
> > will be called 40 x 1024 = 40960
> 
> This seems to be factually wrong.  Consider the following:
> 
> in = new BufferedInputStream(socket.getInputStream(), 1024);
> inReader = new CRLFTerminatedReader(in, "ASCII");
> InputStream msgIn = new CharTerminatedInputStream(in,
> SMTPTerminator);
> 
> In all cases, the filtered data is coming through
> BufferedInputStream with a 1K buffer.

The point here is not that the data is coming from
BufferedInputStream. The point of concern is that the
read() method is called so many times, which adds a huge
amount of method-call overhead, so I am just trying to
find a way to avoid that.

I would prefer an approach in which we read a block of
data and use a for loop to find the terminating
characters. If the terminator falls at a block
boundary, we read the next block and continue matching
the terminator. Any extra bytes left in the buffer after
the terminator can easily be unread if we use
PushbackInputStream.

The logic behind this approach is that a for loop over
a block of data, say 2KB, executes much faster than
calling the read() method 2048 times over the same
block.
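A minimal sketch of the block-scanning idea, assuming the terminator is the five-byte sequence CRLF.CRLF and that dot-unstuffing is done elsewhere. `BlockTerminatorScanner` and `copyUntilTerminator` are hypothetical names, not Sandiep's actual class. The partial-match count survives from one block to the next, so a terminator split across a block boundary is still found, and any bytes read past the terminator are pushed back into the `PushbackInputStream`, whose pushback buffer must be at least one block long.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class BlockTerminatorScanner {
    private static final byte[] TERMINATOR = {'\r', '\n', '.', '\r', '\n'};
    static final int BLOCK_SIZE = 2048; // the 2KB block suggested above

    /**
     * Copies data up to (not including) CRLF.CRLF into out, reading BLOCK_SIZE
     * bytes per call. Returns true if the terminator was found, false on EOF.
     * Dot-unstuffing is deliberately left out of this sketch.
     */
    static boolean copyUntilTerminator(PushbackInputStream in, ByteArrayOutputStream out)
            throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        int matched = 0; // bytes of TERMINATOR matched so far; may span blocks
        int n;
        while ((n = in.read(block, 0, block.length)) != -1) {
            for (int i = 0; i < n; i++) {
                byte c = block[i];
                if (c == TERMINATOR[matched]) {
                    if (++matched == TERMINATOR.length) {
                        int rest = n - (i + 1);
                        if (rest > 0) in.unread(block, i + 1, rest); // give back surplus bytes
                        return true;
                    }
                    continue;
                }
                // Mismatch: the held-back partial match was ordinary data after all.
                out.write(TERMINATOR, 0, matched);
                if (c == '\r') {
                    matched = 1;     // this CR may start a new terminator
                } else {
                    out.write(c);
                    matched = 0;
                }
            }
        }
        out.write(TERMINATOR, 0, matched); // EOF inside a partial match: flush it as data
        return false;
    }
}
```

The caller would wrap the socket stream as `new PushbackInputStream(socketIn, BlockTerminatorScanner.BLOCK_SIZE)` so that whatever follows the terminator (the next SMTP command) is still available to the command reader.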

I have tested this approach and the results are quite
impressive. It takes 12 seconds to read a 110KB mail
file 1000 times using CharTerminatedInputStream, while
it takes barely 5 seconds to read the same mail file
1000 times using the new approach. The hardware
configuration was: CPU P4 / RAM 256MB DDR / HDD 40GB /
OS Win 98.

I'll submit the source code of my test routine and the
new optimized InputStream in my forthcoming mails.

Cheers

__
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Your Mail has been Quarantined: Re: new InputStream class for mail data]

2003-07-16 Thread Chiew Ruoh Tau
My apologies.  It is a glitch in my email client setting.  Nothing to do
with the James mailing list.  It should have been fixed now.

- Original Message - 
From: "Richard O. Hammer" <[EMAIL PROTECTED]>
To: "James Developers List" <[EMAIL PROTECTED]>
Sent: Thursday, July 17, 2003 12:36 AM
Subject: [Fwd: Your Mail has been Quarantined: Re: new InputStream class for
mail data]


> I am getting a message such as the following each time I post email to
> this list of James developers.
>
> Is this something I should deal with individually?  Or is this an
> issue for the list administrator?
>
> Rich
>
>  Original Message ----
> Subject: Your Mail has been Quarantined: Re: new InputStream class for
> mail data
> Date: Wed, 16 Jul 2003 22:50:01 +0800
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
>
>
> <http://www.antejunk.com>
>
> Antejunk has quarantined your mail due to suspected spam content.
>
> Mail sent to: [EMAIL PROTECTED]
> Subject: Re: new InputStream class for mail data
> Date sent: 16/07/2003 22:14
>
> To release the mail to the recipient, please verify yourself as a valid
> sender by clicking here
>
> <http://www.antejunk.com/[EMAIL PROTECTED]&msgid=6313_9782_4701_8068_9149_1058366736663>
>
>
>





RE: new InputStream class for mail data

2003-07-16 Thread Noel J. Bergman
> Thank you again, Noel, for calling my attention to FilterInputStream.

You're welcome.

> In the particular case with SMTPDataInputStream [...]

In the case of SMTPDataInputStream, we would not want to call close(), which
is why no one does.  :-)

> As I describe in the comment at the end of class SMTPDataInputStream,
> this implementation relies upon the behavior of the two InputStream
> methods read(byte[]) and read(byte[], int, int) which work by making
> repeated calls to the read() method

You would override read(byte[], int, int) to implement the required
behavior.  That is easy enough, and FilterInputStream.read(byte[]) is fine.

Perhaps FilterInputStream.read(byte[], int, int) should have been left
unchanged to call the core read() method, and required a specialized class
to call the delegated stream if desired, but that would not have been
consistent.  One would have to ask Sun what they were thinking.
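A minimal sketch of the override Noel describes, using a hypothetical filter that upper-cases ASCII letters as a stand-in for the SMTP-specific behavior. `FilterInputStream.read(byte[], int, int)` passes straight through to the wrapped stream, which is why a subclass that only overrides read() sees its logic bypassed by block reads (the effect Rich observed with BufferedInputStream). Overriding that one method routes every byte back through read(); `FilterInputStream.read(byte[])` already calls `read(b, 0, b.length)`, so it needs no change.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical example filter: the unique behavior lives only in read(). */
class UpperCaseFilterInputStream extends FilterInputStream {
    UpperCaseFilterInputStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        int b = in.read();
        return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
    }

    // The inherited version delegates to in.read(b, off, len) and would skip read() above.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) break;
            b[off + count++] = (byte) c;
        }
        return count == 0 ? -1 : count;
    }
    // Note: skip() and available() are still inherited and delegate to the wrapped
    // stream; a real filter may need to override them as well.
}
```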

--- Noel





RE: new InputStream class for mail data

2003-07-16 Thread Noel J. Bergman
> The SMTPDataInputStream class which I have written expresses [my] "a
> period alone in a line" interpretation

When you start to forward e-mail through JavaMail, let me know what happens.
I expect that you will find an extra line being inserted at the end.

--- Noel





Re: new InputStream class for mail data

2003-07-16 Thread Richard O. Hammer
Noel J. Bergman wrote:
> It all depends upon the RFC compliance.

To summarize this question in my view, RFC 2821 clearly offers two 
interpretations (that the end of data indicator is a period alone in a 
line; that the end of data indicator is CRLF.CRLF) which lead to two 
different sets of behavior in a few cases which probably are not very 
important.

This confusion originates in the wording of the RFC.  I think the 
RFC's writers did not understand the difference between "a period 
alone in a line" and "CRLF.CRLF", a difference which may be noticed by 
only a few writers of parsers.

Because the RFC offers these two interpretations, I expect that each 
interpretation has been expressed in some presently working code. 
Each interpretation probably has people who would fight for it.  As 
such, if we assume that the writers of a future revision to the RFC 
become conscious of the confusion possible on this issue, I expect 
they will deliberately adopt wording which continues to allow both 
interpretations.

As such, I suppose James is safely RFC compliant on this issue as it 
is now.

But, between the two interpretations, both of which I believe must 
ultimately be acceptable, I sort of like the "a period alone in a 
line" interpretation better, because I suppose that was first 
historically.  The first idea was probably to put "a period alone in a 
line" as a way to signal the end of data.

A subsequent, later idea, probably thought up by programmers who 
needed to implement "a period alone in a line", was to scan for 
"CRLF.CRLF" (as I am guessing the history).  In fact "CRLF.CRLF" was 
probably favored by many programmers as the way to indicate "a period 
alone in a line" because it is easier to search for than "a period 
alone in a line".

But, as I continue to insist, the two interpretations lead to 
different behavior in a few minor ways in programs which attempt to 
comply with RFC 2821.

The SMTPDataInputStream class which I have written expresses the "a 
period alone in a line" interpretation, and thus behaves differently 
from the present James classes in two small ways, as described in its 
Javadoc.
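To make the two interpretations concrete, here is a minimal sketch (hypothetical class and method names) that decodes a complete DATA section both ways, assuming the bytes begin right after the CRLF that ended the DATA command. Under "a period alone in a line", the CRLF before the period stays with the body and a terminator on the very first line yields an empty message; the "CRLF.CRLF" variant strips all five bytes and cannot see a terminator on the first line. Dot-unstuffing is shown only in the first method.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class EndOfDataDemo {
    /** "A period alone in a line": the CRLF before the period stays in the body. */
    static byte[] periodAloneInLine(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lineStart = 0;
        for (int i = 0; i + 1 < raw.length; i++) {
            if (raw[i] != '\r' || raw[i + 1] != '\n') continue;
            int len = i - lineStart;                   // line length without its CRLF
            if (len == 1 && raw[lineStart] == '.') {
                return out.toByteArray();              // end of mail data indicator
            }
            // dot-unstuffing: drop a leading period on any other line
            int from = (len > 0 && raw[lineStart] == '.') ? lineStart + 1 : lineStart;
            out.write(raw, from, i + 2 - from);        // the line plus its CRLF
            lineStart = i + 2;
            i++;                                       // skip the LF just consumed
        }
        throw new IllegalArgumentException("no end of mail data indicator");
    }

    /** "CRLF.CRLF": all five bytes are stripped; no dot-unstuffing in this sketch. */
    static byte[] crlfDotCrlf(byte[] raw) {
        for (int i = 0; i + 4 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n' && raw[i + 2] == '.'
                    && raw[i + 3] == '\r' && raw[i + 4] == '\n') {
                return Arrays.copyOf(raw, i);
            }
        }
        throw new IllegalArgumentException("no CRLF.CRLF found");
    }
}
```

For the input `Hello` CRLF `.` CRLF, the first method returns `Hello` followed by CRLF, while the second returns just `Hello`.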

Rich



Re: new InputStream class for mail data

2003-07-16 Thread Richard O. Hammer
I attach a revision of the class SMTPDataInputStream which I submit 
for your consideration.  This replaces the version which I mailed to 
this list on July 2.

The external behavior of the code has not changed.  It still passes 
the JUnit tests.

This code should run faster.  I had arranged the code for logical 
clarity as much as for executing speed.  But Noel's comments made me 
realize just how rare would be the case in which this class enters its 
BUFFERING_STATE.  It may never get into that state.  Yet the code was 
testing if(receivedState == BUFFERING_STATE) for every byte read.  Now 
that contingency is handled as a case in a switch, so the most common 
reading of ordinary sequences goes faster.
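A compact sketch of a switch-based read() along these lines, written from the descriptions in this thread rather than from Rich's attached code. The common MID_LINE case is tested first; the stream starts in the line-start state, so a period alone on the first line ends the data; a leading period on any other line is dropped as dot-stuffing; and the one case needing two output bytes (period, CR, then anything but LF) is handled with a single pending byte. The class name is hypothetical, and the watchdog and size checks are omitted.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: ends data at "a period alone in a line", with dot-unstuffing. */
class PeriodAloneDataStream extends FilterInputStream {
    private static final int MID_LINE = 0, CR_SEEN = 1, LINE_START = 2,
                             INIT_PERIOD = 3, INIT_PERIOD_CR = 4, DONE = 5;
    private int state = LINE_START; // the CRLF ending the DATA command was already consumed
    private int pending = -1;       // second output byte of a two-byte case, if any

    PeriodAloneDataStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        if (pending != -1) { int p = pending; pending = -1; return p; }
        while (state != DONE) {
            int c = in.read();
            if (c == -1) { state = DONE; break; }        // underlying EOF also ends the data
            switch (state) {
                case MID_LINE:                            // the common case, tested first
                    if (c == '\r') state = CR_SEEN;
                    return c;
                case CR_SEEN:
                    state = (c == '\n') ? LINE_START : (c == '\r') ? CR_SEEN : MID_LINE;
                    return c;
                case LINE_START:
                    if (c == '.') { state = INIT_PERIOD; continue; } // hold the period
                    state = (c == '\r') ? CR_SEEN : MID_LINE;
                    return c;
                case INIT_PERIOD:
                    if (c == '\r') { state = INIT_PERIOD_CR; continue; }
                    state = MID_LINE;                     // period was dot-stuffing: drop it
                    return c;
                default:                                  // INIT_PERIOD_CR
                    if (c == '\n') { state = DONE; break; } // period alone in a line
                    state = (c == '\r') ? CR_SEEN : MID_LINE;
                    pending = c;                          // drop the period, emit CR then c
                    return '\r';
            }
        }
        return -1;
    }

    // Route block reads through read() so the state machine sees every byte.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) break;
            b[off + n++] = (byte) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }
}
```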

In addition to the file containing SMTPDataInputStream, I attach once 
again TestSIS.java (a file of JUnit tests).  I also attach two little 
stub classes, for WatchDog and MessageSizeException, which you may 
find useful if you want to compile and test SMTPDataInputStream 
separate from a James installation.

Rich
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** 
 * An InputStream for SMTP message body data. Performs four of the functions
 *  needed while receiving the message body in the SMTP DATA command:
 * 
 *   watches for the end of mail data indicator, and signals this by
 *   returning EOF, that is -1.
 *   
 *   removes dot stuffing as described in RFC 2821 section 4.5.2.
 *   
 *   works with james.util.watchdog.Watchdog to police the minimum rate of 
 *   data transfer.  Calls Watchdog.reset() every time a certain number of
 *   bytes have been read, thus forestalling intervention of the watchdog for
 *   another time increment.
 *   
 *   optionally polices the total size of the data.  Throws 
 *   MessageSizeException if this exceeds a limit.
 *  
 * 
 * 
 * The end of mail data indicator which this class recognizes is a period
 * alone in a line.  This indicator is often described as "CRLF.CRLF", but that
 * description leads to errors in possibly minor ways.  The better 
 * description which this class recognizes, "a period alone in a line", leads
 * to better behavior in two ways:
 * 
 *   When the end of mail data indicator is recognized in the input stream,
 *   the CRLF which immediately preceded the period in the indicator is 
 *   returned as part of the mail data as the CRLF which concludes the final
 *   line of mail data, rather than being discarded as part of the end of mail
 *   data indicator.
 *   
 *   The end of mail data indicator can occur in the very first line of 
 *   mail data, with the period being the first character read.
 *   
 * 
 * RFC 2821 discusses this in sections 2.3.7, 3.3, 4.1.1.4, 4.5.2.
 * 
 *  This class resets the WatchDog each time it has read a quota of bytes
 * as specified in the constructor.  But it does not reset or stop the 
 * WatchDog when it recognizes the end of mail data indicator and returns EOF.
 * 
 * This class returns EOF in two circumstances: when it recognizes the end 
 * of mail data indicator in the stream (a normal occurrence); when the 
 * underlying stream signals EOF (probably an error of some sort).  This 
 * behavior may be okay, in that it mimics the behavior of the earlier James 
 * class CharTerminatedInputStream, but it may need further examination at some
 * point.
 * 
 * An instance of this class can not be reset.  A new instance must be
 * constructed for each message's data.
 * 
 */
public class SMTPDataInputStream extends InputStream{
BufferedInputStream in;

/* For a discussion of some decisions made in designing this class,
 * see the comment at the end.
 */

// The kinds of bytes we care about
static final int
EOF    = -1,
CR     = 13,
LF     = 10,
PERIOD = 46;

//the states in which this SMTPDataInputStream may be
static final int
LINE_STARTING_STATE  = 0, //at the start of a line
MID_LINE_STATE       = 1, //the most common state
CR_STATE             = 2, //a CR has been received
INIT_PERIOD_STATE    = 3, //a period at start of line
INIT_PERIOD_CR_STATE = 4, //initial period then CR
BUFFERING_STATE      = 5, //see comments further down
EOF_STATE            = 6; //either EOF or end of data

//the variable in which we keep the present state
private int receivedState = LINE_STARTING_STATE;

/* This comment describes the strategy for monitoring message size and
 * data transfer rate.  The five variables below serve these purposes.
 * 
 * The use of maxMessageSize should be obvious, but note that if it is
 * set to zero then it signals that there is no limit on message size.
 * 
 * Both of the limits (message size and data rate) are checked with one
 * operation in the most frequently used code by using a quota, kept in
 * currentQuota.  When the quota is reached (by decrementing 
 * bytes

RE: [Fwd: Your Mail has been Quarantined: Re: new InputStream class for mail data]

2003-07-16 Thread Noel J. Bergman
> I am getting a message such as the following each time I post email to
> this list of James developers.

I've gotten it, too.  It appears to be an anti-spam measure that the person
installed.  At least the code is smart enough to bounce to the individual,
and not the list.

Basically, it sends you a notice, which means that you have to have used a
valid address.  When you get it, you need to click on the link.  That
completes the circuit.  Then it will remember you for that sender.  It
requires spammers to use lots of real mailboxes, and just makes things
marginally more difficult for them.

It is an interesting application that someone could consider building into
James, actually.  You'd want a quarantine repository (like a long-term
spooler), something in a web server or within James to maintain the approved
list, and a matcher that checked against it.
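A minimal, plain-Java sketch of the pieces listed above, assuming an in-memory approved-sender set and quarantine map in place of a real repository, and modelling a single recipient's list. All names are hypothetical; this is not the James Matcher API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical challenge-response quarantine for one recipient. */
class SenderQuarantine {
    private final Set<String> approved = new HashSet<>();          // the approved list
    private final Map<String, String> heldMail = new HashMap<>();   // token -> quarantined mail
    private final Map<String, String> heldSender = new HashMap<>(); // token -> its sender

    /** Returns null if the mail may be delivered now, else a token to mail to the sender. */
    String accept(String sender, String mail) {
        if (approved.contains(sender)) return null;
        String token = UUID.randomUUID().toString();
        heldMail.put(token, mail);
        heldSender.put(token, sender);
        return token;
    }

    /** Called when the sender follows the link: approves the sender, releases the mail. */
    String confirm(String token) {
        String sender = heldSender.remove(token);
        if (sender == null) return null;   // unknown or already-used token
        approved.add(sender);
        return heldMail.remove(token);
    }
}
```

Only a sender able to receive the notice can complete the circuit, which is the property Noel describes: it forces spammers to use real mailboxes.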

Could be part of James for free, and is not a bad idea at all.

Who wants to play?  :-)

--- Noel





Re: new InputStream class for mail data

2003-07-16 Thread Richard O. Hammer
Noel J. Bergman wrote:
> FilteredInputStream is the right thing to extend.

Thank you again, Noel, for calling my attention to FilterInputStream. 
I had not previously understood what purpose FilterInputStream 
serves in the API.  But your suggestion, coming after this programming 
exercise in which I've had to understand the consequences of calling 
read(byte[]) when my class does not override that method, has helped 
me understand a need for FilterInputStream.

> ... SMTPInputStream
> implements read() based upon the real stream, which is the one
> non-constructor method that is correct in the code.  The rest are wrong.
> For example:
>     public void close() throws IOException{
>         super.close();
>     }
> explicitly illustrates a problem that is implicit with all of the inherited
> methods.  The code invokes the inherited implementation, often a NO-OP.  It
> ought to be delegating to the real stream.

In the particular case with SMTPDataInputStream, the class which I 
submit for your consideration, the underlying stream (the "real 
stream" as I understand your usage) is the BufferedInputStream which 
will later be read again by the envelope-command-line Reader.  So, 
unless I am mistaken, we do not want to be closing the underlying 
stream in this case.

> ... FilteredInputStream provides the
> core wrapper for delegation, allowing you to override just those methods
> that implement your unique behavior.

As I describe in the comment at the end of class SMTPDataInputStream, 
this implementation relies upon the behavior of the two InputStream 
methods read(byte[]) and read(byte[], int, int) which work by making 
repeated calls to the read() method in this descendant 
SMTPDataInputStream.  In this case that behavior is needed, in order 
to use the functionality which I have added only in the read() method.

The read(byte[]) and read(byte[], int, int) methods of 
BufferedInputStream, which evidently would be employed if I extended 
FilterInputStream, do not behave this way, according to my tests. 
They appear to read directly from somewhere deeper, and thus would not 
express the functionality added in the one read() method.

But FilterInputStream does look like a class I will want to use 
another time, in a similar task.

Rich



RE: new InputStream class for mail data

2003-07-16 Thread Steve Short
I mean March 2002 !

> -Original Message-
> From: Steve Short 
> Sent: Wednesday, July 16, 2003 10:02 AM
> To: James Developers List
> Subject: RE: new InputStream class for mail data
> 
> 
> 
> > In all cases, the filtered data is coming through
> > BufferedInputStream with a 1K buffer.
> 
> Sandeep first submitted this idea back in March of 2003 and 
> the SMTPHandler didn't use a buffered input stream or reader 
> back then.  We did some measurements and found that his 
> modification did give a performance benefit but I seem to 
> remember there was a problem with boundary conditions at the 
> end of the input data.
> 
> Sandeep - are you comparing your modifications against a 
> recent version of James?  How about sharing your figures with us?
> 
> Cheers
> Steve
> 
> 
> 




RE: new InputStream class for mail data

2003-07-16 Thread Steve Short

> In all cases, the filtered data is coming through 
> BufferedInputStream with a 1K buffer.

Sandeep first submitted this idea back in March of 2003 and the
SMTPHandler didn't use a buffered input stream or reader back then.  We
did some measurements and found that his modification did give a
performance benefit but I seem to remember there was a problem with
boundary conditions at the end of the input data.

Sandeep - are you comparing your modifications against a recent version
of James?  How about sharing your figures with us?

Cheers
Steve




RE: new InputStream class for mail data

2003-07-16 Thread Noel J. Bergman
> Since we are talking about improving SMTP data handler
> class I want to draw attention to some performance
> issues I had found earlier while analyzing
> CharTerminatedInputStream class.

> The point of major concern was the use of read method
> to read data from Socket InputStream. for example if
> we consider an average mail size of 40KB, read method
> will be called 40 x 1024 = 40960

This seems to be factually wrong.  Consider the following:

in = new BufferedInputStream(socket.getInputStream(), 1024);
inReader = new CRLFTerminatedReader(in, "ASCII");
InputStream msgIn = new CharTerminatedInputStream(in,
SMTPTerminator);

In all cases, the filtered data is coming through BufferedInputStream with a
1K buffer.
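One way to check both claims is to count calls at each layer. This sketch uses hypothetical class names and an in-memory stream standing in for the socket. It reads a 40KB message one byte at a time, with and without a 1K BufferedInputStream. Either way the caller makes one read() call per byte; what the buffer changes is how often the underlying "socket" stream is asked for data.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that counts how many read calls reach the wrapped stream. */
class CountingInputStream extends FilterInputStream {
    int readCalls = 0;
    CountingInputStream(InputStream in) { super(in); }
    @Override public int read() throws IOException { readCalls++; return in.read(); }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        readCalls++;
        return in.read(b, off, len);
    }
}

class BufferingDemo {
    /** Reads 'size' bytes one at a time; returns how many reads hit the "socket".
        A bufferSize of 0 means no BufferedInputStream at all. */
    static int socketReads(int size, int bufferSize) throws IOException {
        CountingInputStream socket =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = bufferSize > 0 ? new BufferedInputStream(socket, bufferSize) : socket;
        while (in.read() != -1) {
            // one read() call per byte at this level, whatever the buffering below
        }
        return socket.readCalls;
    }
}
```

Without the buffer the socket-level count is 40961 (one per byte plus the final EOF); with it, the count drops to roughly one per 1K buffer fill. The remaining per-byte cost is a method call into BufferedInputStream rather than a read on the socket, and whether that call overhead matters is the question Sandiep's timing figures try to answer.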

> I have used PushbackInputStream and block read method

I would like to hear more about what you did, but as noted above, I don't
believe that you are correct in your initial analysis.

--- Noel





RE: new InputStream class for mail data

2003-07-16 Thread Noel J. Bergman
Richard,

> In case you are considering using that class SMTPDataInputStream, you
> may want to know that I am now rewriting part of it, making what I
> consider to be an improvement, an improvement stimulated by the
> discussion of the last few days.

It all depends upon the RFC compliance.

The I/O handling architecture is going to change, anyway.  The pull model
does not scale to large numbers of connections.  The push model used in nio
means that:

  1) data arrives and is dispatched to a worker thread.
  2) the worker thread takes the data and processes it
 through an object associated with the connection.
  3) the worker then returns to process the next packet
 for whichever connection happens to be serving up
 data next.
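A minimal sketch of step 2 for the DATA phase, assuming the worker hands over whatever bytes arrived as a `java.nio.ByteBuffer`. All matching state lives in the per-connection object, so the CRLF.CRLF terminator is found even when a packet boundary splits it. The class name is hypothetical; it tracks the last five bytes in a rolling 40-bit window, does no dot-unstuffing, and buffers the whole body in memory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical per-connection state for the DATA phase in a push (nio-style) model. */
class PushDataCollector {
    private static final long TERMINATOR = 0x0D0A2E0D0AL; // CR LF '.' CR LF as five bytes
    private static final long MASK = 0xFFFFFFFFFFL;       // keep only the last five bytes
    private long window = 0;
    private boolean done = false;
    private final ByteArrayOutputStream raw = new ByteArrayOutputStream();

    /** Consumes bytes from the chunk; returns true once the terminator has been seen.
        Bytes after the terminator are left in the buffer for the command parser. */
    boolean feed(ByteBuffer chunk) {
        while (!done && chunk.hasRemaining()) {
            byte c = chunk.get();
            raw.write(c);
            window = ((window << 8) | (c & 0xFF)) & MASK;
            done = (window == TERMINATOR);
        }
        return done;
    }

    /** Data before the terminator (everything so far if it has not been seen yet). */
    byte[] body() {
        byte[] all = raw.toByteArray();
        return done ? Arrays.copyOf(all, all.length - 5) : all;
    }
}
```

Nothing here blocks waiting for the socket; a worker can call feed() for one connection, return, and call it again whenever the next packet for that connection arrives.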

Things like MailImpl(String, MailAddress, Collection, InputStream) cannot be
used the way we do today because you cannot sit on a socket and wait for
data.  But since so much code, including code we do not have control over,
relies upon InputStream, we will have to accommodate it.

It is do-able.  It is do-able in such fashion that it supports both java.io
and java.nio with a single set of handlers.  But it is a change.  And we're
going to need code that implements the filtering behavior in a fashion
compatible with that change.

--- Noel





Re: new InputStream class for mail data

2003-07-16 Thread Sandiep U. Sharma
Hi

Since we are talking about improving the SMTP data handler
class, I want to draw attention to some performance
issues I found earlier while analyzing the
CharTerminatedInputStream class.

The point of major concern was the use of the read()
method to read data from the socket InputStream. For
example, if we consider an average mail size of 40KB,
the read method will be called 40 x 1024 = 40960 times,
and that consumes a lot of CPU time. The number of calls
can easily be reduced to just 20 by using a block size
of 2KB, and that makes a huge difference to the overall
throughput.

I have used PushbackInputStream and the block read
method to implement a mail proxy server for one of our
clients, and the benchmark results were quite impressive
compared with the byte-at-a-time read method. In my
case, my server's mail handling capability doubled.

Sandi

--- "Richard O. Hammer" <[EMAIL PROTECTED]> wrote:
> In case you are considering using that class SMTPDataInputStream, you 
> may want to know that I am now rewriting part of it, making what I 
> consider to be an improvement, an improvement stimulated by the 
> discussion of the last few days.
> 
> Rich






Re: new InputStream class for mail data

2003-07-16 Thread Richard O. Hammer
In case you are considering using that class SMTPDataInputStream, you 
may want to know that I am now rewriting part of it, making what I 
consider to be an improvement, an improvement stimulated by the 
discussion of the last few days.

Rich



RE: new InputStream class for mail data

2003-07-15 Thread Noel J. Bergman
> Based upon my present understanding, I would not throw an exception
> (and would not throw an exception in CRLFTerminatedReader either).  I
> would leave the lone CR in the stream, to be dealt with by whatever
> code handles it next.

Do as you will, but RFC 2821 4.3.2. says that you should reject the message
with a 554.

--- Noel





Re: new InputStream class for mail data

2003-07-15 Thread Richard O. Hammer
Noel J. Bergman wrote:
>> I disagree.  An empty data set can be a valid message.  I find support
>> in RFC 2821 Section 4.1.1.4.
> 
> Not RFC 2821.  RFC 2822, section 3.6:
> 
>    The only required header fields are the origination date field and
>    the originator address field(s).  All other header fields are
>    syntactically optional.

Outside RFC 2821 I don't know so well.  But I have the impression that 
an SMTP message body can -- optionally -- contain an RFC 2822 message. 
But SMTP does not demand an RFC 2822 message in the body.

I wonder, do most MTAs demand that the SMTP message body be in the 
form specified by RFC 822 (or 2822)?

> In any event, if you feel that your interpretation of the RFC is correct,
> and that everyone else is wrong, please contact the IETF to explain where
> they went wrong, and ask them to issue a correction.

I thought I was doing such homework when I took the question to the 
[EMAIL PROTECTED] email list.  But is that not the place to go with 
such issues?

As I have aged I think I have reduced the scope in which I take it 
upon myself to correct what I judge to be other peoples' errors.  I am 
inclined to let this drop here.  But I am not hiding.  If they need 
the truth they can find me.

> Obviously, we want to be RFC compliant.

On that score you probably need not worry.  The RFC offers two 
distinct interpretations, and offers passages for each side to cite.

Rich



Re: new InputStream class for mail data

2003-07-15 Thread Richard O. Hammer
Noel J. Bergman wrote:
>> I agree with you that [RFC 2821, 2.3.7] seems to clearly prohibit
>> SMTP clients from sending a lone CR character in message body data.
>>
>> Since I am working on server-side SMTP code, I suppose I should allow
>> the possibility that a lone CR might come in.
> And what would you do with it then?  Throw the exception, as in
> CRLFTerminatedReader?

Based upon my present understanding, I would not throw an exception 
(and would not throw an exception in CRLFTerminatedReader either).  I 
would leave the lone CR in the stream, to be dealt with by whatever 
code handles it next.
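Whichever policy a server adopts for the lone CR, leaving it in place as proposed here or rejecting the message as Noel reads RFC 2821, detecting one is short. A minimal sketch with a hypothetical helper name, assuming the complete message data is already in memory (in a streaming reader, a CR at the end of one buffer may still be followed by LF in the next).

```java
class BareCrCheck {
    /** Returns the index of the first CR not immediately followed by LF, or -1 if none. */
    static int firstBareCr(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r' && (i + 1 == data.length || data[i + 1] != '\n')) {
                return i;
            }
        }
        return -1;
    }
}
```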



RE: new InputStream class for mail data

2003-07-15 Thread Noel J. Bergman
> I disagree.  An empty data set can be a valid message.  I find support
> in RFC 2821 Section 4.1.1.4.

Not RFC 2821.  RFC 2822, section 3.6:

   The only required header fields are the origination date field and
   the originator address field(s).  All other header fields are
   syntactically optional.

> The illustration given by Keith (in which he adds CRLF.CRLF to each
> outgoing message without checking to see if the last "line" of
> message data already concluded with CRLF)  violates the RFC as
> I understand it.  The paragraph from 4.1.1.4, started above,
> concludes:
> "... An extra <CRLF> MUST NOT be added, as that would cause an empty
> line to be added to the message.

Keep it in context.  An *extra* <CRLF> would be bad.  But since Keith (and
James) strip the <CRLF>.<CRLF>, when the <CRLF>.<CRLF> is put back during
transmission, there is no extra <CRLF>.  If James did not strip the entire
terminator, then there would be an extra <CRLF>.

> In order to pass this point, to go on to the clause which allows
> the addition of a CRLF, a program would have to test whether
> there was already a concluding CRLF present.

Only the *originating* SMTP-sender is allowed to make that correction
because it is the only entity that receives the message in raw form.  That
is not the SMTP server.  It is the first thing that uses SMTP to transport
the message, e.g., Microsoft Outlook or the mail command.

> The writers of RFC 2821 didn't notice the difference, as
> in this authoritative tomfoolery, quoted again from 4.1.1.4:
> "The mail data is terminated by a line containing only a period,
> that is, the character sequence "<CRLF>.<CRLF>" "

> This blithely equates a thing (a line containing only a period) with
> the indication of the thing (CRLF.CRLF).  But CRLF.CRLF is not a
> period alone in a line.

If you compare RFC 821 with RFC 2821, you will find that the latter was
explicitly edited to declare that identity.  I understand why you have a
question about the terminator's leading edge, but the real requirement is to be
consistent, so that transmission does not change the contents.

In any event, if you feel that your interpretation of the RFC is correct,
and that everyone else is wrong, please contact the IETF to explain where
they went wrong, and ask them to issue a correction.  The Area Directors in
question would be:

  Ned Freed <[EMAIL PROTECTED]>
  Ted Hardie <[EMAIL PROTECTED]>

Obviously, we want to be RFC compliant.

--- Noel





Re: new InputStream class for mail data

2003-07-15 Thread Richard O. Hammer
Noel J. Bergman wrote:
> ... as both Valdis and Keith pointed out,
[ in the thread starting at 
http://www.imc.org/ietf-smtp/mail-archive/msg00703.html ]
> an empty data set isn't a valid message.

I disagree.  An empty data set can be a valid message.  I find support 
in RFC 2821 Section 4.1.1.4.  The second paragraph starts as follows:

"The mail data is terminated by a line containing only a period, that 
is, the character sequence "<CRLF>.<CRLF>" (see section 4.5.2).  This 
is the end of mail data indication.  Note that the first <CRLF> of 
this terminating sequence is also the <CRLF> that ends the final line 
of the data (message text) or, if there was no data, ends the DATA 
command itself."

The last sentence, as I understand it, says there can be no data.  It 
says that the first CRLF in the CRLF.CRLF sequence can be the CRLF at 
the end of the DATA command.

> ... As Keith illustrated, code can
> ensure the proper data terminator when delivering the message via SMTP or
> POP3.  What you do internally is up to the program.

The illustration given by Keith in 
 (in which he 
adds CRLF.CRLF to each outgoing message without checking to see if the 
last "line" of message data already concluded with CRLF)  violates the 
RFC as I understand it.  The paragraph from 4.1.1.4, started above, 
concludes:

"... An extra <CRLF> MUST NOT be added, as that would cause an empty 
line to be added to the message.  The only exception to this rule 
would arise if the message body were passed to the originating 
SMTP-sender with a final "line" that did not end in <CRLF>; in that 
case, the originating SMTP system MUST either reject the message as 
invalid or add <CRLF> in order to have the receiving SMTP server 
recognize the "end of data" condition."

Note the clause, "The only exception to this rule would arise if the 
message body were passed to the originating SMTP-sender with a final 
"line" that did not end in <CRLF>".  In order to pass this point, to 
go on to the clause which allows the addition of a CRLF, a program 
would have to test whether there was already a concluding CRLF present.
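A minimal sketch of a sending side that performs this test, following the reading of section 4.1.1.4 given here (class and method names are hypothetical). Lines are dot-stuffed per section 4.5.2, a CRLF is added only when the final line lacks one, and then the period and its CRLF complete the indicator, so an empty body produces just the period and CRLF right after the DATA command. By contrast, per Noel, JavaMail always writes "\r\n.\r\n" after the data.

```java
import java.io.ByteArrayOutputStream;

class DataEncoder {
    /** Encodes a message body for transmission after the DATA command. */
    static byte[] encode(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean atLineStart = true;          // the DATA command's CRLF starts the first line
        for (int i = 0; i < body.length; i++) {
            byte c = body[i];
            if (atLineStart && c == '.') {
                out.write('.');              // dot-stuffing: double a leading period
            }
            out.write(c);
            atLineStart = c == '\n' && i > 0 && body[i - 1] == '\r';
        }
        boolean endsWithCrlf = body.length >= 2
                && body[body.length - 2] == '\r' && body[body.length - 1] == '\n';
        if (body.length > 0 && !endsWithCrlf) {
            out.write('\r');                 // final "line" lacked CRLF: add exactly one
            out.write('\n');
        }
        out.write('.');                      // the period alone in a line...
        out.write('\r');
        out.write('\n');                     // ...followed by its own CRLF
        return out.toByteArray();
    }
}
```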

Please note that JavaMail sends an explicit "\r\n.\r\n" to terminate the
data stream.
In this case I believe JavaMail also violates the intent of that 
paragraph in 4.1.1.4.

I guess that this whole confusion originates in writing "CRLF.CRLF" as 
a way to indicate "a period alone in a line".
It is a darn good way to indicate a period alone in a line, because 
you need to indicate that the period in the line which you are talking 
about is the first and only character in that line.  First, before you 
start looking to see if you have a period alone, you need assurance 
that you are at the start of a line.  But once you have that assurance 
you need to set it aside and not confuse it with the thing you are 
looking for, which in this case is a period alone before CRLF.

Much of the time, for most of our purposes, we can substitute the 
indication of a thing for the thing itself and swear there is no 
difference.  The writers of RFC 2821 didn't notice the difference, as 
in this authoritative tomfoolery, quoted again from 4.1.1.4:
"The mail data is terminated by a line containing only a period, that 
is, the character sequence "<CRLF>.<CRLF>" "

This blithely equates a thing (a line containing only a period) with 
the indication of the thing (CRLF.CRLF).  But CRLF.CRLF is not a 
period alone in a line.  It is (assurance of the end of a preceding 
line) + (a period alone in a line).  A + B does not equal B, not so 
long as A amounts to anything, and in this case A does amount to 
something, to CRLF.

Sometimes, when you get down into the code, you need to know that 
there is a difference between, for instance, a pointer to a variable 
and the variable itself.  But much of the time, at a certain level, 
you can forget that too.  That's the sort of confusion under this 
debate, I believe.

Rich



RE: new InputStream class for mail data

2003-07-15 Thread Noel J. Bergman
Richard,

The statement that "the character sequence <CRLF>.<CRLF> IS NOT a line
containing only a period" is wrong.  As you know from your own finite state
machine, the required sequence in the stream is precisely CR-LF-.-CR-LF,
which would be interpreted as

   CR   need-lf
   LF   newline
   .    line-containing-a-period-so-far
   CR   newline-dot-cr-need-lf
   LF   end-of-data

Recognizing .<CRLF> is not sufficient; that implies already being in the
newline state.  It is the initial <CRLF> that puts the dot on a line by
itself by putting your FSM into LINE_STARTING_STATE.

Yes, there is a bug in James because it separates the command and data
streams, and doesn't start the data stream in the NEWLINE state.  And that
should be fixed, although as both Valdis and Keith pointed out, an empty
data set isn't a valid message.
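The state listing above, together with the point that the data stream should start in the NEWLINE state, can be sketched in Java. This is a minimal illustration under stated assumptions, not James code: the class `DataStateMachine` is hypothetical, it decodes a complete byte array rather than a socket stream, it also removes dot stuffing (RFC 2821, 4.5.2), it keeps the CRLF that ends the last line in the returned body (the reading in 4.1.1.4; what James stores internally is a separate choice), and it rejects a bare CR or LF instead of guessing what was meant.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch (not James code) of the end-of-data state machine
 * listed above, with dot-unstuffing (RFC 2821, 4.5.2) added.
 */
final class DataStateMachine {
    private enum State { NEWLINE, IN_LINE, NEED_LF, DOT_SO_FAR, DOT_CR_NEED_LF }

    /**
     * Decodes the raw bytes that follow the DATA command's CRLF.
     * Returns the body, which keeps the CRLF ending its last line,
     * or null if the end-of-data indicator never arrives.
     * Throws IllegalArgumentException on a bare CR or bare LF.
     */
    static byte[] decode(byte[] raw) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // The DATA command line already ended in CRLF, so we begin at a line start.
        State state = State.NEWLINE;
        for (byte b : raw) {
            switch (state) {
                case NEWLINE:
                case IN_LINE:
                    if (state == State.NEWLINE && b == '.') {
                        state = State.DOT_SO_FAR;          // period at start of a line
                    } else if (b == '\r') {
                        body.write(b);
                        state = State.NEED_LF;
                    } else if (b == '\n') {
                        throw new IllegalArgumentException("bare LF");
                    } else {
                        body.write(b);
                        state = State.IN_LINE;
                    }
                    break;
                case NEED_LF:
                    if (b != '\n') throw new IllegalArgumentException("bare CR");
                    body.write(b);
                    state = State.NEWLINE;
                    break;
                case DOT_SO_FAR:
                    if (b == '\r') {
                        state = State.DOT_CR_NEED_LF;      // possibly ".CRLF"
                    } else if (b == '\n') {
                        throw new IllegalArgumentException("bare LF");
                    } else {
                        body.write(b);                     // stuffed leading dot is dropped
                        state = State.IN_LINE;
                    }
                    break;
                case DOT_CR_NEED_LF:
                    if (b != '\n') throw new IllegalArgumentException("bare CR");
                    return body.toByteArray();             // end of mail data
            }
        }
        return null; // input ended before the end-of-data indicator
    }
}
```

Decoding `Hello\r\n..dot\r\n.\r\n` gives `Hello\r\n.dot\r\n`, and `.\r\n` alone gives an empty body, because the machine starts in NEWLINE rather than waiting for a CRLF that was already consumed with the DATA command.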

The <CRLF>.<CRLF> is written as a marker for another reason, and that is to
indicate that the entire sequence is the terminator.  That was the point of
apparent disagreement between Keith and Daniel Bernstein.  But they weren't
really disagreeing.  The RFC doesn't require you to store <CRLF> or any
other line terminator at all.  It only requires that whatever you use to
represent line separation internally, you must use <CRLF> to separate lines
in the stream when transmitting the data.  As Keith illustrated, code can
ensure the proper data terminator when delivering the message via SMTP or
POP3.  What you do internally is up to the program.

Right now, as you noted
(http://marc.theaimsgroup.com/?l=james-dev&m=105527214016488&w=2), James
strips the entire <CRLF>.<CRLF> sequence, which is precisely the behavior
that both Valdis and Keith told you was correct.  Then the POP3 handler will
send the entire <CRLF>.<CRLF> sequence, as would the SMTP transport.
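Since the RFC only constrains what goes on the wire, the terminator can be produced at send time however the body was stored. A minimal sketch, assuming the stored body already uses CRLF line endings; `DataWriter.writeData` is a hypothetical name, not the James POP3 handler or SMTP transport API:

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch, not the James POP3 handler or SMTP transport. */
final class DataWriter {
    /**
     * Writes a stored body (assumed to use CRLF line endings) as SMTP DATA
     * or a POP3 multi-line response: dot-stuffs lines that start with '.',
     * adds a final CRLF only if the last line lacks one, then writes ".CRLF".
     */
    static void writeData(byte[] body, OutputStream out) throws IOException {
        // The command or status line before the data already ended in CRLF.
        boolean atLineStart = true;
        for (byte b : body) {
            if (atLineStart && b == '.') {
                out.write('.');                 // dot stuffing, RFC 2821 4.5.2
            }
            out.write(b);
            atLineStart = (b == '\n');
        }
        if (!atLineStart) {
            out.write(new byte[] {'\r', '\n'}); // terminate an unterminated last line
        }
        out.write(new byte[] {'.', '\r', '\n'});
        out.flush();
    }
}
```

An empty body produces only `.\r\n`, since the line before it already ended in CRLF, and a body whose last line lacks CRLF gets one added, which is one of the two options 4.1.1.4 gives the originating system (the other being rejection).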

Please note that JavaMail sends an explicit "\r\n.\r\n" to terminate the
data stream.  I believe that you indicated that you plan to use JavaMail.

> I see that this submission makes work for you if you undertake to
> implement and test it.

Anything that deals with RFC compliance deserves special attention, and you
seem to have a different understanding of the RFC, e.g.,

 *   When the end of mail data indicator is recognized in the input stream,
 *the CRLF which immediately preceded the period in the indicator is
 *returned as part of the mail data as the CRLF which concludes the final
 *line of mail data, rather than being discarded as part of the end of mail
 *data indicator.

None of this is to say that SMTPInputStream might not be useful, but if you
make different assumptions about the RFC, then we would have to adjust the
code in order to use it.

FilterInputStream is the right thing to extend.  SMTPInputStream
implements read() based upon the real stream, which is the one
non-constructor method that is correct in the code.  The rest are wrong.
For example:

public void close() throws IOException{
super.close();
}

explicitly illustrates a problem that is implicit with all of the inherited
methods.  The code invokes the inherited implementation, often a NO-OP.  It
ought to be delegating to the real stream.  FilterInputStream provides the
core wrapper for delegation, allowing you to override just those methods
that implement your unique behavior.
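To illustrate the delegation point, here is a hypothetical `LimitedInputStream` that does roughly the job of SizeLimitedInputStream (a sketch, not that class). Only the read methods carry new behavior; `close()`, `available()` and `skip()` are inherited from FilterInputStream, which forwards them to the wrapped stream, whereas `InputStream.close()` does nothing.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical size-limiting wrapper. Only the read methods add behavior;
 * close(), available() and skip() are inherited from FilterInputStream and
 * forwarded to the wrapped stream (skipped bytes are not counted here).
 */
final class LimitedInputStream extends FilterInputStream {
    private final long limit;
    private long count;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();                 // forwards to in.read()
        if (b != -1) charge(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);    // one call moves a whole block
        if (n > 0) charge(n);
        return n;
    }

    @Override
    public boolean markSupported() {
        return false;                         // reset() would corrupt the count
    }

    private void charge(int n) throws IOException {
        count += n;
        if (count > limit) {
            throw new IOException("message exceeds " + limit + " bytes");
        }
    }
}
```

Overriding the block `read(byte[], int, int)` as well as `read()` also means a caller reading in blocks pays one call per block through this layer rather than one per byte.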

--- Noel





RE: new InputStream class for mail data

2003-07-15 Thread Noel J. Bergman
> I agree with you that [RFC 2821, 2.3.7] seems to clearly prohibit
> SMTP clients from sending a lone CR character in message body data.

> Since I am working on server-side SMTP code, I suppose I should allow
> the possibility that a lone CR might come in.

And what would you do with it then?  Throw the exception, as in
CRLFTerminatedReader?

--- Noel

-----Original Message-----
From: Richard O. Hammer [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 15, 2003 0:11
To: James Developers List
Subject: Re: new InputStream class for mail data


Noel J. Bergman wrote:
> It is not possible, by definition, because it is not permitted.  The RFC is
> crystal clear on this point:
>
> RFC 2821, 2.3.7 Lines
>
>SMTP commands and, unless altered by a service extension, message
>data, are transmitted in "lines".  Lines consist of zero or more data
>characters terminated by the sequence ASCII character "CR" (hex value
>0D) followed immediately by ASCII character "LF" (hex value 0A).
>This termination sequence is denoted as <CRLF> in this document.
>Conforming implementations MUST NOT recognize or generate any other
>character or character sequence as a line terminator.  Limits MAY be
>imposed on line lengths by servers (see section 4.5.3).
>
>In addition, the appearance of "bare" "CR" or "LF" characters in text
>(i.e., either without the other) has a long history of causing
>problems in mail implementations and applications that use the mail
>system as a tool.  SMTP client implementations MUST NOT transmit
>these characters except when they are intended as line terminators
>and then MUST, as indicated above, transmit them only as a <CRLF>
>sequence.
>
>
>>the fact that it will probably be munged in transport is a strong
>>disincentive to sending it, but doesn't prohibit it
>
>
> I would say that the above paragraphs constitute prohibition.

Thank you, Noel, for calling my attention back to that section, 2.3.7.
  I had been thinking that section dealt principally with envelope
command lines, but now I see that it does also deal with message body
data.  And yes, I agree with you that this seems to clearly prohibit
SMTP clients from sending a lone CR character in message body data.

Since I am working on server-side SMTP code, I suppose I should allow
the possibility that a lone CR might come in.

Rich





Re: new InputStream class for mail data

2003-07-14 Thread Richard O. Hammer
I have described how I believe the recognition of "a period alone in a 
line" differs from the recognition of "CRLF.CRLF" in a few places, 
including this list:



And I have described the different behavior which results from 
recognizing one or the other as the "end of data indicator" in the 
Javadoc at the head of the file SMTPDataInputStream.java which I 
submitted to this list in email on July 2.

I think I have said it clearly in those places, and I doubt that I do 
any good by saying it again now.

Noel J. Bergman wrote:
> Now, as for the code, itself.
>
> As I said to Serge, I hadn't had time to test your code.  Also, I'm not
> quite sure what goal you are trying to achieve with the change.  Would you
> please elaborate?

I am responding in large part to my desire to understand my own code 
and to believe in it.  As I said before, I am developing an email 
server which draws from James but which will differ in a number of 
respects.  I appreciate the lessons I learn from James and would like, 
if possible, to give back in some way.

In developing SMTPDataInputStream I was imagining that it might find 
use in both James and in my project.  I was trying mainly to write for 
my own project, taking the best of what I could learn from James while 
adding what I consider to be my own improvements.   But I thought that 
it might also be acceptable to the James project, so I shaped it to 
fit into James with as few changes as possible.

I see that this submission makes work for you if you undertake to 
implement and test it.  And it is in a part of James which, so far as 
I know, already seems to be working fine.  As such it may be best for 
you to set it aside and consider it no further, until such time as 
your priorities might bring you back to this area to do some refactoring.

> You wrote that "The code we are using now employs buffer after buffer, and I
> suspect that this redundant buffering may be unnecessary", but the only
> buffers that I am finding present in the SMTP handler at the moment (I could
> have missed something) are the BufferedInputStream assigned to "in", and the
> line buffer in CRLFTerminatedReader.

Thank you, I stand corrected in large part.  I guess I was assuming 
that one of CharTerminatedInputStream, BytesReadResetInputStream, 
SizeLimitedInputStream, or DotStuffingInputStream, employed a 
BufferedInputStream in addition to the BufferedInputStream already 
created in SMTPHandler.handleConnection(), and I see that is not the 
case.  As you point out, there is redundant buffering in that 
CRLFTerminatedReader extends BufferedReader (in code which I suggested 
last month); I now believe that redundant buffering should be removed. 
 I notice that DotStuffingInputStream keeps a two-byte buffer with 
every byte that passes through, and CharTerminatedInputStream keeps a 
little buffer whenever a CRLF passes through.

> ...  The rest of the streams are
> unbuffered, and just add behavior.  One is a FilterInputStream subclass,
> and the rest probably should be, including yours.

Thank you.  I had not considered extending 
FilterInputStream.  From the Javadoc it does look like that might be an 
improvement.  Why would you say it might be better?

Rich



Re: new InputStream class for mail data

2003-07-14 Thread Richard O. Hammer
Noel J. Bergman wrote:
> It is not possible, by definition, because it is not permitted.  The RFC is
> crystal clear on this point:
>
> RFC 2821, 2.3.7 Lines
>
>    SMTP commands and, unless altered by a service extension, message
>    data, are transmitted in "lines".  Lines consist of zero or more data
>    characters terminated by the sequence ASCII character "CR" (hex value
>    0D) followed immediately by ASCII character "LF" (hex value 0A).
>    This termination sequence is denoted as <CRLF> in this document.
>    Conforming implementations MUST NOT recognize or generate any other
>    character or character sequence as a line terminator.  Limits MAY be
>    imposed on line lengths by servers (see section 4.5.3).
>
>    In addition, the appearance of "bare" "CR" or "LF" characters in text
>    (i.e., either without the other) has a long history of causing
>    problems in mail implementations and applications that use the mail
>    system as a tool.  SMTP client implementations MUST NOT transmit
>    these characters except when they are intended as line terminators
>    and then MUST, as indicated above, transmit them only as a <CRLF>
>    sequence.
>
>> the fact that it will probably be munged in transport is a strong
>> disincentive to sending it, but doesn't prohibit it
>
> I would say that the above paragraphs constitute prohibition.

Thank you, Noel, for calling my attention back to that section, 2.3.7. 
 I had been thinking that section dealt principally with envelope 
command lines, but now I see that it does also deal with message body 
data.  And yes, I agree with you that this seems to clearly prohibit 
SMTP clients from sending a lone CR character in message body data.

Since I am working on server-side SMTP code, I suppose I should allow 
the possibility that a lone CR might come in.

Rich



RE: new InputStream class for mail data

2003-07-14 Thread Noel J. Bergman
> > The RFC states that <CR> MUST NOT appear except paired with <LF>.
> > You know this because we addressed that in CRLFTerminatedReader.

> But this contingency cannot be discounted, surely?

It absolutely MUST be discounted.  I don't see that the RFC gives any
discretion.

> While most people will send mail using well behaved clients it is _possible_
> for CR to appear on its own, particularly in unencoded binary data,

It is not possible, by definition, because it is not permitted.  The RFC is
crystal clear on this point:

RFC 2821, 2.3.7 Lines

   SMTP commands and, unless altered by a service extension, message
   data, are transmitted in "lines".  Lines consist of zero or more data
   characters terminated by the sequence ASCII character "CR" (hex value
   0D) followed immediately by ASCII character "LF" (hex value 0A).
   This termination sequence is denoted as <CRLF> in this document.
   Conforming implementations MUST NOT recognize or generate any other
   character or character sequence as a line terminator.  Limits MAY be
   imposed on line lengths by servers (see section 4.5.3).

   In addition, the appearance of "bare" "CR" or "LF" characters in text
   (i.e., either without the other) has a long history of causing
   problems in mail implementations and applications that use the mail
   system as a tool.  SMTP client implementations MUST NOT transmit
   these characters except when they are intended as line terminators
   and then MUST, as indicated above, transmit them only as a <CRLF>
   sequence.

> the fact that it will probably be munged in transport is a strong
> disincentive to sending it, but doesn't prohibit it

I would say that the above paragraphs constitute prohibition.
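The prohibition in 2.3.7 is easy to check mechanically. Below is a hypothetical helper (not part of James or CRLFTerminatedReader) that returns the index of the first CR not followed by LF, or LF not preceded by CR; what a server then does, for example throwing an exception as CRLFTerminatedReader does, is left to the caller.

```java
/** Hypothetical helper for the RFC 2821, 2.3.7 rule; not James code. */
final class BareLineBreaks {
    /**
     * Returns the index of the first bare CR (not followed by LF) or bare LF
     * (not preceded by CR), or -1 if every CR and LF form a CRLF pair.
     * A CR in the last position counts as bare here; a streaming reader
     * would have to wait for the next byte before deciding.
     */
    static int indexOfBareCrOrLf(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r' && (i + 1 >= data.length || data[i + 1] != '\n')) {
                return i;
            }
            if (data[i] == '\n' && (i == 0 || data[i - 1] != '\r')) {
                return i;
            }
        }
        return -1;
    }
}
```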

--- Noel





RE: new InputStream class for mail data

2003-07-14 Thread Danny Angus

> The RFC states that <CR> MUST NOT appear except paired with <LF>.  You know
> this because we addressed that in CRLFTerminatedReader.

But this contingency cannot be discounted, surely?
While most people will send mail using well behaved clients it is _possible_
for CR to appear on its own, particularly in unencoded binary data, the fact
that it will probably be munged in transport is a strong disincentive to
sending it, but doesn't prohibit it, and bearing in mind how easy it is to
write poor SMTP clients I'd be surprised if there weren't thousands of
unchaperoned CRs flitting round the world every day.

d.





RE: new InputStream class for mail data

2003-07-14 Thread Noel J. Bergman
> The changes in behavior are arguable.

Not really.  The RFC is clear enough.

RFC 2821, section 3.3:

   SMTP indicates the end of the mail data by sending a
   line containing only a "." (period or full stop).

> I argue that the right end of data indicator to recognize
> is "a period alone in a line" rather than "CRLF.CRLF", but
> it seems that many people see it differently.

No "argument" of this nature is necessary.  "A period alone in a line" *IS*
<CRLF>.<CRLF>.  They are identical, by definition.  RFC 2821, section
4.1.1.4, states:

   The mail data is terminated by a line containing only a period, that
   is, the character sequence "<CRLF>.<CRLF>" (see section 4.5.2).  This
   is the end of mail data indication.  Note that the first <CRLF> of
   this terminating sequence is also the <CRLF> that ends the final line
   of the data (message text) or, if there was no data, ends the DATA
   command itself.  An extra <CRLF> MUST NOT be added, as that would
   cause an empty line to be added to the message.  The only exception
   to this rule would arise if the message body were passed to the
   originating SMTP-sender with a final "line" that did not end in
   <CRLF>; in that case, the originating SMTP system MUST either reject
   the message as invalid or add <CRLF> in order to have the receiving
   SMTP server recognize the "end of data" condition.

Ironically, you made note of those two specific sections, but you found
ambiguity in your reading.  There is no ambiguity involved.  There IS a
<CRLF>.<CRLF> in all cases.  The only "trick" is realizing that the first
<CRLF> is the one that terminated the DATA command or line of data.  The
only EXTRA data is the .<CRLF>, but it must be preceded by a <CRLF> in valid
SMTP messages.  Lines are separated by <CRLF>, therefore in order to be
alone on a line, you must be contiguous with <CRLF> on either side.
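The point that the first CRLF belongs to the preceding line (or to the DATA command) can be shown on a String. This is a hypothetical illustration, not James code, and it ignores dot stuffing. `bodyPerRfc` prepends the CRLF that ended the DATA command before searching, so an empty message is found too; `bodyStrippingWholeSequence` roughly follows the current James behavior as described in this thread (search only the data stream, drop the whole sequence).

```java
/** Hypothetical illustration of RFC 2821, 4.1.1.4; not James code. Dot stuffing is ignored. */
final class TerminatorDemo {
    private static final String END_OF_DATA = "\r\n.\r\n";

    /**
     * RFC reading: the CRLF before the period ends the last line, so it stays
     * in the body. The CRLF that ended the DATA command is prepended so that
     * an empty message (".\r\n") is recognized. Returns null if no terminator.
     */
    static String bodyPerRfc(String afterDataCommand) {
        String s = "\r\n" + afterDataCommand;
        int i = s.indexOf(END_OF_DATA);
        return i < 0 ? null : s.substring(2, i + 2);
    }

    /**
     * Rough model of the behavior described in this thread for James:
     * search only the data stream and discard the whole sequence.
     */
    static String bodyStrippingWholeSequence(String afterDataCommand) {
        int i = afterDataCommand.indexOf(END_OF_DATA);
        return i < 0 ? null : afterDataCommand.substring(0, i);
    }
}
```

`bodyPerRfc("Hi\r\n.\r\n")` returns `"Hi\r\n"` and `bodyPerRfc(".\r\n")` returns `""`, while `bodyStrippingWholeSequence` returns `"Hi"` for the first and `null` for the second, since it never sees the CRLF that ended the DATA command.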



Now, as for the code, itself.

As I said to Serge, I hadn't had time to test your code.  Also, I'm not
quite sure what goal you are trying to achieve with the change.  Would you
please elaborate?

You wrote that "The code we are using now employs buffer after buffer, and I
suspect that this redundant buffering may be unnecessary", but the only
buffers that I am finding present in the SMTP handler at the moment (I could
have missed something) are the BufferedInputStream assigned to "in", and the
line buffer in CRLFTerminatedReader.  The rest of the streams are
unbuffered, and just add behavior.  One is a FilterInputStream subclass,
and the rest probably should be, including yours.

There USED to be a problem with redundant buffering.  Serge thought that he
had a solution to it around the New Year, but it didn't work, so we reverted
it as there are higher priorities to change in the code.  No one went back
to find out what was wrong, but you ended up fixing it quite neatly with
your CRLFTerminatedReader class.

So the only two buffers are (1) used to provide efficiency into the protocol
stack, and (2) used to handle line accumulation.  The DATA command is processed
as a stream, and does not use the line buffer.

By the way, I was surprised to find this in your code:

   /* We have received the sequence
  PERIOD CR
  at the beginning of a line, but it is followed
  by something other than LF .
  So this is an unusual case of dot stuffing.
  We return CR, and buffer b to return on the next
  call.
*/

The RFC states that <CR> MUST NOT appear except paired with <LF>.  You know
this because we addressed that in CRLFTerminatedReader.

The primary reason I have to change the I/O handling is to support nio.
There is some value to reducing the method chaining, but there is a tradeoff
regarding method complexity.  If you can point out where I am missing
redundant buffers, that's fine.  I'm all for eliminating any redundancy, but
right now the only redundant data I see is the accumulated data in the line
buffer.  Unless a solution addressed the nio issue, or I'm missing the key
point, I don't see much of a reason to change.

Obviously you had a reason for going to all this trouble, so what am I
missing?

--- Noel





RE: new InputStream class for mail data

2003-07-14 Thread Noel J. Bergman
> My 2c is that I'd like to see RFC compliance tested this way.

That would be my first priority.

--- Noel




RE: new InputStream class for mail data

2003-07-14 Thread Noel J. Bergman
> I can't remember if this has been reviewed and applied or not

Not yet.  I haven't had time to test it, and he didn't provide a test case.
There are a number of patches that have languished so far because of lack of
time to test.

Vincenzo is currently integrating and testing Soren's mail attribute code.

Having just submitted a patch to Mars because he got line termination wrong
for SMTP tests, I think it is important that we test patches, especially
those going into v2.  We have added a lot of new functionality to v2.2, and
I am feeling that it is needing more testing before we can declare it
stable.

--- Noel





Re: new InputStream class for mail data

2003-07-14 Thread Richard O. Hammer
Serge Knystautas wrote:
> ... if you can 
> submit this to bugzilla, it will help us track that it's reviewed.

It did not occur to me to use bugzilla since the improvements, which I 
believe SMTPDataInputStream offers, are not necessarily fixes to bugs. 
  The changes in behavior are arguable.  I argue that the right end 
of data indicator to recognize is "a period alone in a line" rather 
than "CRLF.CRLF", but it seems that many people see it differently.

Should I use bugzilla to express my opinions about improvements needed?

> ... JUnit tests.

I did some JUnit tests, not comparing old vs. new, but only testing 
the new.  I attach a file containing that class.  Earlier I had tested 
the old (existing) behavior, as I reported briefly to this list in 
this message on June 10, 
.

Rich

import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;


import junit.framework.TestCase;

public class TestSIS extends TestCase {
SMTPDataInputStream sis;

public TestSIS(String whooo){
super(whooo);
}

static String food = 
  ".dogs\r. sp\n"
+ ".\r\n"
+ ".\r8\r\n"
+ ".\r\r\no*"
+ " \r\n.\r\n"
+ "I guess we shouldn't get here.\r\n";

//test response to a challenging sequence of characters
public void testFood0() throws Exception {
sis=constructSIS(food, 25, 7);
assertEquals((byte)sis.read(),(byte)'d');
assertEquals((byte)sis.read(),(byte)'o');
assertEquals((byte)sis.read(),(byte)'g');
assertEquals((byte)sis.read(),(byte)'s');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)'.');
assertEquals((byte)sis.read(),(byte)' ');
assertEquals((byte)sis.read(),(byte)'s');
assertEquals((byte)sis.read(),(byte)'p');
assertEquals((byte)sis.read(),(byte)012);
assertEquals((byte)sis.read(),(byte)'.');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)10);
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)'8');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)10);
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)10);
assertEquals((byte)sis.read(),(byte)'o');
assertEquals((byte)sis.read(),(byte)'*');
assertEquals((byte)sis.read(),(byte)' ');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)10);
assertEquals((byte)sis.read(),(byte)-1);
assertEquals((byte)sis.read(),(byte)-1);
assertEquals((byte)sis.read(),(byte)-1);
sis.close();
}

//test attempt to read beyond maxMessageSize
public void testFood1() throws Exception {
sis=constructSIS(food, 6, 7);
assertEquals((byte)sis.read(),(byte)'d');
assertEquals((byte)sis.read(),(byte)'o');
assertEquals((byte)sis.read(),(byte)'g');
assertEquals((byte)sis.read(),(byte)'s');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)'.');
Exception ex = null;
try{
sis.read();
}catch (Exception e){
ex = e;
}
assertTrue(ex instanceof MessageSizeException);
sis.close();
}

//test same with different reset interval
public void testFood2() throws Exception {
sis=constructSIS(food, 6, 1);
assertEquals((byte)sis.read(),(byte)'d');
assertEquals((byte)sis.read(),(byte)'o');
assertEquals((byte)sis.read(),(byte)'g');
assertEquals((byte)sis.read(),(byte)'s');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)'.');
Exception ex = null;
try{
sis.read();
}catch (Exception e){
ex = e;
}
assertTrue(ex instanceof MessageSizeException);
sis.close();
}

//test same with different reset interval
public void testFood3() throws Exception {
sis=constructSIS(food, 6, 6);
assertEquals((byte)sis.read(),(byte)'d');
assertEquals((byte)sis.read(),(byte)'o');
assertEquals((byte)sis.read(),(byte)'g');
assertEquals((byte)sis.read(),(byte)'s');
assertEquals((byte)sis.read(),(byte)13);
assertEquals((byte)sis.read(),(byte)'.');
Exception ex = null;
try{
sis.read();
}catch (Exception e){
ex = e;
}
assertTrue(ex instanceof MessageSizeException);
sis.close();
}

//test same with different reset interval
public void testFood4() throws Exception {
sis=constructSIS(food, 6, 3);
assertEquals((byte)sis

RE: new InputStream class for mail data

2003-07-14 Thread Danny Angus
> If you had any you made to test the old vs. new, that'd be great to see
> as well.  We don't have a framework in CVS yet for running JUnit tests,
> but I hope to change this before too long.

We do have some JUnit tests in /tests
What we don't have is either a comprehensive set of tests, or an automated
test procedure, which would probably require the ability to start and stop
James from Ant.

My 2c is that I'd like to see RFC compliance tested this way.

d.





Re: new InputStream class for mail data

2003-07-14 Thread Serge Knystautas
Richard O. Hammer wrote:
> Attached you will find a class that I offer for your consideration. It 
> is called SMTPDataInputStream and it would be used in the doDATA() 
> method of SMTPHandler.  There it would replace 
> CharTerminatedInputStream, BytesReadResetInputStream, 
> SizeLimitedInputStream, and DotStuffingInputStream, since I believe it 
> does the work of all those.

Also, I know this isn't very widely adopted in James yet, but for small 
code blocks like this, I'd really like to start adopting JUnit tests.

If you had any you made to test the old vs. new, that'd be great to see 
as well.  We don't have a framework in CVS yet for running JUnit tests, 
but I hope to change this before too long.

--
Serge Knystautas
President
Lokitech >> software . strategy . design >> http://www.lokitech.com
p. 301.656.5501
e. [EMAIL PROTECTED]


Re: new InputStream class for mail data

2003-07-14 Thread Serge Knystautas
Richard,

I can't remember if this has been reviewed and applied or not, which 
transitions nicely to what I was going to suggest anyway... if you can 
submit this to bugzilla, it will help us track that it's reviewed.

Thanks for the patch!

Richard O. Hammer wrote:
> Attached you will find a class that I offer for your consideration. It 
> is called SMTPDataInputStream and it would be used in the doDATA() 
> method of SMTPHandler.  There it would replace 
> CharTerminatedInputStream, BytesReadResetInputStream, 
> SizeLimitedInputStream, and DotStuffingInputStream, since I believe it 
> does the work of all those.




new InputStream class for mail data

2003-07-02 Thread Richard O. Hammer
Attached you will find a class that I offer for your consideration. It 
is called SMTPDataInputStream and it would be used in the doDATA() 
method of SMTPHandler.  There it would replace 
CharTerminatedInputStream, BytesReadResetInputStream, 
SizeLimitedInputStream, and DotStuffingInputStream, since I believe it 
does the work of all those.

There are minor ways in which this class responds differently, and,
I believe, more correctly, than the current CharTerminatedInputStream. 
 The end of mail data indicator which this recognizes is a period
alone in a line, rather than CRLF.CRLF as recognized in James at
present.  As such this class returns the CRLF which terminates the
last line of message body data (the CRLF before the period), and it
allows the possibility of empty mail data (with a period being the
first character sent in the data).

I raised the question of whether these changes are indeed a better
interpretation of RFC 2821 on the mailing list [EMAIL PROTECTED]  If
you like you can see that thread starting at
http://www.imc.org/ietf-smtp/mail-archive/msg00703.html .  If you read
that thread and think the question remains unresolved you might adopt
the bias which I use in this case: multiply the value of each word
from Dan Bernstein by 1000.
Improved performance would be another advantage which I would expect
from adopting this class in James.  The present stack of InputStreams
makes more method calls for each byte read than this replacement. But 
I have to admit, it could be that performance in this code is not a 
major concern.

It is possible that this class could also serve in NNTPHandler to
replace CharTerminatedInputStream there, but I am not familiar with
the code in NNTPHandler.
I also attach a file showing old and new blocks of code in 
SMTPHandler, showing changes needed to employ this new 
SMTPDataInputStream.  In addition to these changes, the package name 
for SMTPDataInputStream will need to be corrected, as well as the 
importing of WatchDog and MessageSizeException.

I have tested this SMTPDataInputStream as a unit, and it performs 
correctly in every test I have thought to give it.  But I have not 
tested it installed in SMTPHandler.

Rich Hammer
Hillsborough, N.C.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** 
 * An InputStream for SMTP message body data. Performs four of the functions
 *  needed while receiving the message body in the SMTP DATA command:
 * 
 *   watches for the end of mail data indicator, and signals this by
 *   returning EOF, that is -1.
 *   
 *   removes dot stuffing as described in RFC 2821 section 4.5.2.
 *   
 *   works with james.util.watchdog.Watchdog to police the minimum rate of 
 *   data transfer.  Calls Watchdog.reset() every time a certain number of
 *   bytes have been read, thus forestalling intervention of the watchdog for
 *   another time increment.
 *   
 *   optionally polices the total size of the data.  Throws 
 *   MessageSizeException if this exceeds a limit.
 *  
 * 
 * 
 * The end of mail data indicator which this class recognizes is a period
 * alone in a line.  This indicator is often described as "CRLF.CRLF", but that
 * description leads to errors, though possibly minor ones.  The better 
 * description which this class recognizes, "a period alone in a line", leads
 * to better behavior in two ways:
 * 
 *   When the end of mail data indicator is recognized in the input stream,
 *the CRLF which immediately preceded the period in the indicator is 
 *returned as part of the mail data as the CRLF which concludes the final
 *line of mail data, rather than being discarded as part of the end of mail
 *data indicator.
 *   
 *   The end of mail data indicator can occur in the very first line of 
 *   mail data, with the period being the first character read.
 *   
 * 
 * RFC 2821 discusses this in sections 2.3.7, 3.3, 4.1.1.4, 4.5.2.
 * 
 *  This class resets the WatchDog each time it has read a quota of bytes
 * as specified in the constructor.  But it does not reset or stop the 
 * WatchDog when it recognizes the end of mail data indicator and returns EOF.
 * 
 * This class returns EOF in two circumstances: when it recognizes the end 
 * of mail data indicator in the stream (a normal occurrence); when the 
 * underlying stream signals EOF (probably an error of some sort).  This 
 * behavior may be okay, in that it mimics the behavior of the earlier James 
 * class CharTerminatedInputStream, but it may need further examination at some
 * point.
 * 
 * An instance of this class cannot be reset.  A new instance must be
 * constructed for each message's data.
 * 
 */
public class SMTPDataInputStream extends InputStream{
BufferedInputStream in;

/* For a discussion of some decisions made in designing this class,
 * see the comment at the end.
 */

// The kinds of bytes we care about
static final int
EOF= -1,
CR = 13,
LF =