RE: [codec] StatefulDecoders

2004-03-31 Thread Brett Henderson
Alex,

Sorry about the delay.  I'm a bit snowed under at the moment.

I've attached producers/consumers that process Object instances instead of
type specific data.

Some differences between these interfaces and those you've already built
are:
1. These have no monitor facility.  All errors result in a CodecException
being thrown.  There is no concept of a warning.
2. These have a finalize concept which is required for implementations such
as base64 where padding on the final data block is required.
3. These have flush methods to allow the chain to be flushed without being
finalized.
4. These have a propagating flag which allows finalize/flush calls to be
propagated through codec chains.  By default this is true but can be set to
false.  This is necessary when a single consumer (eg. OutputStreamConsumer)
receives streams from multiple sources and each of those sources is
finalized before the next is started.  In this case you don't want the
OutputStreamConsumer to be finalized (and the underlying stream closed)
multiple times; you want this to occur only after the final input source
completes.  Hope this makes sense.

With regards to each difference:
1. Not sure of the correct approach here.  I threw exceptions because it was
simpler to implement and made it harder for errors to pass silently.  A
monitor approach is more flexible, although perhaps harder to use from a
client perspective.
2. I believe you will need to add a finalize concept.  Some codecs require
notification that this is the final processing call (ie. Base64).
3. Flush isn't critical.  I just added it for completeness.
4. A propagating option isn't critical and Java IO streams don't have this
concept.  However a common problem is wanting to feed the result of several
streams into a single stream without the close on each top level stream
calling close on the receiving stream.  Another way of overcoming this is to
create a special no-op, non-propagating codec that you insert into the chain
to prevent these calls from propagating.
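
To make this concrete, a consumer interface along those lines might look
something like the sketch below.  This is not the attached Consumer.java;
the method names and the CodecException type are assumptions based purely on
points 1-4 above.

  // Hypothetical sketch only -- not the attached Consumer.java.
  class CodecException extends Exception {
      public CodecException(String message) {
          super(message);
      }
  }

  interface ByteConsumer {
      // Accept a block of input; implementations may buffer a partial block
      // (e.g. base64 holds up to 2 bytes until a full 3-byte group arrives).
      void consume(byte[] data, int offset, int length) throws CodecException;

      // Push buffered data downstream without ending the stream (point 3).
      void flush() throws CodecException;

      // The "finalize" concept from point 2: no more data will arrive, so the
      // final block can be padded and emitted.  Called finish() here to avoid
      // colliding with Object.finalize().
      void finish() throws CodecException;

      // Point 4: when false, finish()/flush() are not propagated downstream,
      // so a shared consumer (e.g. an OutputStreamConsumer) is only finalized once.
      void setPropagate(boolean propagate);
  }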

I meant to create some sample code using my interfaces to compare with yours
but I can't get it done at the moment.

You're obviously clued in on what's required; any differences between mine
and yours are relatively small and I'm sure either would suit the purposes of
codec.

Given that I'm taking way too long to do anything at the moment I'll leave
it in your capable hands.  My current work should ease up in a few weeks and
I'll try to give you a hand again then.

Cheers,
Brett

 Could you give some examples of how this would look just using
 Objects instead of specific types to implement what the DecoderStack
 does here:

 http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/trunk/codec-stateful/src/java/org/apache/commons/codec/stateful/DecoderStack.java?rev=9724&root=Apache-SVN&view=auto

 And go on to show how it's used like in this test code here:

 http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/trunk/codec-stateful/src/test/org/apache/commons/codec/stateful/DecoderStackTest.java?rev=9724&root=Apache-SVN&view=auto

 Specifically I'm referring to the usage of the DecoderStack in the
 testDecode() example which shows the chaining really simply.

 Perhaps looking at the two use cases we can come to a better
 conclusion.
 
 
 
 
 
  Does the above make sense?  If so, please give it careful consideration
  because I originally used the callback design and modified it to use
  producers/consumers because I think it is actually simpler and is much
  more flexible.

 Yes it makes sense, I just want to see it and play with it.  Can you whip
 it up and we'll begin getting a feel for both sets of interfaces.

  If you're still not convinced I guess I'll have to give in and go with
  the flow ;-)

 Nah, we'll try to come to an understanding.
 


Producer.java
Description: java/


Consumer.java
Description: java/

RE: [codec] StatefulDecoders

2004-03-24 Thread Brett Henderson
Alex,

I haven't had a chance to respond to your email yet.  I'll try to do so
tonight.

I'll knock up a couple of quick interfaces for comparison at the same time.

Cheers,
Brett


 -Original Message-
 From: Alex Karasulu [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, 24 March 2004 12:23 PM
 To: 'Jakarta Commons Developers List'
 Subject: RE: [codec] StatefulDecoders
 
 
 Brett,
 
 
 
 Ok let's take a breath and dive into this email :-).
 
 
 





RE: [codec] StatefulDecoders

2004-03-10 Thread Brett Henderson
 -Original Message-
 From: Noel J. Bergman [mailto:[EMAIL PROTECTED]
 Sent: Monday, 8 March 2004 3:25 PM
 To: Jakarta Commons Developers List
 Subject: RE: [codec] StatefulDecoders

 Consider the Cocoon (http://cocoon.apache.org/) pipeline for the concept
 of a fully event-driven programming approach, although their implementation
 has far too much overhead for codec purposes (or the regex events I
 mentioned).

I still intend to look at this although my list of reading seems to grow
daily ...

 IMO, we want a consistent interface that provides the fundamental
 operations, and then we can build convenience on top of that interface.

 Your interface is closer to what I had in mind than Alex's, at least using
 more of the generic terminology.  The codec domain can be expressed using
 the pipeline interface, or if we want a codec specific interface, Alex's
 could be a convenience layer on top of the pipeline.

 What I had imagined is an approach where each element in the pipeline
 supports registering for a variety of event notifications.  Some of them
 may be generic, some of them may be domain specific.  Most codec uses would
 use generic events.

 The key is being able to go to an object and register for semantic events.
 So if you assume that a transformer is both a producer and a consumer, you
 could register a transformer with a datasource (producer), and register
 downstream consumers that want the decoded data with the transformer.  Yes,
 we would want to allow both fan-in and fan-out where appropriate.

I think my design already supports most of the above ideas, they just have
to be implemented as required for the particular usage.  For example, a
Base64Encoder already supports registering for events of type byte[].  A
MIMEMultipartDecoder could generate events of type MIMEPart.  The event
types are not specified by the existing implementation, they can be added as
necessary for the particular feature.  Fan out is achieved by creating a
multicast stage accepting single events from a producer and passing them to
multiple consumers of the same type (although I haven't implemented a
multicast stage because I haven't needed it yet :-).  Fan in can already be
handled by setting a single consumer as the destination for multiple
producers.
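
As an illustration of the fan-out idea, a multicast stage along those lines
might look roughly like the sketch below (the one-method ByteConsumer
interface here is an assumption for the sketch, not the library's actual
type):

  import java.util.ArrayList;
  import java.util.List;

  // Assumed minimal consumer interface for this sketch.
  interface ByteConsumer {
      void consume(byte[] data, int offset, int length);
  }

  // Hypothetical fan-out stage: every block consumed is forwarded to all
  // registered downstream consumers.
  class MulticastByteConsumer implements ByteConsumer {
      private final List consumers = new ArrayList(); // pre-generics style

      public void addConsumer(ByteConsumer consumer) {
          consumers.add(consumer);
      }

      public void consume(byte[] data, int offset, int length) {
          for (int i = 0; i < consumers.size(); i++) {
              ((ByteConsumer) consumers.get(i)).consume(data, offset, length);
          }
      }
  }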

Perhaps this isn't quite what you're envisaging.  You may like to see a more
generic approach that allows events and pipelines to be described in more
abstract terms.  Unfortunately I can't see a way of achieving this without
making the API complex and imposing overhead.  If you're looking for a more
powerful approach, should it be implemented outside of codec where runtime
issues aren't quite as critical?

I guess it depends on what problems you're trying to solve.  If you wish to
process large streams of data in an efficient manner, my implementation is a
good fit; if you're looking to process structured data (eg. MIME), it can be
extended to fit as required; if you're looking to use it as the basis of
communication and processing within a server, then it isn't up to the task.
However, isn't the last point outside the scope of codec and more in the
realm of other designs/libraries such as SEDA?

 As for the details of message transport ... it seems to me that we already
 have multiple options, so I'm not sure that we want to roll our own versus
 adopting and/or adapting an existing one.

 We have JMS, and the new concurrency package coming in JDK 1.5.  One thing
 that got botched in JDK 1.5 is that they removed the Putable and Takable
 interfaces that Doug Lea had used in his library, instead merging their
 functionality directly onto the queue interface, falsely believing that a
 message queue is a Collection.  A number of us argued for those interfaces,
 and Doug proposed a change, but it was vetoed by Sun.

 I see a few options, such as:

   - we pick up the necessary interfaces from Doug's concurrent library,
     and deal with java.util.concurrent down the road.

   - we use JMS interfaces in a very simplified form.

 As potentially whacked as the idea might be, considering the complexity of
 JMS, I believe that we could selectively use JMS interfaces without undue
 complexity or hurting performance.  Basically, we'd ignore the things that
 don't make any sense in our context.  Take a look at MessageProducer,
 MessageConsumer and MessageListener.  Intelligently, they are just
 interfaces.  We don't need multi-threading, network transports, etc., in
 general, although by using those interfaces, they would be available where
 applications warranted them.

 ref:
 http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/jms/package-summary.html

 Alternatively, I had this odd thought that we could use the Callable
 interface present in both Doug's code and java.util.concurrent.

   JSR-166:
 

RE: [codec] StatefulDecoders

2004-03-10 Thread Brett Henderson
 Take a look at the Reclaiming Type Safety section in this article on the
 event notification pattern here:

 http://members.ispwest.com/jeffhartkopf/notifier/
 
Cool, that's a neat way of achieving type safety.  Avoiding downcasts (eg.
Object to byte[]) is a good thing.  It still relies on a runtime check but
is only performed in one piece of code instead of every implementation of an
event receiver.

Advantages:
- Type safety enforced in a single class instead of using downcasts within
  each event receiver.
- A single event method defined in interfaces instead of a method per event
  type.
- No need to define separate interfaces per event type.

Disadvantages:
- No compile-time type checking; incorrect types may not be picked up during
  development.
- Runtime overhead to perform reflection on the event receiver class and
  locate the type-specific event receiver method.
- Runtime overhead converting from the generic event type to the specific
  event type.

I don't know if compile time or runtime checks should be used but if runtime
checks are chosen then this pattern is a good way of enforcing them.
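
Roughly, the dispatch trick described in the article works like the sketch
below (a simplified illustration, not the article's actual Notifier class;
the handle() method name is an assumption):

  import java.lang.reflect.Method;

  // Central dispatcher: performs the single runtime type check by looking up
  // a handler method whose parameter type matches the concrete event class,
  // so individual receivers never downcast from Object themselves.
  class ReflectiveDispatcher {
      void dispatch(Object receiver, Object event) throws Exception {
          // Note: getMethod requires an exact parameter-type match here.
          Method handler = receiver.getClass().getMethod(
              "handle", new Class[] { event.getClass() });
          handler.invoke(receiver, new Object[] { event });
      }
  }

  // Example receiver with a type-specific handler method for byte[] events.
  class ByteArrayReceiver {
      public void handle(byte[] data) {
          System.out.println("received " + data.length + " bytes");
      }
  }

So new ReflectiveDispatcher().dispatch(new ByteArrayReceiver(), new byte[16])
ends up in the byte[]-specific method without any cast in the receiver.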


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [codec] StatefulDecoders

2004-03-08 Thread Brett Henderson
 How about we put our minds together and finalize some of this stuff so I
 can start writing some codecs that can be added back to this project?

Yeah definitely, sounds like we're trying to solve the same problem here.

I haven't responded to your previous emails because I haven't contributed
before and was leaving opinions to those who've actually proven themselves.

   In general, I have long preferred the pipeline/event model to the
   approach that Alex had, where it would give data to the codec, and then
   poll it for

 Agreed! My approach was not the best, but have you had a chance at looking
 at the new interfaces that I sent out with the callbacks.  Shall I resend
 those?
 

I still have them here.  I'll comment on them further down.

 Let me just list the requirements one more time:

 1). Interfaces should allow for implementations that perform piecemeal
 decodes
    - enables implementations to have constant sized processing footprints
    - enables implementations to have efficient non-blocking and streaming
      operation

Agreed.

 2). Easily understood and simple to use

Agreed, although it needs to be weighed against any conflicting requirements.

 3). Interfaces should in no way, shape or form restrict or limit the
 performance of implementations, whatever they may be.

Agreed, although without knowing all of these implementations in advance we
can never be sure ;-)

 
  You're right, my design has no concept of structured content.  It was
  developed to solve a particular problem (ie. efficient streamable data
  manipulation).  If API support for structured content is required then my
  implementation doesn't (yet) support it.

 You can build on this separately, no?  There is no need to have the codec
 interfaces take this into account other than to allow this decoration in
 the future rather than inhibit it.
 

Yes, I can build on it separately, however a new set of producers and
consumers is needed for each type of structured data.  I don't see this as
a problem because trying to make this too generic may lead to loss of
performance and a complicated API.

 
  I'll use engine for the want of a better word to describe an element in a
  pipeline performing some operation on the data passing through it.

 SEDA calls this a stage btw.

Much better :-)

 
 With codecs the encoding is variable right?  It could be anything.
 Something has to generate events/callbacks that delimit logical units of
 the encoding, whatever that may be.  For some encodings that you mentioned
 (base64) there may not be a data structure, but the unit of encoding must
 be at least two characters for base64 I think.  Please correct me if I'm
 wrong.

Base64 works on a 3-byte input block producing 4 bytes of output when
encoding, and a 4-byte input block producing 3 bytes of output when decoding.
The final encoded block is padded with '=' characters if the input is not a
multiple of 3 bytes.
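
For example, using the existing [codec] Base64 class purely to show the
block sizes (expected output shown in the comments):

  import org.apache.commons.codec.binary.Base64;

  public class Base64BlockExample {
      public static void main(String[] args) {
          // A full 3-byte block encodes to 4 characters with no padding.
          System.out.println(new String(Base64.encodeBase64(new byte[] { 'a', 'b', 'c' }))); // YWJj
          // A 1-byte final block still encodes to 4 characters, padded with "==".
          System.out.println(new String(Base64.encodeBase64(new byte[] { 'a' })));           // YQ==
      }
  }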

 
 So there is some minimum unit size that can range from one byte to
 anything and this is determined by the codec's encoding and reflected in
 some form of callback.  SAX uses callbacks to allow builders that are
 content aware to do their thing right?  Now I'm not suggesting that a
 base64 codec's encoder/decoder pairs make callbacks on every 2 or single
 byte (depending on your direction).  In the case of such a non-structured
 decoder the buffer size would be the determining factor or the end of the
 stream.

Agreed.

 
 
 
 So I think we need to use callbacks to let decoders tell us when they hit
 some notable event that needs attention, whatever that may be.

I agree in principle here although I'm not sure that I agree with the
structure of callbacks.  I'll explain more later.

   operations.  These are pipelines; receiving content on one end,
   performing operations, and generating events down a chain.  More than one
   event could be generated at any point, and the chain can have multiple
   paths.

 This, the pipelining notion, IMHO is overly complicated for building out
 codec interfaces.  The pipeline can be built from the smaller, simpler parts
 we are discussing now.  We must try harder to constrain the scope of a
 codec's definition.

 Noel, as you know I have built servers based on pipelined components before
 and am trying it all over again.  We must spare those wanting to implement
 simple codecs like base64 from these concepts, let alone the language around
 them.  The intended use of codecs by some folks may not be so grandiose.
 They may simply need it to just convert a byte buffer and be done with it.
 There is no reason why we should cloud this picture for the simple user.
I agree that we definitely don't want to introduce complexity and
computational overhead for simple cases.  However I think many of the above
concepts can be supported without creating complex APIs.

I believe these are the interfaces you have previously posted.  Let me know
if I've got the wrong ones :-)


RE: [codec] StatefulDecoders

2004-03-01 Thread Brett Henderson
Noel,

Sorry about the delay, I've been away for a few days.

 In general, I have long preferred the pipeline/event model to the approach
 that Alex had, where it would give data to the codec, and then poll it for
 state.  However, I don't see something in your implementation that I think
 we want.  We want to be able to have structured content handlers and
 customized events depending upon the content handler and the registered
 event handlers.  This could be particularly important in a streaming
 approach to MIME content.  And I also desperately want a regex in this same
 model.

You're right, my design has no concept of structured content. It was
developed to solve a particular problem (ie. efficient streamable data
manipulation).  If API support for structured content is required then my
implementation doesn't (yet) support it.

I'll use engine, for want of a better word, to describe an element in a
pipeline performing some operation on the data passing through it.

An API aware of structured content shouldn't complicate the creation of
simple engines such as base64 which pay no attention to data structure.
Ideally, a structured API would extend an unstructured API and only those
engines requiring structured features would need to use it.

I'm having trouble visualising a design that supports structured content
without being specific to a particular type of structured content. Do you
have some examples of what operations you would like a structured data API
to support?  Do you see interactions between pipeline elements being
strongly typed?

My design uses the concepts of producers and consumers, and I'd like to see
those ideas preserved.  Engines are both consumers and producers, but the
first and last elements in a chain (or pipeline) are only producers and
consumers respectively, allowing I/O to be decoupled from the pipeline
operations.  For example, my design uses an OutputStreamConsumer to write
pipeline result data to an OutputStream, an OutputStreamProducer to receive
data written to an OutputStream and pass it into a pipeline, and an
InputStreamProducer to pump data from an input stream and pass it into a
pipeline.

A structured content API can extend the producer/consumer ideas by passing
data types understood by the structured content in question.

For example, a multipart MIME decoding engine (a consumer of byte data, hence
a ByteConsumer) could produce MIME parts (a MIMEPartProducer).  A
MIMEPartConsumer would receive MIMEPart objects (which are in turn
ByteByteEngines extended with a MIME type property) and connect them to
a consumer capable of handling the byte data contained in the MIME part.

The above example would involve the definition of several new interfaces
(MIMEPart extending ByteByteEngine adding mime type property, MIMEProducer
extending Producer, MIMEConsumer extending
Consumer) and new classes to implement the new interfaces with the behaviour
desired.

Any other structured content types could be handled in similar ways with new
event types being defined and relevant producer and consumer interfaces
created to support them.
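
To make the shape of those additions concrete, the new interfaces might look
something like the sketch below.  None of this exists in the library; the
names simply follow the description above.

  // Hypothetical structured-content interfaces; signatures are assumptions.

  // A decoded part: byte-oriented like any other engine, plus a MIME type property.
  interface MIMEPart /* would extend ByteByteEngine in the real design */ {
      String getContentType();
  }

  // Consumes whole MIME parts rather than raw bytes.
  interface MIMEPartConsumer {
      void consumePart(MIMEPart part);
  }

  // Produced by a multipart decoding engine; hands each part to the registered consumer.
  interface MIMEPartProducer {
      void setPartConsumer(MIMEPartConsumer consumer);
  }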

Perhaps a more generic method can be devised but weak typing and degraded
performance are hard to avoid.


 Drop the word conversion.

Yep, agreed.

 Conversion is simply one of many possible operations.  These are
 pipelines; receiving content on one end, performing operations, and
 generating events down a chain.  More than one event could be generated at
 any point, and the chain can have multiple paths.

If the above can be achieved without introducing a large overhead (both
runtime and coding overhead) for simple operations then it sounds good.  Is
it worth considering the possibility of a pipeline receiving data from more
than one source?  This may be necessary when composing multipart MIME
messages.  Then again, a multipart MIME consumer class may be a better
solution using similar ideas to those described earlier (ie. A
MIMEPartConsumer which combines all parts into a single byte stream).

I'm not sure how much sense I've made above, hopefully some ;-)

Brett





RE: [codec] StatefulDecoders

2004-02-24 Thread Brett Henderson
I probably sound like a broken record but here goes :-)

If I'm barking up the wrong tree, let me know and I'll stop making noise on
this list ...

Many of the problems being discussed here have been solved in the library
I've posted previously. An up-to-date version can be found here:
http://www32.brinkster.com/bretthenderson/bhcodec-0.7.zip
It uses generic interfaces for communication between all components that
allow the use of streams, byte arrays, NIO, etc. to be plugged together as
necessary.  NIO isn't currently supported but I expect it would be trivial
to add.

The library can be visualised as a collection of data consumers and
producers (a codec engine implements both).  No distinction is made between
encoding and decoding (from an API perspective they are the same thing in my
view).

One problem I see with the current codec project is that every new use case
that is envisaged tends to require extensions to the current interfaces.
The above library is designed to be more generic and allow a more pluggable
approach where new functionality doesn't impact every codec implementation.

It uses a push model internally but pull-model utility classes wrapped
around underlying push classes can be used to implement pull functionality
where necessary.

It does not require JDK1.4 although NIO could be plugged in if necessary.

I understand that people don't want to spend time looking at every pet
project people have come up with but I think this could be useful in
commons.  There is a lot to look at and I guess that is discouraging people
from taking the time to look at it.

Should I propose this library as a separate project?  (How do I do this?)
Perhaps as more generic codec library that could potentially be used by
commons-codec once it has matured.  It may be too large a change to fit into
the existing codec project as it currently stands.

I've offered this several times now and while there doesn't seem to be any
major opposition to the idea, there hasn't been strong support either.  I'm
not sure how to proceed.  Is this something that can be placed in a sandbox
for people to play with?

Cheers,
Brett

 -Original Message-
 From: Noel J. Bergman [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, 24 February 2004 2:25 PM
 To: Jakarta Commons Developers List
 Subject: RE: [codec] StatefulDecoders
 
 
  This brings up an interesting issue: How do we potentially package and
  deliver some code that depends on Java 1.4?  In a second [codec] jar?

 There are several issues, but let me address what I consider to be the key
 one: we have to design the core code as push-model.  If we were to design
 the code as pull-model, we would lose the thread of execution inside the
 callee.  We don't want the callee blocking on I/O and returning when
 finished.  But with a non-blocking callee, we can then use either a NIO or
 IO wrapper as necessary.

 Obviously the interface between the I/O handling wrapper and the data
 handling core will have to be Java 2 & 1.4 compatible.

   --- Noel
 
 
 
 
 
 
 





RE: [codec] More thoughts on CharSets and Encoders (references: RE: [codec] Streamable Codec Framework)

2004-01-15 Thread Brett Henderson
 Does CharSet/Util's in [lang] approach a similar 
 functionality to nio.charset? After reviewing the codebase, 
 my viewpoint is no, as it is more for building charsets, 
 than for using them (authors rebuttals always welcome).

I'd also be interested to see if this functionality exists
somewhere.

 I think [httpclients] static ASCII methods (once in 
 HttpConstants) now also in [codec-multipart] are very similar 
 in functionality to the idea of CharsetEncoders/Decoders of 
 nio.charset.
 
  So we begin to have functionality for charsets in [lang] and for
  encoders in [codec]. How do we bring this all together? I'd like to see
  similar CharsetEncoding/Decoding capabilities as nio (with the eventual
  goal of actually having Jakarta Commons converge to using nio Charsets in
  the future).
 
 As a possible bridge for transition I think a CharsetEncoder 
 API in [codec] that duplicates that of nio.charset would form 
 an excellent path for convergence. The eventual goal once 
 j2sdk1.3 was no longer in service would be to simply refactor 
 Apache Projects dependent on this API to use NIO instead.

Does the CharSetEncode class in my library approach the functionality
you require?
http://www32.brinkster.com/bretthenderson/BHCodec-0.6.zip
Internally it uses an OutputStreamWriter which leverages JDK
functionality albeit in a somewhat inelegant way.  I would expect
performance to be fairly reasonable however.

I intend to write a corresponding CharSetDecode class but haven't
gotten around to this yet.  If you have any interest I can up the
priority.  It will use an InputStreamReader internally unless
better alternatives are found.

If at some point in the future JDK 1.4 becomes an accepted base
I will be reworking CharSetEncode to use java.nio features
because they provide a cleaner interface than wrapping streams.

   If JDK1.4 is considered a sufficient base, I could

   I think that considering 1.3.1 as the base requirement is safe.
   Unfortunately, as discussed on this list under various headings, making
   1.4 a requirement is too aggressive.

   Gary

  Yes, we're still supporting 1.3 in many cases, BUT, wouldn't we want
  convergence eventually to the APIs provided by the j2sdk?  AND, by that
  point in the future, is j2sdk 1.3 even going to be in play?

I will always be leaving a CharSetEncode feature in my library
because it allows charset conversion to be performed within
a processing chain but I would see the internal implementation
moving to java.nio eventually.

Brett





RE: [codec] Streamable Codec Framework

2004-01-12 Thread Brett Henderson
Thanks for the reply.

Yep, it is a completely different framework.  I wrote
the framework before looking at the current commons codec
component so there are no relationships between the two.  Hopefully
there are ways of incorporating ideas between the two.

Are you hoping to incorporate streamed processing into the existing
design or create new classes to achieve this?  I have no preferences
either way but it could be tricky to re-use the existing interfaces.
I'll have to look at this further though.

I'll look at Ant as soon as I can to see how it approaches the problem.

Hmm, your simple example already uncovers a gap in my design :-)
Should be easily solved though. Phew.

Currently I can't process all data off an input stream writing to a
destination without some manual coding.  However I can resolve this by
creating a Producer that reads InputStreams.  I will call this
InputStreamProducer.

My InputStreamProducer will implement ByteProducer and will have
a method (eg. pump()) which pumps data from the provided input
stream to the ByteConsumer attached to it.

Using my new InputStreamProducer I can perform MD5 Hex encoding
of an input stream creating a result String as follows:

  // Create processing objects.
  MD5 md5 = new MD5();
  AsciiHexEncode asciiHexEncode = new AsciiHexEncode();
  InputStreamProducer source = new InputStreamProducer(inputStream);
  BufferByteConsumer resultBuf = new BufferByteConsumer();
  String result;

  // Set up processing chain.
  source.setConsumer(md5);
  md5.setConsumer(asciiHexEncode);
  asciiHexEncode.setConsumer(resultBuf);

  // Process all available data.
  source.pump();

  // Obtain result hash.
  result = new String(resultBuf.getData());

If I eliminate all calls to .setConsumer() by adding the necessary
constructors to accept consumers, the above code can be shortened to.

  BufferByteConsumer resultBuf = new BufferByteConsumer();
  String result;

  // Create md5 hash of data from inputStream.
  new InputStreamProducer(inputStream, new MD5(new
AsciiHexEncode(resultBuf))).pump();
  result = new String(resultBuf.getData());

What do you think?  Static utility methods could simplify the above even
further if necessary.

It's definitely more complex than the existing approach but it is
very flexible in that it allows arbitrary processing chains to be
defined and allows for simple integration with IO Streams.  Each class
performs a very small well defined purpose and can be coupled to build
complex processing chains.  Processing should be efficient although
more setup time is required.

Supporting Reader/Writer and any other IO classes should be as simple
as defining the relevant Consumer/Producer implementations to interact
with them.  Codec algorithms won't require modification.

Cheers,
Brett

 -Original Message-
 From: Gary Gregory [mailto:[EMAIL PROTECTED] 
 Sent: Friday, 9 January 2004 4:15 PM
 To: 'Jakarta Commons Developers List'
 Subject: RE: [codec] Streamable Codec Framework
 
 
 Hello,
 
 Streamable codecs make a lot of sense for some codecs (but 
 perhaps not for the language codecs). Thanks for bringing the 
 topic up. I took a very quick look at the code you refer to 
 and it seems to be a separate framework from what we have in 
 [codec] today (I could be wrong of course), especially the 
 whole Producer/Consumer business.
 
 A simple example I can think of that could drive an 
 implementation could be:
 
 InputStream inputStream = ... new File(...);
 DigestUtil.md5Hex(inputStream);
 
 It would be interesting to see how Ant implements MD5 and SHA.
 
 This probably means that Encoder.encode(Object) should also 
 handle I/O/Streams and Reader/Writer...
 
 Gary





RE: [codec] Streamable Codec Framework

2004-01-12 Thread Brett Henderson
 I suspect we are going to need something along the lines of a Streamable
 Encoder/Decoder for the Multipart stuff.  If we look at HttpClient's
 MultipartPostMethod, there is a framework for basically joining multiple
 sources (files, strings, byte arrays) together into an OutputStream which
 is multipart encoded.  I want to attempt to maintain this strategy when
 isolating the code out of HttpClient and into the multipart sandbox
 project.  I suspect that your Streamable Consumer/Producer stuff could also
 be advantageous for multipart encoding/decoding.  At least I want to make
 sure we're not inventing the same wheel.

I'll try to look at the HttpClient code to get a feel for how it
hangs together.  From what I can gather my code should plug in fairly
cleanly.  My code doesn't specify any type of IO interface as
any interface can be adapted in by implementing relevant consumers
and producers.  I've tried to design the framework such that the
actual codec algorithms have no knowledge of the source or destination
of the data they process.  This allows them to be far more generic
and greatly increases their usefulness.

 Specifically, I see we're going to need interfaces other than the existing
 codec ones because they pass around byte[] and Object when
 encoding/decoding.  We need to maintain that the content will be streamed
 from its native data structure when it's consumed by a consumer (HttpClient
 MultipartPost for instance) or, when it is used to decode, that the Objects
 produced are built efficiently off an InputStream (ie Files are immediately
 written to the FileSystem, Strings or byte[]s are maintained in memory).

My framework doesn't specify any particular type of data although
byte oriented processing is the only fleshed out implementation at
the moment.  All it cares about is that a producer is available
to generate data from an external source and a matching consumer
is available to pass it to a destination.
Every producer must have a matching consumer.  A consumer can be
called directly by clients.
Typically an engine (implementing both consumer and producer) will
sit in the middle performing some kind of translation/encoding/decoding
on the data.  It consumes input data and produces output data.
Using this structure, processing chains can be defined so that
multiple transforms can be performed on the same data all in a
stream oriented fashion.

To cut a long story short, chains can be defined to access data
from streams/buffers/etc, perform relevant translations (re-using
small in-memory buffers to eliminate garbage collection) and pass
data to output streams/buffers/etc.  Due to the stream support,
data of arbitrary size can be processed.

 
 Either way, I'm currently tidying up a maven project directory to be
 committed into the sandbox for the new multipart codec stuff.  Once it's in
 place we could add your code to it as well.

Let me know if you want to import any of my code and I'll do
any necessary package reorganisation.





RE: [codec] multipart encoders/decoders

2004-01-12 Thread Brett Henderson
 (3) Should the Producer/Consumer framework submitted be 
 retrofitted into the current [codec] Encoder/Decoder 
 framework? Personally, I like the specificity of Encoder 
 and Decoder for interfaces. This means the current i/f 
 would be expanded.

I'm not terribly attached to Producer and Consumer but couldn't
think of a better alternative.  Perhaps Source and Sink but
they may be no better.

I'm not keen on Encoder and Decoder because I don't believe
there's a need to make the distinction between the two, plus I
might have to rewrite some code ;-)
Both encoders and decoders are processing data, it is the algorithm
that decides if it is encoding or decoding.  In some cases the
words encode and decode may not make sense.  If you're modifying
line endings on a file are you encoding or decoding?

Producer and Consumer relate to different things.  Base64Encode
for example implements ByteProducer and ByteConsumer by way of
the ByteEngine interface because it consumes byte data and
produces byte data.  OutputStreamConsumer only implements
ByteConsumer because it consumes byte data but sends the data
to a location outside the scope of the library (ie. OutputStream).

 
 (4) The other way around: should the current [codec] be recast in the
 proposed Producer/Consumer f/w?  I am not wild about the genericity of
 Producer and Consumer as names.

 (5) I am assuming that two f/w's in [codec] are undesirable.  It would be
 good to agree or disagree on this previous statement as a starter! ;-)

There are obviously advantages to having a single unified
framework and if possible it would be the ideal result.
Unfortunately I have run into performance disadvantages so far.
I haven't tried it for a while but in the past my base 64
conversion has not been as fast as the existing codec
implementation for small conversions.
For common algorithms such as base 64 it may make sense to have
two implementations optimised for different purposes.
In addition, I'm not familiar with language codecs but you
mentioned it makes no sense to use these in streams.





RE: [codec] multipart encoders/decoders

2004-01-12 Thread Brett Henderson
 It is accomplished under jakarta-commons-sandbox/codec-multipart.

  (2) Can we agree on /what/ streamable codecs are (sorry but I like to
  point out the obvious when starting something like this).  Recognize the
  current impls alternatives.

 Yes sorry, I think there are two ideas running around here:

 (a) Actual inline Stream Encoders/Decoders (SSL etc.) that require no
 knowledge of the length of the content.  Probably extend
 FilterOutputStreams etc.

 (b) Encoders/Decoders that actually work by passing their content through
 streaming to manage larger amounts of data efficiently.  Data for which the
 length is probably already known (Files).  An interface which supports
 handing objects and Streams manages this:

Can you give some examples of algorithms where the length needs to be
known in advance?

My code may break horribly with such an algorithm :-(





RE: [codec] multipart encoders/decoders

2004-01-12 Thread Brett Henderson
 [snip]
  1. Commons exists as an effort to encourage code reuse.  The Streamable
  framework presented was interesting, but I'd like us to find an existing
  streamable Base64 implementation inside of the ASF codebase.

 Not for Base64 but Ant has:

 o MD5 and SHA checksum computation:
   http://ant.apache.org/manual/CoreTasks/checksum.html

Everyone will be getting tired of my emails soon ...

I've had a look at Ant and it uses java.security.MessageDigest directly.
I think MessageDigest goes back to JDK 1.2, so I assume it's okay to use
within codec.  It supports MD5 and SHA-1.

We would have to create wrapper classes if we want them to support
the relevant codec interfaces but this should be straightforward.
Of course, if we want to implement other algorithms such as SHA-256 or
SHA-512, we either have to write our own or rely on the user to have the
relevant security providers installed within their JVM.
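
A wrapper along the lines discussed might look roughly like the sketch below.
This is only an illustration; the ByteConsumer interface and its method names
are assumptions, not existing codec interfaces.

  import java.security.MessageDigest;
  import java.security.NoSuchAlgorithmException;

  // Assumed minimal consumer interface for this sketch.
  interface ByteConsumer {
      void consume(byte[] data, int offset, int length);
      void finish();
  }

  // Wraps java.security.MessageDigest as a streaming stage: bytes are hashed
  // incrementally and the resulting digest is pushed downstream on finish().
  class MessageDigestEngine implements ByteConsumer {
      private final MessageDigest digest;
      private final ByteConsumer downstream;

      MessageDigestEngine(String algorithm, ByteConsumer downstream)
              throws NoSuchAlgorithmException {
          this.digest = MessageDigest.getInstance(algorithm); // e.g. "MD5" or "SHA-1"
          this.downstream = downstream;
      }

      public void consume(byte[] data, int offset, int length) {
          digest.update(data, offset, length);
      }

      public void finish() {
          byte[] hash = digest.digest();
          downstream.consume(hash, 0, hash.length);
          downstream.finish();
      }
  }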

It would be interesting to compare performance between the Sun provided
MD5 and codec.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [codec] multipart encoders/decoders

2004-01-12 Thread Brett Henderson
 Here are a few good rules of thumb:

 1. Commons exists as an effort to encourage code reuse.  The Streamable
 framework presented was interesting, but I'd like us to find an existing
 streamable Base64 implementation inside of the ASF codebase.

I have no problems with this but so far I haven't seen anything like this
that doesn't sub-class InputStream and OutputStream.
Sub-classing InputStream and OutputStream is problematic because it forces
you to code algorithms around IO stream semantics (InputStream coding is not
simple, especially the available() method) and it forces you to make the
encoder sub-class OutputStream and the decoder sub-class InputStream unless
you write an implementation for each stream type.
Providing a single InputStream implementation that can use an underlying
codec engine simplifies development and testing of new algorithms
considerably and removes the distinction between input and output streams.

 3. No need to expressly focus on a framework (at all).  Codec is FIRSTLY a
 functional beast, even if the solution is inelegant.  If there is an
 existing streamable Base64 in ASF, I'd recommend copying it outright and
 placing it in the codec package.  Over time, it can move towards a unified
 streamable framework.

I'm often guilty of over designing things, hey it's fun :-)
Would it help if I didn't call my classes a framework?
It's no more than a few common interfaces and some implementation
codecs.  There are no factories, or service providers, or other
abstractions complicating things.
It really isn't much different to the existing codec interfaces
except that they are written to support streaming and separate
input and output interfaces.
I really think there's a need for all streamable codecs to follow
common interfaces.  Perhaps InputStream and OutputStream are
sufficient but I think there's a better way.





RE: [codec] multipart encoders/decoders

2004-01-12 Thread Brett Henderson
 
  There are obviously advantages to having a single unified framework and
  if possible it would be the ideal result.  Unfortunately I have run into
  performance disadvantages so far.  I haven't tried it for a while but in
  the past my base 64 conversion has not been as fast as the existing codec
  implementation for small conversions.
  For common algorithms such as base 64 it may make sense to have two
  implementations optimised for different purposes.

 That does not seem justified at first.  Optimize last if at all... ;-)

Hehe, you're right.

I guess it just feels wrong pushing for stream support in codec when
its introduction will incur overhead for non-streamed cases.  Of
course in 99.9% of those cases the performance difference will be
immeasurable in the overall application :-)

 
  In addition, I'm not familiar with language codecs but you mentioned it
  makes no sense to use these in streams.

 One of the things to keep in mind is that, for simple cases, the f/w should
 be invisible to the client code.  For example:

 DigestUtil.md5Hex(new FileInputStream("boo.txt"));

 Gary

Hmm, that is definitely worth remembering.  The more generic I made
the design, the more coding was required in order to use it :-(
Perhaps a symptom of over-engineering, I hope not.

There are a few ways I can think of dealing with this.
1. Do nothing.  Force people to learn a new and more complicated API.
2. Create a new API that supports streaming leaving the existing
API in place for the existing functionality and common use cases
not requiring stream support.
3. Add stream support to the existing API.
4. Create an API supporting stream processing and re-implement the
existing API using it.

Of these I think:
1. A non-starter but had to list it.  Backwards compatibility and usability
being two reasons.
2. This is a valid approach but leaves two distinct code-bases to support.
I hope there are other options available.
3. In most projects this tends to be the way things are done.  In this
case I'm not sure that it's practical; it may get fairly messy and create
an unmaintainable codebase.  I really need to spend more time looking at the
existing APIs in detail though.
4. I think a variation on this idea could work well in practice.
Codec could be conceptually designed in various layers.  It could
have a low level API that is modular and supports stream based
processing.  My library or some equivalent would fit this purpose.
A second layer could then provide simplified access to the library
for the most common use cases implementing the existing API and adding
new functionality as desired.


To illustrate, that example could be implemented as follows:
  //DigestUtil.md5Hex(new FileInputStream("boo.txt"));
  public class DigestUtil {
...
public static String md5Hex(InputStream inputStream) throws
CodecException {
  BufferByteConsumer result = new BufferByteConsumer();
  ChainByteEngine chain = new ChainByteEngine(result);
  
  chain.append(new MD5());
  chain.append(new AsciiHexEncode());
  
  new InputStreamProducer(chain, inputStream).pump();
  
  return new String(result.read());
}
...
  }

There's some overhead in initialisation but most classes are fairly
lightweight.
All of the above classes have been implemented if you wish to have a
look.

I updated some of the classes last night, a copy can be found at:
http://www32.brinkster.com/bretthenderson/BHCodec-0.6.zip

Brett





RE: [codec] Streamable Codec Framework

2004-01-11 Thread Brett Henderson
Thanks for the reply.

Yep, it is a completely different framework.  I wrote
the framework before looking at the current commons codec
component so there are no relationships between the two.
Hopefully there are ways of incorporating ideas between the
two.

Are you hoping to incorporate streamed processing into the
existing design or create new classes to achieve this?  I
have no preferences either way but it could be tricky to
re-use the existing interfaces. I'll have to look at this
further though.

I'll look at Ant as soon as I can to see how it approaches
the problem.

Hmm, your simple example already uncovers a gap in my
design :-) Should be easily solved though. Phew.

Currently I can't process all data off an input stream writing
to a destination without some manual coding.  However I can
resolve this by creating a Producer that reads InputStreams.
I will call this InputStreamProducer.

My InputStreamProducer will implement ByteProducer and will
have a method (eg. pump()) which pumps data from the provided
input stream to the ByteConsumer attached to it.

Using my new InputStreamProducer I can perform MD5 Hex encoding
of an input stream creating a result String as follows:

  // Create processing objects.
  MD5 md5 = new MD5();
  AsciiHexEncode asciiHexEncode = new AsciiHexEncode();
  InputStreamProducer source = new InputStreamProducer(inputStream);
  BufferByteConsumer resultBuf = new BufferByteConsumer();
  String result;

  // Set up processing chain.
  source.setConsumer(md5);
  md5.setConsumer(asciiHexEncode);
  asciiHexEncode.setConsumer(resultBuf);

  // Process all available data.
  source.pump();

  // Obtain result hash.
  result = new String(resultBuf.getData());

If I eliminate all calls to .setConsumer() by adding the necessary
constructors to accept consumers, the above code can be shortened to.

  BufferByteConsumer resultBuf = new BufferByteConsumer();
  String result;

  // Create md5 hash of data from inputStream.
  new InputStreamProducer(inputStream, new MD5(new
AsciiHexEncode(resultBuf))).pump();
  result = new String(resultBuf.getData());

What do you think?  Static utility methods could simplify the above
even further if necessary.

Supporting Reader/Writer and any other IO classes should be as simple
as defining the relevant Consumer/Producer implementations to interact
with them.  Codec algorithms won't require modification.

Cheers,
Brett

 -Original Message-
 From: Gary Gregory [mailto:[EMAIL PROTECTED] 
 Sent: Friday, 9 January 2004 4:15 PM
 To: 'Jakarta Commons Developers List'
 Subject: RE: [codec] Streamable Codec Framework
 
 
 Hello,
 
 Streamable codecs make a lot of sense for some codecs (but 
 perhaps not for the language codecs). Thanks for bringing the 
 topic up. I took a very quick look at the code you refer to 
 and it seems to be a separate framework from what we have in 
 [codec] today (I could be wrong of course), especially the 
 whole Producer/Consumer business.
 
 A simple example I can think of that could drive an 
 implementation could be:
 
 InputStream inputStream = ... new File(...);
 DigestUtil.md5Hex(inputStream);
 
 It would be interesting to see how Ant implements MD5 and SHA.
 
 This probably means that Encoder.encode(Object) should also 
 handle I/O/Streams and Reader/Writer...
 
 Gary





RE: [codec] Streamable Codec Framework

2004-01-06 Thread Brett Henderson
There seemed to be definite interest in streamable
codecs but the list has gone fairly quiet.

I am interested in participating in work of this
kind but I'm not sure how to proceed.

I don't think this deserves to be a standalone
project as it seems to fit fairly well into the
scope of the current codec package and I don't
want to step on any toes with respect to the
existing codec project.

I believe Gary Gregory and Tim O'Brien are the two
primary codec committers.  Gary and Tim, your thoughts
would be most appreciated.

I'm providing the code mentioned in the below
message as an example because I believe it is more
effective to discuss working code than talk about
abstract ideas.

 -Original Message-
 From: Brett Henderson [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, 13 November 2003 10:33 AM
 To: 'Jakarta Commons Developers List'
 Subject: RE: [codec] Streamable Codec Framework
 
 
 I made some changes to the code I supplied previously, it can 
 be found at the following URL.
 
 http://www32.brinkster.com/bretthenderson/BHCodec-0.5.zip
 
 The main differences relate to the codec interfaces and 
 support for data types other than byte, the encoding 
 algorithms are largely unchanged.
 
 A quick summary of the framework is as follows:
 
 Framework is based around consumers and producers, consumers 
 accept incoming data and producers produce outgoing data. A 
 consumer implements the Consumer interface and a producer 
 implements the Producer interface.
 
 Specialisations of these interfaces are used for each type
 of data to be converted.  For example there are currently 
 ByteConsumer, ByteProducer, CharConsumer and CharProducer interfaces.
 
 The engine package contains classes (and interfaces) that
 are both consumers and producers (ie. accept incoming data
 and produce result data).  For example there is a ByteEngine 
  interface that extends the ByteConsumer and ByteProducer
 interfaces and is in turn implemented by the Base64Encode 
 concrete class.
 
 Engines may consume one kind of data and produce another,
 the CharByteEngine interface defines an engine that
 consumes characters and produces bytes.  This is implemented
 by the CharSetEncode class (untested).
 
 The consumer package contains classes that consume data
 and perform an action on the data that doesn't allow it to
 be accessed via producer functionality.  For example, the 
 BufferByteConsumer class acts as a receiving buffer for 
 encoding results, the OutputStreamConsumer writes all data to 
 an OutputStream.
 
 The producer package contains classes that produce data
 for the framework but don't accept data via consumer 
 functionality.  For example, the OutputStreamProducer is an 
 OutputStream that produces all data passed to it.
 
 The io package contains classes that fit into the java.io 
 functionality that are neither consumers or producers in the 
 framework sense.  For example, the CodecOutputStream is a 
 FilterOutputStream that uses an internal ByteEngine to 
 perform a transformation on the data passing through it.
 
 JUnit tests exist for most classes in the framework.
 All testing is performed using JUnit.  If there is no unit
 test for a class, it can be considered untested.
 
 The framework is now generic enough to handle data of any
 type and allow classes to be defined which can accept
 any kind of data and/or produce any kind of data.  All
 data can be processed in a streamy fashion.  For example, 
 encoding engines implementing the ByteEngine interface can be 
 plugged into CodecOutputStream or CodecInputStream and used 
 for stream functionality without directly supporting java.io streams.
 
 Using the CharSetEncode and (currently non-existent) 
 CharSetDecode, it should be possible to encode character data 
 to base64 then write result to a Writer.  This should go part 
 way towards helping Konstantin with his XML conversions.
 
 Sorry about the brain dump but there is a fair bit
 contained in the zip file and I thought some explanation
 would be useful.
 
 Any feedback on the above is highly welcome.  I don't
 plan on making too many more changes unless it is
 deemed useful.
 
 Brett





RE: [codec] Streamable Codec Framework

2003-11-12 Thread Brett Henderson
I made some changes to the code I supplied previously, it can
be found at the following URL.

http://www32.brinkster.com/bretthenderson/BHCodec-0.5.zip

The main differences relate to the codec interfaces and support
for data types other than byte, the encoding algorithms
are largely unchanged.

A quick summary of the framework is as follows:

Framework is based around consumers and producers, consumers
accept incoming data and producers produce outgoing data. A
consumer implements the Consumer interface and a producer
implements the Producer interface.

Specialisations of these interfaces are used for each type
of data to be converted.  For example there are currently
ByteConsumer, ByteProducer, CharConsumer and CharProducer
interfaces.

The engine package contains classes (and interfaces) that
are both consumers and producers (ie. accept incoming data
and produce result data).  For example there is a ByteEngine
interface that implements ByteConsumer and ByteConsumer
interfaces and is in turn implemented by the Base64Encode
concrete class.

Engines may consume one kind of data and produce another,
the CharByteEngine interface defines an engine that
consumes characters and produces bytes.  This is implemented
by the CharSetEncode class (untested).

The consumer package contains classes that consume data
and perform an action on the data that doesn't allow it to
be accessed via producer functionality.  For example, the
BufferByteConsumer class acts as a receiving buffer for
encoding results, the OutputStreamConsumer writes all data
to an OutputStream.

The producer package contains classes that produce data
for the framework but don't accept data via consumer
functionality.  For example, the OutputStreamProducer
is an OutputStream that produces all data passed to it.

The io package contains classes that fit into the java.io
functionality that are neither consumers or producers in
the framework sense.  For example, the CodecOutputStream
is a FilterOutputStream that uses an internal ByteEngine
to perform a transformation on the data passing through it.
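
As a rough illustration of the CodecOutputStream idea, a sketch follows.
The ByteEngine signatures shown are assumptions made for the sketch, not
the library's actual interface.

  import java.io.FilterOutputStream;
  import java.io.IOException;
  import java.io.OutputStream;

  // Assumed engine interface: transforms a block and returns the output
  // bytes, with finish() returning any final block (e.g. base64 padding).
  interface ByteEngine {
      byte[] process(byte[] data, int offset, int length);
      byte[] finish();
  }

  // FilterOutputStream that pushes everything written to it through an
  // engine before forwarding it to the wrapped stream.
  class CodecOutputStream extends FilterOutputStream {
      private final ByteEngine engine;

      CodecOutputStream(OutputStream out, ByteEngine engine) {
          super(out);
          this.engine = engine;
      }

      public void write(byte[] b, int off, int len) throws IOException {
          out.write(engine.process(b, off, len));
      }

      public void write(int b) throws IOException {
          write(new byte[] { (byte) b }, 0, 1);
      }

      public void close() throws IOException {
          out.write(engine.finish()); // emit any buffered/padded final block
          super.close();
      }
  }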

JUnit tests exist for most classes in the framework.
All testing is performed using JUnit.  If there is no unit
test for a class, it can be considered untested.

The framework is now generic enough to handle data of any
type and allow classes to be defined which can accept
any kind of data and/or produce any kind of data.  All
data can be processed in a streamy fashion.  For example,
encoding engines implementing the ByteEngine interface
can be plugged into CodecOutputStream or CodecInputStream
and used for stream functionality without directly
supporting java.io streams.

Using the CharSetEncode and (currently non-existent)
CharSetDecode, it should be possible to encode character
data to base64 then write result to a Writer.  This
should go part way towards helping Konstantin with his
XML conversions.

Sorry about the brain dump but there is a fair bit
contained in the zip file and I thought some explanation
would be useful.

Any feedback on the above is highly welcome.  I don't
plan on making too many more changes unless it is
deemed useful.

Brett





RE: [codec] Streamable Codec Framework

2003-11-10 Thread Brett Henderson
1.2.2 it is then :-)

I agree with maintaining 1.2.2 compatibility, it
is a bit harsh to require 1.4 to perform base64 encoding.
Unfortunately it would make life a lot easier with regards
to charset encoding ...

It should be possible to use OutputStreamWriter and
InputStreamReader internally to perform the conversions
without incurring much of a performance overhead.
For example a CharByteEngine??? could use OutputStreamWriter
internally to perform charset encoding.

In many cases OutputStreamWriter and InputStreamReader
can be used directly, it is the cases where byte to char
conversion is required during output streaming that require
an encoder for transforming between chars and bytes.
Perhaps I'm missing something here though ...

I also think it would be useful to be able to perform
charset conversion without depending on streams.
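
For reference, the OutputStreamWriter approach mentioned above can be
sketched as follows.  The class and method names are illustrative only, not
the actual CharSetEncode implementation.

  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.OutputStreamWriter;
  import java.io.Writer;

  // Performs char-to-byte (charset) encoding on pre-1.4 JDKs by writing
  // characters through an OutputStreamWriter into an in-memory buffer and
  // handing back whatever bytes it produced.
  class CharSetEncodeSketch {
      private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      private final Writer writer;

      CharSetEncodeSketch(String charsetName) throws IOException {
          this.writer = new OutputStreamWriter(buffer, charsetName); // JDK does the conversion
      }

      byte[] encode(char[] chars, int offset, int length) throws IOException {
          writer.write(chars, offset, length);
          writer.flush();                  // force conversion of buffered characters
          byte[] encoded = buffer.toByteArray();
          buffer.reset();                  // reuse the in-memory buffer
          return encoded;
      }
  }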

 -Original Message-
 From: Gary Gregory [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, 11 November 2003 4:19 AM
 To: 'Jakarta Commons Developers List'
 Subject: RE: [codec] Streamable Codec Framework
 
 
 Yes, no problem, 1.2.2.
 
 Gary
 
  -Original Message-
  From: Tim O'Brien [mailto:[EMAIL PROTECTED]
  Sent: Monday, November 10, 2003 08:10
  To: Jakarta Commons Developers List
  Subject: RE: [codec] Streamable Codec Framework
  
  Oleg, this is understood - 1.2.2 should be our LCD for codec.
  
  Tim
  
  
  On Mon, 10 Nov 2003 [EMAIL PROTECTED] wrote:
  
    Tim, Gary, et al
    Streamable codec framework would be a welcome addition to Commons
    Codec.  However, as far as we (Commons HttpClient) are concerned, the
    decision to ditch java 1.2.2 support would render Codec unusable for us
    (and I'd guess a few other projects that still need to maintain java
    1.2.2 compatibility).  Not that we like it too much, but because lots of
    our users still demand it.
  





RE: [codec] Streamable Codec Framework

2003-11-09 Thread Brett Henderson
I think the design of the codec framework could cover
your requirements but it will require more functionality
than it currently has.

Some of the goals I was working towards were:
1. No memory allocation during streaming.  This eliminates
garbage collection during large conversions.
   Cool. I got large conversions... I'm already at
   mediumblob in mysql, and it goes up/down XML stream :)
  
  I have a lot to learn here.  While I have some knowledge
  of XML (like every other developer on the planet), I
  have never used it for large data sets or used SAX parsing.
  Sounds like a good test to find holes in the design :-)
 
 It's easy. You get a callback where you can gobble up
 string buffers with incoming chars for element contents
 (and there is a lot of this stuff...).  After the tag is
 closed, you have all the chars in one big string buffer
 and get another callback - in this callback you have to
 convert the data and do whatever is necessary (in my case,
 create an input stream and pass it to the database).
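
In code, the pattern you are describing would be roughly as follows
(class names invented, and using the existing in-memory Base64 codec
for the conversion step):

    import org.apache.commons.codec.binary.Base64;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    class Base64ElementHandler extends DefaultHandler {
        private final StringBuffer content = new StringBuffer();

        public void startElement(String uri, String localName,
                String qName, Attributes atts) {
            content.setLength(0);                 // new element, fresh buffer
        }

        public void characters(char[] ch, int start, int length) {
            content.append(ch, start, length);    // gobble up the incoming chars
        }

        public void endElement(String uri, String localName, String qName) {
            // All of the element's chars are now in one big buffer.
            byte[] decoded = Base64.decodeBase64(content.toString().getBytes());
            // ... wrap 'decoded' in an input stream and pass it to the database ...
        }
    }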

This could be tricky; it's something I've been thinking
about, but I would like feedback from others on the best
way to go about it.

The data you have available is in character format.
The base64 codec engine operates on byte buffers.
The writer you want to write to requires the data
to be in character format.

I have concentrated on byte processing for now because
it is the most common requirement.  XML processing
requires that characters be used instead.

It makes no sense to perform base64 conversion on
character arrays directly because base64 is only 8-bit
aware (you could split each character into two bytes,
but this would blow out the result buffer size when
the chars contain only ASCII data).

I think it makes more sense to perform character to
byte conversion separately (perhaps through
extensions to the existing framework) and then perform
base64 encoding on the result.  I guess this is a
UTF-16 to UTF-8 conversion ...

What support is there within the JDK for performing
character to byte conversion?
JDK1.4 has the java.nio.charset package, but I can't
see an equivalent for JDK1.3 and lower; they seem to
use com.sun classes internally when charset conversion
is required.

If JDK1.4 is considered a sufficient base, I could
extend the current framework to provide conversion
engines that translate from one data representation
to another.  I could then create a new CodecEngine
interface to handle character buffers (e.g.
CodecEngineChar).
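
As a sketch of what such an engine could use under JDK1.4 (names
invented; a real engine would inspect the CoderResult and carry any
unconsumed chars over to the next call):

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;

    class CharToByteConverter {
        private final CharsetEncoder encoder =
            Charset.forName("UTF-8").newEncoder();
        private final ByteBuffer out = ByteBuffer.allocate(8192);

        // Convert one chunk of chars, reusing the same output buffer
        // each time so no allocation occurs during streaming.  Returns
        // the number of bytes written into 'dest'.
        int convert(char[] chars, int off, int len, byte[] dest,
                boolean endOfInput) {
            CharBuffer in = CharBuffer.wrap(chars, off, len);
            out.clear();
            encoder.encode(in, out, endOfInput);
            if (endOfInput) {
                encoder.flush(out);    // emit any bytes held in encoder state
            }
            out.flip();
            int count = out.remaining();
            out.get(dest, 0, count);
            return count;
        }
    }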


4. Customisable receivers.  All codecs utilise receivers to
handle conversion results.  This allows different outputs such as
streams, in-memory buffers, etc to be supported.
   
   And writers :) Velocity directives use them.
  
  Do you mean java.io.Writer?  If so I haven't included direct
  support for them because I focused on raw byte streams.
  However it shouldn't be hard to add a receiver to write to
  java.io.Writer instances.
 
 
 My scenarios:
 - I'm exporting information as base64 to XML with the help
 of velocity.  I do it through a custom directive; in this
 directive I get a Writer from velocity, where I have to
 put my data.
 
 Ideally codec would do: read input stream - encode - put it
 into a writer, without allocating too much memory.
 
 I'm importing information:
 - I have a stream (string) of base64 data; codec gives me
 an input stream which is fed from this source, does not
 allocate too much memory, and behaves politely...
 
The current framework doesn't handle direct conversion
from an input stream to an output stream but this
would be simple to add if required.
Again, the hard part would be the char/byte issues.
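
A rough sketch of that export path using the existing in-memory
Base64 codec for the transform (a chunk size that is a multiple of 3
keeps padding out of all but the final chunk, so the pieces
concatenate cleanly):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.Writer;
    import org.apache.commons.codec.binary.Base64;

    public class StreamToWriterExample {
        public static void encode(InputStream in, Writer out) throws IOException {
            byte[] chunk = new byte[3 * 1024];
            int read;
            while ((read = fill(in, chunk)) > 0) {
                byte[] exact = new byte[read];
                System.arraycopy(chunk, 0, exact, 0, read);
                byte[] encoded = Base64.encodeBase64(exact);
                out.write(new String(encoded, "US-ASCII"));  // base64 is ASCII
            }
            out.flush();
        }

        // Fill the buffer completely unless end of stream is reached, so
        // only the final chunk can be shorter than a multiple of 3 bytes.
        private static int fill(InputStream in, byte[] buf) throws IOException {
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n == -1) {
                    break;
                }
                total += n;
            }
            return total;
        }
    }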





RE: [codec] Streamable Codec Framework

2003-11-04 Thread Brett Henderson
  I noticed Alexander Hvostov's recent email
  containing streamable
  base64 codecs.  Given that the current codec
  implementations are
  oriented around in-memory buffers, is there room for
  an
  alternative codec framework supporting stream
  functionality?  I
  realise the need for streamable codecs may not be
  that great but
  it does seem like a gap in the current library.
 
 I'm in need of it. So we are at least 3 :) 
 
 
  Some of the goals I was working towards were:
  1. No memory allocation during streaming.  This
  eliminates
  garbage collection during large conversions.
 Cool. I got large conversions... I'm already at
 mediumblob in mysql , and it goes up/down XML stream
 :)

I have a lot to learn here.  While I have some knowledge
of XML (like every other developer on the planet), I
have never used it for large data sets or used SAX parsing.
Sounds like a good test to find holes in the design :-)

  4. Customisable receivers.  All codecs utilise
  receivers to
  handle conversion results.  This allows different
  outputs such as
  streams, in-memory buffers, etc to be supported.
 
 And writers :) Velocity directives use them.

Do you mean java.io.Writer?  If so, I haven't included
direct support for them because I focused on raw byte
streams.  However, it shouldn't be hard to add a
receiver that writes to java.io.Writer instances.
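
Something along these lines would probably do, assuming receivers are
handed byte buffers of codec output (the real receiver interface is
in the zip and may look different; the per-call allocation below
would also need to go):

    import java.io.IOException;
    import java.io.Writer;

    class WriterReceiver {
        private final Writer writer;

        WriterReceiver(Writer writer) {
            this.writer = writer;
        }

        // Called by an engine whenever a block of converted data is ready.
        void receive(byte[] data, int off, int len) throws IOException {
            char[] chars = new char[len];
            for (int i = 0; i < len; i++) {
                // Safe for 7-bit codec output such as base64 or ascii hex.
                chars[i] = (char) (data[off + i] & 0xff);
            }
            writer.write(chars, 0, len);
        }

        void finish() throws IOException {
            writer.flush();
        }
    }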

 I'll give it a look at and come back later today :) 

I look forward to your feedback.





Streamable Codec Framework

2003-11-02 Thread Brett Henderson
Hi All,

I noticed Alexander Hvostov's recent email containing streamable
base64 codecs.  Given that the current codec implementations are
oriented around in-memory buffers, is there room for an
alternative codec framework supporting stream functionality?  I
realise the need for streamable codecs may not be that great but
it does seem like a gap in the current library.

I have done some work in this area over the last couple of months
as a small hobby project and have produced a small framework for
streamable codecs.

Some of the goals I was working towards were:
1. No memory allocation during streaming.  This eliminates
garbage collection during large conversions.
2. Pipelineable codecs.  This allows multiple codecs to be chained
together and treated as a single codec, so a codec such as base64
can be broken into two components (base64 and line wrapping
codecs).  A sketch of such a chain follows this list.
3. Single OutputStream, InputStream implementations which
utilise codec engines internally.  This eliminates the need to
produce a buffer based engine and a stream engine for every codec.
Note that this requires codec engines to be written in a manner
that supports streaming.
4. Customisable receivers.  All codecs utilise receivers to
handle conversion results.  This allows different outputs such as
streams, in-memory buffers, etc to be supported.
5. Direction agnostic codecs.  Decoupling the engine from the
streams allows the engines to be used in different ways than
originally intended, i.e. you can perform base64 encoding
during reads from an InputStream.
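
To illustrate goal 2, a line wrapping stage could look something like
the following.  The process/finish shape is only a guess at the real
engine interface; chaining is then just a matter of arranging for the
base64 stage's output to be fed into the line wrapping stage (for
example by nesting two CodecOutputStreams).

    import java.io.IOException;
    import java.io.OutputStream;

    // Inserts a CRLF after every 'lineLength' bytes passed through it.
    class LineWrapEngine {
        private final int lineLength;
        private int column;

        LineWrapEngine(int lineLength) {
            this.lineLength = lineLength;
        }

        public void process(byte[] data, int off, int len, OutputStream result)
                throws IOException {
            for (int i = off; i < off + len; i++) {
                result.write(data[i]);
                if (++column == lineLength) {
                    result.write('\r');
                    result.write('\n');
                    column = 0;
                }
            }
        }

        public void finish(OutputStream result) throws IOException {
            if (column > 0) {          // terminate a partial final line
                result.write('\r');
                result.write('\n');
                column = 0;
            }
        }
    }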

I have produced base64 and ascii hex codecs as a proof of concept
and to evaluate performance.  It isn't as fast as the current
buffer based codecs but is unlikely to ever be as fast due to the
extra overheads associated with streaming.
Both base64 and ascii hex implementations can produce a data rate
of approximately 40MB/sec on a Pentium Mobile 1.5GHz notebook.
With some performance tuning I'm sure this could be improved;
I think array bounds checking is the largest performance hit.

The framework currently requires jdk1.4 (the exception handling
requires rework for jdk1.3).
Running ant without arguments in the root directory will build
the project, run all unit tests and run performance tests.  Note
that the tests require junit to be available within ant.

Javadocs are the only documentation at the moment.

Files can be found at:
http://www32.brinkster.com/bretthenderson/BHCodec-0.2.zip

I hope someone finds this useful.  I'm not trying to force my
implementation on anybody and I'm sure it could be improved in
many ways.  I'm simply putting it forward as an optional approach.
If it is decided that streamable codecs are a useful addition to
commons I'd be glad to help.

Cheers,
Brett

PS.  Some areas that currently need improving are:
1. Exception handling requires jdk1.4; it should be rewritten to
support older java versions.
2. BufferReceiver allocates memory continuously during streamed
conversions; it should be fixed to recycle memory buffers.
3. Engines should have a new flush method added to allow them
to hold off posting to receivers until their internal buffers
fill up.  This would prevent fragmented buffers during
pipelined conversions.
4. OutputStream flush needs rework; it shouldn't call finalize
and should instead call the new flush method on CodecEngines.





[codec] Streamable Codec Framework

2003-11-02 Thread Brett Henderson
I just realised I left off codec in the subject.  Sorry about
that.


