Re: streaming API design note for review

Steve Lawrence Thu, 09 Nov 2017 10:01:33 -0800

On 11/09/2017 11:17 AM, Mike Beckerle wrote:
> I created a design note on Streaming API features we need for 
> message-by-message processing style.



I think there is something missing from the note that is a core
justification for such a change. Thinking about it led me down a
brain dump:

My initial thought was that the new API looks very similar to something
like this:

  val pf: ProcessorFactory = ???
  val dp: DataProcessor = pf.onPath("/item")
  val is: InputStream = ??? // the raw data

  val xmlOut = new ScalaXMLInfosetOutputter()

  def items : Stream[Node] = {
    xmlOut.reset()
    val pr = dp.parse(is, xmlOut)
    val item = if (pr.isError) Nil else xmlOut.getResult()
    item #:: items
  }

This is basically our existing API, and all we would need to do is
modify our I/O layer so that it is not as greedy with an InputStream as
it is now. But I think the main issue is that when we parse data we
could potentially read a bunch of data off the InputStream, then
backtrack, and now that data is lost for the next parse. To me, that
seems like a core issue with our current implementation that I don't
think you really mentioned.
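
To make the lost-data problem concrete, here is a minimal sketch using
only java.io. Nothing in it is Daffodil API: speculativeParse is a
made-up stand-in for a parse that reads ahead, backtracks, and ends up
consuming fewer bytes than it pulled off the stream.

```scala
import java.io.{ByteArrayInputStream, InputStream}

object LostDataDemo {
  // Hypothetical stand-in for a parse attempt: it reads `lookahead` bytes
  // speculatively, then "backtracks" and keeps only the first `consumed`.
  def speculativeParse(is: InputStream, lookahead: Int, consumed: Int): Array[Byte] = {
    val buf = new Array[Byte](lookahead)
    val n = is.read(buf, 0, lookahead) // these bytes are now gone from `is`
    buf.take(math.min(consumed, math.max(n, 0)))
  }

  def main(args: Array[String]): Unit = {
    val is: InputStream = new ByteArrayInputStream("AABBCC".getBytes("US-ASCII"))
    val first = speculativeParse(is, lookahead = 4, consumed = 2)
    val next = is.read().toChar
    // The first item is "AA", but the next parse starts at 'C':
    // the two 'B' bytes were read ahead and never given back.
    println(s"first item: ${new String(first, "US-ASCII")}, next byte: $next")
  }
}
```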

To me, it seems the big thing that the StreamingParser gets you is that,
I assume, it would cache InputStream data from previous calls to parse()
so that it is still available to future calls of parse() if
backtracking occurs. Is this correct? Are there other benefits to the
StreamingParser?

If this is the main difference, then rather than having a special
StreamingParser, and since this seems mostly related to the I/O layer,
what if we just had a special stateful DaffodilInputStream class that
handles this caching of data and other state related to the input?
Something like:

  val dis = new DaffodilInputStream(is)

  def items : Stream[Node] = {
    xmlOut.reset()
    val pr = dp.parse(dis, xmlOut)
    val item = if (pr.isError) Nil else xmlOut.getResult()
    item #:: items
  }
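
As a rough idea of the state such a class would have to carry, here is
a minimal sketch of a caching wrapper. CachingInput and its
mark/rewind/commit methods are hypothetical names invented for this
illustration; this is not a proposal for the actual DaffodilInputStream
interface, and it works on bytes only, not bits.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical class, for illustration only; not part of Daffodil.
final class CachingInput(underlying: InputStream) {
  private val cache = ArrayBuffer.empty[Byte] // bytes read but not yet committed
  private var pos = 0                         // read position within the cache

  /** Next byte as 0..255, or -1 at end of data. */
  def read(): Int = {
    if (pos == cache.length) {
      val b = underlying.read()
      if (b < 0) return -1
      cache += b.toByte
    }
    val res = cache(pos) & 0xff
    pos += 1
    res
  }

  /** A position that a later rewind can return to (invalid after commit). */
  def mark: Int = pos

  /** Backtrack: un-read everything consumed since `m` was taken. */
  def rewind(m: Int): Unit = { pos = m }

  /** Commit: drop bytes before the current position so the cache stays small. */
  def commit(): Unit = { cache.remove(0, pos); pos = 0 }
}

object CachingInputDemo {
  def main(args: Array[String]): Unit = {
    val in = new CachingInput(new ByteArrayInputStream("AABBCC".getBytes("US-ASCII")))
    val m = in.mark
    (1 to 4).foreach(_ => in.read()) // speculative read of "AABB"
    in.rewind(m)                     // backtrack
    in.read(); in.read()             // consume only "AA"
    in.commit()                      // end of the first parse
    println(in.read().toChar)        // prints B: nothing was lost
  }
}
```

The point is only that the read-ahead bytes live in the wrapper rather
than in a single parse() call, so they survive across calls.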

So this really is more a change to our I/O layer rather than the
parser/data processors. Another reason why something like this might be
useful is if the data stream was actually something like length-data
pairs, in which case the user might do something like this:

  val dis = new DaffodilInputStream(is)

  def items : Stream[Node] = {
    xmlOut.reset()
    val len = dis.read() // next byte is a length
    val pr = dp.parse(dis, xmlOut, len * 8)  // len bytes of data
    val item = if (pr.isError) Nil else xmlOut.getResult()
    item #:: items
  }
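
For reference, this is the length-data framing by itself, sketched with
plain java.io and no Daffodil involved. The one-byte length prefix
matches the example above. The frames helper is a name made up for this
sketch.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}

object LengthPrefixed {
  // Split a stream of (1-byte length, payload) pairs into payloads.
  // Assumes the iterator is consumed sequentially, since it reads from `in`.
  def frames(in: DataInputStream): Iterator[Array[Byte]] =
    Iterator.continually(in.read())  // next byte is a length, or -1 at end
      .takeWhile(_ >= 0)
      .map { len =>
        val payload = new Array[Byte](len)
        in.readFully(payload)        // throws EOFException on a short payload
        payload
      }

  def main(args: Array[String]): Unit = {
    val raw = Array[Byte](2, 'h', 'i', 3, 'a', 'b', 'c')
    val in = new DataInputStream(new ByteArrayInputStream(raw))
    frames(in).foreach(p => println(new String(p, "US-ASCII")))
  }
}
```

In the parse() call above, the same length would be passed as a bit
limit (len * 8) instead of being used to copy the payload out first.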

Thoughts?

- Steve
