Hello,

I'm writing a simple tokenizer which is defined by this trait:

trait Tokenizer {
    fn next_token(&mut self) -> ~str;
    fn eof(&self) -> bool;
}

An obvious application for a tokenizer is splitting a stream coming from
a Reader, so I have the following structure, which should implement
Tokenizer:

pub struct ReaderTokenizer<'self> {
    priv inner: &'self Reader,
    priv buffer: ~CyclicBuffer,
    priv seps: ~[~str]
}

I used the 'self lifetime parameter because I want the tokenizer to
work with any Reader. CyclicBuffer is another structure, which is
essentially an array of u8 with special read/write operations.

Implementing Tokenizer for ReaderTokenizer involves reading from the
Reader one byte at a time. I decided to use buffering to improve
performance, but I still want to keep the useful abstraction of
single-byte reading, so I decided to implement Iterator<u8> for my
Reader+CyclicBuffer pair.

BTW, internal iterators in 0.7 were much better for this, because
internal iterator code was very simple and didn't use explicit
lifetimes at all. However, the 0.7 compiler suffers from several bugs
related to pointers to traits, which prevented my program from
compiling (I couldn't pass a reference to a Reader to a CyclicBuffer
method, and I ran into other errors too). So I decided to use the trunk
version of the compiler, in which these bugs are fixed according to
GitHub. But trunk does not allow internal iterators, which is very sad,
since now I'm forced to create intermediate structures to achieve the
same thing.
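For readers on current Rust (not the 0.7/trunk compiler discussed in this post), the standard library already provides this Reader-plus-buffer pair: std::io::BufReader adds buffering to any Read, and Read::bytes() turns the result into an iterator of io::Result<u8>. A minimal sketch; read_all_bytes is a hypothetical helper name, not anything from the post:

```rust
use std::io::{BufReader, Read};

/// Hypothetical helper: reads every byte from `reader` one at a time while
/// BufReader does the buffering, so most steps are served from memory.
fn read_all_bytes<R: Read>(reader: R) -> Vec<u8> {
    BufReader::new(reader)
        .bytes() // Iterator<Item = std::io::Result<u8>>
        .map(|b| b.expect("read error")) // this sketch panics on I/O errors
        .collect()
}
```

Any Read works as input, including an in-memory &[u8], which is convenient for tests.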

So, I came up with the following iterator structure:

struct RTBytesIterator<'self> {
    tokenizer: &'self mut ReaderTokenizer<'self>
}

impl<'self> Iterator<u8> for RTBytesIterator<'self> {
    fn next(&mut self) -> Option<u8> {
        if self.tokenizer.eof() {
            return None;
        }
        if self.tokenizer.buffer.readable_bytes() > 0 ||
           self.tokenizer.buffer.fill_from_reader(self.tokenizer.inner) > 0 {
            return Some(self.tokenizer.buffer.read_unsafe());
        } else {
            return None;
        }
    }
}

Note that the tokenizer field is &'self mut because the CyclicBuffer has
to be mutated. buffer.fill_from_reader() reads as much as possible from
the reader and returns the number of bytes read, and buffer.read_unsafe()
returns the next byte from the cyclic buffer.
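To have something concrete to experiment with, here is one guess at a CyclicBuffer offering the three operations described above, sketched in current Rust rather than the dialect of this post. Only the names readable_bytes, fill_from_reader and read_unsafe come from the post; the fixed capacity, the fields, the with_capacity constructor, and treating read errors as EOF are assumptions.

```rust
use std::io::Read;

/// Hypothetical fixed-capacity ring buffer of bytes. The method names follow
/// the post; the fields and bodies are assumptions.
struct CyclicBuffer {
    data: Vec<u8>,
    head: usize, // index of the next unread byte
    len: usize,  // number of unread bytes
}

impl CyclicBuffer {
    fn with_capacity(cap: usize) -> Self {
        CyclicBuffer { data: vec![0; cap], head: 0, len: 0 }
    }

    fn readable_bytes(&self) -> usize {
        self.len
    }

    /// Issues one `read` into the contiguous free region starting at the
    /// write position and returns the number of bytes read; 0 means EOF,
    /// a full buffer, or (in this sketch) a read error.
    fn fill_from_reader(&mut self, reader: &mut dyn Read) -> usize {
        let cap = self.data.len();
        if self.len == cap {
            return 0;
        }
        let tail = (self.head + self.len) % cap;
        // Free space runs from `tail` to the end of storage, or up to `head`
        // when the unread bytes wrap past the end. Any free space before
        // `head` in the first case is left for the next call.
        let end = if tail >= self.head { cap } else { self.head };
        let n = reader.read(&mut self.data[tail..end]).unwrap_or(0);
        self.len += n;
        n
    }

    /// Returns the next unread byte; panics if the buffer is empty.
    fn read_unsafe(&mut self) -> u8 {
        assert!(self.len > 0, "read_unsafe on an empty CyclicBuffer");
        let byte = self.data[self.head];
        self.head = (self.head + 1) % self.data.len();
        self.len -= 1;
        byte
    }
}
```

Because each fill is a single read call, a wrapped buffer may need two fills to become full again; that keeps the sketch short at the cost of an extra call now and then.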

Then I added the following method to ReaderTokenizer:

impl<'self> ReaderTokenizer<'self> {
...
    fn bytes_iter(&mut self) -> RTBytesIterator<'self> {
        RTBytesIterator { tokenizer: self }
    }
...
}

This does not compile with the following error:

io/convert_io.rs:98:37: 98:43 error: cannot infer an appropriate lifetime due to conflicting requirements
io/convert_io.rs:98         RTBytesIterator { tokenizer: self }
                                                         ^~~~~~
io/convert_io.rs:97:55: 99:5 note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the block at 97:55...
io/convert_io.rs:97     fn bytes_iter(&mut self) -> RTBytesIterator<'self> {
io/convert_io.rs:98         RTBytesIterator { tokenizer: self }
io/convert_io.rs:99     }
io/convert_io.rs:98:37: 98:43 note: ...due to the following expression
io/convert_io.rs:98         RTBytesIterator { tokenizer: self }
                                                         ^~~~~~
io/convert_io.rs:97:55: 99:5 note: but, the lifetime must be valid for the lifetime &'self  as defined on the block at 97:55...
io/convert_io.rs:97     fn bytes_iter(&mut self) -> RTBytesIterator<'self> {
io/convert_io.rs:98         RTBytesIterator { tokenizer: self }
io/convert_io.rs:99     }
io/convert_io.rs:98:8: 98:23 note: ...due to the following expression
io/convert_io.rs:98         RTBytesIterator { tokenizer: self }
                            ^~~~~~~~~~~~~~~
error: aborting due to previous error

OK, fair enough; I guess I have to annotate the self parameter with the 'self lifetime:

    fn bytes_iter(&'self mut self) -> RTBytesIterator<'self> {
        RTBytesIterator { tokenizer: self }
    }

This compiles, but now I get another error wherever bytes_iter() is
called. For example, the following code:

    fn try_read_sep(&mut self, first: u8) -> (~[u8], bool) {
        let mut part = ~[first];
        for b in self.bytes_iter() {
            part.push(b);
            if !self.is_sep_prefix(part) {
                return (part, false);
            }
            if self.is_sep(part) {
                break;
            }
        }
        return (part, true);
    }

fails to compile with this error:

io/convert_io.rs:117:17: 117:36 error: cannot infer an appropriate lifetime due to conflicting requirements
io/convert_io.rs:117         for b in self.bytes_iter() {
                                      ^~~~~~~~~~~~~~~~~~~
io/convert_io.rs:117:17: 117:22 note: first, the lifetime cannot outlive the expression at 117:17...
io/convert_io.rs:117         for b in self.bytes_iter() {
                                      ^~~~~
io/convert_io.rs:117:17: 117:22 note: ...due to the following expression
io/convert_io.rs:117         for b in self.bytes_iter() {
                                      ^~~~~
io/convert_io.rs:117:17: 117:36 note: but, the lifetime must be valid for the method call at 117:17...
io/convert_io.rs:117         for b in self.bytes_iter() {
                                      ^~~~~~~~~~~~~~~~~~~
io/convert_io.rs:117:17: 117:22 note: ...due to the following expression
io/convert_io.rs:117         for b in self.bytes_iter() {
                                      ^~~~~

And now I'm completely stuck: I can't get rid of these errors at all.
This looks like a bug to me, but I'm not completely sure; maybe I'm the
one who is wrong here.
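Separately from the lifetime annotations, the for loop in try_read_sep keeps the iterator, and therefore a mutable borrow of self, alive while the loop body calls self.is_sep_prefix(part) and self.is_sep(part). Current Rust rejects that as a second borrow of self even once the lifetimes are sorted out. One workaround, sketched here in current Rust, is a hypothetical next_byte(&mut self) method called once per iteration, so the mutable borrow ends before the body runs. Tok, pending, seps and the helper bodies are illustrative stand-ins; only the control flow mirrors try_read_sep.

```rust
use std::collections::VecDeque;

/// Hypothetical minimal tokenizer: bytes come from an in-memory queue and
/// `seps` holds the separators. Not the post's actual types.
struct Tok {
    pending: VecDeque<u8>,
    seps: Vec<Vec<u8>>,
}

impl Tok {
    /// One byte per call: the `&mut self` borrow ends when the call returns.
    fn next_byte(&mut self) -> Option<u8> {
        self.pending.pop_front()
    }

    fn is_sep_prefix(&self, part: &[u8]) -> bool {
        self.seps.iter().any(|s| s.starts_with(part))
    }

    fn is_sep(&self, part: &[u8]) -> bool {
        self.seps.iter().any(|s| s.as_slice() == part)
    }

    /// Same logic as try_read_sep, but `while let` re-borrows `self` only for
    /// each next_byte() call, so the body may call `&self` methods freely.
    fn try_read_sep(&mut self, first: u8) -> (Vec<u8>, bool) {
        let mut part = vec![first];
        while let Some(b) = self.next_byte() {
            part.push(b);
            if !self.is_sep_prefix(&part) {
                return (part, false);
            }
            if self.is_sep(&part) {
                break;
            }
        }
        (part, true)
    }
}
```

With the separator "--" and input "-x", a first call starting from '-' consumes one more '-' and reports a separator, and a second call sees 'x' and gives up.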

I studied the libstd/libextra code for clues and found that some
iterable structures have code very similar to mine, for example
RingBuf. Here is its mut_iter() method:

    pub fn mut_iter<'a>(&'a mut self) -> RingBufMutIterator<'a, T> {
        RingBufMutIterator{index: 0, rindex: self.nelts, lo: self.lo,
                           elts: self.elts}
    }

I tried to implement the bytes_iter() method the same way, but,
naturally, it didn't work because the 'a and 'self lifetimes conflict.
As I understand it, this works for RingBuf because RingBuf has no
lifetime parameter, so no conflict between 'self and 'a is possible at
all. But that will not work in my case, because I need the 'self
parameter for the &'self Reader field.
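The usual way out is to give the iterator two lifetimes instead of reusing one. With &'self mut ReaderTokenizer<'self>, the mutable borrow of the tokenizer is forced to last as long as the reader borrow, so after one bytes_iter() call self appears to stay mutably borrowed for all of 'self, which seems to be what the second error is about. The sketch below is written in current Rust, not the 0.7/trunk dialect above: it uses Iterator with type Item, holds the reader as &mut dyn Read (today's Read::read needs &mut), and substitutes a VecDeque<u8> for CyclicBuffer. Those substitutions, the eof flag, and the 64-byte chunk size are assumptions.

```rust
use std::collections::VecDeque;
use std::io::Read;

/// Current-Rust sketch of ReaderTokenizer; VecDeque<u8> stands in for the
/// post's CyclicBuffer, and `eof` is an assumed helper field.
pub struct ReaderTokenizer<'r> {
    inner: &'r mut dyn Read,
    buffer: VecDeque<u8>,
    eof: bool,
}

/// Two lifetimes: 'a is the short borrow of the tokenizer,
/// 'r is how long the tokenizer may hold on to its reader.
pub struct RTBytesIterator<'a, 'r> {
    tokenizer: &'a mut ReaderTokenizer<'r>,
}

impl<'r> ReaderTokenizer<'r> {
    pub fn new(inner: &'r mut dyn Read) -> Self {
        ReaderTokenizer { inner, buffer: VecDeque::new(), eof: false }
    }

    /// Borrows `self` only for 'a, not for the whole of 'r.
    pub fn bytes_iter<'a>(&'a mut self) -> RTBytesIterator<'a, 'r> {
        RTBytesIterator { tokenizer: self }
    }

    /// One `read` call into the buffer; a 0-byte read (or an error) marks EOF.
    fn fill(&mut self) {
        let mut chunk = [0u8; 64];
        let n = self.inner.read(&mut chunk).unwrap_or(0);
        if n == 0 {
            self.eof = true;
        }
        self.buffer.extend(&chunk[..n]);
    }
}

impl<'a, 'r> Iterator for RTBytesIterator<'a, 'r> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let t = &mut *self.tokenizer;
        if t.buffer.is_empty() && !t.eof {
            t.fill();
        }
        t.buffer.pop_front()
    }
}
```

With the lifetimes split, calling bytes_iter() twice in a row is accepted, because each iterator's borrow of the tokenizer ends when that iterator is dropped, while the reader borrow 'r is untouched.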

What can I do to implement my ReaderTokenizer? Maybe there are other
ways of which I'm unaware?

Thank you very much in advance.

Best regards,
Vladimir.
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
