`index` is an NQP op, which means in this case that it is written in C (assuming you are using MoarVM)
https://github.com/MoarVM/MoarVM/blob/ddde09508310a5f60c63474db8f9682bc922700b/src/strings/ops.c#L557-L656 The code I gave for finding a Buf inside of another one was quickly made in a way to prevent certain bugs from even being possible. I rewrote it using knowledge of the internals to be a bit faster: sub blob-index ( Blob:D $buffer, Blob:D $needle, UInt:D $init = 0 --> Int ) { use nqp; my int $needle-width = $needle.elems; my int $elems = $buffer.elems; if $elems < $needle-width + $init { fail 'needle is larger than the buffer' } my uint $from = $init; my uint $to = $from + $needle-width - 1; loop ( ; $to < $elems ; ++$from,++$to ) { return $from if $needle eq nqp::slice($buffer,$from,$to) # eq is safe since they are two Blobs/Bufs } return Nil } It's faster mainly because it doesn't use an iterator. This was quickly written. Note that Buf is a subtype of Blob, so it will also work if given a Buf. It could be written to be faster, but it still isn't going to be as fast as `index`. For one `index` uses a subroutine named `knuth_morris_pratt_string_index`. --- Also Unicode strings are not stored flat in MoarVM 'a' x 10 That is stored something like the following internally. REPEAT( STR('a'), 10 ) What this means is that if you are looking for a 'b' for example, it could check the first repetition for 'b', and skip the rest of the REPEAT. That would save 9 string comparisons in this example. That is not how Bufs are stored in Perl6 at all. They are declared with `is repr('VMArray')` which basically means they are stored as C arrays. On Tue, Feb 5, 2019 at 1:47 AM ToddAndMargo via perl6-users <perl6-us...@perl.org> wrote: > > On 2/2/19 9:29 PM, Brad Gilbert wrote: > > Subs do not need to have a `return` statement if it is returning the last > > value. > > > > You also broke the return value of the subroutine that I wrote by > > assigning it to a variable. > > > > What I wrote would return `Nil` if it failed to find a match, yours > > will return an undefined `Int`. > > It should return `Nil` because that is what `index` would return. > > > > Doing bytewise conversion from Buf to Str is pointless. It will break > > on Unicode data. > > It would also be exactly the same as converting ASCII if it worked. > > (It won't work on binary data) > > > > If you are dealing with something that is mostly Unicode but also has > > binary data > > decode using 'utf8-c8'. > > > > If you are dealing with something that is mostly binary, decode using > > 'latin1', > > or just use raw bytes in a buffer. > > > > my Buf $buffer = $fh.read(10); > > my Str $string = $buffer.decode('latin1'); > > > > # the first three bytes were really a Utf8 encoded character > > my Str $char = $string.substr(0,3).encode('latin1').decode('utf8'); > > # or > > my Str $char = $buffer.subbuf(0,3).decode('utf8'); > > > > Also note that `encode` doesn't always return a Buf. > > > > my Buf $buf = Buf.new( 'hello'.encode('utf8') ); > > > > --- > > > > The subroutine I wrote was simplified to work for an Array or List, not a > > Buf. > > > > It is also weird that you are using CamelCase for variables, > > and a mixture of CamelCase and snake-case for the subroutine name. > > > > > > Improving your variant, and changing it so the second parameter is a Buf. > > > > sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) { > > my List $Matcher = $SubBuf.List; # only call .List once > > my Any $Position is default(Nil) = Nil; > > my Int $Elems = $Matcher.elems; > > > > $Position = $Buffer.rotor($Elems => 1- $Elems).first(* eqv > > $Matcher, :k); > > return $Position; > > } > > > > `$Position` has to be `Any` (or `Mu`) so that it can store the value `Nil`. > > `Nil` sets a variable to its default, so we have to change the default > > with `is default(Nil)`. > > (The normal default is the same as the container type) > > (The `= Nil;` is always pointless in the declaration of a `$scalar` > > variable.) > > > > One simplification is to just have the return value as the last thing > > in the subroutine without a `return`. > > (It may also be slightly faster, but not by much.) > > > > sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) { > > my List $Matcher = $SubBuf.List; > > my Any $Position is default(Nil) = Nil; > > my Int $Elems = $Matcher.elems; > > > > $Position = $Buffer.rotor($Elems => 1- $Elems).first(* eqv > > $Matcher, :k); > > $Position; # <------------ > > } > > > > Assignment is a rvalue, so we can remove that last line > > > > sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) { > > my List $Matcher = $SubBuf.List; > > my Any $Position is default(Nil) = Nil; > > my Int $Elems = $Matcher.elems; > > > > $Position = $Buffer.rotor($Elems => 1- $Elems).first(* eqv > > $Matcher, :k); > > # <----------- > > } > > > > `$Position` is now completely pointless. > > > > sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) { > > my List $Matcher = $SubBuf.List; > > my Int $Elems = $Matcher.elems; > > > > $Buffer.rotor($Elems => 1- $Elems).first(* eqv $Matcher, :k); > > # ^ > > } > > > > If you want `return` (even though it isn't doing anything) > > > > sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) { > > my List $Matcher = $SubBuf.List; > > my Int $Elems = $Matcher.elems; > > > > return $Buffer.rotor($Elems => 1- $Elems).first(* eqv $Matcher, > > :k); > > # ^ > > } > > > > You could also declare the type of the return value > > > > sub Buf-Index ( Buf $Buffer, Buf $SubBuf --> Int ) { # <---- > > my List $Matcher = $SubBuf.List; > > my Int $Elems = $Matcher.elems; > > > > return $Buffer.rotor($Elems => 1- $Elems).first(* eqv $Matcher, > > :k); > > } > > > > Note that `Nil` can sneak around the return value type check. > > > > --- > > > > As an added bonus, here is a subroutine that returns all of the indices. > > (Note that the only differences are `grep` rather than `first`, and > > the return type) > > > > sub Buf-Indices ( Buf $Buffer, Buf $SubBuf --> Seq ) { > > my List $Matcher = $SubBuf.List; > > my Int $Elems = $Matcher.elems; > > > > return $Buffer.rotor($Elems => 1- $Elems).grep(* eqv $Matcher, :k); > > } > > > > } > > > > On Sat, Feb 2, 2019 at 10:05 PM ToddAndMargo via perl6-users > > <perl6-us...@perl.org> wrote: > >> > >> On 2/2/19 6:09 AM, Brad Gilbert wrote: > >>> sub buf-index ( Buf $buf, +@match ) { > >>> my $elems = @match.elems; > >>> $buf.rotor( $elems => 1 - $elems ).first(* eqv @match.List, :k) > >>> } > >>> > >>> my $buf = Buf[uint8].new(0x4D, 0x5A, 0x90, 0x00, 0x03); > >>> > >>> say buf-index( $buf, (0x90, 0x00, 0x03)); # 2 > >> > >> What did I do wrong? > >> > >> First I did a byte wise conversion of > >> > >> Buf $BinaryFile to Str $StrFile > >> > >> and > >> > >> Buf $VersionInfoBuf to Str $VersionInfoStr > >> > >> > >> > >> sub Buf-Index ( Buf $Buffer, +@SubBuf ) { > >> # `index` for buffers > >> # $Buffer is the buffer to search through > >> # $ +@SubBuf is the sub buffer pattern to search for in $Buffer > >> # returns the first instance of a match, Nil if no match > >> > >> my Int $Position = Nil; > >> my $Elems = @SubBuf.elems; > >> > >> $Position = $Buffer.rotor( $Elems => 1 - $Elems ).first( * eqv > >> @SubBuf.List, :k ); > >> return $Position; > >> } > >> > >> $i = index( $StrFile, $VersionInfoStr ); > >> $bi = Buf-Index( $BinaryFile, $VersionInfoBuf ); > >> say "i = <$i> bi = <$bi>"; > >> > >> > >> > >> > >> $ FileVer.pl6 > >> i = <11371> bi = <> > >> > >> > >> 11371 is correct. > >> > >> > >> > >> What did I do wrong? > >> > >> Many thanks, > >> -T > > > Got it working. Thank you! > > It is a tad slow. Depending on the file's size, it is > 20 to 190 times slower than "index". > > I have a lot of thinking to do.