On 2/5/19 7:55 AM, Brad Gilbert wrote:
`index` is an NQP op, which means in this case that it is written in C
(assuming you are using MoarVM)

https://github.com/MoarVM/MoarVM/blob/ddde09508310a5f60c63474db8f9682bc922700b/src/strings/ops.c#L557-L656

The code I gave for finding a Buf inside of another one was quickly
made in a way to prevent certain bugs from even being possible.


I rewrote it using knowledge of the internals to be a bit faster:

     sub blob-index ( Blob:D $buffer, Blob:D $needle, UInt:D $init = 0
--> Int ) {
         use nqp;
         my int $needle-width = $needle.elems;
         my int $elems = $buffer.elems;

         if $elems < $needle-width + $init {
             fail 'needle is larger than the buffer'
         }

         my uint $from = $init;
         my uint $to   = $from + $needle-width - 1;

         loop ( ; $to < $elems ; ++$from,++$to ) {
             return $from if $needle eq nqp::slice($buffer,$from,$to)
             # eq is safe since they are two Blobs/Bufs
         }
         return Nil
     }

It's faster mainly because it doesn't use an iterator.

This was quickly written.

Note that Buf is a subtype of Blob, so it will also work if given a Buf.

It could be written to be faster, but it still isn't going to be as
fast as `index`.
For one `index` uses a subroutine named `knuth_morris_pratt_string_index`.

---

Also Unicode strings are not stored flat in MoarVM

     'a' x 10

That is stored something like the following internally.

     REPEAT( STR('a'), 10 )

What this means is that if you are looking for a 'b' for example, it
could check the first repetition for 'b', and skip the rest of the
REPEAT.
That would save 9 string comparisons in this example.

That is not how Bufs are stored in Perl6 at all.
They are declared with `is repr('VMArray')` which basically means they
are stored as C arrays.

On Tue, Feb 5, 2019 at 1:47 AM ToddAndMargo via perl6-users
<perl6-us...@perl.org> wrote:

On 2/2/19 9:29 PM, Brad Gilbert wrote:
Subs do not need to have a `return` statement if it is returning the last value.

You also broke the return value of the subroutine that I wrote by
assigning it to a variable.

What I wrote would return `Nil` if it failed to find a match, yours
will return an undefined `Int`.
It should return `Nil` because that is what `index` would return.

Doing bytewise conversion from Buf to Str is pointless. It will break
on Unicode data.
It would also be exactly the same as converting ASCII if it worked.
(It won't work on binary data)

If you are dealing with something that is mostly Unicode but also has
binary data
decode using 'utf8-c8'.

If you are dealing with something that is mostly binary, decode using 'latin1',
or just use raw bytes in a buffer.

      my Buf $buffer = $fh.read(10);
      my Str $string = $buffer.decode('latin1');

      # the first three bytes were really a Utf8 encoded character
      my Str $char = $string.substr(0,3).encode('latin1').decode('utf8');
      # or
      my Str $char = $buffer.subbuf(0,3).decode('utf8');

Also note that `encode` doesn't always return a Buf.

      my Buf $buf = Buf.new( 'hello'.encode('utf8') );

---

The subroutine I wrote was simplified to work for an Array or List, not a Buf.

It is also weird that you are using CamelCase for variables,
and a mixture of CamelCase and snake-case for the subroutine name.


Improving your variant, and changing it so the second parameter is a Buf.

      sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) {
          my List $Matcher = $SubBuf.List; # only call .List once
          my Any $Position is default(Nil) = Nil;
          my Int $Elems = $Matcher.elems;

          $Position = $Buffer.rotor($Elems => 1- $Elems).first(* eqv
$Matcher, :k);
          return $Position;
      }

`$Position` has to be `Any` (or `Mu`) so that it can store the value `Nil`.
`Nil` sets a variable to its default, so we have to change the default
with `is default(Nil)`.
(The normal default is the same as the container type)
(The `= Nil;` is always pointless in the declaration of a `$scalar` variable.)

One simplification is to just have the return value as the last thing
in the subroutine without a `return`.
(It may also be slightly faster, but not by much.)

      sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) {
          my List $Matcher = $SubBuf.List;
          my Any $Position is default(Nil) = Nil;
          my Int $Elems = $Matcher.elems;

          $Position = $Buffer.rotor($Elems => 1- $Elems).first(* eqv
$Matcher, :k);
          $Position; # <------------
      }

Assignment is a rvalue, so we can remove that last line

      sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) {
          my List $Matcher = $SubBuf.List;
          my Any $Position is default(Nil) = Nil;
          my Int $Elems = $Matcher.elems;

          $Position = $Buffer.rotor($Elems => 1- $Elems).first(* eqv
$Matcher, :k);
          # <-----------
      }

`$Position` is now completely pointless.

      sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) {
          my List $Matcher = $SubBuf.List;
          my Int $Elems = $Matcher.elems;

          $Buffer.rotor($Elems => 1- $Elems).first(* eqv $Matcher, :k);
          # ^
      }

If you want `return` (even though it isn't doing anything)

      sub Buf-Index ( Buf $Buffer, Buf $SubBuf ) {
          my List $Matcher = $SubBuf.List;
          my Int $Elems = $Matcher.elems;

          return $Buffer.rotor($Elems => 1- $Elems).first(* eqv $Matcher, :k);
          # ^
      }

You could also declare the type of the return value

      sub Buf-Index ( Buf $Buffer, Buf $SubBuf --> Int ) { # <----
          my List $Matcher = $SubBuf.List;
          my Int $Elems = $Matcher.elems;

          return $Buffer.rotor($Elems => 1- $Elems).first(* eqv $Matcher, :k);
      }

Note that `Nil` can sneak around the return value type check.

---

As an added bonus, here is a subroutine that returns all of the indices.
(Note that the only differences are `grep` rather than `first`, and
the return type)

      sub Buf-Indices ( Buf $Buffer, Buf $SubBuf --> Seq ) {
          my List $Matcher = $SubBuf.List;
          my Int $Elems = $Matcher.elems;

          return $Buffer.rotor($Elems => 1- $Elems).grep(* eqv $Matcher, :k);
      }

      }

On Sat, Feb 2, 2019 at 10:05 PM ToddAndMargo via perl6-users
<perl6-us...@perl.org> wrote:

On 2/2/19 6:09 AM, Brad Gilbert wrote:
       sub buf-index ( Buf $buf, +@match ) {
           my $elems = @match.elems;
           $buf.rotor( $elems => 1 - $elems ).first(* eqv @match.List, :k)
       }

       my $buf = Buf[uint8].new(0x4D, 0x5A, 0x90, 0x00, 0x03);

       say buf-index( $buf, (0x90, 0x00, 0x03)); # 2

What did I do wrong?

First I did a byte wise conversion of

      Buf $BinaryFile   to   Str $StrFile

and

      Buf $VersionInfoBuf  to  Str $VersionInfoStr



sub Buf-Index ( Buf $Buffer, +@SubBuf ) {
      # `index` for buffers
      # $Buffer is the buffer to search through
      # $ +@SubBuf is the sub buffer pattern to search for in $Buffer
      # returns the first instance of a match, Nil if no match

      my Int $Position = Nil;
      my $Elems = @SubBuf.elems;

      $Position = $Buffer.rotor( $Elems => 1 - $Elems ).first( * eqv
@SubBuf.List, :k );
      return $Position;
}

      $i  = index(     $StrFile,    $VersionInfoStr );
      $bi = Buf-Index( $BinaryFile, $VersionInfoBuf );
      say "i = <$i>   bi = <$bi>";




$ FileVer.pl6
i = <11371>   bi = <>


11371 is correct.



What did I do wrong?

Many thanks,
-T


Got it working.  Thank you!

It is a tad slow.  Depending on the file's size, it is
20 to 190 times slower than "index".

I have a lot of thinking to do.


Thank you!

Hmmmm.  Buf's are C arrays.  Interesting.

I still can't get away with

    my Buf $x=Buf.new(0x55, 0x66, 0x77, 0x78);
    my Buf $y=$x[ 1..2 ];

I have to
    my Buf $x=Buf.new(0x55, 0x66, 0x77, 0x78);
    my Buf $y=$x.subbuf(1, 2); say $y;

-T


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computers are like air conditioners.
They malfunction when you open windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reply via email to