[go-nuts] Character Replacement/Algorithm for Replacement

2018-08-29 Thread agruetz45
I have approximately 9 characters that all need to be replaced with 
different characters. I know there are a number of ways to do this but what 
is the most efficient?


   - 1) Do a []byte walk and compare each byte and replace when found?
   - Seems expensive if you have a 100 bytes in the []byte
 - comes out to a max of 900 operations
  - 2) Use byte.Replace() for each one?
   - This seems to be about the same as #1 but seems like it makes a copy 
  so might use slightly more memory
   - 3) Regex for each one?
   - Seems like using a nuke to kill a flee
  - Seems expensive in processing
   - 4) Something I have not thought of?
   - Specific algorithm to solve this?
   

The []byte slice it self likely will not be over 100 bytes, however they 
may be be 10s of thousands of them.

Thanks,
Anthony Gruetzmacher

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-29 Thread Daniela Petruzalek
Do you have an example?

I'm assuming that the replacement is one character for another. (ie.: not
one character being replaced by a group of characters).

Regarding to finding the positions to replace you can't beat O(n)
complexity as you must look at least once at every character on the source
string.

In regards to finding the replacement... the 9 characters could be stored
in a dictionary (map in Go) with their counterparts. Since a dictionary
lookup is O(1) in the average case you still have an O(n) algorithm. But,
since the byte[] size will not be over 100, you may assume that n is
constant too, so it's a constant time algorithm.

You didn't mention if the replacement should be in-place or not, so I'm
assuming it's in place.

Something like this I suppose:

func main() {
dict := map[byte]byte {0:1, 1:2, 3: 5}
arr := []byte{0,1,1,2,3,5,8,13,21}
fmt.Printf("Before: %#v", arr)
for i, b := range arr {
if replacement, ok := dict[b]; ok {
arr[i] = replacement
}
}
fmt.Printf("After: %#v", arr)
}

https://play.golang.org/p/hXnBKK3pDWk

O(n) time complexity (or O(1) if n  == 100) and O(k) additional space for
the dictionary.

Best,

Dani

Em qua, 29 de ago de 2018 às 22:15,  escreveu:

> I have approximately 9 characters that all need to be replaced with
> different characters. I know there are a number of ways to do this but what
> is the most efficient?
>
>
>- 1) Do a []byte walk and compare each byte and replace when found?
>- Seems expensive if you have a 100 bytes in the []byte
>  - comes out to a max of 900 operations
>   - 2) Use byte.Replace() for each one?
>- This seems to be about the same as #1 but seems like it makes a copy
>   so might use slightly more memory
>- 3) Regex for each one?
>- Seems like using a nuke to kill a flee
>   - Seems expensive in processing
>- 4) Something I have not thought of?
>- Specific algorithm to solve this?
>
>
> The []byte slice it self likely will not be over 100 bytes, however they
> may be be 10s of thousands of them.
>
> Thanks,
> Anthony Gruetzmacher
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-29 Thread agruetz45
Makes sense. Best I am going to get is a linear search w/ a divide and 
conquer if I want to speed it up. Thanks needed the sounding board and feed 
back.

Thanks,
Anthony 

On Wednesday, August 29, 2018 at 8:16:44 PM UTC-7, Daniela Petruzalek wrote:
>
> Do you have an example?
>
> I'm assuming that the replacement is one character for another. (ie.: not 
> one character being replaced by a group of characters).
>
> Regarding to finding the positions to replace you can't beat O(n) 
> complexity as you must look at least once at every character on the source 
> string.
>
> In regards to finding the replacement... the 9 characters could be stored 
> in a dictionary (map in Go) with their counterparts. Since a dictionary 
> lookup is O(1) in the average case you still have an O(n) algorithm. But, 
> since the byte[] size will not be over 100, you may assume that n is 
> constant too, so it's a constant time algorithm.
>
> You didn't mention if the replacement should be in-place or not, so I'm 
> assuming it's in place.
>
> Something like this I suppose:
>
> func main() {
> dict := map[byte]byte {0:1, 1:2, 3: 5}
> arr := []byte{0,1,1,2,3,5,8,13,21}
> fmt.Printf("Before: %#v", arr)
> for i, b := range arr {
> if replacement, ok := dict[b]; ok {
> arr[i] = replacement
> }
> }
> fmt.Printf("After: %#v", arr)
> }
>
> https://play.golang.org/p/hXnBKK3pDWk
>
> O(n) time complexity (or O(1) if n  == 100) and O(k) additional space for 
> the dictionary.
>
> Best,
>
> Dani
>
> Em qua, 29 de ago de 2018 às 22:15, > 
> escreveu:
>
>> I have approximately 9 characters that all need to be replaced with 
>> different characters. I know there are a number of ways to do this but what 
>> is the most efficient?
>>
>>
>>- 1) Do a []byte walk and compare each byte and replace when found?
>>- Seems expensive if you have a 100 bytes in the []byte
>>  - comes out to a max of 900 operations
>>   - 2) Use byte.Replace() for each one?
>>- This seems to be about the same as #1 but seems like it makes a 
>>   copy so might use slightly more memory
>>- 3) Regex for each one?
>>- Seems like using a nuke to kill a flee
>>   - Seems expensive in processing
>>- 4) Something I have not thought of?
>>- Specific algorithm to solve this?
>>
>>
>> The []byte slice it self likely will not be over 100 bytes, however they 
>> may be be 10s of thousands of them.
>>
>> Thanks,
>> Anthony Gruetzmacher
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to golang-nuts...@googlegroups.com .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-29 Thread Tamás Gulácsi
An even faster lookup is creating a [256]byte array for the replacements, 
having the 9 replacements candidates _position_ having the replacements, 
all others the identity:

// prepare the lookup table
var a [256]byte
for i := 0; i<256; i++ {
  a[i] = i
}
a[37] = '-'
a[129] = '-'
...

// replace the bytes, in-place operation
func replace(p []byte) {
  for i, b := range p {
p[i] = a[b]
  }
}

Although map lookup is O(1), array lookup is O(1) with a smaller constant 
factor, I'm sure.
And this has no branching, so should be even faster.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-30 Thread Marvin Renich
* Tamás Gulácsi  [180830 01:17]:
> An even faster lookup is creating a [256]byte array for the replacements, 
> having the 9 replacements candidates _position_ having the replacements, 
> all others the identity:
> 
> // prepare the lookup table
> var a [256]byte
> for i := 0; i<256; i++ {
>   a[i] = i
> }
> a[37] = '-'
> a[129] = '-'
> ...
> 
> // replace the bytes, in-place operation
> func replace(p []byte) {
>   for i, b := range p {
> p[i] = a[b]
>   }
> }
> 
> Although map lookup is O(1), array lookup is O(1) with a smaller constant 
> factor, I'm sure.
> And this has no branching, so should be even faster.

I'm not convinced, and only benchmarking will tell.  If len(p) == 100
and only six of those are in the set to be replaced, doing 100 memory
writes instead of branching and doing six memory writes might very well
be slower.

Today's hardware has all sorts of optimizations that help with both
cases.  Predictive branching and caching will each help with one of the
two approaches.  You can't be sure until you have benched with realistic
data.

I would, however, replace the map in the previous suggestion with a
[256]int16 with -1 meaning "don't replace".  This is really just a
special-case optimized map.  If 0 is not a valid replacement byte, use
[256]byte with 0 meaning "don't replace".

...Marvin

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-30 Thread Michael Jones
The task to "translate a string through a table" is common. It is a single
computer instruction (TR) in the IBM 360 and was extended over time with
related instructions for conversion in and out of UTF-8, testing before and
after translation, and expanded character sizes. Tamás Gulácsi's approach
would be mine too.

On Thu, Aug 30, 2018 at 5:30 AM Marvin Renich  wrote:

> * Tamás Gulácsi  [180830 01:17]:
> > An even faster lookup is creating a [256]byte array for the
> replacements,
> > having the 9 replacements candidates _position_ having the replacements,
> > all others the identity:
> >
> > // prepare the lookup table
> > var a [256]byte
> > for i := 0; i<256; i++ {
> >   a[i] = i
> > }
> > a[37] = '-'
> > a[129] = '-'
> > ...
> >
> > // replace the bytes, in-place operation
> > func replace(p []byte) {
> >   for i, b := range p {
> > p[i] = a[b]
> >   }
> > }
> >
> > Although map lookup is O(1), array lookup is O(1) with a smaller
> constant
> > factor, I'm sure.
> > And this has no branching, so should be even faster.
>
> I'm not convinced, and only benchmarking will tell.  If len(p) == 100
> and only six of those are in the set to be replaced, doing 100 memory
> writes instead of branching and doing six memory writes might very well
> be slower.
>
> Today's hardware has all sorts of optimizations that help with both
> cases.  Predictive branching and caching will each help with one of the
> two approaches.  You can't be sure until you have benched with realistic
> data.
>
> I would, however, replace the map in the previous suggestion with a
> [256]int16 with -1 meaning "don't replace".  This is really just a
> special-case optimized map.  If 0 is not a valid replacement byte, use
> [256]byte with 0 meaning "don't replace".
>
> ...Marvin
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-30 Thread Marvin Renich
* Michael Jones  [180830 11:44]:
> The task to "translate a string through a table" is common. It is a single
> computer instruction (TR) in the IBM 360 and was extended over time with
> related instructions for conversion in and out of UTF-8, testing before and
> after translation, and expanded character sizes. Tamás Gulácsi's approach
> would be mine too.

Wow, that brings back memories!  The x86 has xlat as well, which can be
used with lodsb and stosb for efficient string translation.  The
question is whether the go compiler will produce optimized code using
these instructions.

I was not disagreeing with Tamás.  My point was simply that without
benchmarking, or prior in-depth knowledge of both how the Go compiler
translates these constructs to assembly and the characteristics of the
processor/motherboard/memory bus, it is hard to say which Go algorithm
is the fastest for the OP's data.

...Marvin

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] Character Replacement/Algorithm for Replacement

2018-08-30 Thread Michael Jones
Always true. But hard to believe that a map could be faster. The array will
be on the stack.

On Thu, Aug 30, 2018 at 9:31 AM Marvin Renich  wrote:

> * Michael Jones  [180830 11:44]:
> > The task to "translate a string through a table" is common. It is a
> single
> > computer instruction (TR) in the IBM 360 and was extended over time with
> > related instructions for conversion in and out of UTF-8, testing before
> and
> > after translation, and expanded character sizes. Tamás Gulácsi's approach
> > would be mine too.
>
> Wow, that brings back memories!  The x86 has xlat as well, which can be
> used with lodsb and stosb for efficient string translation.  The
> question is whether the go compiler will produce optimized code using
> these instructions.
>
> I was not disagreeing with Tamás.  My point was simply that without
> benchmarking, or prior in-depth knowledge of both how the Go compiler
> translates these constructs to assembly and the characteristics of the
> processor/motherboard/memory bus, it is hard to say which Go algorithm
> is the fastest for the OP's data.
>
> ...Marvin
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.