The finite set idea might work, but the set has well over 300K distinct 
values. The strings (part numbers) are not regular.

I could make a single pass over the "parent" column and record the index of 
each value's first occurrence in a map[string]int. Then sort.Search() would 
not have to find it each time. Or I could use sort.Search() on the smaller 
~300K slice if map performance turned out to be a problem...
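
Roughly what I have in mind (just a sketch; "pairs" is a stand-in for my 
real data, using the [2]string representation suggested below, and it 
assumes the rows are sorted by parent, as they must be for sort.Search()):

package main

import "fmt"

// buildFirstIndex makes one pass over the parent column and records the
// index where each parent's block of rows begins. Assumes the rows are
// sorted by parent, so all rows for a given parent are contiguous.
func buildFirstIndex(pairs [][2]string) map[string]int {
	first := make(map[string]int)
	for i, p := range pairs {
		if _, seen := first[p[0]]; !seen {
			first[p[0]] = i
		}
	}
	return first
}

func main() {
	pairs := [][2]string{{"A", "B"}, {"A", "C"}, {"B", "D"}, {"C", "E"}}
	first := buildFirstIndex(pairs)
	fmt.Println(first["B"]) // 2: rows for parent "B" start at index 2
}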

There is a lot of repetition in the "branches" - perhaps some caching would 
be appropriate...
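
For the caching, something like this hypothetical continuation of the 
sketch above (descendants() stands in for my real tree walk):

// descendants returns every node below parent, memoizing the result so a
// branch that appears under many parents is only walked once. Caching by
// node is safe because a node's descendants don't depend on how it was
// reached (assuming the data really is acyclic).
var memo = map[string][]string{}

func descendants(parent string, pairs [][2]string, first map[string]int) []string {
	if cached, ok := memo[parent]; ok {
		return cached
	}
	start, ok := first[parent]
	if !ok {
		return nil // no children recorded: leaf node
	}
	var out []string
	for i := start; i < len(pairs) && pairs[i][0] == parent; i++ {
		child := pairs[i][1]
		out = append(out, child)
		out = append(out, descendants(child, pairs, first)...)
	}
	memo[parent] = out
	return out
}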

thanks for the ideas!

On Wednesday, November 30, 2016 at 9:31:56 AM UTC-5, adon...@google.com 
wrote:
>
> On Wednesday, 30 November 2016 03:37:55 UTC+2, Mandolyte wrote:
>>>
>>> I have a fairly large 2D slice of strings, about 115M rows. These are 
>>> (parent, child) pairs that, processed recursively, form a tree. I am 
>>> "materializing" all possible trees as well as determining for each root 
>>> node all child nodes for all levels for it.
>>>
>>  
> In addition to what Egon said:
>
> If the elements of the outer [][]string slice are all []string slices of 
> length 2, it would be much more efficient to use [2]string instead, that 
> is, fixed-sized arrays of two strings, since this would avoid a lot of 
> memory allocation and pointer indirection.  The type of the outer slice 
> would become [][2]string.
>
> Also, if all your strings are drawn from a finite set, you'll find many 
> operations (such as comparison or set membership tests) much more efficient 
> if you represent each string by its index in some side table.  Then instead 
> of [2]string you can use [2]uint32, or perhaps even a narrower integer 
> type, which will make your main array more compact by at least a factor of 
> 4.
>
>
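
For reference, a minimal sketch of the interning idea quoted above (the 
interner type and names here are hypothetical; it assumes the distinct part 
numbers fit in a uint32 index):

package main

import "fmt"

// interner maps each distinct string to a small integer index and keeps a
// side table to recover the original string.
type interner struct {
	index map[string]uint32
	names []string
}

func newInterner() *interner {
	return &interner{index: make(map[string]uint32)}
}

func (in *interner) intern(s string) uint32 {
	if id, ok := in.index[s]; ok {
		return id
	}
	id := uint32(len(in.names))
	in.index[s] = id
	in.names = append(in.names, s)
	return id
}

func main() {
	in := newInterner()
	raw := [][2]string{{"A", "B"}, {"A", "C"}, {"B", "D"}}

	// Convert the string pairs to compact [2]uint32 pairs.
	pairs := make([][2]uint32, len(raw))
	for i, p := range raw {
		pairs[i] = [2]uint32{in.intern(p[0]), in.intern(p[1])}
	}

	fmt.Println(pairs)                 // [[0 1] [0 2] [1 3]]
	fmt.Println(in.names[pairs[2][1]]) // "D": recover the original string
}

Comparisons and membership tests then operate on uint32s, and each pair is 
8 bytes instead of two string headers plus their data.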
