I just dug into the data structure spec for Btrfs, figuring its trees 
might resemble what I am building, and no, they are not implemented the 
way I am working here either.

Yes, it may well be that I am on a wild goose chase with this, but hey, 
I've learned an awful lot of cool stuff now that will serve me well later 
anyway, regardless of whether this succeeds or fails when the rubber hits 
the road.

On Tuesday, 24 April 2018 11:58:39 UTC+3, Louki Sumirniy wrote:
>
>
>
> On Tuesday, 24 April 2018 11:15:34 UTC+3, rog wrote:
>>
>> On 23 April 2018 at 19:58, Louki Sumirniy 
>> <louki.sumir...@gmail.com> wrote: 
>>
>  
>
>> > I set many of those identifiers to export because I wanted to keep 
>> > separate derived libraries that replaced the storage types embedded 
>> > into the main library. These limitations would not exist were the 
>> > 'back store' over the wire or on disk. They only exist for the storage 
>> > that is most accessible to the program, the main memory. 
>>
>> I don't understand this. If you're replacing the underlying storage 
>> types, why would you need to export details of an algorithm that the 
>> storage types shouldn't need to have anything to do with? 
>>
>
> Because if they can't be accessed outside the scope of the package, I am 
> forced to add the alternative data types within the same file. This is a 
> massive loss of flexibility (and means an importing application cannot 
> implement its own variant). In my test/scratch/thingy, even within the 
> same folder direct access was impossible because of package scoping, so I 
> capitalised the fields; if it can't be done even in the same folder, it 
> certainly can't be done from a different pkg/name folder for another 
> implementing library.
>  
>
>> > Regarding the use of zero as a nil, that is already accepted by 
>> > numerous built-in types (error, for example), and in this case the 
>> > reason I have put that into the *abstract* type is that the data I am 
>> > writing this algorithm for is hashes, which are never zero in 
>> > practice: over a period of decades, no hash function has produced 
>> > zero from any significant amount of known data. 
>>
>> Why does your algorithm need to know about the possible nil-ness of 
>> the stored items? It doesn't use that property now, at any rate. 
>>
>
> Zero is the default signifier of an empty node. Zero acts as the sentinel 
> for the search to detect when it has fallen off the edge of a branch. But 
> zero isn't necessarily the 'empty node' value: an implementation using 
> strings needs to compare to "", struct and slice types would need to test 
> for nil, and one may conceive of an application where the empty field 
> sentinel actually has a value (and initialisation would then require 
> filling the data that way, of course).
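> 
> As a minimal sketch (all names here are hypothetical, not the actual 
> BAST API), the per-type empty test can be supplied as a closure over the 
> backing store: 
> 
> // emptyTester reports whether the value at index i is the 
> // sentinel 'empty node' value for this backing store. 
> type emptyTester func(i uint32) bool 
> 
> // a uint32-backed store treats zero as empty 
> func uint32Tester(store []uint32) emptyTester { 
>     return func(i uint32) bool { return store[i] == 0 } 
> } 
> 
> // a string-backed store treats "" as empty 
> func stringTester(store []string) emptyTester { 
>     return func(i uint32) bool { return store[i] == "" } 
> } 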
>  
>
>>
>> > This idea that I should be hiding all the functions smells like OOP to 
>> me. 
>>
>> This isn't OOP - it's just good code hygiene. By hiding implementation 
>> details, you make the public interface clear and you prevent clients 
>> from relying on internal implementation details that you might want to 
>> change. 
>>
>
> How do I override the functions to implement the data abstraction if the 
> overridden functions are not exported? Internal implementation details for 
> this specific library will not change once I hit the beta milestone. This 
> is not a complex application, it is just a data storage/search library. 
> Somewhere along the line there will have to be some exported elements that 
> can be passed to and fro and possibly some of these will be encapsulated 
> like this. But encapsulation can be a big obstacle to reuse and derivation.
>
> Specific points that clearly do not need to be visible can be pointed 
> out. If someone is *extending* this then, to be honest, they should just 
> fork it anyway rather than import it. Also, I don't think that is as easy 
> to do in Go as it would be in an OOP language, as I had to do very 
> specific things in the base type struct spec in order to enable 
> overriding. In an OOP language you can override until the cows come home, 
> and this is the 'fragility' problem of OOP classes that Go specifically 
> aims to eliminate by enforcing composition.
>
> If a user of the library were to embed the structure in another type, they 
> would get access to these private elements anyway, and then by doing this 
> on an import rather than a fork, they would be depending on me not to 
> change the way it works. Everything I have exposed is specifically exposed 
> to allow an external library to simply reimplement the set of accessors. 
> These accessors need to know the Cursor type also, as this is how the tree 
> is walked.
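> 
> As a rough sketch of what I mean (hypothetical names, not the current 
> code), the accessor set a derived library would reimplement looks 
> something like: 
> 
> // Store is a hypothetical sketch of the accessor set; Cursor 
> // locates a node within the dense array. 
> type Store interface { 
>     Get(c Cursor) interface{} 
>     Put(c Cursor, v interface{}) 
>     IsEmpty(c Cursor) bool 
>     // Less reports whether the value at a sorts before the value at b. 
>     Less(a, b Cursor) bool 
> } 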
>
> If I were writing a much more complex library then clearly only the 
> interface should be exposed, and the interface needs to be immutable 
> until version bumps.
>  
>
>> > As it stands with how it has to be implemented in Go, accessing these 
>> > would-be-private-in-C++ elements is very cumbersome due to the need 
>> > to use type assertions. I have tinkered with this for educational 
>> > value but I don't want to have to deal with it in the actual 
>> > implementation. A key insight of Schneier's work is that 'security by 
>> > obscurity' is weak. Private functions fit that maxim pretty well. 
>>
>> Private functions are not "security by obscurity". Private functions 
>> aren't "obscure" (you can see them in the source perfectly well) but 
>> they are "secure" because you cannot access them. 
>>
>
> This is nonsense, because all I have to do is fork and expose them; and 
> what exactly is the danger of my knowing the internals? I follow FP 
> methodology fairly strictly whenever possible, and I did not design this 
> library with multi-threading in mind, which would expose it to side 
> effects and conflicting state changes. It's specifically designed to be 
> single-threaded.
>  
>
>> > I think that there is no real benefit in doing this, since after all, 
>> > this is the source, and if you can be told you can't access that 
>> > thing that is 'private' then it's not terribly private in reality, 
>> > only excluded from your programmatic access, without forking the 
>> > code, for arbitrary philosophical reasons. 
>>
>> As above, doing this is considered good code hygiene. If you do decide 
>> to fork the code and make something public, then it's obvious that 
>> you're breaking the API boundaries. 
>>
>
> You are probably right that a few parts of this should not be exported: 
> specifically anything that cannot be extended, such as the tree balance 
> functions and the details they work with, the fill tracking array, and 
> probably also the depth and length values, since one thing that would 
> totally screw up this algorithm is callers clobbering those state values 
> and triggering errors (I do test for all such conditions that would 
> cause array bounds exceptions, though, and this would be the fault of 
> the consumer). It's still a work in progress; I only just got a working 
> implementation of the backing store abstraction yesterday. I'm not a 
> professional OOP programmer, or any kind of professional. This is a work 
> of love, and one that I think will help me achieve goals for future 
> projects, at least with high likelihood for the next one on my agenda.
>  
>
>> > The search functions are already in there; it's a piece of cake, 
>> > given the ability to test for equality, lesser and greater. That was 
>> > probably one of the first things I implemented, because it's so 
>> > simple. Nobody has ever implemented a binary search tree with a dense 
>> > store like this before, and so the insert and delete functions are 
>> > not there yet. Just use your imagination for a little bit: even with 
>> > a 4-row *dense* tree, consider what happens when I delete a node at 
>> > row 3 with two children. Where do those children go? They are now 
>> > orphans, as their parent has been cut out of the tree. So, critical 
>> > to the design of BAST is that it be possible to perform short 
>> > optimisations during delete/insert write operations that do not cost 
>> > too much time, but at the same time do not cause the search cost to 
>> > become excessively spread out; and specifically with delete, you 
>> > can't have orphans. They have to be pushed up and inwards. It's not 
>> > complicated, but it gets more complicated the more nodes are children 
>> > of the severed node. 
>> > 
>> > It's not a model for tree structures that ANYONE is familiar with. 
>>
>> I think you overestimate the novelty of your algorithm. It sounds to 
>> me like it bears significant resemblance to a "complete binary tree" 
>> (see https://xlinux.nist.gov/dads/HTML/completeBinaryTree.html). 
>>
>
> A complete binary tree does not have a concept of empty nodes. 
>
> It uses references to bind nodes together, hence the memory storage 
> overhead, and the randomness (usually insert order) in the placement of 
> the nodes.
>
> The random memory locations of the node data are precisely what this 
> algorithm aims to eliminate, along with reducing memory overhead. Walk 
> operations require only very short 32-bit integer math functions to find 
> the locations of related nodes, mainly parent, child, and a lateral 
> neighbour, treating the tree as though it were a sparse sorted list 
> (sparse because some elements will usually be empty).
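> 
> To illustrate (using the standard array layout for a complete binary 
> tree; BAST's actual formulas may differ in detail), the walk relations 
> reduce to integer arithmetic like this: 
> 
> // root at index 0, children of i at 2i+1 and 2i+2 
> func parent(i uint32) uint32 { return (i - 1) / 2 } 
> func leftChild(i uint32) uint32 { return 2*i + 1 } 
> func rightChild(i uint32) uint32 { return 2*i + 2 } 
> 
> // row gives the depth of index i (root is row 0); row r occupies 
> // indices from 2^r - 1 up to but not including 2^(r+1) - 1 
> func row(i uint32) uint32 { 
>     r := uint32(0) 
>     for i >= (uint32(2)<<r)-1 { 
>         r++ 
>     } 
>     return r 
> } 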
>
> I was hoping you had put me onto something that resembled what I am 
> doing, but no. Sparse versus Dense is the conceptual polarity I am 
> working from here, not Partly Empty versus Complete. The powers-of-two 
> formulas I use allow me to derive relations between members of a 
> one-dimensional array that map to a complete binary tree, but it's not a 
> complete binary tree, because it has empty nodes, and these empty nodes 
> are very important: they mark the momentary boundaries and constraints 
> required for search, insert and delete (especially delete, where a 
> subtree can be severed and become unreachable).
>  
>
>> > Unlike simple, stupid newbie mistakes, this is intentional. I am 
>> > writing this algorithm because, by its architecture, it reduces 
>> > random access of memory, and in this it gains the benefit of reduced 
>> > cache misses in the CPU, and thus will potentially have a serious 
>> > performance advantage over any search-and-sort algorithm, whether 
>> > hashtables or vector-reference-based binary search trees. 
>>
>> You keep on making references to the efficiency of this algorithm, but 
>> I don't see any benchmarks or algorithm analysis that supports your 
>> viewpoint here. 
>>
>
> Benchmarks would require a completed algorithm. I will absolutely be 
> profiling it once it's done, in order to work out the comparative 
> processing loads for each type of operation. I have a vague concept of the 
> heuristics for optimisation operations, but I had been putting off filling 
> these parts in because I still hadn't achieved the data type abstraction at 
> that point. 
>
> The general rules for these tree rotation operations, in the BAST idiom, 
> are simple, and mostly depend on the lateral walk operations, but they 
> also need to know when the direction is up or down in each sideways step. 
> The root can fairly easily be displaced sideways from one side to the 
> other to maintain left/right balance, and this operation can precede the 
> insert and delete operations. In some cases it may be better to stepwise 
> displace sideways from an insert on an overweight side, but if this were 
> always how it was done it would become a very expensive operation on a 
> large tree.
>
> At this point the *potential* for a performance benefit is fairly clear, 
> but it can't be said with certainty how much and in what conditions there 
> is a benefit, and indeed, when there might be a disadvantage. A lot of this 
> hinges on the tree balancing operations. It is almost unquestionably true, 
> however, that for searches, this structure will reduce memory access 
> latency, and if it were implemented on a spinning disk, the benefit would 
> be extraordinary (and this is something that I might be able to learn some 
> things about with the trees used in filesystems). 
>
> (which I am going to now, on that note, go and do a little bit of reading).
>
> Thanks for your discussion about this... it's implicitly complimentary to 
> have my ruminations actually read and considered.
>  
>
>>   cheers, 
>>     rog. 
>>
>> > Yes, on the last point, it always uses a power of two because it is 
>> > modelling a dense binary tree, hence the name 'Bifurcation Array'. 
>> > That is also why I haven't finished the insert/delete functions. You 
>> > can't just spin things around like in a vector-based BST. There is a 
>> > cost in search linearity for the differentials of path lengths from 
>> > the root, but here's the thing: if you want to dodge allocating a new 
>> > row, you can instead find a value that can be pushed inwards to the 
>> > other side, and trace a path until there is an inner empty child that 
>> > you can 'rotate' into. Most of the time this will not exceed maybe 
>> > 32kb of data; at the very most, for most sizes of trees you would be 
>> > working with, no more than 1mb. And guess what, that all fits within 
>> > a CPU cache, which means manipulating that memory incurs no wait time 
>> > and data transfers 4-6 times as fast as over the front side bus. 
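>> > 
>> > (As a back-of-envelope check of those figures, assuming 4-byte 
>> > values per node: 
>> > 
>> > // a dense tree of d rows holds 2^d - 1 nodes 
>> > func treeBytes(rows uint) uint { 
>> >     return ((1 << rows) - 1) * 4 
>> > } 
>> > 
>> > treeBytes(13) is about 32 KiB and treeBytes(18) is about 1 MiB, 
>> > consistent with those numbers.) 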
>> > 
>> > For the same reason I have functions and variables that track the 
>> > fill of each row of the tree, as well as one at the top that totals 
>> > the left and right. These values can be quickly updated with each 
>> > insert and delete, and can be used to feed heuristics that decide 
>> > what probably will be the best way to shuffle the tree around to 
>> > maintain its low variance of minimum and maximum search time. 
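>> > 
>> > (Hypothetically, as a sketch rather than the actual code, that 
>> > bookkeeping amounts to something like: 
>> > 
>> > type fillStats struct { 
>> >     perRow      []uint32 // occupied nodes in each row 
>> >     left, right uint32   // occupied nodes in each half of the tree 
>> > } 
>> > 
>> > with perRow[r] and the relevant half incremented on each insert and 
>> > decremented on each delete.) 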
>> > 
>> > It's for realtime hash tables that rapidly change; it's for 
>> > proof-of-work algorithms that would otherwise entail a high and 
>> > mostly fixed cost of sorting to restructure for fast searching, so 
>> > that after each operation the tree is in a fairly optimal state for 
>> > searches, as well as for inserts and deletes. 
>> > 
>> > I might be talking in ways that more experienced programmers and CS 
>> > people may think naive, but I don't think it is naive to believe we 
>> > can make more hay out of the SRAM in CPU caches, especially for 
>> > applications that are becoming extremely critical to our lives - 
>> > databases specifically. As a cryptocurrency miner, the onslaught of 
>> > ASICs has given me a personal and visceral motivation to try to find 
>> > a way to eliminate the differential between specialised hardware and 
>> > commodity CPUs. It's an interesting fact, not really discussed much, 
>> > that the architecture of CPUs is precisely a database-specific 
>> > design. There has been some progress in using GPUs to accelerate 
>> > graph databases, which involve graphs (and vectors, and variant 
>> > payload data types), but this is almost completely irrelevant to how 
>> > SQL, NoSQL and K/V database (and DHT-based) stores work; and the 
>> > greatest amount of interesting stuff happening these days in computer 
>> > systems is all around distributed systems, a rarefied area within the 
>> > rarefied field of CS itself. 
>> > 
>> > Sorry if I went off on a rant there, but I hope it explains why I am 
>> > posing questions and asking for advice from more experienced gophers 
>> > about how to do these things. Google searches turned up almost 
>> > nothing that directly addressed my modelling, even reading closely 
>> > at code that appeared to implement this datatype agnosticism (within 
>> > the constraint of comparability and a nil value), so I wanted to 
>> > engage people in a discussion. I would be very pleased to learn that 
>> > I have missed important things in the field that render my design 
>> > useless. I first studied OOP back in the early 90s, and it seemed 
>> > like some kind of magic, but anyone reading this probably already 
>> > agrees that OOP (and functional) are not panaceas, and languages 
>> > like Go are addressing this issue, promoting correctness, 
>> > maintainability and efficiency all at the same time. I have tangled 
>> > with more OOP code than I wish I ever had; and to disclose, I did 
>> > quite a bit of BASIC and Assembler coding in my youth, and you never 
>> > forget the thrill of discovering a property in the numbers or the 
>> > architecture that lets you shortcut previously messy, thick and 
>> > dense problems that faced programmers in the past. 
>> > 
>> > On Monday, 23 April 2018 20:06:08 UTC+3, rog wrote: 
>> >> 
>> >> On 22 April 2018 at 23:20, Louki Sumirniy 
>> >> <louki.sumir...@gmail.com> wrote: 
>> >> > I essentially am trying to find an effective method in Go, 
>> >> > preferably not too wordy, that lets me create an abstract data 
>> >> > type, a struct, and a set of functions that bind to a different 
>> >> > data type, such that I can write, preferably without too much 
>> >> > code, a change that allows the data type of the embedded data to 
>> >> > be swapped. It's basically kinda inheritance, but after much 
>> >> > fiddling I found a hackish sorta way that isn't *too* boilerplate 
>> >> > filled: 
>> >> > 
>> >> > type nullTester func(*Bast, uint32) bool 
>> >> > 
>> >> > type Bast struct { 
>> >> >     ... 
>> >> >     isNull nullTester 
>> >> >     ... 
>> >> > } 
>> >> > 
>> >> > func isNull(b *Bast, d uint32) bool { 
>> >> >     return d == 0 
>> >> > } 
>> >> > 
>> >> > func NewBast() (b *Bast) { 
>> >> >     ... 
>> >> >     b.isNull = isNull 
>> >> >     ... 
>> >> > } 
>> >> > 
>> >> > // IsNull - tests if a value in the tree is null 
>> >> > func (b *Bast) IsNull(d uint32) bool { 
>> >> >     return b.isNull(b, d) 
>> >> > } 
>> >> > 
>> >> > 
>> >> > Now, bear in mind I haven't shown all of the code. But there is a 
>> >> > slice array in the Bast struct, defined as an interface{}, and 
>> >> > isNull is one of a set of operators that have to be written to 
>> >> > match the type used in the slice store. This might be a bad 
>> >> > example because it doesn't actually act on the interface-typed 
>> >> > slice, but the point here is just this: 
>> >> > 
>> >> > It does not appear to be possible to make the type specification 
>> >> > from the top line match the function signature of the type-bound 
>> >> > function at the bottom of the code snippet. I haven't been able to 
>> >> > find anything that shows that a func type can have a method 
>> >> > binding. 
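>> >> > 
>> >> > (Aside, on a closer read of the spec: a method expression does 
>> >> > yield a plain func whose first parameter is the receiver, so a 
>> >> > method can in fact match such a func type. A sketch, with a 
>> >> > hypothetical method name: 
>> >> > 
>> >> > // isNullMethod is a hypothetical method form of isNull 
>> >> > func (b *Bast) isNullMethod(d uint32) bool { 
>> >> >     return d == 0 
>> >> > } 
>> >> > 
>> >> > // the method expression (*Bast).isNullMethod has type 
>> >> > // func(*Bast, uint32) bool, i.e. it satisfies nullTester: 
>> >> > var _ nullTester = (*Bast).isNullMethod 
>> >> > 
>> >> > so the workaround above may not be strictly necessary.) 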
>> >> > 
>> >> > 
>> >> > https://github.com/calibrae-project/bast/blob/master/pkg/bast/bast.go 
>> >> > is where my WiP lives. This slightly hacky solution seems sound to 
>> >> > me, I just don't like being forced to use workarounds like this. 
>> >> > If a type signature cannot be written that matches a method, yet I 
>> >> > can do it this way, I don't see what purpose this serves as far as 
>> >> > correctness and bug-resistance go. I would have to deal with a lot 
>> >> > more potential bugs if I had to concretely implement this library 
>> >> > for the sake of 1 slice and 7 functions out of a much larger 
>> >> > library that conceptually is intended to only deal with 
>> >> > comparable, mainly numerical values anyway. 
>> >> 
>> >> I don't really understand why you think you want all those function 
>> >> types. Your code seems to mix concerns quite a bit, but interfaces 
>> >> in Go are generally closely focused on a particular thing. 
>> >> 
>> >> I don't understand anything about your algorithm, but ISTM you could 
>> >> do something like this: 
>> >> 
>> >>     https://play.golang.org/p/wv0T8T8Ynns 
>> >> 
>> >> Some changes I made in doing that: 
>> >> 
>> >> - unexported a load of identifiers that really don't deserve to be 
>> >>   part of the public API 
>> >> - made Cursor into a lightweight by-value type 
>> >> - defined a Store interface that doesn't require knowledge of the 
>> >>   Cursor data type. 
>> >> 
>> >> A few other observations: 
>> >> 
>> >> - requiring a value type to know whether an item is null or not 
>> >>   does not seem great when you're storing integers - who is to say 
>> >>   that the zero uint32 is not a valid thing to store? 
>> >> 
>> >> - requiring callers to know about AddRow (and hence how the tree is 
>> >>   organised) does not seem ideal. 
>> >> 
>> >> - there's no way to search for or insert items, but I guess you'll 
>> >>   be doing that later. 
>> >> 
>> >> - it looks like your algorithm will always use a power-of-two 
>> >>   multiple of the element size, which could end up very inefficient 
>> >>   when the tree is large. 
>> >> 
>> >>   cheers, 
>> >>     rog. 
>> > 
