Re: [elm-discuss] Immutable data design problem

Lyle Kopnicky Mon, 24 Jul 2017 16:03:06 -0700

Hi Aaron,

Thanks for your thoughtful reply.


The domain model is pretty complex, so it's hard to distill down to a few 
issues. There's a higher-level structure called a Parcel. That already 
contains, among other things, the list of AssessmentEvents. I have a 
function called createParcel that takes a record with a parcel number, 
initial owner, and list of AssessmentEvents. Those AssessmentEvents must in 
turn have been created by calling createAssessmentEvent, which takes the 
independent fields of an AssessmentEvent and creates the full record with 
the derived fields. However, there really are yet more fields that can't be 
derived by looking at a single AssessmentEvent in isolation. Some 
calculation has to be done by determining chains of them and computing 
deltas along the chain.
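
As a rough illustration of that two-stage construction, here is a minimal sketch. Only the name createAssessmentEvent comes from this thread; the field names (landValue, improvementValue, totalValue) and the addition used for the derived field are hypothetical placeholders, not the real schema.

```elm
module AssessmentEventSketch exposing (AssessmentEvent, AssessmentEventInput, createAssessmentEvent)

-- Independent fields supplied by the caller (hypothetical names).
type alias AssessmentEventInput =
    { id : Int
    , landValue : Int
    , improvementValue : Int
    }


-- Full record: the independent fields plus one derived field.
type alias AssessmentEvent =
    { id : Int
    , landValue : Int
    , improvementValue : Int
    , totalValue : Int
    }


-- Build the full record, computing the derived field from the inputs.
createAssessmentEvent : AssessmentEventInput -> AssessmentEvent
createAssessmentEvent input =
    { id = input.id
    , landValue = input.landValue
    , improvementValue = input.improvementValue
    , totalValue = input.landValue + input.improvementValue
    }
```

Fields that depend on neighbouring events (the deltas along a chain) cannot be filled in here, which is why they need a later pass over the whole list.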

Currently I have createParcel computing a Dict of assessmentEventsById (so, 
it's assuming some ID already exists on the AssessmentEvents, which is a 
separate issue). It also computes a list of roll years that are relevant to 
the assessment events, which involves some date math. It computes an 
ownership chain - that is, a list of date ranges and who owned the property 
during that time range. And finally it computes the list of assessment 
events that are effective for each roll year. Each assessment event might 
appear in the list for as many as two consecutive years, depending on its 
dates.

Then there will have to be deltas calculated between the assessment events 
for a given roll year. The accounts will be created from those. And 
finally, one or two bills will be created from each account, depending on 
the type of assessment event. All of this will be completely deterministic, 
based on the initial seed data of assessment events. But I need these 
accounts and bills calculated in order to properly view the data.
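
The delta step could be sketched as below, assuming (hypothetically) that a chain of assessment events for one roll year has already been reduced to a list of Int values in chain order. The real delta rules are not spelled out in this thread, so this only shows the pairwise-difference shape of the calculation.

```elm
module DeltaSketch exposing (deltas)

-- Differences between each value and its predecessor in the chain.
-- The first value has no predecessor, so the result is one element
-- shorter than the input (and empty for an empty or single-item chain).
deltas : List Int -> List Int
deltas values =
    List.map2 (\previous current -> current - previous)
        values
        (List.drop 1 values)
```

List.map2 stops at the end of the shorter list, so pairing the list with its own tail needs no special case for the last element.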

If I am using IDs, then I can make a data structure that just contains the 
deltas by ID, rather than creating another AssessmentEvent structure that 
has room for the delta values. But that would mean that when outside code 
needed to get the delta value, it couldn't just have an AssessmentEvent. It 
would have to have an AssessmentStore (or Parcel) and an EventID and call a 
function which could use that to retrieve the delta value from the Dict. 
So, it's a pretty different model for the caller.
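
A minimal sketch of that lookup-by-ID model, assuming Int IDs and Int deltas. The names Store, fromDeltas, and deltaFor are hypothetical and stand in for whatever AssessmentStore or Parcel would actually expose.

```elm
module DeltaStoreSketch exposing (EventId, Store, deltaFor, fromDeltas)

import Dict exposing (Dict)


type alias EventId =
    Int


-- Derived deltas live beside the events, keyed by event ID.
type alias Store =
    { deltasById : Dict EventId Int }


fromDeltas : List ( EventId, Int ) -> Store
fromDeltas pairs =
    { deltasById = Dict.fromList pairs }


-- The caller needs both the store and an ID; an unknown ID yields Nothing.
deltaFor : EventId -> Store -> Maybe Int
deltaFor eventId store =
    Dict.get eventId store.deltasById
```

The Maybe in the result is the visible cost of this model: the caller has to handle a missing ID even when the store was built from the same events.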

So far I have been putting all this logic in one module, called Property. 
(The view logic is in a separate module.) I've been using datatypes with a 
single constructor, so the view code can pattern match against them. But 
now I'm starting to wonder whether it'd be safer to hide the 
representations here in the Property module.
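
Hiding the representation might look roughly like this. It is a sketch only: the record fields and the accessor functions are hypothetical, and the real createParcel also takes an initial owner and full AssessmentEvents rather than bare IDs.

```elm
module Property exposing (Parcel, createParcel, eventCount, parcelNumber)

-- Parcel is exposed without its constructor, so code outside this module
-- cannot pattern match on it or build one directly.
type Parcel
    = Parcel
        { parcelNumber : String
        , eventIds : List Int
        }


createParcel : { parcelNumber : String, eventIds : List Int } -> Parcel
createParcel fields =
    Parcel fields


-- Accessors are the only way for the view code to read a Parcel.
parcelNumber : Parcel -> String
parcelNumber (Parcel data) =
    data.parcelNumber


eventCount : Parcel -> Int
eventCount (Parcel data) =
    List.length data.eventIds
```

Writing `Parcel` rather than `Parcel(..)` in the exposing list is what makes the type opaque; the view module would then depend only on the accessors, so the internal record can change freely.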

At some point in the future I will want to allow adding/removing/updating 
assessment events in real time. Then I will have to decide whether I want 
to just recalculate the entire set of data or try to figure out which bits 
need to change. Recalculating the whole thing will probably be performant 
enough. But I guess there could be an issue with IDs - if some data gets 
loaded from the database and needs to preserve existing IDs, I can't just 
generate new IDs for the whole set. I'll figure out that problem when I 
come to it.
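
The "recalculate the entire set" option can be sketched as a single function that derives everything from the seed data, with every edit going back through it. The Model fields and the List.sum stand-in for the derived data are hypothetical; the real derivation would build roll years, accounts, and bills.

```elm
module RecalcSketch exposing (Model, addEvent, fromEvents)

-- Seed data plus a value derived from it (hypothetical fields).
type alias Model =
    { events : List Int
    , total : Int
    }


-- Derive everything from the seed data in one place.
fromEvents : List Int -> Model
fromEvents events =
    { events = events
    , total = List.sum events
    }


-- Adding an event rebuilds all derived data rather than patching it.
addEvent : Int -> Model -> Model
addEvent event model =
    fromEvents (event :: model.events)
```

This sketch sidesteps the ID question raised above: it works as long as IDs travel with the seed events instead of being generated during the rebuild.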

Regards,
Lyle

On Sunday, July 23, 2017 at 8:17:07 PM UTC-7, Aaron VonderHaar wrote:
>
> I'm not sure I understand all the details of your domain model, but it 
> seems like the notable point is that accounts are created implicitly as 
> assessment events occur, and you'd like to be able to, given an assessment 
> event, get the related accounts?
>
> I'd probably start with making a module (maybe called "AssessmentStore") 
> that has functions that describe what you need.  I'm thinking something 
> like:
>
> allEvents : AssessmentStore -> List AssessmentEvent
>
> and hmm... now that I write that out, it seems like that's all you want, 
> except that you ideally want AssessmentEvent to have a list of Accounts in 
> it.
>
> I think the approach I would prefer is similar to what you mention in your 
> last paragraph about keeping the data in separate structures, but you 
> question the safety of managing parallel structures.  If you create a 
> separate module to encapsulate the data, you can limit the need for 
> careful handling to that single module.  I might try something like this in 
> `AssessmentStore`:
>
> type AssessmentStore =
>     AssessmentStore
>         { -- This is not the full AssessmentEvent; just the things that
>           -- don't relate to accounts.
>           assessmentEventInfo : Dict EventId { name : String, ... }
>         , accountsByEvent : Dict EventId (List AccountId)
>         , accountInfo : Dict AccountId Account
>         -- (or maybe you want them indexed differently, by time, etc)
>         , allEvents : List EventId
>         }
>
> then have a function to create the assessment store, and then the 
> `allEvents` function suggested above (or any other function to get 
> AssessmentEvents) can take the data in that private data structure and 
> merge it together to give the data that you actually want to return to the 
> caller.  In fact, you never need to expose the AccountIds/EventIds outside 
> of this module.
>
> If you are still worried about safety, you can add more unit tests to this 
> module, or try to define fuzz test properties to help you ensure that you 
> handle the computations correctly within the module.
>
> I've found this sort of approach to work well because it lets you 
> represent the data in whatever data structure is most performant and/or 
> appropriate for your needs (it is often also simpler to implement because 
> the data structures tend to be much flatter), but also hides the internal 
> representation behind a module interface so that you can still access the 
> data in whatever ways are most convenient for the calling code.
>
> On Sun, Jul 23, 2017 at 7:16 PM, Lyle Kopnicky <lyle...@gmail.com> wrote:
>
>> I have a series of datatypes that have already been modeled in a 
>> relational database in a product. I'm trying to construct a lighter-weight 
>> in-memory representation in Elm for purposes of simulating operations and 
>> visualizing the results. Ultimately I will probably want to do some 
>> export/import operations that will allow someone to view data from the real 
>> database, or create records in a test database. But, I don't think the 
>> in-memory representations need to correspond exactly to the database ones 
>> in order to do this. I'd like to focus on as simple of a representation as 
>> possible, and I'm leaving out a fair number of fields.
>>
>> We start with a provided series of AssessmentEvents. It's just a limited 
>> amount of data for each AssessmentEvent. Some of the fields in the database 
>> can be calculated from the others, so those don't need to be provided. From 
>> this data, we can calculate more information about the AssessmentEvents, 
>> including deltas between them. We can also derive a series of Accounts in a 
>> completely deterministic fashion. Each AssessmentEvent will have up to two 
>> years associated with it, and for each year there will be at least one 
>> Account. From this we can also calculate one or two Bills to go with each 
>> Account.
>>
>> It's a fairly complex calculation. Certainly I can do it in Elm. But what 
>> I'm waffling about is how to store the data. These calculations can be 
>> cached - they do not need to be repeated if the user just changes their 
>> view of the data. They only need to be revised if the user wants to 
>> insert/edit/update AssessmentEvents. So to do all these calculations every 
>> time the user shifts the view would be wasteful.
>>
>> It becomes tricky with immutable data. In an object-oriented program, I 
>> would probably just have, say, extra empty fields on the AssessmentEvent 
>> object, that I would fill in as I updated the object. E.g., it could have a 
>> list of accounts, which initially would be a null value until I filled it 
>> in.
>>
>> At first I thought I might do something similar in the Elm data 
>> structure. An AssessmentEvent can contain a List of Accounts (I'm 
>> oversimplifying as it really needs to list the accounts per year). The list 
>> of Accounts can be initially empty. Then as I calculate the accounts, I can 
>> create a new list of AssessmentEvents that have Accounts in the list. But 
>> wait - since the list of AssessmentEvents is immutable, I can't change it. 
>> I can only create a new one, and then, where in the model do I put it?
>>
>> When a user initializes the model, then, what should they pass in? 
>> Perhaps they can pass in a list of AssessmentEvents that each have an empty 
>> list of Accounts, and then that gets stored in a variable. Then the 
>> Accounts are calculated, and we generate a new list of AssessmentEvents 
>> with Accounts attached, and that is what gets stored in the model.
>>
>> But this has some shortcomings. The user must now create something that 
>> has this extra unused field on it (and there will be more). I guess if they 
>> are using a function to create it, they needn't know that there are these 
>> extra fields. But what if the field isn't a list - it's an Int? Then do we 
>> need to make it a Maybe Int? Then all the code that later operates on that 
>> Int will have to handle the case that the Maybe Int might be a Nothing, 
>> even though at that point I know it will always be Just something.
>>
>> Maybe there should be a data structure that contains an AssessmentEvent, 
>> also containing the extra fields? But what if I have a series of functions, 
>> each of which adds some new field to the AssessmentEvent? Then I need a new 
>> data type for each step that just adds one more field?
>>
>> Perhaps if I use untagged records, then all the functions can just 
>> operate on the fields they care about, ignoring extra fields. I sort of 
>> liked the extra type safety that came with the tagged record, but it may 
>> just get in the way.
>>
>> Perhaps instead of attaching this extra data to AssessmentEvents, it 
>> could be kept in separate data structures? But then how do I know how they 
>> are connected? Unless I carefully manage the data in parallel arrays, I 
>> will need to add IDs to the AssessmentEvents, so they can be stored in a 
>> Dict.
>>
>> These are just some of my thoughts. Does anyone have any suggested 
>> patterns to follow?
>>
>> Thanks,
>> Lyle
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Elm Discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elm-discuss...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
