> I am very fond of the relational functions in Clojure. That was one of
> the first things that started winning me over actually.

Indeed, they're very nice to have!

> Forgive me if this is an obvious question, but what exactly is the
> disadvantage of the add-an-id approach?

It's largely aesthetic for me: I don't like the idea of having to
generate some identifier and decorate my data with it. From my
perspective it's a hack to turn a set into a multiset, which is the
concept I'm really working with (an unordered collection which includes
duplicates).

One could argue that choosing a name for the ID is not obviously easy,
and that this is an approach that only works well for maps/structs, but
those problems don't apply in my case, so I won't argue those points!

I haven't done any timing to determine if it's an expensive hack: this
is not time-critical code, so it doesn't matter much to me. For that
reason I'll probably stick with this approach, albeit well-commented to
explain to my future self why I'm temporarily introducing an
otherwise-unused ID!

I raised this whole issue not because I can't work around it, but
because I like to use the right tool for the job if it exists, and maybe
other people already built that tool. Who knows? Perhaps Rich has been
considering spending an afternoon adding multisets to core, and this is
additional motivation. After all, we now have sorted-sets, which is the
other axis of set-hood...

> Or, another way, what would be
> substantially better about having multisets over just doing what
> you're doing? My understanding of relational theory and SQL (thanks
> largely to Joe Celko's books) makes me suspicious of needing
> cardinality -- it sounds a lot like wanting access to the physical
> ordering on disk. Then again, a lot of my database tables wind up with
> a sort-order column or an auto-incrementing ID, I admit.

It depends on how "pure" your experience with relational algebra is :)

I've spent a lot of time with SPARQL, the RDF query language. It's
relational (much like SQL for the web), and it preserves cardinality by
default, though not ordering.
(It has REDUCED and DISTINCT keywords to discard duplicates if desired
or permitted.) Some people think preserving cardinality is an odd
choice, given that RDF is defined in terms of sets, not bags, but it has
its uses.

Modeling event-like things (charges, in my case) in a pure relational
system -- one with set semantics -- typically requires the addition of
two things: a unique identifier to preserve otherwise-identical events;
and some ordering attribute, to preserve sequentiality in an unordered
system. Removing the "set-ness" (cardinality, unorderedness, or both) is
another way to resolve the impedance mismatch.

> Of course, just because it violates relational theory doesn't mean it
> wouldn't be a great addition to the language. I'm curious.
>
> Would you mind sharing the code with the error for the calculation
> you're doing?

I'm afraid I can't share the exact code, but the simplified relational
part is something like:

  (use 'clojure.set)

  (defn example-charges
    "Take a relation between charge and identifier, and a relation
    between identifier and client, and sum the charges for each client."
    [charges-rel clients]
    ;; 5. Produce a sum charge for each client in a single map.
    ;;    No need to apply merge-with: the index has unique keys.
    (into {}
      (map
        ;; 4. Turn the index into a numeric sum for each client.
        (fn [[k v]]
          [(:client k) (reduce + (map :charge v))])
        (index
          (project
            ;; 1. Note that any identifiers not in the clients relation
            ;;    will simply disappear at this point.
            (join charges-rel clients)
            ;; 2. Include :id in the projection to prevent set semantics.
            [:client :charge :id])
          ;; 3. Now index from client to the projected relations.
          #{:client}))))

E.g.,

  (example-charges
    #{{:charge 10 :identifier "12345abcdef" :id 0}
      {:charge 10 :identifier "67890ghijkl" :id 1}
      {:charge 15 :identifier "12345poiuyt" :id 2}}
    #{{:identifier "12345abcdef" :client "Foocorp"}
      {:identifier "67890ghijkl" :client "Foocorp"}
      {:identifier "12345poiuyt" :client "Barcorp"}})

  =>
  {"Foocorp" 20, "Barcorp" 15}

Omit the :id and we get this:

  (example-charges
    #{{:charge 10 :identifier "12345abcdef"}
      {:charge 10 :identifier "67890ghijkl"}
      {:charge 15 :identifier "12345poiuyt"}}
    #{{:identifier "12345abcdef" :client "Foocorp"}
      {:identifier "67890ghijkl" :client "Foocorp"}
      {:identifier "12345poiuyt" :client "Barcorp"}})

  =>
  {"Barcorp" 15, "Foocorp" 10}

Oops! We're going to under-charge Foocorp! You get the same result if
you omit the :id from the projection vector.

Thanks,

-R
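
P.S. One possible way to sidestep the :id, sketched under assumptions
rather than offered as a real fix: skip project entirely and fold over
the joined rows, so that no step collapses rows that differ only in
:identifier. The name example-charges-bag is hypothetical. The sketch
also assumes that no two charges share both an identifier and an amount,
because the input set would already have merged those before the
function ever saw them (which is exactly where a true multiset would
still be needed).

```clojure
(require '[clojure.set :as set])

(defn example-charges-bag
  "Hypothetical variant of example-charges: sums the charges for each
  client without an :id, by never projecting the joined rows into a set."
  [charges-rel clients]
  ;; join still returns a set, but every joined row keeps its :identifier,
  ;; so rows with equal :client and :charge stay distinct at this point.
  (reduce (fn [sums row]
            ;; merge-with + adds this row's charge to the client's total.
            (merge-with + sums {(:client row) (:charge row)}))
          {}
          (set/join charges-rel clients)))

(example-charges-bag
  #{{:charge 10 :identifier "12345abcdef"}
    {:charge 10 :identifier "67890ghijkl"}
    {:charge 15 :identifier "12345poiuyt"}}
  #{{:identifier "12345abcdef" :client "Foocorp"}
    {:identifier "67890ghijkl" :client "Foocorp"}
    {:identifier "12345poiuyt" :client "Barcorp"}})
;; => {"Foocorp" 20, "Barcorp" 15}   (key order may differ)
```

This only works because the sum is taken before anything discards
:identifier; any later step that projects down to :client and :charge as
a set would lose the duplicate again.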