It's not necessarily true that immutable data structures perform an actual in-memory copy. See for instance Phil Bagwell's hash tries -- only minimal amounts of the data structure are actually copied during updates.
I don't know whether BaseX is making use of these optimizations (Christian could speak to that), but jumping directly to "it's immutible, so it must be inefficient" is not necessarily well-founded. On Tue, Nov 12, 2013 at 10:48 AM, jean-marc Mercier < [email protected]> wrote: > Dear Christian, > > Thanks for your answer. > >It may be interesting to do some testing in order to find out what’s > >the actual bottleneck in your query. How man entries is your hash set > >supposed to contain? > > The HashSet contains 10M strings. That's not a big deal. The problems > comes from filling the HashSet. > Well, let us be more technacal. I am using hashset for tree traversal > operations over a DataBase of 10M nodes, 1Go. This HashSet collects some > informations during the traversal, mainly a string at each node. Actually, > my code binds JAVA with something like > > import module namespace set = "java.util.HashSet"; > > declare function init(){set:clear(), > tree_traversal($my_big_DataBase), do_some_stuff_with_hash_set()} > declare function tree_traversal($elements) > { > for $element in $elements return ( > set:add(get_some_stuff_within($element) ), > tree_traversal($element/element()) > ) > } > and the whole traversal takes a minute. > > I don't know how to achieve this with pure XQUERY. The only thing that I > tried is something like : > > declare function init(){let $my_hash_set := > tree_traversal($my_big_DataBase, map:new()) return > do_some_stuff_with($my_hash_set) } > > declare function tree_traversal($elements, $hash_set) as function(*) > (return a HashSet) > { > for $element in $elements return ( > tree_traversal($nodes/element(), $hash_set), > map:entry( get_some_stuff_within($element),boolean) > ) > } > that performed very badly, compared to the first version (JAVA binding). I > explained this thinking that the first version is O(N) operations, and the > second is O(N^2) operations due to in-memory copy. > > >> - second : it lacks powerful libraries, as complete math modules. > > > What kind of functions would you be interested in? > > I need a stochastic modulus, as > http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html > also a linear Algebra modulus as UBLAS > http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/index.htm > > I guess that I could find JAVA equivalent and bind them to BaseX. Such > functionalities are available directly as XQUERY modules ? > > Cheers, > > Jean-Marc > > > > > 2013/11/12 Christian Grün <[email protected]> > >> Dear JohnLeM, >> >> thanks for your mail. As you already noted, XQuery is a functional >> language, and this is the reason why XQuery maps are not exactly >> comparable to maps and sets, as they are used in imperative languages: >> >> All maps in XQuery are persisent (immutable): Once a map has been >> generated, it is not possible to change its contents. Instead, a new >> map is to be created by each insertion or deletion [1]. This sounds >> like a huge memory killer, but it’s not as bad as you might guess. >> Various efficient solutions exist for persistent map, such as the >> mapped trie that has been implemented in BaseX [2]. It will only >> create copies of parts of the data structure that are to be changed. >> The following query is an example for a query which creates 100.000 >> map with a single entry, and a large map containing all the entries; >> on my system, it requires 200 ms to run: >> >> map:size(map:new( >> for $i in 1 to 100000 >> return map { $i := true() } >> )) >> >> In short: persistent maps may not be as efficient as mutable maps, but >> they are usually not the bottleneck in XQuery applications, because >> deleted entries (or obsolete maps) will automatically be discarded by >> the garbage collector as soon as they are not in the query scope >> anymore. If you want to enforce this, you can put your map operations >> into FLWOR expresions or user-declared functions. >> >> Back to your original questions: >> >> > 2) This module must expose a method "hashset:clear($hashset)" to >> de-allocate >> > memory dynamically. The map:module provides the function map:remove, >> and I >> > could remove all elements. [...] It does not deallocate memory, >> leading to poor >> > overall performances. >> >> It may be interesting to do some testing in order to find out what’s >> the actual bottleneck in your query. How man entries is your hash set >> supposed to contain? >> >> > 3) must expose a method "hashset:add($hashset,$element)" to add memory >> > dynamically. Through map:module, the only possibility that I see is to >> wrap >> > it as map:new($hashset, map:entry($element,$dummyboolean)). >> >> Right, using true() is probably the best choice (booleans are only >> instantiated once in memory). >> >> > - first : its dynamic memory management (the in-memory printfoot of my >> > XQUERY executables are usually tremendous). >> >> This can in fact be a problem, and is mostly due to various decisions >> taken in the specification, and the complexity of XML nodes in >> general. >> >> > - second : it lacks powerful libraries, as complete math modules. >> >> What kind of functions would you be interested in? >> >> Christian >> >> [1] http://en.wikipedia.org/wiki/Persistent_data_structure >> [2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie >> > > > _______________________________________________ > BaseX-Talk mailing list > [email protected] > https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk > >
_______________________________________________ BaseX-Talk mailing list [email protected] https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

