It's not necessarily true that immutable data structures perform an actual
in-memory copy. See for instance Phil Bagwell's hash tries -- only minimal
amounts of the data structure are actually copied during updates.

I don't know whether BaseX is making use of these optimizations (Christian
could speak to that), but jumping directly to "it's immutible, so it must
be inefficient" is not necessarily well-founded.


On Tue, Nov 12, 2013 at 10:48 AM, jean-marc Mercier <
[email protected]> wrote:

> Dear Christian,
>
> Thanks for your answer.
> >It may be interesting to do some testing in order to find out what’s
> >the actual bottleneck in your query. How man entries is your hash set
> >supposed to contain?
>
> The HashSet contains 10M strings. That's not a big deal. The problems
> comes from filling the HashSet.
> Well, let us be more technacal. I am using hashset for tree traversal
> operations over a DataBase of 10M nodes, 1Go. This HashSet collects some
> informations during the traversal, mainly a string at each node. Actually,
> my code binds JAVA with something like
>
> import module namespace set = "java.util.HashSet";
>
> declare function init(){set:clear(),
> tree_traversal($my_big_DataBase), do_some_stuff_with_hash_set()}
> declare function tree_traversal($elements)
> {
>     for $element in $elements return (
>            set:add(get_some_stuff_within($element) ),
>            tree_traversal($element/element())
>      )
> }
> and the whole traversal takes a minute.
>
> I don't know how to achieve this with pure XQUERY. The only thing that I
> tried is something like  :
>
> declare function init(){let $my_hash_set :=
> tree_traversal($my_big_DataBase, map:new()) return
> do_some_stuff_with($my_hash_set) }
>
> declare function tree_traversal($elements, $hash_set) as function(*)
> (return a HashSet)
> {
>      for $element in $elements return (
>            tree_traversal($nodes/element(), $hash_set),
>            map:entry( get_some_stuff_within($element),boolean)
>      )
> }
> that performed very badly, compared to the first version (JAVA binding). I
> explained this thinking that the first version is O(N) operations, and the
> second is O(N^2) operations due to in-memory copy.
>
> >> - second : it lacks powerful libraries, as complete math modules.
>
> > What kind of functions would you be interested in?
>
> I need a stochastic modulus, as
> http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html
> also a linear Algebra modulus as UBLAS
> http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/index.htm
>
> I guess that I could find JAVA equivalent and bind them to BaseX. Such
> functionalities are available directly as XQUERY modules ?
>
> Cheers,
>
> Jean-Marc
>
>
>
>
> 2013/11/12 Christian Grün <[email protected]>
>
>> Dear JohnLeM,
>>
>> thanks for your mail. As you already noted, XQuery is a functional
>> language, and this is the reason why XQuery maps are not exactly
>> comparable to maps and sets, as they are used in imperative languages:
>>
>> All maps in XQuery are persisent (immutable): Once a map has been
>> generated, it is not possible to change its contents. Instead, a new
>> map is to be created by each insertion or deletion [1]. This sounds
>> like a huge memory killer, but it’s not as bad as you might guess.
>> Various efficient solutions exist for persistent map, such as the
>> mapped trie that has been implemented in BaseX [2]. It will only
>> create copies of parts of the data structure that are to be changed.
>> The following query is an example for a query which creates 100.000
>> map with a single entry, and a large map containing all the entries;
>> on my system, it requires 200 ms to run:
>>
>>   map:size(map:new(
>>     for $i in 1 to 100000
>>    return map { $i := true() }
>>   ))
>>
>> In short: persistent maps may not be as efficient as mutable maps, but
>> they are usually not the bottleneck in XQuery applications, because
>> deleted entries (or obsolete maps) will automatically be discarded by
>> the garbage collector as soon as they are not in the query scope
>> anymore. If you want to enforce this, you can put your map operations
>> into FLWOR expresions or user-declared functions.
>>
>> Back to your original questions:
>>
>> > 2) This module must expose a method "hashset:clear($hashset)" to
>> de-allocate
>> > memory dynamically. The map:module provides the function map:remove,
>> and I
>> > could remove all elements. [...]  It does not deallocate memory,
>> leading to poor
>> > overall performances.
>>
>> It may be interesting to do some testing in order to find out what’s
>> the actual bottleneck in your query. How man entries is your hash set
>> supposed to contain?
>>
>> > 3) must expose a method "hashset:add($hashset,$element)" to add memory
>> > dynamically. Through map:module, the only possibility that I see is to
>> wrap
>> > it as map:new($hashset, map:entry($element,$dummyboolean)).
>>
>> Right, using true() is probably the best choice (booleans are only
>> instantiated once in memory).
>>
>> > - first : its dynamic memory management (the in-memory printfoot of my
>> > XQUERY executables are usually tremendous).
>>
>> This can in fact be a problem, and is mostly due to various decisions
>> taken in the specification, and the complexity of XML nodes in
>> general.
>>
>> > - second : it lacks powerful libraries, as complete math modules.
>>
>> What kind of functions would you be interested in?
>>
>> Christian
>>
>> [1] http://en.wikipedia.org/wiki/Persistent_data_structure
>> [2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie
>>
>
>
> _______________________________________________
> BaseX-Talk mailing list
> [email protected]
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
>
>
_______________________________________________
BaseX-Talk mailing list
[email protected]
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

Reply via email to