Hi,

In text mining, the 'idf' or inverse document frequency is defined as
idf(term) = ln(number of documents / number of documents containing the term).
I am working on a function that should return this idf.

This function:

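declare function local:wordFreq_idf($nodes as node()*) as array(*)* {
  let $count := count($nodes)
  (: distinct tokens per node, so a word is counted once per document :)
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  (: per the error below, this yields a sequence of [word, frequency] arrays :)
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};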

returns:

["probleem", 703]
["opgelost.", 248]
["dictu", 235]
["opgelost", 217]
["medewerker", 193]
...

For "probleem", the idf should be calculated as ln($count div 703). Since
there are 1780 nodes, this gives ln(1780/703) = 0.929011751.
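Written out with the numbers from this example (N = 1780 nodes, and "probleem" occurs in 703 of them):

```latex
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{idf}(\text{probleem}) = \ln\frac{1780}{703} \approx \ln 2.53201 \approx 0.929012
```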
I tried to extend the 'let $idf' line with:
       => array:for-each(function($pair) {
            array:append($pair, math:log($count div $pair(2)))
          })
which should result in ["probleem", 703, 0.929011751].
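A minimal sketch of that step on a literal array of arrays, assuming two of the pairs from the output above and $count = 1780. On a genuine array of arrays, array:for-each appears to do what is intended. Note that the frequency has to be read with $pair(2): $pair[2] is a positional predicate on the one-item sequence containing the array, so it returns the empty sequence.

```xquery
xquery version "3.1";

(: Two [word, frequency] pairs copied from the output above, wrapped in a
   real array of arrays; 1780 is the number of nodes. :)
let $count := 1780
let $pairs := [["probleem", 703], ["dictu", 235]]
return array:for-each($pairs, function($pair) {
  (: $pair(2) is the second member, i.e. the document frequency :)
  array:append($pair, math:log($count div $pair(2)))
})
```

The first member of the result should be ["probleem", 703, 0.929...].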

but no matter what I do, I get this error every time:
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
"probleem", 703 ], [ "opgelost.", 248 ], ...).

Is it possible to apply array:for-each on an array of arrays?
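The error message types the value as (array(xs:anyAtomicType))+, which is a sequence of arrays rather than a single array containing arrays. That would explain why array:for-each (and a declared return type of array(*)) rejects it. A possible sketch under that assumption maps over the sequence with the simple map operator `!` instead. local:word-counts is a hypothetical stand-in for tidyTM:wordCount_arr(), whose source isn't shown here. If array:for-each is preferred, wrapping the sequence as array { ... } should also work.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for tidyTM:wordCount_arr(): returns a SEQUENCE
   of [word, frequency] arrays, most frequent first. :)
declare function local:word-counts($words as xs:string*) as array(*)* {
  for $word in distinct-values($words)
  let $freq := count($words[. = $word])
  order by $freq descending
  return [$word, $freq]
};

(: One [word, document frequency, idf] array per distinct word,
   with idf = ln(number of nodes / document frequency). :)
declare function local:wordFreq_idf($nodes as node()*) as array(*)* {
  let $count := count($nodes)
  let $words :=
    for $node in $nodes
    return $node/text() => tokenize() => distinct-values()
  (: "!" visits each array in the sequence; ?2 looks up the second
     member of the context array, i.e. the document frequency :)
  return local:word-counts($words)
    ! array:append(., math:log($count div ?2))
};

local:wordFreq_idf((<doc>probleem opgelost</doc>,
                    <doc>probleem dictu</doc>,
                    <doc>medewerker</doc>))
```

Here "probleem" occurs in 2 of the 3 nodes, so its array should come first with an idf of ln(1.5), roughly 0.405; the order of the words that tie at frequency 1 is not fixed.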

Ben
