Hi Ben -
I'm on mobile, please excuse any typos.

Maybe
`return array { $idf }`
is closer?
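
The error message suggests the function body yields a sequence of arrays, while the declared return type array(*) expects exactly one array, so wrapping the sequence in `array { ... }` should give array:for-each a single array of arrays to work on. A minimal sketch of the idea, with assumptions: the three literal pairs and the fixed `$count` stand in for the output of tidyTM:wordCount_arr() and count($nodes), and `$pair` is just an illustrative name. It also uses `$pair(2)` to look up the second member, because `[2]` is a positional predicate on the item rather than an array lookup.

```xquery
(: The array and math prefixes are predeclared in BaseX; declared here so the sketch is self-contained. :)
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace math  = "http://www.w3.org/2005/xpath-functions/math";

(: Assumption: stand-in for count($nodes), taken from the figure quoted in the thread. :)
let $count := 1780
(: Assumption: stand-in for the sequence of arrays returned by tidyTM:wordCount_arr(). :)
let $counts := (["probleem", 703], ["opgelost.", 248], ["dictu", 235])
(: Wrap the sequence in one outer array, then append ln($count div frequency) to each inner array. :)
return array { $counts }
  => array:for-each(function($pair) {
       array:append($pair, math:log($count div $pair(2)))
     })
```

If this behaves as I expect, each inner array gains a third member, and the value for "probleem" should match the roughly 0.929 you calculated.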

Untested, apologies!
Best,
Bridger


On Mon, Mar 30, 2020, 5:16 PM Ben Engbers <ben.engb...@be-logical.nl> wrote:

> Hi,
>
> In text mining, the 'idf' or inverse document frequency is defined as
> idf(term) = ln(ndocuments / ndocuments containing term). I am working on a
> function that should return this idf.
>
> This function:
>
> declare function local:wordFreq_idf($nodes as node()*) as array(*) {
>   let $count := count($nodes)
>   let $text := for $node in $nodes
>     return $node/text() => tokenize() => distinct-values()
>   let $idf := $text => tidyTM:wordCount_arr()
>   return $idf
> };
>
> returns:
>
> ["probleem", 703]
> ["opgelost.", 248]
> ["dictu", 235]
> ["opgelost", 217]
> ["medewerker", 193]
> ...
>
> For "probleem", the idf should be calculated as ln($count/703). Since
> there are 1780 nodes this would result in 0.929011751.
> I tried to extend the 'let $idf' line with:
>        => array:for-each(function($idf) { array:append($idf, math:log($count div $idf[2])) })
> which should result in ["probleem", 703, 0.929011751]
>
> but no matter what I do, every time I get this error:
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> "probleem", 703 ], [ "opgelost.", 248 ], ...).
>
> Is it possible to apply array:for-each on an array of arrays?
>
> Ben
>
>
