Re: [basex-talk] How to apply array:for-each on an array of arrays?
On 31.03.2020 18:32, Ben Engbers wrote:

> Hi,
>
> For (my personal) clarity, I have split up the original function in two parts:
>
> declare function local:step_one($nodes as node()*) as array(*)* {
>   let $text := for $node in $nodes
>     return $node/text() => tokenize() => distinct-values()
>   let $idf := $text => tidyTM:wordCount_arr()
>   return $idf
> };
>
> In local:step_one(), I first create a sequence with the distinct tokens for each $node. All the sequences are joined in $text. I then call wordCount_arr to count the occurrences of each word in $text:
>
> declare function tidyTM:wordCount_arr( $Words as xs:string*) as array(*) {
>   for $w in $Words
>   let $f := $w
>   group by $f
>   order by count($w) descending
>   return ([$f, count($w)])
> };
>
> I would say that tidyTM:wordCount_arr returns a sequence of arrays but I am not certain if I have specified the correct return-type?

Reading the code, I agree that the return type seems to be a sequence of arrays, but then I wonder why you don't get a similar error there as later on, given that it declares array(*) and not array(*)*.

> Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers)) returns:
>
> ["probleem", 703]
> ["opgelost.", 248]
>
> I had hoped that calling the following local:wordFreq_idf would add the idf to each element, but instead I get an error:
>
> declare function local:wordFreq_idf($nodes as node()*) as array(*) {
>   let $count := count($nodes)
>   let $idf := local:step_one($nodes)
>   let $result := for-each(
>     $idf,
>     function($z) {array:append ($z, math:log($count div $z(2) ) ) }
>   )
>   return $result
> };
>
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf := ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).

The message tries to tell you that the declared return type array(*) is a single array while the function returns a (non-empty) sequence of arrays, so using

  declare function local:wordFreq_idf($nodes as node()*) as array(*)*

would remove that error.

To insert the third value into each array I think you want

  let $result := $idf ! array:append(., math:log($count div .(2) ))
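
Putting the two suggestions above together (the array(*)* return type and the ! operator), a minimal self-contained sketch might look as follows. The function name local:add-idf and the two hard-coded variables are hypothetical stand-ins for illustration; in the thread the pairs come from local:step_one() and the count from count($nodes).

```xquery
xquery version "3.1";

(: Hypothetical stand-in for the output of local:step_one();
   the two pairs are the ones quoted in the thread. :)
declare variable $idf as array(*)* := (["probleem", 703], ["opgelost.", 248]);

(: Number of nodes as stated in the original post. :)
declare variable $count as xs:integer := 1780;

(: The return type is array(*)* because the result is a sequence of arrays,
   one per input pair. :)
declare function local:add-idf($pairs as array(*)*, $n as xs:integer) as array(*)* {
  (: ! evaluates the right-hand side once per array; .(2) looks up its second member. :)
  $pairs ! array:append(., math:log($n div .(2)))
};

local:add-idf($idf, $count)
```

For the first pair the appended value should come out at roughly 0.929, the figure Ben expects for "probleem".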
Re: [basex-talk] How to apply array:for-each on an array of arrays?
Hi,

For (my personal) clarity, I have split up the original function in two parts:

declare function local:step_one($nodes as node()*) as array(*)* {
  let $text := for $node in $nodes
    return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};

In local:step_one(), I first create a sequence with the distinct tokens for each $node. All the sequences are joined in $text. I then call wordCount_arr to count the occurrences of each word in $text:

declare function tidyTM:wordCount_arr( $Words as xs:string*) as array(*) {
  for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
  return ([$f, count($w)])
};

I would say that tidyTM:wordCount_arr returns a sequence of arrays but I am not certain if I have specified the correct return-type?

Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers)) returns:

["probleem", 703]
["opgelost.", 248]

I had hoped that calling the following local:wordFreq_idf would add the idf to each element, but instead I get an error:

declare function local:wordFreq_idf($nodes as node()*) as array(*) {
  let $count := count($nodes)
  let $idf := local:step_one($nodes)
  let $result := for-each(
    $idf,
    function($z) {array:append ($z, math:log($count div $z(2) ) ) }
  )
  return $result
};

[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf := ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).

Cheers,
Ben

On 31-03-2020 at 16:29, Martin Honnen wrote:
> So does the working function return a sequence of arrays? That doesn't
> match the
> as array(*)
> return type declaration, it seems.
>
> What does tidyTM:wordCount_arr() return, a single array (of atomic items)?
Re: [basex-talk] How to apply array:for-each on an array of arrays?
Hi,

> => means "take the thing on the left and substitute it for the first
> parameter of the function on the right", so

I thought it meant "The first parameter on the right will be substituted with the thing on the left"?

> ('weasels') => replace('weasels','mustelids') works
>
> ('weasels','badgers') => replace('weasels','mustelids') DOES NOT work
>
> This is because a one-item sequence can be treated as the single string
> value the first parameter of replace() requires, but a
> greater-than-one-item sequence can't be. (This one gives you "item
> expected, sequence found" if you try it from the GUI.)

The following is quite similar to the 'piping' mechanism in R. I'll start experimenting with it.

Thanx,
Ben

> ! means "take each item of the sequence on the left and pass it to the
> thing on the right in turn", so
>
> ('weasels','badgers') ! replace(.,'weasels','mustelids') works.
>
> (note that replace() got its first parameter back as the context item
> dot.)
>
> so if you take
>
> => array:for-each(function($idf) {array:append($idf,math:log($count div
> $idf[2]) )})
>
> and replace it with
> ! array:for-each(.,function($idf) {array:append($idf,math:log($count div
> $idf[2]) )})
>
> (note the context-item dot!)
>
> you should at least get a different error message.
>
> -- Graydon
>
Re: [basex-talk] How to apply array:for-each on an array of arrays?
On Tue, Mar 31, 2020 at 04:21:52PM +0200, Ben Engbers wrote:
> On 31-03-2020 at 01:18, Graydon wrote:
> > On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers wrote:
> > [snip]
> >> For "probleem", the idf should be calculated as ln($count/703). Since
> >> there are 1780 nodes this would result in 0.929011751.
> >> I tried to extend the 'let $idf' line with:
> >>=> array:for-each(function($idf) {array:append($idf,
> >> math:log($count div $idf[2]) )})
> >> which should result in ["probleem", 703, 0.929011751]
> >>
> >> but no matter what I do, every time I get this error:
> >> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> >> "probleem", 703 ], [ "opgelost.", 248 ], ...).
> >
> > The error says you're trying to feed a sequence of arrays to an array
> > function; maybe you want ! where you have => ?
>
> Upon your remark about feeding a sequence of arrays, I first tried to
> apply 'for-each' instead of 'array:for-each'. Alas, that didn't help
> ;-(, the error was still the same.

array:for-each takes a single array and gives you back a new array based on what the anonymous function passed as the second parameter does to each member of the original array.

So you have to make sure you're feeding a single array to it. (And you're not; that's what the error message is telling you: you've got a sequence of arrays on the left of the => operator.)

> I then tried to understand what you mean with the '!'.
> In the book from Priscilla Walmsley, the ! is mentioned as a simple map
> operator. How is that related to this problem?

=> means "take the thing on the left and substitute it for the first parameter of the function on the right", so

  ('weasels') => replace('weasels','mustelids') works

  ('weasels','badgers') => replace('weasels','mustelids') DOES NOT work

This is because a one-item sequence can be treated as the single string value the first parameter of replace() requires, but a greater-than-one-item sequence can't be. (This one gives you "item expected, sequence found" if you try it from the GUI.)

! means "take each item of the sequence on the left and pass it to the thing on the right in turn", so

  ('weasels','badgers') ! replace(.,'weasels','mustelids') works.

(Note that replace() got its first parameter back as the context item dot.)

So if you take

  => array:for-each(function($idf) {array:append($idf,math:log($count div $idf[2]) )})

and replace it with

  ! array:for-each(.,function($idf) {array:append($idf,math:log($count div $idf[2]) )})

(note the context-item dot!)

you should at least get a different error message.

-- Graydon
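
The difference between => and ! described above can be tried in isolation with a small query. This is only a sketch built on the weasels example from the message. The last two expressions are an addition: they illustrate, for a literal array, that $idf[2] (a positional predicate) and $idf(2) (an array lookup) are not interchangeable, which may be one reason the rewritten expression still fails.

```xquery
xquery version "3.1";

(: => passes the whole left-hand value as the first argument of replace(). :)
('weasels') => replace('weasels', 'mustelids'),

(: ! evaluates the right-hand side once per item, with . bound to that item. :)
('weasels', 'badgers') ! replace(., 'weasels', 'mustelids'),

(: (2) looks up the second member of the array: 703. :)
["probleem", 703](2),

(: [2] is a positional predicate on a one-item sequence (the array itself),
   so it selects nothing and the count is 0. :)
count(["probleem", 703][2])
```

Evaluated as written, this should yield "mustelids", "mustelids", "badgers", 703 and 0.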
Re: [basex-talk] How to apply array:for-each on an array of arrays?
On 30.03.2020 at 23:16, Ben Engbers wrote:

> Hi,
>
> In textmining, the 'idf' or inverse document frequency is defined as
> idf(term)=ln(ndocuments / ndocuments containing term). I am working on a
> function that should return this idf.
>
> This function:
>
> declare function local:wordFreq_idf($nodes as node()*) as array(*) {
>   let $count := count($nodes)
>   let $text := for $node in $nodes
>     return $node/text() => tokenize() => distinct-values()
>   let $idf := $text => tidyTM:wordCount_arr()
>   return $idf
> };
>
> returns:
>
> ["probleem", 703]
> ["opgelost.", 248]
> ["dictu", 235]
> ["opgelost", 217]
> ["medewerker", 193]
> ...

So does the working function return a sequence of arrays? That doesn't match the

  as array(*)

return type declaration, it seems.

What does tidyTM:wordCount_arr() return, a single array (of atomic items)?
Re: [basex-talk] How to apply array:for-each on an array of arrays?
On 31-03-2020 at 01:18, Graydon wrote:
> On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers wrote:
> [snip]
>> For "probleem", the idf should be calculated as ln($count/703). Since
>> there are 1780 nodes this would result in 0.929011751.
>> I tried to extend the 'let $idf' line with:
>>=> array:for-each(function($idf) {array:append($idf,
>> math:log($count div $idf[2]) )})
>> which should result in ["probleem", 703, 0.929011751]
>>
>> but no matter what I do, every time I get this error:
>> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
>> "probleem", 703 ], [ "opgelost.", 248 ], ...).
>
> The error says you're trying to feed a sequence of arrays to an array
> function; maybe you want ! where you have => ?
>
> -- Graydon
>

Hi,

Upon your remark about feeding a sequence of arrays, I first tried to apply 'for-each' instead of 'array:for-each'. Alas, that didn't help ;-(, the error was still the same.

I then tried to understand what you mean with the '!'. In the book from Priscilla Walmsley, the ! is mentioned as a simple map operator. How is that related to this problem?

Cheers,
Ben
Re: [basex-talk] How to apply array:for-each on an array of arrays?
On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers wrote:
[snip]
> For "probleem", the idf should be calculated as ln($count/703). Since
> there are 1780 nodes this would result in 0.929011751.
> I tried to extend the 'let $idf' line with:
>=> array:for-each(function($idf) {array:append($idf,
> math:log($count div $idf[2]) )})
> which should result in ["probleem", 703, 0.929011751]
>
> but no matter what I do, every time I get this error:
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> "probleem", 703 ], [ "opgelost.", 248 ], ...).

The error says you're trying to feed a sequence of arrays to an array function; maybe you want ! where you have => ?

-- Graydon
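
For reference, the expected value quoted above can be checked by hand. The symbols N (number of nodes) and n_t (number of nodes containing term t) are notation introduced here for the check, not taken from the thread.

```latex
\[
\mathrm{idf}(t) = \ln\frac{N}{n_t},
\qquad
\mathrm{idf}(\text{probleem}) = \ln\frac{1780}{703} \approx \ln 2.5320 \approx 0.9290
\]
```

This agrees with the 0.929011751 given in the quoted message.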
Re: [basex-talk] How to apply array:for-each on an array of arrays?
Hi Ben - I'm on mobile, please excuse any typos.

Maybe `return array { $idf }` is closer? Untested, apologies!

Best,
Bridger

On Mon, Mar 30, 2020, 5:16 PM Ben Engbers wrote:

> Hi,
>
> In textmining, the 'idf' or inverse document frequency is defined as
> idf(term)=ln(ndocuments / ndocuments containing term). I am working on a
> function that should return this idf.
>
> This function:
>
> declare function local:wordFreq_idf($nodes as node()*) as array(*) {
>   let $count := count($nodes)
>   let $text := for $node in $nodes
>     return $node/text() => tokenize() => distinct-values()
>   let $idf := $text => tidyTM:wordCount_arr()
>   return $idf
> };
>
> returns:
>
> ["probleem", 703]
> ["opgelost.", 248]
> ["dictu", 235]
> ["opgelost", 217]
> ["medewerker", 193]
> ...
>
> For "probleem", the idf should be calculated as ln($count/703). Since
> there are 1780 nodes this would result in 0.929011751.
> I tried to extend the 'let $idf' line with:
>=> array:for-each(function($idf) {array:append($idf,
> math:log($count div $idf[2]) )})
> which should result in ["probleem", 703, 0.929011751]
>
> but no matter what I do, every time I get this error:
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> "probleem", 703 ], [ "opgelost.", 248 ], ...).
>
> Is it possible to apply array:for-each on an array of arrays?
>
> Ben
>
>
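
To make the `array { $idf }` suggestion concrete, here is a minimal sketch of how wrapping the sequence into a single array would let array:for-each run over an array of arrays. The variable names and the hard-coded sample data are assumptions for illustration; in the thread the pairs come from tidyTM:wordCount_arr() and the count from count($nodes).

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the sequence of arrays
   returned by tidyTM:wordCount_arr(). :)
declare variable $pairs as array(*)* := (["probleem", 703], ["opgelost.", 248]);

(: Number of nodes as stated in the original post. :)
declare variable $count as xs:integer := 1780;

(: array { ... } makes each item of the sequence one member,
   so the result is a single array whose members are arrays. :)
let $idf := array { $pairs }

(: array:for-each now receives one array; $pair is bound to each inner array in turn. :)
return array:for-each($idf, function($pair) {
  array:append($pair, math:log($count div $pair(2)))
})
```

The result here is one array of three-member arrays, which fits a declared return type of array(*). The ! variant instead yields a sequence of arrays and needs array(*)*.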