Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Martin Honnen

On 31.03.2020 18:32, Ben Engbers wrote:

Hi,

For (my personal) clarity, I have split up the original function in two
parts:

declare function local:step_one($nodes as node()*) as array(*)*
{
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};

In local:step_one(), I first create a sequence with the distinct tokens
for each $node. All the sequences are joined in $text.
I then call wordCount_arr to count the occurrences of each word in $text:

declare function tidyTM:wordCount_arr(
  $Words as xs:string*)
  as array(*) {
  for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
  return ([$f, count($w)])
};

I would say that tidyTM:wordCount_arr returns a sequence of arrays but I
am not certain if I have specified the correct return-type?


Reading the code I agree that the return type seems to be a sequence of
arrays but therefore I wonder why you don't get a similar error as later
on with declaring
  array(*)
and not
  array(*)*


Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))
returns:
["probleem", 703]
["opgelost.", 248]


I had hoped that calling the following local:wordFreq_idf would add the
idf to each element, but instead I get an error:

declare function local:wordFreq_idf($nodes as node()*)  as array(*)
{
   let $count := count($nodes)
   let $idf := local:step_one($nodes)
   let $result := for-each( $idf,
 function($z) {array:append ($z, math:log($count div $z(2) ) ) } )
   return $result
};
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf
:= ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).


The message tries to tell you that the declared return type
  array(*)
is a single array while the function returns a (non-empty) sequence of
arrays so using
  declare function local:wordFreq_idf($nodes as node()*)  as array(*)*
would remove that error.
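
A quick way to see the difference between the two declarations is
`instance of`. This is only a minimal sketch with made-up literal data
standing in for the word counts, not output from the original query:

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the word counts. :)
let $seq := (["probleem", 703], ["opgelost.", 248])
return (
  $seq instance of array(*),           (: false: a sequence of two arrays :)
  $seq instance of array(*)*,          (: true: zero or more arrays :)
  array { $seq } instance of array(*)  (: true: one array with two members :)
)
```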

To insert the third value into each array I think you want

  let $result := $idf ! array:append(., math:log($count div .(2) ))
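
Put together, a minimal sketch of that approach might look as follows.
The helper name local:add_idf and the literal input are assumptions made
for illustration: the input stands in for the result of local:step_one(),
and 1780 is the node count mentioned in the original question. The
namespaces are declared explicitly so that the sketch does not rely on a
processor predeclaring them.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace math  = "http://www.w3.org/2005/xpath-functions/math";

(: Hypothetical helper: appends ln($count div frequency) to each array.
   Both the parameter and the return type are sequences of arrays. :)
declare function local:add_idf(
  $idf   as array(*)*,
  $count as xs:integer
) as array(*)* {
  (: "!" evaluates the right-hand side once per array; "." is that array,
     and .(2) looks up its second member (the frequency). :)
  $idf ! array:append(., math:log($count div .(2)))
};

(: Sample data taken from the output shown in the thread. :)
local:add_idf((["probleem", 703], ["opgelost.", 248]), 1780)
```

For "probleem" this should give a third member of roughly 0.929, which
agrees with the value worked out by hand in the original question.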


Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
Hi,

For (my personal) clarity, I have split up the original function in two
parts:

declare function local:step_one($nodes as node()*) as array(*)*
{
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};

In local:step_one(), I first create a sequence with the distinct tokens
for each $node. All the sequences are joined in $text.
I then call wordCount_arr to count the occurrences of each word in $text:

declare function tidyTM:wordCount_arr(
  $Words as xs:string*)
  as array(*) {
  for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
  return ([$f, count($w)])
};

I would say that tidyTM:wordCount_arr returns a sequence of arrays but I
am not certain if I have specified the correct return-type?

Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))
returns:
["probleem", 703]
["opgelost.", 248]


I had hoped that calling the following local:wordFreq_idf would add the
idf to each element, but instead I get an error:

declare function local:wordFreq_idf($nodes as node()*)  as array(*)
{
  let $count := count($nodes)
  let $idf := local:step_one($nodes)
  let $result := for-each( $idf,
function($z) {array:append ($z, math:log($count div $z(2) ) ) } )
  return $result
};
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf
:= ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).


Cheers, Ben

On 31-03-2020 at 16:29, Martin Honnen wrote:
> So does the working function return a sequence of arrays? That doesn't
> match the
>   as array(*)
> return type declaration, it seems.
> 
> What does tidyTM:wordCount_arr() return, a single array (of atomic items)?




Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
Hi,

> => means "take the thing on the left and substitute it for the first
> parameter of the function on the right", so
I thought it meant "The first parameter on the right will be substituted
with the thing on the left"?

> ('weasels') => replace('weasels','mustelids')  works
> 
> ('weasels','badgers') => replace('weasels','mustelids')  DOES NOT work
> 
> This is because a one-item sequence can be treated as the single string
> value the first parameter of replace() requires, but a
> greater-than-one-item sequence can't be.  (This one gives you "item
> expected, sequence found" if you try it from the GUI.)

The following is quite similar to the 'piping' mechanism in R.
I'll start experimenting with it.

Thanx,
Ben
> ! means "take each item of the sequence on the left and pass it to the
> thing on the right in turn", so
> 
> ('weasels','badgers') ! replace(.,'weasels','mustelids')  works.
> 
> (note that replace() got its first parameter back as the context item
> dot.)
> 
> so if you take
> 
> => array:for-each(function($idf) {array:append($idf,math:log($count div 
> $idf[2]) )})
> 
> and replace it with 
> ! array:for-each(.,function($idf) {array:append($idf,math:log($count div 
> $idf[2]) )})
> 
> (note the context-item dot!)
> 
> you should at least get a different error message.
> 
> -- Graydon
> 



Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Graydon
On Tue, Mar 31, 2020 at 04:21:52PM +0200, Ben Engbers scripsit:
> On 31-03-2020 at 01:18, Graydon wrote:
> > On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers scripsit:
> > [snip]
> >> For "probleem", the idf should be calculated as ln($count/703). Since
> >> there are 1780 nodes this would result in 0.929011751.
> >> I tried to extend the 'let $idf' line with:
> >>    => array:for-each(function($idf) {array:append($idf,
> >> math:log($count div $idf[2]) )})
> >> which should result in ["probleem", 703, 0.929011751]
> >>
> >> but no matter what I do, every time I get this error:
> >> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> >> "probleem", 703 ], [ "opgelost.", 248 ], ...).
> > 
> > The error says you're trying to feed a sequence of arrays to an array
> > function; maybe you want ! where you have => ?
> 
> Upon your remark about feeding a sequence of arrays, I first tried to
> apply 'for-each' instead of 'array:for-each'. Alas, that didn't help
> ;-(, the error was still the same.

array:for-each takes a single array and gives you back a new array based
on what the anonymous function passed as the second parameter does to
each member of the original array.

So you have to make sure you're feeding a single array to it.  (and
you're not; that's what the error message is telling you, you've got a
sequence of arrays on the left of the => operator.)
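
To illustrate what "each member" means here, a small sketch with literal
data (not taken from the original query): the function is called once
per member of the single array, so for one of the word-count arrays it
sees the string and then the number, not the array as a whole.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: One array with two members; the function receives "probleem", then 703. :)
array:for-each(["probleem", 703], function($member) { string($member) })
(: yields ["probleem", "703"] :)
```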

> I then tried to understand what you mean with the '!'.
> In the book from Priscilla Walmsley, the ! is mentioned as a simple map
> operator. How is that related to this problem?

=> means "take the thing on the left and substitute it for the first
parameter of the function on the right", so

('weasels') => replace('weasels','mustelids')  works

('weasels','badgers') => replace('weasels','mustelids')  DOES NOT work

This is because a one-item sequence can be treated as the single string
value the first parameter of replace() requires, but a
greater-than-one-item sequence can't be.  (This one gives you "item
expected, sequence found" if you try it from the GUI.)

! means "take each item of the sequence on the left and pass it to the
thing on the right in turn", so

('weasels','badgers') ! replace(.,'weasels','mustelids')  works.

(note that replace() got its first parameter back as the context item
dot.)
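
The two working cases above can be run side by side as a small
self-contained query (a sketch using only the strings from the examples):

```xquery
xquery version "3.1";

(
  (: One item on the left: "=>" passes it as the first argument of replace(). :)
  'weasels' => replace('weasels', 'mustelids'),

  (: Two items on the left: "!" calls replace() once per item,
     with "." standing for the current item. :)
  ('weasels', 'badgers') ! replace(., 'weasels', 'mustelids')
)
(: yields "mustelids", "mustelids", "badgers" :)
```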

so if you take

=> array:for-each(function($idf) {array:append($idf,math:log($count div 
$idf[2]) )})

and replace it with 
! array:for-each(.,function($idf) {array:append($idf,math:log($count div 
$idf[2]) )})

(note the context-item dot!)

you should at least get a different error message.

-- Graydon


Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Martin Honnen

On 30.03.2020 at 23:16, Ben Engbers wrote:

Hi,

In text mining, the 'idf' or inverse document frequency is defined as
idf(term)=ln(ndocuments / ndocuments containing term). I am working on a
function that should return this idf.

This function:

declare function local:wordFreq_idf($nodes as node()*) as array(*) {
  let $count := count($nodes)
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};

returns:

["probleem", 703]
["opgelost.", 248]
["dictu", 235]
["opgelost", 217]
["medewerker", 193]
...


So does the working function return a sequence of arrays? That doesn't
match the
  as array(*)
return type declaration, it seems.

What does tidyTM:wordCount_arr() return, a single array (of atomic items)?






Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
On 31-03-2020 at 01:18, Graydon wrote:
> On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers scripsit:
> [snip]
>> For "probleem", the idf should be calculated as ln($count/703). Since
>> there are 1780 nodes this would result in 0.929011751.
>> I tried to extend the 'let $idf' line with:
>>    => array:for-each(function($idf) {array:append($idf,
>> math:log($count div $idf[2]) )})
>> which should result in ["probleem", 703, 0.929011751]
>>
>> but no matter what I do, every time I get this error:
>> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
>> "probleem", 703 ], [ "opgelost.", 248 ], ...).
> 
> The error says you're trying to feed a sequence of arrays to an array
> function; maybe you want ! where you have => ?
> 
> -- Graydon
> 

Hi,
Upon your remark about feeding a sequence of arrays, I first tried to
apply 'for-each' instead of 'array:for-each'. Alas, that didn't help
;-(, the error was still the same.
I then tried to understand what you mean with the '!'.
In the book from Priscilla Walmsley, the ! is mentioned as a simple map
operator. How is that related to this problem?

Cheers,
Ben


Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-30 Thread Graydon
On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers scripsit:
[snip]
> For "probleem", the idf should be calculated as ln($count/703). Since
> there are 1780 nodes this would result in 0.929011751.
> I tried to extend the 'let $idf' line with:
>    => array:for-each(function($idf) {array:append($idf,
> math:log($count div $idf[2]) )})
> which should result in ["probleem", 703, 0.929011751]
>
> but no matter what I do, every time I get this error:
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> "probleem", 703 ], [ "opgelost.", 248 ], ...).

The error says you're trying to feed a sequence of arrays to an array
function; maybe you want ! where you have => ?

-- Graydon


Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-30 Thread Bridger Dyson-Smith
Hi Ben -
I'm on mobile, please excuse any typos.

Maybe
`return array { $idf }`
is closer?

Untested, apologies!
Best,
Bridger
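
For what it is worth, a minimal sketch of how that idea could be combined
with array:for-each, assuming literal sample data in place of the real
word counts and the node count of 1780 from the question. Wrapping the
sequence in array { } gives a single array whose members are the
word-count arrays, which is the shape array:for-each expects:

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace math  = "http://www.w3.org/2005/xpath-functions/math";

(: Sample data taken from the output shown in the question. :)
let $idf   := (["probleem", 703], ["opgelost.", 248])
let $count := 1780
return array:for-each(
  array { $idf },                      (: one array holding two arrays :)
  function($entry as array(*)) {
    (: $entry(2) looks up the second member; $entry[2] would instead be
       a positional predicate on a one-item sequence and select nothing. :)
    array:append($entry, math:log($count div $entry(2)))
  }
)
```

The result is a single array of three-member arrays, so it would match a
declared return type of array(*) rather than array(*)*.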


On Mon, Mar 30, 2020, 5:16 PM Ben Engbers wrote:

> Hi,
>
> In text mining, the 'idf' or inverse document frequency is defined as
> idf(term)=ln(ndocuments / ndocuments containing term). I am working on a
> function that should return this idf.
>
> This function:
>
> declare function local:wordFreq_idf($nodes as node()*) as array(*) {
>   let $count := count($nodes)
>   let $text := for $node in $nodes
>                return $node/text() => tokenize() => distinct-values()
>   let $idf := $text => tidyTM:wordCount_arr()
>   return $idf
> };
>
> returns:
>
> ["probleem", 703]
> ["opgelost.", 248]
> ["dictu", 235]
> ["opgelost", 217]
> ["medewerker", 193]
> ...
>
> For "probleem", the idf should be calculated as ln($count/703). Since
> there are 1780 nodes this would result in 0.929011751.
> I tried to extend the 'let $idf' line with:
>    => array:for-each(function($idf) {array:append($idf,
> math:log($count div $idf[2]) )})
> which should result in ["probleem", 703, 0.929011751]
>
> but no matter what I do, every time I get this error:
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> "probleem", 703 ], [ "opgelost.", 248 ], ...).
>
> Is it possible to apply array:for-each on an array of arrays?
>
> Ben
>
>