It seems you are hitting the BigObject issue; the error message is supposed to
say "255 * DefaultFrameSize" bytes.

On the other hand, I don’t quite understand the final statement:
-------------
//print all authors.
let $res := (for $t in  [$coAuth,$noCoAuth]
limit 100
return $t)
-------------

I think you are expecting a union operation instead.
The list constructor ([]) doesn't unnest the records of its inner lists. For
example, I tried the following query:
-------------
let $x := [ { "a":1},{ "a":2},{ "a":3}]
let $y := [ { "b":1},{ "b":2},{ "b":3}]
let $xy := [$x, $y]
for $tx in $xy
return  $tx
-------------

It returns the following result. 
[ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
[ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
That means $xy holds two large records, $x and $y, not six smaller
records.

Similarly, "for $t in [$coAuth,$noCoAuth]" will return only two records.
The first one is the $coAuth list, and the second one is the $noCoAuth list. It
will definitely hit the big object problem or other memory issues if either
list is too big.

You can try union instead, as follows:

for $t in $coAuth union $noCoAuth 
return $t
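
To make the difference concrete outside of AQL, here is a rough Python analogy (not AQL, and not how the engine evaluates the query). The variable names and sample records are made up for illustration. Iterating over a list of two lists yields two big items, while chaining the two lists yields the individual records, so a limit then applies per record:

```python
from itertools import chain, islice

# Hypothetical stand-ins for $coAuth and $noCoAuth.
co_auth = [{"a": 1}, {"a": 2}, {"a": 3}]
no_co_auth = [{"b": 1}, {"b": 2}, {"b": 3}]

# Analogue of "for $t in [$coAuth, $noCoAuth] return $t":
# each $t is a whole list, so there are only two items.
nested = [t for t in [co_auth, no_co_auth]]

# Analogue of a union: the records of both lists, one by one.
flattened = list(chain(co_auth, no_co_auth))

# A limit over the flattened stream cuts off individual records.
limited = list(islice(chain(co_auth, no_co_auth), 4))

print(len(nested))     # number of items when iterating the nested list
print(len(flattened))  # number of items when iterating the union
print(limited)
```

In the nested form each item is as large as the whole input list, which is why the limit does not protect you from oversized records.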

> On Nov 30, 2015, at 7:17 AM, Mike Carey <[email protected]> wrote:
> 
> I will look into details later, but:
> 
> 1. The answer to your question is yes - ORDER BY and LIMIT will both have the 
> results landing (at present) on a single node.  We need to add support for 
> range-partitioned results!
> 
> 2. It would be good to get familiar with reading query plans and also looking 
> for "listify" operations that might be in unfortunate places in query plans 
> (which can cause frame size issues).
> 
> Cheers,
> Mike
> 
> 
> On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
>> Hi Team,
>> 
>> I noticed weird behavior when executing an AQL query with a limit clause
>> (LIMIT 100000): I get an exception in one NC, java.lang.OutOfMemoryError,
>> while the others seem to operate normally.
>> 
>> my -Xmx configurations are the default:
>> nc.java.opts                             :-Xmx1536m
>> cc.java.opts                             :-Xmx1024m
>> 
>> Here is the story:
>> 
>> I have a dataset for publications. The data contains huge nested and
>> heterogeneous records.
>> Therefore, the specified type contains only a unique ID.
>> 
>> create type wosType as open
>> {
>> UID:string
>> }
>> 
>> After loading the data, I want to extract all the authors' names (first and
>> last). However, the author details for each publication are *heterogeneous*:
>> if there is only one author (i.e., no co-authors), the type of the field "name"
>> is a JSON object; otherwise it is an ordered list.
>> 
>> So I did the following (excuse the ugliness of my AQL):
>> 
>> -----------------------------
>> use dataverse wosDataverse
>> 
>> *//Get name details for single-authors*
>> let $noCoAuth := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count = "1"
>> return {
>> "firstName":$names.name.first_name,
>> "lastName":$names.name.last_name
>> }
>> )
>> 
>> *//Generate a list of names for all co-authors*
>> let $coAuthList := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count != "1"
>> return $names.name
>> )
>> 
>> *//Flatten the co-authors name list*
>> let $coAuth := (for $x in $coAuthList
>> for $y in $x
>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>> 
>> //print all authors.
>> let $res := (for $t in  [$coAuth,$noCoAuth]
>> limit 100
>> return $t)
>> 
>> return $res
>> -----------------------------
>> 
>> 
>> This query couldn't be executed due to frame size limit:
>> 
>> Unable to allocate frame larger than:255 bytes [HyracksDataException]
>> 
>> So I limited the number of results as follows:
>> 
>> -----------------------------
>> use dataverse wosDataverse
>> let $noCoAuth := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count = "1"
>> *limit 100000*
>> return {
>> "firstName":$names.name.first_name,
>> "lastName":$names.name.last_name
>> }
>> )
>> 
>> let $coAuthList := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count != "1"
>> return $names.name
>> )
>> 
>> let $coAuth := (for $x in $coAuthList
>> for $y in $x
>> *limit 100000*
>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>> 
>> 
>> let $res := (for $t in [$coAuth, $noCoAuth]
>> limit 100
>> return $t)
>> 
>> return $res
>> -----------------------------
>> 
>> Once I execute the previous AQL, one node (a different one in each run)
>> reaches *400%* CPU load (4 cores) and swallows up all the available memory
>> it can get.
>> 
>> 
>> For a smaller result (e.g. limit 10000), it works fine.
>> 
>> 
>> Thanks and sorry for the long email.
> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
