Another approach (sketching just the logic) would be to do the case handling
on output. That is, don't start by segmenting the records based on which kind
they are; process them all in one pass and do the different handling in the
return clause...?
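The idea can be sketched in Python (not AQL). This assumes the record layout described further down in the thread, where static_data.summary.names.name holds either a single object or a list of objects. The function name author_names and the sample publications are hypothetical:

```python
from itertools import islice
from typing import Any, Dict, Iterator, List, Union

Name = Dict[str, str]


def author_names(records: List[Dict[str, Any]]) -> Iterator[Dict[str, str]]:
    """Yield one {firstName, lastName} record per author, in a single pass.

    Assumes (from the thread) that static_data.summary.names.name is either a
    single object (one author) or a list of objects (co-authors).
    """
    for rec in records:
        name_field: Union[Name, List[Name]] = rec["static_data"]["summary"]["names"]["name"]
        # The one-vs-many case is handled here, at output time, instead of
        # first splitting the input into two separate collections.
        names = name_field if isinstance(name_field, list) else [name_field]
        for n in names:
            yield {"firstName": n["first_name"], "lastName": n["last_name"]}


# Hypothetical sample data in the shape described in the thread.
pubs = [
    {"static_data": {"summary": {"names": {"count": "1",
        "name": {"first_name": "Ada", "last_name": "Lovelace"}}}}},
    {"static_data": {"summary": {"names": {"count": "2",
        "name": [{"first_name": "Alan", "last_name": "Turing"},
                 {"first_name": "Grace", "last_name": "Hopper"}]}}}},
]

# islice plays the role of "limit 100": the generator stops early.
print(list(islice(author_names(pubs), 100)))
```

Because both shapes flow through one stream, nothing has to be collected into a single large intermediate list before the limit is applied.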
On 11/30/15 11:40 AM, Jianfeng Jia wrote:
It seems you are hitting the BigObject issue; the error message is supposed to
say "255 * DefaultFrameSize" bytes.
On the other hand, I don’t quite understand the final statement:
-------------
//print all authors.
let $res := (for $t in [$coAuth,$noCoAuth]
limit 100
return $t)
-------------
I think you are expecting a union operation instead.
The list constructor ([]) doesn't unnest the records of the inner lists. For
example, I tried the following query:
-------------
let $x := [ { "a":1},{ "a":2},{ "a":3}]
let $y := [ { "b":1},{ "b":2},{ "b":3}]
let $xy := [$x, $y]
for $tx in $xy
return $tx
-------------
It returns the following result.
[ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
[ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
That means $xy holds two large items, the lists $x and $y, not the six smaller
records.
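As a rough analogy only (Python lists, not AQL semantics), the same nesting behavior looks like this: putting two lists inside a list constructor gives two items, while concatenation gives the six individual records.

```python
# Python analogue of the query above: [x, y] builds a list of two lists.
x = [{"a": 1}, {"a": 2}, {"a": 3}]
y = [{"b": 1}, {"b": 2}, {"b": 3}]

xy = [x, y]          # like [$x, $y]: two items, each one a whole list
for tx in xy:
    print(tx)        # prints two lines, one per inner list

flat = x + y         # concatenation: six individual records
print(len(xy), len(flat))  # 2 6
```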
Similarly, "for $t in [$coAuth,$noCoAuth]" will only return two items.
The first one is the $coAuth list, and the second one is the $noCoAuth list. It will
definitely hit the big object problem or other memory issues if either list is
too big.
You can try the union operation as follows:
-------------
for $t in $coAuth union $noCoAuth
return $t
-------------
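To illustrate why iterating over a union differs from iterating over [$coAuth,$noCoAuth], here is a Python analogy (not a claim about how AsterixDB executes union). It assumes a concatenation-style union that keeps duplicates. The two input generators are hypothetical stand-ins for the query results:

```python
from itertools import chain, islice

# Hypothetical stand-ins for $coAuth and $noCoAuth, produced lazily.
co_auth = ({"firstName": f"co{i}", "lastName": "X"} for i in range(1_000_000))
no_co_auth = ({"firstName": f"solo{i}", "lastName": "Y"} for i in range(1_000_000))

# chain() yields the records of both inputs one at a time (duplicates kept);
# islice() stops after 100 of them, so neither input is ever held as one
# big list-valued item.
res = list(islice(chain(co_auth, no_co_auth), 100))
print(len(res))  # 100
```

Each element of the combined stream is a single small record, which is the shape the final "limit 100" was expecting.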
On Nov 30, 2015, at 7:17 AM, Mike Carey wrote:
I will look into the details later, but:
1. The answer to your question is yes: at present, ORDER BY and LIMIT both
cause the results to land on a single node. We need to add support for
range-partitioned results!
2. It would be good to get familiar with reading query plans, and also to look for
"listify" operations placed in unfortunate spots in those plans, since they
can cause frame size issues.
Cheers,
Mike
On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
Hi Team,
I noticed a weird behavior when executing an AQL query with a limit clause
(LIMIT 100000).
One NC throws java.lang.OutOfMemoryError,
while the others seem to operate normally.
My -Xmx settings are the defaults:
nc.java.opts: -Xmx1536m
cc.java.opts: -Xmx1024m
Here is the story:
I have a dataset of publications. The data contains huge, nested, and
heterogeneous records.
Therefore, the declared type specifies only a unique ID:
create type wosType as open
{
UID:string
}
After loading the data, I want to extract all the authors' names (first and
last). However, the author details for each publication are *heterogeneous*:
if there is only one author (i.e., no co-authors), the "name" field is a JSON
object; otherwise, it is an ordered list.
So I did the following (excuse the ugliness of my AQL):
-----------------------------
use dataverse wosDataverse
//Get name details for single authors
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)
//Generate a list of names for all co-authors
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)
//Flatten the co-authors' name list
let $coAuth := (for $x in $coAuthList
for $y in $x
return {"firstName":$y.first_name,"lastName":$y.last_name})
//print all authors.
let $res := (for $t in [$coAuth,$noCoAuth]
limit 100
return $t)
return $res
-----------------------------
This query couldn't be executed due to the frame size limit:
Unable to allocate frame larger than:255 bytes [HyracksDataException]
So I limited the number of results as follows:
-----------------------------
use dataverse wosDataverse
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
limit 100000
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)
let $coAuth := (for $x in $coAuthList
for $y in $x
limit 100000
return {"firstName":$y.first_name,"lastName":$y.last_name})
let $res := (for $t in [$coAuth, $noCoAuth]
limit 100
return $t)
return $res
-----------------------------
When I execute the previous AQL query, one node (a different one in each run)
reaches *400%* CPU load (4 cores) and consumes all the memory
it can get.
For smaller results (e.g., limit 10000), it works fine.
Thanks and sorry for the long email.
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine