Hi Asterix team,
I have two queries I'm struggling with. I'm hoping you could provide a
direction for me. Thanks in advance!
Here is what the data structure looks like:
create type HRMType as closed {
row_id: int32,
sid: int32,
date: date,
day: int32,
time: time,
bpm: int32,
RR: float
};
create dataset HRM (HRMType)
primary key row_id;
Previously I have used the time bin function to calculate the standard
deviation of bpm for each time bin:
for $i in dataset HRM
group by $sid := $i.sid, $gdate := $i.date, $gday := $i.day, $timebin :=
interval-bin($i.time, time("00:00:00"), day-time-duration("PT1M")) with $i
return {
"sid": $sid,
"gdate": $gdate,
"gday": $gday,
"timebin": $timebin,
"stdv": (avg(for $ii in $i return $ii.RR * $ii.RR) - avg(for $ii in $i
return $ii.RR) * avg(for $ii in $i return $ii.RR))^(0.5)};
Now I have two things I am hoping to do but need help with:
1. For each 1-min time bin, remove the bpm values that are above the top 5%
or below the bottom 5%. I thought about using the min/max function for a
few times to achieve this, but realized that it was not a good idea because
in each time bin, the number of instances are not always the same. So for
each 1-min time bin, we do need to calculate the 5% and 95% threshold, and
remove instances accordingly, which I don't know how to do.
2. After removing the outliers of bpm for each 1-min time bin, calculate a
median absolute deviation (MAD) for each 1-min time bin (as another measure
of variation besides the standard deviation). MAD =
median(abs(x-median(x)). I'm not sure how to write a query to do the median
function in Asterix.
Thank you so much in advance. Let me know if my questions are clear.
Yiran
--
Best,
Yiran