Re: [Pytables-users] In-kernal for subset?

Anthony Scopatz Wed, 15 Aug 2012 23:30:38 -0700

On Thu, Aug 16, 2012 at 1:06 AM, Adam Dershowitz
<adershow...@exponent.com> wrote:


>   From: Anthony Scopatz <scop...@gmail.com>
> Reply-To: Discussion list for PyTables <
> pytables-users@lists.sourceforge.net>
> Date: Wednesday, August 15, 2012 2:47 PM
> To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
> Subject: Re: [Pytables-users] In-kernal for subset?
>
>   On Wed, Aug 15, 2012 at 12:33 PM, Adam Dershowitz <
> adershow...@exponent.com> wrote:
>
>>  I am trying to find all cases where a value transitions above a
>> threshold.  So, my code first does a getWhereList to find values that are
>> above the threshold, then it uses that list to find immediately prior
>> values that are below.  The code is working, but the second part, searching
>> through just a smaller subset is much slower (First search is on the order
>> of 1 second, while the second is a minute).
>> Is there any way to get this second part of the search in-kernel?  Or any
>> more general way to do a search for values above a threshold, where the
>> prior value is below?
>> Essentially, what I am looking for is a way to speed up that second
>> search for "all rows in a prior defined list, where a condition is applied
>> to the table"
>>
>>  My table is just seconds and values, in chronological order.
>>
>>  Here is the code that I am using now:
>>
>>  h5data = tb.openFile("AllData.h5","r")
>> table1 = h5data.root.table1
>>
>>  #Find all values above threshold:
>>  thelist= table1.getWhereList("""Value > 150""")
>>
>>  #From the above list find all values where the immediately prior value
>> is below:
>> transition=[]
>> for i in thelist:
>>     if (table1[i-1]['Value'] < 150) and (i != 0) :
>>         transition.append(i)
>>
>
>  Hey Adam,
>
>  Sorry for taking a while to respond.  Assuming you don't mind one of
> these being <= or >=, you don't really need the second loop with a little
> index arithmetic:
>
>  import numpy as np
> inds = np.array(thelist)
> dinds = inds[1:] - inds[:-1]
> transition = dinds[(1 < dinds)]
>
>  This should get you an array of all of the transition indices since
> wherever the difference in indices is greater than 1 the Value must have
> dropped below the threshold and then returned back up.
>
>  Be Well
> Anthony
>
>
>>
>>
>  Thanks much for the response.  At first it didn't work, but it gave me
> the right idea, and now I got it working.  There were two problems above.
>  1)  I believe that you had a typo and the last line should have been
> "inds[(1 < …" and not "dinds[(1<…"  Otherwise you just get back the deltas
> instead of the actual index values.
>

Whoops, serves me right for hacking this out so quickly!


>  But, that still returned an array that wasn't working.  Turns out, after
> thinking some, that it was actually offset by one.  So by prepending a
> value into dinds (greater than 1, since the first value greater than the
> threshold, must always be a transition or the first table entry) it seems
> to solve the problem.  Here is the code that seems to work:
>
>  import numpy as np
>  inds = np.array(thelist)
> dinds=np.append([2],inds[1:] - inds[:-1])
> trans=inds[(1<dinds)]
>
>  Now, I am still curious, more for academic reasons, since the code now
> works, if there would be a way to speed up the second loop above?  It seems
> like there are other examples, where index arithmetic might not work, so is
> there a way to do an in-kernel search through just a subset of a table?
>
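
As a sanity check on the index arithmetic above, here is a minimal sketch
that compares it against an explicit loop on a small in-memory array. The
function names and sample values are made up for illustration, and
np.nonzero stands in for getWhereList; the loop uses <= on the prior value,
which is what the difference trick actually tests.

```python
import numpy as np

def transitions_loop(values, threshold):
    """Reference version: explicitly check each row's predecessor."""
    above = np.nonzero(values > threshold)[0]  # stands in for getWhereList
    return [int(i) for i in above if i != 0 and values[i - 1] <= threshold]

def transitions_diff(values, threshold):
    """Index-arithmetic version from this thread."""
    inds = np.nonzero(values > threshold)[0]
    if inds.size == 0:
        return inds
    # Prepend 2 so the first above-threshold row always counts as a transition.
    dinds = np.append([2], inds[1:] - inds[:-1])
    return inds[dinds > 1]

# Hypothetical sample data; the first value is below the threshold.
values = np.array([100., 160., 170., 140., 155., 150., 151., 152., 90.])
print(transitions_loop(values, 150))
print(transitions_diff(values, 150))
```

Both versions report rows 1, 4 and 6 for this sample. They would differ
only if row 0 were itself above the threshold, since the prepended 2 always
keeps the first hit while the loop skips i == 0.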

So the issue is that we rely on numexpr here for our in-kernel queries and
numexpr doesn't support indexing at all.  There may be hope for this in
the future (see numba).  So the goal here is to do whatever you can
to not have queries which rely on comparing two different indexes of the
same data.
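
For cases where index arithmetic does not apply, one out-of-kernel option
is to read the column into memory once and filter the subset with NumPy
fancy indexing. The sketch below is not a PyTables feature: filter_subset
is a hypothetical helper, the sample array stands in for the Value column,
and it assumes the column fits in memory.

```python
import numpy as np

def filter_subset(column, row_ids, predicate):
    """Keep the ids in row_ids whose column value satisfies predicate."""
    row_ids = np.asarray(row_ids, dtype=np.intp)
    # One fancy-indexing read replaces one table access per row.
    return row_ids[predicate(column[row_ids])]

# Hypothetical sample data standing in for the 'Value' column.
values = np.array([100., 160., 170., 140., 155., 150., 151., 152., 90.])
above = np.nonzero(values > 150)[0]   # stands in for getWhereList
prev = above[above > 0] - 1           # predecessors of those rows
# Rows above the threshold whose prior value is strictly below it.
hits = filter_subset(values, prev, lambda v: v < 150) + 1
print(hits)
```

With the strict < test of the original loop, this sample yields rows 1 and
4; row 6 is left out because its predecessor is exactly 150.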

If you really wanted to do this quickly and in kernel, you could probably
store two copies of the data. Call 'a' the original and 'b' a copy of 'a'
that is offset by 1 index and has a dummy value at the end (to make them
the same size).  Then you could do something like:

tb.Expr('a == b')

This would only work on Array, CArray, and EArray data.  You might be able
to get it to work using Tables with something like:

tb.Expr('a == b', uservars={'a': atable, 'b': btable})
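
To make the offset-copy idea concrete, here is a rough sketch of the logic
using plain NumPy arrays rather than PyTables nodes. The function name, the
sample values, and the choice of dummy value are assumptions for
illustration; the same element-wise comparison is what would be handed to
tb.Expr in place of 'a == b'.

```python
import numpy as np

def transitions_shifted(a, threshold):
    """Rows above threshold whose predecessor is at or below it."""
    a = np.asarray(a, dtype=float)
    if a.size == 0:
        return np.array([], dtype=np.intp)
    # 'b' is 'a' offset by one index. Repeating the last value as the dummy
    # keeps the lengths equal and can never create a false transition.
    b = np.append(a[1:], a[-1])
    # Element-wise comparison of two same-length arrays; no indexing needed.
    mask = (a <= threshold) & (b > threshold)
    # mask[i] marks the row before the transition, so shift by one.
    return np.nonzero(mask)[0] + 1

# Hypothetical sample data.
values = [100., 160., 170., 140., 155., 150., 151., 152., 90.]
print(transitions_shifted(values, 150))
```

This reports rows 1, 4 and 6 for the sample. Unlike the prepend-2 version,
it never flags row 0, because row 0 has no predecessor to compare against.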

I hope this helps.
Be Well
Anthony


>  Again, thanks for the help!
>
>  Best,
>
>  --Adam
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>

