Re: [Pytables-users] In-kernal for subset?
From: Anthony Scopatz <scop...@gmail.com>
Reply-To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Date: Wednesday, August 15, 2012 11:29 PM
To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Subject: Re: [Pytables-users] In-kernal for subset?

On Thu, Aug 16, 2012 at 1:06 AM, Adam Dershowitz <adershow...@exponent.com> wrote:

From: Anthony Scopatz <scop...@gmail.com>
Reply-To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Date: Wednesday, August 15, 2012 2:47 PM
To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
Subject: Re: [Pytables-users] In-kernal for subset?

On Wed, Aug 15, 2012 at 12:33 PM, Adam Dershowitz <adershow...@exponent.com> wrote:

I am trying to find all cases where a value transitions above a threshold. My code first calls getWhereList() to find the rows whose values are above the threshold, then uses that list to check whether the immediately preceding value is below it. The code works, but the second part, which searches only a smaller subset, is much slower: the first search takes on the order of 1 second, while the second takes a minute. Is there any way to get this second part of the search in-kernel? Or, more generally, is there a way to search for values above a threshold where the prior value is below?

Essentially, I am looking for a way to speed up that second search: applying a condition to the table for every row in a previously defined list. My table is just seconds and values, in chronological order. Here is the code that I am using now:

    import tables as tb

    h5data = tb.openFile("AllData.h5", "r")
    table1 = h5data.root.table1

    # Find all rows whose value is above the threshold:
    thelist = table1.getWhereList("Value > 150")

    # From the above list, keep the rows whose immediately prior value is below
    # the threshold. Test i != 0 first so row 0 never looks up table1[-1].
    transition = []
    for i in thelist:
        if i != 0 and table1[i - 1]['Value'] < 150:
            transition.append(i)

Hey Adam,

Sorry for taking a while to respond. Assuming you don't mind one of these being <= or >=, you don't really need the second loop; a little index arithmetic will do:

    import numpy as np
    inds = np.array(thelist)
    dinds = inds[1:] - inds[:-1]
    transition = dinds[(1 < dinds)]

This should get you an array of all of the transition indices, since wherever the difference in indices is greater than 1, the Value must have dropped below the threshold and then come back up.

Be Well
Anthony

Thanks much for the response. At first it didn't work, but it gave me the right idea, and now I have it working. There were two problems above.

1) I believe that you had a typo: the last line should have been inds[(1 … and not dinds[(1 …; otherwise you just get back the deltas instead of the actual index values.

Whoops, serves me right for hacking this out so quickly!

But that still returned an array that wasn't working. After thinking some, it turns out that it was offset by one. Prepending a value greater than 1 to dinds seems to solve the problem, since the first value greater than the threshold must always be either a transition or the first table entry. Here is the code that seems to work:

    import numpy as np
    inds = np.array(thelist)
    dinds = np.append([2], inds[1:] - inds[:-1])
    trans = inds[(1 < dinds)]

Now, I am still curious, more for academic reasons since the code now works, whether there is a way to speed up the second loop above. It seems like there are other cases where index arithmetic might not work, so is there a way to do an in-kernel search through just a subset of a table?
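Adam's working index arithmetic can be wrapped as a small, self-contained function. This is a sketch, assuming `hits` is the sorted array of row numbers that getWhereList("Value > 150") returns; the name `transitions_from_hits` is hypothetical, and the empty-input guard is an addition not in the thread (without it, the prepended 2 leaves a one-element mask for an empty array, which recent NumPy versions reject with an IndexError).

```python
import numpy as np


def transitions_from_hits(hits):
    """Return the rows in `hits` whose predecessor row is not also a hit.

    `hits` is assumed to be the sorted row numbers where Value > threshold
    (e.g. the output of getWhereList). A gap larger than 1 between
    consecutive hits means the value was at or below the threshold in
    between, so the later hit is a rising transition.
    """
    inds = np.asarray(hits, dtype=np.int64)
    if inds.size == 0:
        # Guard (not in the thread): avoids a length-1 mask on an empty array.
        return inds
    # np.diff(inds) is inds[1:] - inds[:-1]; prepending 2 (> 1) is the
    # off-by-one fix from the thread, so the first hit always counts.
    dinds = np.append([2], np.diff(inds))
    return inds[dinds > 1]


if __name__ == "__main__":
    values = np.array([200, 100, 160, 170, 90, 150, 151, 10])
    hits = np.flatnonzero(values > 150)  # stand-in for getWhereList("Value > 150")
    print(transitions_from_hits(hits))  # -> [0 2 6]
```

Note that row 0 is reported when it is itself above the threshold, and a prior value of exactly 150 counts as "below", which is the <= / >= caveat Anthony mentions.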
So the issue is that we rely on numexpr for our in-kernel queries, and numexpr doesn't support indexing at all. There may be hope for this in the future (see numba). So the goal here is to do whatever you can to avoid queries that rely on comparing two different indexes of the same data.

If you really wanted to do this quickly and in-kernel, you could probably store two copies of the data. Call 'a' the original and 'b' a copy of 'a' that is offset by one index and has a dummy value at the end (to make them the same size). Then you could do something like:

    tb.Expr('a == b')

This would only work on Array, CArray, and EArray data. You might be able to get it to work with Tables using something like:

    tb.Expr('a == b', uservars={'a': atable, 'b': btable})

I hope this helps.

Be Well
Anthony

Yes, it helps explain the issue. I appreciate the info.
--Adam

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security,
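Anthony's two-copy idea can be tried in plain NumPy to see why it suits numexpr: once 'b' exists, the test only compares a[i] with b[i] at the same position. This is a sketch under assumptions: the helper names are hypothetical, the threshold-crossing expression stands in for his placeholder 'a == b', and the dummy value is set to the threshold so the padded last slot can never count as a transition.

```python
import numpy as np


def offset_copy(a, dummy):
    """Build 'b' with b[i] == a[i + 1] and `dummy` in the last slot,
    so that a and b have the same length."""
    a = np.asarray(a)
    return np.append(a[1:], dummy)


def rising_rows_offset(a, threshold):
    """Return rows i + 1 where a[i] <= threshold and a[i + 1] > threshold."""
    a = np.asarray(a)
    # Pad with the threshold itself: b[-1] > threshold is then False.
    b = offset_copy(a, threshold)
    # Purely element-wise: no term indexes a at a different position,
    # which is the kind of expression numexpr can evaluate.
    mask = (a <= threshold) & (b > threshold)
    # mask[i] describes the step from row i to row i + 1.
    return np.flatnonzero(mask) + 1


if __name__ == "__main__":
    values = np.array([200, 100, 160, 170, 90, 150, 151, 10])
    print(rising_rows_offset(values, 150))  # -> [2 6]
```

In PyTables terms, the mask would become a string such as '(a <= 150) & (b > 150)' handed to tb.Expr with uservars, as Anthony suggests; that variant is untested here. Unlike the index-arithmetic version, row 0 is never reported, because it has no predecessor.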
Re: [Pytables-users] openFile strategy question
Hi Anthony,

Oh OK, I think I understand a little better. What I would do would be to make

    for i, file in enumerate(hdf5_files):

the outermost loop and then use the File.walkNodes() method [1] to walk each file and pick out only the data sets that you want to copy, skipping over all others. This should allow you to open each of the 400 files only once. Hope this helps.

Thanks. This is the idea I had, but was failing to implement (although I didn't use walkNodes). To get it to work, I had to figure out how to use createEArray properly. In the end, it was a silly fix. I created an EArray with shape (0, 96, 1, 2) and was trying to append NumPy arrays of shape (96, 1, 2) to it, which was failing. In the end, all I had to do was

    arr.append(np.array([my_array]))

whereas before I was simply missing the [ ] brackets, so the shapes did not line up.

Cheers,
Andre

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
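The shape mismatch Andre hit can be reproduced with NumPy alone. This sketch assumes that an EArray created with shape (0, 96, 1, 2) grows along its first axis and that every appended block must have the same rank; a plain NumPy array extended with np.concatenate stands in for the EArray, and `add_leading_axis` is a hypothetical helper.

```python
import numpy as np

ROW_SHAPE = (96, 1, 2)


def add_leading_axis(row):
    """Turn one (96, 1, 2) record into a (1, 96, 1, 2) block.

    Equivalent to np.array([row]) from the thread, or row[np.newaxis].
    """
    row = np.asarray(row)
    if row.shape != ROW_SHAPE:
        raise ValueError("expected shape %r, got %r" % (ROW_SHAPE, row.shape))
    return row[np.newaxis, ...]


# Stand-in for the (0, 96, 1, 2) EArray: it grows only along axis 0,
# and each appended block must be rank 4 like the array itself.
store = np.empty((0,) + ROW_SHAPE)

for k in range(3):
    my_array = np.full(ROW_SHAPE, float(k))
    # Concatenating my_array directly would fail: rank 3 versus rank 4.
    store = np.concatenate([store, add_leading_axis(my_array)], axis=0)

print(store.shape)  # -> (3, 96, 1, 2)
```

With PyTables itself, the equivalent call would be arr.append(add_leading_axis(my_array)), which does the same thing as Andre's arr.append(np.array([my_array])).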
[Pytables-users] Searching for nan values in a table...
Hello All,

I am trying to determine if there are any NaN values in one of my tables, but when I queried for numpy.nan I received a NameError. Can anyone tell me the best way to search for a NaN value? Thanks!

    In [7]: type(np.nan)
    Out[7]: float

    In [8]: bad_vols = tbl.getWhereList('volume == %f' % np.nan)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    /Users/aquilabdullah/<ipython-input-8-2c1b183b0581> in <module>()
    ----> 1 bad_vols = tbl.getWhereList('volume == %f' % np.nan)

    /Library/Python/2.7/site-packages/tables/table.pyc in getWhereList(self, condition, condvars, sort, start, stop, step)
       1540
       1541         coords = [ p.nrow for p in
    -> 1542                    self._where(condition, condvars, start, stop, step) ]
       1543         coords = numpy.array(coords, dtype=SizeType)
       1544         # Reset the conditions

    /Library/Python/2.7/site-packages/tables/table.pyc in _where(self, condition, condvars, start, stop, step)
       1434
       1435         # Compile the condition and extract usable index conditions.
    -> 1436         condvars = self._requiredExprVars(condition, condvars, depth=3)
       1437         compiled = self._compileCondition(condition, condvars)
       1438

    /Library/Python/2.7/site-packages/tables/table.pyc in _requiredExprVars(self, expression, uservars, depth)
       1207                 val = user_globals[var]
       1208             else:
    -> 1209                 raise NameError("name ``%s`` is not defined" % var)
       1210
       1211             # Check the value.

    NameError: name ``nan`` is not defined

--
Aquil H. Abdullah
I never think of the future. It comes soon enough - Albert Einstein
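The NameError follows from how the condition string is built, which a few lines of standard-library Python can show. Formatting a float NaN with %f produces the bare word nan, so the query becomes 'volume == nan' and the condition compiler looks for a variable called nan. Even if such a variable were supplied, NaN compares unequal to every value under IEEE 754 rules, including itself, so an equality test could not select NaN rows.

```python
import math

nan = float("nan")

# '%f' % nan renders as the bare word "nan", which the query compiler then
# treats as a variable name, hence "name ``nan`` is not defined".
condition = "volume == %f" % nan
print(condition)         # -> volume == nan

# NaN is unequal to every value, itself included, so "== nan" could never
# match NaN rows even if the name resolved.
print(nan == nan)        # -> False
print(nan != nan)        # -> True
print(math.isnan(nan))   # -> True
```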
Re: [Pytables-users] Searching for nan values in a table...
I get the same error if I use:

    bad_vols = tbl.getWhereList('volume == nan')
    bad_vols = tbl.getWhereList('volume == NaN')

On Thursday, August 16, 2012 at 1:52 PM, Anthony Scopatz wrote:

Have you tried simply doing 'volume == nan' or 'volume == NaN'?
Re: [Pytables-users] Searching for nan values in a table...
So this is probably a numexpr issue. There doesn't seem to be an isnan() implementation [1]. I would bring it up with them. Sorry we can't do more.

Be Well
Anthony

1. http://code.google.com/p/numexpr/wiki/UsersGuide
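Since the missing piece is an isnan() function, a commonly used workaround is the self-comparison x != x, which is true exactly for NaN under IEEE 754 rules and needs only an element-wise comparison. The sketch below checks that equivalence in NumPy. Whether PyTables accepts a condition such as 'volume != volume' and numexpr evaluates it this way is an assumption not confirmed in this thread.

```python
import numpy as np

volume = np.array([10.0, np.nan, 3.5, np.inf, np.nan, -np.inf])

# Element-wise self-comparison: only NaN is unequal to itself.
nan_mask = volume != volume
print(np.flatnonzero(nan_mask))                      # -> [1 4]
print(np.array_equal(nan_mask, np.isnan(volume)))    # -> True

# Infinities pass the self-comparison, but x - x is NaN for +/-inf and NaN,
# so comparing x - x with itself flags every non-finite value.
with np.errstate(invalid="ignore"):  # inf - inf raises an "invalid" warning
    diff = volume - volume
nonfinite_mask = diff != diff
print(np.flatnonzero(nonfinite_mask))                # -> [1 3 4 5]
```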
Re: [Pytables-users] Searching for nan values in a table...
You are correct, sir: there doesn't appear to be support for isnan or infinite: http://code.google.com/p/numexpr/issues/detail?id=23&q=nan

I'll try something far less efficient, possibly a lambda expression on the table column. Anyway, thanks for root-causing the issue.
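The "far less efficient" fallback can run outside numexpr by reading the column in chunks and testing each chunk with numpy.isnan, which keeps memory bounded on large tables. This is a sketch under assumptions: `find_nan_rows` and its `read_chunk(start, stop)` callback are hypothetical names, and the callback stands in for a call such as tbl.read(start, stop, field='volume').

```python
import numpy as np


def find_nan_rows(read_chunk, nrows, chunksize=100000):
    """Return the absolute row numbers whose value is NaN.

    read_chunk(start, stop) must return the column values for rows
    [start, stop) as something NumPy can turn into an array.
    """
    found = []
    for start in range(0, nrows, chunksize):
        stop = min(start + chunksize, nrows)
        chunk = np.asarray(read_chunk(start, stop))
        # Positions inside the chunk, shifted back to absolute row numbers.
        found.append(np.flatnonzero(np.isnan(chunk)) + start)
    if not found:
        return np.empty(0, dtype=np.int64)
    return np.concatenate(found)


if __name__ == "__main__":
    column = np.array([1.0, np.nan, 2.0, 3.0, np.nan, 4.0, np.nan])
    rows = find_nan_rows(lambda s, e: column[s:e], len(column), chunksize=3)
    print(rows)  # -> [1 4 6]
```

Applied to the table in this thread, the call would look something like find_nan_rows(lambda s, e: tbl.read(s, e, field='volume'), tbl.nrows); that PyTables form is untested here.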