Re: [Numpy-discussion] Statistical distributions on samples
Hi Chris and Brennan,

On 15 August 2011 00:59, Brennan Williams wrote: You can use scipy.stats.truncnorm, can't you? Unless I misread, you want to sample a normal distribution but with generated values only being within a specified range? However you also say you want to do this with triangular and log normal, and for these I presume the easiest way is to sample and then accept/reject. Brennan

On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote: Hi Andrea--An easy way to get something like this would be

    import numpy as np
    import scipy.stats as stats

    sigma = # some reasonable standard deviation for your application
    x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
    x = x[x > 50]
    x = x[x < 200]

That will give a roughly normal distribution to your velocities, as long as, say, sigma < 25. (I'm using the rule of thumb for the normal distribution that normal random samples lie more than 3 standard deviations away from the mean about 1 out of 350 times.) Though you won't be able to get exactly normal errors about your mean, since normal random samples can theoretically be of any size. You can use this same process for any other distribution, as long as you've chosen a scale variable so that the probability of samples being outside your desired interval is really small. Of course, once again your random errors won't be exactly from the distribution you get your original samples from.

Thank you for your answer. Indeed, it appears that a truncated distribution implementation exists only for the normal distribution (in the subset of distributions I need to use). I haven't checked yet what the code for truncnorm does, but maybe it might be possible to apply the same approach to other distributions. In any case the sampling/reject/accept approach is the best approach for me, due to my ignorance about statistical things :-) Thank you again.

Andrea.

Imagination Is The Only Weapon In The War Against Reality.
http://xoomer.alice.it/infinity77/
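For the triangular and log-normal cases mentioned above, a minimal rejection-sampling sketch could look like the following (the distribution, bounds and parameter values are only illustrative, not taken from the thread):

    import numpy as np
    import scipy.stats as stats

    def truncated_rvs(dist, lower, upper, size, oversample=2.0, max_tries=100):
        # Draw `size` samples from `dist` restricted to [lower, upper] by
        # repeatedly oversampling and rejecting values outside the bounds.
        out = np.empty(0)
        for _ in range(max_tries):
            draw = dist.rvs(size=int(size * oversample))
            out = np.concatenate([out, draw[(draw >= lower) & (draw <= upper)]])
            if out.size >= size:
                return out[:size]
        raise RuntimeError("acceptance rate too low; widen the bounds or rescale")

    # e.g. a triangular distribution (c, loc and scale are hypothetical values)
    tri = stats.triang(c=0.5, loc=50, scale=150)
    samples = truncated_rvs(tri, 60, 190, size=200)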
[Numpy-discussion] ULONG not in UINT16, UINT32, UINT64 under 64-bit windows, is this possible?
Hi, A student of mine, using 32-bit numpy 1.5 under 64-bit Windows 7, noticed that when passing a numpy array with dtype=uint32 to an extension module, the following codelet would fail:

    switch(PyArray_TYPE(ARR)) {
        case PyArray_UINT16: /* do smth */ break;
        case PyArray_UINT32: /* do smth */ break;
        case PyArray_UINT64: /* do smth */ break;
        default: /* raise type error exception */
    }

The same test worked fine under Linux. Checking the value of PyArray_TYPE(ARR) (=8) showed that it corresponds to NPY_ULONG (when counting the items in the enum definition). Is this situation possible, where NPY_ULONG does not correspond to a 16, 32 or 64 bit integer? Or does this indicate a bug somewhere for this particular platform? Thanks, Pearu
Re: [Numpy-discussion] ULONG not in UINT16, UINT32, UINT64 under 64-bit windows, is this possible?
The reason is that there can be multiple dtypes (i.e. with different .num) representing the same kind of data. Usually in Python this goes unnoticed, because you do not test a dtype through its .num; instead you use, for instance, dtype == 'uint32', and all works fine. However, it can indeed confuse C code in situations like the one you describe, because of direct comparison of .num. I guess you have a few options:

- Do not compare .num (I'm not sure what the equivalent of dtype == 'uint32' would be in C, though) => probably slower
- Re-cast your array into the exact dtype you need (in Python you can do this with .view) => probably cumbersome
- Write a customized comparison function that figures out at initialization time all dtypes that represent the same data, and is then able to do a fast comparison based on .num => probably best, but requires more work

Here's some Python code that lists the various scalar dtypes associated to unique .num in numpy (excerpt slightly modified from code found in Theano -- http://deeplearning.net/software/theano -- BSD license). Call the get_numeric_types() function, and print both the string representation of the resulting dtypes as well as their .num.

    import numpy

    def get_numeric_subclasses(cls=numpy.number, ignore=None):
        """
        Return subclasses of `cls` in the numpy scalar hierarchy.

        We only return subclasses that correspond to unique data types.
        The hierarchy can be seen here:
            http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
        """
        if ignore is None:
            ignore = []
        rval = []
        dtype = numpy.dtype(cls)
        dtype_num = dtype.num
        if dtype_num not in ignore:
            # Safety check: we should be able to represent 0 with this data type.
            numpy.array(0, dtype=dtype)
            rval.append(cls)
            ignore.append(dtype_num)
        for sub in cls.__subclasses__():
            rval += [c for c in get_numeric_subclasses(sub, ignore=ignore)]
        return rval

    def get_numeric_types():
        """
        Return numpy numeric data types.

        :returns: A list of unique data type objects. Note that multiple data
        types may share the same string representation, but can be
        differentiated through their `num` attribute.
        """
        rval = []
        def is_within(cls1, cls2):
            # Return True if scalars defined from `cls1` are within the hierarchy
            # starting from `cls2`.
            # The third test below is to catch for instance the fact that
            # one can use ``dtype=numpy.number`` and obtain a float64 scalar, even
            # though `numpy.number` is not under `numpy.floating` in the class
            # hierarchy.
            return (cls1 is cls2 or
                    issubclass(cls1, cls2) or
                    isinstance(numpy.array([0], dtype=cls1)[0], cls2))
        for cls in get_numeric_subclasses():
            dtype = numpy.dtype(cls)
            rval.append([str(dtype), dtype, dtype.num])
        # We sort it to be deterministic, then remove the string and num elements.
        return [x[1] for x in sorted(rval, key=str)]

2011/8/15 Pearu Peterson pearu.peter...@gmail.com: Hi, A student of mine, using 32-bit numpy 1.5 under 64-bit Windows 7, noticed that when passing a numpy array with dtype=uint32 to an extension module, the following codelet would fail: switch(PyArray_TYPE(ARR)) { case PyArray_UINT16: /* do smth */ break; case PyArray_UINT32: /* do smth */ break; case PyArray_UINT64: /* do smth */ break; default: /* raise type error exception */ } The same test worked fine under Linux. Checking the value of PyArray_TYPE(ARR) (=8) showed that it corresponds to NPY_ULONG (when counting the items in the enum definition). Is this situation possible, where NPY_ULONG does not correspond to a 16, 32 or 64 bit integer? Or does this indicate a bug somewhere for this particular platform?
Thanks, Pearu
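To see the duplicate-.num situation concretely, a small platform-dependent sketch along these lines lists numpy's unsigned integer scalar types together with their itemsize and .num (the comment describes typical 32-bit Windows builds and is an assumption, not output from the thread):

    import numpy

    # On 32-bit Windows builds, 'I' (uint) and 'L' (ulong) are typically both
    # 4 bytes wide yet carry different .num values, which is exactly what can
    # trip up a C-level switch on the type number.
    for code in numpy.typecodes['UnsignedInteger']:
        dt = numpy.dtype(code)
        print("%s  %-8s itemsize=%d  num=%d" % (code, dt.name, dt.itemsize, dt.num))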
Re: [Numpy-discussion] Statistical distributions on samples
Hi Chris and All,

On 12 August 2011 16:53, Christopher Jordan-Squire wrote: Hi Andrea--An easy way to get something like this would be

    import numpy as np
    import scipy.stats as stats

    sigma = # some reasonable standard deviation for your application
    x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
    x = x[x > 50]
    x = x[x < 200]

That will give a roughly normal distribution to your velocities, as long as, say, sigma < 25. (I'm using the rule of thumb for the normal distribution that normal random samples lie more than 3 standard deviations away from the mean about 1 out of 350 times.) Though you won't be able to get exactly normal errors about your mean, since normal random samples can theoretically be of any size. You can use this same process for any other distribution, as long as you've chosen a scale variable so that the probability of samples being outside your desired interval is really small. Of course, once again your random errors won't be exactly from the distribution you get your original samples from.

Thank you for your suggestion. There are a couple of things I am not clear with, however. The first one (the easy one) is: let's suppose I need 200 values, and the accept/discard procedure removes 5 of them from the list. Is there any way to draw these 200 values from a bigger sample so that the accept/reject procedure will not interfere too much? And how do I get 200 values out of the bigger sample so that these values are still representative?

Another thing, possibly completely unrelated. I am trying to design a toy Latin Hypercube script (just for my own understanding). I found this piece of code on the web (and I modified it slightly):

    def lhs(dist, size=100):
        '''
        Latin Hypercube sampling of any distribution.

        dist is a scipy.stats random number generator such as stats.norm,
        stats.beta, etc. parms is a tuple with the parameters needed for
        the specified distribution.

        :Parameters:
            - `dist`: random number generator from scipy.stats module.
            - `size`: size for the output sample
        '''
        n = size
        perc = numpy.arange(0.0, 1.0, 1.0/n)
        numpy.random.shuffle(perc)
        smp = [stats.uniform(i, 1.0/n).rvs() for i in perc]
        v = dist.ppf(smp)
        return v

Now, I am not 100% clear on what the percent point function is (I have read around the web, but please keep in mind that my statistical skills are close to minus infinity). From this page: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm I gather that, if you plot the results of the ppf, with the horizontal axis as probability, the vertical axis goes from the smallest to the largest value of the cumulative distribution function. If I do this:

    numpy.random.seed(123456)
    distribution = stats.norm(loc=125, scale=25)
    my_lhs = lhs(distribution, 50)

Will my_lhs always contain valid values (i.e., included between 50 and 200)? I assume the answer is no... but even if this was the case, is this my_lhs array ready to be used to set up a LHS experiment when I have multi-dimensional problems (in which all the variables are completely independent from each other - no correlation)? My apologies for the idiocy of the questions.

Andrea.

Imagination Is The Only Weapon In The War Against Reality.
http://xoomer.alice.it/infinity77/
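For the ppf question above, one quick way to build intuition is to check numerically that the ppf is the inverse of the cdf; a small sketch, using the same loc=125, scale=25 example (the printed values are approximate):

    import scipy.stats as stats

    dist = stats.norm(loc=125, scale=25)

    # ppf maps a probability back to a value: ppf(0.5) is the median.
    print(dist.ppf(0.5))             # 125.0 for this symmetric distribution
    print(dist.cdf(dist.ppf(0.9)))   # ~0.9, since cdf and ppf are inverses
    # probability mass outside [50, 200], i.e. beyond 3 sigma on either side
    print(dist.cdf(50) + dist.sf(200))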
[Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision
Hi all, I'm glad to inform you that general constraints handling for interalg (a free solver with guaranteed user-defined precision) is now available. Although it is very premature and requires lots of improvements, it is already capable of outperforming the commercial BARON (example: http://openopt.org/interalg_bench#Test_4), and thus you could be interested in trying it right now (the next OpenOpt release will be no sooner than 1 month from now). interalg can be especially more effective than BARON (and some other competitors) on problems with a huge or absent Lipschitz constant, for example on funcs like sqrt(x), log(x), 1/x, x**alpha with alpha < 1, when the domain of x is something like [small_positive_value, another_value]. Let me also remind you that interalg can search for all solutions of nonlinear equations / systems of them where local solvers like scipy.optimize.fsolve cannot find any, and can compute single/multiple integrals with guaranteed user-defined precision (the speed of integration is intended to be enhanced in the future). However, only FuncDesigner models are handled (read the interalg webpage for more details). Regards, D.
Re: [Numpy-discussion] Statistical distributions on samples
On Mon, Aug 15, 2011 at 8:53 AM, Andrea Gavana andrea.gav...@gmail.com wrote:

Hi Chris and All, On 12 August 2011 16:53, Christopher Jordan-Squire wrote: [earlier suggestion quoted in full above] Thank you for your suggestion. There are a couple of things I am not clear with, however. The first one (the easy one) is: let's suppose I need 200 values, and the accept/discard procedure removes 5 of them from the list. Is there any way to draw these 200 values from a bigger sample so that the accept/reject procedure will not interfere too much? And how do I get 200 values out of the bigger sample so that these values are still representative?

FWIW, I'm not really advocating a truncated normal so much as making the standard deviation small enough so that there's no real difference between a true normal distribution and a truncated normal. If you're worried about getting exactly 200 samples, then you could sample N with N > 200, such that after throwing out the ones that lie outside your desired region you're left with M > 200. Then just randomly pick 200 from those M. That shouldn't bias anything as long as you randomly pick them. (Or just pick the first 200, if you haven't done anything to impose any order on the samples, such as sorting them by size.) But I'm not sure why you'd want exactly 200 samples instead of some number of samples close to 200.

Another thing, possibly completely unrelated. I am trying to design a toy Latin Hypercube script (just for my own understanding). I found this piece of code on the web (and I modified it slightly): [lhs() code quoted in full above] Now, I am not 100% clear on what the percent point function is (I have read around the web, but please keep in mind that my statistical skills are close to minus infinity). From this page: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm

The ppf is what's called the quantile function elsewhere. I do not know why scipy calls it the ppf/percent point function. The quantile function is the inverse of the cumulative density function (cdf). So dist.ppf(z) is the x such that P(dist <= x) = z. Roughly.
(Things get slightly more finicky if you think about discrete distributions, because then you have to pick what happens at the jumps in the cdf.) So dist.ppf(0.5) gives the median of dist, and dist.ppf(0.25) gives the lower/first quartile of dist.

I gather that, if you plot the results of the ppf, with the horizontal axis as probability, the vertical axis goes from the smallest to the largest value of the cumulative distribution function. If I do this:

    numpy.random.seed(123456)
    distribution = stats.norm(loc=125, scale=25)
    my_lhs = lhs(distribution, 50)

Will my_lhs always contain valid values (i.e., included between 50 and 200)? I assume the answer is no... but even if this was the case, is this my_lhs array ready to be used to set up a LHS experiment when I have multi-dimensional problems (in which all the variables are completely independent from each other - no correlation)?

I'm not really sure if the above function is doing the lhs you want. To answer your question, it won't always generate values within [50, 200]. If size is large enough then you're dividing up the probability space evenly. So even with the random perturbations (whose use I don't really understand), you'll ensure that some of the samples you get when you apply the ppf will correspond to the extremely low probability samples that are below 50 or above 200.

-Chris JS
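A compact sketch of the oversample-then-subsample idea described above, using the same bounds and target size as the thread's example (the oversampling factor is an arbitrary illustrative choice):

    import numpy as np
    import scipy.stats as stats

    wanted = 200

    # Oversample, reject values outside [50, 200], then randomly keep exactly 200.
    x = stats.norm.rvs(size=1000, loc=125, scale=25.0)
    x = x[(x > 50) & (x < 200)]
    assert x.size >= wanted, "increase the oversampling factor"
    sample = x[np.random.permutation(x.size)[:wanted]]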
Re: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision
Hi Dmitrey,

2011/8/15 Dmitrey tm...@ukr.net: [announcement quoted in full above]

Thank you for these new improvements. I am one of those who use OpenOpt on real life problems, and if I can advance a suggestion (for the second time): when you post a benchmark of various optimization methods, please do not consider the elapsed time as the only meaningful variable to measure the success/failure of an algorithm. Some (most?) real life problems require intensive and time consuming simulations for every *function evaluation*; the time spent by the solver itself doing its calculations simply disappears in front of the real process simulation. I know it because our simulations take between 2 and 48 hours to run, so what's 300 seconds more or less in the solver calculations? If you talk about synthetic problems (such as the ones defined by a formula), I can see your point. For everything else, I believe the number of function evaluations is a more direct way to assess the quality of an optimization algorithm. Just my 2c.

Andrea.

Imagination Is The Only Weapon In The War Against Reality.
http://xoomer.alice.it/infinity77/
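One simple way to benchmark by function evaluations rather than wall time, as suggested above, is to wrap the objective in a counting callable before handing it to any solver. This is a generic sketch (the objective and the solver call are placeholders, not OpenOpt-specific):

    class CountingObjective(object):
        """Wrap an expensive objective and count how many times it is called."""
        def __init__(self, func):
            self.func = func
            self.ncalls = 0
        def __call__(self, x):
            self.ncalls += 1
            return self.func(x)

    # Cheap stand-in; a real reservoir simulation would go here instead.
    def expensive_model(x):
        return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

    f = CountingObjective(expensive_model)
    # ... pass `f` to the optimizer of your choice ...
    print("function evaluations: %d" % f.ncalls)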
Re: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision
Hi Andrea, I believe benchmarks should be done the way Hans Mittelmann does them ( http://plato.asu.edu/bench.html ), and of course the number of function evaluations matters when slow Python code is tested against compiled code, but my current work doesn't allow me to spend that much time on OpenOpt development, let alone on auxiliary work such as benchmarking (and doing it properly like that). Also, benchmarks of someone's own software usually are not very trustworthy, especially on his own problems. BTW, please don't reply to my posts on the scipy mailing lists - I use them only to post announcements like this one and may miss a reply. Regards, D.

--- Original message ---
From: Andrea Gavana andrea.gav...@gmail.com
To: Discussion of Numerical Python numpy-discussion@scipy.org
Date: 15 August 2011, 23:01:05
Subject: Re: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision

Hi Dmitrey, 2011/8/15 Dmitrey tm...@ukr.net: [announcement quoted in full above] Thank you for these new improvements. I am one of those who use OpenOpt on real life problems, and if I can advance a suggestion (for the second time): when you post a benchmark of various optimization methods, please do not consider the elapsed time as the only meaningful variable to measure the success/failure of an algorithm. Some (most?) real life problems require intensive and time consuming simulations for every *function evaluation*; the time spent by the solver itself doing its calculations simply disappears in front of the real process simulation. I know it because our simulations take between 2 and 48 hours to run, so what's 300 seconds more or less in the solver calculations? If you talk about synthetic problems (such as the ones defined by a formula), I can see your point. For everything else, I believe the number of function evaluations is a more direct way to assess the quality of an optimization algorithm. Just my 2c. Andrea.
Re: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices
Hi, I put together a set of tools for inverting, multiplying and finding eigenvalues for many small matrices (arrays of shape (N, M, M), where MxM is the size of each matrix). Thanks to the poster who suggested using the Tokyo package. Although not used directly, it helped with figuring out the correct arguments to pass to the lapack routines and with getting started with cython. I put the code up here if anyone happens to be interested.

http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt

It consists of three files: smallMatrixTools.py, smt.pyx and smt.pxd. The speed tests comparing with numpy came out something like this, for random matrices:

    N, M, M: 1, 2, 2   mulinv speed up: 65.9, eigvec speed up: 11.2
    N, M, M: 1, 3, 3   mulinv speed up: 32.3, eigvec speed up: 7.2
    N, M, M: 1, 4, 4   mulinv speed up: 24.1, eigvec speed up: 5.9
    N, M, M: 1, 5, 5   mulinv speed up: 17.0, eigvec speed up: 5.2

Not bad speed ups, but not out of this world. I'm new to cython, so there may be some costly mistakes in the implementation. I profiled, and it seems that most of the time is now being spent in the lapack routines, but I'm still not completely convinced by the profiling results. One thing that I know I'm doing wrong is reassigning every sub-matrix to a new array. This may not be that costly, but it seems fairly ugly. I wasn't sure how to pass the address of the submatrix to the lapack routines, so I'm assigning to a new array and passing that instead. I tested with http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt/test.py and speed tests were done using http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt/speedTest.py.

Cheers

On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 07/11/2011 11:01 PM, Daniel Wheeler wrote: Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most. The number of matrices (N) can be from 100 up to a million or more. My current solution is to define eig and inv to be:

    def inv(A):
        """Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N)."""
        return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0)

    def eig(A):
        """Calculate the eigenvalues and eigenvectors of N MxM matrices,
        A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, M, N)."""
        tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1)))
        return (np.array(tmp[0]).swapaxes(0, 1), np.array(tmp[1]).transpose(1, 2, 0))

The above uses map to fake a vector solution, but this is heinously slow. Are there any better ways to do this without resorting to cython or weave (would it even be faster (or possible) to use np.linalg.eig and np.linalg.inv within cython)? I could write specialized versions

If you want to go the Cython route, here's a start: http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ Dag Sverre

-- Daniel Wheeler
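For reference, the quoted map-based helpers can be exercised on a stack of random matrices along these lines (a minimal sketch; the shapes follow the (M, M, N) convention used in the post, and inv/eig refer to the functions quoted above):

    import numpy as np

    M, N = 3, 1000
    A = np.random.random((M, M, N))

    Ainv = inv(A)          # map-based inverse, shape (M, M, N)
    vals, vecs = eig(A)    # eigenvalues (M, N) and eigenvectors (M, M, N)

    # sanity check on one sub-matrix
    i = 0
    assert np.allclose(np.dot(A[..., i], Ainv[..., i]), np.eye(M))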
Re: [Numpy-discussion] Efficient way to load a 1Gb file?
In article ca+rwobwjyy_abjijnxepkseraeom608uimywffgag-6xdgs...@mail.gmail.com, Torgil Svensson torgil.svens...@gmail.com wrote: Try the fromiter function; that will allow you to pass an iterator which can read the file line by line and not preload the whole file.

    import itertools
    import numpy as np

    file_iterator = iter(open('filename.txt'))
    line_parser = lambda x: map(float, x.split('\t'))
    a = np.fromiter(itertools.imap(line_parser, file_iterator), dtype=float)

You also have the option to iterate the file twice and pass the count argument.

Thanks. That sounds great!

-- Russell
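A sketch of the two-pass variant mentioned above, pre-counting lines so that fromiter can preallocate the output via its count argument (this assumes, purely for illustration, a file with one float per line):

    import numpy as np

    # First pass: count the lines without keeping them in memory.
    with open('filename.txt') as f:
        n_lines = sum(1 for _ in f)

    # Second pass: fromiter allocates the output array once, up front.
    with open('filename.txt') as f:
        a = np.fromiter((float(line) for line in f), dtype=float, count=n_lines)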
[Numpy-discussion] Segfault for np.lookfor
Hi, On current trunk, all tests pass but running the (forgive my language) doctests, I found this:

    In [1]: import numpy as np
    In [2]: np.__version__
    Out[2]: '2.0.0.dev-730b861'
    In [3]: np.lookfor('cos')
    Segmentation fault

on: Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux Ubuntu Natty Python 2.7.1+ Best, Matthew
Re: [Numpy-discussion] numpydoc - latex longtables error
Hi,

On Wed, Aug 10, 2011 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Wed, Aug 10, 2011 at 5:03 PM, josef.p...@gmail.com wrote: On Wed, Aug 10, 2011 at 6:17 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Wed, Aug 10, 2011 at 12:38 PM, Skipper Seabold jsseab...@gmail.com wrote: On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I think this one might be for Pauli. I've run into an odd problem that seems to be an interaction of numpydoc and autosummary and large classes. In summary, large classes and numpydoc lead to large tables of class methods, and there seems to be an error in the creation of the large tables in latex. Specifically, if I run 'make latexpdf' with the attached minimal sphinx setup, I get a pdflatex error ending thus:

    ...
    l.118 \begin{longtable}{LL}

and this is because longtable does not accept LL as an argument, but needs '|l|l|' (bar - el - bar - el - bar). I see in sphinx.writers.latex.py, around line 657, that sphinx knows about this in general, and long tables in standard ReST work fine with the el-bar arguments passed to longtable.

    if self.table.colspec:
        self.body.append(self.table.colspec)
    else:
        if self.table.has_problematic:
            colwidth = 0.95 / self.table.colcount
            colspec = ('p{%.3f\\linewidth}|' % colwidth) * \
                      self.table.colcount
            self.body.append('{|' + colspec + '}\n')
        elif self.table.longtable:
            self.body.append('{|' + ('l|' * self.table.colcount) + '}\n')
        else:
            self.body.append('{|' + ('L|' * self.table.colcount) + '}\n')

However, using numpydoc and autosummary (see the conf.py file), what seems to happen is that, when we reach the self.table.colspec test at the beginning of the snippet above, 'self.table.colspec' is defined:

    In [1]: self.table.colspec
    Out[1]: '{LL}\n'

and thus the LL gets written as the arg to longtable:

    \begin{longtable}{LL}

and the pdf build breaks. I'm using the numpydoc out of the current numpy source tree. At that point I wasn't sure how to proceed with debugging. Can you give any hints?

It's not a proper fix, but our workaround is to edit the Makefile for latex (and latexpdf) https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94 https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121 to call the script to replace the longtable arguments https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py The workaround itself probably isn't optimal, and I'd be happy to hear of a proper fix.

Thanks - yes - I found your workaround in my explorations, and I put a version in our tree too: https://github.com/matthew-brett/nipy/blob/latex_build_fixes/tools/fix_longtable.py - but I agree it seems much better to get to the root cause.

When I tried to figure this out, I never found out why the correct sphinx longtable code path never gets reached, or which code (numpydoc, autosummary or sphinx) is filling in the colspec.

No - it looked hard to debug. I established that it required numpydoc and autosummary to be enabled.

It looks like this conversation dried up, so I've moved it to a ticket: http://projects.scipy.org/numpy/ticket/1935

Best, Matthew
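As a rough idea of what such a post-processing workaround does (a generic sketch, not the statsmodels or nipy script; the function name and invocation are hypothetical), it simply rewrites the bad column spec in the generated .tex before pdflatex runs:

    import re
    import sys

    def fix_longtables(path):
        # Replace e.g. \begin{longtable}{LL} with \begin{longtable}{|l|l|},
        # which is a column spec the longtable environment actually accepts.
        with open(path) as f:
            tex = f.read()
        def repl(match):
            return '\\begin{longtable}{|%s}' % ('l|' * len(match.group(1)))
        tex = re.sub(r'\\begin\{longtable\}\{(L+)\}', repl, tex)
        with open(path, 'w') as f:
            f.write(tex)

    if __name__ == '__main__':
        fix_longtables(sys.argv[1])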
Re: [Numpy-discussion] Segfault for np.lookfor
On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On current trunk, all tests pass but running the (forgive my language) doctests, I found this: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '2.0.0.dev-730b861' In [3]: np.lookfor('cos') Segmentation fault on: Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux Ubuntu Natty Python 2.7.1+ The problem is somewhere in print_coercion_tables.py Chuck
Re: [Numpy-discussion] Segfault for np.lookfor
On Mon, Aug 15, 2011 at 6:56 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On current trunk, all tests pass but running the (forgive my language) doctests, I found this: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '2.0.0.dev-730b861' In [3]: np.lookfor('cos') Segmentation fault on: Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux Ubuntu Natty Python 2.7.1+ The problem is somewhere in print_coercion_tables.py

Or more precisely, it is triggered by importing print_coercion_tables.py. I don't think lookfor should be doing that, but in any case:

    array + scalar
    + ? b h i l q p B H I L Q P e f d g F D G S U V O M m
    ? ? b h i l l l B H I L L L e f d g F D G O O # O ! m
    b b b b b b b b b b b b b b e f d g F D G O O # O ! m
    h h h h h h h h h h h h h h f f d g F D G O O # O ! m
    i i i i i i i i i i i i i i d d d g D D G O O # O ! m
    l l l l l l l l l l l l l l d d d g D D G O O # O ! m
    q l l l l l l l l l l l l l d d d g D D G O O # O ! m
    p l l l l l l l l l l l l l d d d g D D G O O # O ! m
    B B B B B B B B B B B B B B e f d g F D G O O # O ! m
    H H H H H H H H H H H H H H f f d g F D G O O # O ! m
    I I I I I I I I I I I I I I d d d g D D G O O # O ! m
    L L L L L L L L L L L L L L d d d g D D G O O # O ! m
    Q L L L L L L L L L L L L L d d d g D D G O O # O ! m
    P L L L L L L L L L L L L L d d d g D D G O O # O ! m
    e e e e e e e e e e e e e e e e e e F F F O O # O ! #
    f f f f f f f f f f f f f f f f f f F F F O O # O ! #
    d d d d d d d d d d d d d d d d d d D D D O O # O ! #
    g g g g g g g g g g g g g g g g g g G G G O O # O ! #
    F F F F F F F F F F F F F F F F F F F F F O O # O ! #
    D D D D D D D D D D D D D D D D D D D D D O O # O ! #
    G G G G G G G G G G G G G G G G G G G G G O O # O ! #
    S O O O O O O O O O O O O O O O O O O O O O O # O ! O
    U O O O O O O O O O O O O O O O O O O O O O O # O ! O
    Segmentation fault (core dumped)

Chuck
Re: [Numpy-discussion] Segfault for np.lookfor
On Mon, Aug 15, 2011 at 7:09 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Aug 15, 2011 at 6:56 PM, Charles R Harris charlesr.har...@gmail.com wrote: [earlier report and coercion table output quoted above snipped]

A quick fix is to put the print statements in a function.
    diff --git a/numpy/testing/print_coercion_tables.py b/numpy/testing/print_coercion_tables.py
    index d875449..3bc9253 100755
    --- a/numpy/testing/print_coercion_tables.py
    +++ b/numpy/testing/print_coercion_tables.py
    @@ -65,22 +65,23 @@ def print_coercion_table(ntypes, inputfirstvalue, inputsecondvalue, fir
                 print char,
             print

    -print "can cast"
    -print_cancast_table(np.typecodes['All'])
    -print
    -print "In these tables, ValueError is '!', OverflowError is '@', TypeError is '#'"
    -print
    -print "scalar + scalar"
    -print_coercion_table(np.typecodes['All'], 0, 0, False)
    -print
    -print "scalar + neg scalar"
    -print_coercion_table(np.typecodes['All'], 0, -1, False)
    -print
    -print "array + scalar"
    -print_coercion_table(np.typecodes['All'], 0, 0, True)
    -print
    -print "array + neg scalar"
    -print_coercion_table(np.typecodes['All'], 0, -1, True)
    -print
    -print "promote_types"
    -print_coercion_table(np.typecodes['All'], 0, 0, False, True)
    +def printem():
    +    print "can cast"
    +    print_cancast_table(np.typecodes['All'])
    +    print
    +    print "In these tables, ValueError is '!', OverflowError is '@', TypeError is '#'"
    +    print
    +    print "scalar + scalar"
    +    print_coercion_table(np.typecodes['All'], 0, 0, False)
    +    print
    +    print "scalar + neg scalar"
    +    print_coercion_table(np.typecodes['All'], 0, -1, False)
    +    print
    +    print "array + scalar"
    +    print_coercion_table(np.typecodes['All'], 0, 0, True)
    +    print
    +    print "array + neg scalar"
    +    print_coercion_table(np.typecodes['All'], 0, -1, True)
    +    print
    +    print "promote_types"
    +    print_coercion_table(np.typecodes['All'], 0, 0, False, True)

Chuck
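With a change along those lines applied, the tables would only be printed on demand, e.g. (assuming the patched module and the printem() name from the diff):

    from numpy.testing.print_coercion_tables import printem
    printem()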
[Numpy-discussion] As any array, really any array
Hello all, I often find myself writing the following code:

    try:
        features = np.asanyarray(features)
    except:
        features = np.asanyarray(features, dtype=object)

I basically want to be able to use fancy indexing on features and, in most cases, it will be a numpy floating point array. Otherwise, default to having it be an array of dtype=object. Is there a more elegant way to do it with numpy? Thank you, -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org