On Feb 14, 3:50 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> The main thing killing most of the search apps that I'm involved with
> is disk latency.  If Aaron is listening, I might suggest offering a
> config option to redundantly record the stored search fields with
> every search term in the index.

I'm not sure what you mean, but if I understand correctly, I think
Nucular already does this.  The signatures of the primary indices are:

Description: DocumentId x AttributeIndex x FullValue
    "Given a document Id find attributes and their values"

AttributeIndex: AttributeIndex x TruncatedValue x DocumentId
    "Given an attribute and a value (prefix) find document Id's"

AttributeWord: AttributeIndex x Word x DocumentId
    "Given an attribute and a word find documents containing
    that word in that attribute"

WordIndex: Word x DocumentId
    "Given a word find documents containing that word anywhere"

There are a lot of other possibilities which could be added fairly
easily (and I'd like to work out an abstraction layer to make it even
easier -- so you don't need to directly modify the library code).
For instance, you might want to make proximity searching faster by
indexing the words in a document together with their locations.
Currently, proximity searches that must filter thousands of documents
containing all the relevant words are noticeably slower than other
queries.

It's a hard problem: every additional index and index column makes
some queries faster, but it may make other queries slower, and it
always makes index builds and index files more expensive.

It has also occurred to me that the underlying index implementations
and related data structures may be of interest to Python programmers
for all sorts of other purposes too.

As for how Nucular compares to Sphinx or anything else: I don't know,
and I'm not the right person to evaluate that.  I'd encourage people
to try out Nucular and see if it is easy enough to use, fast enough,
and feature-rich enough for the intended use.  If it isn't, maybe you
should find something else.  Suggestions and criticism are always
welcome.

   -- Aaron Watters

===
"Visit New Jersey: It's not as bad as you think!"
   -- suggested New Jersey tourism slogan
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=frighten+away+evil+spirits
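
P.S. For anyone trying to picture the four index signatures listed
above, here is a rough in-memory sketch using plain Python dicts.
To be clear, this is NOT Nucular's API or its on-disk format: the
class ToyIndices, its methods, the PREFIX_LEN value, and the extra
"positions" mapping are all made up for illustration.  The positions
mapping is only one guess at what "indexing words with their
locations" for proximity search might look like.

```python
from collections import defaultdict

# Hypothetical illustration only -- not Nucular's API or storage format.
PREFIX_LEN = 8  # assumed truncation length for TruncatedValue


class ToyIndices:
    def __init__(self):
        # Description: DocumentId x AttributeIndex x FullValue
        self.description = defaultdict(dict)
        # AttributeIndex: AttributeIndex x TruncatedValue x DocumentId
        self.attribute_index = defaultdict(set)
        # AttributeWord: AttributeIndex x Word x DocumentId
        self.attribute_word = defaultdict(set)
        # WordIndex: Word x DocumentId
        self.word_index = defaultdict(set)
        # Assumed extra index: (Word, DocumentId) -> word positions
        self.positions = defaultdict(list)

    def add(self, doc_id, attributes):
        pos = 0  # positions run across the whole document, in attribute order
        for attr, value in attributes.items():
            self.description[doc_id][attr] = value
            truncated = value[:PREFIX_LEN].lower()
            self.attribute_index[(attr, truncated)].add(doc_id)
            for word in value.lower().split():
                self.attribute_word[(attr, word)].add(doc_id)
                self.word_index[word].add(doc_id)
                self.positions[(word, doc_id)].append(pos)
                pos += 1

    def by_prefix(self, attr, prefix):
        """Document ids whose value for attr starts with prefix.

        Only the first PREFIX_LEN characters are indexed, so longer
        prefixes are cut down before comparing.  A real index would keep
        keys sorted instead of scanning every key like this.
        """
        prefix = prefix[:PREFIX_LEN].lower()
        hits = set()
        for (a, truncated), doc_ids in self.attribute_index.items():
            if a == attr and truncated.startswith(prefix):
                hits |= doc_ids
        return hits

    def near(self, word_a, word_b, max_gap):
        """Document ids where the two words occur within max_gap positions."""
        docs_a = self.word_index.get(word_a, set())
        docs_b = self.word_index.get(word_b, set())
        hits = set()
        for doc_id in docs_a & docs_b:
            pos_a = self.positions[(word_a, doc_id)]
            pos_b = self.positions[(word_b, doc_id)]
            if any(abs(a - b) <= max_gap for a in pos_a for b in pos_b):
                hits.add(doc_id)
        return hits
```

The tradeoff mentioned above shows up even in this toy: every call to
add() now writes to five mappings instead of four, so builds cost more,
but near() only has to compare stored positions instead of rescanning
the text of every candidate document.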