Re: Text Search using Python
There is a Django abstraction to whoosh: http://haystacksearch.org/

On Thu, Jan 14, 2010 at 12:36 PM, Mark (Nosrednakram) wrote:
> Hello,
>
> I've been using whoosh and find it very easy to implement. Whoosh has
> good documentation on static file indexing and django related
> documentation is available.
>
> Whoosh Specific
> http://whoosh.ca/
> http://files.whoosh.ca/whoosh/docs/latest/index.html
>
> Django Related
> http://www.arnebrodowski.de/blog/add-full-text-search-to-your-django-project-with-whoosh.html
> http://projects.django-development.com/trac/dd_devel/wiki/dd_search_wiki
>
> The last link is some code I threw together that uses multiple indexes
> so you could set one up for static files as well as for django based
> if you want/need both. I would probably write a manager for indexing
> static media that would take an OS path and base URL to prepend to it
> and run it when needed. If you want to try whoosh feel free to ask me
> questions.
>
> Mark
>
> On Jan 14, 4:04 am, Amit Sethi wrote:
>> Hi , I have a project with a few static html pages , I wish to search these
>> static html using a django search app . Most of the tutorials I saw are
>> focused on searching django models but none I could see concentrates on
>> indexing static html pages . Can some one guide be to a library /tutorial
>> etc for searching and indexing static pages in a project .
>>
>> --
>> A-M-I-T S|S

--
:wq

Regards
__
Gabriel Falcão

Jabber: gabrielfal...@jabber-br.org
Blog: http://www.nacaolivre.org

-- You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: Text Search using Python
I've been working on two sites that use full-text search on a wiki-like system where users edit content with a WYSIWYG/HTML editor. This obviously doesn't apply directly to flatpages, but the problem and solution might be of help.

The problem is that if you try to index and search raw HTML, you get a lot of terrible results. For example, if you searched for the word "class", virtually every document would come up as a result because of the class="" attribute in the markup.

The solution we are using is an offshoot of the wiki app from Pinax. We use Google's diff_match_patch library a lot. Here is the basic rundown:

1. Get the HTML content from a form submitted by the user.
2. Use Django's striptags filter (from defaultfilters) to strip the HTML tags.
3. Use diff_match_patch to create a patch between the plain text and the HTML.
4. Save the plain text as the content on the document model.
5. Save a text version of the patch on the document model.
6. Index the plain text; when you search for strings, this is what the search is performed on.

An instance method that accepts no parameters (so it can be used in templates) recreates the HTML from the patch and the plain text, and that is what is displayed when a user wants to view the page. So far we haven't seen any difference in performance when rendering pages.

I'm not certain that flatpages would allow you to do this, and the solution is probably a bit more complicated than what one would want for just indexing static pages. I may be wrong, but I don't think you will be able to index full HTML content and search only the plain text without doing some kind of conversion of the HTML first.

This does, however, depend on how you set up your templates for flatpages. I have had flatpages that extend a base template and just render plain text into a generic container. If you were to do that, you could index the flatpage content directly, as it would only be plain text. This could be done fairly easily with Django's ORM.

end ramble.
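For anyone who wants to try the idea, here is a minimal sketch of the round trip in the rundown above (strip the tags, record a patch, rebuild the HTML later). It is not the code from those sites and it makes some loud assumptions: the standard library's html.parser stands in for Django's striptags, difflib stands in for diff_match_patch, and the names strip_tags, make_patch, and apply_patch are hypothetical.

```python
import difflib
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Rough stand-in for Django's striptags filter (assumption)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def make_patch(plain, html):
    """Serialize the edits that turn `plain` back into `html`.

    difflib is used here instead of diff_match_patch (assumption).
    Each entry is [start, end, replacement] against `plain`.
    """
    matcher = difflib.SequenceMatcher(None, plain, html, autojunk=False)
    ops = [
        [i1, i2, html[j1:j2]]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return json.dumps(ops)


def apply_patch(plain, patch_text):
    """Rebuild the HTML from the stored plain text and patch."""
    out = []
    pos = 0
    for start, end, replacement in json.loads(patch_text):
        out.append(plain[pos:start])  # unchanged text before this edit
        out.append(replacement)       # markup that was stripped out
        pos = end
    out.append(plain[pos:])
    return "".join(out)


html = '<p class="intro">Hello <b>world</b></p>'
plain = strip_tags(html)          # this is what would be indexed
patch = make_patch(plain, html)   # this would be saved next to it
restored = apply_patch(plain, patch)
```

On a model, `plain` and `patch` would be two text fields, and a no-argument method would return `apply_patch(self.content, self.patch)` so templates can call it. A search for "class" no longer matches this page, because only `plain` is indexed.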
On Jan 14, 5:04 am, Amit Sethi wrote:
> Hi , I have a project with a few static html pages , I wish to search these
> static html using a django search app . Most of the tutorials I saw are
> focused on searching django models but none I could see concentrates on
> indexing static html pages . Can some one guide be to a library /tutorial
> etc for searching and indexing static pages in a project .
>
> --
> A-M-I-T S|S
Re: Text Search using Python
Hello,

I've been using whoosh and find it very easy to implement. Whoosh has good documentation on static file indexing and django related documentation is available.

Whoosh Specific
http://whoosh.ca/
http://files.whoosh.ca/whoosh/docs/latest/index.html

Django Related
http://www.arnebrodowski.de/blog/add-full-text-search-to-your-django-project-with-whoosh.html
http://projects.django-development.com/trac/dd_devel/wiki/dd_search_wiki

The last link is some code I threw together that uses multiple indexes so you could set one up for static files as well as for django based if you want/need both. I would probably write a manager for indexing static media that would take an OS path and base URL to prepend to it and run it when needed. If you want to try whoosh feel free to ask me questions.

Mark

On Jan 14, 4:04 am, Amit Sethi wrote:
> Hi , I have a project with a few static html pages , I wish to search these
> static html using a django search app . Most of the tutorials I saw are
> focused on searching django models but none I could see concentrates on
> indexing static html pages . Can some one guide be to a library /tutorial
> etc for searching and indexing static pages in a project .
>
> --
> A-M-I-T S|S
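The "OS path plus base URL" idea from the message above could look roughly like the sketch below. It is only the file-walking half: it yields (url, text) pairs and leaves the actual indexing call to whatever backend is in use. The function name iter_static_pages, the example URL, and the choice to skip script and style content are all assumptions, not part of the linked code.

```python
import os
from html.parser import HTMLParser
from urllib.parse import quote


class _TextOnly(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def iter_static_pages(root_path, base_url):
    """Yield (url, text) for every .html/.htm file under root_path.

    The URL is base_url plus the file's path relative to root_path.
    """
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root_path).replace(os.sep, "/")
            with open(full_path, encoding="utf-8", errors="replace") as fh:
                parser = _TextOnly()
                parser.feed(fh.read())
                parser.close()
            # Collapse runs of whitespace so the indexed text is compact.
            text = " ".join("".join(parser.chunks).split())
            yield base_url.rstrip("/") + "/" + quote(rel_path), text
```

Usage would be something like `for url, text in iter_static_pages("/srv/site/static", "http://example.com/static/")`, passing each pair to the indexer and rerunning it whenever the static files change.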
Re: Text Search using Python
On Jan 14, 2010, at 6:04 AM, Amit Sethi wrote:
> Hi , I have a project with a few static html pages , I wish to search these
> static html using a django search app . Most of the tutorials I saw are
> focused on searching django models but none I could see concentrates on
> indexing static html pages . Can some one guide be to a library /tutorial etc
> for searching and indexing static pages in a project .
>
> --
> A-M-I-T S|S

A possibility (I don't know how feasible this is) would be to import the content of those HTML files into Django Flatpages. If the projects you discovered just require a Django model of some kind, you should be good to go.

Otherwise, it sounds like, as in your subject line, you're just going to be using Python's string methods to search plain text.

If the same static documents are going to be searched over time, then you could cache their contents in a database. Maybe then you can apply some of the clever ideas people have put into Django-specific search tools, but you might have to implement some of them in your own search code, rather than using a Django-specific project as-is.

Shawn
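As a rough illustration of the "cache the contents in a database, then search with string methods" route, here is a small sketch. Everything in it is an assumption made for the example: the table name page, the functions build_cache and search, and the use of an in-memory SQLite database instead of a Django model.

```python
import sqlite3


def build_cache(pages):
    """Store (url, text) pairs in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (url TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany("INSERT INTO page (url, body) VALUES (?, ?)", pages)
    conn.commit()
    return conn


def search(conn, term):
    """Return URLs whose cached text contains `term`, ignoring case."""
    needle = term.casefold()
    hits = []
    for url, body in conn.execute("SELECT url, body FROM page ORDER BY url"):
        # Plain substring test with Python's string methods.
        if needle in body.casefold():
            hits.append(url)
    return hits


conn = build_cache([
    ("/about.html", "About our team"),
    ("/faq.html", "Frequently asked questions about search"),
])
matches = search(conn, "ABOUT")
```

This scans every row on each query, which is probably fine for a few static pages; ranking, stemming, and the like are the parts you would have to write yourself or get from a real search library.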