Re: Text Search using Python

2010-01-14 Thread Gabriel Falcão
There is a Django abstraction layer for Whoosh:

http://haystacksearch.org/

On Thu, Jan 14, 2010 at 12:36 PM, Mark (Nosrednakram) wrote:
> Hello,
>
> I've been using Whoosh and find it very easy to implement. Whoosh has
> good documentation on static file indexing, and Django-related
> documentation is available.
>
> Whoosh Specific
>  http://whoosh.ca/
>  http://files.whoosh.ca/whoosh/docs/latest/index.html
>
> Django Related
>  http://www.arnebrodowski.de/blog/add-full-text-search-to-your-django-project-with-whoosh.html
>  http://projects.django-development.com/trac/dd_devel/wiki/dd_search_wiki
>
> The last link is some code I threw together that uses multiple indexes,
> so you could set one up for static files as well as one for Django-based
> content if you want/need both. I would probably write a manager for indexing
> static media that would take an OS path and a base URL to prepend to it,
> and run it when needed. If you want to try Whoosh, feel free to ask me
> questions.
>
> Mark
>
> On Jan 14, 4:04 am, Amit Sethi wrote:
>> Hi, I have a project with a few static HTML pages, and I wish to search these
>> static HTML pages using a Django search app. Most of the tutorials I saw are
>> focused on searching Django models, but none that I could see concentrates on
>> indexing static HTML pages. Can someone guide me to a library/tutorial
>> etc. for searching and indexing static pages in a project?
>>
>> --
>> A-M-I-T S|S
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Django users" group.
> To post to this group, send email to django-us...@googlegroups.com.
> To unsubscribe from this group, send email to 
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/django-users?hl=en.



-- 
:wq

Regards
__
Gabriel Falcão

Jabber: gabrielfal...@jabber-br.org
Blog: http://www.nacaolivre.org




Re: Text Search using Python

2010-01-14 Thread esatterwh...@wi.rr.com
I've been working on 2 sites that use full text search on a wiki-like
system where users use a WYSIWYG/HTML editor. This, obviously, doesn't
apply to flatpages, but the problem/solution might be of help.

The problem is, if you try to index/search HTML, you get a lot of terrible
results. For example, if you were searching for the word class,
virtually every document would come up as a result because of the HTML
class="" bit.
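
To illustrate the problem, here is a minimal sketch using only the Python
standard library. The two sample documents and the function names
(search_raw, search_text) are made up for this example, and the regex tag
stripper is a crude stand-in for a real HTML-to-text step.

```python
import re

# Hypothetical sample documents: only "courses" mentions the word "class"
# in its visible text, but both carry a class="" attribute in the markup.
documents = {
    "about": '<div class="page"><p>About our school</p></div>',
    "courses": '<div class="page"><p>Every class starts in May</p></div>',
}

TAG_RE = re.compile(r"<[^>]*>")


def search_raw(term):
    """Substring search over the raw HTML, markup included."""
    return sorted(name for name, html in documents.items() if term in html)


def search_text(term):
    """Substring search over the text left after removing the tags."""
    return sorted(
        name for name, html in documents.items()
        if term in TAG_RE.sub(" ", html)
    )


print(search_raw("class"))   # matches both documents
print(search_text("class"))  # matches only "courses"
```

Searching the raw markup matches every page that uses a class attribute,
while searching the stripped text matches only the page that really
mentions the word.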

The solution we are using was an offshoot of the wiki-app from Pinax.

We use Google's diff_match_patch library a lot. Here is the basic
rundown:

1. Get HTML content from a form submitted by the user.
2. Use Django's striptags default filter to strip the HTML tags.
3. Use diff_match_patch to create a patch between the plain text and
the HTML.
4. Save the plain text as the content on the document model.
5. Save a text version of the patch on the document model.
6. Index the plain text. When you search for strings, this will be what
the search is performed on.

An instance method that accepts no parameters (so it can be used in
templates) is used to recreate the HTML from the patch and plain text,
and that is displayed when a user wants to view the page. As of yet,
we haven't seen a difference in performance when rendering pages.
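
The strip/patch/rebuild round trip described above can be sketched roughly
as follows. This is only an illustration under a few assumptions: it uses
the standard library's difflib in place of diff_match_patch, a simple regex
in place of Django's striptags, and JSON as the text form of the patch. The
function names (strip_tags, make_patch, rebuild_html) are hypothetical and
are not the code described in this message.

```python
import json
import re
from difflib import SequenceMatcher

TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html):
    """Crude stand-in for Django's striptags filter."""
    return TAG_RE.sub("", html)


def make_patch(plain, html):
    """Return a text patch that turns `plain` back into `html`.

    Only the non-matching spans are stored, as
    [start_in_plain, end_in_plain, replacement_from_html].
    """
    matcher = SequenceMatcher(None, plain, html, autojunk=False)
    ops = [
        [i1, i2, html[j1:j2]]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return json.dumps(ops)


def rebuild_html(plain, patch_text):
    """Recreate the HTML from the stored plain text and patch."""
    out = []
    pos = 0
    for start, end, replacement in json.loads(patch_text):
        out.append(plain[pos:start])  # unchanged text before this span
        out.append(replacement)       # markup (or other text) from the HTML
        pos = end
    out.append(plain[pos:])           # unchanged tail
    return "".join(out)


html = '<p class="intro">Hello <b>world</b></p>'
plain = strip_tags(html)            # this is what would be indexed
patch = make_patch(plain, html)     # this would be saved next to it
print(plain)
print(rebuild_html(plain, patch) == html)
```

On a model, the plain text and the patch would be two text fields, and the
no-argument instance method would simply call something like rebuild_html
on them.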

While I'm not certain, I don't think flatpages would allow you
to do this. And the solution is probably a bit more complicated than
what one would want to do for just indexing static pages. I may be
wrong, but I don't think you will be able to index full HTML content and
only search the plain text without doing some kind of conversion of the
HTML first.

This does, however, depend on how you set up your templates for
flatpages. I have had flatpages that extend a base template and just
render out plain text into a generic container. If you were to do
that, then you could index the flatpage content as it would only be
plain text. This could be done fairly easily with Django's ORM.

end ramble.

On Jan 14, 5:04 am, Amit Sethi wrote:
> Hi, I have a project with a few static HTML pages, and I wish to search these
> static HTML pages using a Django search app. Most of the tutorials I saw are
> focused on searching Django models, but none that I could see concentrates on
> indexing static HTML pages. Can someone guide me to a library/tutorial
> etc. for searching and indexing static pages in a project?
>
> --
> A-M-I-T S|S




Re: Text Search using Python

2010-01-14 Thread Mark (Nosrednakram)
Hello,

I've been using Whoosh and find it very easy to implement. Whoosh has
good documentation on static file indexing, and Django-related
documentation is available.

Whoosh Specific
  http://whoosh.ca/
  http://files.whoosh.ca/whoosh/docs/latest/index.html

Django Related
  http://www.arnebrodowski.de/blog/add-full-text-search-to-your-django-project-with-whoosh.html
  http://projects.django-development.com/trac/dd_devel/wiki/dd_search_wiki

The last link is some code I threw together that uses multiple indexes,
so you could set one up for static files as well as one for Django-based
content if you want/need both. I would probably write a manager for indexing
static media that would take an OS path and a base URL to prepend to it,
and run it when needed. If you want to try Whoosh, feel free to ask me
questions.
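
A rough sketch of what such an indexing helper could look like, taking an
OS path and a base URL. To keep it self-contained it uses only the standard
library and a plain in-memory inverted index as a stand-in for a Whoosh
index; the names (index_static_pages, search, html_to_text) are
hypothetical and are not taken from the code linked above.

```python
import os
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, so tag names and attributes are never indexed.

    Note: text inside <script> and <style> elements is collected too.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def index_static_pages(root_path, base_url):
    """Walk root_path and map each lowercased word to the URLs containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            if not name.endswith((".html", ".htm")):
                continue
            full_path = os.path.join(dirpath, name)
            # Prepend the base URL to the path relative to the root.
            rel = os.path.relpath(full_path, root_path).replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + rel
            with open(full_path, encoding="utf-8") as fh:
                text = html_to_text(fh.read())
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(url)
    return index


def search(index, term):
    """Return the sorted URLs of pages whose text contains the exact word."""
    return sorted(index.get(term.lower(), ()))
```

With a real search library, the loop body would add a document (URL plus
extracted text) to the index writer instead of filling a dictionary; the
path-to-URL mapping stays the same.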

Mark

On Jan 14, 4:04 am, Amit Sethi wrote:
> Hi, I have a project with a few static HTML pages, and I wish to search these
> static HTML pages using a Django search app. Most of the tutorials I saw are
> focused on searching Django models, but none that I could see concentrates on
> indexing static HTML pages. Can someone guide me to a library/tutorial
> etc. for searching and indexing static pages in a project?
>
> --
> A-M-I-T S|S




Re: Text Search using Python

2010-01-14 Thread Shawn Milochik

On Jan 14, 2010, at 6:04 AM, Amit Sethi wrote:

> Hi, I have a project with a few static HTML pages, and I wish to search these
> static HTML pages using a Django search app. Most of the tutorials I saw are
> focused on searching Django models, but none that I could see concentrates on
> indexing static HTML pages. Can someone guide me to a library/tutorial etc.
> for searching and indexing static pages in a project?
>
> --
> A-M-I-T S|S


A possibility (I don't know how feasible this is) would be to import the 
content of those HTML files into Django Flatpages. If the projects you 
discovered just require a Django model of some kind, you should be good to go.

Otherwise, it sounds like, as in your subject line, you're just going to be 
using Python's string methods to search plain text. If the same static 
documents are going to be searched over time, then you could cache their 
contents in a database. Maybe then you can apply some of the clever ideas 
people have put into Django-specific search tools, but you might have to 
implement some of them in your own search code, rather than using a 
Django-specific project as-is.
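
As a rough illustration of the second option, here is a minimal sketch that
caches page text in a database and searches it with Python's string
methods. It assumes the pages have already been reduced to plain text, uses
an in-memory SQLite database, and the table and function names (page,
build_cache, search) are invented for the example.

```python
import sqlite3


def build_cache(pages):
    """Store {url: plain_text} in a small SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (url TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO page (url, body) VALUES (?, ?)", pages.items()
    )
    conn.commit()
    return conn


def search(conn, term):
    """Case-insensitive substring search over the cached page text."""
    needle = term.lower()
    rows = conn.execute("SELECT url, body FROM page ORDER BY url")
    return [url for url, body in rows if needle in body.lower()]


conn = build_cache({
    "/about.html": "Full text search with Django",
    "/contact.html": "Contact us by email",
})
print(search(conn, "DJANGO"))
```

This scans every row on each query, which is fine for a few pages; a larger
set of documents is where a real index starts to pay off.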

Shawn


