Hello sir,
Thank you for the quick reply. I want to integrate this functionality with
web2py, so I need to stick with Python and PyLucene. So the method you are
describing is: extract the text from all the documents using different
Python libraries, then index that text, and then search it for the
string. Is this correct?
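
To check my understanding, here is a rough sketch of those three steps in plain Python. It is only an illustration: `extract_text`, `ToyIndex` and the sample file names are hypothetical, and the toy in-memory index is a stand-in for PyLucene, not its API. A real extractor would also have to call a PDF or DOC library, which is not shown.

```python
import re
from collections import defaultdict


def extract_text(filename, raw_bytes):
    """Hypothetical extractor. A real one would pick a PDF/DOC library
    based on the file type; this one only decodes plain text."""
    return raw_bytes.decode("utf-8", errors="ignore")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex(object):
    """Stand-in for a Lucene index (assumption, not the PyLucene API):
    maps each term to the set of documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Step 1 and 2: extract the text of each uploaded file and index it.
uploads = {
    "notes.txt": b"PyLucene wraps the Java Lucene library",
    "todo.txt": b"Integrate search with web2py",
}
index = ToyIndex()
for name, data in uploads.items():
    index.add_document(name, extract_text(name, data))

# Step 3: search the index for a string.
print(index.search("lucene"))
```

With PyLucene, the index and search steps would be replaced by its own writer and searcher classes; only the extract-then-index-then-search order is the point here.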

I also have another question. Look here -
http://lucene.apache.org/pylucene/features.html in the middle of the page,
it is written that to import any class, I need to write:

>> from lucene import StandardAnalyzer

but the sample here does it differently -
http://svn.apache.org/viewvc/lucene/pylucene/trunk/samples/IndexFiles.py?view=markup
where it is written:

>> from org.apache.lucene.analysis.standard import StandardAnalyzer

So, can you explain why there is such a difference?
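
My guess, which I have not confirmed from either page, is that the two forms belong to different PyLucene releases: an older flat layout where everything is imported from the `lucene` module, and a newer layout that mirrors the Java package names. Under that assumption, a small sketch like the one below would try both. The helper name `load_standard_analyzer` and the layout labels are hypothetical.

```python
def load_standard_analyzer():
    """Try both import styles and report which one worked.

    Returns (class_or_None, layout_label). The labels are made up
    for this sketch and are not PyLucene terminology.
    """
    try:
        # Style used in the IndexFiles.py sample: Java package names.
        import lucene  # noqa: F401  (imported first, as the sample does)
        from org.apache.lucene.analysis.standard import StandardAnalyzer
        return StandardAnalyzer, "java-style"
    except ImportError:
        pass
    try:
        # Style shown on the features page: flat `lucene` module.
        from lucene import StandardAnalyzer
        return StandardAnalyzer, "flat"
    except ImportError:
        # PyLucene is not installed, or neither layout is available.
        return None, "unavailable"


analyzer_class, layout = load_standard_analyzer()
# Note: the JVM still has to be started with lucene.initVM()
# before the class is actually used.
print(layout)
```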

Thank you!!

Regards,
Vishrut Mehta




On Tue, Jun 11, 2013 at 3:10 PM, Vinay Modi <vinaym...@gmail.com> wrote:

>  Vishrut,
>
> Have a look at Apache Nutch.
>
> If you want to stick with Lucene, you will need to do a lot of the work
> yourself. Look at the Python libraries for PDF, DOC, etc.; read the text
> from those files using these libraries, and then use Lucene to index and
> search it.
>
> Regards,
>  *Vinay Modi
> **Celebrating 150th Birth Anniversary of Swami Vivekananda *
>  www.vivekananda150jayanti.org
>
>  On 11-06-2013 15:35, Vishrut Mehta wrote:
>
> Hello Everyone,
> I am Vishrut Mehta, currently a third-year student at IIIT
> Hyderabad, India. I have been contributing to open source for two years
> and have contributed to organizations like E-cidadania, Sahana
> Software Foundation, GNOME, etc. I am very interested in search engines and
> search-related libraries.
>
> I need some help from the community. I am currently working
> on a project that deals with the following issue: I need to search for
> text or strings within documents uploaded by the user (like .pdf, .doc,
> etc.). Can anyone help me with this? It would be a great help!
>
> Thank you!
> Regards,
>
>
>


-- 

*Vishrut Mehta*
International Institute of Information Technology,
Gachibowli,Hyderabad-500032
