Lucene is basically a search library, and Solr is a search web
application built on top of Lucene.
So, depending on where you want to set your "starting point", you can
definitely do this with Lucene directly. You might want to consider Solr,
though, because it provides many features out of the box:
https://solr.apache.org/features.html
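To make the difference from grep a bit more concrete: grep re-reads every
file on each search, whereas a search library like Lucene builds an inverted
index once and then answers queries from that index. The following is only a
minimal sketch of that idea in plain Java. It is NOT Lucene's API; the class
name TinyInvertedIndex, its methods, and the sample file names are made up
for illustration, and real Lucene adds analysis, scoring, on-disk segments
and much more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; NOT part of Lucene or Solr.
class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // document id -> document name (the id is the position in this list)
    private final List<String> names = new ArrayList<>();

    // Index step: tokenize the text once and record where each term occurs.
    int add(String name, String text) {
        int id = names.size();
        names.add(name);
        // Lower-case and split on anything that is not a letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Search step: one map lookup; the document text is not read again.
    List<String> search(String term) {
        Set<Integer> ids =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(names.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("index.html", "Welcome to our company website");
        index.add("Search.java", "class Search implements a search form");
        index.add("about.html", "About the company");
        System.out.println(index.search("company")); // [index.html, about.html]
        System.out.println(index.search("SEARCH"));  // [Search.java]
    }
}
```

The point of the sketch is the split into an indexing step and a search
step: you feed documents to the library, from files, a database or a
crawler, rather than pointing it at a directory the way you would with grep.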
Also see
https://cwiki.apache.org/confluence/display/SOLR/FAQ
Regarding crawlers in combination with Solr, see for example
https://cwiki.apache.org/confluence/display/SOLR/SolrEcosystem#SolrEcosystem-CrawlersAndConnectors
or
https://www.octoparse.com/blog/10-best-open-source-web-scraper#
Cheers
Michael
On 05.04.21 at 11:59, Som Lima wrote:
Thank you for your reply.
Yes, I would like to provide a search engine for my company website and at
the same time build a web search engine as a personal project.
On Mon, 5 Apr 2021, 10:57 Michael Wechner, <michael.wech...@wyona.com>
wrote:
Hi
The following FAQ might be a bit outdated, but nevertheless you should
find some answers there as well
https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ
For example, to answer your question 4), see
https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ#LuceneFAQ-CanIuseLucenetocrawlmysiteorothersitesontheInternet?
If I understand your questions correctly, your objective is to provide a
search engine for your company website?
HTH
Michael
On 05.04.21 at 11:34, Som Lima wrote:
Hi,
Before doing a deep dive into Lucene, I would appreciate it if you would
clarify a few things so I know if this is the right project to fulfill my
objective.
1. It is my understanding that Google search is a more elaborate utility,
but not unlike the *nix search utility grep, which searches for a string
pattern recursively in text files, for example .java files or .html files.
The search starts in this case from the current directory.
grep -RiIl 'search'
Quick grep explanation:
-R - recursive search
-i - case-insensitive
-I - skip binary files
-l - print a simple list as output.
2. Further to my understanding, if it is correct, is the objective of Lucene
pretty much the same? Searching for string patterns recursively?
3. If Lucene is a search engine the same as Google or grep, then do I just
point it to my website root directory?
4. Can I use Lucene as a web search engine the same as Google? If so, where
would I point it so that Lucene can recursively search the www
websites?
5. Is the Lucene use case something else entirely?
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org