Hi Venkat,
There is no need to post your question multiple times or cross-post.
People are distributed all around the world on this list and aren't
always available or able to answer your question. Having to wait
11 hours for an answer on a free mailing list is not at all
unreasonable.
If you are just looking to get your hands dirty with Lucene, why not
just start w/ a subset on a machine you already own and work to scale
up? This way, you could start with what you have available and get a
feel for your memory usage, etc. Then you will be in a better
position to decide what your needs are.
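As a rough sketch of that scale-up approach: index a small sample,
measure the resulting index size, and extrapolate linearly. The
function name and the sample numbers below are hypothetical, not
measurements, and the estimate assumes the sample is representative
of the full corpus (index size depends heavily on which fields are
stored and how they are analyzed).

```python
def estimate_full_index_bytes(sample_corpus_bytes, sample_index_bytes,
                              full_corpus_bytes):
    """Linearly extrapolate the full index size from a sample run.

    Assumes the sample is representative of the whole corpus.
    """
    if sample_corpus_bytes <= 0:
        raise ValueError("sample corpus size must be positive")
    # Index-to-corpus ratio observed on the sample.
    ratio = sample_index_bytes / sample_corpus_bytes
    return ratio * full_corpus_bytes


GB = 1024 ** 3
TB = 1024 ** 4

# Hypothetical measurement: a 10 GB sample produced a 3 GB index.
estimate = estimate_full_index_bytes(10 * GB, 3 * GB, 4 * TB)
print(f"Estimated full index size: {estimate / TB:.2f} TB")
```

Re-running this with progressively larger samples shows whether the
ratio holds steady before you commit to a hosting plan.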
If there is one thing that is true about search it is the fact that
everyone's situation is different.
Cheers,
Grant
On Dec 18, 2007, at 11:21 AM, v k wrote:
Hello,
I am using Lucene to build an index from roughly 10 million
documents. The documents are about 4 TB in total.
After some trial runs indexing a subset of the documents, I am trying
to figure out a hosting service configuration to create a full index
from the entire 10 TB of data. As I am still unsure how this project
will turn out, I am not purchasing hardware/RAM but considering a web
host for the following purposes:
1) Downloading the data and starting to index it.
2) Hosting the web front end to this index, which will be a Python
framework (e.g. Django).
I am seriously contemplating signing up with Joyent for this plan:
AMD Opteron x64 multi-core servers with 4GiB RAM per core
1/16 (Burstable up to 95%)
1 TB bandwidth/month, 1 GB RAM, plus as much NAS storage as I can
afford to pay for.
My QUESTION is: will this RAM and CPU be sufficient during
development of the search application and building of the index, or
is it so under-equipped in terms of hardware that the development
version of my application will not work?
I understand that having more RAM is always good, but is 1GB as good
as nothing?
This setup is NOT for production but for development, so I can get
my hands dirty with Lucene, which will require plenty of tweaks as the
project moves along.
What initial configuration would you recommend for a development
version, given the corpus size? I am not even sure how large my index
will be at this point.
I hope to build my indexes this way, and once the search
infrastructure is working and the web front end is complete, I plan to
worry about redundancy, availability and scalability for the many
users I hope to provide this free service for :-)
Many of you in this forum have built successful products with Lucene.
To name a few I am aware of - Ken Krugle, James Ryley, Dennis Kubes
Some of you must have started with small machines, test set-ups, etc.
where you built your initial search apps. I hope to receive some
advice about my plan and approach to start building an infrastructure
to support my Lucene app.
Thank you.
Venkat
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ