Re: Index Sizes

2015-05-21 Thread Shawn Heisey
On 1/7/2014 7:48 AM, Steven Bower wrote:
> I was looking at the code for getIndexSize() on the ReplicationHandler to
> get at the size of the index on disk. From what I can tell, because this
> does directory.listAll() to get all the files in the directory, the size on
> disk includes not only what is searchable at the moment but potentially
> also files that are being created by background merges, etc. I am wondering
> if there is an API that would give me the size of the "currently
> searchable" index files (I doubt this exists, but maybe).
> 
> If not, what is the most appropriate way to get a list of the segments/files
> that are currently in use by the active searcher, such that I could then ask
> the directory implementation for the size of all those files?
> 
> For a more complete picture of what I'm trying to accomplish, I am looking
> at building a quota/monitoring component that will trigger when the index
> size on disk gets above a certain threshold. I don't want it to trigger if
> the index is doing a merge and is ephemerally using disk for that process.
> If anyone has any suggestions/recommendations here, I'd be interested.

Dredging up a VERY old thread here.  As I was replying to your most
recent query, I was looking through my email archive for your previous
messages and this one caught my eye, especially because it never got a
reply.  It must have escaped my notice last year.

This is a very good idea.  I imagine that the active searcher object
directly or indirectly knows exactly which files are in use for that
searcher, so I think it should be relatively easy for it to retrieve a
list, and the index size code should be able to return both the active
index size as well as the total directory size.
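The comparison described here can be sketched with plain java.nio standing in for a real Lucene Directory. The active-file list below is hypothetical; in Lucene it would presumably come from the searcher's DirectoryReader via getIndexCommit().getFileNames(), with the same directory walk giving the total:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class IndexSizeCheck {

    // Sum only the files belonging to the active commit, ignoring
    // merge temporaries that also live in the directory.
    static long activeSize(Path dir, Collection<String> activeFiles) throws IOException {
        long total = 0;
        for (String name : activeFiles) {
            total += Files.size(dir.resolve(name));
        }
        return total;
    }

    // Total on-disk size: every file in the directory, including
    // whatever a background merge is currently writing.
    static long totalSize(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                total += Files.size(p);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        Files.write(dir.resolve("_0.cfs"), new byte[1000]);    // active segment
        Files.write(dir.resolve("segments_1"), new byte[100]); // commit file
        Files.write(dir.resolve("_1.tmp"), new byte[5000]);    // in-flight merge output
        long active = activeSize(dir, Arrays.asList("_0.cfs", "segments_1"));
        long total = totalSize(dir);
        System.out.println("active=" + active + " total=" + total); // active=1100 total=6100
    }
}
```

A quota component built this way would alarm on the active size only, so a merge that temporarily doubles disk usage would not trip it.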

I've been putting a little bit of work in to get the index size code
moved out of the replication handler so that it is available even if
replication is completely disabled, but my free time has been limited.
I don't recall the issue number(s) for that work.

Thanks,
Shawn



Index Sizes

2014-01-07 Thread Steven Bower
I was looking at the code for getIndexSize() on the ReplicationHandler to
get at the size of the index on disk. From what I can tell, because this
does directory.listAll() to get all the files in the directory, the size on
disk includes not only what is searchable at the moment but potentially
also files that are being created by background merges, etc. I am wondering
if there is an API that would give me the size of the "currently
searchable" index files (I doubt this exists, but maybe).

If not, what is the most appropriate way to get a list of the segments/files
that are currently in use by the active searcher, such that I could then ask
the directory implementation for the size of all those files?

For a more complete picture of what I'm trying to accomplish, I am looking
at building a quota/monitoring component that will trigger when the index
size on disk gets above a certain threshold. I don't want it to trigger if
the index is doing a merge and is ephemerally using disk for that process.
If anyone has any suggestions/recommendations here, I'd be interested.

Thanks,

steve


Re: Prediction About Index Sizes of Solr

2013-04-08 Thread Dmitry Kan
Interesting bit, thanks Rafał!



On Mon, Apr 8, 2013 at 12:54 PM, Rafał Kuć  wrote:

> Hello!
>
> Let me answer the first part of your question. Please have a look at
>
> https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
> It should help you make an estimation about your index size.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > This may not be a well-detailed question, but I will try to make it clear.
>
> > I am crawling web pages and will index them in SolrCloud 4.2. What I want
> > to predict is the index size.
>
> > I will have approximately 2 billion web pages, and I expect each of them
> > to be about 100 KB.
> > I know that it depends on whether documents are stored, stop words, etc.
> > If you want to ask about the details of my question, I can give more
> > explanation. However, there should be some analysis to help me, because I
> > need to predict roughly what the index size will be.
>
> > On the other hand, my other important question is how SolrCloud makes
> > replicas of indexes; can I configure how many replicas there will be?
> > Because I should multiply the total index size by the number of replicas.
>
> > Here I found an article related to my analysis:
> > http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/
>
> > I know this question may lack detail, but any ideas about it are welcome.
>
>


Re: Prediction About Index Sizes of Solr

2013-04-08 Thread Rafał Kuć
Hello!

Let me answer the first part of your question. Please have a look at
https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
It should help you make an estimation about your index size. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> This may not be a well-detailed question, but I will try to make it clear.

> I am crawling web pages and will index them in SolrCloud 4.2. What I want
> to predict is the index size.

> I will have approximately 2 billion web pages, and I expect each of them
> to be about 100 KB.
> I know that it depends on whether documents are stored, stop words, etc.
> If you want to ask about the details of my question, I can give more
> explanation. However, there should be some analysis to help me, because I
> need to predict roughly what the index size will be.

> On the other hand, my other important question is how SolrCloud makes
> replicas of indexes; can I configure how many replicas there will be?
> Because I should multiply the total index size by the number of replicas.

> Here I found an article related to my analysis:
> http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

> I know this question may lack detail, but any ideas about it are welcome.



Prediction About Index Sizes of Solr

2013-04-08 Thread Furkan KAMACI
This may not be a well-detailed question, but I will try to make it clear.

I am crawling web pages and will index them in SolrCloud 4.2. What I want
to predict is the index size.

I will have approximately 2 billion web pages, and I expect each of them
to be about 100 KB.
I know that it depends on whether documents are stored, stop words, etc.
If you want to ask about the details of my question, I can give more
explanation. However, there should be some analysis to help me, because I
need to predict roughly what the index size will be.

On the other hand, my other important question is how SolrCloud makes
replicas of indexes; can I configure how many replicas there will be?
Because I should multiply the total index size by the number of replicas.

Here I found an article related to my analysis:
http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

I know this question may lack detail, but any ideas about it are welcome.


RE: Question about index sizes.

2009-06-23 Thread Ensdorf Ken
That's a great question.  And the answer is, of course, it depends.  Mostly on 
the size of the documents you are indexing.  50 million rows from a database 
table with a handful of columns is very different from 50 million web pages,  
pdf documents, books, etc.

We currently have about 50 million documents split across 2 servers with 
reasonable performance - sub-second response time in most cases.  The total 
size of the 2 indices is about 300G.  I'd say most of the size is from stored 
fields, though we index just about everything.  This is on 64-bit ubuntu boxes 
with 32G of memory.  We haven't pushed this into production yet, but initial 
load-testing results look promising.
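The figures above imply an average per-document footprint that can be used to sanity-check one's own capacity planning. A quick sketch using only the numbers in this message (any other corpus will have a different ratio):

```java
public class DocFootprint {

    // Average on-disk index bytes per document, derived from an
    // observed deployment rather than predicted from first principles.
    static long bytesPerDoc(long totalIndexBytes, long docCount) {
        return totalIndexBytes / docCount;
    }

    public static void main(String[] args) {
        long indexBytes = 300L << 30; // ~300 GB across both servers
        long docs = 50_000_000L;      // ~50 million documents
        System.out.println(bytesPerDoc(indexBytes, docs) + " bytes/doc"); // prints 6442 bytes/doc
    }
}
```

Roughly 6.4 KB of index per document here, with most of it in stored fields, per the description above.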

Hope this helps!

> -----Original Message-----
> From: Jim Adams [mailto:jasolru...@gmail.com]
> Sent: Tuesday, June 23, 2009 1:24 PM
> To: solr-user@lucene.apache.org
> Subject: Question about index sizes.
>
> Can anyone give me a rule of thumb for knowing when you need to go to
> multicore or shards?  How many records can be in an index before it
> breaks
> down?  Does it break down?  Is it 10 million? 20 million?  50 million?
>
> Thanks, Jim


Question about index sizes.

2009-06-23 Thread Jim Adams
Can anyone give me a rule of thumb for knowing when you need to go to
multicore or shards?  How many records can be in an index before it breaks
down?  Does it break down?  Is it 10 million? 20 million?  50 million?

Thanks, Jim