Re: FunctionQuery score=0

2011-11-20 Thread John
After playing some more with this I managed to get what I want, almost.

My query now looks like:

q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08
categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '})


With the above query I am getting only the results that I want, the ones
whose score after my FunctionQuery is above 0, but the problem now is that
the final score for all results is changed to 1, which affects the sorting.

How can I keep the original score that is calculated by the edismax query?
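
One thing I may try (a rough, untested sketch reusing the same parameters as
above) is to keep the edismax query as the main q and move the frange into an
fq, so that scoring comes from edismax and the frange only filters:

q={!type=edismax qf='abstract^0.02 title^0.08 categorysearch^0.05' boost='eqsim(alltokens,xyz)' v='+tokens5:xyz'}&fq={!frange l=0 incl=false}query($q)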

Cheers,
John

On Fri, Nov 18, 2011 at 10:50 AM, Andre Bois-Crettez
andre.b...@kelkoo.com wrote:

 Definitely worked for me, with a classic full text search on ipod and
 such.
 Changing the lower bound changed the number of results.

 Follow Chris's advice, and give more details.



 John wrote:

 Doesn't seem to work.
 I thought that filter queries work before the search is performed, not
 after... no?

 Debug doesn't include the filter query, only the below (changed a bit):

 BoostedQuery(boost(+fieldName:**,boostedFunction(ord(fieldName),query)))


 On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez
 andre.b...@kelkoo.com wrote:



 John wrote:



 Some of the results are receiving score=0 in my function and I would
 like
 them not to appear in the search results.




 you can use frange, and filter by score:

 q=ipod&fq={!frange l=0 incl=false}query($q)

 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/








 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/




Re: Performance issues

2011-11-20 Thread Govind @ Gmail
http://www.lucidimagination.com/content/scaling-lucene-and-solr 

Has good guidance.

Regarding question 1: is the issue memory, CPU, query performance, or the indexing process?


On Nov 20, 2011, at 11:39 AM, Lalit Kumar 4 lkum...@sapient.com wrote:

 Hello:
 We have recently seen performance issues with Solr (running on Jetty).
 
 We are looking for help in:
 
 1) How can I benchmark our current implementation?
 2) We are weighing multiple cores against separate instances. What are the pros and cons?
 3) Any pointers for validating that the current configuration is correct?
 
 Sent on my BlackBerry® from Vodafone


Re: Performance issues

2011-11-20 Thread Lalit Kumar 4

A search with a couple of parameters returns about 650 results (out of
roughly 2,500) and takes around 30 seconds.

The schema.xml has more than 100 fields.
 
-Original Message-
From: Govind @ Gmail govind.kan...@gmail.com
Date: Sun, 20 Nov 2011 15:01:04 
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Performance issues

http://www.lucidimagination.com/content/scaling-lucene-and-solr 

Has good guidance.

Regarding question 1: is the issue memory, CPU, query performance, or the indexing process?


On Nov 20, 2011, at 11:39 AM, Lalit Kumar 4 lkum...@sapient.com wrote:

 Hello:
 We have recently seen performance issues with Solr (running on Jetty).
 
 We are looking for help in:
 
 1) How can I benchmark our current implementation?
 2) We are weighing multiple cores against separate instances. What are the pros and cons?
 3) Any pointers for validating that the current configuration is correct?
 
 Sent on my BlackBerry® from Vodafone


Re: Performance issues

2011-11-20 Thread Tor Henning Ueland
On Sun, Nov 20, 2011 at 11:27 AM, Lalit Kumar 4 lkum...@sapient.com wrote:

 A search with a couple of parameters returns about 650 results (out of
 roughly 2,500) and takes around 30 seconds.
 The schema.xml has more than 100 fields.

You have of course started with the basics, like making sure that the
index is smaller than the available RAM on your server? (Or that the
index size per shard is less than the available RAM on each server, if
you are running a multi-server cluster.)

As long as your index is bigger than what can be placed in cache, you
will have a hard time keeping your queries fast no matter what, unless
the search queries are few enough that they always hit Solr's own
query cache.

-- 
Regards
Tor Henning Ueland


how to transform a URL (newbie question)

2011-11-20 Thread Bent Jensen
I am a beginner to Solr and need to ask the following:
using the apache-solr example, how can I display a URL in the XML document
as an active link in HTML? Do I need to add some special transform in
the example.xslt file?

thanks
Ben



Re: how to transform a URL (newbie question)

2011-11-20 Thread Erik Hatcher
Ben, 

Not quite sure how to interpret what you're asking here.  Are you speaking of 
the /browse view?  If so, you can tweak the templates under conf/velocity to 
make links out of things.

But generally, it's the end application that would take the results from Solr 
and render links as appropriate.
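
For example (just a sketch; the field name url and the exact template are
assumptions that depend on your setup), a template such as conf/velocity/hit.vm
could emit a stored field as a link:

  <a href="$doc.getFieldValue('url')">$doc.getFieldValue('url')</a>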

Erik

On Nov 20, 2011, at 11:53, Bent Jensen wrote:

 I am a beginner to Solr and need to ask the following:
 using the apache-solr example, how can I display a URL in the XML document
 as an active link in HTML? Do I need to add some special transform in
 the example.xslt file?
 
 thanks
 Ben
 



RE: how to transform a URL (newbie question)

2011-11-20 Thread Bent Jensen
Erik,
OK, I will look at that. Basically, what I am trying to do is index a
document with lots of URLs. I also index the URL and give it a field type.
I don't know much about Solr yet, but I thought maybe I could transform the
URL into an active link, i.e. an 'a href'. I tried putting the href into the
XML document, but it just prints out as text in HTML. I also could not find
any XSLT transform or schema.

thanks
Ben

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Sunday, November 20, 2011 9:05 AM
To: solr-user@lucene.apache.org
Subject: Re: how to transform a URL (newbie question)

Ben, 

Not quite sure how to interpret what you're asking here.  Are you speaking
of the /browse view?  If so, you can tweak the templates under conf/velocity
to make links out of things.

But generally, it's the end application that would take the results from
Solr and render links as appropriate.

Erik

On Nov 20, 2011, at 11:53, Bent Jensen wrote:

 I am a beginner to Solr and need to ask the following:
 using the apache-solr example, how can I display a URL in the XML document
 as an active link in HTML? Do I need to add some special transform in
 the example.xslt file?
 
 thanks
 Ben
 




Re: Only a subset of edismax pf fields are used for the phrase part DisjunctionMaxQuery

2011-11-20 Thread Erick Erickson
Could we see the schema definitions for the fields in question? And
the solrconfig for the handler, and the query you actually send?

Best
Erick

On Fri, Nov 18, 2011 at 6:33 AM, Jean-Claude Dauphin
jc.daup...@gmail.com wrote:
 Hello,

 The parsedQuery is displayed as follows:

 parsedquery=+(DisjunctionMaxQuery((title:responsable^4.0 |
 keywords:responsable^3.0 | organizationName:responsable |
 location:responsable | formattedDescription:responsable^2.0 |
 nafCodeText:responsable^2.0 | jobCodeText:responsable^3.0 |
 categoryPayloads:responsable | labelLocation:responsable)~0.1)
 DisjunctionMaxQuery((title:boutique^4.0 | keywords:boutique^3.0 |
 organizationName:boutique | location:boutique |
 formattedDescription:boutique^2.0 | nafCodeText:boutique^2.0 |
 jobCodeText:boutique^3.0 | categoryPayloads:boutique |
 labelLocation:boutique)~0.1) DisjunctionMaxQuery((title:lingerie^4.0 |
 keywords:lingerie^3.0 | organizationName:lingerie | location:lingerie |
 formattedDescription:lingerie^2.0 | nafCodeText:lingerie^2.0 |
 jobCodeText:lingerie^3.0 | categoryPayloads:lingerie |
 labelLocation:lingerie)~0.1))

 *DisjunctionMaxQuery*((title:"responsable boutique lingerie"~10^4.0 |
 formattedDescription:"responsable boutique lingerie"~10^2.0 |
 categoryPayloads:"responsable boutique lingerie"~10)~0.1)

 The search query is 'responsable boutique lingerie'
 The qf and pf fields are the same:

 qf= title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
 organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
 categoryPayloads^1.0,

 pf= title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
 organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
 categoryPayloads^1.0,

 I would have expected the whole set of pf fields to appear in the phrase
 part of the parsed query!

 Is it coming from the field definitions in schema.xml?

 Best,

 Jean-Claude Dauphin



 --
 Jean-Claude Dauphin

 jc.daup...@gmail.com
 jc.daup...@afus.unesco.org

 http://kenai.com/projects/j-isis/
 http://www.unesco.org/isis/
 http://www.unesco.org/idams/
 http://www.greenstone.org



Re: wild card search and lower-casing

2011-11-20 Thread Erick Erickson
As it happens I'm working on SOLR-2438, which should address this. The patch
will provide two things:

The ability to define a new analysis chain in your schema.xml, currently
called multiterm, that will be applied to queries of various sorts,
including wildcard, prefix, and range. This will be somewhat of an expert
thing to make yourself...

In the absence of an explicit definition, it'll synthesize a multiterm
analyzer out of the query analyzer, taking any char filters, the
LowerCaseFilter (if present), and the ASCIIFoldingFilter (if present), and
putting them in the multiterm analyzer along with a (hardcoded)
WhitespaceTokenizer.
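
To give a feel for it, an explicit definition might look roughly like the
following in schema.xml (a sketch only; the exact syntax may still change
before the patch is committed):

<fieldType name="text_wildcard" class="solr.TextField">
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>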

As of 3.6 and 4.0, this will be the default behavior, although you can
explicitly define a field type parameter to get the current behavior.

The reason it is targeted at 3.6 is that I want it to bake for a while
before getting into the wild, so I have no intention of trying to get it
into the 3.5 release.

The patch is up for review now; I'd like another set of eyeballs or two on
it before committing.

The patch that's up there now is against trunk, but I hope to have a 3x
patch that I'll apply to the 3x code line after 3.5 RC1 is cut.

Best
Erick


On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote:

 You're right:

 public SolrQueryParser(IndexSchema schema, String defaultField) {
 ...
 setLowercaseExpandedTerms(false);
 ...
 }

 Please note that lowercaseExpandedTerms uses String.toLowerCase() (with the
 default Locale), which is a locale-sensitive operation.

 In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure if
 it has been ported to Solr.

  http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html



Re: Solr filterCache size settings...

2011-11-20 Thread Erick Erickson
Each fq will create a bitmap that is bounded by (maxDocs / 8) bytes.

You can think of the entries in the filterCache as a map where the key is the
filter query you specify and the value is the aforementioned bitmap. The
number of entries specified in the config file is the number of entries
in that map. So the cache can take up roughly (assuming the size is 512)
512 * maxDocs / 8 bytes.
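
As a rough worked example (numbers picked purely for illustration): with
maxDocs = 10 million, each cached bitmap is about 10,000,000 / 8 = 1.25 MB,
so a filterCache of size 512 can grow to roughly 512 * 1.25 MB = 640 MB.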

Best
Erick

On Fri, Nov 18, 2011 at 6:49 PM, Andrew Lundgren
lundg...@familysearch.org wrote:
 I am new to Solr in general and trying to get a handle on the memory
 requirements for caching. Specifically I am looking at the filterCache
 right now. The documentation on the size setting seems to indicate that it
 is the number of values to be cached. Did I read that correctly, or is it
 really the amount of memory that will be set aside for the cache?

 How do you determine how much cache each fq will consume?

 Thank you!

 --
 Andrew Lundgren
 lundg...@familysearch.org







Re: Can files be faceted based on their size ?

2011-11-20 Thread Erick Erickson
Well, I wouldn't store it as a string in the first place. Otherwise,
you're right: you have to store it as something that compares
lexicographically, usually by left-padding with zeroes.

But don't do that if at all possible; it's much more expensive than
storing ints or longs. So can you re-index these as one of the Trie*
types?
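
For example (a sketch; the field and type names are illustrative), schema.xml
could define a Trie-based long field, and faceting by size then becomes a
range facet:

<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<field name="fileSize" type="tlong" indexed="true" stored="true"/>

facet=true&facet.range=fileSize&facet.range.start=0&facet.range.end=104857600&facet.range.gap=10485760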

Best
Erick

On Sat, Nov 19, 2011 at 3:35 AM, neuron005 neuron...@gmail.com wrote:
 But sir, fileSize is of type string; how will it compare?




Re: Performance issues

2011-11-20 Thread Erick Erickson
Please review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Sun, Nov 20, 2011 at 5:32 AM, Tor Henning Ueland
tor.henn...@gmail.com wrote:
 On Sun, Nov 20, 2011 at 11:27 AM, Lalit Kumar 4 lkum...@sapient.com wrote:

 A search with a couple of parameters returns about 650 results (out of
 roughly 2,500) and takes around 30 seconds.
 The schema.xml has more than 100 fields.

 You have of course started with the basics, like making sure that the
 index is smaller than the available RAM on your server? (Or that the
 index size per shard is less than the available RAM on each server, if
 you are running a multi-server cluster.)

 As long as your index is bigger than what can be placed in cache, you
 will have a hard time keeping your queries fast no matter what, unless
 the search queries are few enough that they always hit Solr's own
 query cache.

 --
 Regards
 Tor Henning Ueland



Re: how to transform a URL (newbie question)

2011-11-20 Thread Erick Erickson
I think you're confusing Solr with a web app <g>.

Solr itself has nothing whatsoever to do with presenting
things to the user. It just returns, as you have seen,
XML (or JSON or ...) formatted replies. It's up to the
application layer to do something intelligent with those.

That said, the /browse request handler that ships with the
example code uses something
called the VelocityResponseWriter to render pages, where
the VelocityResponseWriter interacts with the templates
Erik Hatcher mentioned to show you pages. So think of
all the Velocity stuff as your app engine for demo purposes.

Erik is directing you at that code if you want to hack the
Solr example to display stuff.

Hope that helps
Erick (not Hatcher <g>)

On Sun, Nov 20, 2011 at 2:15 PM, Bent Jensen bentjen...@yahoo.com wrote:
 Erik,
 OK, I will look at that. Basically, what I am trying to do is index a
 document with lots of URLs. I also index the URL and give it a field type.
 I don't know much about Solr yet, but I thought maybe I could transform the
 URL into an active link, i.e. an 'a href'. I tried putting the href into the
 XML document, but it just prints out as text in HTML. I also could not find
 any XSLT transform or schema.

 thanks
 Ben

 -Original Message-
 From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
 Sent: Sunday, November 20, 2011 9:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: how to transform a URL (newbie question)

 Ben,

 Not quite sure how to interpret what you're asking here.  Are you speaking
 of the /browse view?  If so, you can tweak the templates under conf/velocity
 to make links out of things.

 But generally, it's the end application that would take the results from
 Solr and render links as appropriate.

        Erik

 On Nov 20, 2011, at 11:53, Bent Jensen wrote:

 I am a beginner to Solr and need to ask the following:
 using the apache-solr example, how can I display a URL in the XML document
 as an active link in HTML? Do I need to add some special transform in
 the example.xslt file?

 thanks
 Ben






Pagination problem with group.limit=2

2011-11-20 Thread Samarendra Pratap
Hi,
 As per our business logic we have to show two products per company in our
results. The second product should also be displayed as a normal search
result, instead of as a more... (or +, expand) kind of nested result.

*Short description*: With the *group.limit=2* option I am not able to
determine the exact number of results (not groups) that will be returned in
the output as I increase the start/rows parameters. Is there any solution?

*Detailed description*:

We are using Solr 3.4 with the following group options:

group=true
group.field=company
group.limit=2
group.format=simple
group.ngroups=true

I have the following company-wise counts in the index for my search criteria:
company1 = 1 product
company2 = 2 products
company3 = 3 products
company4 = 4 products

Now I get a result with the following figures:

matches:10
ngroups:4
doclist.numFound:10

But the actual number of results returned is different (7 results).

At most there can be 4 x 2 = 8 (ngroups x group.limit) products in the result,
but company1 contributes just 1 result; the others contribute 2 results each.
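
In other words, the number of documents actually returned appears to be the
sum over all groups of min(group.limit, docs in group):
min(2,1) + min(2,2) + min(2,3) + min(2,4) = 1 + 2 + 2 + 2 = 7,
and that number cannot be derived from matches, ngroups, or numFound alone.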

Now my question is: how can I find this actual number so that I can display
correct page numbers?

I searched for this, but there were very few threads about the exact issue,
and those too were old.

Any help or pointer is appreciated.

-- 
Regards,
Samar


Re: wild card search and lower-casing

2011-11-20 Thread Dmitry Kan
Thanks Erick.

Do you think the patch you are working on will also be applicable to 3.4?

Best,
Dmitry

On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote:

 As it happens I'm working on SOLR-2438, which should address this. The
 patch will provide two things:

 The ability to define a new analysis chain in your schema.xml, currently
 called multiterm, that will be applied to queries of various sorts,
 including wildcard, prefix, and range. This will be somewhat of an expert
 thing to make yourself...

 In the absence of an explicit definition, it'll synthesize a multiterm
 analyzer out of the query analyzer, taking any char filters, the
 LowerCaseFilter (if present), and the ASCIIFoldingFilter (if present), and
 putting them in the multiterm analyzer along with a (hardcoded)
 WhitespaceTokenizer.

 As of 3.6 and 4.0, this will be the default behavior, although you can
 explicitly define a field type parameter to get the current behavior.

 The reason it is targeted at 3.6 is that I want it to bake for a while
 before getting into the wild, so I have no intention of trying to get it
 into the 3.5 release.

 The patch is up for review now; I'd like another set of eyeballs or two on
 it before committing.

 The patch that's up there now is against trunk, but I hope to have a 3x
 patch that I'll apply to the 3x code line after 3.5 RC1 is cut.

 Best
 Erick


 On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  You're right:
 
  public SolrQueryParser(IndexSchema schema, String defaultField) {
  ...
  setLowercaseExpandedTerms(false);
  ...
  }
 
  Please note that lowercaseExpandedTerms uses String.toLowerCase() (with the
  default Locale), which is a locale-sensitive operation.
 
  In Lucene, AnalyzingQueryParser exists for this purpose, but I am not
 sure if it has been ported to Solr.
 
 
 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
 



Re: delta-import of rich documents like word and pdf files!

2011-11-20 Thread kumar8anuj
I am using Solr 3.4 and have configured my DataImportHandler to get some data
from MySQL as well as index some rich documents from disk.

This is the part of the db-data-config file where I am indexing rich text
documents.


<entity name="resume" dataSource="ds-db"
        query="Select name, js_login_id div 25000 as dir from js_resumes
               where js_login_id='${js_logins.id}' and is_primary = 1
               and deleted = 0 and mask_cv != 1"
        pk="resume_name"
        deltaQuery="select js_login_id from js_resumes
                    where modified > '${dataimporter.last_index_time}'
                    and is_primary = 1 and deleted = 0"
        parentDeltaQuery="select jsl.id as id
                          from service_request_histories srh, service_requests sr,
                               js_login_screenings jsls, js_logins jsl
                          where jsl.status in (1,2) and srh.service_request_id = sr.id
                          and jsl.id = jsls.js_login_id and srh.status in ('8','43')
                          and jsls.id = srh.sid
                          and date(srh.created) > date_sub(now(), interval 2 day)
                          and jsl.id = '${js_resumes.js_login_id}'">
  <entity processor="TikaEntityProcessor"
          tikaConfig="tika-config.xml"
          url="http://localhost/resumes-new/resumes${resume.dir}/${js_logins.id}/${resume.name}"
          dataSource="ds-file" format="text">
    <field column="text" name="resume"/>
  </entity>
</entity>


But after some time I get the following error in my error log. It looks like
a missing-class error. Can anyone tell me which POI jar version works with
Tika 0.6? Currently I have poi-3.7.jar.

The error I am getting is this:

SEVERE: Exception while processing: js_logins document :
SolrInputDocument[{id=id(1.0)={100984},
complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575},
emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat
Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.&lt;init&gt;(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
 
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268) 
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187) 
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
 
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) 
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408) 
Caused by: java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.&lt;init&gt;(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.&lt;init&gt;(XWPFWordExtractorDecorator.java:163)
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.&lt;init&gt;(XWPFWordExtractorDecorator.java:161)
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
 
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
 
at
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
 
at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51) 
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120) 
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101) 
at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
 
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
 
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
 
... 7 more



Solr Performance/Architecture

2011-11-20 Thread Husain, Yavar

Number of rows in SQL Table (Indexed till now using Solr): 1 million
Total Size of Data in the table: 4GB
Total Index Size: 3.5 GB

Total number of rows that I have to index: 20 million (approximately 100 GB
of data) and growing

What are the best practices with respect to distributing the index? What I
mean to say is: when should I distribute, and is there a magic number for
index size per instance?

For the first 1 million rows alone, a Solr instance running on a VM takes
roughly 2.5 hours to index for me, so 20 million would take roughly 60-70
hours. That would be too much.

What would be the best distributed architecture for my case? It would be
great if people could share their best practices and experience.
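
For reference, my understanding is that once the index is split across
several instances, a distributed query is driven by the shards parameter,
something like the following (host names are placeholders):

http://host1:8983/solr/select?q=*:*&shards=host1:8983/solr,host2:8983/solr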

Thanks!!