Re: Sorting performance
It is slow each time I run it. (I test it from the Solr admin console or from a Java program using the HTTP client.) I do not get the OOM each time.

Thx
Christophe

Otis Gospodnetic wrote:

Is the sorted query slow only the first time or every time you run it? You got an OOM? What -Xmx value are you using? Try increasing it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: christophe [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, October 17, 2008 1:28:52 PM
Subject: Sorting performance

Hi,

I'm doing some tests with Solr 1.3. I have loaded around 7M documents, each with a few stored and indexed fields.

This query:
text:sometext
returns the results, sorted by score, in a few milliseconds. (I display 10 out of 8747 matched documents.)

This one:
text:sometext;id desc
takes something like 60s or more to return the data (when it doesn't fail with an out-of-memory error). (id is a string type.) I have tried to display only id, with the same results.

Any ideas? I'm sure I'm doing something wrong.

My schema is based on the sample, with the following fields:
/ multiValued=true / default=NOW multiValued=false/

Thanks
Christophe
Re: Sorting performance
Here are the memory parameters I'm using now (Tomcat):
-Xms2024m -Xmx2024m

With those values, the second query is way faster. Only the first one is very slow. Thanks for the tip.

However, I'm wondering whether this will be enough, or whether I will hit the same issues when many users are searching at the same time. I will do a stress test to check this.

Thanks
Christophe
solr 1.3 multi language?
Hi everybody,

I would like some help with managing the multi-language part; an example would be excellent. I did a multi-language index in one core, and I would like to know what you think about the way I've managed it. Are there more parameters that I don't know about? Some help and an example would be greatly appreciated.

Thanks a lot.

I need to manage these languages:
French (FR)
English (EN)
German (DE)
Spanish (ES)
Russian (RU)
Portuguese (Brazilian) (PT)
Polish (PO)
Dutch (NL)
Greek (GR)
Japanese (JA)
Turkish (TR)

My schema looks like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- languages -->
<fieldtype name="text_fr" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" />
  </analyzer>
</fieldtype>

<fieldtype name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldtype>

<fieldtype name="text_de" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
  </analyzer>
</fieldtype>

<fieldtype name="text_es" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
  </analyzer>
</fieldtype>

<fieldType name="text_ru" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>
  <filter class="solr.SnowballPorterFilterFactory" language="Russian" />
</fieldType>

<fieldtype name="text_pt" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese" />
  </analyzer>
</fieldtype>

<fieldtype name="text_it" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian" />
  </analyzer>
</fieldtype>
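
For illustration only, here is a minimal sketch of how per-language field types like the ones above might be attached to fields in schema.xml. The field names (title_fr, title_en, title_de) and the dynamic-field suffix pattern are assumptions made up for this example and do not come from the original post; the "string" type is assumed to be defined as in the sample schema.

```xml
<!-- Hypothetical field declarations; all names below are illustrative. -->
<fields>
  <!-- Unique key, assumed to use the sample schema's "string" type -->
  <field name="id" type="string" indexed="true" stored="true"/>

  <!-- One copy of the searchable text per language, each analyzed by
       the matching language-specific field type -->
  <field name="title_fr" type="text_fr" indexed="true" stored="true"/>
  <field name="title_en" type="text_en" indexed="true" stored="true"/>
  <field name="title_de" type="text_de" indexed="true" stored="true"/>

  <!-- Alternative: a dynamic field per language suffix, so any field
       ending in _es is analyzed as Spanish -->
  <dynamicField name="*_es" type="text_es" indexed="true" stored="true"/>
</fields>
```

With a layout like this, a query would target the field for the user's language (for example title_fr:maison), so that index-time and query-time analysis use the same stemmer.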
Re: Tree Faceting Component
Jeremy,

Great troubleshooting! You were spot on. I've posted a new patch that fixes the issue.

Erik

On Oct 16, 2008, at 9:53 PM, Jeremy Hinegardner wrote:

After a bit more investigating, it appears that any facet tree where the first item is numerical or boolean or some other non-textual type does not produce any secondary facets. This includes sint, sfloat, boolean and such.

For instance, on the sample index:

facet.tree=sku,cat => works
facet.tree=cat,sku => works
facet.tree=manu_exact,cat => works
facet.tree=cat,manu_exact => works
facet.tree=popularity,inStock => fails
facet.tree=inStock,popularity => fails
facet.tree=manu_exact,weight => works
facet.tree=weight,manu_exact => fails

I'm not very familiar with the Solr / Lucene Java API, so this is slow going here. Maybe I'm barking up the wrong tree, but is the TermQuery for the secondary SimpleFacet messing up somehow? I tried to dig into the code, but was unsuccessful. It appears to me that the searcher never returns a docSet for any TermQuery where the field being searched has a non-textual type.

As a final test, I changed the schema and made the inStock field a 'text' field instead of 'boolean'. When I did that and reindexed the sample data, the tree facet worked correctly as either facet.tree=cat,inStock or facet.tree=inStock,cat, whereas before it would only work in the former.

enjoy,
-jeremy

On Thu, Oct 16, 2008 at 10:55:49AM -0600, Jeremy Hinegardner wrote:

Erik,

After some more experiments, I can get it to perform incorrectly using the sample Solr data.

The example query from the SOLR-792 ticket:
http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on

Make a few alterations to the query:

1) swap the tree order - all tree facets are 0
http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=inStock,cat&wt=json&indent=on

2) swap tree order and change facet.field to be the primary (inStock)
http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=inStock&facet.tree=inStock,cat&wt=json&indent=on

Also, can tree faceting work distributed?

enjoy,
-jeremy

On Wed, Oct 15, 2008 at 05:41:21PM -0700, Erik Hatcher wrote:

Jeremy,

What's the full request you're making to Solr? Do you get values when you facet normally on date_id and type?

facet.field=date_id&facet.field=type

Erik

p.s. this e-mail is not on the list (on a hotel net connection blocking outgoing mail) - feel free to reply to this back on the list though.

On Oct 15, 2008, at 5:29 PM, Jeremy Hinegardner wrote:

Hi all,

I'm testing out the Tree Faceting Component (SOLR-792) on top of Solr 1.3. It looks like it would do exactly what I want, but something is not working correctly with my schema. When I use the example schema, it works just fine, but when I swap out the example schema and example index and put in my index and schema, the tree facet does not work.

Both of the fields I want to facet on can be faceted individually, but when I say facet.tree=date_id,type then all of the values are 0.

Does anyone have any ideas on where I should start looking?

enjoy,
-jeremy

--
Jeremy Hinegardner [EMAIL PROTECTED]
Re: Tree Faceting Component
Erik,

Thanks, it's working great. Next is to make it distributed. I was thinking of working on this. Is the FacetComponent a good model to work from to make the TreeFacet distributed? I should probably join solr-dev for that conversation, I assume :-).

-jeremy

On Thu, Oct 16, 2008 at 11:12:45PM -0700, Erik Hatcher wrote:

Jeremy,

Great troubleshooting! You were spot on. I've posted a new patch that fixes the issue.

Erik

--
Jeremy Hinegardner [EMAIL PROTECTED]
Re: Sorting performance
You need to set up a warming query that sorts, so that the initial long query is done behind the scenes. Users' first query will then be fast. See solrconfig.xml.

- Mark

On Oct 18, 2008, at 1:34 AM, christophe [EMAIL PROTECTED] wrote:

Here are the memory parameters I'm using now (Tomcat):
-Xms2024m -Xmx2024m

With those values, the second query is way faster. Only the first one is very slow. Thanks for the tip.

However, I'm wondering whether this will be enough, or whether I will hit the same issues when many users are searching at the same time. I will do a stress test to check this.

Thanks
Christophe
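
A warming query of the kind Mark describes could look roughly like the sketch below, placed in the query section of solrconfig.xml. This is an illustration under assumptions rather than a tested configuration for this index: the query text and sort field are simply the ones quoted in the thread, the rows value is arbitrary, and the sort parameter is assumed to be accepted by the request handler in use.

```xml
<!-- Sketch only: run a sorted query whenever a searcher is opened, so the
     sort data for the id field is loaded before real users search. -->

<!-- Fired once, when the first searcher is created at startup -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<!-- Fired each time a new searcher is opened, e.g. after a commit -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

The slow first sorted query is then paid by the warming listener instead of by a user. The heap still has to be large enough to hold the sort data, so the -Xmx setting discussed above remains relevant.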