Re: How to select one entity at a time?
Of course I agree. But while performing a search, if I want to search only the data from the USER table, how can I achieve it? Suppose I have a user named bob in both the USER and MANAGER tables. So when I perform http://localhost:8983/solr/dataimport?command=full-import , all the USER and MANAGER values will get indexed. And when I do a search like http://localhost:8983/solr/select/?q=bob&version=2.2&start=0&rows=10&indent=on&wt=json it will return all the values indexed from both the USER and MANAGER tables. But I want only the data indexed from either the USER table or the MANAGER table at a time, based on the end user's choice. How can I achieve it? Thanks for your reply con Noble Paul നോബിള് नोब्ळ् wrote: > > The entity and the select query have no relationship > The entity comes into the picture when you do a dataimport > > eg: > http://localhost:8983/solr/dataimport?command=full-import&entity=user > > This is an indexing operation > > On Wed, Oct 1, 2008 at 11:26 AM, con <[EMAIL PROTECTED]> wrote: >> >> Hi guys, >> In the URL, http://localhost:8983/solr/select/?q= >> :bob&version=2.2&start=0&rows=10&indent=on&wt=json >> >> q=: applies to a field and not to an entity. So if I have 3 entities >> like: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I cannot invoke the entity, 'user', just like the above URL. I went >> through >> the possible arguments but didn't find a way to invoke an entity. Is >> there a >> way for this purpose. >> regards >> con >> >> >> >> >> >> >> con wrote: >>> >>> Thanks Everybody. >>> I have gone through the wiki and some other docs. Actually I have a >>> tight >>> schedule and I have to look into various other things along with this. >>> Currently I am looking into rebuilding Solr by writing a wrapper class. >>> I will update you with more meaningful questions soon.. >>> thanks and regards. >>> con >>> >>> >>> Norberto Meijome-6 wrote: On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) con <[EMAIL PROTECTED]> wrote: > What you meant is correct only. 
Please excuse me for that, I am new to > Solr. > :-( Con, have a read here : http://www.ibm.com/developerworks/java/library/j-solr1/ It helped me pick up the basics a while back. It refers to 1.2, but the core concepts are relevant to 1.3 too. b _ {Beto|Norberto|Numard} Meijome Hildebrant's Principle: If you don't know where you are going, any road will get you there. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned. >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19755437.html Sent from the Solr - User mailing list archive at Nabble.com.
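One common way to get what con is asking for here (a sketch, not taken from the thread — every name below, including the source field, the column lists, and the JDBC settings, is an illustrative assumption): have each DataImportHandler entity stamp its documents with a literal source value, then turn the end user's choice into a filter query at search time.

```xml
<!-- data-config.xml sketch (illustrative names only): each entity adds a
     literal 'source' value to every document it indexes -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"
              user="u" password="p"/>
  <document>
    <entity name="user" query="select id, name, 'user' as source from USER">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="source" name="source"/>
    </entity>
    <entity name="manager" query="select id, name, 'manager' as source from MANAGER">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="source" name="source"/>
    </entity>
  </document>
</dataConfig>
```

With a matching source field declared in schema.xml, a query such as http://localhost:8983/solr/select/?q=bob&fq=source:user would return only documents indexed from the USER table, and fq=source:manager only those from MANAGER.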
Is indexing websites with Solr possible?
Hi all, I want to enable the search functionality in my website. Can I use Solr for indexing the website? Is there any option in Solr? Please let me know as soon as possible. Thanks in advance Prabhu.K -- View this message in context: http://www.nabble.com/Does-Solr-Indexing-Websites-possible--tp19755329p19755329.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to select one entity at a time?
The entity and the select query have no relationship The entity comes into the picture when you do a dataimport eg: http://localhost:8983/solr/dataimport?command=full-import&entity=user This is an indexing operation On Wed, Oct 1, 2008 at 11:26 AM, con <[EMAIL PROTECTED]> wrote: > > Hi guys, > In the URL, http://localhost:8983/solr/select/?q= > :bob&version=2.2&start=0&rows=10&indent=on&wt=json > > q=: applies to a field and not to an entity. So if I have 3 entities > like: > > > > > > > > > > > > > > > > I cannot invoke the entity, 'user', just like the above URL. I went through > the possible arguments but didn't find a way to invoke an entity. Is there a > way for this purpose. > regards > con > > > > > > > con wrote: >> >> Thanks Everybody. >> I have gone through the wiki and some other docs. Actually I have a tight >> schedule and I have to look into various other things along with this. >> Currently I am looking into rebuilding Solr by writing a wrapper class. >> I will update you with more meaningful questions soon.. >> thanks and regards. >> con >> >> >> Norberto Meijome-6 wrote: >>> >>> On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) >>> con <[EMAIL PROTECTED]> wrote: >>> What you meant is correct only. Please excuse me for that, I am new to Solr. :-( >>> >>> Con, have a read here : >>> >>> http://www.ibm.com/developerworks/java/library/j-solr1/ >>> >>> It helped me pick up the basics a while back. It refers to 1.2, but the >>> core concepts are relevant to 1.3 too. >>> >>> b >>> _ >>> {Beto|Norberto|Numard} Meijome >>> >>> Hildebrant's Principle: >>> If you don't know where you are going, >>> any road will get you there. >>> >>> I speak for myself, not my employer. Contents may be hot. Slippery when >>> wet. Reading disclaimers makes you go blind. Writing them is worse. You >>> have been Warned. 
>>> >>> >> >> > > -- > View this message in context: > http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
Re: How to select one entity at a time?
Hi guys, In the URL, http://localhost:8983/solr/select/?q= :bob&version=2.2&start=0&rows=10&indent=on&wt=json q=: applies to a field and not to an entity. So if I have 3 entities like: I cannot invoke the entity, 'user', just like the above URL. I went through the possible arguments but didn't find a way to invoke an entity. Is there a way for this purpose. regards con con wrote: > > Thanks Everybody. > I have gone through the wiki and some other docs. Actually I have a tight > schedule and I have to look into various other things along with this. > Currently I am looking into rebuilding Solr by writing a wrapper class. > I will update you with more meaningful questions soon.. > thanks and regards. > con > > > Norberto Meijome-6 wrote: >> >> On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) >> con <[EMAIL PROTECTED]> wrote: >> >>> What you meant is correct only. Please excuse me for that, I am new to Solr. >>> :-( >> >> Con, have a read here : >> >> http://www.ibm.com/developerworks/java/library/j-solr1/ >> >> It helped me pick up the basics a while back. It refers to 1.2, but the >> core concepts are relevant to 1.3 too. >> >> b >> _ >> {Beto|Norberto|Numard} Meijome >> >> Hildebrant's Principle: >> If you don't know where you are going, >> any road will get you there. >> >> I speak for myself, not my employer. Contents may be hot. Slippery when >> wet. Reading disclaimers makes you go blind. Writing them is worse. You >> have been Warned. >> >> > > -- View this message in context: http://www.nabble.com/How-to-select-one-entity-at-a-time--tp19668759p19754869.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing Large Files with Large DataImport: Problems
this patch is created from 1.3 (may apply on trunk also) --Noble On Wed, Oct 1, 2008 at 9:56 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote: > I guess it is a threading problem. I can give you a patch. you can raise a bug > --Noble > > On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote: >> >> As a follow up: I continued tweaking the data-config.xml, and have been able >> to make the commit fail with as little as 3 fields in the sdc.xml, with only >> one multivalued field. Even more strange, some fields work and some do not. >> For instance, in my dc.xml: >> >> > xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon" >> /> >> . >> . >> . >> > xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept" >> /> >> >> and in the schema.xml: >> > multiValued="true" /> >> . >> . >> . >> > multiValued="true" /> >> but taxon works and genpept does not. What could possibly account for this >> discrepancy? Again, the error logs from the server are exactly that seen in >> the first post. >> >> What is going on? >> >> >> KyleMorrison wrote: >>> >>> Yes, this is the most recent version of Solr, stream="true" and stopwords, >>> lowercase and removeDuplicate being applied to all multivalued fields? >>> Would the filters possibly be causing this? I will not use them and see >>> what happens. >>> >>> Kyle >>> >>> >>> Shalin Shekhar Mangar wrote: Hmm, strange. This is Solr 1.3.0, right? Do you have any transformers applied to these multi-valued fields? Do you have stream="true" in the entity? On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote: > > I apologize for spamming this mailing list with my problems, but I'm at > my > wits end. I'll get right to the point. > > I have an xml file which is ~1GB which I wish to index. If that is > successful, I will move to a larger file of closer to 20GB. 
However, > when I > run my data-config(let's call it dc.xml) over it, the import only > manages > to > get about 27 rows, out of roughly 200K. The exact same > data-config(dc.xml) > works perfectly on smaller data files of the same type. > > This data-config is quite large, maybe 250 fields. When I run a smaller > data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works > perfectly. The only conclusion I can draw from this is that the > data-config > method just doesn't scale well. > > When the dc.xml fails, the server logs spit out: > > Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/dataimport params={command=full-import} > status=0 > QTime=95 > Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > INFO: Starting Full Import > Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 > deleteAll > INFO: [] REMOVING ALL DOCUMENTS FROM INDEX > Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > SEVERE: Full Import failed > java.util.ConcurrentModificationException >at > java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >at java.util.AbstractList$Itr.next(AbstractList.java:343) >at > > org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >at > > org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >at > > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >at > > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >at > > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >at > > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >at > > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) > Sep 30, 2008 11:41:18 AM 
org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/dataimport params={command=full-import} > status=0 > QTime=77 > Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > INFO: Starting Full Import > Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 > deleteAll > INFO: [] REMOVING ALL DOCUMENTS FROM INDEX > Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > SEVERE: Full Import failed > java.util.ConcurrentModificationException >at > java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >at java.util.Abstr
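For context on the exception in the trace above: java.util.ConcurrentModificationException is thrown when a collection is structurally modified while an iterator over it is still in use, which is why two code paths touching the same field list inside the DataImportHandler can fail this way. A minimal, self-contained illustration (not Solr code; the class and field names are made up):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Iterates a list and modifies it mid-iteration; returns true if the
    // iterator detects the concurrent modification, as in the DIH trace.
    static boolean triggersCme() {
        List<String> fields = new ArrayList<String>();
        fields.add("taxon");
        fields.add("genpept");
        try {
            for (String f : fields) {        // for-each uses an Iterator internally
                fields.add(f + "-copy");     // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;                     // checkForComodification fired, as in the logs
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(triggersCme());   // prints "true"
    }
}
```

The fix in such cases is to synchronize access to the shared list or to iterate over a private copy, which is presumably what the offered patch does.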
Re: Indexing Large Files with Large DataImport: Problems
I guess it is a threading problem. I can give you a patch. you can raise a bug --Noble On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote: > > As a follow up: I continued tweaking the data-config.xml, and have been able > to make the commit fail with as little as 3 fields in the sdc.xml, with only > one multivalued field. Even more strange, some fields work and some do not. > For instance, in my dc.xml: > > xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon" > /> > . > . > . > xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept" > /> > > and in the schema.xml: > multiValued="true" /> > . > . > . > multiValued="true" /> > but taxon works and genpept does not. What could possibly account for this > discrepancy? Again, the error logs from the server are exactly that seen in > the first post. > > What is going on? > > > KyleMorrison wrote: >> >> Yes, this is the most recent version of Solr, stream="true" and stopwords, >> lowercase and removeDuplicate being applied to all multivalued fields? >> Would the filters possibly be causing this? I will not use them and see >> what happens. >> >> Kyle >> >> >> Shalin Shekhar Mangar wrote: >>> >>> Hmm, strange. >>> >>> This is Solr 1.3.0, right? Do you have any transformers applied to these >>> multi-valued fields? Do you have stream="true" in the entity? >>> >>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> >>> wrote: >>> I apologize for spamming this mailing list with my problems, but I'm at my wits end. I'll get right to the point. I have an xml file which is ~1GB which I wish to index. If that is successful, I will move to a larger file of closer to 20GB. However, when I run my data-config(let's call it dc.xml) over it, the import only manages to get about 27 rows, out of roughly 200K. The exact same data-config(dc.xml) works perfectly on smaller data files of the same type. 
This data-config is quite large, maybe 250 fields. When I run a smaller data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works perfectly. The only conclusion I can draw from this is that the data-config method just doesn't scale well. When the dc.xml fails, the server logs spit out: Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95 Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77 Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL 
DOCUMENTS FROM INDEX Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) at org.apache.solr.
Re: Question about facet.prefix usage
Not really. facet.query filters the result set. Here we need to filter the facet counts by multiple facet prefixes. facet.query would work only if the faceted field is not a multi-valued field. Erik Hatcher wrote: > > If I'm not mistaken, doesn't facet.query accomplish what you want? > > Erik > > > -- View this message in context: http://www.nabble.com/Question-about-facet.prefix-usage-tp15836501p19753290.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Are facet searches slower on large indexes?
The time factor has more to do with the number of distinct values in the field being faceted on than it does the number of documents. With 1 million documents there are probably a lot more indexed terms in the "contents" field than there are with only 1000 documents. As an inverted index, there is no efficient way for Solr's faceting code to know just which terms are in the 37 docs that match your query -- it has to check them all. The good news is that if you can make your filterCache big enough, it won't matter which 37 (or 37,000) documents match your next query where you facet on the contents field -- the facet counts should compute much faster. For fields where Solr can tell it will have just one value, it can do some optimizations to use the FieldCache instead of iterating over every term in the field you're faceting on, but that wouldn't apply to your "contents" field. : I'm doing a facet search like the following. The content field schema is : : : : : : : : : /solr/select?q=dirt : field:www.example.com&facet=true&facet.field=content&facet.limit=-1&facet.mincount=1 : : If I run this on a server with a total of 1000 pages that contain : pages for www.example.com, it returns in about 1 second, and gives me : 37 docs, and quite a few facet values. : : If I run this same search on a server with over a 1,000,000 pages in : total, including the pages that are in the first example, it returns : in about 2 minutes! still giving me 37 docs and the same amount of : facet values. : : Seems to me the search should have been constrained to : field:www.example.com in both cases, so perhaps shouldn't be much : different in time to execute. : : Is there any more in formation on facet searching that will explain : what's going on? -Hoss
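Acting on the filterCache advice above means editing solrconfig.xml. A sketch of the relevant fragment; the element and class are standard Solr configuration, but the sizes are purely illustrative and would need tuning against the number of distinct terms in the faceted field:

```xml
<!-- solrconfig.xml fragment (sizes illustrative only): a filterCache large
     enough to hold one cached filter per distinct term being faceted on -->
<filterCache
    class="solr.LRUCache"
    size="262144"
    initialSize="65536"
    autowarmCount="32768"/>
```

The trade-off is heap usage: each cached filter costs memory proportional to the index size, so a cache this large only pays off when the same facet field is queried repeatedly.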
Re: Discarding undefined fields in query
On Tue, Sep 30, 2008 at 2:42 PM, Jérôme Etévé <[EMAIL PROTECTED]> wrote: > But still I have an error from the webapp when I try to query my > schema with non existing fields in my query ( like foo:bar ). > > I'm wondering if the query q is parsed in a very simple way somewhere > else (and independently from any customized QParserPlugin) and checked > against the schema. It should not be. Are you sure your QParser is being used? Does the error contain a stack trace that can pinpoint where it's coming from? -Yonik
Re: Searching Question
I hit ctrl-S by mistake. This is the method you are after: http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/search/DefaultSimilarity.html#tf(float) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Otis Gospodnetic <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tuesday, September 30, 2008 4:40:08 PM > Subject: Re: Searching Question > > The easiest thing is to look at Lucene javadoc and look for Similarity and > DefaultSimilarity classes. Then have a peek at Lucene contrib to get some > other > examples of custom Similarity. You'll just need to override one method, for > example: > > > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Jake Conk > > To: solr-user@lucene.apache.org > > Sent: Tuesday, September 30, 2008 3:11:01 PM > > Subject: Re: Searching Question > > > > How would I write a custom Similarity factor that overrides the TF > > function? Is there some documentation on that somewhere? > > > > On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll wrote: > > > > > > On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote: > > > > > >> It might be easiest to store the thread ID and the number of replies in > > >> the thread in each post Document in Solr. > > > > > > Yeah, but that would mean updating every document in a thread every time a > > > new reply is added. > > > > > > I still keep going back to the solution as putting all the replies in a > > > single document, and then using a custom Similarity factor that overrides > > > the TF function and/or the length normalization. Still, this suffers from > > > having to update the document for every new reply. > > > > > > Let's take a step back... > > > > > > Can I ask why you want the scoring this way? What have you seen in your > > > results that leads you to believe it is the correct way? 
Note, I'm not > > > trying to convince you it's wrong, I just want to better understand what's > > > going on. > > > > > > > > >> > > >> > > >> Otherwise it sounds like you'll have to combine some search results or > > >> data post-search. > > >> > > >> Otis > > >> -- > > >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > >> > > >> > > >> > > >> - Original Message > > >>> > > >>> From: Jake Conk > > >>> To: solr-user@lucene.apache.org > > >>> Sent: Friday, September 26, 2008 1:50:37 PM > > >>> Subject: Re: Searching Question > > >>> > > >>> Grant, > > >>> > > >>> Each post is its own document but I can merge them all into a single > > >>> document under one thread if that will allow me to do what I want. > > >>> The number of replies is stored both in Solr and the DB. > > >>> > > >>> Thanks, > > >>> > > >>> - JC > > >>> > > >>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote: > > > > Is a thread and all of it's posts a single document? In other words, > > how > > are you modeling your posts as Solr documents? Also, where are you > > keeping > > track of the number of replies? Is that in Solr or in a DB? > > > > -Grant > > > > On Sep 25, 2008, at 8:51 PM, Jake Conk wrote: > > > > > Hello, > > > > > > We are using Solr for our new forums search feature. If possible when > > > searching for the word "Halo" we would like threads that contain the > > > word "Halo" the most with the least amount of posts in that thread to > > > have a higher score. > > > > > > For instance, if we have a thread with 10 posts and the word "Halo" > > > shows up 5 times then that should have a lower score than a thread > > > that has the word "Halo" 3 times within its posts and has 5 replies. > > > Basically the thread that shows the search string most frequently > > > amongst the number of posts in the thread should be the one with the > > > highest score. > > > > > > Is something like this possible? > > > > > > Thanks, > > > > > > > > >
Re: Indexing Large Files with Large DataImport: Problems
As a follow up: I continued tweaking the data-config.xml, and have been able to make the commit fail with as little as 3 fields in the sdc.xml, with only one multivalued field. Even more strange, some fields work and some do not. For instance, in my dc.xml: . . . and in the schema.xml: . . . but taxon works and genpept does not. What could possibly account for this discrepancy? Again, the error logs from the server are exactly that seen in the first post. What is going on? KyleMorrison wrote: > > Yes, this is the most recent version of Solr, stream="true" and stopwords, > lowercase and removeDuplicate being applied to all multivalued fields? > Would the filters possibly be causing this? I will not use them and see > what happens. > > Kyle > > > Shalin Shekhar Mangar wrote: >> >> Hmm, strange. >> >> This is Solr 1.3.0, right? Do you have any transformers applied to these >> multi-valued fields? Do you have stream="true" in the entity? >> >> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> >> wrote: >> >>> >>> I apologize for spamming this mailing list with my problems, but I'm at >>> my >>> wits end. I'll get right to the point. >>> >>> I have an xml file which is ~1GB which I wish to index. If that is >>> successful, I will move to a larger file of closer to 20GB. However, >>> when I >>> run my data-config(let's call it dc.xml) over it, the import only >>> manages >>> to >>> get about 27 rows, out of roughly 200K. The exact same >>> data-config(dc.xml) >>> works perfectly on smaller data files of the same type. >>> >>> This data-config is quite large, maybe 250 fields. When I run a smaller >>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works >>> perfectly. The only conclusion I can draw from this is that the >>> data-config >>> method just doesn't scale well. 
>>> >>> When the dc.xml fails, the server logs spit out: >>> >>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute >>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} >>> status=0 >>> QTime=95 >>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter >>> doFullImport >>> INFO: Starting Full Import >>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 >>> deleteAll >>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX >>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter >>> doFullImport >>> SEVERE: Full Import failed >>> java.util.ConcurrentModificationException >>>at >>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >>>at java.util.AbstractList$Itr.next(AbstractList.java:343) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>>at >>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>>at >>> >>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>>at >>> >>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>>at >>> >>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute >>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} >>> status=0 >>> QTime=77 >>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter >>> doFullImport >>> INFO: Starting Full Import >>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 >>> deleteAll >>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX >>> Sep 30, 2008 11:41:19 AM 
org.apache.solr.handler.dataimport.DataImporter >>> doFullImport >>> SEVERE: Full Import failed >>> java.util.ConcurrentModificationException >>>at >>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >>>at java.util.AbstractList$Itr.next(AbstractList.java:343) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >>>at >>> >>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>>at >>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>>at >>> >>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>>at >>> >>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>>at >>> >>> org.apache.solr.handler.dataimport.DataImporter$1
Re: Searching Question
The easiest thing is to look at Lucene javadoc and look for Similarity and DefaultSimilarity classes. Then have a peek at Lucene contrib to get some other examples of custom Similarity. You'll just need to override one method, for example: -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jake Conk <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tuesday, September 30, 2008 3:11:01 PM > Subject: Re: Searching Question > > How would I write a custom Similarity factor that overrides the TF > function? Is there some documentation on that somewhere? > > On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll wrote: > > > > On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote: > > > >> It might be easiest to store the thread ID and the number of replies in > >> the thread in each post Document in Solr. > > > > Yeah, but that would mean updating every document in a thread every time a > > new reply is added. > > > > I still keep going back to the solution as putting all the replies in a > > single document, and then using a custom Similarity factor that overrides > > the TF function and/or the length normalization. Still, this suffers from > > having to update the document for every new reply. > > > > Let's take a step back... > > > > Can I ask why you want the scoring this way? What have you seen in your > > results that leads you to believe it is the correct way? Note, I'm not > > trying to convince you it's wrong, I just want to better understand what's > > going on. > > > > > >> > >> > >> Otherwise it sounds like you'll have to combine some search results or > >> data post-search. 
> >> > >> Otis > >> -- > >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > >> > >> > >> > >> - Original Message > >>> > >>> From: Jake Conk > >>> To: solr-user@lucene.apache.org > >>> Sent: Friday, September 26, 2008 1:50:37 PM > >>> Subject: Re: Searching Question > >>> > >>> Grant, > >>> > >>> Each post is its own document but I can merge them all into a single > >>> document under one thread if that will allow me to do what I want. > >>> The number of replies is stored both in Solr and the DB. > >>> > >>> Thanks, > >>> > >>> - JC > >>> > >>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote: > > Is a thread and all of it's posts a single document? In other words, > how > are you modeling your posts as Solr documents? Also, where are you > keeping > track of the number of replies? Is that in Solr or in a DB? > > -Grant > > On Sep 25, 2008, at 8:51 PM, Jake Conk wrote: > > > Hello, > > > > We are using Solr for our new forums search feature. If possible when > > searching for the word "Halo" we would like threads that contain the > > word "Halo" the most with the least amount of posts in that thread to > > have a higher score. > > > > For instance, if we have a thread with 10 posts and the word "Halo" > > shows up 5 times then that should have a lower score than a thread > > that has the word "Halo" 3 times within its posts and has 5 replies. > > Basically the thread that shows the search string most frequently > > amongst the number of posts in the thread should be the one with the > > highest score. > > > > Is something like this possible? > > > > Thanks, > > > > > >
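The inline code example in Otis's message above ("You'll just need to override one method, for example:") was stripped by the list archive. As a hedged sketch of the idea rather than the original snippet: Lucene's DefaultSimilarity computes tf as sqrt(freq), and a subclass overriding tf(float) can flatten that so repeated occurrences of "Halo" stop inflating a thread's score. The class below is standalone so it runs without Lucene on the classpath (it only mimics the math; the class and method names are illustrative); in a real setup you would extend org.apache.lucene.search.DefaultSimilarity and register the subclass in schema.xml.

```java
public class TfSketch {
    // Lucene's DefaultSimilarity behaviour: tf(freq) = sqrt(freq)
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // A capped override: any positive frequency scores the same, so a thread
    // repeating the query word many times no longer wins on tf alone
    static float cappedTf(float freq) {
        return freq > 0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(9f));  // prints "3.0"
        System.out.println(cappedTf(9f));   // prints "1.0"
    }
}
```

In the real subclass, the cappedTf logic would simply become the body of the overridden public float tf(float freq) method.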
Re: Integrating external stemmer in Solr and pre-processing text
Hi, The suggested approach with a TokenFilter extending the BufferedTokenStream class works fine, performance is OK - the external stemmer is now invoked only once for the complete search text. Also, from a functional point of view, the approach is useful, because it allows other filtering (e.g. WordDelimiterFilter with its various useful options) to be done before stemming takes place. Code is roughly like this for the process() function of the custom Filter class:

protected Token process(Token token) throws IOException {
    StringBuilder stringBuilder = new StringBuilder();
    Token nextToken;
    Integer tokenPos = 0;
    Map<Integer, Token> tokenMap = new LinkedHashMap<Integer, Token>();

    // buffer the whole stream, remembering each token's position
    stringBuilder.append(token.term()).append(' ');
    tokenMap.put(tokenPos++, token);
    nextToken = read();
    while (nextToken != null) {
        stringBuilder.append(nextToken.term()).append(' ');
        tokenMap.put(tokenPos++, nextToken);
        nextToken = read();
    }

    // a single call to the external stemmer for the complete text
    String inputText = stringBuilder.toString();
    String stemmedText = stemText(inputText);
    String[] stemmedWords = stemmedText.split("\\s");

    // the stemmer returns exactly one word per input word, so positions line up
    for (Map.Entry<Integer, Token> entry : tokenMap.entrySet()) {
        Integer pos = entry.getKey();
        Token tok = entry.getValue();
        tok.setTermBuffer(stemmedWords[pos]);
        write(tok);
    }
    return null;
}

This will need some work and additional error checking, and I'll probably put a maximum on the number of tokens that is to be processed in one go to make sure things don't get too big in memory. Thanks for helping out! Bye, Jaco. 2008/9/26 Jaco <[EMAIL PROTECTED]> > Thanks for these suggestions, will try it in the coming days and post my > findings in this thread. > > Bye, > > > Jaco. > > 2008/9/26 Grant Ingersoll <[EMAIL PROTECTED]> > >> >> On Sep 26, 2008, at 12:05 PM, Jaco wrote: >> >> Hi Grant, >>> >>> In reply to your questions: >>> >>> 1. Are you having to restart/initialize the stemmer every time for your >>> "slow" approach? Does that really need to happen? >>> >>> It is invoking a COM object in Windows. The object is instantiated once >>> for >>> a token stream, and then invoked once for each token. 
The invoke always >>> has >>> an overhead, not much to do about that (sigh...) >>> >>> 2. Can the stemmer return something other than a String? Say a String >>> array >>> of all the stemmed words? Or maybe even some type of object that tells >>> you >>> the original word and the stemmed word? >>> >>> The stemmer can only return a String. But, I do know that the returned >>> string always has exactly the same number of words as the input string. >>> So >>> logically, it would be possible to : >>> a) first calculate the position/start/end of each token in the input >>> string >>> (usual tokenization by Whitespace), resulting in token list 1 >>> b) then invoke the stemmer, and tokenize that result by Whitespace, >>> resulting in token list 2 >>> c) 'merge' the token values of token list 2 into token list 1, which is >>> possible because each token's position is the same in both lists... >>> d) return that 'merged' token list 2 for further processing >>> >>> Would this work in Solr? >>> >> >> I think so, assuming your stemmer tokenizes on whitespace as well. >> >> >>> >>> I can do some Java coding to achieve that from logical point of view, but >>> I >>> wouldn't know how to structure this flow into the MyTokenizerFactory, so >>> some hints to achieve that would be great! >>> >> >> >> One thought: >> Don't create an all in one Tokenizer. Instead, keep the Whitespace >> Tokenizer as is. Then, create a TokenFilter that buffers the whole document >> into a memory (via the next() implementation) and also creates, using >> StringBuilder, a string containing the whole text. Once you've read it all >> in, then send the string to your stemmer, parse it back out and associate it >> back to your token buffer. If you are guaranteed position, you could even >> keep a (linked) hash, such that it is really quick to look up tokens after >> stemming. 
>> >> Pseudocode looks something like: >> >> while (token.next != null) >> tokenMap.put(token.position, token) >> stringBuilder.append(' ').append(token.text) >> >> stemmedText = comObj.stem(stringBuilder.toString()) >> correlateStemmedText(stemmedText, tokenMap) >> >> spit out the tokens one by one... >> >> >> I think this approach should be fast (but maybe not as fast as your all in >> one tokenizer) and will provide the correct position and offsets. You do >> have to be careful w/ really big documents, as that map can be big. You >> also want to be careful about map reuse, token reuse, etc. >> >> I believe there are a couple of buffering TokenFilters in Solr that you >> could examine for inspiration. I think the RemoveDuplicatesTokenFilter (or >> whatever it's called) does buffer
Re: Indexing Large Files with Large DataImport: Problems
Yes, this is the most recent version of Solr, stream="true" is set, and stopwords, lowercase and removeDuplicates are being applied to all multivalued fields. Could the filters possibly be causing this? I will remove them and see what happens. Kyle Shalin Shekhar Mangar wrote: > > Hmm, strange. > > This is Solr 1.3.0, right? Do you have any transformers applied to these > multi-valued fields? Do you have stream="true" in the entity? > > On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote: > >> >> I apologize for spamming this mailing list with my problems, but I'm at >> my >> wits end. I'll get right to the point. >> >> I have an xml file which is ~1GB which I wish to index. If that is >> successful, I will move to a larger file of closer to 20GB. However, when >> I >> run my data-config(let's call it dc.xml) over it, the import only manages >> to >> get about 27 rows, out of roughly 200K. The exact same >> data-config(dc.xml) >> works perfectly on smaller data files of the same type. >> >> This data-config is quite large, maybe 250 fields. When I run a smaller >> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works >> perfectly. The only conclusion I can draw from this is that the >> data-config >> method just doesn't scale well. 
>> >> When the dc.xml fails, the server logs spit out: >> >> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute >> INFO: [] webapp=/solr path=/dataimport params={command=full-import} >> status=0 >> QTime=95 >> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter >> doFullImport >> INFO: Starting Full Import >> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 >> deleteAll >> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX >> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter >> doFullImport >> SEVERE: Full Import failed >> java.util.ConcurrentModificationException >>at >> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >>at java.util.AbstractList$Itr.next(AbstractList.java:343) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>at >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>at >> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>at >> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>at >> >> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute >> INFO: [] webapp=/solr path=/dataimport params={command=full-import} >> status=0 >> QTime=77 >> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter >> doFullImport >> INFO: Starting Full Import >> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 >> deleteAll >> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX >> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter >> 
doFullImport >> SEVERE: Full Import failed >> java.util.ConcurrentModificationException >>at >> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >>at java.util.AbstractList$Itr.next(AbstractList.java:343) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >>at >> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>at >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>at >> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>at >> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>at >> >> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >> >> This mass of exceptions DOES NOT occur when I perform the same >> full-import >> with sdc.xml. As far as I can tell, the only difference between the two >> files is the amount of fields they contain. >> >> Any guidance or information would be greatly appreciated. >> Kyle >> >> >> PS The schema.xml in use specifies almost all fields as multivalued, and >> has >> a copyfield for almost every field. I can fix this if it is causing my >> problem, but I would prefer not to. >> -- >> View this message in context: >> http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p1974683
Re: Indexing Large Files with Large DataImport: Problems
Hmm, strange. This is Solr 1.3.0, right? Do you have any transformers applied to these multi-valued fields? Do you have stream="true" in the entity? On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote: > > I apologize for spamming this mailing list with my problems, but I'm at my > wits end. I'll get right to the point. > > I have an xml file which is ~1GB which I wish to index. If that is > successful, I will move to a larger file of closer to 20GB. However, when I > run my data-config(let's call it dc.xml) over it, the import only manages > to > get about 27 rows, out of roughly 200K. The exact same data-config(dc.xml) > works perfectly on smaller data files of the same type. > > This data-config is quite large, maybe 250 fields. When I run a smaller > data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works > perfectly. The only conclusion I can draw from this is that the data-config > method just doesn't scale well. > > When the dc.xml fails, the server logs spit out: > > Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/dataimport params={command=full-import} > status=0 > QTime=95 > Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > INFO: Starting Full Import > Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 > deleteAll > INFO: [] REMOVING ALL DOCUMENTS FROM INDEX > Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > SEVERE: Full Import failed > java.util.ConcurrentModificationException >at > java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >at java.util.AbstractList$Itr.next(AbstractList.java:343) >at > > org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >at > > org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >at > > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >at > 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >at > > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >at > > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >at > > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) > Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/dataimport params={command=full-import} > status=0 > QTime=77 > Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > INFO: Starting Full Import > Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 > deleteAll > INFO: [] REMOVING ALL DOCUMENTS FROM INDEX > Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > SEVERE: Full Import failed > java.util.ConcurrentModificationException >at > java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) >at java.util.AbstractList$Itr.next(AbstractList.java:343) >at > > org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) >at > > org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) >at > > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) >at > > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >at > > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >at > > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >at > > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) > > This mass of exceptions DOES NOT occur when I perform the same full-import > with sdc.xml. 
As far as I can tell, the only difference between the two > files is the amount of fields they contain. > > Any guidance or information would be greatly appreciated. > Kyle > > > PS The schema.xml in use specifies almost all fields as multivalued, and > has > a copyfield for almost every field. I can fix this if it is causing my > problem, but I would prefer not to. > -- > View this message in context: > http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
Re: Searching Question
How would I write a custom Similarity factor that overrides the TF function? Is there some documentation on that somewhere? On Sat, Sep 27, 2008 at 5:14 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Sep 26, 2008, at 2:10 PM, Otis Gospodnetic wrote: > >> It might be easiest to store the thread ID and the number of replies in >> the thread in each post Document in Solr. > > Yeah, but that would mean updating every document in a thread every time a > new reply is added. > > I still keep going back to the solution as putting all the replies in a > single document, and then using a custom Similarity factor that overrides > the TF function and/or the length normalization. Still, this suffers from > having to update the document for every new reply. > > Let's take a step back... > > Can I ask why you want the scoring this way? What have you seen in your > results that leads you to believe it is the correct way? Note, I'm not > trying to convince you it's wrong, I just want to better understand what's > going on. > > >> >> >> Otherwise it sounds like you'll have to combine some search results or >> data post-search. >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> >>> From: Jake Conk <[EMAIL PROTECTED]> >>> To: solr-user@lucene.apache.org >>> Sent: Friday, September 26, 2008 1:50:37 PM >>> Subject: Re: Searching Question >>> >>> Grant, >>> >>> Each post is its own document but I can merge them all into a single >>> document under one thread if that will allow me to do what I want. >>> The number of replies is stored both in Solr and the DB. >>> >>> Thanks, >>> >>> - JC >>> >>> On Fri, Sep 26, 2008 at 5:24 AM, Grant Ingersoll wrote: Is a thread and all of it's posts a single document? In other words, how are you modeling your posts as Solr documents? Also, where are you keeping track of the number of replies? Is that in Solr or in a DB? 
-Grant On Sep 25, 2008, at 8:51 PM, Jake Conk wrote: > Hello, > > We are using Solr for our new forums search feature. If possible when > searching for the word "Halo" we would like threads that contain the > word "Halo" the most with the least amount of posts in that thread to > have a higher score. > > For instance, if we have a thread with 10 posts and the word "Halo" > shows up 5 times then that should have a lower score than a thread > that has the word "Halo" 3 times within its posts and has 5 replies. > Basically the thread that shows the search string most frequently > amongst the number of posts in the thread should be the one with the > highest score. > > Is something like this possible? > > Thanks, > > >
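On overriding the TF function: in Lucene you would subclass a Similarity implementation (e.g. DefaultSimilarity), override tf(float freq), and point Solr at your class via a <similarity> element in schema.xml. As a plain-Java sketch of the scoring effect, without the Lucene dependency (defaultTf mirrors DefaultSimilarity's square-root curve; flatTf is one possible override that stops rewarding repeated occurrences of a term):

```java
public class FlatTf {
    // DefaultSimilarity's raw term-frequency factor: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // A "flat" override: a term counts once no matter how often it repeats,
    // so a thread mentioning "Halo" 5 times scores no TF bonus over one
    // mentioning it 3 times.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(9f)); // prints "3.0"
        System.out.println(flatTf(9f));    // prints "1.0"
    }
}
```

In a real Similarity subclass you would likely also look at lengthNorm(), since the thread-length penalty discussed above is a normalization concern as much as a TF one.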
Re: Monitoring solr stats with munin?
: > has anyone had the need and maybe already written a munin plugin to graph : > some information from e.g. admin/stats.jsp ? : Something like that, though I haven't seen anything available publicly yet. Anything exposed via stats.jsp should also be available via JMX (if you enable JMX) ... and a google search suggests there is a JMX plugin for munin (even though I don't really understand what munin is) http://muninexchange.projects.linpro.no/?search&cid=38&pid=29 -Hoss
Re: Calculated Unique Key Field
On Wed, Oct 1, 2008 at 12:08 AM, Jim Murphy <[EMAIL PROTECTED]> wrote: > > Question1: Is this the best place to do this? This sounds like a job for http://wiki.apache.org/solr/UpdateRequestProcessor -- Regards, Shalin Shekhar Mangar.
Re: Applying Stop words for Field Type String
: Question : Is it possible to do the same for String type or not? The StrField doesn't support an analyzer like TextField does, but if you define "string" to be a TextField using KeywordTokenizer, it will preserve the whole value as a single token, and you can then use the StopFilterFactory to throw out values which are stop words. The stored value for a TextField and a StrField is returned to clients in exactly the same way. -Hoss
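A field type along those lines might look like the following sketch in schema.xml (the type name and stopword file name are placeholders; the tokenizer and filter factories are the ones shipped with Solr):

```xml
<fieldType name="string_stopped" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole field value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- values that are themselves stop words are dropped entirely -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```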
Re: Dismax , "query phrases"
: That's why I was wondering how Dismax breaks it all apart. It makes sense...I : suppose what I'd like to have is a way to tell dismax which fields NOT to : tokenize the input for. For these fields, it would pass the full q instead of : each part of it. Does this make sense? would it be useful at all? the *goal* makes sense, but the implementation would be ... problematic. you have to remember the DisMax parser's whole way of working is to make each "chunk" of input match against any qf field, and find the highest scoring field for each chunk, with this input... q = some phrase & qf = a b c ...you get... ( (a:some | b:some | c:some) (a:phrase | b:phrase | c:phrase) ) ...even if dismax could tell that "c" was a field that should only support exact matches, how would it fit c:"some phrase" into that structure? I've already kinda forgotten how this thread started ... but would it make sense to just use your "exact" fields in the pf, and have inexact versions of them in the qf? then docs that match your input exactly should score at the top, but less exact matches will also still match. -Hoss
Re: Calculated Unique Key Field
It may not be all that relevant but our Update handler extends from DirectUpdateHandler2. -- View this message in context: http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19748032.html Sent from the Solr - User mailing list archive at Nabble.com.
Discarding undefined fields in query
Hi All, I wrote a customized query parser which discards non-schema fields from the query (I'm using the schema field names from req.getSchema().getFields().keySet() ) . This parser works fine in unit tests. But still I have an error from the webapp when I try to query my schema with non existing fields in my query ( like foo:bar ). I'm wondering if the query q is parsed in a very simple way somewhere else (and independently from any customized QParserPlugin) and checked against the schema. Is there an option to modify this behaviour so undefined fields in a query could be simply discarded instead of throwing an error ? Cheers ! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]
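The discarding itself can be sketched with plain string handling; this is a simplified illustration only (real Lucene query syntax - phrases, ranges, escaping - needs a proper parser), and all names here are hypothetical, not part of any Solr API:

```java
import java.util.Set;

public class FieldFilter {
    // Drop "field:term" clauses whose field name is not in the schema,
    // keeping bare terms and clauses on known fields untouched.
    static String discardUnknownFields(String q, Set<String> schemaFields) {
        StringBuilder out = new StringBuilder();
        for (String clause : q.trim().split("\\s+")) {
            int colon = clause.indexOf(':');
            if (colon > 0 && !schemaFields.contains(clause.substring(0, colon))) {
                continue; // undefined field like foo:bar - discard the clause
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(clause);
        }
        return out.toString();
    }
}
```

In a custom QParserPlugin this rewriting would happen before handing the query string to the underlying parser, which is presumably why it works in unit tests but not when some other component validates the raw q against the schema first.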
Calculated Unique Key Field
My unique key field is an MD5 hash of several other fields that represent identity of documents in my index. We've been calculating this externally and setting the key value in documents, but have found recurring bugs as the number and variety of inserting consumers has grown... So I wanted to move to calculating these at "add" time. We already have our own UpdateHandler, extending from DirectUpdateHandler, so I extended its addDoc method to do the hashing and field setting. Here are the implementation highlights:

String postGuid = ... // set the value - overwrite if already present
{
    SolrInputField postGuidField = doc.getField(POST_GUID_NAME);
    if (postGuidField != null) {
        postGuidField.setValue(postGuid, DEFAULT_BOOST);
    } else {
        doc.addField(POST_GUID_NAME, postGuid);
    }
}
{
    // add guid field to the lucene doc too - huh.
    Document lucDoc = cmd.getLuceneDocument(schema);
    Field aiPostGuidField = lucDoc.getField(POST_GUID_NAME);
    if (aiPostGuidField != null) {
        aiPostGuidField.setValue(postGuid);
    } else {
        SchemaField aiPostGuidSchemaField = schema.getField(POST_GUID_NAME);
        Field postGuidField = aiPostGuidSchemaField.createField(postGuid, DEFAULT_BOOST);
        lucDoc.add(postGuidField);
    }
}

Question1: Is this the best place to do this? Question2: Is there a way around adding it to both the SolrDocument and the Lucene Document? Thoughts? Best regards, Jim -- View this message in context: http://www.nabble.com/Calculated-Unique-Key-Field-tp19747955p19747955.html Sent from the Solr - User mailing list archive at Nabble.com.
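For the hashing step itself, a self-contained sketch with java.security.MessageDigest (the method name is illustrative; note the separator byte, which keeps identity fields like ("ab", "c") and ("a", "bc") from hashing to the same key):

```java
import java.security.MessageDigest;

public class KeyHasher {
    // Build a deterministic 32-char hex key from the identity fields.
    // The caller must pass the fields in a fixed order.
    static String md5Key(String... identityFields) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String f : identityFields) {
                md.update(f.getBytes("UTF-8"));
                md.update((byte) 0); // field separator, avoids ambiguity
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e); // MD5 + UTF-8 are always available
        }
    }
}
```

Wherever the computation ends up living (a custom UpdateHandler as above, or an UpdateRequestProcessor as suggested in the reply), keeping it server-side like this removes the class of bugs where each inserting consumer hashes slightly differently.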
Re: French synonyms & Online synonyms
True, synonyms can be grouped in cliques based on the strength of their "resemblance" given a specific context. But what I'm indexing is the text content of TV programs produced by a public television, so the context is very large and non-specific. What I want is to find "automobile" for "car", "motorcycle" for "bike", "pub" for "restaurant", "woman" for "lady", and the like. There actually are free on-line resources for most European languages (English included, of course), check these out: http://dico.isc.cnrs.fr/dico_html/en/index.html http://www.crisco.unicaen.fr/alexandria2.html Would you mind commenting on the following plan for a special synonym analyzer? 1/ We would start with an empty synonyms file. 2/ For each indexing request, the analyzer looks up the file for synonyms. If it finds synonyms, it proceeds normally. 3/ Otherwise, it checks an online resource for synonyms, updates the synonyms file, and proceeds. If you think this is workable, there are two problems left: which terms to look up for online synonyms, and how to select the "synonymity" clique. For the first issue, I would definitely only search for synonyms of nouns, verbs and adjectives, so some stemming is required initially. For the second issue, I'd have a cut-off value for the strength of "resemblance", if this information is available, and/or use the frequency of the synonyms in the Solr index as a measure. Building the synonyms file that way would make the system quicker over time, and for a specific domain (chemistry, biology, sports, etc.) the process would be auto-adaptive - perhaps with some human help from time to time. Thanks, Pierre Walter Underwood wrote: Synonyms are domain-specific, so general-purpose lists are not very useful. Ultraseek shipped a British-American synonym list as an example, but even that wasn't very general. 
One of our customers was a chemical company and was very surprised when the search "rocket fuel" suggested "arugula", even though "rocket" is a perfectly good synonym for "arugula". wunder On 9/30/08 10:14 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: Pierre, 1) I don't know, but a good place to check and see what previous answers to this questions were is markmail.org 2) I don't think there is such a thing, but I also don't think there are sites that make this data freely available (answer to 1?) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Pierre Auslaender <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, September 30, 2008 11:28:40 AM Subject: French synonyms & Online synonyms Hello, I'm sure these questions have been raised a million times, I'll try one more: 1/ Is there any general-purpose, free, French synonyms file out there? 2/ Is there a Solr or Lucene analyser class that could tap an on-line resource for synoynms at index-time? And by the same token, maintain and complete a synoynms text file? Thanks for the great work on SOLR and for the liveliness of this list. Pierre
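Steps 2 and 3 of the plan above amount to a caching lookup, which can be sketched outside of any Lucene/Solr analyzer API; fetchOnline is a hypothetical stand-in for the online dictionary call, and persisting the cache back to the synonyms file is left out:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachingSynonymSource {
    private final Map<String, List<String>> cache = new HashMap<String, List<String>>();

    // Hypothetical hook for the online resource; a real implementation
    // would query the dictionary service and rank by resemblance strength.
    protected List<String> fetchOnline(String term) {
        return Collections.emptyList();
    }

    // Serve from the local file/cache if present (step 2), otherwise
    // consult the online resource once and remember the answer (step 3).
    public List<String> synonymsFor(String term) {
        List<String> cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        List<String> fetched = fetchOnline(term);
        cache.put(term, fetched);
        return fetched;
    }
}
```

One design caveat: doing a blocking network call inside analysis ties indexing throughput to the dictionary service, so pre-fetching synonyms in a batch step before indexing may be safer than looking them up per token.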
Re: commit not fired
: When I check my commit.log nothings is runned commit.log is only updated by the bin/commit script ... not by Solr itself. you'll see Solr log commits in whatever logs are kept by your servlet container. : My snapshooter too: but no log in snapshooter.log : : : ./data/solr/book/logs/snapshooter I believe Shalin or Bill already commented on this in another thread ... those paths really don't look right. -Hoss
Re: Indexing Large Files with Large DataImport: Problems
Exception indicates a threading bug, not a scaling issue... I'm sure the issue will be illuminated on soon though. KyleMorrison wrote: I apologize for spamming this mailing list with my problems, but I'm at my wits end. I'll get right to the point. I have an xml file which is ~1GB which I wish to index. If that is successful, I will move to a larger file of closer to 20GB. However, when I run my data-config(let's call it dc.xml) over it, the import only manages to get about 27 rows, out of roughly 200K. The exact same data-config(dc.xml) works perfectly on smaller data files of the same type. This data-config is quite large, maybe 250 fields. When I run a smaller data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works perfectly. The only conclusion I can draw from this is that the data-config method just doesn't scale well. When the dc.xml fails, the server logs spit out: Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95 Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77 Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) This mass of exceptions DOES NOT occur when I perform the same full-import with sdc.xml. As far as I can tell, the only difference between the two files is the amount of fields they contain. Any guidance or information would be greatly appreciated. Kyle PS The schema.xml in use specifies almost all fields as multivalued, and has a copyfield for almost every field. 
I can fix this if it is causing my problem, but I would prefer not to.
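As the reply above notes, the stack trace points at a threading bug: the field-value list is being structurally modified while an iterator is walking it, which is exactly what AbstractList's fail-fast iterator guards against. A minimal, Solr-independent reproduction of that failure mode:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Returns true when modifying the list mid-iteration throws, which is
    // what AbstractList$Itr.checkForComodification (from the trace) detects.
    static boolean triggers() {
        List<String> fieldValues = new ArrayList<String>();
        fieldValues.add("a");
        fieldValues.add("b");
        try {
            for (String v : fieldValues) {
                fieldValues.add(v + "-copy"); // structural modification while iterating
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(triggers()); // prints "true"
    }
}
```

This is why the problem appears only with the large config: more fields mean more concurrent work per document, making the unsynchronized access far more likely to collide.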
Indexing Large Files with Large DataImport: Problems
I apologize for spamming this mailing list with my problems, but I'm at my wits end. I'll get right to the point. I have an xml file which is ~1GB which I wish to index. If that is successful, I will move to a larger file of closer to 20GB. However, when I run my data-config(let's call it dc.xml) over it, the import only manages to get about 27 rows, out of roughly 200K. The exact same data-config(dc.xml) works perfectly on smaller data files of the same type. This data-config is quite large, maybe 250 fields. When I run a smaller data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works perfectly. The only conclusion I can draw from this is that the data-config method just doesn't scale well. When the dc.xml fails, the server logs spit out: Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95 Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77 Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402) at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) This mass of exceptions DOES NOT occur when I perform the same full-import with sdc.xml. As far as I can tell, the only difference between the two files is the amount of fields they contain. Any guidance or information would be greatly appreciated. Kyle PS The schema.xml in use specifies almost all fields as multivalued, and has a copyfield for almost every field. I can fix this if it is causing my problem, but I would prefer not to. 
-- View this message in context: http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: French synonyms & Online synonyms
Synonyms are domain-specific, so general-purpose lists are not very useful. Ultraseek shipped a British-American synonym list as an example, but even that wasn't very general. One of our customers was a chemical company and was very surprised when the search "rocket fuel" suggested "arugula", even though "rocket" is a perfectly good synonym for "arugula". wunder On 9/30/08 10:14 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > Pierre, > > 1) I don't know, but a good place to check and see what previous answers to > this questions were is markmail.org > 2) I don't think there is such a thing, but I also don't think there are sites > that make this data freely available (answer to 1?) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Pierre Auslaender <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Tuesday, September 30, 2008 11:28:40 AM >> Subject: French synonyms & Online synonyms >> >> Hello, >> >> I'm sure these questions have been raised a million times, I'll try one >> more: >> >> 1/ Is there any general-purpose, free, French synonyms file out there? >> >> 2/ Is there a Solr or Lucene analyser class that could tap an on-line >> resource for synoynms at index-time? And by the same token, maintain and >> complete a synoynms text file? >> >> Thanks for the great work on SOLR and for the liveliness of this list. >> >> Pierre >
Re: French synonyms & Online synonyms
Pierre,

1) I don't know, but a good place to check previous answers to this question is markmail.org
2) I don't think there is such a thing, but I also don't think there are sites that make this data freely available (answer to 1?)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Pierre Auslaender <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 11:28:40 AM
> Subject: French synonyms & Online synonyms
>
> Hello,
>
> I'm sure these questions have been raised a million times; I'll try one
> more:
>
> 1/ Is there any general-purpose, free, French synonyms file out there?
>
> 2/ Is there a Solr or Lucene analyser class that could tap an on-line
> resource for synonyms at index time? And, by the same token, maintain and
> complete a synonyms text file?
>
> Thanks for the great work on SOLR and for the liveliness of this list.
>
> Pierre
French synonyms & Online synonyms
Hello,

I'm sure these questions have been raised a million times; I'll try one more:

1/ Is there any general-purpose, free, French synonyms file out there?

2/ Is there a Solr or Lucene analyser class that could tap an on-line resource for synonyms at index time? And, by the same token, maintain and complete a synonyms text file?

Thanks for the great work on SOLR and for the liveliness of this list.

Pierre
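For readers landing on this thread: whatever synonym list you end up with, Solr consumes it as a flat text file wired into an analyzer chain via SynonymFilterFactory. A minimal sketch of the field-type side (the field-type name and file name here are illustrative):

```xml
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt lives in the conf/ directory -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

Entries in the file are either comma-separated equivalence sets ("voiture, automobile, auto") or one-way mappings ("télé => télévision"), so a French list from any source only needs to be converted to that format.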
Re: Howto concatenate tokens at index time (without spaces)
I haven't used the German analyzer (either Snowball or the one we have in Lucene's contrib), but have you checked whether it does the trick of keeping words together? Or maybe the compound tokenizer has this option? (Check Lucene JIRA -- not sure now where the compound tokenizer went.)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Batzenmann <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 7:28:53 AM
> Subject: Howto concatenate tokens at index time (without spaces)
>
> Hi,
>
> I'm looking for a way to create a fieldtype which, apart from the
> whitespace-tokenized tokens, also stores concatenated versions of the tokens.
>
> The ShingleFilter does something very similar but keeps spaces in between
> words. In German, a shoe (Schuh) you wear in your 'spare time' (Freizeit)
> is actually a "Freizeitschuh" and not a "Freizeit Schuh".
> The WordDelimiterFilterFactory could be incorporated for this as well if
> the space character could be configured as a delimiter character -- but
> these can't be configured at all, or am I wrong?
>
> Synonyms are, in my opinion, not the solution for this, as it is absolutely
> not necessary to persist any data for this requirement.
>
> cheers, Axel
> --
> View this message in context:
> http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19740271.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing Multiple Fields with the Same Name
That was indeed the error, I apologize for wasting your time. Thank you very much for the help. Kyle Shalin Shekhar Mangar wrote: > > Is that a mis-spelling? > > mulitValued="true" > > On Thu, Sep 25, 2008 at 12:12 AM, KyleMorrison <[EMAIL PROTECTED]> wrote: > >> >> I'm trying to index fields as such: >>6100966 >>375010 >>2338917 >>1943701 >>1357528 >>3301821 >>2450046 >>8940112 >>6251457 >>293 >>6262769 >>2693214 >>2839489 >>6283093 >>2666401 >>6343085 >>1721838 >>6377309 >>3882429 >>6302075 >> >> And in the xml schema we see >> > stored="false" >> mulitValued="true"/> >> >> However, when I search for entries in PMID, the only one that ever gets >> stored is the last one in the list. For instance, q=PMID:6302075 returns >> a >> document, whereas q=PMID:3882429 does not. Shouldn't the data import >> handler >> take care of this, or am I misunderstanding the function of >> mulitValued="true"? >> >> Kyle >> >> -- >> View this message in context: >> http://www.nabble.com/Indexing-Multiple-Fields-with-the-Same-Name-tp19655285p19655285.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Indexing-Multiple-Fields-with-the-Same-Name-tp19655285p19743517.html Sent from the Solr - User mailing list archive at Nabble.com.
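For anyone skimming the archive with the same symptom (only the last value of a repeated field searchable): the fix is purely the attribute spelling, which is case-sensitive. Presumably the misspelled attribute was ignored, so the field fell back to single-valued. A corrected schema.xml field definition might look like this (the type and other attribute values are illustrative, carried over from the snippet above):

```xml
<field name="PMID" type="string" indexed="true" stored="false"
       multiValued="true"/>
```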
Re: spellcheck: buildOnOptimize?
On Fri, Sep 26, 2008 at 9:33 AM, Shalin Shekhar Mangar < [EMAIL PROTECTED]> wrote: > Jason, can you please open a jira issue to add this feature? > Done. https://issues.apache.org/jira/browse/SOLR-795 Jason
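For readers following the thread later: the feature requested in SOLR-795 is a per-spellchecker buildOnOptimize flag, analogous to the existing buildOnCommit. A sketch of where such a setting would sit in solrconfig.xml (field and directory names here are illustrative):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild the spellcheck index whenever the main index is optimized -->
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```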
spellcheck: substitutions, but no inserts or deletes
I've been testing the SpellCheckComponent for use on StyleFeeder. It seems to do a great job of suggesting character substitutions, but I haven't seen any deletion/insertion suggestions. I've tried decreasing the "accuracy" parameter to 0.5. Some queries I've tried are:

bluea: suggests "blues" (should be "blue")
yello: no suggestions (should be "yellow")
candyz: suggests "candyâ" (should be "candy")
chane: no suggestions (should be "chanel")

It looks to me like it is only willing to make character substitutions and is unwilling to insert/delete characters. Does anyone know why it might be behaving this way? I'm certain that the "should be" words appear fairly frequently in the field I used for spellcheck indexing. And I reindexed the documents after setting up the spellchecker.

Not sure if this would help to debug, but I noticed that words appear with different frequency in the spellcheck index file (.cfs in the spellcheck dir). I.e. here's what I get for a few variants on "blue":

[EMAIL PROTECTED] spellchecker]$ strings _2y.cfs | grep ^blue$ | wc
46 46 230
[EMAIL PROTECTED] spellchecker]$ strings _2y.cfs | grep ^bluea$ | wc
0 0 0
[EMAIL PROTECTED] spellchecker]$ strings _2y.cfs | grep ^blues$ | wc
3 3 18

All the "should be" words appear 10+ times. The misspellings appear 0 or 1 times. Any help is appreciated.

Thanks,
Jason
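On the substitutions-vs-insertions point: the edit distance usually used for spellchecking (Levenshtein) charges insertions, deletions and substitutions the same cost of 1, so "bluea" is exactly as close to "blue" as it is to "blues". The textbook implementation below is not Lucene's code, just a way to sanity-check the examples above:

```java
public class Levenshtein {
    // Classic dynamic-programming edit distance: substitutions,
    // insertions, and deletions all cost 1.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,         // deletion
                        d[i][j - 1] + 1),        // insertion
                        d[i - 1][j - 1] + cost); // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "bluea" -> "blue" is one deletion; "bluea" -> "blues" is one
        // substitution; "yello" -> "yellow" is one insertion.
        System.out.println(distance("bluea", "blue"));   // 1
        System.out.println(distance("bluea", "blues"));  // 1
        System.out.println(distance("yello", "yellow")); // 1
    }
}
```

Since distance alone can't explain the behavior, the next suspect would be the candidate-lookup step (the n-gram query against the spellcheck index) rather than the distance scoring itself.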
Howto concatenate tokens at index time (without spaces)
Hi,

I'm looking for a way to create a fieldtype which, apart from the whitespace-tokenized tokens, also stores concatenated versions of the tokens.

The ShingleFilter does something very similar but keeps spaces in between words. In German, a shoe (Schuh) you wear in your 'spare time' (Freizeit) is actually a "Freizeitschuh" and not a "Freizeit Schuh". The WordDelimiterFilterFactory could be incorporated for this as well if the space character could be configured as a delimiter character -- but these can't be configured at all, or am I wrong?

Synonyms are, in my opinion, not the solution for this, as it is absolutely not necessary to persist any data for this requirement.

cheers, Axel
--
View this message in context: http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19740271.html
Sent from the Solr - User mailing list archive at Nabble.com.
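One way to frame the requirement: it's ShingleFilter's adjacent-pair logic with an empty separator instead of a space. A plain-Java sketch of that logic over a token list (a real solution would wrap this in a custom Lucene TokenFilter; the class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatShingles {
    // Emit each original token plus every adjacent pair joined
    // WITHOUT a separator, so "Freizeit" + "Schuh" also indexes
    // as the compound "FreizeitSchuh".
    public static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<String>(tokens);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("Freizeit", "Schuh")));
        // [Freizeit, Schuh, FreizeitSchuh]
    }
}
```

Note the raw concatenation yields "FreizeitSchuh", not "Freizeitschuh"; with a LowerCaseFilterFactory later in the chain on both the index and query sides, the two would still match.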
Re: Running Solr1.3 with multicore support
Hi Saurabh Bhutyani,

Does your Solr home page show the two core links, like:

Admin core0
Admin core1

If not, the problem is likely left over from upgrading Solr 1.2 to 1.3: stop the server, delete all the folders under %Tomcat_Home%\work\Catalina\localhost, and restart it. Hope it will work.

Regards
Prabhu.K
--
View this message in context: http://www.nabble.com/Running-Solr1.3-with-multicore-support-tp19722268p19739928.html
Sent from the Solr - User mailing list archive at Nabble.com.
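For reference, Solr 1.3 multicore is driven by a solr.xml file in the Solr home directory; if the admin links mentioned above are missing, it is also worth checking that the file looks roughly like this (core names and instance directories here are illustrative):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```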
Re: Question about facet.prefix usage
If I'm not mistaken, doesn't facet.query accomplish what you want?

Erik

On Sep 29, 2008, at 5:43 PM, Simon Hu wrote:

I also need the exact same feature. I was not able to find an easy solution and ended up modifying class SimpleFacets to make it accept an array of facet prefixes per field. If you are interested, I can email you the modified SimpleFacets.java.

-Simon

steve berry-2 wrote:

Question: Is it possible to pass complex queries to facet.prefix? For example, instead of facet.prefix:foo I want facet.prefix:foo OR facet.prefix:bar.

My application is for browsing business records that fall into categories. The user is only allowed to see businesses falling into categories which they have access to. I have a series of documents dumped into the following basic structure which I was hoping would help me deal with this:

123 Business Corp. 28255-0001 . charlotte_2006 Banks charlotte_2007 Banks sanfrancisco_2006 Banks sanfrancisco_2007 Banks ... (lots more market_category entries) ...
124 Factory Corp. 28205-0001 . charlotte_2006 Banks charlotte_2007 Banks austin_2006 Banks austin_2007 Banks ... (lots more market_category entries) ...

The multivalued market_category fields are flattened relational data attributed to that business, and I want to use those values for faceted navigation /but/ I want the facets to be restricted depending on what products the user has access to. For example, a user may have access to sanfrancisco_2007 and sanfrancisco_2006 data but nothing else. So I've created a request using facet.prefix that looks something like this:

http://SOLRSERVER:8080/solr/select?q.op=AND&q=docType:gen&facet.field=market_category&facet.prefix=charlotte_2007

This ends up producing perfectly suitable facet results that look like this:

.. 1 1 1 1 1 1 0 .

Bingo! facet.prefix does exactly what I want it to.
Now I want to go a step further and pass a compound statement to the facet.prefix along the lines of "facet.prefix:charlotte_2007 OR sanfrancisco_2007" or "facet.prefix:charlotte_2007 OR charlotte_2006" to return more complex facet sets. As far as I can tell looking at the docs this won't work. Is this possible using the existing facet.prefix functionality? Anyone have a better idea of how I should accomplish this? Thanks, steve berry American City Business Journals -- View this message in context: http://www.nabble.com/Question-about-facet.prefix-usage-tp15836501p19732310.html Sent from the Solr - User mailing list archive at Nabble.com.
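To spell out Erik's facet.query suggestion from the reply above: facet.query accepts an arbitrary query and may be repeated, so the OR-of-prefixes case can become one prefix query per product the user is allowed to see -- something like this (host, field and values carried over from the example above; untested):

```
http://SOLRSERVER:8080/solr/select?q.op=AND&q=docType:gen
    &facet=true
    &facet.query=market_category:charlotte_2007*
    &facet.query=market_category:sanfrancisco_2007*
```

Each facet.query comes back with its own count under facet_queries in the response, which is a total per product rather than the per-term breakdown facet.field + facet.prefix gives, so whether it fits depends on whether the per-term counts are needed.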
Re: Extending Solr with custom filter
Message written on 2008-09-12 at 17:58 by Andrzej Bialecki:

> I recommend using Stempelator (or Morfologik) for Polish stemming and
> lemmatization. It provides a superset of Stempel features, namely in
> addition to the algorithmic stemming it provides a dictionary-based
> stemming, and these two methods nicely complement each other.

I'm not familiar enough with Java to do anything more complicated than write some wrapping factory class. Stempel seems to have such classes to wrap, but I did not find any Lucene analyzer that uses Morfologik. Or am I completely wrong, and should it be plugged into Solr in a completely different way?

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]