Re: How to configure Solr in Glassfish ?
Yes — I don't know how to set solr.home in GlassFish on CentOS. I tried to configure solr.home, but the error log is: looking for solr.xml: /var/deploy/solr/solr.xml

markrmiller wrote: What have you tried? Deploying the Solr war should be pretty straightforward. The main issue is likely setting solr.home, and you have a lot of options there: set a system property in the startup script, set a system property in the webapp context xml (if you can locate it), or, I think, GlassFish offers a GUI to set such things. There really shouldn't be much more to it than that, but you should try and see what you run into. I haven't tried GlassFish in a couple of years now. -- - Mark http://www.lucidimagination.com

On Mon, Jul 20, 2009 at 8:27 AM, huenzhao huenz...@126.com wrote: I want to use GlassFish as the Solr search server, but I don't know how to configure it. Anybody know? enzhao...@gmail.com Thanks!

-- View this message in context: http://www.nabble.com/How-to-configure-Solr--in-Glassfish---tp24565758p24582232.html Sent from the Solr - User mailing list archive at Nabble.com.
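For the startup-script / system-property route Mark mentions, one sketch of the GlassFish asadmin approach (assuming the /var/deploy/solr path from the error log; the domain name and path may differ on your installation — the Solr war reads the solr.solr.home property):

```shell
# Hedged sketch: register solr.solr.home as a JVM option for the
# GlassFish domain, then restart so the deployed Solr war picks it up.
asadmin create-jvm-options "-Dsolr.solr.home=/var/deploy/solr"
asadmin stop-domain domain1
asadmin start-domain domain1
```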
solr indexing on same set of records with different value of unique field...not working...
hi, I need to index around 10 million records with Solr. I have nearly 2 lakh records, so I made a program that loops over them until 10 million are loaded. I specified 20 fields in schema.xml, and the unique field I set was a currentTimeStamp field. So, when I run the loader program (which loads XML data into Solr), it creates a currentTimeStamp value and loads it into Solr. I stopped the loader program after 100 records were indexed into Solr, then ran it again for the SAME 100 records, but Solr reports 100 documents rather than 200 — even though I set the currentTimeStamp field as the uniqueKey. I expected the result to be 200 after running the same 100 records again. Any suggestions please? regards, Noor
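For what it's worth, Solr treats the uniqueKey as an overwrite key: re-adding a document whose key value already exists replaces the old document instead of adding a second one. So if the loader ends up reusing the same key values on the second run, 100 is the expected count. A minimal sketch of that behavior (plain Python, with hypothetical record ids standing in for documents):

```python
# Sketch of Solr's uniqueKey overwrite semantics, using a dict as a
# stand-in for the index; doc ids and fields here are hypothetical.

def index(store, docs):
    """Add docs; a doc whose uniqueKey already exists replaces the old copy."""
    for doc in docs:
        store[doc["id"]] = doc  # same key -> overwrite, not append
    return store

index_store = {}
batch = [{"id": "rec-%d" % n, "body": "..."} for n in range(100)]

index(index_store, batch)  # first load: 100 docs
index(index_store, batch)  # same 100 keys again
print(len(index_store))    # -> 100, not 200
```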
Linguistic variation support
Hi, I am implementing linguistic variations in the Solr search engine. I want to implement this for US/UK/CA/AU English, e.g. Color (US) = Colour (UK): when a user searches for either word, both sets of results should appear. I don't want to use synonyms.txt, as this would make synonyms.txt very long. Please let me know how we can do this. Thanks, Prerna -- View this message in context: http://www.nabble.com/Linguistic-variation-support-tp24583581p24583581.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lemmatisation support in Solr
I think that to get the best results you need some kind of natural language processing. I'm trying to do this using UIMA, but I need to integrate it with Solr, as I explain in this post: http://www.nabble.com/Solr-and-UIMA-tc24567504.html prerna07 wrote: Hi, I am implementing lemmatisation in Solr, meaning that if a user looks for Mouse then it should display results for both Mouse and Mice. I understand that this is a kind of context search. I thought of using synonyms for this, but then synonyms.txt would hold very many entries and would keep growing. Please suggest how I can implement it some other way. Thanks, Prerna -- View this message in context: http://www.nabble.com/Lemmatisation-support-in-Solr-tp24583655p24583841.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: method inform of SolrCoreAware called 2 times
I am on a nightly build from mid-June. Noble Paul നോബിള് नोब्ळ्-2 wrote: it is not normal to get inform() called twice for a single object. which version of solr are you using? On Mon, Jul 20, 2009 at 7:17 PM, Marc Sturlesemarc.sturl...@gmail.com wrote: Hey there, I have implemented a custom component which extends SearchComponent and implements SolrCoreAware. I have declared it in solrconfig.xml as: <searchComponent name="mycomp" class="solr.MyCustomComponent"/> And added it to my SearchHandler as: <arr name="last-components"><str>mycomp</str></arr> I am using multicore with two cores. I have noticed (doing some logging) that the inform method (the one that implements SolrCoreAware) is being called 2 times per core when I start my Solr instance. As I understood it, the SolrCoreAware inform method should be called just once per core — am I right, or is it normal that it is called 2 times per core? -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- View this message in context: http://www.nabble.com/method-inform-of-SolrCoreAware-callled-2-times-tp24570221p24584667.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: index version on slave
on the slave this command would not work well. The indexversion is not the actual index version. It is the current replicateable index version. why do you call that API directly? On Tue, Jul 21, 2009 at 12:53 AM, solr jaysolr...@gmail.com wrote: If you ask for the index version of a slave instance, you always get version number being 0. Is it expected behavior? I am using this url http://slave_host:8983/solr/replication?command=indexversion This request returns correct version on master. If you use the 'details' command, you get the right version number (and generation number, and it gives more than what you want). Thanks, -- J -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Highlight arbitrary text
On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen m...@cup.kalibalik.dk wrote: On Thu, 16 Jul 2009 10:56:38 -0400, Erik Hatcher e...@ehatchersolutions.com wrote: One trick worth noting is the FieldAnalysisRequestHandler can provide offsets from external text, which could be used for client-side highlighting (see the showmatch parameter too). Thanks. I tried doing this, and it almost works. However, in the normal highlighter, I am using usePhraseHighlighter and highlightMultiTerm and it seems that there is no way to turn these on in FieldAnalysisRequestHandler ? In case these options are not available with the FieldAnalysisRequestHandler, would it be simple to implement them with a plugin? The highlightMultiTerm is absolutely needed, as we use a lot of prefix searches. Thanks, Anders.
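For reference, a FieldAnalysisRequestHandler request along the lines Erik describes looks roughly like this (a sketch; the handler path and parameter names assume the stock Solr 1.4 example solrconfig.xml, where the handler is registered at /analysis/field, and the field value is illustrative):

```text
http://localhost:8983/solr/analysis/field
    ?analysis.fieldname=text
    &analysis.fieldvalue=San+Francisco+hotels
    &analysis.query=francisco
    &analysis.showmatch=true
```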
Re: Solr and UIMA
On Jul 20, 2009, at 6:43 AM, JCodina wrote: D: Break things down. The CAS would only produce XML that Solr can process. Then different Tokenizers can be used to deal with the data in the CAS. The main point is that the XML has the doc and field labels of Solr. I just committed the DelimitedPayloadTokenFilterFactory; I suspect this is along the lines of what you are thinking, but I haven't done all that much with UIMA. I also suspect the Tee/Sink capabilities of Lucene could be helpful, but they aren't available in Solr yet. E: The set of capabilities to process the XML is defined in XML, similarly to Lucas, to define the output, and in the Solr schema to define how this is processed. I want to use it in order to index something that is common but I can't get any tool to do with Solr: indexing a word and coding, at the same position, the syntactic and semantic information. I know that in Lucene this is evolving and it will be possible to include metadata, but for the moment... What does Lucas do with Lucene? Is it putting multiple tokens at the same position or using Payloads? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: to index Ms-outlook(.Pst) files to solr tika
http://wiki.apache.org/solr/ExtractingRequestHandler contains several examples of posting files to Solr for Tika. FYI, I don't know if PST files are supported by Tika. -Grant On Jul 21, 2009, at 4:38 AM, Brindha wrote: Hi, how do I index MS Outlook (.pst) files into Solr with Tika? I have posted a .pst file directly to Solr; the file gets posted, but with empty content. -- View this message in context: http://www.nabble.com/to-index-Ms-outlook%28.Pst%29-files-to-solr-tika-tp24583846p24583846.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Lemmatisation support in Solr
Sounds like you need a TokenFilter that does lemmatisation. I don't know of any open ones off hand, but I haven't looked all that hard. On Jul 21, 2009, at 4:25 AM, prerna07 wrote: Hi, I am implementing Lemmatisation in Solr, which means if user looks for Mouse then it should display results of Mouse and Mice both. I understand that this is something context search. I think of using synonym for this but then synonyms.txt will be having so many records and this will keep on adding. Please suggest how I can implement it in some other way. Thanks, Prerna -- View this message in context: http://www.nabble.com/Lemmatisation-support-in-Solr-tp24583655p24583655.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: index version on slave
Oh — in case the index data is corrupted on the slave, I want to download the entire index from the master. During the download I want the slave to be out of service, and to put it back once the download has finished. I was trying to figure out how to determine when the download is done. Right now I am calling http://slave_host:8983/solr/replication?command=details and comparing the index version on the slave to the one on the master, putting the instance back in service when the two are the same. It works fine, except that the response claims that its structure may change. Is this the right way to do it? Thanks, 2009/7/21 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: on the slave this command would not work well. The indexversion is not the actual index version. It is the current replicateable index version. why do you call that API directly? On Tue, Jul 21, 2009 at 12:53 AM, solr jaysolr...@gmail.com wrote: If you ask for the index version of a slave instance, you always get version number 0. Is that expected behavior? I am using this url: http://slave_host:8983/solr/replication?command=indexversion This request returns the correct version on the master. If you use the 'details' command, you get the right version number (and generation number — and it gives more than what you want). Thanks, -- J -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- J
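The compare-and-reenable loop described above can be sketched like this (hedged: the dict layout mimics a parsed 'details' response whose structure, as noted, isn't guaranteed to stay stable, and fetch_details would be a real HTTP GET of /solr/replication?command=details in practice):

```python
# Sketch: decide when a slave that re-downloaded its index can go back
# into service, by comparing index versions from master and slave
# 'details' responses. The response layout is an assumption (Solr
# 1.4-era), and the canned dicts below stand in for real HTTP calls.

def index_version(details_response):
    """Extract indexVersion from a parsed 'details' response."""
    return details_response["details"]["indexVersion"]

def slave_in_sync(master_details, slave_details):
    return index_version(master_details) == index_version(slave_details)

# Canned responses standing in for real fetch results:
master = {"details": {"indexVersion": 1248213748000}}
slave = {"details": {"indexVersion": 1248213748000}}
print(slave_in_sync(master, slave))  # -> True
```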
Re: Solr and UIMA
Hello Grant, there are two ways to implement this: one is payloads, and the other is multiple tokens at the same position. Each of them can be useful; let me explain the way I think they can be used. Payloads: every token has extra information that can be used in processing. For example, if I can add part-of-speech tags, then I can develop tokenizers that take the POS into account (e.g. I can generate bigrams of Noun Adjective, or Noun prep Noun, or I can have a better stopwords algorithm). Multiple tokens in one position: if I can have different tokens at the same place, I can have different pieces of information, like: was #verb _be — so I can do a search for "you _be #adjective" to find all the sentences that talk about you, for example "you were clever", "you are tall", etc. I have not understood the way the DelimitedPayloadTokenFilterFactory may work in Solr — what is the input format? So I was thinking of generating XML where, for each token, a single string is generated, like was#verb#be, and then a token filter splits each whitespace-separated string by # — in this case into three words — and adds the trailing character that allows searching for the right semantic info, but gives them the same position increment. Of course the full processing chain must be aware of this. But I must think about multiword tokens. Grant Ingersoll-6 wrote: On Jul 20, 2009, at 6:43 AM, JCodina wrote: D: Break things down. The CAS would only produce XML that Solr can process. Then different Tokenizers can be used to deal with the data in the CAS. The main point is that the XML has the doc and field labels of Solr. I just committed the DelimitedPayloadTokenFilterFactory; I suspect this is along the lines of what you are thinking, but I haven't done all that much with UIMA. I also suspect the Tee/Sink capabilities of Lucene could be helpful, but they aren't available in Solr yet. 
E: The set of capabilities to process the XML is defined in XML, similarly to Lucas, to define the output, and in the Solr schema to define how this is processed. I want to use it in order to index something that is common but I can't get any tool to do with Solr: indexing a word and coding, at the same position, the syntactic and semantic information. I know that in Lucene this is evolving and it will be possible to include metadata, but for the moment... What does Lucas do with Lucene? Is it putting multiple tokens at the same position or using Payloads? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://www.nabble.com/Solr-and-UIMA-tp24567504p24590509.html Sent from the Solr - User mailing list archive at Nabble.com.
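On the DelimitedPayloadTokenFilterFactory input-format question raised above: as far as I can tell, its input is plain tokens with the payload appended after a delimiter character (default '|'). A sketch of the analyzer config and matching field text (field type name and the sample tokens are illustrative):

```text
<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="identity"/>
  </analyzer>
</fieldType>

Field text:  was|VERB clever|ADJ
```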
Synonyms.txt and index_synonyms.txt
Does anyone know the differences between these two? From the schema.xml we have:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Do you know if we need both of them for search to be working? Thanks Francis
Re: Synonyms.txt and index_synonyms.txt
Hi Francis, The names of the synonym files are arbitrary, but whatever you call them needs to match what you reference in schema.xml. If you are referring to them, then they should probably exist. If you are referring to them, then they should probably be non-empty. But think this through a bit, because it seems like index-time vs. query-time synonyms are still a bit fuzzy for you. The Wiki has a good page on that. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Francis Yakin fya...@liquid.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, July 21, 2009 1:50:43 PM Subject: Synonyms.txt and index_synonyms.txt Does anyone know the differences between these two? From the schema.xml we have: (config snipped) Do you know if we need both of them for search to be working? Thanks Francis
Storing string field in solr.ExternalFieldFile type
We're in the process of building a log searcher application. In order to reduce the index size to improve the query performance, we're exploring the possibility of having: 1. One field for each log line with 'indexed=true stored=false' that will be used for searching 2. Another field for each log line of type solr.ExternalFileField that will be used only for display purpose. We realized that currently solr.ExternalFileField supports only float type. Is there a way we can override this to support string type? Any issues with this approach? Any ideas are welcome. Thanks, -Jibo
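For reference, the stock ExternalFileField wiring looks roughly like this (a sketch; the field and file names are illustrative). Values are looked up from a text file named external_<fieldname> in the index directory, keyed on the uniqueKey field — which is where the float-only restriction comes from:

```text
schema.xml:
  <fieldType name="fileScore" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="pfloat"/>
  <field name="rank" type="fileScore" indexed="false" stored="false"/>

external_rank (one key=value line per document):
  doc1=1.5
  doc2=0.25
```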
Re: Lemmatisation support in Solr
There are for-money solutions to this. On Tue, Jul 21, 2009 at 10:04 AM, Grant Ingersollgsing...@apache.org wrote: Sounds like you need a TokenFilter that does lemmatisation. I don't know of any open ones off hand, but I haven't looked all that hard. On Jul 21, 2009, at 4:25 AM, prerna07 wrote: Hi, I am implementing Lemmatisation in Solr, which means if user looks for Mouse then it should display results of Mouse and Mice both. I understand that this is something context search. I think of using synonym for this but then synonyms.txt will be having so many records and this will keep on adding. Please suggest how I can implement it in some other way. Thanks, Prerna -- View this message in context: http://www.nabble.com/Lemmatisation-support-in-Solr-tp24583655p24583655.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: All in one index, or multiple indexes?
It will depend on how much total volume you have. If you are discussing millions and millions of records, I'd say use multicore and shards. On Wed, Jul 8, 2009 at 5:25 AM, Tim Sell trs...@gmail.com wrote: Hi, I am wondering if it is common to have just one very large index, or multiple smaller indexes specialized for different content types. We currently have multiple smaller indexes, although one of them is much larger than the others. We are considering merging them, to allow the convenience of searching across multiple types at once and getting the results back in one list. The largest of the current indexes has a couple of types that belong together; it has just one text field, which is usually quite short and is similar to product names (words like "The matter"). Another index I would merge with this one has multiple text fields (also quite short). We of course would still like to be able to get specific types. Is filtering on just one type a big performance hit compared to querying it from its own index? Bear in mind all these indexes run on the same machine (we replicate them all to three machines and do load balancing). There are a number of considerations. From an application standpoint, when querying across all types we may split the results out into the separate types anyway once we have the list back. If we always do this, is it silly to have them in one index rather than querying multiple indexes at once? Are multiple http requests less significant than the time to post-split the results? In some ways it is easier to maintain a single index, although it has felt easier to optimize the results for the type of content if they are in separate indexes. My main concern with putting it all in one index is that we'll make it harder to work with. We will definitely want to filter on types sometimes, and if we go with a mashed-up index I'd prefer not to maintain separate specialized indexes as well. Any thoughts? ~Tim.
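With the single merged index, restricting a search to one content type is normally done with a filter query, which Solr caches independently of the main query, e.g. (assuming a hypothetical type field added to each document):

```text
http://localhost:8983/solr/select?q=the+matter&fq=type:product
```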
solr 1.3.0 and Oracle Fusion Middleware
Trying to install SOLR for a project. Currently we have a 10.1.3 Oracle J2EE install; I believe it satisfies the SOLR requirements. I have the war file deployed and it appears to be half working, but I have errors with the .css file when hitting the admin page. Has anyone else been successful putting SOLR on Oracle's Java containers, and are there any pointers? Any help would be greatly appreciated. Dave
FATAL: Solr returned an error: Invalid_Date_String
Hi, I have the following tag in my xml files: <field name="timestamp">2009-05-06</field> When I try posting the file I get this error: FATAL: Solr returned an error: Invalid_Date_String20090506 My schema.xml file has this: <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> How do I specify a correct date string? -- View this message in context: http://www.nabble.com/FATAL%3A-Solr-returned-an-error%3A-Invalid_Date_String-tp24594686p24594686.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FATAL: Solr returned an error: Invalid_Date_String
Hi, Dates must be in ISO 8601 format: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html e.g. 1995-12-31T23:59:59Z Hope this helps, Andrew McCombe 2009/7/21 Mick England mic...@mac.com: Hi, I have the following tag in my xml files: <field name="timestamp">2009-05-06</field> When I try posting the file I get this error: FATAL: Solr returned an error: Invalid_Date_String20090506 My schema.xml file has this: <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> How do I specify a correct date string? -- View this message in context: http://www.nabble.com/FATAL%3A-Solr-returned-an-error%3A-Invalid_Date_String-tp24594686p24594686.html Sent from the Solr - User mailing list archive at Nabble.com.
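As a quick illustration of the fix: a bare date like 2009-05-06 needs the full time-and-Zulu suffix before Solr's DateField will accept it. A small sketch of producing that form:

```python
from datetime import datetime

# Sketch: format a date into the ISO 8601 / UTC ("Zulu") form that
# Solr's DateField expects; a bare yyyy-MM-dd value is rejected.

def to_solr_date(year, month, day, hour=0, minute=0, second=0):
    return datetime(year, month, day, hour, minute, second).strftime(
        "%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date(2009, 5, 6))                # -> 2009-05-06T00:00:00Z
print(to_solr_date(1995, 12, 31, 23, 59, 59))  # -> 1995-12-31T23:59:59Z
```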
Re: FATAL: Solr returned an error: Invalid_Date_String
Thanks for the quick response. That worked for me. Andrew McCombe wrote: Dates must be in ISO 8601 format: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html e.g 1995-12-31T23:59:59Z -- View this message in context: http://www.nabble.com/FATAL%3A-Solr-returned-an-error%3A-Invalid_Date_String-tp24594686p24595148.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 1.3.0 and Oracle Fusion Middleware
What are the errors you see? On Tue, Jul 21, 2009 at 3:01 PM, Hall, David dh...@vermeer.com wrote: Trying to install SOLR for a project. Currently we have a 10.1.3 Oracle J2EE install. I believe it satisfies the SOLR requirements. I have the war file deployed and it appears to be ½ working, but have errors with the .css file when hitting the admin page. Anyone else been successful putting SOLR on Oracle's Java Containers and are there any pointers??? -Any help would be greatly appreciated. Dave -- -- - Mark http://www.lucidimagination.com
RE: solr 1.3.0 and Oracle Fusion Middleware
Jul 20, 2009 2:45:34 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.StackOverflowError
        at java.util.Properties.getProperty(Properties.java:774)
        at com.evermind.server.ApplicationServerSystemProperties.getProperty(ApplicationServerSystemProperties.java:43)
        at java.lang.System.getProperty(System.java:629)
        at sun.security.action.GetPropertyAction.run(GetPropertyAction.java:66)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.PrintWriter.<init>(PrintWriter.java:77)
        at java.io.PrintWriter.<init>(PrintWriter.java:61)
        at org.apache.solr.common.SolrException.toStr(SolrException.java:160)
        at org.apache.solr.common.SolrException.log(SolrException.java:132)
        at org.apache.solr.common.SolrException.logOnce(SolrException.java:150)
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:319)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:281)
... and the errors continue... Any help appreciated. Dave -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, July 21, 2009 4:55 PM To: solr-user@lucene.apache.org Subject: Re: solr 1.3.0 and Oracle Fusion Middleware What are the errors you see? On Tue, Jul 21, 2009 at 3:01 PM, Hall, David dh...@vermeer.com wrote: Trying to install SOLR for a project. Currently we have a 10.1.3 Oracle J2EE install. I believe it satisfies the SOLR requirements. I have the war file deployed and it appears to be half working, but have errors with the .css file when hitting the admin page. Anyone else been successful putting SOLR on Oracle's Java Containers and are there any pointers? Any help would be greatly appreciated. Dave -- - Mark http://www.lucidimagination.com
Random Slowness
We are experiencing random slowness on certain queries. I have been unable to diagnose what the issue is. We are using Solr 1.4, and 99.99% of queries return in under 250 ms. The remaining queries return in 2-5 seconds for no apparent reason. There does not seem to be any commonality between the queries. The problem also includes admin system queries. Any help or direction would be much appreciated.

Specs: Solr 1.4, Tomcat server, 4 cores, largest core 155,000 documents.

Logs:
INFO: [zeta-main] webapp=null path=null params={command=details} status=0 QTime=1276
INFO: [zeta-main] webapp=null path=null params={command=details} status=0 QTime=1144
INFO: [zeta-main] webapp=null path=null params={command=details} status=0 QTime=1285
INFO: [zeta-main] webapp=/solr path=/select params={facet=true&facet.mincount=1&facet.limit=-1&wt=javabin&rows=0&facet.sort=true&start=0&q=shoes&facet.field=colorFacet&facet.field=brandNameFacet&facet.field=heelHeight&facet.field=attrFacet_Style&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=categoryFacet:Sandals&fq=width:EE&fq=size:10.5&fq=priceFacet:$100.00+and+Under&fq=personalityFacet:Sexy} hits=19 status=0 QTime=3689
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=0&q=shoes&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=size:8&fq=width:D&fq=brandNameFacet:Dansko} hits=8 status=0 QTime=3566
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=0&q=shoes&qt=dismax&fq=gender:Womens&fq=productTypeFacet:Shoes&fq=subCategoryFacet:Heels&fq=categoryFacet:Shoes&fq=size:10} hits=5409 status=0 QTime=3348
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=100&q=shoes&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=personalityFacet:Dress&fq=categoryFacet:Shoes&fq=size:10&fq=heelHeight:Medium+(1+3/8in+-+2+1/2in)} hits=1129 status=0 QTime=3285
INFO: [zeta-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=200&q=shoes&qt=dismax&fq=productTypeFacet:Shoes&fq=gender:Womens&fq=categoryFacet:Shoes&fq=subCategoryFacet:Heels&fq=personalityFacet:Dress&fq=attrFacet_Style:Pump&fq=size:5} hits=644 status=0 QTime=3750
INFO: [6pm-main] webapp=/solr path=/select params={wt=javabin&rows=100&facet.sort=true&start=0&q=shoes&qt=dismax&fq=expandedGender:Kids&fq=productTypeFacet:Shoes&fq=gender:girls&fq=brandNameFacet:UGG+Kids} hits=17 status=0 QTime=3789

-- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562
Re: expand synonyms without tokenizing stream?
: I'd like to take keywords in my documents, and expand them as synonyms; for : example, if the document gets annotated with a keyword of 'sf', I'd like : that to expand to 'San Francisco'. (San Francisco,San Fran,SF is a line in : my synonyms.txt file). : : But I also want to be able to display facets with counts for these keywords; : I'd like them to be suitable for display. ... : I've also done a copyfield to a 'KeywordsString' field, which is : defined as string. i.e. : : <fieldType name="string" class="solr.StrField" sortMissingLast="true" : omitNorms="true"/> It sounds like you are on the right track ... the key is: search on a field with synonyms expanded (at index time) and facet on a field with synonyms collapsed (at index time). Try changing the fieldtype you facet on to be a TextField with the KeywordTokenizer, and then use the SynonymFilter on it ... that should work (but I haven't tried it). If you format your synonyms file properly (commas instead of arrows), you can use the exact same file for both fieldtypes, even though one will expand and the other will collapse. -Hoss
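To illustrate Hoss's comma-vs-arrow point, a synonyms-file sketch (the entries are illustrative): a comma-separated line is symmetric, so the same line expands every listed term to all the others when expand="true" and collapses them to one form when expand="false", while an arrow rule always replaces the left-hand terms with the right-hand side:

```text
# symmetric group: usable for both expanding and collapsing fieldtypes
sf,san fran,san francisco

# explicit one-way mapping: left-hand terms are replaced by the right
sf => san francisco
```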
Re: solr Analyzer help
Any Lucene analyzer that has a no-arg constructor can be used in Solr; just specify it by full class name (there is an example of this in the example schema.xml). Any Tokenizer/TokenFilter that exists in the Lucene distribution also gets a Factory in Solr (unless someone forgets); you can use these Factories if you want to mix/match. : I also see stem filter factories and plain filter factories for some : languages, like DutchStemFilterFactory, BrazilianStemFilterFactory, : GermanStemFilterFactory etc., : and the plain filters like ChineseFilterFactory. : : What does the stem filter factory do — does it stem the words without : including the snowball porter filter factory? They are factories for the corresponding Filters ... you should look at the docs for those Filters to understand what they do (the Factories are just simple, dumb APIs for generating instances of the Filters when configured in the schema.xml). -Hoss
Re: solr 1.3.0 and Oracle Fusion Middleware
Thanks. Check out this thread: http://www.lucidimagination.com/search/document/b15c06f78820d1da/weblogic_10_compatibility_issue_stackoverflowerror and this wikipage: http://wiki.apache.org/solr/SolrWeblogic If it helps, please add to our wiki - if not, we can dig deeper. Thanks, -- - Mark http://www.lucidimagination.com On Tue, Jul 21, 2009 at 6:01 PM, Hall, David dh...@vermeer.com wrote: Jul 20, 2009 2:45:34 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StackOverflowError at java.util.Properties.getProperty(Properties.java:774) at com.evermind.server.ApplicationServerSystemProperties.getProperty(ApplicationServerSystemProperties.java:43) at java.lang.System.getProperty(System.java:629) at sun.security.action.GetPropertyAction.run(GetPropertyAction.java:66) at java.security.AccessController.doPrivileged(Native Method) at java.io.PrintWriter.init(PrintWriter.java:77) at java.io.PrintWriter.init(PrintWriter.java:61) at org.apache.solr.common.SolrException.toStr(SolrException.java:160) at org.apache.solr.common.SolrException.log(SolrException.java:132) at org.apache.solr.common.SolrException.logOnce(SolrException.java:150) at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:319) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:281) ... and the errors continues... Any help appreciated. Dave -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, July 21, 2009 4:55 PM To: solr-user@lucene.apache.org Subject: Re: solr 1.3.0 and Oracle Fusion Middleware What are the errors you see? On Tue, Jul 21, 2009 at 3:01 PM, Hall, David dh...@vermeer.com wrote: Trying to install SOLR for a project. Currently we have a 10.1.3 Oracle J2EE install. I believe it satisfies the SOLR requirements. I have the war file deployed and it appears to be ½ working, but have errors with the .css file when hitting the admin page. 
Anyone else been successful putting SOLR on Oracle's Java Containers and are there any pointers??? -Any help would be greatly appreciated. Dave -- -- - Mark http://www.lucidimagination.com
Re: DutchStemFilterFactory reducing double vowels bug ?
: Some time ago I configured my Solr instance to use the : DutchStemFilterFactory. ... : Words like 'baas', 'paas', 'maan', 'boom' etc. are indexed as 'bas', : 'pas', 'man' and 'bom'. Those words have a meaning of their own. Am I : missing something, or is this to be considered a bug? I know nothing about Dutch, but the DutchStemFilterFactory is just a factory for the DutchStemFilter, which is just a Lucene TokenFilter around the DutchStemmer, which is a Java impl of this algorithm... http://snowball.tartarus.org/algorithms/dutch/stemmer.html ...according to that page, Step #4 explicitly includes a reduction of doubled vowels (maan -> man is an explicit example), so the code seems to be working as specified ... whether it's what you *want* is a different question. -Hoss
Re: Deleting from SolrQueryResponse
: Okay. So still, how would I go about creating a new DocList and DocSet as
: they cannot be instantiated?

DocLists and DocSets are retrieved from the SolrIndexSearcher as results from searches. A simple javadoc search for the usages of the DocList and DocSet APIs would have given you this answer.

-Hoss
Re: Regarding Response Builder
: SolrParams params = req.getParams();
:
: Now I want to get the values of those params. What should be the
: approach as SolrParams is an abstract class and its get(String) method
: is abstract?

Your question seems to be more about Java basics than about using Solr -- it doesn't matter that SolrParams is abstract; any method (including req.getParams()) which says it returns an instance of SolrParams is required to do just that -- return an instance. The SolrParams API contract guarantees that you can call get(String) on any instance.

-Hoss
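To illustrate the point, here is a minimal sketch with made-up class names (Params/MapParams stand in for SolrParams and whatever concrete subclass Solr returns at runtime -- this is not Solr's actual code). The caller never instantiates the abstract type; it just calls the method on whatever concrete instance is handed back:

```java
import java.util.Map;

// Stand-in for SolrParams: abstract, with an abstract get(String).
abstract class Params {
    public abstract String get(String name);
}

// Stand-in for the concrete subclass returned at runtime.
class MapParams extends Params {
    private final Map<String, String> values;
    MapParams(Map<String, String> values) { this.values = values; }
    @Override
    public String get(String name) { return values.get(name); }
}

public class ParamsDemo {
    // Callers only ever see the abstract type, just like req.getParams().
    static String query(Params params) {
        return params.get("q");
    }

    public static void main(String[] args) {
        Params params = new MapParams(Map.of("q", "dog", "rows", "10"));
        System.out.println(query(params)); // prints "dog"
    }
}
```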
Re: Solrj, tomcat and a proxy
: Subject: Solrj, tomcat and a proxy
: References: 2aa3aff80907130547y124d433chec4f4bcbbfb35...@mail.gmail.com
: In-Reply-To: 2aa3aff80907130547y124d433chec4f4bcbbfb35...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/Thread_hijacking

-Hoss
Re: Merge Policy
: SolrIndexConfig accepts a mergePolicy class name, however how does one : inject properties into it? At the moment you can't. If you look at the history of MergePolicy, users have never been encouraged to implement their own (the API actively discourages it, without going so far as to make it impossible). -Hoss
Solr index as multiple separate index directories
I'd like to be able to define, within a single Solr core, a set of indexes in multiple directories. This is really useful for indexing in Hadoop, or for integrating with Katta, where an EmbeddedSolrServer is distributed to the Hadoop cluster and indexes are generated in parallel and returned to Solr slave servers.

It seems like this could be done using a custom IndexReaderFactory that opens a MultiReader over the directories. SolrIndexWriter usage in this context would be limited to incremental updates (if anything). It would be great for Solr docSet caching to operate at the SegmentReader level so that small incremental updates don't cause a massive cache regeneration.

Maybe there's a way to trick Solr into doing this today by using multiple EmbeddedSolrServer instances for each large segment/shard, and executing a local distributed query against them? This way each EmbeddedSolrServer maintains caches that are not disturbed by shard updates. Ideally, if I had to use multiple cores, I'd rather not have to maintain separate instances of /conf on disk, but could pass the same in-memory representation of solrconfig and schema into each core.
Re: lucene or Solr bug with dismax?
: Indeed - I assumed that only the + and - characters had any
: special meaning when parsing dismax queries and that all other content
: would be treated just as keywords. That seems to be how it's
: described in the dismax documentation?

The dirty little secret of the dismax parser is that i was an idiot when i wrote it.

I was working on a project that needed a parser that would support +/-, and wanted to try the DisjunctionMaxQuery expansion of the terms that DisMaxParser now supports. I started by attempting to tackle the DisjunctionMaxQuery expansion in a subclass of the existing QueryParser, with every intention of throwing it away once it was working. This was because i needed a quick proof of concept that demonstrated the dismax query structures produced were actually useful; so far i'd only tested a few hardcoded example queries, and i needed the parser support so i could run some regression tests over existing *WELL FORMED* queries to compare the relevancy results.

It worked great: i successfully demonstrated to the right people that the query structures made sense for all our use cases. So then i put a lower priority item on my todo list / schedule to figure out the right way to implement a DisMaxParser so i wasn't stuck with any of the error code paths in the QueryParser superclass.

I'm not sure if i was really tired when i finally got to looking at it, or if i was just really distracted, but i distinctly remember testing queries that had "and" and "or" in them and seeing them get parsed the way i wanted: the words were treated as literals and incorporated into the DisjunctionMaxQuery structure. So i guess i assumed something about how i subclassed QueryParser was bypassing the normal and/or logic, and i decided my quick and dirty subclass would work well enough.

The key thing to note here is that i remember testing "and" and "or" ... not "AND" and "OR" ... for some reason or another i was totally brain dead and tested the wrong thing. Had i tested the right thing, i probably would have decided i needed to write a new parser from scratch, and had the time to work it into the project schedule. Alas: 20/20 hindsight.

-Hoss
Re: Merge Policy
I am referring to setting properties on the *existing* policy available in Lucene such as LogByteSizeMergePolicy.setMaxMergeMB On Tue, Jul 21, 2009 at 5:11 PM, Chris Hostetterhossman_luc...@fucit.org wrote: : SolrIndexConfig accepts a mergePolicy class name, however how does one : inject properties into it? At the moment you can't. If you look at the history of MergePolicy, users have never been encouraged to implement their own (the API actively discourages it, without going so far as to make it impossible). -Hoss
how to change the size of fieldValueCache in solr?
The fieldValueCache plays an important role in sorting and faceting in Solr, but this cache doesn't appear to be managed by Solr's configuration. Is there any way to configure it? Thanks!
Re: how to change the size of fieldValueCache in solr?
Hello,

You can control it in solrconfig.xml:

<!-- Cache used to hold field values that are quickly accessible
     by document id. The fieldValueCache is created by default
     even if not configured here. -->
<fieldValueCache class="solr.FastLRUCache"
                 size="512"
                 autowarmCount="128"
                 showItems="32" />

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: shb suh...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Wednesday, July 22, 2009 12:29:43 AM
Subject: how to change the size of fieldValueCache in solr?

The FieldValueCache plays a important role in sort and facet of solr. But this cache is not managed by solr, is there any way to configure it? thanks!
Re: how to change the size of fieldValueCache in solr?
Thanks very much. Is there any difference between fieldValueCache and fieldCache?
Re: Regarding Response Builder
I would just do something like this:

String myParam = req.getParams().get("xparam");

where xparam is passed on the request URL:

http://localhost:8983/solr/select/?q=dog&xparam=something&start=0&rows=10&indent=on

Kartik1 wrote:

The ResponseBuilder class has SolrQueryRequest as a public field. Using SolrQueryRequest we can get the SolrParams, like so: SolrParams params = req.getParams(); Now I want to get the values of those params. What should be the approach as SolrParams is an abstract class and its get(String) method is abstract?

Best regards,
Amandeep Singh

--
View this message in context: http://www.nabble.com/Regarding-Response-Builder-tp24456722p24600481.html
Sent from the Solr - User mailing list archive at Nabble.com.