Re: replication -- missing field data file
On Thu, Jan 7, 2010 at 9:34 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: Right, but if you want to take periodic backups and ship them to tape or some DR site, you need to be able to tell when the backup is actually complete. It seems very strange to me that you can track the replication progress on a slave, but you can't track the backup progress on a master. You are right. This can be improved. See https://issues.apache.org/jira/browse/SOLR-1714 -- Regards, Shalin Shekhar Mangar.
Re: Adaptive search?
On Fri, Jan 8, 2010 at 3:41 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: - Original Message From: Shalin Shekhar Mangar shalinman...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, December 23, 2009 2:45:21 AM Subject: Re: Adaptive search? On Wed, Dec 23, 2009 at 4:09 AM, Lance Norskog wrote: Nice! Siddhant: Another problem to watch out for is the feedback problem: someone clicks on a link and it automatically becomes more interesting, so someone else clicks, and it gets even more interesting... So you need some kind of suppression. For example, as individual clicks get older, you can push them down. Or you can put a cap on the number of clicks used to rank the query. We use clicks/views instead of just clicks to avoid this problem. Doesn't a click imply a view? You click to view. I must be missing something... I was talking about boosting documents using past popularity. So a user searches for X and gets 10 results. This view is recorded for each of the 10 documents and added to the index later. If a user clicks on result #2, the click is recorded for doc #2 and added to index. We boost using clicks/view. -- Regards, Shalin Shekhar Mangar.
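For reference, the clicks/views boosting with a cap that Shalin and Lance describe can be sketched as a small scoring function. The exact formula and the cap value below are illustrative assumptions, not from this thread; in Solr the resulting value would typically be indexed per document and applied via a boost function.

```python
def popularity_boost(clicks, views, max_boost=2.0):
    """Popularity score in [1.0, max_boost]: documents that were shown but
    never clicked get no boost, and heavily clicked ones are capped so the
    click -> rank -> more clicks feedback loop cannot run away."""
    if views == 0:
        return 1.0
    ratio = min(clicks / views, 1.0)  # clicks should not exceed views here
    return 1.0 + (max_boost - 1.0) * ratio

# A document viewed 100 times and clicked 25 times:
print(popularity_boost(25, 100))   # 1.25
# Never shown -> neutral boost:
print(popularity_boost(0, 0))      # 1.0
```

Aging old clicks out of the counts (the suppression Lance mentions) would be a separate periodic job that decays `clicks` and `views` before reindexing.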
Re: Understanding the query parser
I am running into the same issue. I have tried to replace my WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern (\s+|-) but I still seem to get a phrase query. Why is that? Ahmet Arslan wrote: I am using Solr 1.3. I have an index with a field called name. It is of type text (unmodified, stock text field from solr). My query field:foo-bar is parsed as a phrase query field:"foo bar". I was rather expecting it to be parsed as field:(foo bar) or field:foo field:bar. Is there an expectation mismatch? Can I make it work as I expect it to? If the query analyzer produces two or more tokens from a single token, QueryParser constructs a PhraseQuery, so this is expected. Without writing custom code it seems impossible to alter this behavior, and modifying QueryParser to change it would be troublesome. I think the easiest way is to replace '-' with whitespace before the analysis phase - probably on the client side, or in a custom RequestHandler. Maybe you can set qp.setPhraseSlop(Integer.MAX_VALUE); so that field:foo-bar and field:(foo AND bar) will be virtually equal. Hope this helps. -- View this message in context: http://old.nabble.com/Understanding-the-query-parser-tp27071483p27107523.html Sent from the Solr - User mailing list archive at Nabble.com.
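The client-side workaround suggested above (split on '-' before the query ever reaches QueryParser) is a one-liner. A minimal sketch - the field name is a placeholder, and this deliberately produces the field:(foo bar) form the original poster expected:

```python
def split_hyphens(field, value):
    """Replace '-' with whitespace client-side, so QueryParser sees two
    separate terms instead of one token that the analyzer splits into a
    PhraseQuery."""
    terms = " ".join(value.split("-"))
    return f"{field}:({terms})"

print(split_hyphens("name", "foo-bar"))  # name:(foo bar)
```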
RE: Synonyms from Database
You could try to take the code for SynonymFilterFactory as a starting point, and adapt it to obtain the synonym configuration from a source other than a text file. But I'm not sure what you mean by checking for synonyms at query time. As I understand it, Solr works like that anyway - depending on how you configure it. The only difference between your new SynonymFilterFactory and Solr's default would be where it obtains the synonym configuration from. You can get Solr to re-read the configuration by issuing a reload command. See http://wiki.apache.org/solr/CoreAdmin#RELOAD. Med venlig hilsen / Best regards Peter Kirk E-mail: mailto:p...@alpha-solutions.dk -----Original Message----- From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] Sent: 10. januar 2010 16:20 To: solr-user@lucene.apache.org Subject: Synonyms from Database Hi: Is there any work done on providing synonyms from a database instead of the synonyms.txt file? The idea is to have a dictionary in the DB that can be enhanced on the fly in the application. This can then be used at query time to check for synonyms. I know I am not giving thought to the performance implications of this approach, but I would love to hear others' thoughts. ~Ravi.
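The reload command Peter points at is just an HTTP call to the CoreAdmin handler. A sketch of building that request URL - host, port, and core name are placeholders:

```python
from urllib.parse import urlencode

def reload_url(base="http://localhost:8983/solr", core="core0"):
    """Build the CoreAdmin RELOAD request described on the wiki page above.
    Issuing a GET to this URL makes Solr re-read the core's configuration,
    including synonyms.txt."""
    return f"{base}/admin/cores?{urlencode({'action': 'RELOAD', 'core': core})}"

print(reload_url())
# http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```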
Re: Synonyms from Database
Thanks all for your replies. I guess what I meant by query time, as I understand Solr (and I may be wrong here), is that I can add synonyms.txt to the query analyzer as follows:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>

My understanding is that even if the document (at index time) has the word mathematics and my synonyms.txt file has mathematics=math,maths, a query for math will match mathematics, since we have synonyms.txt in the query analyzer. So I was curious about a database approach along similar lines. I get the point about performance, and I think that is a big NO NO for this approach. But the idea was to allow changing the synonyms on the fly (more like adaptive synonyms) and improve the hits. I guess the only way is to rewrite the file (as Otis suggested) and reload the configuration (as Peter suggested). This might be a performance hit (rewrite the file and reload), but I guess still much better than reading from the DB? Thanks again for your comments. ~Ravi. 2010/1/10 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Ravi, I think if your synonyms were in a DB, it would be trivial to periodically dump them into the text file Solr expects. You wouldn't want to hit the DB to look up synonyms at query time... Why query time? Can it not be done at startup time? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Ravi Gidwani ravi.gidw...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:20:18 PM Subject: Synonyms from Database Hi: Is there any work done on providing synonyms from a database instead of the synonyms.txt file? The idea is to have a dictionary in the DB that can be enhanced on the fly in the application. This can then be used at query time to check for synonyms.
I know I am not giving thought to the performance implications of this approach, but I would love to hear others' thoughts. ~Ravi. -- - Noble Paul | Systems Architect | AOL | http://aol.com
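Otis's suggestion - periodically dump the DB synonyms into the text format Solr expects, then reload - might look like the sketch below. The table layout is invented for illustration; the `term=>syn1,syn2` line format is the explicit-mapping syntax SynonymFilterFactory reads from synonyms.txt.

```python
import sqlite3

def dump_synonyms(conn, path):
    """Write one 'term=>syn1,syn2' line per headword into a file that can
    replace synonyms.txt before a core reload is issued."""
    rows = conn.execute(
        "SELECT term, GROUP_CONCAT(synonym) FROM synonyms GROUP BY term")
    with open(path, "w") as f:
        for term, syns in rows:
            f.write(f"{term}=>{syns}\n")

# Demo with an in-memory database standing in for the real synonym store:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synonyms (term TEXT, synonym TEXT)")
conn.executemany("INSERT INTO synonyms VALUES (?, ?)",
                 [("mathematics", "math"), ("mathematics", "maths")])
dump_synonyms(conn, "synonyms.txt")
print(open("synonyms.txt").read())
```

A cron job running this dump followed by a CoreAdmin RELOAD gives "adaptive synonyms" without hitting the DB per query.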
Re: Adaptive search?
Shalin: Can you point me to pages/resources that talk about this approach in details ? OR can you provide more details on the schema and the function(?) used for ranking the documents. Thanks, ~Ravi. On Mon, Jan 11, 2010 at 1:00 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Jan 8, 2010 at 3:41 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: - Original Message From: Shalin Shekhar Mangar shalinman...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, December 23, 2009 2:45:21 AM Subject: Re: Adaptive search? On Wed, Dec 23, 2009 at 4:09 AM, Lance Norskog wrote: Nice! Siddhant: Another problem to watch out for is the feedback problem: someone clicks on a link and it automatically becomes more interesting, so someone else clicks, and it gets even more interesting... So you need some kind of suppression. For example, as individual clicks get older, you can push them down. Or you can put a cap on the number of clicks used to rank the query. We use clicks/views instead of just clicks to avoid this problem. Doesn't a click imply a view? You click to view. I must be missing something... I was talking about boosting documents using past popularity. So a user searches for X and gets 10 results. This view is recorded for each of the 10 documents and added to the index later. If a user clicks on result #2, the click is recorded for doc #2 and added to index. We boost using clicks/view. -- Regards, Shalin Shekhar Mangar.
RE: Synonyms from Database
Hi - I don't think you'll see a performance hit using a DB for your synonym configuration as opposed to a text file. The configuration is only done once (at startup) - or when you reload. You won't be reloading every minute, will you? After reading the configuration, the synonyms are available to Solr via the SynonymFilter object (at least as I understand it from looking at the code). The reload feature actually sounds quite neat - it will reload in the background, and switch in the newly read configuration when it's ready - so hopefully no down-time waiting for configuration. Med venlig hilsen / Best regards Peter Kirk E-mail: mailto:p...@alpha-solutions.dk -----Original Message----- From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] Sent: 11. januar 2010 22:43 To: solr-user@lucene.apache.org; noble.p...@gmail.com Subject: Re: Synonyms from Database Thanks all for your replies. I guess what I meant by Query time, and as I understand solr (and I may be wrong here) I can add synonyms.txt in the query analyzer as follows:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>

By this my understanding is, even if the document (at index time) has a word mathematics and my synonyms.txt file has: mathematics=math,maths, a query for math will match mathematics. Since we have the synonyms.txt in the query analyzer. So I was curious about the database approach on similar lines. I get the point of the performance, and I think that is a big NO NO for this approach. But the idea was to allow changing the synonyms on the fly (more like adaptive synonyms) and improve the hits. I guess the only way (as Otis suggested) is to rewrite the file and reload configuration (as Peter suggested). This might be a performance hit (rewrite the file) and reload, but I guess still much better than the reading from DB ? Thanks again for your comments. ~Ravi.
2010/1/10 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Ravi, I think if your synonyms were in a DB, it would be trivial to periodically dump them into a text file Solr expects. You wouldn't want to hit the DB to look up synonyms at query time... Why query time. Can it not be done at startup time ? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Ravi Gidwani ravi.gidw...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:20:18 PM Subject: Synonyms from Database Hi : Is there any work done in providing synonyms from a database instead of synonyms.txt file ? Idea is to have a dictionary in DB that can be enhanced on the fly in the application. This can then be used at query time to check for synonyms. I know I am not putting thoughts to the performance implications of this approach, but will love to hear about others thoughts. ~Ravi. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Synonyms from Database
On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote: The reload feature actually sounds quite neat - it will reload in the background, and switch in the newly read configuration when it's ready - so hopefully no down-time waiting for configuration. Correct me if I'm wrong, but I don't think that it's true about a reload working in the background. While a core is reloading (and warming), it is unavailable for search, right? I think you have to create a new core and then swap to keep things alive constantly. Erik
Re: Synonyms from Database
On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote: The reload feature actually sounds quite neat - it will reload in the background, and switch in the newly read configuration when it's ready - so hopefully no down-time waiting for configuration. Correct me if I'm wrong, but I don't think that it's true about a reload working in the background. While a core is reloading (and warming), it is unavailable for search. right? I think you have to create a new core, and then swap to keep things alive constantly. Core reload swaps the old core with a new core on the same configuration files with no downtime. See CoreContainer#reload. -- Regards, Shalin Shekhar Mangar.
Re: Understanding the query parser
I am running into the same issue. I have tried to replace my WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern (\s+|-) but I still seem to get a phrase query. Why is that? It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, around line 660. If numTokens > 1, it returns a PhraseQuery. Modifications in the analysis phase (CharFilterFactory, TokenizerFactory, TokenFilterFactory) won't change this behavior. Something must be done before the analysis phase. But I think in your case you can obtain a match by modifying the parameters of WordDelimiterFilterFactory, even with a PhraseQuery.
Re: No Analyzer, tokenizer or stemmer works at Solr
Hello Hossman, sorry for my late response. For this specific case, you are right. It makes more sense to do such work on the fly. However, I am only testing at the moment what one can do with Solr and what not. Is the UpdateProcessor something that comes from Lucene itself or from Solr? Thanks! hossman wrote: : Is there a way to prepare a document the described way with Lucene/Solr, : before I analyze it? : My use case is to categorize several documents in an automatic way, which : includes that I have to create data from the given input doing some : information retrieval. As Ryan mentioned earlier: this is what the UpdateRequestProcessor API is for -- it allows you to modify Documents (regardless of how they were added: csv, xml, dih) prior to Solr processing them... http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html Personally, I think you may be looking at your problem from the wrong direction... : Imagine you would analyze, index and store them like you normally do and : afterwards you want to set, whether the document belongs to the expensive : item-group or not. : If the price for the item is higher than 500$, it belongs to the : expensive : ones, otherwise not. ...for a situation like that, I wouldn't attempt to classify the docs as expensive or cheap when adding them. Instead I would use numeric ranges for faceting and filtering to show me how many docs were expensive or cheap at query time -- that way when the economy tanks I can redefine my definition of expensive on the fly w/o needing to reindex a million documents. -Hoss -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27109760.html Sent from the Solr - User mailing list archive at Nabble.com.
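Hoss's query-time approach amounts to sending facet.query parameters instead of indexing a cheap/expensive label. A sketch of building such a request - the price field name is an assumption, and only the 500 threshold comes from the example above:

```python
from urllib.parse import urlencode

def price_facets(threshold=500):
    """facet.query pairs that count 'cheap' vs 'expensive' docs at query
    time, so the threshold can change without reindexing anything.
    (Solr range queries here are inclusive at both ends, so a doc at
    exactly the threshold is counted in both buckets.)"""
    return urlencode([
        ("facet", "true"),
        ("facet.query", f"price:[* TO {threshold}]"),
        ("facet.query", f"price:[{threshold} TO *]"),
    ])

print(price_facets())
```

Appending this to the select URL returns both counts in one response; redefining "expensive" is just a different `threshold`.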
Multi language support
Hi Solr users. I'm trying to set up a site with Solr search integrated, and I use the Solr Java API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible. Synonyms, stopwords and stemming all sound quite interesting and useful, but how do I set this up in a good way for a multilingual site? The site doesn't have a huge amount of text, so performance issues don't really bother me, but I'd still like to hear your suggestions before I try to implement a solution. Best regards Daniel
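One common pattern for this (an assumption on my part, not something stated in this thread) is a field per language, each mapped in schema.xml to a fieldType with language-specific stemmer, stopwords, and synonyms, with the indexing client routing text into the right field. A sketch of the client side, with an invented suffix convention:

```python
def localized_doc(doc_id, lang, title, body):
    """Route text into language-suffixed fields (e.g. title_en, body_sv);
    each suffix would correspond to a fieldType in schema.xml whose analyzer
    chain uses that language's stemmer and stopword list."""
    return {
        "id": doc_id,
        f"title_{lang}": title,
        f"body_{lang}": body,
    }

print(localized_doc("1", "en", "Hello", "Some English text"))
```

At query time the application would then search the field matching the user's language.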
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 11, 2010, at 7:33 AM, MitchK wrote: Is the UpdateProcessor something that comes froms Lucene itself or from Solr? It's at the Solr level - http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html Erik
Re: Synonyms from Database
On Jan 11, 2010, at 5:50 AM, Shalin Shekhar Mangar wrote: On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher erik.hatc...@gmail.comwrote: On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote: The reload feature actually sounds quite neat - it will reload in the background, and switch in the newly read configuration when it's ready - so hopefully no down-time waiting for configuration. Correct me if I'm wrong, [me saying something wrong] Core reload swaps the old core with a new core on the same configuration files with no downtime. See CoreContainer#reload. Sweet! Thanks for the correction. Erik
Re: Could not start SOLR issue
On Jan 11, 2010, at 1:38 AM, dipti khullar wrote: Hi We have been running master/slave Solr 1.3 in production for about 5 months. Yesterday, we faced the following issue on one of the slaves for the first time, because of which we had to restart the slave. SEVERE: Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/opt/solr/solr_slave/solr/data/index: files: null It looks like your index was removed out from under you. Perhaps this is due to the failed snapshot install? Can you replicate the problem? Stopping the slave, deleting the index directory, and then restarting it should resolve it for now. I searched on forums but couldn't find any relevant info on what could have possibly caused the issue. In the snapinstaller logs, the following failure was observed: 2010/01/11 04:20:06 started by solr 2010/01/11 04:20:06 command: /opt/solr/solr_slave/solr/solr/bin/snapinstaller 2010/01/11 04:20:07 installing snapshot /opt/solr/solr_slave/solr/data/snapshot.20100111041402 2010/01/11 04:20:07 notifing Solr to open a new Searcher 2010/01/11 04:20:07 failed to connect to Solr server 2010/01/11 04:20:07 snapshot installed but Solr server has not open a new Searcher 2010/01/11 04:20:08 failed (elapsed time: 1 sec) Configurations: There are 2 search servers in a virtualized VMware environment. Each has 2 instances of Solr running on separate ports in tomcat. Server 1: hosts 1 master (application 1), 1 slave (application 1) Server 2: hosts 1 master (application 2), 1 slave (application 1) Both servers have 4 CPUs and 4 GB RAM. Master - 4GB RAM - 1GB JVM Heap memory is allocated to Solr Slave1/Slave2: - 4GB RAM - 2GB JVM Heap memory is allocated to Solr Can there be any possible reason that the solr/home property couldn't be found? Thanks Dipti
Re: Could not start SOLR issue
We were able to resolve the problem by restarting the slave. Also, these failed snapshot install incidents occurred after the exception was observed, which also seems logically consistent. Could not start SOLR. Check solr/home property We just want to avoid such incidents in the future. Is it possible that at some point the solr/home property can get corrupted? One more thing we observed was that tomcat-users.xml was overwritten. Should we debug in that direction also? Thanks Dipti On Mon, Jan 11, 2010 at 6:55 PM, Grant Ingersoll gsing...@apache.org wrote: On Jan 11, 2010, at 1:38 AM, dipti khullar wrote: Hi We are running master/slave Solr 1.3 version on production since about 5 months. Yesterday, we faced following issue on one of the slaves for the first time because of which we had to restart the slave. SEVERE: Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@ /opt/solr/solr_slave/solr/data/index: files: null It looks like your index was removed out from under you. Perhaps this is due to the failed snapshot install? Can you replicate the problem? Stopping the slave and deleting the index directory and then restarting it should resolve it for now. I searched on forums but couldn't find any relevant info which could have possibly caused the issue. In snapinstaller logs, following failed logs were observed: 2010/01/11 04:20:06 started by solr 2010/01/11 04:20:06 command: /opt/solr/solr_slave/solr/solr/bin/snapinstaller 2010/01/11 04:20:07 installing snapshot /opt/solr/solr_slave/solr/data/snapshot.20100111041402 2010/01/11 04:20:07 notifing Solr to open a new Searcher 2010/01/11 04:20:07 failed to connect to Solr server 2010/01/11 04:20:07 snapshot installed but Solr server has not open a new Searcher 2010/01/11 04:20:08 failed (elapsed time: 1 sec) Configurations: There are 2 search servers in a virtualized VMware environment.
Each has 2 instances of Solr running on separates ports in tomcat. Server 1: hosts 1 master(application 1), 1 slave (application 1) Server 2: hosta 1 master (application 2), 1 slave (application 1) Both servers have 4 CPUs and 4 GB RAM. Master - 4GB RAM - 1GB JVM Heap memory is allocated to Solr Slave1/Slave2: - 4GB RAM - 2GB JVM Heap memory is allocated to Solr Can there be any possible reasons that solr/home property couldn't be found? Thanks Dipti
update solr index
Hi, I am running Solr in Tomcat and I have about 35 indexes (between 2 and 80 million documents each). Currently, if I try to update a few documents in an index (say the one which contains 80 million documents) while Tomcat is running and therefore receiving requests, I get a few very long garbage collections (about 60 sec). I am running Tomcat with -Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m. I'm using ConcMarkSweepGC. I have 2 questions: 1. Is Solr doing something specific while an index is being updated, like updating something in memory, which would cause the garbage collection? 2. Any idea how I could solve this problem? Currently I stop Tomcat, update the index, and start Tomcat. I would like to be able to update my index while Tomcat is running. I was thinking about running more Tomcat instances with less memory each, each running a few of my indexes. Do you think that would be the best way to go? Thanks, Marc
Re: No Analyzer, tokenizer or stemmer works at Solr
Is there any diagram that explains which class is responsible for which level of processing my data into the index? My example was: I have categorized whether something is cheap or expensive. Let's say I didn't do that on the fly, but with the help of the UpdateRequestProcessor. Imagine there is a query like "harry potter dvd-collection cheap" or "cheap Harry Potter dvd-collection". How can I make it so that, if the query says something about the category cheap, Solr applies a faceting query on cat:cheap? To do so, I have to alter the original query - how can I do that? Erik Hatcher-4 wrote: On Jan 11, 2010, at 7:33 AM, MitchK wrote: Is the UpdateProcessor something that comes from Lucene itself or from Solr? It's at the Solr level - http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html Erik -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27111504.html Sent from the Solr - User mailing list archive at Nabble.com.
How to display Highlight with VelocityResponseWriter?
Hi, we need a web GUI for Solr and we've noticed that VelocityResponseWriter is integrated in the Solr project for that purpose. But I have no idea how I can configure solrconfig.xml so that snippets with highlighting can also be displayed in the web GUI. I've added <bool name="hl">true</bool> into the standard responseHandler and it already works, i.e. without Velocity. But the same line doesn't take effect in /itas. Should I configure anything else? Thanks in advance. With best regards, Qiuyan

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more contributor
 license agreements. See the NOTICE file distributed with this work for additional
 information regarding copyright ownership. The ASF licenses this file to You under
 the Apache License, Version 2.0 (the "License"); you may not use this file except
 in compliance with the License. You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software distributed
 under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
 CONDITIONS OF ANY KIND, either express or implied. See the License for the
 specific language governing permissions and limitations under the License.
-->
<config>
  <!-- Set this to 'false' if you want solr to continue working after it has
       encountered a severe configuration error. In a production environment,
       you may want solr to keep working even if one handler is mis-configured.
       You may also set this to false using by setting the system property:
       -Dsolr.abortOnConfigurationError=false -->
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <!-- Used to specify an alternate directory to hold all index data other than
       the default ./data under the Solr home. If replication is in use, this
       should match the replication configuration. -->
  <dataDir>${solr.data.dir:./solr/data}</dataDir>

  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <!-- If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will
         flush based on whichever limit is hit first. -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <!-- Tell Lucene when to flush documents to disk. Giving Lucene more memory
         for indexing means faster indexing at the cost of more RAM. If both
         ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush
         based on whichever limit is hit first. -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <!-- Expert: Turn on Lucene's auto commit capability. This causes intermediate
         segment flushes to write a new lucene index descriptor, enabling it to be
         opened by an external IndexReader. NOTE: Despite the name, this value does
         not have any relation to Solr's autoCommit functionality -->
    <!--<luceneAutoCommit>false</luceneAutoCommit>-->
    <!-- Expert: The Merge Policy in Lucene controls how merging is handled by
         Lucene. The default in 2.3 is the LogByteSizeMergePolicy, previous versions
         used LogDocMergePolicy. LogByteSizeMergePolicy chooses segments to merge
         based on their size. The Lucene 2.2 default, LogDocMergePolicy, chose when
         to merge based on number of documents. Other implementations of MergePolicy
         must have a no-argument constructor. -->
    <!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->
    <!-- Expert: The Merge Scheduler in Lucene controls how merges are performed.
         The ConcurrentMergeScheduler (Lucene 2.3 default) can perform merges in
         the background using separate threads. The SerialMergeScheduler (Lucene
         2.2 default) does not. -->
    <!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->
    <!-- This option specifies which Lucene LockFactory implementation to use.
         single = SingleInstanceLockFactory - suggested for a read-only index or
                  when there is no possibility of another process trying to
                  modify the index.
         native = NativeFSLockFactory
         simple = SimpleFSLockFactory
         (For backwards compatibility with Solr 1.2, 'simple' is the default
         if not specified.) -->
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <!-- Deprecated -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <!-- If true, unlock any held write or
Re: Getting solr response data in a JS query
You might be running into an Ajax restriction. See if an article like this helps. http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/ On 1/9/10 11:37 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Dan, You didn't mention whether you tried wt=json. Does it work if you use that to tell Solr to return its response in JSON format? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Dan Yamins dyam...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:05:54 PM Subject: Getting solr response data in a JS query Hi: I'm trying to figure out how to get solr responses and use them in my website, and I'm having some problems. 1) My initial thought is to use ajax, and insert a line like this in my script: data = eval($.get("http://localhost:8983/solr/select/?q=*:*").responseText) ... and then do what I want with the data, with the logic being done in Javascript on the front page. However, this is just not working: no matter what alternative I use, I always seem to get no response to this query. I think I'm having exactly the same problem as described here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html and here: http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code Just like those two OPs, I can definitely access my solr response through a web browser, but my jquery is getting nothing. Unfortunately, in neither thread did the answer seem to have been figured out satisfactorily. Does anybody know what the problem is? 2) As an alternative, I _can_ use the ajax-solr library.
Code like this:

var Manager;
(function ($) {
  $(function () {
    Manager = new AjaxSolr.Manager({
      solrUrl: 'http://localhost:8983/solr/'
    });
    Manager.init();
    Manager.store.addByValue('q', '*:*');
    Manager.store.addByValue('rows', '1000');
    Manager.doRequest();
  });
})(jQuery);

does indeed load solr data into my DOM. Somehow, ajax-solr's doRequest method is doing something that makes it possible to receive the proper response from the solr servlet, but I don't know what it is, so I can't replicate it with my own ajax. Does anyone know what is happening? (Of course, I _could_ just use ajax-solr, but doing so would mean figuring out how to re-write my existing application to display search results in a form that works with the ajax-solr api, and I'd rather avoid this if possible since it looks somewhat nontrivial.) Thanks! Dan
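The cross-domain restriction Gregg points at is usually worked around with Solr's JSONP support: wt=json plus a json.wrf callback parameter, so the response can be loaded via a script tag instead of a blocked XMLHttpRequest (this is very likely what ajax-solr does under the hood). A sketch of building such a request URL - the callback name is a placeholder:

```python
from urllib.parse import urlencode

def jsonp_url(base="http://localhost:8983/solr/select", q="*:*",
              callback="handleSolr"):
    """Ask Solr for JSON wrapped in a callback function: json.wrf names the
    JS function that the response writer wraps the JSON payload in."""
    params = {"q": q, "wt": "json", "json.wrf": callback}
    return f"{base}?{urlencode(params)}"

print(jsonp_url())
```

On the page, a function named `handleSolr(response)` would then receive the parsed response object.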
Re: Getting solr response data in a JS query
I remember having a difficult time getting jquery to work as I thought it would. Something to do with the wt. I ended up creating a little client lib. Maybe this will be useful in finding your problem? example: http://github.com/mwmitchell/get_rest/blob/master/solr_example.html lib: http://github.com/mwmitchell/get_rest/blob/master/solr_client.jquery.js Matt On Mon, Jan 11, 2010 at 11:22 AM, Gregg Hoshovsky hosho...@ohsu.edu wrote: You might be running into an Ajax restriction. See if an article like this helps. http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/ On 1/9/10 11:37 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Dan, You didn't mention whether you tried wt=json . Does it work if you use that to tell Solr to return its response in JSON format? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Dan Yamins dyam...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:05:54 PM Subject: Getting solr response data in a JS query Hi: I'm trying to use figure out how to get solr responses and use them in my website.I'm having some problems figure out how to 1) My initial thought is is to use ajax, and insert a line like this in my script: data = eval($.get(http://localhost:8983/solr/select/?q=*:* ).responseText) ... and then do what I want with the data, with logic being done in Javascript on the front page. However, this is just not working technically: no matter what alternative I use, I always seem to get no response to this query. 
I think I'm having exactly the same problem as described here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html and here: http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code Just like those two OPs, I can definitely access my Solr responses through a web browser, but my jQuery is getting nothing. Unfortunately, in neither thread did the answer seem to have been figured out satisfactorily. Does anybody know what the problem is? 2) As an alternative, I _can_ use the ajax-solr library. Code like this: var Manager; (function ($) { $(function () { Manager = new AjaxSolr.Manager({ solrUrl: 'http://localhost:8983/solr/' }); Manager.init(); Manager.store.addByValue('q', '*:*'); Manager.store.addByValue('rows', '1000'); Manager.doRequest(); }); })(jQuery); does indeed load Solr data into my DOM. Somehow, ajax-solr's doRequest method is doing something that makes it possible to receive the proper response from the Solr servlet, but I don't know what it is, so I can't replicate it with my own Ajax. Does anyone know what is happening? (Of course, I _could_ just use ajax-solr, but doing so would mean figuring out how to rewrite my existing application's display of search results in a form that works with the ajax-solr API, and I'd rather avoid this if possible since it looks somewhat nontrivial.) Thanks! Dan
Re: How to display Highlight with VelocityResponseWriter?
Qiuyan wrote: highlighting can also be displayed in the web gui. I've added <bool name="hl">true</bool> into the standard responseHandler and it already works, i.e. without Velocity. But the same line doesn't take effect in /itas. Should I configure anything else? Thanks in advance. First of all, just a few notes on the /itas request handler in your solrconfig.xml: 1. The entry <arr name="components"><str>highlight</str></arr> is obsolete, since the highlighting component is a default search component [1]. 2. Note that since you didn't specify a value for hl.fl, highlighting will only affect the fields listed inside of qf. 3. Why did you override the default value of hl.fragmenter? In most cases the default fragmenting algorithm (gap) works fine - and maybe in yours as well? To make sure all your hl-related settings are correct, can you post the XML output (change the wt parameter to xml) for a search with highlighted results. And finally, can you post the VTL code snippet that should produce the highlighted output. -Sascha [1] http://wiki.apache.org/solr/SearchComponent
Re: Multi language support
Hello, We have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer, and lists of stopwords and synonyms. We, however, did not use language-specific stopwords; instead we used one list shared by both languages. So you would have a field type like: <fieldType name="en_text" class="solr.TextField"> ... <analyzer> <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/> etc. etc. Cheers, - Markus Jelsma Buyways B.V. Technisch Architect Friesestraatweg 215c http://www.buyways.nl 9743 AD Groningen Alg. 050-853 6600 KvK 01074105 Tel. 050-853 6620 Fax. 050-3118124 Mob. 06-5025 8350 In: http://www.linkedin.com/in/markus17 On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote: Hi Solr users. I'm trying to set up a site with Solr search integrated, and I use the SolrJ API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible. Synonyms, stopwords and stems all sound quite interesting and useful, but how do I set this up in a good way for a multilingual site? The site doesn't have a huge text mass, so performance issues don't really bother me, but still I'd like to hear your suggestions before I try to implement a solution. Best regards Daniel
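The approach Markus describes — one field type per language, each with its own analysis chain, plus per-language fields — might look roughly like this in schema.xml. This is a hedged sketch only: the filter choices, attribute values, and file/field names are illustrative assumptions, not the poster's actual configuration.

```xml
<!-- Sketch: an English-specific text type; names and filters are illustrative -->
<fieldType name="en_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.en.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- One field per language; the indexing client routes content by document language -->
<field name="title_en" type="en_text" indexed="true" stored="true"/>
<field name="body_en"  type="en_text" indexed="true" stored="true"/>
```

A parallel de_text/fr_text type with its own stemmer and stopword file would follow the same pattern; queries then target the field matching the user's language.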
Replication problem
Hi, sorry for the somewhat inane question: I set up the replication request handler on the master, however I'm not seeing any replicatable indexes via http://localhost:8080/solr/main/replication?command=indexversion Queries such as *:* yield results on the master (so I assume the commit worked). The replication console shows an index, so not sure what's going on. Here's the request handler XML on the master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>
    <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries o… -->
    <str name="replicateAfter">commit,optimize</str>
    <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple … -->
    <!-- <str name="backupAfter">optimize</str> -->
    <!-- If configuration files need to be replicated give the names here, separated by comma -->
    <str name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the documentation below. Normally, you should not need to… -->
    <str name="commitReserveDuration">00:10:00</str>
  </lst>
</requestHandler>
Re: Replication problem
Did you try adding startup to the list of events to replicate after? -Yonik http://www.lucidimagination.com On Mon, Jan 11, 2010 at 12:25 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, sorry for the somewhat inane question: I set up the replication request handler on the master, however I'm not seeing any replicatable indexes via http://localhost:8080/solr/main/replication?command=indexversion Queries such as *:* yield results on the master (so I assume the commit worked). The replication console shows an index, so not sure what's going on. Here's the request handler XML on the master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>
    <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries o… -->
    <str name="replicateAfter">commit,optimize</str>
    <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple … -->
    <!-- <str name="backupAfter">optimize</str> -->
    <!-- If configuration files need to be replicated give the names here, separated by comma -->
    <str name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the documentation below. Normally, you should not need to… -->
    <str name="commitReserveDuration">00:10:00</str>
  </lst>
</requestHandler>
Re: Replication problem
Yonik, I added startup to replicateAfter, however no dice... There are no errors in the Tomcat log. The output of http://localhost-master:8080/solr/main/replication?command=indexversion is:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>
The master replication UI: Local Index Index Version: 1263182366335, Generation: 3 Location: /mnt/solr/main/data/index Size: 1.08 KB Master solrconfig.xml, and Tomcat was restarted:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>
    <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries o… -->
    <str name="replicateAfter">startup,commit,optimize</str>
    <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple … -->
    <!-- <str name="backupAfter">optimize</str> -->
    <!-- If configuration files need to be replicated give the names here, separated by comma -->
    <str name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the documentation below. Normally, you should not need to… -->
    <str name="commitReserveDuration">00:10:00</str>
  </lst>
</requestHandler>
On Tue, Jan 12, 2010 at 11:29 AM, Yonik Seeley yo...@lucidimagination.com wrote: Did you try adding startup to the list of events to replicate after? -Yonik http://www.lucidimagination.com On Mon, Jan 11, 2010 at 12:25 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, sorry for the somewhat inane question: I set up the replication request handler on the master, however I'm not seeing any replicatable indexes via http://localhost:8080/solr/main/replication?command=indexversion Queries such as *:* yield results on the master (so I assume the commit worked). The replication console shows an index, so not sure what's going on.
help implementing a couple of business rules
hello *, I'm looking for help on writing queries to implement a few business rules. 1. Given a set of fields, how to return matches that match across them but not just one specific one. E.g., I'm using a dismax parser currently but I want to exclude any results that only match against a field called 'description2'. 2. Given a set of fields, how to return matches that match across them but, on one specific field, match as a phrase only. E.g., I'm using a dismax parser currently but I want matches against a field called 'people' to only match as a phrase. thx much, --joe
Re: help implementing a couple of business rules
On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote: 1. given a set of fields how to return matches that match across them but not just one specific one, ex im using a dismax parser currently but i want to exclude any results that only match against a field called 'description2' One way could be to add an fq parameter to the request: fq=-description2:(query) 2. given a set of fields how to return matches that match across them but on one specific field match as a phrase only, ex im using a dismax parser currently but i want matches against a field called 'people' to only match as a phrase Doesn't setting pf=people accomplish this? Erik
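Put together, Erik's two suggestions amount to adding a pf and an fq parameter to the dismax request. A quick sketch of what the assembled request parameters could look like (illustrative only: the qf field list and the query are placeholders, and note that the negative fq drops any document matching description2 at all, which may be stricter than rule 1 intends):

```python
from urllib.parse import urlencode

def build_params(user_query):
    """Assemble dismax request parameters per the two suggestions above:
    a pf boost on 'people' and a negative fq on 'description2'."""
    return urlencode({
        "q": user_query,
        "defType": "dismax",
        # illustrative field list - substitute your own schema's fields
        "qf": "title description description2 people",
        # phrase field: whole-query phrase matches in 'people' score higher
        "pf": "people",
        # negative filter: exclude docs matching the query in description2
        "fq": "-description2:(%s)" % user_query,
    })

print(build_params("john smith"))
```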
Re: help implementing a couple of business rules
thx, but I'm not sure that covers all edge cases; to clarify: 1. matching description2 is okay if other fields are matched too, but results matching only description2 should be omitted 2. it's okay to not match against the people field, but matches against the people field should only be phrase matches. sorry if I was unclear --joe On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher erik.hatc...@gmail.com wrote: On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote: 1. given a set of fields how to return matches that match across them but not just one specific one, ex im using a dismax parser currently but i want to exclude any results that only match against a field called 'description2' One way could be to add an fq parameter to the request: fq=-description2:(query) 2. given a set of fields how to return matches that match across them but on one specific field match as a phrase only, ex im using a dismax parser currently but i want matches against a field called 'people' to only match as a phrase Doesn't setting pf=people accomplish this? Erik
Re: Understanding the query parser
It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660. If numTokens > 1 it returns a PhraseQuery. That's exactly the question. Would be nice to hear from someone as to why it is that way? Cheers Avlesh On Mon, Jan 11, 2010 at 5:10 PM, Ahmet Arslan iori...@yahoo.com wrote: I am running in to the same issue. I have tried to replace my WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern (\s+|-) but I still seem to get a phrase query. Why is that? It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660. If numTokens > 1 it returns a PhraseQuery. Modifications in the analysis phase (CharFilterFactory, TokenizerFactory, TokenFilterFactory) won't change this behavior. Something must be done before the analysis phase. But I think in your case, you can obtain a match by modifying parameters of WordDelimiterFilterFactory even with a PhraseQuery.
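Since changes in the analysis chain can't prevent the PhraseQuery, one pragmatic workaround is to rewrite the raw query string on the client before it ever reaches the query parser, e.g. replacing hyphens between word characters with spaces. A minimal sketch (an assumption-laden example: it presumes hyphens are never meaningful in your queries, and it will also rewrite things like date ranges):

```python
import re

def dehyphenate(q):
    # Replace a hyphen joining two word characters with a space, so
    # "foo-bar" reaches the query parser as two separate tokens and the
    # parser never builds a PhraseQuery out of one source token.
    # Naive: also rewrites "2009-2010" and similar; adjust to taste.
    return re.sub(r"(?<=\w)-(?=\w)", " ", q)

print(dehyphenate("foo-bar"))
print(dehyphenate("39-43"))
```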
Cores + Replication Config
If you want to share one config among master and slaves, using Solr 1.4 replication, is there a way to specify whether a core is master or slave when using the CREATE core command? Thanks, Gio.
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
Thanks, we were having the same issue. We are trying to store article content and we are storing a field like <p>This article is for blah</p>. When I see the analysis.jsp page it does strip out the <p> tags and is indexed, but when we fetch the document it returns the field with the <p> tags. From Solr's point of view it's correct, but our issue is that this kind of html tag is screwing up the display of our page. Is there an easy way to ensure the html tags are stripped out, or do we have to take care of it manually? Thanks Rashid aseem cheema wrote: Alright. It turns out that escapedTags is not for what I thought it is for. The problem that I am having with HTMLStripCharFilterFactory is that it strips the html while indexing the field, but not while storing the field. That is why what I see in analysis.jsp, which is index analysis, does not match what gets stored... because.. well, HTML is stripped only for indexing. Makes so much sense. Thanks to Ryan McKinley for clarifying this. Aseem On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com wrote: I am trying to post a document with the following content using SolrJ: <center>content</center> I need the xml/html tags to be ignored. Even though this works fine in analysis.jsp, this does not work with SolrJ, as the client escapes the < and > with &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated. There is escapedTags in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work? Thanks -- Aseem -- Aseem -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenizer question
What do your FieldTypes look like for the fields in question? On Jan 10, 2010, at 10:05 AM, rswart wrote: Hi, This is probably an easy question. I am doing a simple query on postcode and house number. If the housenumber contains a minus sign like: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43) the resulting parsed query contains a phrase query: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43) This never matches. What I want solr to do is generate the following parsed query (essentially an OR for both house numbers): +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43) Solr generates this based on the following query (so a space instead of a minus sign): q=PostCode:(1078 pw)+AND+HouseNumber:(39 43) I tried two things to have Solr generate the desired parsed query: 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in a phrase query 2. PatternTokenizerFactory that splits on (\s+|-). But both options don't work. Any suggestions on how to get rid of the phrase query? Thanks, Richard -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: Tokenizer question
And also, what query parser are you using? On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote: What do your FieldTypes look like for the fields in question? On Jan 10, 2010, at 10:05 AM, rswart wrote: Hi, This is probably an easy question. I am doing a simple query on postcode and house number. If the housenumber contains a minus sign like: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43) the resulting parsed query contains a phrase query: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43) This never matches. What I want solr to do is generate the following parsed query (essentially an OR for both house numbers): +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43) Solr generates this based on the following query (so a space instead of a minus sign): q=PostCode:(1078 pw)+AND+HouseNumber:(39 43) I tried two things to have Solr generate the desired parsed query: 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in a phrase query 2. PatternTokenizerFactory that splits on (\s+|-). But both options don't work. Any suggestions on how to get rid of the phrase query? Thanks, Richard -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters shows you many of the Solr analyzers and filters. Would one of the various *HTMLStrip* ones work? HTH Erick On Mon, Jan 11, 2010 at 2:44 PM, darniz rnizamud...@edmunds.com wrote: Thanks, we were having the same issue. We are trying to store article content and we are storing a field like <p>This article is for blah</p>. When I see the analysis.jsp page it does strip out the <p> tags and is indexed, but when we fetch the document it returns the field with the <p> tags. From Solr's point of view it's correct, but our issue is that this kind of html tag is screwing up the display of our page. Is there an easy way to ensure the html tags are stripped out, or do we have to take care of it manually? Thanks Rashid aseem cheema wrote: Alright. It turns out that escapedTags is not for what I thought it is for. The problem that I am having with HTMLStripCharFilterFactory is that it strips the html while indexing the field, but not while storing the field. That is why what I see in analysis.jsp, which is index analysis, does not match what gets stored... because.. well, HTML is stripped only for indexing. Makes so much sense. Thanks to Ryan McKinley for clarifying this. Aseem On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com wrote: I am trying to post a document with the following content using SolrJ: <center>content</center> I need the xml/html tags to be ignored. Even though this works fine in analysis.jsp, this does not work with SolrJ, as the client escapes the < and > with &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated. There is escapedTags in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work?
Thanks -- Aseem -- Aseem -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
Well that's the whole discussion we are talking about. I had the impression that the html tags are filtered and then the field is stored without tags. But it looks like the html tags are removed and terms are indexed purely for indexing, and the actual text is stored in raw format. Let's say for example I enter a field like <field name="body"><p>honda car road review</p></field> When I do analysis on the body field the html filter removes the <p> tag and the indexed words are honda, car, road, review. But when I fetch the body field to display in my document it returns <p>honda car road review</p> I hope I make sense. thanks darniz Erick Erickson wrote: This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters shows you many of the Solr analyzers and filters. Would one of the various *HTMLStrip* ones work? HTH Erick On Mon, Jan 11, 2010 at 2:44 PM, darniz rnizamud...@edmunds.com wrote: Thanks, we were having the same issue. We are trying to store article content and we are storing a field like <p>This article is for blah</p>. When I see the analysis.jsp page it does strip out the <p> tags and is indexed, but when we fetch the document it returns the field with the <p> tags. From Solr's point of view it's correct, but our issue is that this kind of html tag is screwing up the display of our page. Is there an easy way to ensure the html tags are stripped out, or do we have to take care of it manually? Thanks Rashid aseem cheema wrote: Alright. It turns out that escapedTags is not for what I thought it is for. The problem that I am having with HTMLStripCharFilterFactory is that it strips the html while indexing the field, but not while storing the field. That is why what I see in analysis.jsp, which is index analysis, does not match what gets stored... because.. well, HTML is stripped only for indexing. Makes so much sense. Thanks to Ryan McKinley for clarifying this. Aseem On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com wrote: I am trying to post a document with the following content using SolrJ: <center>content</center> I need the xml/html tags to be ignored. Even though this works fine in analysis.jsp, this does not work with SolrJ, as the client escapes the < and > with &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated. There is escapedTags in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work? Thanks -- Aseem -- Aseem -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adaptive search?
: I was talking about boosting documents using past popularity. So a user : searches for X and gets 10 results. This view is recorded for each of the 10 : documents and added to the index later. If a user clicks on result #2, the : click is recorded for doc #2 and added to index. We boost using clicks/view. FWIW: I've observed three problems with this type of metric... 1) render vs view ... what you are calling a view is really a rendering -- you are sending the data back to include the item in the list of 10 items on the page, and the browser is rendering it, but that doesn't mean the user is actually viewing it -- particularly in a webpage-type situation where only the first 3-5 results might actually appear above the fold and the user has to scroll to see the rest. Even in a smaller UI element (like a left or right nav info box), there's no guarantee that the user actually views any of the items, which can bias things. 2) It doesn't take into account people who click on a result, decide it's terrible, hit the back arrow and click on a different result -- both of those wind up scoring equally. Some really complex session+click analysis can overcome this, but not a lot of people have the resources to do that all the time. 3) Ignoring #1 and #2 above (because I haven't found many better options), you face the popularity problem -- or what my coworkers and I used to call the TRL Problem back in the 90s: MTV's Total Request Live was a Top X countdown show of videos, featuring the most popular videos of the week based on requests -- but it was also the number one show on the network, occupying something like 4/24 broadcast hours of every day, when there was only a total of 6/24 hours that actually showed music videos. So for the most part the only videos people ever saw were on TRL, so those were the only videos that ever got requested.
In a nutshell: once something becomes popular and is what everybody sees, it stays popular, because it's what everybody sees and they don't know that there is better stuff out there. Even if everyone looks at the full list of results and actually reads all of the first 10 summaries, in the absence of any other bias their inclination is going to be to assume #1 is the best. So they might click on that even if another result on the list appears better based on their opinion. A variation that I did some experiments with, but never really refined because I didn't have the time/energy to really go to town on it, is to weight the clicks based on position: a click on item #1 wouldn't be worth anything -- it's the number one result, the expectation is that it better get clicked or something is wrong. A click on #2 is worth something to that item, and a click on #3 is worth more to that item, and so on ... so that if the #9 item gets a click, that's huge. To do it right, I think what you really want to do is penalize items that get views but no clicks -- because if someone loads up results 1-10, and doesn't click on any of them, that should be a vote in favor of moving all of them down and moving item #11 up (even though it got no views or clicks). But like I said: I never experimented with this idea enough to come up with a good formula, or verify that the idea was sound. -Hoss
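Hoss's position-weighting idea — a click on #1 is expected and worth roughly nothing, a click far down the page is a strong signal, and a rendered-but-unclicked item earns a small penalty — could be sketched like this. The linear weights and the penalty value are made up for illustration; as Hoss says, nobody has validated a formula here:

```python
def click_score(position, clicked, num_rendered=10):
    """Toy position-weighted click signal.
    position: 1-based rank at which the doc was rendered.
    clicked:  whether the user clicked it.
    Returns a positive boost for clicks on low-ranked docs and a small
    penalty for rendered-but-ignored docs. Weights are illustrative only."""
    if clicked:
        # 0.0 for a click at position 1, rising linearly toward 1.0
        # at the bottom of the rendered page
        return (position - 1) / (num_rendered - 1)
    # rendered but not clicked: mild negative evidence
    return -0.1

print(click_score(1, True))
print(click_score(9, True))
print(click_score(3, False))
```

Aggregating these per document over time (with decay for old events, as suggested earlier in the thread) would give a boost value to fold into scoring.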
Re: Getting solr response data in a JS query
AJAX Solr does more or less the following: jQuery.getJSON('http://localhost:8983/solr/select/?q=*:*&wt=json&json.wrf=?', {}, function (data) { // do something with data, which is the eval'd JSON response }); -- View this message in context: http://old.nabble.com/Getting-solr-response-data-in-a-JS-query-tp27095224p27116970.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
: stored without tags. But looks like the html tags are removed and terms are : indexed purely for indexing, and the actual text is stored in raw format. Correct. Analysis is all about indexing; it has nothing to do with stored content. You can write UpdateProcessors that modify the content before it is either indexed or stored, but there aren't a lot of Processors provided out of the box at the moment. -Hoss
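If writing an UpdateProcessor is more than you want to take on, the simplest fix is to strip the tags in your indexing client before the document is ever sent to Solr, so the stored value is already clean. A naive sketch (assumption: the markup is simple; a real HTML parser is safer than a regex for arbitrary input):

```python
import re

def strip_tags(html):
    # Drop anything that looks like a tag. Fine for simple markup like
    # <p>...</p>; NOT robust against comments, CDATA, or '>' inside
    # attribute values - use a real HTML parser for those cases.
    return re.sub(r"<[^>]+>", "", html)

print(strip_tags("<p>honda car road review</p>"))
```

Run the field value through this before adding the document, and both the indexed and stored copies will be tag-free.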
Re: Tokenizer question
We are using the standard query parser (so no dismax). Fieldtype is solr.TextField with the following query analyzer:
<analyzer type="query">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="(\s+|-)"/>
  <filter class="solr.StopFilterFactory" words="../../../synonyms/nl_stopwords.txt" ignoreCase="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="../../../synonyms/nl_synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="-" replacement="" replace="all"/>
  <filter class="com.foo.IgnoreListWordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="0" splitOnCaseChange="0" ignoreList="@&amp;"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="^0+(.)" replacement="$1" replace="all"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
Grant Ingersoll-6 wrote: And also, what query parser are you using? On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote: What do your FieldTypes look like for the fields in question? On Jan 10, 2010, at 10:05 AM, rswart wrote: Hi, This is probably an easy question. I am doing a simple query on postcode and house number. If the housenumber contains a minus sign like: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43) the resulting parsed query contains a phrase query: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43) This never matches. What I want solr to do is generate the following parsed query (essentially an OR for both house numbers): +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43) Solr generates this based on the following query (so a space instead of a minus sign): q=PostCode:(1078 pw)+AND+HouseNumber:(39 43) I tried two things to have Solr generate the desired parsed query: 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in a phrase query 2. PatternTokenizerFactory that splits on (\s+|-). But both options don't work.
Any suggestions on how to get rid of the phrase query? Thanks, Richard -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27117036.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
Ah, I read your post too fast and ignored the title. Sorry 'bout that. Erick On Mon, Jan 11, 2010 at 2:55 PM, darniz rnizamud...@edmunds.com wrote: Well that's the whole discussion we are talking about. I had the impression that the html tags are filtered and then the field is stored without tags. But it looks like the html tags are removed and terms are indexed purely for indexing, and the actual text is stored in raw format. Let's say for example I enter a field like <field name="body"><p>honda car road review</p></field> When I do analysis on the body field the html filter removes the <p> tag and the indexed words are honda, car, road, review. But when I fetch the body field to display in my document it returns <p>honda car road review</p> I hope I make sense. thanks darniz Erick Erickson wrote: This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters shows you many of the Solr analyzers and filters. Would one of the various *HTMLStrip* ones work? HTH Erick On Mon, Jan 11, 2010 at 2:44 PM, darniz rnizamud...@edmunds.com wrote: Thanks, we were having the same issue. We are trying to store article content and we are storing a field like <p>This article is for blah</p>. When I see the analysis.jsp page it does strip out the <p> tags and is indexed, but when we fetch the document it returns the field with the <p> tags. From Solr's point of view it's correct, but our issue is that this kind of html tag is screwing up the display of our page. Is there an easy way to ensure the html tags are stripped out, or do we have to take care of it manually? Thanks Rashid aseem cheema wrote: Alright. It turns out that escapedTags is not for what I thought it is for. The problem that I am having with HTMLStripCharFilterFactory is that it strips the html while indexing the field, but not while storing the field. That is why what I see in analysis.jsp, which is index analysis, does not match what gets stored... because..
well, HTML is stripped only for indexing. Makes so much sense. Thanks to Ryan McKinley for clarifying this. Aseem On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com wrote: I am trying to post a document with the following content using SolrJ: <center>content</center> I need the xml/html tags to be ignored. Even though this works fine in analysis.jsp, this does not work with SolrJ, as the client escapes the < and > with &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated. There is escapedTags in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work? Thanks -- Aseem -- Aseem -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search query log using solr
: application. I am planning to add a search query log that will capture all : the search queries (and more information like IP, user info, date time, etc). : I understand I can easily do this on the application side capturing all the : search requests, logging them in a DB/file before sending them to solr for : execution. : But I wanted to check with the forum if there was any better : approach OR best practices OR anything that has been added to Solr for such : a requirement. doing this in your application is probably the best bet ... you could put all of the extra info in query args to solr, which would be ignored but included in Solr's own logs, except that would muck up any HTTP Caching you might do (and putting an Accelerator Cache in front of Solr is a really easy way to reduce load in a lot of common situations) -Hoss
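The application-side logging Hoss recommends can be as simple as wrapping the call to Solr. A minimal sketch in Python (the field names, log format, and `solr_search` callable here are illustrative assumptions, not anything Solr prescribes):

```python
import json
import time

def log_and_search(query, user_ip, user_id, solr_search, log_file="query_log.jsonl"):
    """Record the query with request metadata, then forward it to Solr.

    solr_search is whatever callable actually talks to Solr; keeping the
    logging on the application side leaves Solr's HTTP caching untouched.
    """
    entry = {
        "q": query,
        "ip": user_ip,
        "user": user_id,
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return solr_search(query)

# Example with a stubbed Solr call:
results = log_and_search("wiper blades", "10.0.0.1", "u42",
                         solr_search=lambda q: ["doc1", "doc2"])
print(results)  # ['doc1', 'doc2']
```

A real setup would swap the lambda for an HTTP request to the Solr select handler and probably log asynchronously or to a database, as the original poster suggests.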
Re: Understanding the query parser
On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote: It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660. If numTokens > 1 it returns a PhraseQuery. That's exactly the question. Would be nice to hear from someone as to why it is that way? Suppose you indexed Foo Bar. It'd get indexed as two tokens [foo] followed by [bar]. Then someone searches for foo-bar, which would get analyzed into two tokens also. A PhraseQuery is the most logical thing for it to turn into, no? What's the alternative? Of course it's tricky business though, impossible to do the right thing for all cases within SolrQueryParser. Thankfully it is pleasantly subclassable and overridable for this method. Erik
Commons Lang
We have a solr plugin that would be much easier to write if commons-lang was available. Why does solr not have this library? Are there any drawbacks to pulling in commons-lang for StringUtils? -- Jeff Newburn Software Engineer, Zappos.com
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
no problem Erick Erickson wrote: Ah, I read your post too fast and ignored the title. Sorry 'bout that. Erick ... -- View this message in context: http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27118304.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multi language support
This is the way I've implemented multilingual search as well. 2010/1/11 Markus Jelsma mar...@buyways.nl Hello, We have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and lists of stopwords and synonyms. We, however, did not use language-specific stopwords; instead we used one list shared by both languages. So you would have a field type like: <fieldType name="en_text" class="solr.TextField"> ... <analyzer type="..."> <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/> etc etc. Cheers, - Markus Jelsma Buyways B.V. Technisch Architect Friesestraatweg 215c http://www.buyways.nl 9743 AD Groningen Alg. 050-853 6600 KvK 01074105 Tel. 050-853 6620 Fax. 050-3118124 Mob. 06-5025 8350 In: http://www.linkedin.com/in/markus17 On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote: Hi Solr users. I'm trying to set up a site with Solr search integrated. And I use the Solr Java API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of solr as possible. Synonyms, stopwords and stems all sound quite interesting and useful, but how do I set this up in a good way for a multilingual site? The site doesn't have a huge text mass so performance issues don't really bother me, but still I'd like to hear your suggestions before I try to implement a solution. Best regards Daniel
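With per-language fields like text_en / text_nl, the application only has to pick the right field at query time. A toy sketch of that routing (the field names are assumptions extrapolated from the pattern above, not from any poster's actual schema):

```python
def language_query(user_query, lang, supported=("en", "nl")):
    """Build a Solr q parameter against the language-specific field,
    falling back to a generic text field when the language is unknown."""
    if lang in supported:
        return f"text_{lang}:({user_query})"
    return f"text:({user_query})"

print(language_query("fiets", "nl"))  # text_nl:(fiets)
print(language_query("bike", "fr"))   # text:(bike)
```

Each language-specific field then carries its own stemmer, stopword list, and synonym list in schema.xml, exactly as Markus describes.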
Encountering a roadblock with my Solr schema design...use dedupe?
I am in the process of building a Solr search solution for my application and have run into a roadblock with the schema design: trying to match criteria in one multi-valued field with corresponding criteria in another multi-valued field. Any advice would be greatly appreciated. BACKGROUND: My RDBMS data model is such that for every one of my Product entities, there are one-to-many SKU entities available for purchase. Each SKU entity can have its own price, as well as one-to-many options, etc. The web frontend displays available Product entities on both directory and detail pages. In order to take advantage of Solr's facet count, paging, and sorting functionality, I decided to base the Solr schema on Product documents; so none of my documents currently contain duplicate Product data, and all SKU related data is denormalized as necessary, but into multi-valued fields. For example, I have a document with an id field set to Product:7, a docType field set to Product, as well as multi-valued SKU related fields and data like sku_color {Red | Green | Blue}, sku_size {Small | Medium | Large}, sku_price {10.00 | 10.00 | 7.99} I hit the roadblock when I tried to answer the question, Which products are available that contain skus with color Green, size M, and a price of $9.99 or less?...and have now begun the switch to SKU level indexing. This also gives me what I need for faceted browsing/navigation, and search refinement...leading the user to Product entities having purchasable SKU entities. But this also means I now have documents which are mostly duplicates for each Product, and all facet counts, paging, and sorting are then inaccurate; so it appears I need to do this myself, with multiple Solr requests. Is this really the best approach; and if so, should I use the Solr Deduplication update processor when indexing and querying?
Thanks in advance, Kelly -- View this message in context: http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Encountering a roadblock with my Solr schema design...use dedupe?
Hello Kelly, I am not entirely sure if i understand your problem correctly. But i believe your first approach is the right one. Your question: Which products are available that contain skus with color Green, size M, and a price of $9.99 or less? can be easily answered using a schema like yours. id = 1 color = [green, blue] size = [M, S] price = 6 id = 2 color = [red, blue] size = [L, S] price = 12 id = 3 color = [green, red, blue] size = [L, S, M] price = 5 Using the data above you can answer your question using a basic Solr query [1] like the following: q=color:green AND price:[0 TO 9.99] AND size:M Of course, you would make this a function query [2] but this, if i understood your question well enough, answers it. [1] http://wiki.apache.org/solr/SolrQuerySyntax [2] http://wiki.apache.org/solr/FunctionQuery Cheers, Kelly Taylor zei: I am in the process of building a Solr search solution for my application and have run into a roadblock with the schema design. ...
EOF IOException Query
Hi all, I got the following exception from Solr, but the index is still searchable. (At least it is searchable for the query *:*.) I am just wondering what is the root cause. Thanks, Osborn INFO: [publicGalleryPostMaster] webapp=/multicore path=/select params={wt=javabin&rows=12&start=0&sort=/gallery/1/postlist/1Rank_i+desc&q=%2B(communityList_s_m:/gallery/1/postlist/1)+%2Bstate_s:A&version=1} status=500 QTime=3 Jan 11, 2010 12:23:01 PM org.apache.solr.common.SolrException log SEVERE: java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112) at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:712) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208) at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676) at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667) at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245) at org.apache.lucene.search.Searcher.search(Searcher.java:171) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at
org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
Re: Encountering a roadblock with my Solr schema design...use dedupe?
Hi Markus, Thanks for your reply. Using the current schema and query like you suggest, how can I identify the unique combination of options and price for a given SKU? I don't want the user to arrive at a product which doesn't completely satisfy their search request. For example, with the color:Green, size:M, and price:[0 to 9.99] search refinements applied, no products should be displayed which only have size:M in color:Blue The actual data in the database for a product to display on the frontend could be as follows: product id = 1 product name = T-shirt related skus... -- sku id = 7 [color=green, size=S, price=10.99] -- sku id = 9 [color=green, size=L, price=10.99] -- sku id = 10 [color=blue, size=S, price=9.99] -- sku id = 11 [color=blue, size=M, price=10.99] -- sku id = 12 [color=blue, size=L, price=10.99] Regards, Kelly Markus Jelsma - Buyways B.V. wrote: Hello Kelly, I am not entirely sure if i understand your problem correctly. But i believe your first approach is the right one. Your question: Which products are available that contain skus with color Green, size M, and a price of $9.99 or less? can be easily answered using a schema like yours. id = 1 color = [green, blue] size = [M, S] price = 6 id = 2 color = [red, blue] size = [L, S] price = 12 id = 3 color = [green, red, blue] size = [L, S, M] price = 5 Using the data above you can answer your question using a basic Solr query [1] like the following: q=color:green AND price:[0 TO 9,99] AND size:M Of course, you would make this a function query [2] but this, if i understood your question well enough, answers it. [1] http://wiki.apache.org/solr/SolrQuerySyntax [2] http://wiki.apache.org/solr/FunctionQuery Cheers, -- View this message in context: http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27120031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Encountering a roadblock with my Solr schema design...use dedupe?
Hello Kelly, Simple boolean algebra: you tell Solr you want color = green AND size = M, so it will only return green t-shirts in size M. If you, however, turn the AND into an OR, it will return all t-shirts that are green OR in size M, thus you can then get M sized shirts in the blue color or green shirts in size XXL. I suggest you just give it a try and perhaps come back later to find some improvements for your query. It would also be a good idea - if i may say so - to read the links provided in the earlier message. Hope you will find what you're looking for :) Cheers, Kelly Taylor zei: Hi Markus, Thanks for your reply. Using the current schema and query like you suggest, how can I identify the unique combination of options and price for a given SKU? ...
Re: Encountering a roadblock with my Solr schema design...use dedupe?
Hi Markus, Thanks again. I wish this were simple boolean algebra. This is something I have already tried. So either I am missing the boat completely, or have failed to communicate it clearly. I didn't want to confuse the issue further but maybe the following excerpts will help... Excerpt from Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh... ...the criteria for this hypothetical search involves multi-valued fields, where the index of one matching criteria needs to correspond to the same value in another multi-valued field in the same index. You can't do that... And this excerpt is from Solr and RDBMS: The basics of designing your application for the best of both by Amit Nithianandan... ...If I wanted to allow my users to search for wiper blades available in a store nearby, I might create an index with multiple documents or records for the same exact wiper blade, each document having different location data (lat/long, address, etc.) to represent an individual store. Solr has a de-duplication component to help show unique documents in case that particular wiper blade is available in multiple stores near me... http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics Remember, with my original schema definition I have multi-valued fields, and when the product document is built, these fields do contain an array of values retrieved from each of the related skus. Skus are children of my products. Using your example data, which t-shirt sku is available for purchase as a child of t-shirt product with id 3? Is it really the green, M, or have we found a product document related to both a green t-shirt and a Medium t-shirt of some other color, which will thereby leave the user with nothing to purchase?
sku = 9 [color=green, size=L, price=10.99], product id = 3 sku = 10 [color=blue, size=S, price=9.99], product id = 3 sku = 11 [color=blue, size=M, price=10.99], product id = 3 id = 1 color = [green, blue] size = [M, S] price = 6 id = 2 color = [red, blue] size = [L, S] price = 12 id = 3 color = [green, red, blue] size = [L, S, M] price = 5 If this is still unclear, I'll post a new question based on findings from this conversation. Thanks for all of your help. -Kelly Markus Jelsma - Buyways B.V. wrote: Hello Kelly, Simple boolean algebra, you tell Solr you want color = green AND size = M so it will only return green t-shirts in size M. ... -- View this message in context:
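Kelly's point — that flattening SKUs into multi-valued product fields loses the per-SKU correlation — is easy to demonstrate outside Solr. An illustrative sketch (data adapted from the examples in this thread; the matching functions only mimic how Solr evaluates field clauses, they are not Solr code):

```python
# One product document, with its SKUs flattened into independent
# multi-valued fields, the way the original schema stored them.
products = [
    {"id": 3, "color": ["green", "red", "blue"], "size": ["L", "S", "M"]},
]
# The underlying SKUs -- note that NO single SKU is both green and size M.
skus = [
    {"product": 3, "color": "green", "size": "L"},
    {"product": 3, "color": "blue",  "size": "S"},
    {"product": 3, "color": "blue",  "size": "M"},
]

def product_matches(doc, color, size):
    # Mirrors q=color:green AND size:M against multi-valued fields:
    # each clause matches independently, with no cross-field correlation.
    return color in doc["color"] and size in doc["size"]

def sku_matches(color, size):
    # SKU-level indexing keeps the correlation, at the cost of duplicates.
    return [s for s in skus if s["color"] == color and s["size"] == size]

print(product_matches(products[0], "green", "M"))  # True  (false positive)
print(sku_matches("green", "M"))                   # []    (nothing purchasable)
```

The product-level query matches even though the user cannot actually buy a green shirt in size M, which is exactly the trap the Smiley/Pugh excerpt warns about.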
Re: Commons Lang
There's no point in moving it to Solr core unless something in core depends on it. The VelocityResponseWriter depends on commons-lang, though, and I am aiming to integrate that into core at some point. But, you can put commons-lang in your solr-home/lib and your plugin will be able to see it fine. Erik On Jan 11, 2010, at 4:39 PM, Jeff Newburn wrote: We have a solr plugin that would be much easier to write if commons- lang was available. Why does solr not have this library? Is there any drawbacks to pulling in the commons lang for StringUtils? -- Jeff Newburn Software Engineer, Zappos.com
Re: Tokenizer question
: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43) : : the resulting parsed query contains a phrase query: : : +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43) This stems from some fairly fundamental behavior in the QueryParser ... each chunk of input that isn't deemed markup (ie: not field names, or special characters) is sent to the analyzer. If the analyzer produces multiple tokens at different positions, then a PhraseQuery is constructed. -- Things like simple phrase searches and N-Gram based partial matching require this behavior. If the analyzer produces multiple Tokens, but they all have the same position then the QueryParser produces a BooleanQuery with all SHOULD clauses. -- This is what allows simple synonyms to work. If you write a simple TokenFilter to flatten all of the positions to be the same, and use it after WordDelimiterFilter then it should give you the OR style query you want. This isn't the default behavior because the Phrase behavior of WDF fits its intended case better --- someone searching for a product sku like X3QZ-D5 expects it to match X-3QZD5, but not just X or 3QZ -Hoss
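Hoss's position-flattening suggestion can be illustrated with a toy token model. This is a simulation of the idea only, not Lucene's actual TokenFilter API (which would involve subclassing TokenFilter and zeroing the position increment attribute):

```python
def flatten_positions(tokens):
    """tokens: list of (text, position_increment) pairs, as an analyzer
    would emit them.  Setting every increment after the first to 0 stacks
    all tokens on one position, so the query parser treats them like
    synonyms and builds an OR (SHOULD) BooleanQuery instead of a
    PhraseQuery."""
    return [(text, 1 if i == 0 else 0) for i, (text, _) in enumerate(tokens)]

# WordDelimiterFilter on "39-43" yields two tokens at consecutive positions:
analyzed = [("39", 1), ("43", 1)]
print(flatten_positions(analyzed))  # [('39', 1), ('43', 0)]
```

With the increments flattened, the parsed query becomes HouseNumber:39 OR HouseNumber:43 rather than the phrase "39 43", which is the OR-style behavior the original poster wanted.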
Re: Tokenizer question
If the analyzer produces multiple Tokens, but they all have the same position then the QueryParser produces a BooleanQuery with all SHOULD clauses. -- This is what allows simple synonyms to work. You rock Hoss!!! This is exactly the explanation I was looking for .. it is as simple as it sounds. Thanks! Cheers Avlesh On Tue, Jan 12, 2010 at 6:37 AM, Chris Hostetter hossman_luc...@fucit.org wrote: ...
Re: Understanding the query parser
Thanks Erik for responding. Hoss explained the behavior with nice corollaries here - http://www.lucidimagination.com/search/document/8bc351d408f24cf6/tokenizer_question Cheers Avlesh On Tue, Jan 12, 2010 at 2:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote: ...
Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?
Hi, Is there a step-by-step for applying the patch for SOLR-236 to enable field collapsing in Solr 1.4? Thanks, Kelly -- View this message in context: http://old.nabble.com/Solr-1.4-Field-collapsing---What-are-the-steps-for-applying-the-SOLR-236-patch--tp27122621p27122621.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?
it seems to be in flux right now as the solr developers slowly make improvements and ingest the various pieces into the solr trunk. i think your best bet might be to use the 12/24 patch and fix any errors where it doesn't apply cleanly. i'm using solr trunk r892336 with the 12/24 patch --joe On 01/11/2010 08:48 PM, Kelly Taylor wrote: Hi, Is there a step-by-step for applying the patch for SOLR-236 to enable field collapsing in Solr 1.4? Thanks, Kelly
Seattle Hadoop / HBase / Lucene / NoSQL meetup Jan 27th!
Greetings, A friendly reminder that the Seattle Hadoop, NoSQL, etc. meetup is on January 27th at University of Washington in the Allen Computer Science Building, room 303. I believe Razorfish will be giving a talk on how they use Hadoop. Here's the new, shiny meetup.com link with more detail: http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup -- http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform. http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Tokenizer question
Crystal clear. Thanks for your response time! -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27123281.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: update solr index
On Mon, Jan 11, 2010 at 7:42 PM, Marc Des Garets marc.desgar...@192.com wrote: I am running solr in tomcat and I have about 35 indexes (between 2 and 80 million documents each). Currently if I try to update a few documents from an index (let's say the one which contains 80 million documents) while tomcat is running and therefore receiving requests, I am getting a few very long garbage collections (about 60 sec). I am running tomcat with -Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m. I'm using ConcMarkSweepGC. I have 2 questions: 1. Is solr doing something specific while an index is being updated, like updating something in memory which would cause the garbage collection? Solr's caches are thrown away and a fixed number of old queries are re-executed to re-generate the cache on the new index (known as auto-warming). This happens on a commit. 2. Any idea how I could solve this problem? Currently I stop tomcat, update index, start tomcat. I would like to be able to update my index while tomcat is running. I was thinking about running more tomcat instances with less memory for each, each running a few of my indexes. Do you think it would be the best way to go? If you stop tomcat, how do you update the index? Are you running a multi-core setup? Perhaps it is better to split up the indexes among multiple boxes. Also, you should probably lower the JVM heap so that the full GC pause doesn't make your index unavailable for such a long time. Also see http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr -- Regards, Shalin Shekhar Mangar.
What is this error means?
When I am building the index for around 2 ~ 25000 records, sometimes I come across this error: Uncaught exception 'Exception' with message '0' Status: Communication Error I searched Google and Yahoo but found no answer. I am committing documents to Solr every 10 records fetched from a SQLite database with PHP 5.3. Platform: Windows 7 Home Web server: Nginx Solr Specification Version: 1.4.0 Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40 Lucene Specification Version: 2.9.1 Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25 Solr hosted in jetty 6.1.3 All the above are on one single test machine. The situation is that sometimes when I build the index, it is created successfully. But sometimes it will just stop with the above error. Any clue? Please help. Thank you in advance.