Re: behavior of solr.KeepWordFilterFactory
A Solr index is case-sensitive by default unless you use the lowercase filter. I remember seeing this topic on the list before, and the solution is simple: copy the field, process the copy with a new analyzer/tokenizer that does not use the lowercase filter, and make sure both fields are included when you query.

On Mon, Dec 3, 2012 at 3:04 PM, Joe Zhang smartag...@gmail.com wrote:
> In other words, what I want to achieve is case-sensitive indexing on a small set of words. Can anybody help?

On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang smartag...@gmail.com wrote:
> To be more specific, this is the data type I was using:
>
> <fieldType name="textspecial" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.KeepWordFilterFactory" words="tickers.txt" ignoreCase="false"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>

On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang smartag...@gmail.com wrote:
> Yes, that is the correct behavior. But how do I achieve my goal, i.e., special treatment for a list of uppercase/special words and normal treatment for everything else?

On Sun, Dec 2, 2012 at 11:46 PM, Xi Shen davidshe...@gmail.com wrote:
> By the definition at https://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeepWordFilter.html , I am pretty sure this is the correct behavior of this filter :) I guess you are trying to use this filter to index some special words in Chinese?

On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang smartag...@gmail.com wrote:
> I defined the following data type in my Solr schema.xml:
>
> <fieldtype name="testkeep" class="solr.TextField">
>   <analyzer>
>     <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="false"/>
>   </analyzer>
> </fieldtype>
>
> When I use the type testkeep to index a test field, my expectation was that Solr would index the uppercase form of a small list of words in the file, AND TREAT EVERY OTHER WORD AS USUAL. The goal of securing the closed list is achieved, but NO OTHER WORD outside the list is indexed! Can anybody help? Thanks in advance!
>
> Joe

--
Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
Re: duplicated URL sent from Nutch to solr index
Then the URL must be the same.

On Mon, Dec 3, 2012 at 2:34 PM, Joe Zhang smartag...@gmail.com wrote:
> Sorry I didn't make it perfectly clear. The id field is the URL.

On Sun, Dec 2, 2012 at 11:33 PM, Joe Zhang smartag...@gmail.com wrote:
> Thanks!

On Sun, Dec 2, 2012 at 11:20 PM, Xi Shen davidshe...@gmail.com wrote:
> If the value of the id field is the same, the old entry will be updated; if it is new, a new entry will be created and indexed. This is my experience. :)

On Mon, Dec 3, 2012 at 1:45 PM, Joe Zhang smartag...@gmail.com wrote:
> Dear list,
>
> I just want to confirm an expected behavior of Solr. Assuming we have <uniqueKey>id</uniqueKey> in schema.xml, when we send the same URL from Nutch to Solr multiple times, would there be ONLY ONE entry for that URL, with the content (if changed) and timestamp updated?
>
> Thanks!
> Joe

--
Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
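The overwrite behavior described in this thread can be pictured with a toy model. This is plain Python, not Solr code: the `TinyIndex` class, the example URLs, and the field names `content` and `tstamp` are all hypothetical and only illustrate "a document with an existing uniqueKey value replaces the old one".

```python
class TinyIndex:
    """Toy model of an index keyed on a uniqueKey field (not Solr code)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # A document whose key already exists replaces the stored one;
        # a new key creates a new entry.
        self.docs[doc[self.unique_key]] = dict(doc)


index = TinyIndex()
index.add({"id": "http://example.com/a", "content": "v1", "tstamp": 1})
index.add({"id": "http://example.com/a", "content": "v2", "tstamp": 2})
index.add({"id": "http://example.com/b", "content": "other", "tstamp": 3})
```

After the three calls the model holds two entries, and the entry for the repeated URL carries the content and timestamp of the later submission.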
Re: Solr 4: Join Query
Hi Erick,

One more thing: is there any other way to get the result? I mean, I need to get both the parent and the child document, in either nested or non-nested format.

Regards,
Vikash Sharma vikash0...@gmail.com

On Sat, Dec 1, 2012 at 10:29 PM, Erick Erickson erickerick...@gmail.com wrote:
> That's the way joins work, and why they're called "pseudo joins": they don't work like DB joins and return data from both records. Joins were put in for a specific use case; when you try to treat Solr like a DB you're bound to be disappointed.
>
> I'd think about reworking the solution to de-normalize the data so you don't have to do joins.
>
> Best
> Erick

On Fri, Nov 30, 2012 at 10:38 AM, Vikash Sharma vikash0...@gmail.com wrote:
> Hi All,
>
> I have my field definitions in schema.xml like below:
>
> <field name="id" type="string" indexed="true" . />
> <field name="Emp_id" type="string" indexed="true" . />
> <field name="doc_id" type="string" indexed="true" . />
> <field name="content" type="string" indexed="true" . />
>
> I need to create a separate record in Solr for each parent-child relationship, such that if a child is the same across different parents it gets stored only once. For example:
>
> Record 1
> <id>ABC</id>
> <emp_id>EMP001</emp_id>
> <doc_id>DOC001</doc_id>
> <doc_content>My Parent Doc</doc_content>
>
> Record 2
> <id>DOC001</id>
> <emp_id></emp_id>
> <doc_id></doc_id>
> <doc_content>My Document Data</doc_content>
>
> This ensures that if any doc_id content is duplicated, the record is inserted into Solr only once.
>
> Lastly, I want the result as a join: if emp_id=EMP001, then both records should be returned, as there is a relationship between the two records via doc_id = id.
>
> If I query:
> http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json
> http://localhost:8983/solr/select?q={!join%20from=sha_one%20to=id}project_id:10&wt=json
>
> I expect both records to be returned, either one after another or nested. But I only get child records...
>
> Please help.
>
> Regards,
> Vikash Sharma vikash0...@gmail.com
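As a hedged sketch of one way to approach the question in this thread: the join clause only returns documents on the "to" side, so one option that is sometimes used is to OR the plain parent clause with the join as a nested query via the `_query_` pseudo-field. The snippet below only builds and URL-encodes the two request URLs; it does not contact Solr, and whether the combined query behaves as hoped on a given Solr 4.0 setup is an assumption that would need to be checked. The helper name `select_url` is hypothetical; host, path, and field names are taken from the message.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE = "http://localhost:8983/solr/select"  # host and path as used in the thread


def select_url(q, wt="json"):
    # urlencode escapes braces, spaces and quotes, and joins parameters with "&".
    return BASE + "?" + urlencode({"q": q, "wt": wt})


# Join only: returns the documents whose id matches a parent's doc_id.
join_only = "{!join from=doc_id to=id}emp_id:EMP001"

# Parent clause OR nested join clause (assumption: default lucene query parser).
parent_and_child = 'emp_id:EMP001 OR _query_:"{!join from=doc_id to=id}emp_id:EMP001"'

child_url = select_url(join_only)
both_url = select_url(parent_and_child)
```

Building the URL with `urlencode` also avoids hand-escaping mistakes such as a missing `&` between `q` and `wt`.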
How to change Solr UI
Hi,

I want to change the Solr UI. As far as I understand, Solritas is just for prototyping: I can change the UI according to a predefined (Velocity) template, but I cannot add any additional functionality to that page. How else can I change the Solr UI? Any guidance would be appreciated.

Thanks and regards,
Romita
AW: Edismax query parser and phrase queries
Hi,

the use case we have in mind is that we would like to achieve exact matches for explicit phrases. Our users expect that an explicit phrase respects not only the order of terms but also the exact wording. Therefore, if we search on fields whose data type is not meant for exact matching, we need to change that for explicit phrases. In a usual query our default qf fields use advanced tokenization (for query processing and indexing), for example stemming via SnowballPorterFilterFactory. So our idea was to switch the search fields for explicit phrases to fields with a simple data type such as "string" (StrField, without advanced options) in order to get exact matches.

Extending our example from the last mail:

qf=title text

Data type of title and text, something like "text_advanced":

<fieldtype ...>
  <analyzer type="index"> <!-- (and also analyzer type="query") -->
    <filter class="solr.WordDelimiterFilterFactory" ... />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German2" />
    ...

Data type of the additional fields titleExact and textExact:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

q="ran away from home" Cat Dog
-transformTo->
q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog

Regards,
Richard.

BINSERV Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer
Gotenstr. 7-9, 53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38, Fax: +49 (0)228 / 4 22 86 - 538
E-Mail: r.tant...@binserv.de
Web: www.binserv.de www.binforcepro.de
Managing Director: Rüdiger Jakob
District Court: Siegburg HRB 6765
Registered office: Pfarrer-Wichert-Str. 35, 53639 Königswinter

- Original message -
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, November 30, 2012 23:04
To: solr-user@lucene.apache.org
Subject: Re: Edismax query parser and phrase queries

> I don't have a simple answer for your stated issue, but maybe part of that is because I'm not so sure what the exact problem/goal is. I mean, what's so special about phrase queries for your app that they need distinct processing from individual terms?
>
> And, ultimately, what goal are you trying to achieve? Such as, how will the outcome of the query affect what users see and do?
>
> -- Jack Krupansky

From: Tantius, Richard
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Edismax query parser and phrase queries

> Hi,
>
> we are using the edismax query parser and execute queries on specific fields by using the qf option. Like others, we are facing the problem that we do not want explicit phrase queries to be performed on some of the qf fields, and we also require additional search fields for those kinds of queries. We tried to expand explicit phrases in a query by implementing some pre-processing logic, which did not seem to be very convenient. So, for example (let's assume qf=title text, and we want phrase queries to be performed on the additional fields titleAlt and textAlt):
>
> q="ran away from home" Cat Dog
> -transformTo->
> q=( titleAlt:"ran away from home" OR textAlt:"ran away from home" ) Cat Dog
>
> Unfortunately this gets rather complicated if logical operators are involved within the query. Is there some kind of best practice? Should we, for example, extend the query parser, or stick to our pre-processing approach?
>
> Regards,
> Richard.
Re: Replication in SolrCloud
Thanks for the explanation. It's clear now...

I expanded the setup to 4 hosts with 2 shards and 1 replica for each shard. When I shut down Tomcat on solr01-dcg, which is the master of shard 1 for both collections, the replica (solr01-gs) seems NOT to take over. See the logs below.

Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.SyncStrategy sync INFO: Sync replicas to http://solr01-gs:8983/solr/intradesk/
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr DONE. We have no versions. sync failed.
Dec 3, 2012 9:55:34 AM org.apache.solr.common.SolrException log SEVERE: Sync Failed
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection INFO: There is a better leader candidate than us - going back into recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.update.DefaultSolrCoreState doRecovery INFO: Running recovery - first canceling any ongoing recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy run INFO: Starting recovery process. core=intradesk recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Attempting to PeerSync from http://solr01-dcg:8983/solr/intradesk/ core=intradesk - recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync WARNING: no frame of reference to tell of we've missed updates
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: PeerSync Recovery was not successful - trying replication. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:35 AM org.apache.solr.common.SolrException log SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://solr01-dcg:8983 refused
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at
Re: Replication in SolrCloud
Never mind, I think I found it. There must be some documents in each shard so that they have a version number. Then everything seems to work...

On 11/30/2012 04:57 PM, Mark Miller wrote:
> Thanks for all the detailed info!
>
> Yes, that is confusing. It is one of the sore points we have while supporting both standard Solr and SolrCloud mode.
>
> In SolrCloud, every node is a master when thinking about standard Solr replication. However, as you see on the cloud page, only one of them is a *leader*. A leader is different from a master.
>
> Being a master when it comes to the replication handler simply means you can replicate the index to other nodes; in SolrCloud we need every node to be capable of doing that. Each shard has only one leader, but every node in your cluster will be a replication master.
>
> - Mark

On Nov 30, 2012, at 10:32 AM, Arkadi Colson ark...@smartbit.be wrote:
> This is my setup for SolrCloud 4.0 on Tomcat 7.0.33 and ZooKeeper 3.4.5:
>
> hosts:
> - solr01-dcg (first started)
> - solr01-gs (second started, so it becomes the replica)
> collections:
> - smsc
> shards:
> - mydoc
> zookeeper:
> - on solr01-dcg
> - on solr01-gs
>
> SOLR_OPTS=-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc -DzkClientTimeout=2 -DzkHost=solr01-dcg:2181,solr01-gs:2181
>
> solr.xml:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>   <cores adminPath="/admin/cores" zkClientTimeout="2" hostPort="8983">
>     <core schema="schema.xml" shard="shard1" instanceDir="/solr/mydoc/" name="mydoc" config="solrconfig.xml" collection="mydoc"/>
>   </cores>
> </solr>
>
> I upload the config to ZooKeeper:
>
> java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost solr01-dcg:2181,solr01-gs:2181 -confdir /opt/solr/conf -confname smsc
>
> Linking the config to the collection:
>
> java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mydoc -zkhost solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181 -confname smsc
>
> Cloud view on both hosts: [screenshot dcddagii.png]
> solr01-dcg: [screenshot hhfgdeab.png]
> solr01-gs: [screenshot daafhdef.png]
>
> Any idea?
>
> Thanks!

On 11/30/2012 03:15 PM, Mark Miller wrote:
> On Nov 30, 2012, at 5:08 AM, Arkadi Colson ark...@smartbit.be wrote:
>
> > Hi
> >
> > I've set up a simple 2-machine cloud with 1 shard, one replica and 2 collections. Everything went fine. However, when I look at the interface, http://localhost:8983/solr/#/coll1/replication reports that both machines are master. Did I do something wrong in my config, or is it a report for manual replication configuration? Can someone else check this?
>
> How? You don't really give anything to look at :)
>
> > Is it possible to link 2 collections to the same conf in ZooKeeper?
>
> Yes, that is no problem.
>
> - Mark

--
Kind regards
Arkadi Colson
Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen
T +32 11 64 08 80 . F +32 11 64 08 81
Re: News clustering
One of our clients uses Solr's search results clustering for grouping news. Instead of the default Carrot2 algorithm that ships with Solr they use a commercial one, but Carrot2 should give you decent clusters too. Here's an example clustering result:

http://imagebin.org/238001

Staszek

--
Stanislaw Osinski
http://carrotsearch.com

On Fri, Nov 30, 2012 at 4:44 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:
> Hi all:
>
> I'm thinking of using Nutch combined with Solr to index some news sites in an intranet, and I was wondering how effective the clustering component would be for clustering the search results. Are there any success stories of using the Solr clustering component for news clustering? Is there any existing solution for clustering/classification at index time?
>
> Greetings!
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
Re: behavior of solr.KeepWordFilterFactory
Across-the-board case-sensitive indexing is not what I want... Let me make sure I understand your suggestion:

<fieldType name="text1" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text2" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

And define content1 as text1, and content2 as text2?

On Mon, Dec 3, 2012 at 1:09 AM, Xi Shen davidshe...@gmail.com wrote:
> A Solr index is case-sensitive by default unless you use the lowercase filter. I remember seeing this topic on the list before, and the solution is simple: copy the field, process the copy with a new analyzer/tokenizer that does not use the lowercase filter, and make sure both fields are included when you query.
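The difference between the two analysis chains discussed in this thread can be pictured with a small simulation. This is plain Python, not Solr: `tokenize` is only a crude stand-in for StandardTokenizer, the contents of `TICKERS` are a hypothetical tickers.txt, and the variable names `content_general` and `content_special` are made up for illustration. It shows why a keep-words chain indexes nothing outside the list, and how a second (copied) field can carry the normal, lowercased treatment.

```python
import re


def tokenize(text):
    # Crude stand-in for StandardTokenizer: split on non-word characters.
    return re.findall(r"\w+", text)


def keep_words(tokens, words):
    # Models KeepWordFilter with ignoreCase=false: every token that is
    # not in the list is dropped.
    return [t for t in tokens if t in words]


def lowercase(tokens):
    # Models LowerCaseFilter.
    return [t.lower() for t in tokens]


TICKERS = {"IT", "ALL"}  # hypothetical contents of tickers.txt
text = "IT spending hit all sectors"

# Field 1: normal treatment for every word.
content_general = lowercase(tokenize(text))

# Field 2: case-sensitive treatment restricted to the closed list.
content_special = keep_words(tokenize(text), TICKERS)
```

Here `content_special` keeps only the exact-case ticker, while `content_general` still contains every word; querying both fields covers both needs.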
Re: News clustering
Hi Stanislaw Osinski,
Re: News clustering
Hi Stanislaw Osinski,

Was the picture generated using the Lingo 3G algorithm? I saw some sub-clusters inside it. Nice pic :)

I am interested in learning it. How long is the Lingo 3G trial period?

Is there any way to programmatically measure the performance of the Carrot2 clustering algorithm?

Thanks,
Hanjoyo
Re: How to change Solr UI
Hi Romita,

In my opinion, if you are new to Solr, you can start by learning from Solritas. Solritas uses Apache Velocity (a templating language), CSS and jQuery to manage its look and behavior. Besides that, you can write a custom SearchComponent inside the /browse SearchHandler to add more functionality to your search application.

Kind regards,
Hanjoyo
Re: News clustering
> Was the picture generated using the Lingo 3G algorithm? I saw some sub-clusters inside it. Nice pic :)

That is correct.

> I am interested in learning it. How long is the Lingo 3G trial period?

I'll send you the details in a private e-mail in a second.

> Is there any way to programmatically measure the performance of the Carrot2 clustering algorithm?

I'm not sure what you mean by performance. Measuring clustering time is pretty straightforward; measuring the quality of clusters is not. A lot depends on your specific data and application.

Staszek
Whole Phrase search in Solr
Hello,

I am trying to search with a whole phrase in Solr. Specifically, I have the following field in my schema:

<field name="search_field" type="phrase_search" indexed="true" stored="false" multiValued="true"/>

<fieldType name="phrase_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Also (as a second, similar problem), in synonyms.txt I have values like this:

aword = a whole phrase

and I even tried:

aword = a whole phrase

Now I tried searching for "check this" in several ways:

fq=search_field:check this
fq=search_field:check+this
fq=search_field:"check this"
fq=search_field:'check this'

but in all cases the search seems to run for "check OR this"! Similarly, if I search for "aword", which matches the synonyms file, the search also looks for "a OR whole OR phrase".

What am I doing wrong? Is there any way to force the query to match the whole phrase and not each word separately?

--
View this message in context: http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931.html
Sent from the Solr - User mailing list archive at Nabble.com.
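One thing worth ruling out for the question above is quoting and URL encoding: an unquoted `check this` is generally split on whitespace by the query parser before field analysis, so a phrase normally has to be wrapped in double quotes and the whole parameter URL-encoded. The sketch below builds such a filter query; `phrase_filter` is a hypothetical helper, and it only constructs the query string without contacting Solr. It is also likely, though not verified here, that a query analyzer using KeywordTokenizer emits the quoted phrase as one token that cannot match terms produced by StandardTokenizer at index time, so quoting alone may not be sufficient with the schema shown.

```python
from urllib.parse import urlencode, parse_qs


def phrase_filter(field, phrase):
    # Escape backslashes and double quotes inside the phrase, then wrap
    # the phrase in double quotes so it is parsed as a single phrase.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


fq = phrase_filter("search_field", "check this")

# urlencode percent-encodes the quotes and colon and turns the space into "+".
query_string = urlencode({"q": "*:*", "fq": fq})
```

Checking the parsed query in the debugQuery output would show whether the phrase really reaches the field as a phrase.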
Re: Luke and SOLR search giving different results
Jack,

Thanks for the help. I removed Solr's data folder and indexed this sample doc from scratch, so there was only this one document in Solr. When I analysed it, I could see that the stemming is correct, and I can see the four words bul, baş, gör and umut in the SF row. I attached the analysis screens.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky j...@basetechnology.com wrote:
> Have you tried using the Solr Admin Analysis page, using the word and a few words of context for index analysis and the word alone for query analysis?
>
> And be sure to fully reindex if you change ANYTHING in the schema fields or field types.
>
> -- Jack Krupansky

From: Erol Akarsu
Sent: Sunday, December 02, 2012 10:38 PM
To: solr-user@lucene.apache.org
Subject: Luke and SOLR search giving different results

> Hi,
>
> I am trying to apply Solr to the Turkish language for my research. Instead of using language identification, I manually assigned the Turkish language to a sample test document. I have configured the Solr schema.xml and activated the part below. I have added the attached document testTurkishDoc.xml, which is inserted into the Solr index. But searching the raw Lucene index through Luke and searching through the Solr 4.0 GUI give different results.
>
> In picture Selection_006.png, the word baş is listed as a top term. I searched for the word baş in Luke and got a result containing the only document, shown in Selection_004.png. But in the Solr GUI I get an empty result for the word baş, as shown in picture Selection_002.png.
>
> In the text we have the features field, which contains the word baştan, derived from the root word baş in Turkish grammar. Somehow the Solr GUI searches differently than Luke does. I could not figure out why I cannot find it in Solr while I can in Luke. The same thing happens for the words umut, bul and gör.
>
> I would appreciate it if you could help me get the same results from the Solr UI.
>
> <field name="features">Firmalarsa Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu! diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı My darling! repliği, sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten sonra anlaşılan Paris seçti, firma yaptı, Arda bayıldı. sözleriyle kazındı hafızalara, Keşke unutabilsek! dedirterek.</field>
>
> Added to schema.xml for Solr:
>
> <field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>
>
> <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>   </analyzer>
> </fieldType>
Backing up Solr 4.0
Hi all.

I'm new to Solr, and I have recently had to set up a Solr server running 4.0. I've been searching for info on backing it up, but all I've managed to come up with is "it'll be different", "you'll be able to do push replication", or using HTTP and the command=backup parameter, which doesn't sound like it will be effective for a production setup (unless I've got that wrong)...

I was wondering whether I can just stop or suspend the Solr server, take an LVM snapshot of the data store, and then bring it back online, but I'm not sure if that will cut it. I gather merely rsyncing the data files won't do...

Can anyone give me a pointer to that easy-to-find document I have so far failed to find? Or, failing that, some sound advice on how to proceed?

Regards,
-Andy

--
Andy D'Arcy Jewell
SysMicro Limited
Linux Support
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
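For reference, the HTTP option mentioned in the message is a request to the replication handler with `command=backup`. The sketch below only builds such a URL and does not send it; the base URL and the backup directory are assumptions (with multiple cores the handler would normally live under the core's path), and `backup_url` is a hypothetical helper. Whether a snapshot taken this way is adequate for a given production setup is not something this example settles.

```python
from urllib.parse import urlencode


def backup_url(base, location=None):
    # Build the replication-handler backup request. "location" is the
    # optional directory for the snapshot; without it the handler's
    # default location is used.
    params = {"command": "backup"}
    if location:
        params["location"] = location
    return base.rstrip("/") + "/replication?" + urlencode(params)


# Assumed base URL and snapshot directory, for illustration only.
url = backup_url("http://localhost:8983/solr", location="/var/backups/solr")
```

The resulting URL can be requested from cron or a monitoring script with any HTTP client.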
Re: News clustering
Hi Stanislaw,

I mean measuring the similarity between the documents within each cluster, and also the difference between the documents in one cluster and those in another cluster.

I saw the sample code ClusteringQualityBenchmark.java. However, I do not know how to make use of it to assess the performance of my Solr clustering.

Kind regards,
Hanjoyo
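The two quantities asked about here (similarity inside a cluster, and similarity between clusters) can be sketched with a simple bag-of-words cosine measure. This is a generic illustration, not the method used by ClusteringQualityBenchmark or by Carrot2; the sample documents, cluster names, and helper names are all made up, and real evaluations would typically use TF-IDF weighting and much more data.

```python
import math
from collections import Counter
from itertools import combinations


def vec(text):
    # Bag-of-words term counts for one document.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def intra_similarity(cluster):
    # Average similarity over all document pairs inside one cluster.
    return mean(cosine(vec(x), vec(y)) for x, y in combinations(cluster, 2))


def inter_similarity(c1, c2):
    # Average similarity over all pairs drawn from two different clusters.
    return mean(cosine(vec(x), vec(y)) for x in c1 for y in c2)


# Hypothetical clusters, e.g. document titles grouped by cluster label.
clusters = {
    "football": ["team wins football match", "football team loses match"],
    "economy": ["central bank raises rates", "bank cuts interest rates"],
}

intra = {name: intra_similarity(docs) for name, docs in clusters.items()}
inter = inter_similarity(clusters["football"], clusters["economy"])
```

High intra-cluster and low inter-cluster values suggest well-separated clusters; the cluster assignments themselves would come from the clustering component's response.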
PHP client
Hi,

Has anyone tested the PECL Solr client in combination with SolrCloud? It seems to be broken since 4.0.

Best regards,
Arkadi
Re: PHP client
https://bugs.php.net/bug.php?id=62332

There is a fork with patches applied.
Re: AW: Edismax query parser and phrase queries
Okay, so the bottom line here is that you wish to change the semantics of quoted phrases. Fine, that's your prerogative, but a change in semantics would require a change to the query parser, or as you originally indicated, a pre-processor. It does sound as if a pre-processor is the way to go here. You still have a choice: An application-level preprocessor that generates an edismax query, or implement a Solr SearchComponent that pre-processes the query after Solr receives it but before edismax sees it. The former is probably easier. The only question is whether there might be multiple applications that access the same Solr node, so that maybe centralizing the pre-processing in Solr might be warranted. -- Jack Krupansky -Original Message- From: Tantius, Richard Sent: Monday, December 03, 2012 5:03 AM To: solr-user@lucene.apache.org Subject: AW: Edismax query parser and phrase queries Hi, the use case we have in mind is that we would like to achieve exact matches for explicit phrases. Our users expect that an explicit phrase not only considers the order of terms, but also the exact wording. Therefore if we search on fields using a data type that is not meant performing exact matches, we need to change that for explicit phrases. This means in a usual query we have qf default fields using advanced tokenization (for query processing and indexing), for example like stemming via SnowballPorterFilterFactory. So our idea was to change the default search fields for explicit phrases to achieve exact matches, by using a simple data format like for example “string“ (StrField, without advanced options). Extending our example from the last mail: qf=title text Datatype of title, text, something like “text_advanced”: fieldtype ... analyzer type=index !--(and also analyzer type=query )-- filter class=solr.WordDelimiterFilterFactory ... filter class=solr.LowerCaseFilterFactory / filter class=solr.SnowballPorterFilterFactory language=German2 / ... 
Data type of the additional fields titleExact and textExact:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

q="ran away from home" Cat Dog
-transformTo-
q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog

Regards,
Richard.

BINSERV Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer
Gotenstr. 7-9, 53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38
Fax.: +49 (0)228 / 4 22 86 - 538
E-Mail: r.tant...@binserv.de
Web: www.binserv.de www.binforcepro.de
Managing director: Rüdiger Jakob
District court: Siegburg HRB 6765
Registered office: Pfarrer-Wichert-Str. 35, 53639 Königswinter

This e-mail, including any attached files, contains confidential and/or legally protected information. If you are not the intended recipient and have received this e-mail in error, you may neither use its contents nor open any attached files, nor copy or forward/distribute anything. Please notify the sender and delete this e-mail and any attached files immediately. Thank you!

- Original message -
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, 30 November 2012 23:04
To: solr-user@lucene.apache.org
Subject: Re: Edismax query parser and phrase queries

I don't have a simple answer for your stated issue, but maybe part of that is because I'm not so sure what the exact problem/goal is. I mean, what's so special about phrase queries for your app that they need distinct processing from individual terms? And, ultimately, what goal are you trying to achieve? Such as, how will the outcome of the query affect what users see and do?

-- Jack Krupansky

From: Tantius, Richard
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Edismax query parser and phrase queries

Hi,

we are using the edismax query parser and execute queries on specific fields by using the qf option.
Like others, we are facing the problem that we do not want explicit phrase queries to be performed on some of the qf fields, and we also require additional search fields for those kinds of queries. We tried to expand explicit phrases in a query by implementing some pre-processing logic, which did not seem very convenient. For example (let's assume qf=title text, and we want phrase queries to be performed on the additional fields titleAlt and textAlt):

q="ran away from home" Cat Dog
-transformTo-
q=( titleAlt:"ran away from home" OR textAlt:"ran away from home" ) Cat Dog

Unfortunately this gets rather complicated if logic operators are involved within the query. Is there some kind of best practice? Should we, for example, extend the query parser, or stick to our pre-processing approach?

Regards,
Richard.
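The transformation described in this thread can be sketched as a small application-level pre-processor. This is only a minimal illustration, not part of Solr: the function name expand_phrases is hypothetical, and the sketch assumes phrases are delimited by plain double quotes with no escaped quotes inside them. It does nothing special for boolean operators, which is exactly the part Richard reports as getting complicated.

```python
import re

# Hypothetical helper, not a Solr API: matches a double-quoted phrase
# that contains no further double quotes.
PHRASE = re.compile(r'"([^"]+)"')


def expand_phrases(query, exact_fields):
    """Rewrite each quoted phrase into an OR group over the exact-match fields.

    Terms outside quotes are left untouched, so edismax still applies
    its normal qf handling to them.
    """
    def replace(match):
        phrase = match.group(1)
        clauses = ['{}:"{}"'.format(field, phrase) for field in exact_fields]
        return "( " + " OR ".join(clauses) + " )"

    return PHRASE.sub(replace, query)


# Example from the thread, using the titleExact/textExact field names.
print(expand_phrases('"ran away from home" Cat Dog', ["titleExact", "textExact"]))
```

Keeping the rewrite in the application, as Jack suggests, means nothing changes on the Solr side; the cost is that every client of the same Solr node has to apply the same rewrite.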
Re: Luke and SOLR search giving different results
So, does that highlight the problem for you or not? Is the term analyzed as you expected?

-- Jack Krupansky

From: Erol Akarsu
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for the help. I removed SOLR's data folder and indexed this sample doc from scratch, so it is the only document in SOLR. When I ran the analysis, the stemming was correct, and I can see these four words, bul, baş, gör and umut, in the SF row. I attached the analysis screens.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky j...@basetechnology.com wrote:

Have you tried using the Solr Admin Analysis page, using the word and a few words of context for index analysis and the word alone for query analysis? And be sure to fully reindex if you change ANYTHING in the schema fields or field types.

-- Jack Krupansky

From: Erol Akarsu
Sent: Sunday, December 02, 2012 10:38 PM
To: solr-user@lucene.apache.org
Subject: Luke and SOLR search giving different results

Hi,

I am trying to apply SOLR to the Turkish language for my research. Instead of using language identification, I manually assigned the Turkish language to a sample test document. I configured SOLR's schema.xml and activated the part below. I added the attached document testTurkishDoc.xml, which is inserted into the SOLR database. But searching the raw Lucene index through Luke and searching through the SOLR 4.0 GUI give different results.

In picture Selection_006.png, the word baş is listed as the top term. When I search for the word baş in Luke, I get the only document as the result, shown in Selection_004.png. But in the SOLR GUI, I get an empty result for the word baş, as in picture Selection_002.png. The features field in the text contains the word baştan, which is derived from the root word baş in Turkish grammar. Somehow, the SOLR GUI searches differently than Luke. I could not figure out why SOLR cannot find the term while Luke can.
The same thing happens for the words umut, bul and gör. I would appreciate any help getting the same results from the SOLR UI.

<field name="features">Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.</field>

Added to schema.xml for SOLR:

<field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>
Re: Solr 4: Join Query
Not that I know of. Also, your performance will be much better if you can denormalize the data.

On Mon, Dec 3, 2012 at 12:44 AM, Vikash Sharma vikash0...@gmail.com wrote:

Hi Erick,

One more thing: is there any other way to get the result? I mean, I need to get both the parent and the child document, nested or not.

Regards,
Vikash Sharma
vikash0...@gmail.com

On Sat, Dec 1, 2012 at 10:29 PM, Erick Erickson erickerick...@gmail.com wrote:

That's the way joins work, and why they're called pseudo joins: they don't work like DB joins and don't return data from both records. Joins were put in for a specific use case; when you try to treat Solr like a DB you're bound to be disappointed. I'd think about reworking the solution to de-normalize the data so you don't have to do joins.

Best
Erick

On Fri, Nov 30, 2012 at 10:38 AM, Vikash Sharma vikash0...@gmail.com wrote:

Hi All,

I have my field definitions in schema.xml like below:

<field name="id" type="string" indexed="true". />
<field name="Emp_id" type="string" indexed="true". />
<field name="doc_id" type="string" indexed="true". />
<field name="content" type="string" indexed="true". />

I need to create a separate record in Solr for each parent-child relationship, such that if a child is the same across different parents, it gets stored only once. For example:

--- Record 1
<id>ABC</id>
<emp_id>EMP001</emp_id>
<doc_id>DOC001</doc_id>
<doc_content>My Parent Doc</doc_content>

--- Record 2
<id>DOC001</id>
<emp_id></emp_id>
<doc_id></doc_id>
<doc_content>My Document Data</doc_content>

This ensures that if any doc_id content is duplicated, the record is inserted into Solr only once.

Lastly, I want the result as a join: if emp_id=EMP001,
then both records should be returned, as there is a relationship between the two records via doc_id = id.

If I query:

http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json
http://localhost:8983/solr/select?q={!join%20from=sha_one%20to=id}project_id:10&wt=json

I expect both records to be returned, either one after another or nested. But I only get the child records. Please help.

Regards,
Vikash Sharma
vikash0...@gmail.com
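Join URLs like the ones in this message are easy to get wrong when typed by hand (braces, spaces and the parameter separator all need care). A minimal sketch of building the same request with the Python standard library follows; the helper name join_query_url is hypothetical, and the host, fields and values are simply the ones from Vikash's example. It only constructs the URL and does not change what the join returns: as Erick explains, the result still contains only the "to" side documents.

```python
from urllib.parse import urlencode


def join_query_url(base_url, from_field, to_field, inner_query):
    """Build a select URL whose q parameter is a {!join} query.

    urlencode percent-encodes the local-params braces and the spaces,
    and joins the parameters with '&'.
    """
    local_params = "{{!join from={} to={}}}".format(from_field, to_field)
    params = {"q": local_params + inner_query, "wt": "json"}
    return base_url + "?" + urlencode(params)


url = join_query_url(
    "http://localhost:8983/solr/select", "doc_id", "id", "emp_id:EMP001"
)
print(url)
```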
Re: How to change Solr UI
Adding to what Iwan said, I want to be sure you're not confusing prototyping with a full-fledged application. The Velocity code included is mostly intended as a rapid-prototyping vehicle. There are significant security issues if you try to use it as your user-facing application, so be sure you trust your users if you go down this route. To change it, see the Apache Velocity project and the code in <solr home>/conf/velocity.

Note that Velocity _can_ be used for user-facing code, but be very sure you secure your Solr. If you allow direct access, a user can easily enter something like

http://solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>

and all your documents will be gone. Most installations use a middle layer between Solr and the user that controls access.

Best
Erick

On Mon, Dec 3, 2012 at 5:01 AM, Iwan Hanjoyo ihanj...@gmail.com wrote:

Hi Romita,

In my opinion, if you are new to Solr, you can start by learning from Solritas. Solritas uses Apache Velocity (a templating language), CSS and jQuery to manage its look and behavior. Besides that, you can write a custom SearchComponent inside the /browse SearchHandler to add more functionality to your search application.

Kind regards,
Hanjoyo

On Mon, Dec 3, 2012 at 4:35 PM, Romita Saha romita.s...@sg.panasonic.com wrote:

Hi,

I want to change the Solr UI. As far as I understand, Solritas is just for prototyping, where I can change the UI according to a predefined template (Velocity) and cannot add any additional functionality to that page. How can I change the Solr UI otherwise? Any guidance would be appreciated.

Thanks and regards,
Romita
Re: Luke and SOLR search giving different results
Jack,

Yes. I expect SOLR to give the same search results as Luke does. The term analyzer gives the correct answer in SOLR, as expected, but SOLR does not return the correct search results. I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky j...@basetechnology.com wrote:

So, does that highlight the problem for you or not? Is the term analyzed as you expected?

-- Jack Krupansky
Re: Whole Phrase search in Solr
Thank you Jack,

the problem with the AND is that it does not search for a PHRASE but for the 2 words being SOMEWHERE in the article. For example, "Check this" will NOT search for "Check this" as a PHRASE but for the word "Check" and the word "this" somewhere in the article, even far away from each other. So the suggestions that you made do not work for searching as a PHRASE. Unless we are doing something wrong? Any other ideas on the PHRASE search?

Thank you again!

-- View this message in context: http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931p4024029.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Backing up SolR 4.0
There's no real need to do what you ask. First thing is that you should always be prepared, in the worst-case scenario, to regenerate your entire index.

That said, perhaps the easiest way to back up Solr is just to use master/slave replication. Consider having a machine that's a slave to the master (but not necessarily searched against) and that periodically polls your master (say daily, or whatever your interval is). You can configure Solr to keep N copies of the index as extra insurance. These will be fairly static, so if you _really_ wanted to you could just copy the <solrhome>/data directory somewhere, but I don't know if that's necessary.

See: http://wiki.apache.org/solr/SolrReplication

Best
Erick

On Mon, Dec 3, 2012 at 6:07 AM, Andy D'Arcy Jewell andy.jew...@sysmicro.co.uk wrote:

Hi all.

I'm new to SolR, and I have recently had to set up a SolR server running 4.0. I've been searching for info on backing it up, but all I've managed to come up with is "it'll be different", "you'll be able to do push replication", or using HTTP and the command=backup parameter, which doesn't sound like it will be effective for a production setup (unless I've got that wrong)...

I was wondering if I can just stop or suspend the SolR server, then do an LVM snapshot of the data store before bringing it back online, but I'm not sure if that will cut it. I gather merely rsyncing the data files won't do...

Can anyone give me a pointer to that easy-to-find document I have so far failed to find? Or failing that, maybe some sound advice on how to proceed?

Regards,
-Andy

-- Andy D'Arcy Jewell
SysMicro Limited
Linux Support
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
Re: Backing up SolR 4.0
On 03/12/12 16:39, Erick Erickson wrote:

There's no real need to do what you ask. First thing is that you should always be prepared, in the worst-case scenario, to regenerate your entire index. That said, perhaps the easiest way to back up Solr is just to use master/slave replication.

Hi Erick,

Thanks for that, I'll take a look.

However, wouldn't re-creating the index on a large dataset take an inordinate amount of time? The system I will be backing up is likely to undergo rapid development and thus schema changes, so I need some kind of insurance against corruption if we need to roll back after a change.

How should I go about creating multiple backup versions I can put aside (e.g. on tape) to hedge against the downtime that would be required to regenerate the indexes from scratch?

Regards,
-Andy

-- Andy D'Arcy Jewell
SysMicro Limited
Linux Support
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
Re: AW: Edismax query parser and phrase queries
It _seems_ like just adding phrase fields (qf) to your edismax defaults gets you close. It would have the problem of matching if the field were longer... but it might be close enough. Otherwise, why not just add in fq clauses on your exact fields? Because one problem you'll have is that you need to get the parameters past the parser to the field, which will be...er...interesting. And one note. Rather than String fields (which are case sensitive), consider KeywordTokenizer and LowercaseFilter or some such. But I'd _really_ prove that you can't get close enough with current functionality before I went down the custom route. Often things like this seem like a good idea but then don't improve results enough to be worth the complexity. Best Erick On Mon, Dec 3, 2012 at 8:00 AM, Jack Krupansky j...@basetechnology.comwrote: Okay, so the bottom line here is that you wish to change the semantics of quoted phrases. Fine, that's your prerogative, but a change in semantics would require a change to the query parser, or as you originally indicated, a pre-processor. It does sound as if a pre-processor is the way to go here. You still have a choice: An application-level preprocessor that generates an edismax query, or implement a Solr SearchComponent that pre-processes the query after Solr receives it but before edismax sees it. The former is probably easier. The only question is whether there might be multiple applications that access the same Solr node, so that maybe centralizing the pre-processing in Solr might be warranted. -- Jack Krupansky -Original Message- From: Tantius, Richard Sent: Monday, December 03, 2012 5:03 AM To: solr-user@lucene.apache.org Subject: AW: Edismax query parser and phrase queries Hi, the use case we have in mind is that we would like to achieve exact matches for explicit phrases. Our users expect that an explicit phrase not only considers the order of terms, but also the exact wording. 
Therefore if we search on fields using a data type that is not meant performing exact matches, we need to change that for explicit phrases. This means in a usual query we have qf default fields using advanced tokenization (for query processing and indexing), for example like stemming via SnowballPorterFilterFactory. So our idea was to change the default search fields for explicit phrases to achieve exact matches, by using a simple data format like for example “string“ (StrField, without advanced options). Extending our example from the last mail: qf=title text Datatype of title, text, something like “text_advanced”: fieldtype ... analyzer type=index !--(and also analyzer type=query )-- filter class=solr.**WordDelimiterFilterFactory ... filter class=solr.**LowerCaseFilterFactory / filter class=solr.**SnowballPorterFilterFactory language=German2 / ... Data type of the additional fields titleExact, textExact: fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ q=ran away from home Cat Dog -transformTo- q=( titleExact:ran away from home OR textExact:ran away from home ) Cat Dog. Regards, Richard. BINSERV Gesellschaft für interaktive Konzepte und neue Medien mbH Software Engineer Gotenstr. 7-9 53175 Bonn Tel.: +49 (0)228 / 4 22 86 - 38 Fax.: +49 (0)228 / 4 22 86 - 538 E-Mail: r.tant...@binserv.de Web: www.binserv.de www.binforcepro.de Geschäftsführer: Rüdiger Jakob Amtsgericht: Siegburg HRB 6765 Hauptsitz der Gesellschaft.: Pfarrer-Wichert-Str. 35, 53639 Königswinter Diese E-Mail einschließlich eventuell angehängter Dateien enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind und diese E-Mail irrtümlich erhalten haben, dürfen Sie weder den Inhalt dieser E-Mail nutzen noch dürfen Sie die eventuell angehängten Dateien öffnen und auch nichts kopieren oder weitergeben/verbreiten. Bitte verständigen Sie den Absender und löschen Sie diese E-Mail und eventuell angehängte Dateien umgehend. 
Vielen Dank! - Original message - Von: Jack Krupansky [mailto:jack@basetechnology.**comj...@basetechnology.com ] Gesendet: Freitag, 30. November 2012 23:04 An: solr-user@lucene.apache.org Betreff: Re: Edismax query parser and phrase queries I don’t have a simple answer for your stated issue, but maybe part of that is because I’m not so sure what the exact problem/goal is. I mean, what’s so special about phrase queries for your app than they need distinct processing from individual terms? And, ultimately, what goal are you trying to achieve? Such as, how will the outcome of the query affect what users see and do. -- Jack Krupansky From: Tantius, Richard Sent: Friday, November 30, 2012 8:44 AM To: solr-user@lucene.apache.org Subject: Edismax query parser and phrase queries Hi, we are using the edismax query parser and execute queries on specific fields by using the qf option. Like others, we are facing
Re: Whole Phrase search in Solr
As Jack suggested, show the results of adding debugQuery=on; it'll help us help you. Particularly with this form:

q=search_field:"check this"

It should be doing what you want.

Best
Erick

On Mon, Dec 3, 2012 at 8:37 AM, NickA nickathen...@gmail.com wrote:

Thank you Jack,

the problem with the AND is that it does not search for a PHRASE but for the 2 words being SOMEWHERE in the article. For example, "Check this" will NOT search for "Check this" as a PHRASE but for the word "Check" and the word "this" somewhere in the article, even far away from each other. So the suggestions that you made do not work for searching as a PHRASE. Unless we are doing something wrong? Any other ideas on the PHRASE search?

Thank you again!
Re: Whole Phrase search in Solr
If you use the edismax query parser and set the pf, pf2, and pf3 fields, your phrases should show up as top results. This will not eliminate non-phrase matches, but will ensure that phrase matches get boosted.

See: http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

-- Jack Krupansky

-Original Message-
From: NickA
Sent: Monday, December 03, 2012 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Whole Phrase search in Solr

Thank you Jack,

the problem with the AND is that it does not search for a PHRASE but for the 2 words being SOMEWHERE in the article. For example, "Check this" will NOT search for "Check this" as a PHRASE but for the word "Check" and the word "this" somewhere in the article, even far away from each other. So the suggestions that you made do not work for searching as a PHRASE. Unless we are doing something wrong? Any other ideas on the PHRASE search?

Thank you again!
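As a rough illustration of the pf, pf2 and pf3 suggestion, the request parameters could be assembled like this. It is a sketch only: the helper name edismax_params is hypothetical, and using the same single field for qf and all three phrase parameters is an assumption made for brevity (search_field is the field name Erick uses elsewhere in this thread). The parameters boost phrase matches; they do not restrict results to them.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field):
    """Return edismax request parameters that boost whole-phrase,
    bigram (pf2) and trigram (pf3) matches on one field."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,   # fields searched for the individual terms
        "pf": field,   # boost documents matching the whole query as a phrase
        "pf2": field,  # boost matches on adjacent word pairs
        "pf3": field,  # boost matches on adjacent word triples
    }


print(urlencode(edismax_params("check this", "search_field")))
```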
Re: Downloading files from the solr replication Handler
They are the '\0' character. What is a marker?

I am getting the following with wget:

HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]

On Fri, Nov 30, 2012 at 4:58 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

What mime type do you get for binary files? Maybe the server is misconfigured for that extension and sends them as text. Then they could be the markers. Do they look like markers?

Regards,
Alex

On 30 Nov 2012 04:06, Eva Lacy e...@lacy.ie wrote:

Doesn't make much sense if they are in binary files as well.

On Thu, Nov 29, 2012 at 10:16 PM, Lance Norskog goks...@gmail.com wrote:

Maybe these are text encoding markers?

- Original Message -
From: Eva Lacy e...@lacy.ie
To: solr-user@lucene.apache.org
Sent: Thursday, November 29, 2012 3:53:07 AM
Subject: Re: Downloading files from the solr replication Handler

I tried downloading them with my browser and also with a C# WebRequest. If I skip the first and last 4 bytes it seems to work fine.

On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson erickerick...@gmail.com wrote:

How are you downloading them? I suspect the issue is with the download process rather than Solr, but I'm just guessing.

Best
Erick

On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy e...@lacy.ie wrote:

Just to add to that, I'm using Solr 3.6.1.

On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy e...@lacy.ie wrote:

I downloaded some configuration and data files directly from Solr in an attempt to develop a backup solution. I noticed there are some characters at the start and end of the file that aren't in the configuration files, and I notice the same characters at the start and end of the data files. Does anyone have any idea how I can download these files without the extra characters, or predict how many there are going to be so I can skip them?
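Eva's observation (four extra bytes at the start and four '\0' bytes at the end) would be consistent with a length-prefixed packet stream: each chunk of file data preceded by a 4-byte big-endian length, and the stream ended by a zero length. That framing is an assumption here, not something confirmed in this thread, and the sketch below ignores any optional per-packet checksum. If the assumption holds, skipping a fixed 4 bytes at each end would only work for files small enough to fit in a single packet, whereas reading the lengths handles any size. The function name is hypothetical.

```python
import io
import struct


def strip_length_prefixed_packets(raw):
    """Reassemble file content from a stream of length-prefixed packets.

    Assumed framing (not confirmed by the thread): a 4-byte big-endian
    signed length, followed by that many data bytes, repeated until a
    packet with length 0 marks the end of the stream.
    """
    stream = io.BytesIO(raw)
    content = bytearray()
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise ValueError("truncated stream: missing packet length")
        (size,) = struct.unpack(">i", header)
        if size <= 0:
            break  # zero-length packet terminates the stream
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated stream: short packet")
        content += chunk
    return bytes(content)
```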
Re: Luke and SOLR search giving different results
Two points:

1. Possibly an encoding problem with your container? Is UTF-8 encoding enabled?
2. Add debugQuery=true to your query (from the browser) and see if the parsed query (parsedquery) has the expected term, matching what Luke reports for the index and what Solr Admin Analysis also reports for index analysis.

-- Jack Krupansky

-Original Message-
From: Erol Akarsu
Sent: Monday, December 03, 2012 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Yes. I expect SOLR to give the same search results as Luke does. The term analyzer gives the correct answer in SOLR, as expected, but SOLR does not return the correct search results. I don't know why.

Erol Akarsu
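Both of Jack's points can be checked with one request: send the term percent-encoded as UTF-8 and ask for debug output. The sketch below only builds such a URL; the helper name is hypothetical, and the host, port and /select path are assumptions (a default local Solr), not details given by Erol. If the term arrives intact, the debug section of the response should show it in the parsed query; if the container decodes the request with a different charset, the term shown there will be mangled.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, field, term):
    """Build a select URL that searches field:term with debugQuery enabled.

    urlencode percent-encodes non-ASCII characters as UTF-8 bytes,
    so 'ş' (U+015F) is sent as %C5%9F.
    """
    params = {
        "q": "{}:{}".format(field, term),
        "debugQuery": "true",
        "wt": "json",
    }
    return base_url + "?" + urlencode(params, encoding="utf-8")


print(debug_query_url("http://localhost:8983/solr/select", "features", "baş"))
```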
Re: News clustering
I mean measuring the similarity between the documents in each cluster, and also the difference between the documents in one cluster and those in another. I saw the sample code ClusteringQualityBencmark.java. However, I do not know how to make use of it for assessing my Solr clustering performance.

You'd need to write your own code for this. Here are the most common clustering quality measures, including the ones you mentioned:

http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results

These are meant for the general case (numeric attributes); to apply them to texts, you'd need to use the vector representation of the documents.

On a more general note, synthetic measures test only the document-to-cluster assignments; none of them take the quality of the labels into account (this is really hard to measure objectively).

Staszek
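To make the "vector representation" remark concrete, here is a minimal sketch of the two quantities asked about: average similarity inside a cluster and average similarity between two clusters. It is not the benchmark code mentioned above and does not use Solr or Carrot2; it assumes plain term-frequency vectors with whitespace tokenization and cosine similarity, and all function names are hypothetical. A real evaluation would more likely use TF-IDF weights and the same analysis chain as the index.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Term-frequency vector from lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_intra_similarity(cluster):
    """Average pairwise similarity of the documents inside one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # a single document is trivially similar to itself
    return sum(cosine(vectorize(x), vectorize(y)) for x, y in pairs) / len(pairs)


def mean_inter_similarity(cluster_a, cluster_b):
    """Average similarity between documents of two different clusters."""
    pairs = [(x, y) for x in cluster_a for y in cluster_b]
    if not pairs:
        return 0.0
    return sum(cosine(vectorize(x), vectorize(y)) for x, y in pairs) / len(pairs)
```

A clustering looks reasonable under these measures when the intra-cluster value is clearly higher than the inter-cluster value; as noted above, this says nothing about label quality.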
Re: Whole Phrase search in Solr
The edismax phrase boost feature boosts the phrase IF it occurs - it's optional. If you want Solr to search ONLY by whole phrase, Solr does have a precise way to request that - simply enclose the phrase in quotes. But I presume that you knew that. You can certainly preprocess your query to convert raw phrases into quoted phrases.

-- Jack Krupansky

-Original Message-
From: NickA
Sent: Monday, December 03, 2012 12:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Whole Phrase search in Solr

Thank you Jack,

Before doing this major change, please note that the problem is that there are ZERO matches for the "your products" phrase (in my example below). It is not that the search finds this phrase but ranks it very low... it is that it NEVER finds this phrase as a result. So how will the search show them among the top results, since there are ZERO of them? Or do you mean that with this new parser we WILL get phrase results too?

Thank you again!

-- View this message in context: http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931p4024048.html
Sent from the Solr - User mailing list archive at Nabble.com.
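The pre-processing Jack mentions, turning raw user input into a quoted phrase, can be as small as the sketch below. The helper name as_phrase_query is hypothetical; the only logic is wrapping the input in double quotes and backslash-escaping any backslashes or double quotes the user typed, so the phrase cannot be closed early.

```python
def as_phrase_query(field, phrase):
    """Return a fielded phrase query such as search_field:"check this"."""
    # Escape backslashes first, then double quotes, so the escapes
    # added for quotes are not escaped a second time.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return '{}:"{}"'.format(field, escaped)


print(as_phrase_query("search_field", "check this"))
```

The resulting string is what goes into the q parameter; it still needs normal URL encoding when sent over HTTP.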
Re: Backing up SolR 4.0
On 12/3/2012 9:47 AM, Andy D'Arcy Jewell wrote:

However, wouldn't re-creating the index on a large dataset take an inordinate amount of time? The system I will be backing up is likely to undergo rapid development and thus schema changes, so I need some kind of insurance against corruption if we need to roll back after a change. How should I go about creating multiple backup versions I can put aside (e.g. on tape) to hedge against the down-time which would be required to regenerate the indexes from scratch?

Serious production Solr installs require at least two copies of your index. Failures *will* happen, and sometimes they'll be the kind of failures that will take down an entire machine. You can plan for some failures -- redundant power supply and RAID are important for this. Some failures will cause downtime, though -- multiple disk failures, motherboard, CPU, memory, software problems wiping out your index, user error, etc. If you have at least one other copy of your index, you'll be able to keep the system operational while you fix the down machine.

Replication is a very good way to accomplish getting two or more copies of your index. I would expect that most production Solr installations use either plain replication or SolrCloud. I do my redundancy a different way that gives me a lot more flexibility, but replication is a VERY solid way to go.

If you are running on a UNIX/Linux platform (just about anything *other* than Windows), and backups via replication are not enough for you, you can use the hardlink capability in the OS to avoid taking Solr down while you make backups. Here's the basic sequence:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.

Making hardlinks is a near-instantaneous operation. The way that Solr/Lucene works will guarantee that your hardlink copy will continue to be a valid index snapshot no matter what happens to the live index. If you can make the backup and get the hardlinks deleted before your index undergoes a merge, the hardlinks will use very little extra disk space. If you leave the hardlink copies around, eventually your live index will diverge to the point where the copy has different files and therefore takes up disk space. If you have a *LOT* of extra disk space on the Solr server, you can keep multiple hardlink copies around as snapshots.

Recent versions of Windows do have features similar to UNIX links, so there may in fact be a way to do this on Windows. I will leave that for someone else to pursue.

Thanks,
Shawn
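Steps 2 and 3 of the hardlink sequence can be sketched in Python as below. This is a minimal sketch, not a tested backup tool: the function name hardlink_snapshot and the directory names in the demo are made up for illustration, it assumes indexing is already paused (step 1), and it assumes a flat index directory on the same filesystem as the target.

```python
import os
import tempfile


def hardlink_snapshot(index_dir, target_dir):
    """Hardlink every regular file in index_dir into a new target_dir.

    Both directories must live on the same filesystem, otherwise
    os.link raises OSError. Returns the sorted list of linked names.
    """
    os.makedirs(target_dir, exist_ok=False)  # refuse to reuse an old snapshot
    linked = []
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(target_dir, name))
            linked.append(name)
    return linked


if __name__ == "__main__":
    # Demo on a throwaway directory instead of a real Solr index.
    root = tempfile.mkdtemp()
    index_dir = os.path.join(root, "index")
    os.makedirs(index_dir)
    with open(os.path.join(index_dir, "segments_1"), "w") as handle:
        handle.write("fake segment data")
    print(hardlink_snapshot(index_dir, os.path.join(root, "snapshot")))
```

Because a hardlink is a second name for the same file data, the snapshot keeps its contents even after the live index deletes or replaces the original file, which is what makes step 5 safe to run at leisure.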
Re: Luke and SOLR search giving different results
Jack,

I had already configured the Tomcat server for UTF-8 encoding: I added URIEncoding="UTF-8" to all <Connector ...> elements in server.xml in Tomcat 7.

As you can see below, when I search for the word baş in debug mode, I get an empty response. But when I search for the word baştan, I get the correct response. It seems to me that the Turkish analyzer is not being used in the Solr search, because only the full word baştan matches, not the root word baş. Probably the English analyzer is being used and cannot find the root word. For example, in Luke, if I change the analyzer used for query parsing to EnglishAnalyzer, it cannot find the word baş; it finds it only with TurkishAnalyzer. I guess Solr is not using TurkishAnalyzer. Is this assumption correct? I could not find any other reason.

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">58</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="q">baş</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">baş</str>
  <str name="querystring">baş</str>
  <str name="parsedquery">text:baş</str>
  <str name="parsedquery_toString">text:baş</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">38.0</double>
    <lst name="prepare">
      <double name="time">16.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
    </lst>
    <lst name="process">
      <double name="time">10.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">10.0</double></lst>
    </lst>
  </lst>
</lst>
</response>

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="q">baştan</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
  <str name="url">htt://111.a.b1</str>
  <str name="id">6H500F0</str>
  <str name="lang">tr</str>
  <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300 </str>
  <str name="manu">Maxtor Corp.</str>
  <str name="manu_id_s">maxtor</str>
  <arr name="cat">
    <str>electronics</str>
    <str>hard drive</str>
  </arr>
  <arr name="features">
    <str>SATA 3.0Gb/s, NCQ</str>
    <str>8.5ms seek</str>
    <str>16MB cache</str>
    <str> Firmalarsa Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu! diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı My darling! repliği, sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten sonra anlaşılan Paris seçti, firma yaptı, Arda bayıldı. sözleriyle kazındı hafızalara, Keşke unutabilsek! dedirterek.
Re: Luke and SOLR search giving different results
Ah! See where it says <str name="parsedquery_toString">text:baş</str>? Your query is against the text field, which probably doesn't have the Turkish analysis. There is probably a copyField from features to text. You use the text_tr field type for features, but probably not for the text field.

-- Jack Krupansky
Re: Luke and SOLR search giving different results
Jack,

I have the following in schema.xml, which defines features as type text_tr. Unfortunately, the search still fails.

<field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>
<copyField source="features" dest="text"/>

<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>
Re: Whole Phrase search in Solr
Jack, thank you again. However, our major problem is that using QUOTES to get phrase results does not return any results AT ALL! I mentioned this in the initial post; we also tried these:

fq=search_field:"check this"
fq=search_field:'check this'

But no results appear when quotes are used. What might we be doing wrong in our configuration?
Re: News clustering
I'm trying to use it to search through news websites, but I'm interested in classification at index time. Is there any available solution for this?

Greetings!
Re: Luke and SOLR search giving different results
Jack,

I see something interesting here now. In the q field of the Solr GUI, I tried the search query features:baş instead of baş, and I got a result! The document has some fields of type text_eng and text_general, and one field, features, of type text_tr. If I don't specify a field name, Solr uses the English analyzer. If I do, it uses the analyzer of the field specified in the query string. Is this true?

Erol Akarsu
Re: Luke and SOLR search giving different results
As I pointed out in my message, your query indicates that text is your default search field. So either choose a different default search field, or make sure that the text field has the desired field type. If you want to change the default search field, either use a df request parameter or change the df default value for the request handler in solrconfig.xml.

-- Jack Krupansky
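Jack's first suggestion, passing a df request parameter, can be sketched as a small URL builder. This is only an illustration under stated assumptions: the base URL http://localhost:8983/solr and the helper name build_select_url are hypothetical, and the /select handler path is assumed rather than taken from the thread. Only the q, wt, debugQuery and df parameters come from the discussion above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, default_field=None):
    """Build a select URL, optionally overriding the default search field."""
    params = {"q": query, "wt": "xml", "debugQuery": "true"}
    if default_field:
        # df tells the query parser which field an unqualified term targets.
        params["df"] = default_field
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance; adjust host, port and core as needed.
    print(build_select_url("http://localhost:8983/solr", "baş", "features"))
```

With df=features, an unqualified term such as baş should be analyzed with the analyzer of the features field, which is the same effect as typing features:baş by hand.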
solr war - osgi
Hi, Has anyone had any experience repackaging the solr war for osgi? And while I'm at it, has anyone done this in geronimo 3.0? Regards, Marcos
Re: Luke and SOLR search giving different results
On 12/3/2012 1:44 PM, Erol Akarsu wrote: I tried as search query not baş but features:baş in field q in SOLR GUI. And, I got result! In the one document, I had some fields type of text_eng, text_general and one field features type of text_tr. If I don't specify field name, SOLR use EnglishAnalyzer. If I do, it uses the analyzer specific to field specified in search query string. Your config is set up to search against a field named text by default - either by a setting in schema.xml or a df parameter in your search handler definition in solrconfig.xml. If you are using (e)dismax, it might be qf/pf parameters instead of df. The field named text is not properly set up for this search. Your attachment at the beginning of this thread indicates that either you do not have a text field for this document at all, or that field is not stored. If the text field is a copyField as Jack has mentioned, note that it doesn't matter what analysis you are doing on features -- the copy is done before analysis, so it is completely separate. Thanks, Shawn
Re: Whole Phrase search in Solr
Ah! You have conflicting tokenizers in your index and query analyzers. They should be the same.

Your index has: <tokenizer class="solr.StandardTokenizerFactory"/>
Your query has: <tokenizer class="solr.KeywordTokenizerFactory"/>

The keyword tokenizer has the effect of treating the entire query term as one term. That actually works for simple terms, but a quoted phrase is passed to the query analyzer as one string, and the keyword tokenizer will treat it as one token and search for it as a single term, which will not match the two terms that were indexed by the standard tokenizer. Stick with the same tokenizer that you used at index time.

-- Jack Krupansky
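The mismatch Jack describes can be illustrated with a toy model. The sketch below does not use Lucene; the two tokenizer functions are simplified stand-ins (plain whitespace splitting versus returning the whole input as one token), and all names are made up for illustration.

```python
def standard_like_tokens(text):
    # Crude stand-in for a standard tokenizer: split on whitespace.
    return text.split()


def keyword_tokens(text):
    # Stand-in for a keyword tokenizer: the whole input is one token.
    return [text]


def phrase_matches(indexed_text, phrase, query_tokenizer):
    """Return True if the query tokens occur consecutively in the index tokens."""
    index_terms = standard_like_tokens(indexed_text)
    query_terms = query_tokenizer(phrase)
    width = len(query_terms)
    return any(
        index_terms[i:i + width] == query_terms
        for i in range(len(index_terms) - width + 1)
    )


if __name__ == "__main__":
    doc = "please check this out"
    print(phrase_matches(doc, "check this", standard_like_tokens))  # same tokenizer
    print(phrase_matches(doc, "check this", keyword_tokens))        # mismatched
    print(phrase_matches(doc, "check", keyword_tokens))             # single term
```

In this model the mismatched query looks for one term containing a space, which the whitespace-split index never contains, while a single-word query still matches. That mirrors why simple terms worked and quoted phrases returned nothing.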
Re: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0
On 11/8/2012 3:25 PM, Dyer, James wrote:

Could this be a side-effect of SOLR-4019 (in branch_4.0 this was commit r1405894)? Prior to this commit, PingRequestHandler would throw a SolrException for 503 (Service Unavailable). The change is that the exception isn't actually thrown but rather sent in place of the response. This prevents the container from logging huge stack traces just because PingRequestHandler is in a disabled state. Prior to this, SolrException had logging disabled for 503s by hardcoding, but this broke other uses of 503 SolrExceptions.

While working on another issue (SOLR-4143), I figured out why this isn't working. Initially I did not connect the exceptions in the Solr 3.5 log to my problems getting ping responses, but the light eventually turned on.

My requests to the 3.5 ping handler from SolrJ 4.1-SNAPSHOT use the setRequestHandler method to talk to /admin/ping. In addition to using /admin/ping as the URL path, this also sets the qt parameter to /admin/ping. The PingRequestHandler in Solr 3.x looks at the qt parameter that it receives, and if that handler is an instance of PingRequestHandler, it throws an exception saying that you can't call PingRequestHandler recursively. This is why I get an exception and no response, while it works perfectly in a browser: I wasn't setting qt in my browser. Once I did that, I got the bad response in the browser too.

There is no way in SolrJ 4.x or trunk to set the request handler without also setting qt. When I looked at the SolrJ code while trying to make a patch for SOLR-4143, I discovered that it's not a trivial change, and it may not even be possible in branch_4x. Is there possibly a workaround I can use in SolrJ? Other thoughts?

Thanks,
Shawn
Re: News clustering
Hi Stanislaw,

I see. Thank you for the reference.

Kind regards,
Hanjoyo
Re: How to change Solr UI
Note that Velocity _can_ be used for user-facing code, but be very sure you secure your Solr. If you allow direct access, a user can easily enter something like http:// solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>, and all your documents will be gone.

Hi Erickson,

Thank you for the input. I'll take note of this and filter out this URL:

http:// solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>

Kind regards,
Hanjoyo
Re: solr war - osgi
Has anyone had any experience repackaging the solr war for osgi? And while I'm at it, has anyone done this in geronimo 3.0?

Hi Marcos,

Start the GlassFish web server and put the Solr war file inside the autodeploy folder. Finally, you need to find the Solr home folder location. Different operating systems will have a different Solr home location for GlassFish, so you need to find it yourself in the GlassFish log file. It is a bit difficult. Good luck.

Kind regards,
Hanjoyo
Re: How to change Solr UI
It is annoying to have to repeat these explanations so much. Any serious objection to removing the VW UI from Solr proper and replacing it with a standalone app? I mean, Solr should have PHP, Python, Java, and Ruby example apps, right?

-- Jack Krupansky
Solr Query Parameter : ids - What is this used for?
Hello,

As the title says, I want to know what Solr uses this parameter for. I see it in a sharding environment on cloud, so I guess it is related to cloud, but there is still no explanation of it in any of the wiki pages that I have checked. Can someone explain the usage and purpose of this parameter?

- Smart, but doesn't work... If he worked, he'd do it...

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Query-Parameter-ids-What-is-this-used-for-tp4024152.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Query Parameter : ids - What is this used for?
On Mon, Dec 3, 2012 at 10:55 PM, deniz denizdurmu...@gmail.com wrote:

> Hello,
>
> As the title says, I want to know what Solr uses this parameter for. I see it in a sharding environment on cloud, so I guess it is related to cloud, but there is still no explanation of it in any of the wiki pages that I have checked. Can someone explain the usage and purpose of this parameter?

It's an internal implementation detail of distributed search: the second phase selects specific ids on each shard via the ids parameter.

-Yonik
http://lucidworks.com
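The two-phase flow mentioned above can be pictured with a toy simulation. This is only a sketch under simplifying assumptions: the shards are plain in-memory dictionaries, every name here is hypothetical, and real Solr does considerably more work than this. It is meant to show why a per-shard request carrying specific ids exists at all.

```python
import heapq

# Hypothetical in-memory "shards": document id -> (score, stored fields).
SHARDS = [
    {"a1": (0.9, {"title": "A one"}), "a2": (0.2, {"title": "A two"})},
    {"b1": (0.7, {"title": "B one"}), "b2": (0.5, {"title": "B two"})},
]


def phase_one(shard, rows):
    """First phase: each shard returns only (score, id) for its own top hits."""
    hits = ((score, doc_id) for doc_id, (score, _fields) in shard.items())
    return heapq.nlargest(rows, hits)


def phase_two(shard, ids):
    """Second phase: fetch stored fields for the specific ids this shard owns."""
    return {doc_id: shard[doc_id][1] for doc_id in ids if doc_id in shard}


def distributed_search(shards, rows):
    # Merge the per-shard top hits into one global top list.
    merged = heapq.nlargest(rows, (hit for s in shards for hit in phase_one(s, rows)))
    wanted = [doc_id for _score, doc_id in merged]
    # Ask each shard only for the winning ids (the role played by "ids").
    docs = {}
    for shard in shards:
        docs.update(phase_two(shard, wanted))
    return [docs[doc_id] for doc_id in wanted]


top_two = distributed_search(SHARDS, rows=2)
```

In this sketch `top_two` holds the stored fields of the two best documents overall, fetched from whichever shard owns each id.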
Difference between 'bf' and 'boost' when using eDismax
Hi there,

I'm not sure I understand this clearly.

With 'bf', is the value returned by bf added to the score? For example: score + bf = final score

With 'boost', is the score multiplied by the value returned by boost? For example: score * boost = final score

When using both 'bf' and 'boost': score * boost + bf = final score

If I would like recently created documents to rank higher, is 'bf' or 'boost' the better solution (assuming bf and boost use the same function, recip(ms(NOW,datefield),3.16e-11,1,1))?

Please help with this.
search behavior on a case-sensitive field
I have a search like this:

<fieldType name="text_cs" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <!--<filter class="solr.LowerCaseFilterFactory"/>-->
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

When I query COST, it gives reasonable results (n1). When I query CoSt, however, it gives me n2 (n1) results, and I can't locate any actual occurrence of CoSt in the docs at all.

Can anybody advise?
Re: Solr Query Parameter : ids - What is this used for?
Yonik Seeley-4 wrote:

> It's an internal implementation detail of distributed search: the second phase selects specific ids on each shard via the ids parameter.
>
> -Yonik
> http://lucidworks.com

So I suppose it is the unique field? Or does it depend on which field we are using for querying on the shards?

- Smart, but doesn't work... If he worked, he'd do it...
Re: search behavior on a case-sensitive field
CoSt was split into two terms and the query parser generated an OR of them. Adding the autoGeneratePhraseQueries="true" attribute to your field type should fix the problem.

You can also change splitOnCaseChange="1" to splitOnCaseChange="0" to avoid the term splitting issue.

Be sure to completely reindex in either case.

-- Jack Krupansky

-----Original Message-----
From: Joe Zhang
Sent: Monday, December 03, 2012 11:10 PM
To: solr-user@lucene.apache.org
Subject: search behavior on a case-sensitive field

I have a search like this:

<fieldType name="text_cs" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <!--<filter class="solr.LowerCaseFilterFactory"/>-->
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

When I query COST, it gives reasonable results (n1). When I query CoSt, however, it gives me n2 (n1) results, and I can't locate any actual occurrence of CoSt in the docs at all.

Can anybody advise?
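To make the splitting explanation above concrete, here is a small Python approximation of what splitOnCaseChange does to a query term. It is only a sketch: it models the lowercase-to-uppercase split and nothing else (no stemming, no catenateWords, no stop words), and the helper names are invented for the example.

```python
import re


def split_on_case_change(token):
    """Split at each lowercase-to-uppercase transition, e.g. 'PowerShot' -> ['Power', 'Shot']."""
    return re.split(r"(?<=[a-z])(?=[A-Z])", token)


def as_query(token, phrase=False):
    """Show how the split parts end up combined: OR by default, a phrase if requested."""
    parts = split_on_case_change(token)
    if len(parts) == 1:
        return parts[0]
    if phrase:
        return '"' + " ".join(parts) + '"'
    return " OR ".join(parts)


all_caps = split_on_case_change("COST")   # no lowercase-to-uppercase transition
mixed = split_on_case_change("CoSt")      # one transition, between 'o' and 'S'
default_query = as_query("CoSt")
phrase_query = as_query("CoSt", phrase=True)
```

With the OR form, any document containing either part can match, which is consistent with seeing hits that never contain the literal string CoSt. The phrase form requires the parts to be adjacent.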
Re: search behavior on a case-sensitive field
haha, makes perfect sense! Thanks a lot!

On Mon, Dec 3, 2012 at 9:25 PM, Jack Krupansky j...@basetechnology.com wrote:

> CoSt was split into two terms and the query parser generated an OR of them. Adding the autoGeneratePhraseQueries="true" attribute to your field type should fix the problem.
>
> You can also change splitOnCaseChange="1" to splitOnCaseChange="0" to avoid the term splitting issue.
>
> Be sure to completely reindex in either case.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Joe Zhang
> Sent: Monday, December 03, 2012 11:10 PM
> To: solr-user@lucene.apache.org
> Subject: search behavior on a case-sensitive field
>
> I have a search like this:
>
> <fieldType name="text_cs" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <!--<filter class="solr.LowerCaseFilterFactory"/>-->
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> When I query COST, it gives reasonable results (n1). When I query CoSt, however, it gives me n2 (n1) results, and I can't locate any actual occurrence of CoSt in the docs at all.
>
> Can anybody advise?
Re: Difference between 'bf' and 'boost' when using eDismax
bf is processed first, then boost. All the bf's will be added, then the resulting scores will be boosted by the product of all the boost function queries.

-- Jack Krupansky

-----Original Message-----
From: Floyd Wu
Sent: Monday, December 03, 2012 11:00 PM
To: solr-user@lucene.apache.org
Subject: Difference between 'bf' and 'boost' when using eDismax

Hi there,

I'm not sure I understand this clearly.

With 'bf', is the value returned by bf added to the score? For example: score + bf = final score

With 'boost', is the score multiplied by the value returned by boost? For example: score * boost = final score

When using both 'bf' and 'boost': score * boost + bf = final score

If I would like recently created documents to rank higher, is 'bf' or 'boost' the better solution (assuming bf and boost use the same function, recip(ms(NOW,datefield),3.16e-11,1,1))?

Please help with this.
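The ordering described above can be written out as arithmetic. The sketch below is not Solr code: `final_score` is a hypothetical helper that simply follows the description in this reply (add every bf value, then multiply by every boost value), and `recip` uses the documented form of the function query, recip(x,m,a,b) = a / (m*x + b). The relevance score of 2.0 is an invented example value.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # roughly 3.16e10 milliseconds


def recip(x, m, a, b):
    """Reciprocal function query: a / (m*x + b)."""
    return a / (m * x + b)


def final_score(score, bf_values=(), boost_values=()):
    """Add all bf values first, then multiply by each boost value."""
    total = score + sum(bf_values)
    for boost in boost_values:
        total *= boost
    return total


# Recency factor for a brand-new document and for a one-year-old document.
fresh = recip(0, 3.16e-11, 1, 1)
year_old = recip(MS_PER_YEAR, 3.16e-11, 1, 1)

relevance = 2.0  # hypothetical relevance score before any boosting
additive = final_score(relevance, bf_values=[year_old])
multiplicative = final_score(relevance, boost_values=[year_old])
```

With these constants the recency factor is 1.0 for a new document and close to 0.5 after one year. An additive bf shifts the score by that fixed amount regardless of how large the relevance score is, while a multiplicative boost scales it proportionally, which is worth keeping in mind when choosing between the two for recency ranking.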
Re: How to change Solr UI
That's only one example; there are others, such as stream.body=<delete><id>blah</id></delete> or <delete><query>id:*</query></delete>.

Jack's comment is well taken; consider a real middleware application.

Best
Erick

On Mon, Dec 3, 2012 at 5:28 PM, Iwan Hanjoyo ihanj...@gmail.com wrote:

> > Note that Velocity _can_ be used for user-facing code, but be very sure you secure your Solr. If you allow direct access, a user can easily enter something like http:// solr/update?commit=true&stream.body=<delete><query>*:*</query></delete> and all your documents will be gone.
>
> Hi Erickson,
>
> Thank you for the input. I'll take note and filter out this URL:
> * http:// solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>
>
> Kind regards,
> Hanjoyo
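One way to read the middleware advice above: instead of filtering out known-bad URLs one by one, let the application build the Solr request itself from a short allowlist of parameters. The sketch below assumes a hypothetical base URL and hypothetical limits (`ALLOWED_PARAMS`, `MAX_ROWS`); it illustrates the idea and is not a complete security layer.

```python
from urllib.parse import urlencode

ALLOWED_PARAMS = {"q", "start", "rows"}  # hypothetical allowlist
MAX_ROWS = 100                           # hypothetical upper bound


def build_select_url(base_url, user_params):
    """Build a /select URL from user input, keeping only allowlisted parameters."""
    safe = {k: v for k, v in user_params.items() if k in ALLOWED_PARAMS}
    safe.setdefault("q", "*:*")
    # Cap the page size; a non-numeric value raises ValueError for the caller to handle.
    safe["rows"] = min(int(safe.get("rows", 10)), MAX_ROWS)
    # The path is fixed by the application, so users never choose /update.
    return base_url.rstrip("/") + "/select?" + urlencode(safe)


# Example: hostile input is dropped rather than pattern-matched.
url = build_select_url(
    "http://localhost:8983/solr",  # assumed local address for illustration
    {"q": "ipod", "rows": "5000", "stream.body": "<delete><query>*:*</query></delete>"},
)
```

Because the handler path and the parameter names are chosen by the application, variants like the ones listed in this message never reach Solr, and Solr itself can stay unreachable from the outside.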
Re: Difference between 'bf' and 'boost' when using eDismax
Thanks Jack! It helps a lot.

Floyd

2012/12/4 Jack Krupansky j...@basetechnology.com

> bf is processed first, then boost. All the bf's will be added, then the resulting scores will be boosted by the product of all the boost function queries.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Floyd Wu
> Sent: Monday, December 03, 2012 11:00 PM
> To: solr-user@lucene.apache.org
> Subject: Difference between 'bf' and 'boost' when using eDismax
>
> Hi there,
>
> I'm not sure I understand this clearly.
>
> With 'bf', is the value returned by bf added to the score? For example: score + bf = final score
>
> With 'boost', is the score multiplied by the value returned by boost? For example: score * boost = final score
>
> When using both 'bf' and 'boost': score * boost + bf = final score
>
> If I would like recently created documents to rank higher, is 'bf' or 'boost' the better solution (assuming bf and boost use the same function, recip(ms(NOW,datefield),3.16e-11,1,1))?
>
> Please help with this.
Migrating solr 3.6 to solr 4.0
Hi,

I had Solr 3.6 installed on my system, and I am now migrating it to Solr 4.0, but I am getting this error:

SEVERE: Unable to create core: collection1
java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or 'solr/collection1/conf/', cwd=/opt/tomcat/bin

I don't know how to resolve this.
Re: Migrating solr 3.6 to solr 4.0
Can you paste the content of solr.xml?

On Dec 4, 2012, at 1:26 AM, Shaveta_Chawla wrote:

> Hi,
>
> I had Solr 3.6 installed on my system, and I am now migrating it to Solr 4.0, but I am getting this error:
>
> SEVERE: Unable to create core: collection1
> java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or 'solr/collection1/conf/', cwd=/opt/tomcat/bin
>
> I don't know how to resolve this.