more like this generated query
Hello, I am using solr-4.10.4 with mlt. I noticed that mlt constructs a query that is missing some words. For example, for a doc with title: Jennifer Lopez and keywords: Jennifer, concert, Hollywood, the parsedquery generated by mlt for this doc is title:lopez keywords:jennifer keywords:concert keywords:hollywood. It seems to me that there should be title:jennifer, too. For another doc that has only a title, the mlt-generated query includes keywords:famili. This doc has the word family in its title. Any ideas what is wrong here? Thanks. Alex.
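A possible explanation worth checking (an assumption on my part, not a confirmed diagnosis): the MoreLikeThis component only keeps terms that pass its frequency thresholds, and mlt.mintf (minimum term frequency in the source doc) defaults to 2 while mlt.mindf (minimum document frequency) defaults to 5, so a term like jennifer that occurs only once in the title field can be dropped from the generated query. Loosening the thresholds would look something like this (host, core, and doc id here are hypothetical):

```
http://localhost:8983/solr/collection1/mlt?q=id:1234&mlt.fl=title,keywords&mlt.mintf=1&mlt.mindf=1
```

As for keywords:famili, that looks like the stemmed form of family, i.e. the field's analyzer stems terms before MLT builds the query, which would be expected behavior rather than a bug.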
Re: snapinstaller does not start newSearcher
I have used the snapshotter api and modified the snapinstaller script so that it successfully grabs the snapshot folder and updates the index folder on the slave. However, it fails to open a newSearcher. It simply sends a commit command to the slave, but the hasUncommittedChanges function returns false; that is the reason. Reloading the collection picks up the changes. Could reloading return no results for queries that are sent during this process? Thanks. Alex. -- View this message in context: http://lucene.472066.n3.nabble.com/snapinstaller-does-not-start-newSearcher-tp4188449p4191069.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: snapinstaller does not start newSearcher
Hello, We cannot use replication with the current architecture, so we decided to use snapshotter with snapinstaller. Here is the full stack trace:

8937 [coreLoadExecutor-5-thread-3] INFO org.apache.solr.core.CachingDirectoryFactory – Closing directory: /home/solr/solr-4.10.1/solr/example/solr/product/data
8938 [coreLoadExecutor-5-thread-3] ERROR org.apache.solr.core.CoreContainer – Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
    ... 9 more
Caused by: java.nio.file.NoSuchFileException: /home/solr/solr-4.10.1/solr/example/solr/product/data/index/segments_4
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:334)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
    at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:198)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:792)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
    ... 11 more
8943 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – user.dir=/home/solr/solr-4.10.1/solr/example
8943 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – SolrDispatchFilter.init() done
8982 [main] INFO org.eclipse.jetty.server.AbstractConnector – Started SocketConnector@0.0.0.0:8983

Thanks. Alex.
-Original Message- From: Shalin Shekhar Mangar shalinman...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Feb 24, 2015 12:13 am Subject: Re: snapinstaller does not start newSearcher

Do you mean the snapinstaller (bash) script? Those are legacy scripts; it's been a long time since they were tested. The ReplicationHandler is the recommended way to set up replication. If you want to take a snapshot, the replication handler has an HTTP-based API which lets you do that. In any case, do you have the full stack trace for that exception? There should be another cause nested under it.

On Tue, Feb 24, 2015 at 12:47 PM, alx...@aim.com wrote:
Hello, I am using the latest solr (solr trunk). I run snapinstaller and see that it copies the snapshot to the index folder, but changes are not picked up. The logs on the slave after running snapinstaller are:

44302 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
44303 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – No uncommitted changes. Skipping IW.commit.
44304 [qtp1312571113-14] INFO org.apache.solr.core.SolrCore – SolrIndexSearcher has not
snapinstaller does not start newSearcher
Hello, I am using the latest solr (solr trunk). I run snapinstaller and see that it copies the snapshot to the index folder, but changes are not picked up. The logs on the slave after running snapinstaller are:

44302 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
44303 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – No uncommitted changes. Skipping IW.commit.
44304 [qtp1312571113-14] INFO org.apache.solr.core.SolrCore – SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
44305 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – end_commit_flush
44305 [qtp1312571113-14] INFO org.apache.solr.update.processor.LogUpdateProcessor – [product] webapp=/solr path=/update params={} {commit=} 0 57

Restarting solr gives:

Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
    ... 9 more

Any idea what causes this issue?
Thanks in advance. Alex.
custom sorting of search result
Hello, We need to order solr search results according to specific rules. I will explain with an example. Say solr returns 1000 results for the query sport. These results must be divided into three buckets according to rules that come from a database. Then one doc must be chosen from each bucket and appended to the results in turn until all buckets are empty. One approach was to modify/override the solr code where it gets results, sorts them, and returns #rows elements. However, from the code in the scoreAll function of Weight.java we see that docs have only an internal document id and nothing else. We need the unique solr document id in order to match documents against the custom scoring. We also see that Lucene code hands those doc ids to the scoreAll function, and for now we do not want to modify Lucene code and prefer to solve this issue as a Solr plugin. Any ideas are welcome. Thanks. Alex.
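Wherever the plugin ends up hooking into Solr, the ordering step described above (take one doc from each bucket in turn until every bucket is empty) is a plain round-robin merge. A standalone sketch of just that logic (illustration only, not Solr plugin code; bucket assignment from the database rules is assumed to have happened already):

```java
import java.util.*;

public class BucketInterleave {
    // Round-robin merge: take one document from each non-empty bucket in
    // turn until every bucket is drained.
    static <T> List<T> interleave(List<? extends Queue<T>> buckets) {
        List<T> out = new ArrayList<>();
        boolean took = true;
        while (took) {
            took = false;
            for (Queue<T> b : buckets) {
                if (!b.isEmpty()) { out.add(b.poll()); took = true; }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ArrayDeque<String>> buckets = List.of(
            new ArrayDeque<>(List.of("a1", "a2", "a3")),
            new ArrayDeque<>(List.of("b1")),
            new ArrayDeque<>(List.of("c1", "c2")));
        System.out.println(interleave(buckets)); // [a1, b1, c1, a2, c2, a3]
    }
}
```

In a real plugin the expensive part is still the one raised in the question: mapping Lucene's internal doc ids back to the unique key field before bucketing can happen at all.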
Re: Incorrect group.ngroups value
Hi, From the discussion it is not clear whether this is a fixable bug in the case of documents being in different shards. If it is fixable, could someone please direct me to the relevant part of the code so that I can investigate. Thanks. Alex.

-Original Message- From: Andrew Shumway andrew.shum...@issinc.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Aug 22, 2014 8:15 am Subject: RE: Incorrect group.ngroups value

The Co-location section of this document http://searchhub.org/2013/06/13/solr-cloud-document-routing/ might be of interest to you. It mentions the need for using Solr Cloud routing to group documents in the same core so that grouping can work properly. --Andrew Shumway

-Original Message- From: Bryan Bende [mailto:bbe...@gmail.com] Sent: Friday, August 22, 2014 9:01 AM To: solr-user@lucene.apache.org Subject: Re: Incorrect group.ngroups value

Thanks Jim. We've been using the composite id approach where we put the group value as the leading portion of the id (i.e. groupValue!documentid), so I was expecting all of the documents for a given group to be in the same shard, but at least this gives me something to look into. I'm still suspicious of something changing between 4.6.1 and 4.8.1, because we've had the grouping implemented this way for a while, and only on the exact day we upgraded did someone bring this problem forward. I will keep investigating, thanks.

On Fri, Aug 22, 2014 at 9:18 AM, jim ferenczi jim.feren...@gmail.com wrote:
Hi Bryan, This is a known limitation of grouping. https://wiki.apache.org/solr/FieldCollapsing#RequestParameters group.ngroups: *WARNING: If this parameter is set to true on a sharded environment, all the documents that belong to the same group have to be located in the same shard, otherwise the count will be incorrect.
If you are using SolrCloud https://wiki.apache.org/solr/SolrCloud, consider using custom hashing* Cheers, Jim

2014-08-21 21:44 GMT+02:00 Bryan Bende bbe...@gmail.com:
Is there any known issue with using group.ngroups in a distributed Solr using version 4.8.1? I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing several queries where ngroups will be more than the actual number of groups returned in the response. For example, ngroups will say 5, but then there will be 3 groups in the response. It is not happening on all queries, only some.
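The composite id approach discussed in this thread (groupValue!documentid) works because the router hashes only the prefix before the `!`, so every document sharing a group prefix lands on the same shard. A simplified illustration of the idea (this uses Java's hashCode, not Solr's actual MurmurHash3-based CompositeIdRouter, so the shard numbers are illustrative only):

```java
public class CompositeIdRouting {
    // Simplified model: route by hashing only the group prefix before '!',
    // which co-locates all documents of a group on one shard.
    static int shardFor(String id, int numShards) {
        String prefix = id.contains("!") ? id.substring(0, id.indexOf('!')) : id;
        return Math.floorMod(prefix.hashCode(), numShards);
    }

    public static void main(String[] args) {
        // Same group prefix -> same shard, regardless of the doc part.
        System.out.println(shardFor("acme!doc1", 4) == shardFor("acme!doc2", 4)); // true
    }
}
```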
regexTransformer returns no results if there is no match
Hello, I am trying to construct a wikipedia page url from the page title using a regexTransformer with

<field column="title_underscore" regex="\s+" replaceWith="_" sourceColName="title"/>

This does not work for titles that have no space, so title_underscore for them is empty. Any ideas what is wrong here? This is with solr-4.8.1. Thanks. Alex.
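For comparison, a plain Java replaceAll does what the field definition above seems to intend: when the regex does not match, the input is returned unchanged rather than emptied. So the empty title_underscore points at how the DIH RegexTransformer handles the no-match case, not at the regex itself. A standalone sketch (class and method names are mine):

```java
public class UnderscoreTitle {
    // Spaces -> underscores. Note that String.replaceAll returns the input
    // unchanged when the regex never matches, so a one-word title survives.
    static String titleToUrlPart(String title) {
        return title.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(titleToUrlPart("Main Page")); // Main_Page
        System.out.println(titleToUrlPart("Solr"));      // Solr
    }
}
```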
Re: group.ngroups is set to an incorrect value - specific field types
Hi, I see a similar problem in our solr application. Sometimes it gives the number in a group as the number of all documents. This started to happen after the upgrade from 4.6.1 to 4.8.1. Thanks. Alex.

-Original Message- From: 海老澤 志信 shinobu_ebis...@waku-2.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Jun 17, 2014 5:24 am Subject: RE: group.ngroups is set to an incorrect value - specific field types

Hi all, Could anyone comment on my bug report? Regards, Ebisawa

-Original Message- From: 海老澤 志信 Sent: Friday, June 13, 2014 7:45 PM To: 'solr-user@lucene.apache.org' Subject: group.ngroups is set to an incorrect value - specific field types

Hi, I'm using Solr version 4.1. I found a bug in group.ngroups, so could anyone kindly take a look at my bug report? If I specify the type Double as group.field, the value of group.ngroups is set to an incorrect value.

[Condition]
- Double is defined in group.field
- There are documents without the field which is defined as group.field

[Sample query and Example]
---
solr/select?q=*:*&group=true&group.ngroups=true&group.field=Double_Field
* Double_Field is defined as the solr.TrieDoubleField type.
---
When there are 4 documents with the group.field and 6 documents without it, the query reports group.ngroups as 10. But I think group.ngroups should rightly be 5 in this case.

[Root Cause]
It seems there is a bug in the source code of Lucene. There is a function that compares whether entries in a list contain the same group.field value; it calls MutableValueDouble.compareSameType(). See below the point which seems to be the root cause:
---
if (!exists) return -1;
if (!b.exists) return 1;
---
If exists is false, it returns -1. But I think it should return 0 when exists and b.exists are equal.

[Similar problem]
There is a similar problem in MutableValueBool.compareSameType(). Therefore, when you group on a field of type Boolean (solr.BoolField), the value of group.ngroups is always 0 or 1.
[Solution] I propose the following modifications:

MutableValueDouble.compareSameType()
===
--- MutableValueDouble.java
+++ MutableValueDouble.java
@@ -54,9 +54,8 @@
   MutableValueDouble b = (MutableValueDouble)other;
   int c = Double.compare(value, b.value);
   if (c != 0) return c;
-  if (!exists) return -1;
-  if (!b.exists) return 1;
-  return 0;
+  if (exists == b.exists) return 0;
+  return exists ? 1 : -1;
 }
===

MutableValueBool.compareSameType()
===
--- MutableValueBool.java
+++ MutableValueBool.java
@@ -52,7 +52,7 @@
 @Override
 public int compareSameType(Object other) {
   MutableValueBool b = (MutableValueBool)other;
-  if (value != b.value) return value ? 1 : 0;
+  if (value != b.value) return value ? 1 : -1;
   if (exists == b.exists) return 0;
   return exists ? 1 : -1;
 }
===

Thanks, Ebisawa
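To see why the patch matters: with the original code, two values that both have exists == false compare as -1 in either direction, violating the comparator contract (compare(a, b) should equal -compare(b, a), and equal inputs should return 0), which is consistent with groups being miscounted. A minimal standalone model of the proposed comparison (not the actual Lucene class):

```java
public class ExistsCompare {
    // Simplified model of the proposed fix: compare the "exists" flags
    // symmetrically, so equal pairs return 0 and the ordering inverts
    // cleanly when the arguments are swapped.
    static int compareExists(boolean aExists, boolean bExists) {
        if (aExists == bExists) return 0;
        return aExists ? 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(compareExists(false, false)); // 0 (old code returned -1)
        System.out.println(compareExists(true, false));  // 1
        System.out.println(compareExists(false, true));  // -1
    }
}
```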
Re: how do I get search for fort st john to match ft saint john
It seems to me that you are missing this line

<filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true"/>

under <analyzer type="query">. Alex.

-Original Message- From: solr-user solr-u...@hotmail.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 1, 2014 5:01 pm Subject: Re: how do I get search for fort st john to match ft saint john

Hi Eric. Sorry, been away. The city_index_synonyms.txt file is pretty small as it contains just these two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't used either. My understanding is that solr would create the appropriate synonym entries in the index and so treat fort and ft as equal. If you have a simple one-line schema (that uses the type definition from my original email) and index fort saint john, does it work for you? i.e. does it return results if you search for ft st john, ft saint john, and fort st john? My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work for all/some words in a phrase.
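For reference, a field type with the synonym filter present in both analyzer chains would look roughly like this (the field type name and tokenizer choice are my assumptions, not taken from the original schema):

```xml
<fieldType name="text_city" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

Expanding only at index time is the other common setup, but then the filter must actually be present in the index-time chain; having it in neither chain (or only partially) is what produces the asymmetric matches described above.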
spellcheck in solr-4.6-1 distrib=true
Hello, In SolrCloud and in distributed mode, solr-4.6.1 spellcheck does not return any suggestions, though it does in non-distributed mode. Is this a known bug? Thanks. Alex.
Re: change character correspondence in icu lib
I found out that the generated files are the same. I think this is because these lines inside the build file

<target name="gen-utr30-data-files" depends="compile-tools">
  <java classname="org.apache.lucene.analysis.icu.GenerateUTR30DataFiles"
        dir="${utr30.data.dir}" fork="true" failonerror="true">
    <classpath>
      <path refid="icujar"/>
      <pathelement location="${build.dir}/classes/tools"/>
    </classpath>
  </java>
</target>

<property name="gennorm2.src.files"
          value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
<property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
<property name="gennorm2.dst" value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>

<target name="gennorm2" depends="gen-utr30-data-files">
  <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools are part of the ICU4C package. See http://site.icu-project.org/ </echo>
  <mkdir dir="${build.dir}/gennorm2"/>
  <exec executable="gennorm2" failonerror="true">
    <arg value="-v"/>
    <arg value="-s"/>
    <arg value="${utr30.data.dir}"/>
    <arg line="${gennorm2.src.files}"/>
    <arg value="-o"/>
    <arg value="${gennorm2.tmp}"/>
  </exec>
  <!-- now convert binary file to big-endian -->
  <exec executable="icupkg" failonerror="true">
    <arg value="-tb"/>
    <arg value="${gennorm2.tmp}"/>
    <arg value="${gennorm2.dst}"/>
  </exec>
  <delete file="${gennorm2.tmp}"/>
</target>

are not executed, and the resource files are downloaded from the internet instead. Any ideas how to fix this issue? Thanks. Alex.

-Original Message- From: Alexandre Rafalovitch arafa...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Wed, Feb 12, 2014 5:20 pm Subject: Re: change character correspondence in icu lib

Not a direct answer, but the usual next question is: are you absolutely sure you are using the right jars? Try renaming them and restarting Solr. If it complains, you got the right ones. If not Also, unzip those jars and see if your file made it all the way through the build pipeline. Regards, Alex.
Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Feb 13, 2014 at 8:12 AM, alx...@aim.com wrote:
Hello, I use icu4j-49.1.jar and lucene-analyzers-icu-4.6-SNAPSHOT.jar for one of the fields, in the form

<filter class="solr.ICUFoldingFilterFactory"/>

I need to change the letter that one of the accented characters maps to. I made changes to the file lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt, recompiled solr and lucene, and replaced the above jars with the new ones, but there is no change in the indexing and parsing of keywords. Any ideas where the appropriate change must be made? Thanks. Alex.
change character correspondence in icu lib
Hello, I use icu4j-49.1.jar and lucene-analyzers-icu-4.6-SNAPSHOT.jar for one of the fields, in the form

<filter class="solr.ICUFoldingFilterFactory"/>

I need to change the letter that one of the accented characters maps to. I made changes to the file lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt, recompiled solr and lucene, and replaced the above jars with the new ones, but there is no change in the indexing and parsing of keywords. Any ideas where the appropriate change must be made? Thanks. Alex.
Re: additional requests sent to solr
Hi, Could someone please confirm whether this must be so or whether it is a bug in SOLR. In short, I see three logs in SOLR for one request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

for the case when facet=true. The third log looks like

INFO: [mycollection] webapp=/solr path=/select params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true} status=0 QTime=6

where the company__terms and school__terms values are taken from the facet values of the company and school fields. When the data is big, this leads to a log with all facet values, which considerably slows performance. This issue is observed in distributed mode only. Thanks in advance. Alex.
Re: additional requests sent to solr
Hello, I still have this issue. Basically, in distributed mode, when facet is true, solr-4.2 issues an additional query with facet.field={!terms%3D$company__terms}company&isShard=true, where for example company__terms has all the values from the company facet field. I have added terms=false to the original query sent to solr, but it did not help. Does anyone have any idea how to suppress these queries? Thanks. Alex.

-Original Message- From: alxsss alx...@aim.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Jul 19, 2013 5:00 am Subject: additional requests sent to solr

Hello, I send to solr (to server1 in a cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests:

INFO: [mycollection] webapp=/solr path=/select params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true} hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true} status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax} hits=97262 status=0 QTime=168

I can understand that the first and the third log records are related to the above request, but cannot understand where the second log comes from. I see in it company__terms and {!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company, which seem to have nothing to do with the initial request. This is solr-4.2.0. Any ideas about it are welcome. Thanks in advance. Alex.
Re: additional requests sent to solr
I care about performance. Since the data is too big, the query with terms becomes too long and slows performance.

bq: In general distributed search requires two round trips to the other shards.

In this case I have three queries to solr. The third one is the one with {!terms..., which I do not understand why it is there. Thanks. Alex.

-Original Message- From: Erick Erickson erickerick...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Aug 5, 2013 7:10 pm Subject: Re: additional requests sent to solr

Why do you care? Is this causing you trouble? In general distributed search requires two round trips to the other shards. The first query gets the top N; those are returned to the originator (just a list of IDs and sort criteria, often score). The originator then assembles the final top N, but then the actual body of those documents must be fetched from the other nodes. Best, Erick

On Mon, Aug 5, 2013 at 2:02 AM, alx...@aim.com wrote:
Hello, I still have this issue. Basically, in distributed mode, when facet is true, solr-4.2 issues an additional query with facet.field={!terms%3D$company__terms}company&isShard=true, where for example company__terms has all the values from the company facet field. I have added terms=false to the original query sent to solr, but it did not help. Does anyone have any idea how to suppress these queries? Thanks. Alex.
-Original Message- From: alxsss alx...@aim.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Jul 19, 2013 5:00 am Subject: additional requests sent to solr

Hello, I send to solr (to server1 in a cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests:

INFO: [mycollection] webapp=/solr path=/select params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true} hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true} status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax} hits=97262 status=0 QTime=168

I can understand that the first and the third log records are related to the above request, but cannot understand where the second log comes from. I see in it company__terms and {!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company, which seem to have nothing to do with the initial request. This is solr-4.2.0. Any ideas about it are welcome. Thanks in advance. Alex.
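The {!terms=...} query puzzling this thread is most likely distributed facet refinement, an extra round trip on top of the two Erick describes: after each shard returns its own top facet terms, the coordinator merges the candidate lists and must ask each shard for exact counts of candidate terms that shard did not report, passing those candidates via the field__terms parameters. A toy model of which shard gets re-asked for what (shard names and counts are made up):

```java
import java.util.*;

public class FacetRefinement {
    // Phase 1: union each shard's top facet terms into a candidate set.
    // Phase 2: re-query a shard only for candidate terms it did not report,
    // so merged counts are exact. This models the extra {!terms=...} request.
    static Map<String, Set<String>> refinementRequests(
            Map<String, Map<String, Integer>> topTermsByShard) {
        Set<String> candidates = new TreeSet<>();
        topTermsByShard.values().forEach(m -> candidates.addAll(m.keySet()));
        Map<String, Set<String>> requests = new TreeMap<>();
        topTermsByShard.forEach((shard, terms) -> {
            Set<String> missing = new TreeSet<>(candidates);
            missing.removeAll(terms.keySet());
            if (!missing.isEmpty()) requests.put(shard, missing);
        });
        return requests;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> top = Map.of(
            "shard1", Map.of("Google", 7, "Apple", 5),
            "shard2", Map.of("Apple", 4, "IBM", 6));
        System.out.println(refinementRequests(top));
        // {shard1=[IBM], shard2=[Google]}
    }
}
```

If this is what is happening, the refinement request should disappear when every shard already reports every candidate term (e.g. with an unlimited facet.limit); it is not something a client-side parameter like terms=false would control.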
additional requests sent to solr
Hello, I send to solr (to server1 in a cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests:

INFO: [mycollection] webapp=/solr path=/select params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true} hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true} status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax} hits=97262 status=0 QTime=168

I can understand that the first and the third log records are related to the above request, but cannot understand where the second log comes from. I see in it company__terms and {!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company, which seem to have nothing to do with the initial request. This is solr-4.2.0. Any ideas about it are welcome. Thanks in advance. Alex.
Re: document id in nutch/solr
Another way of overriding nutch fields is to modify the solrindex-mapping.xml file. hth Alex. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Sun, Jun 23, 2013 12:04 pm Subject: Re: document id in nutch/solr Add the passthrough dynamic field to your Solr schema, and then see what fields get passed through to Solr from Nutch. Then, add the missing fields to your Solr schema and remove the passthrough. <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/> Or, add Solr copyField directives to place fields in existing named fields. Or... talk to the nutch people about how to do field name mapping on the nutch side of the fence. Hold off on UUIDs until you figure all of the above out and everything is working without them. -- Jack Krupansky -Original Message- From: Joe Zhang Sent: Sunday, June 23, 2013 2:35 PM To: solr-user@lucene.apache.org Subject: Re: document id in nutch/solr Can somebody help with this one, please? On Fri, Jun 21, 2013 at 10:36 PM, Joe Zhang smartag...@gmail.com wrote: A quite standard configuration of nutch seems to automatically map url to id. Two questions: - Where is such mapping defined? I can't find it anywhere in nutch-site.xml or schema.xml. The latter does define the id field as well as its uniqueness, but not the mapping. - Given that nutch has already defined such an id, can I ask solr to redefine id as UUID? <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/> - This leads to a related question: do solr and nutch have to have IDENTICAL schema.xml?
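For reference, the solrindex-mapping.xml approach mentioned above maps Nutch field names (source) to Solr field names (dest), and also declares which Solr field receives the document key; a minimal sketch (the exact field set is illustrative, check conf/solrindex-mapping.xml in your Nutch distribution):

```xml
<mapping>
  <fields>
    <!-- source = Nutch field name, dest = Solr field name -->
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="host" source="host"/>
    <field dest="url" source="url"/>
  </fields>
  <!-- the Solr field Nutch fills with the document key (by default the url) -->
  <uniqueKey>id</uniqueKey>
</mapping>
```

Renaming `dest` here changes what arrives in Solr without touching the Solr schema side.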
whole index in memory
Hello, I have a solr index of size 5GB. I am thinking of increasing the cache size to 5 GB, expecting Solr to keep the whole index in memory. 1. Will Solr indeed keep the whole index in memory? 2. What are the drawbacks of this approach? Thanks in advance. Alex.
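For context on what "cache size" means here: Solr's built-in caches in solrconfig.xml are sized in number of entries, not bytes, so they do not map directly to "5 GB" — caching of the raw index files is generally left to the OS page cache (i.e. free RAM beyond the JVM heap). A sketch of the relevant solrconfig.xml section (sizes illustrative):

```xml
<query>
  <!-- all "size" values below are entry counts, not bytes -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>
```

Oversizing these (or the JVM heap) can backfire: a huge heap leaves less RAM for the OS to cache index files and lengthens GC pauses.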
Re: EdgeGram filter
Hi, I was unable to find more info about LimitTokenCountFilterFactory in the solr wiki. Is there any other place to get a thorough description of what it does? Thanks. Alex. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 23, 2013 11:36 am Subject: Re: EdgeGram filter Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edgegram 'difficultcontent', not 'difficult' and 'content'. Any help? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeGram-filter-tp4058337.html Sent from the Solr - User mailing list archive at Nabble.com.
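Jack's suggestion above can be sketched as schema.xml fragments (all field and type names here are made up for illustration): copy the value into a side field whose index analyzer keeps only the first token and edge-ngrams it, then add that side field to qf.

```xml
<!-- hypothetical names: content, content_prefix, text_prefix -->
<copyField source="content" dest="content_prefix"/>
<field name="content_prefix" type="text_prefix" indexed="true" stored="false"/>

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- keep at most one token, then edge-ngram only that token -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- do not ngram the query side; match the typed prefix as-is -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, 'difficult contents' indexes grams of 'difficult' only, so q=dif matches via content_prefix but q=con does not.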
Re: EdgeGram filter
Hi, I did not find any descriptions, except constructor and method names. Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 23, 2013 12:08 pm Subject: RE: EdgeGram filter Always check the javadocs. There's a lot of info to be found there: http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html
Re: solr-cloud performance decrease day by day
How many segments does each shard have, and what is the reason for running multiple shards on one machine? Alex. -Original Message- From: qibaoyuan qibaoy...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Apr 19, 2013 12:26 am Subject: Re: solr-cloud performance decrease day by day there are 6 shards and they are in one machine, and the jvm param is very big, the physical memory is 16GB, the total #docs is about 150k, the index size of each shard is about 1GB. AND there is indexing while searching, I use auto commit every 10min, and the data comes in at about 100 per minute. On Apr 19, 2013, at 3:17 PM, Furkan KAMACI furkankam...@gmail.com wrote: Could you give more info about your index size and technical details of your machine? Maybe you are indexing more data day by day and your RAM capacity is not enough anymore? 2013/4/19 qibaoyuan qibaoy...@gmail.com Hello, I am using solr 4.1.0 and I have used solr cloud in my product. I have found at first everything seems good, the search time is fast and latency is low, but it becomes very slow after days. Does anyone know if there may be some params or optimization to use solr cloud?
Re: Spellchecker not working for Solr 4.1
Inside your request handler, try putting spellcheck=true and the name of the spellcheck dictionary. hth Alex. -Original Message- From: davers dboych...@improvementdirect.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Apr 11, 2013 6:24 pm Subject: Spellchecker not working for Solr 4.1 This is almost the same exact setup I was using in solr 3.6, not sure why it's not working. Here is my setup.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.7</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<requestHandler name="/productQuery" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">sku^9.0 upc^9.1 uniqueid^9.0 series^2.8 productTitle^1.2 productid^9.0 manufacturer^4.0 masterFinish^1.5 theme^1.1 categoryName^0.2 finish^1.4</str>
    <str name="pf">text^0.2 productTitle^1.5 manufacturer^4.0 finish^1.9</str>
    <str name="bf">linear(popularity_82_i,1,2)^3.0</str>
    <str name="fl">uniqueid,productid,manufacturer</str>
    <str name="mm">3&lt;-1 5&lt;-2 6&lt;90%</str>
    <bool name="group">true</bool>
    <str name="group.field">groupid</str>
    <bool name="group.ngroups">true</bool>
    <int name="ps">100</int>
    <int name="qs">3</int>
    <int name="spellcheck.count">10</int>
    <bool name="spellcheck.collate">true</bool>
    <int name="spellcheck.maxCollations">10</int>
    <int name="spellcheck.maxCollationTries">100</int>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_/\-])" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_/\-])" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This is what I see in my logs when I attempt a spellcheck INFO: [productindex] webapp=/solr path=/select params={spellcheck=false&group.distributed.first=true&tie=0.01&spellcheck.maxCollationTries=100&distrib=false&version=2&NOW=1365729795603&shard.url=solr-shard-1.sys.id.build.com:8080/solr/productindex/|solr-shard-4.sys.id.build.com:8080/solr/productindex/&fl=id,score&df=text&bf=%0a%09%09linear(popularity_82_i,1,2)^3.0%0a%09%09++&group.field=groupid&spellcheck.count=10&qs=3&spellcheck.build=true&mm=%0a%09%093-1+5-2+690%25%0a%09%09++&group.ngroups=true&spellcheck.maxCollations=10&qf=%0a%09%09sku^9.0+upc^9.1+uniqueid^9.0+series^2.8+productTitle^1.2+productid^9.0+manufacturer^4.0+masterFinish^1.5+theme^1.1+categoryName^0.2+finish^1.4%0a%09%09++&wt=javabin&spellcheck.collate=true&defType=edismax&rows=10&pf=%0a%09%09text^0.2+productTitle^1.5+manufacturer^4.0+finish^1.9%0a%09%09++&start=0&q=fuacet&group=true&isShard=true&ps=100} status=0 QTime=13 Apr 11, 2013 6:23:15 PM org.apache.solr.handler.component.SpellCheckComponent finishStage INFO: solr-shard-2.sys.id.build.com:8080/solr/productindex/|solr-shard-5.sys.id.build.com:8080/solr/productindex/ null Apr 11, 2013 6:23:15 PM org.apache.solr.handler.component.SpellCheckComponent finishStage INFO: solr-shard-3.sys.id.build.com:8080/solr/productindex/|solr-shard-6.sys.id.build.com:8080/solr/productindex/ null Apr 11, 2013 6:23:15 PM org.apache.solr.handler.component.SpellCheckComponent finishStage INFO:
solr-shard-1.sys.id.build.com:8080/solr/productindex/|solr-shard-4.sys.id.build.com:8080/solr/productindex/ null
Re: Query slow with termVectors termPositions termOffsets
Did index size increase after turning on termPositions and termOffsets? Thanks. Alex. -Original Message- From: Ravi Solr ravis...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Mar 25, 2013 8:27 am Subject: Query slow with termVectors termPositions termOffsets Hello, We re-indexed our entire core of 115 docs with some of the fields having termVectors=true termPositions=true termOffsets=true, prior to the reindex we only had termVectors=true. After the reindex, the query component has become very slow. I thought that adding the termOffsets and termPositions will increase the speed, am I wrong? Several queries like the one shown below which used to run fine are now very slow. Can somebody kindly clarify how termOffsets and termPositions affect the query component?

<lst name="process">
  <double name="time">19076.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">18972.0</double></lst>
  <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.QueryElevationComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.clustering.ClusteringComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">104.0</double></lst>
</lst>

[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx] webapp=/solr-admin path=/select params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:(The+Checkup+OR+Checkpoint+Washington+OR+Post+Carbon+OR+TSA+OR+College+Inc.+OR+Campus+Overload+OR+Planet+Panel+OR+The+Answer+Sheet+OR+Class+Struggle+OR+BlogPost))+OR+(contenttype:Photo+Gallery+AND+headline:day+in+photos)&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+PC+World+OR+CBS+News+OR+NC8/WJLA+OR+NewsChannel+8+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:(Discussion+OR+Photo)+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:Photo+Gallery+AND+headline:(Drawing+Board+OR+Drawing+board+OR+drawing+board))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:(Summary+Box*+OR+Video*+OR+Post+Sports+Live*)+-slug:(warren*+OR+history)+-(contenttype:Blog+AND+subheadline:(DC+Schools+Insider+OR+On+Leadership))+contenttype:Blog+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2} hits=4985 status=0 QTime=19044 |#] Thanks, Ravi Kiran Bhaskar
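For reference, the options under discussion are per-field flags in schema.xml; a sketch (field and type names illustrative):

```xml
<!-- term vectors with positions and offsets: used by e.g. the highlighter,
     not by the query component itself; they do grow the index on disk -->
<field name="fullbody" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Since the query component does not read term vectors, turning these on would not be expected to speed up plain queries; the extra index size can, however, affect what fits in the OS cache.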
Re: strange behaviour of wordbreak spellchecker in solr cloud
Hello, Further investigation shows the following pattern, for both the DirectIndex and wordbreak spellcheckers. Assume that in all cases there are spellchecker results when distrib=false. In distributed mode (distrib=true):
case when matches=0
1. group=true, no spellcheck results
2. group=false, there are spellcheck results
case when matches>0
1. group=true, there are spellcheck results
2. group=false, there are spellcheck results
Do these constitute a failing test case? Thanks. Alex.
Re: strange behaviour of wordbreak spellchecker in solr cloud
Thanks. I can fix this, but going over the code it is not easy to figure out where the whole request and response come from. I followed SpellCheckComponent#finishStage and found that SearchHandler#handleRequestBody calls this function. However, which part calls handleRequestBody and how its arguments are constructed is not clear. Thanks. Alex. -Original Message- From: Dyer, James james.d...@ingramcontent.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Mar 22, 2013 2:08 pm Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud Alex, I added your comments to SOLR-3758 (https://issues.apache.org/jira/browse/SOLR-3758), which seems to me to be the very same issue. If you need this to work now and you cannot devise a fix yourself, then perhaps a workaround is: if the query returns 0 results, re-issue the query with rows=0&group=false (you would omit all other optional components also). This will give you back just a spell check result. I realize this is not optimal because it requires the overhead of issuing 2 queries, but if you do it only in instances where the user gets nothing (or very little) back maybe it would be tolerable? Then once a viable fix is devised you can remove the extra code from your application. James Dyer Ingram Content Group (615) 213-4311
Re: strange behaviour of wordbreak spellchecker in solr cloud
Hello, We need this feature to be fixed ASAP. So, please let me know which class is responsible for combining spellcheck results from all shards. I will try to debug the code. Thanks in advance. Alex. -Original Message- From: alxsss alx...@aim.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Mar 19, 2013 11:34 am Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud
> distributed environment. But to nail it down, we probably need to see both
> the applicable requestHandler
Not sure what this is? I have

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
  <!-- a spellchecker that uses a different distance measure -->
  <!--
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
  -->
  <!-- a spellchecker that uses an alternate comparator
       comparatorClass can be one of: 1. score (default) 2. freq (Frequency first, then score) 3. A fully qualified class name -->
  <!--
  <lst name="spellchecker">
    <str name="name">freq</str>
    <str name="field">lowerfilt</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="comparatorClass">freq</str>
  </lst>
  -->
  <!-- A spellchecker that reads the list of words from a file -->
  <!--
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">spellcheckerFile</str>
  </lst>
  -->
</searchComponent>

The spell field in our schema is called spell and its type also is called spell. Here are requests

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">32</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="shards.qt">testhandler</str>
      <str name="q">paulusoles</str>
      <str name="distrib">false</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="site">
      <int name="matches">0</int>
      <int name="ngroups">0</int>
      <arr name="groups"/>
    </lst>
  </lst>
  <lst name="highlighting"/>
  <lst name="spellcheck">
    <lst name="suggestions"/>
  </lst>
</response>

curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">26</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="shards.qt">testhandler</str>
      <str name="q">paulusoles</str>
      <str name="distrib">false</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="site">
      <int name="matches">0</int>
      <int name="ngroups">0</int>
      <arr name="groups"/>
    </lst>
  </lst>
  <lst name="highlighting"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="paulusoles">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">11</int>
        <arr name="suggestion">
          <str>paul u soles</str>
        </arr>
      </lst>
      <str name="collation">(paul u soles)</str>
Re: strange behaviour of wordbreak spellchecker in solr cloud
Hello, I am debugging the SpellCheckComponent#finishStage. From the responses I see that not only wordbreak, but also directSpellchecker does not return some results in distributed mode. The request handler I was using had str name=grouptrue/str So, I desided to turn of grouping and I see spellcheck results in distributed mode. curl 'server1:8983/solr/test/testhandler?q=paulusolesindent=truerows=10shards.qt=testhandler' has no spellchek results but curl 'server1:8983/solr/test/testhandler?q=paulusolesindent=truerows=10shards.qt=testhandler group=false' returns results. So, the conclusion is that grouping causes the distributed spellcheker to fail. Could please you point me to the class that may be responsible to this issue? Thanks. Alex. -Original Message- From: Dyer, James james.d...@ingramcontent.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Mar 21, 2013 11:23 am Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud The shard responses get combined in SpellCheckComponent#finishStage . I highly recommend you file a JIRA bug report for this at https://issues.apache.org/jira/browse/SOLR . If you write a failing unit test, it would make it much more likely that others would help you with a fix. Of course, if you solve the issue entirely, a patch would be much appreciated. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: alx...@aim.com [mailto:alx...@aim.com] Sent: Thursday, March 21, 2013 12:45 PM To: solr-user@lucene.apache.org Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud Hello, We need this feature be fixed ASAP. So, please let me know which class is responsible for combining spellcheck results from all shards. I will try to debug the code. Thanks in advance. Alex. -Original Message- From: alxsss alx...@aim.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Mar 19, 2013 11:34 am Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud -- distributed environment. 
But to nail it down, we probably need to see both -- the applicable <requestHandler /> -- Not sure what this is? I have:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
  <!-- a spellchecker that uses a different distance measure -->
  <!--
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
  -->
  <!-- a spellchecker that uses an alternate comparator; comparatorClass can be one of:
       1. score (default)
       2. freq (Frequency first, then score)
       3. A fully qualified class name
  -->
  <!--
  <lst name="spellchecker">
    <str name="name">freq</str>
    <str name="field">lowerfilt</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="comparatorClass">freq</str>
  -->
  <!-- A spellchecker that reads the list of words from a file -->
  <!--
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
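The group=false comparison described in this thread is easy to get wrong when the query string is assembled by hand: a missing `&` silently fuses two parameters into one (which is exactly how the URLs in this archive got mangled). A minimal sketch of the same A/B request pair, built with `urlencode` so the separators and escaping are always correct; the host, core, and handler names are simply the ones used in this thread:

```python
from urllib.parse import urlencode

def spellcheck_url(host, core, handler, q, grouping=True):
    """Build a distributed spellcheck request; passing grouping=False
    appends group=false, which disables result grouping so the two
    behaviors can be compared."""
    params = {"q": q, "indent": "true", "rows": 10, "shards.qt": handler}
    if not grouping:
        params["group"] = "false"
    return f"http://{host}:8983/solr/{core}/{handler}?{urlencode(params)}"

with_grouping = spellcheck_url("server1", "test", "testhandler", "paulusoles")
without_grouping = spellcheck_url("server1", "test", "testhandler", "paulusoles",
                                  grouping=False)
```

Fetching both URLs (with curl or any HTTP client) and diffing the `spellcheck` sections reproduces the test Alex ran.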
Re: strange behaviour of wordbreak spellchecker in solr cloud
Hello, I was testing my custom testhandler. The direct spellchecker also was not working in cloud mode. After I added <arr name="last-components"><str>spellcheck</str></arr> to the /select requestHandler, the direct spellchecker worked, but the wordbreak spellchecker did not. I have added shards.qt=testhandler to the curl request but it did not solve the issue. Thanks. Alex.

-Original Message- From: Dyer, James james.d...@ingramcontent.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Mar 19, 2013 10:30 am Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud

Mark, I wasn't sure if Alex is actually testing /select, or if the problem is just coming up in /testhandler. Just wanted to verify that before we get into bug reports. DistributedSpellCheckComponentTest does have one little Word Break test scenario in it, so we know WordBreakSolrSpellChecker at least works some of the time in a distributed environment :) . Ideally, we should probably use a random test for stuff like this, as adding a bunch of test scenarios would make this already-slower-than-molasses test even slower. On the other hand, we want to test as many possibilities as we can. Based on DSCCT being so superficial, I really can't vouch too much for my spell check enhancements working as well with shards as they do with a single index. James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, March 19, 2013 11:49 AM To: solr-user@lucene.apache.org Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

My first thought too, but then I saw that he had the spell component in both his custom testhandler and the /select handler, so I'd expect that to work as well. - Mark

On Mar 19, 2013, at 12:18 PM, Dyer, James james.d...@ingramcontent.com wrote: Can you try including the shards.qt parameter in your request? In your case, I think you should set it to testhandler.
See http://wiki.apache.org/solr/SpellCheckComponent?highlight=%28shards\.qt%29#Distributed_Search_Support for a brief discussion. James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: alx...@aim.com [mailto:alx...@aim.com] Sent: Monday, March 18, 2013 4:07 PM To: solr-user@lucene.apache.org Subject: strange behaviour of wordbreak spellchecker in solr cloud

Hello, I am trying to use the wordbreak spellchecker in solr-4.2 with the cloud feature. We have two servers with one shard on each of them.

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'

do not return any results in the spellchecker. However, if I specify distrib=false, only one of them has spellcheck results:

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false' -- no spellcheck results
curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false' -- returns spellcheck results.

My testhandler and select handlers are as follows:

<requestHandler name="/testhandler" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">site^25 content^10 title^22</str>
    <str name="fl">url,id,title</str>
    <!-- <str name="mm">2<-1 5<-3 6<90%</str> -->
    <str name="mm">3<-1 5<-3 6<90%</str>
    <int name="ps">1</int>
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="hl.fragsize">165</str>
    <str name="hl.fragmentsBuilder">default</str>
    <str name="spellcheck.dictionary">direct</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">2</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- <str name="df">text</str> -->
  </lst>
  <!-- In addition to defaults, "appends" params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). -->
  <!-- In this example, the param fq=instock:true would be appended to any query time fq params the user may specify, as a mechanism for partitioning the index, independent of any user selected filtering that may also be desired (perhaps as a result of faceted searching).
       NOTE: there is *absolutely* nothing a client can do to prevent these "appends" values from being used, so don't use this mechanism unless you are sure you always want it.
  -->
Re: strange behaviour of wordbreak spellchecker in solr cloud
-- distributed environment. But to nail it down, we probably need to see both -- the applicable <requestHandler /> -- Not sure what this is? I have:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
  <!-- a spellchecker that uses a different distance measure -->
  <!--
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
  -->
  <!-- a spellchecker that uses an alternate comparator; comparatorClass can be one of:
       1. score (default)
       2. freq (Frequency first, then score)
       3. A fully qualified class name
  -->
  <!--
  <lst name="spellchecker">
    <str name="name">freq</str>
    <str name="field">lowerfilt</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="comparatorClass">freq</str>
  -->
  <!-- A spellchecker that reads the list of words from a file -->
  <!--
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">spellcheckerFile</str>
  </lst>
  -->
</searchComponent>

The spell field in our schema is called spell, and its type is also called spell. Here are the requests:

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">32</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="shards.qt">testhandler</str>
      <str name="q">paulusoles</str>
      <str name="distrib">false</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="site">
      <int name="matches">0</int>
      <int name="ngroups">0</int>
      <arr name="groups"/>
    </lst>
  </lst>
  <lst name="highlighting"/>
  <lst name="spellcheck">
    <lst name="suggestions"/>
  </lst>
</response>

curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">26</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="shards.qt">testhandler</str>
      <str name="q">paulusoles</str>
      <str name="distrib">false</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="site">
      <int name="matches">0</int>
      <int name="ngroups">0</int>
      <arr name="groups"/>
    </lst>
  </lst>
  <lst name="highlighting"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="paulusoles">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">11</int>
        <arr name="suggestion">
          <str>paul u soles</str>
        </arr>
      </lst>
      <str name="collation">(paul u soles)</str>
    </lst>
  </lst>
</response>

No distrib param:

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="shards.qt">testhandler</str>
      <str name="q">paulusoles</str>
      <str name="distrib">false</str>
      <str name="rows">10</str>
Re: structure of solr index
---So, search time is in no way impacted by the existence or non-existence of stored values---

What about memory? Would it be necessary to increase memory in order to have the same QTime as in the case of indexed-only fields? For example, in the case of indexed-only fields, the index size is 5GB, the average QTime is 0.1 sec, and the memory is 10G. In the case when the same fields are indexed and stored, the index size is 50GB. Will the QTime be 0.1s plus the time for extracting the stored fields? Another scenario is to store the fields in hbase or cassandra, have only indexed fields in Solr, and after getting the id field from Solr, extract the stored values from hbase or cassandra. Will this setup be faster than the one with stored fields in Solr? Thanks. Alex.

-Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Sat, Mar 16, 2013 9:53 am Subject: Re: structure of solr index

Search depends only on the index. But... returning field values for each of the matched documents does require access to the stored values. So, search time is in no way impacted by the existence or non-existence of stored values, but total query processing time would of course include both search time and the time to access and format the stored field values. -- Jack Krupansky

-Original Message- From: alx...@aim.com Sent: Saturday, March 16, 2013 12:48 PM To: solr-user@lucene.apache.org Subject: Re: structure of solr index

Hi, So, will search time be the same for the case when fields are indexed only vs the case when they are indexed and stored? Thanks. Alex.

-Original Message- From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Mar 15, 2013 8:09 pm Subject: Re: structure of solr index

Hi, I think you are asking if the original/raw content of those fields will be read. No, it won't, not for the search itself. If you want to retrieve/return those fields then, of course, they will be read for the documents being returned.
Otis -- Solr & ElasticSearch Support http://sematext.com/

On Fri, Mar 15, 2013 at 2:41 PM, alx...@aim.com wrote: Hi, I wondered if solr searches on indexed fields only or on the entire index? In more detail, let's say I have fields id, title and content, all indexed and stored. Will a search load all these fields into memory, or only the indexed part of these fields? Thanks. Alex.
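The external-store setup Alex asks about above (ids from Solr, stored values from HBase or Cassandra) is simple to wire up once the Solr query requests only `fl=id`. A hedged sketch of the hydration step, with a plain dict standing in for the key-value client so no real HBase/Cassandra API is assumed:

```python
def hydrate(solr_ids, store):
    """Given the ids a Solr query returns when fl=id, pull the stored
    fields from an external key-value store.  `store` is a plain dict
    standing in for a real HBase/Cassandra client; ids missing from
    the store are skipped."""
    return [store[doc_id] for doc_id in solr_ids if doc_id in store]

# ids as Solr would return them for a query with fl=id&rows=2
ids = ["doc1", "doc2"]
store = {"doc1": {"title": "paul soles"}, "doc2": {"title": "jennifer lopez"}}
docs = hydrate(ids, store)
```

Whether this beats Solr's own stored fields depends on the extra network round trip to the store versus the disk reads Solr saves; it keeps the Solr index small but adds a second system to the read path.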
strange behaviour of wordbreak spellchecker in solr cloud
Hello, I am trying to use the wordbreak spellchecker in solr-4.2 with the cloud feature. We have two servers with one shard on each of them.

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'

do not return any results in the spellchecker. However, if I specify distrib=false, only one of them has spellcheck results:

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false' -- no spellcheck results
curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false' -- returns spellcheck results.

My testhandler and select handlers are as follows:

<requestHandler name="/testhandler" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">site^25 content^10 title^22</str>
    <str name="fl">url,id,title</str>
    <!-- <str name="mm">2<-1 5<-3 6<90%</str> -->
    <str name="mm">3<-1 5<-3 6<90%</str>
    <int name="ps">1</int>
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="hl.fragsize">165</str>
    <str name="hl.fragmentsBuilder">default</str>
    <str name="spellcheck.dictionary">direct</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">2</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- <str name="df">text</str> -->
  </lst>
  <!-- In addition to defaults, "appends" params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). -->
  <!-- In this example, the param fq=inStock:true would be appended to any query time fq params the user may specify, as a mechanism for partitioning the index, independent of any user selected filtering that may also be desired (perhaps as a result of faceted searching).
       NOTE: there is *absolutely* nothing a client can do to prevent these "appends" values from being used, so don't use this mechanism unless you are sure you always want it.
  -->
  <!--
  <lst name="appends">
    <str name="fq">inStock:true</str>
  </lst>
  -->
  <!-- "invariants" are a way of letting the Solr maintainer lock down the options available to Solr clients. Any params values specified here are used regardless of what values may be specified in either the query, the "defaults", or the "appends" params. In this example, the facet.field and facet.query params would be fixed, limiting the facets clients can use. Faceting is not turned on by default - but if the client does specify facet=true in the request, these are the only facets they will be able to see counts for; regardless of what other facet.field or facet.query params they may specify.
       NOTE: there is *absolutely* nothing a client can do to prevent these "invariants" values from being used, so don't use this mechanism unless you are sure you always want it.
  -->
  <!--
  <lst name="invariants">
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.query">price:[* TO 500]</str>
    <str name="facet.query">price:[500 TO *]</str>
  </lst>
  -->
  <!-- If the default list of SearchComponents is not desired, that list can either be overridden completely, or components can be prepended or appended to the default list. (see below) -->
  <!--
  <arr name="components">
    <str>nameOfCustomComponent1</str>
    <str>nameOfCustomComponent2</str>
  </arr>
  -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Is this a bug, or does something else have to be done? Thanks. Alex.
Re: structure of solr index
Hi, So, will search time be the same for the case when fields are indexed only vs the case when they are indexed and stored? Thanks. Alex.

-Original Message- From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Mar 15, 2013 8:09 pm Subject: Re: structure of solr index

Hi, I think you are asking if the original/raw content of those fields will be read. No, it won't, not for the search itself. If you want to retrieve/return those fields then, of course, they will be read for the documents being returned. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Fri, Mar 15, 2013 at 2:41 PM, alx...@aim.com wrote: Hi, I wondered if solr searches on indexed fields only or on the entire index? In more detail, let's say I have fields id, title and content, all indexed and stored. Will a search load all these fields into memory, or only the indexed part of these fields? Thanks. Alex.
structure of solr index
Hi, I wondered if solr searches on indexed fields only or on the entire index? In more detail, let's say I have fields id, title and content, all indexed and stored. Will a search load all these fields into memory, or only the indexed part of these fields? Thanks. Alex.
spellchecker does not have suggestion for keywords typed through a non-whitespace delimiter
Hello, Recently we noticed that Solr and its spellchecker do not return results for keywords typed with a non-whitespace delimiter. A user accidentally typed "u" instead of white space, for example "paulusoles" instead of "paul soles". Solr does not return any results or spellcheck suggestions for the keyword paulusoles, although it returns results for the keywords "paul soles", "paul", and "soles". search.yahoo.com returns results for the keyword paulusoles as if it had been given the keyword "paul soles". Any ideas how to implement this functionality in Solr? The text and spell field types are as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" termVectors="true" termPositions="true" termOffsets="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The spellcheck parameters are:

<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">direct</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">2</str>

This is solr-4.1.0 with the cloud feature and an index-based dictionary. Thanks. Alex.
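WordBreakSolrSpellChecker (discussed elsewhere in this archive) is the Solr feature aimed at exactly this case. As a rough illustration of the underlying idea only — segmenting a run-together keyword against a dictionary while tolerating a stray character — a sketch might look like this; the tiny dictionary and the one-junk-character rule are assumptions for the demo, not Solr's real algorithm:

```python
def segment(query, words, max_junk=1):
    """Split `query` into known dictionary words, tolerating up to
    `max_junk` stray single characters (the accidental 'u' in
    'paulusoles').  Returns the first segmentation found, or None.
    Illustrative only -- not WordBreakSolrSpellChecker's actual logic."""
    def helper(s, junk_left):
        if not s:
            return []
        for w in sorted(words, key=len, reverse=True):  # prefer longer words
            if s.startswith(w):
                rest = helper(s[len(w):], junk_left)
                if rest is not None:
                    return [w] + rest
        if junk_left > 0:  # drop one stray character and keep going
            return helper(s[1:], junk_left - 1)
        return None
    return helper(query, max_junk)

print(segment("paulusoles", {"paul", "soles"}))  # -> ['paul', 'soles']
```

Solr's own suggestion for this thread's example was "paul u soles"; the sketch instead drops the stray character, which matches the "treat it as paul soles" behavior the search engines show.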
Re: solr cloud index size is too big
Hi, It is the index folder; tlog is only a few MB. I have analysed all the changes and found out that only one field in the schema was changed. This field's non-cloud definition, <fieldType name="text" class="solr.TextField" positionIncrementGap="100">, was changed to <fieldType name="text" class="solr.TextField" positionIncrementGap="100" termVectors="true" termPositions="true" termOffsets="true"> in cloud, to use fastVectorHighlighting. Is it possible that this change could double the index size? Thanks. Alex.

-Original Message- From: Jan Høydahl jan@cominvent.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Mar 4, 2013 2:24 pm Subject: Re: solr cloud index size is too big

Can you tell whether it's the index folder that is that large, or does that include the tlog transaction log folder? If you have a huge transaction log, you need to start sending hard commits more often during indexing to flush the tlogs. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

4. mars 2013 kl. 04:16 skrev alx...@aim.com: Hello, I had a non-cloud collection with an index size around 80G for 15M documents with solr-4.1.0. So, I decided to use solr cloud with two shards and sent solr the following command:

curl 'http://slave:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=1'

I tried to set replicationFactor=0 but this command gave an error. After reindexing into two separate linux boxes, with one instance of solr running on each of them, I see that the size of the index in each shard is 90GB versus the expected 40GB, although each of the shards has half (7.5M) of the documents. Any ideas what went wrong? Thanks. Alex.
Re: How do I create two collections on the same cluster?
Hi, What if you add the new collection to the solr.xml file? Alex.

-Original Message- From: Shankar Sundararaju shan...@ebrary.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Feb 21, 2013 8:51 pm Subject: How do I create two collections on the same cluster?

I am using Solr 4.1. I created collection1, consisting of 2 leaders and 2 replicas (2 shards), at boot time. After the cluster is up, I am trying to create collection2 with 2 leaders and 2 replicas just like collection1. I am using the following Collections API call for that:

http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr

Yes, collection2 does get created. But I see a problem - the createNodeSet parameter is not being honored. All 4 nodes are not being used to create collection2; only 3 are being used. Is this a bug, or do I not understand how this parameter should be used? What is the best way to create collection2? Can I specify both collections in solr.xml in the solr home dir on all nodes and launch them? Do I have to get the configs for collection2 uploaded to zookeeper before I launch the nodes? Thanks in advance. -Shankar

-- Regards, *Shankar Sundararaju* Sr. Software Architect, ebrary, a ProQuest company, 410 Cambridge Avenue, Palo Alto, CA 94306 USA shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)
how to override pre and post tags when useFastVectorHighlighter is set to true
Hello, I was unable to change the pre and post tags for highlighting when useFastVectorHighlighter is set to true. Changing the default tags in solrconfig.xml works for the standard highlighter, though. I searched the mailing list and the net with no success. I use solr-4.1.0. Thanks. Alex.
Re: long QTime for big index
Hi, It is curious to know how many linux boxes you have and how many cores on each of them. It was my understanding that solr puts into memory all documents found for a keyword, not the whole index. So why must it be faster with more cores, when the number of selected documents from many separate cores is the same as from one core? Thanks. Alex.

-Original Message- From: Mou mouna...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Feb 14, 2013 2:35 pm Subject: Re: long QTime for big index

Just to close this discussion, we solved the problem by splitting the index. It turned out that distributed search with 12 cores is faster than searching two cores. All queries, the tomcat configuration, and the jvm configuration remain the same. Now queries are served in milliseconds.

On Thu, Jan 31, 2013 at 9:34 PM, Mou [via Lucene] ml-node+s472066n4037870...@n3.nabble.com wrote: Thank you again. Unfortunately the index files will not fit in the RAM; I have to try using the document cache. I am also moving my index to SSD again; we took our index off it when the fusion IO cards failed twice during indexing and the index was corrupted. Now, with the bios upgrade and a new driver, it is supposed to be more reliable. Also, I am going to look into the client app to verify that it is making proper query requests. Surprisingly, when I used a much lower value than the default for defaultconnectionperhost and maxconnectionperhost in solrmeter, it performs very well; the same queries return in less than one sec. I am not sure yet, I need to run solrmeter with different heap sizes, with cache and without cache, etc.

-- View this message in context: http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040535.html Sent from the Solr - User mailing list archive at Nabble.com.
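Mou's observation (12 small cores beating 2 big ones) makes sense if you model a distributed query as a parallel fan-out followed by a merge: wall-clock latency tracks the slowest shard, not the sum of all shards. A toy simulation of that shape — dicts stand in for shards, and this is not Solr's actual merge code:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, query, rows):
    """Stand-in for one core's local search: return its top `rows`
    (score, doc_id) pairs, best first.  A real deployment would issue
    an HTTP request to the shard here."""
    return sorted(shard.get(query, []), reverse=True)[:rows]

def distributed_search(shards, query, rows=10):
    """Fan the query out to every shard in parallel, then merge the
    per-shard top lists; each shard only searches its slice, so the
    elapsed time is roughly that of the slowest shard."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, rows), shards))
    merged = heapq.merge(*partials, reverse=True)  # inputs sorted descending
    return [doc for _, doc in merged][:rows]

shards = [
    {"solr": [(0.9, "doc-a"), (0.2, "doc-b")]},
    {"solr": [(0.5, "doc-c")]},
]
top = distributed_search(shards, "solr", rows=2)  # -> ['doc-a', 'doc-c']
```

The merge step is cheap; the win comes from each core scanning a fraction of the postings, which also keeps each core's hot data small enough to cache.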
Re: Pause and resume indexing on SolR 4 for backups
Depending on your architecture, why not index the same data into two machines? One would be your prod, the other your backup. Thanks. Alex.

-Original Message- From: Upayavira u...@odoko.co.uk To: solr-user solr-user@lucene.apache.org Sent: Thu, Dec 20, 2012 11:51 am Subject: Re: Pause and resume indexing on SolR 4 for backups

You're saying that there's no chance to catch it in the middle of writing the segments file? Having said that, the segments file is pretty small, so the chance would be pretty slim. Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote: To be clear: 1) is fine. Lucene index updates are carefully sequenced so that the index is never in a bogus state. All data files are written and flushed to disk, then the segments.* files are written that match the data files. You can capture the files with a set of hard links to create a backup. The CheckIndex program will verify the index backup:

java -cp yourcopy/lucene-core-SOMETHING.jar org.apache.lucene.index.CheckIndex collection/data/index

lucene-core-SOMETHING.jar is usually in the solr-webapp directory where Solr is unpacked.

On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote: Hi all. Can anyone advise me of a way to pause and resume SolR 4 so I can perform a backup? I need to be able to revert to a usable (though not necessarily complete) index after a crash or other disaster more quickly than a re-index operation would allow. I can't yet afford the extravagance of a separate SolR replica just for backups, and I'm not sure if I'll ever have the luxury. I'm currently running with just one node, but we are not yet live. I can think of the following ways to do this, each with various downsides:

1) Just backup the existing index files whilst indexing continues
   + Easy
   + Fast
   - Incomplete
   - Potential for corruption? (e.g. partial files)
2) Stop/Start Tomcat
   + Easy
   - Very slow and I/O, CPU intensive
   - Client gets errors when trying to connect
3) Block/unblock SolR port with IpTables
   + Fast
   - Client gets errors when trying to connect
   - Have to wait for existing transactions to complete (not sure how, maybe watch socket FDs in /proc)
4) Pause/Restart SolR service
   + Fast? (hopefully)
   - Client gets errors when trying to connect

In any event, the web app will have to gracefully handle unavailability of SolR, probably by displaying a "down for maintenance" message, but this should preferably be only a very short amount of time. Can anyone comment on my proposed solutions above, or provide any additional ones? Thanks for any input you can provide! -Andy
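Lance's hard-link approach from this thread can be scripted in a few lines. A minimal sketch — the demo directory layout is invented, and the CheckIndex invocation is quoted from Lance's message, not run here:

```python
import os
import tempfile

def snapshot(index_dir, backup_dir):
    """Hard-link every file of the live index into backup_dir.  Links
    are near-instant and cost no extra space, and because Lucene never
    rewrites committed segment files, the linked copy stays consistent
    (option 1 in Andy's list).  Afterwards, verify the copy with
    Lucene's CheckIndex tool, per Lance's message:
      java -cp yourcopy/lucene-core-SOMETHING.jar \
           org.apache.lucene.index.CheckIndex <backup_dir>
    """
    os.makedirs(backup_dir, exist_ok=True)
    for name in os.listdir(index_dir):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(backup_dir, name))

# demo on a throwaway directory standing in for collection/data/index
root = tempfile.mkdtemp()
idx = os.path.join(root, "index")
os.makedirs(idx)
with open(os.path.join(idx, "_0.cfs"), "w") as f:  # fake segment file
    f.write("segment data")
snapshot(idx, os.path.join(root, "backup"))
```

Hard links require that backup_dir live on the same filesystem as the index; to move the snapshot elsewhere, copy the linked directory afterwards at leisure, since the links pin the old segment files even if Solr merges them away.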
Re: Grouping performance problem
This is strange. We have a data folder size of 24GB, with 2GB of RAM for java. We query with grouping, ngroups and highlighting, do not query all fields, and query time is mostly less than 1 sec; it rarely goes up to 2 sec. We use solr 3.6 and turned off all kinds of caching. Maybe your problem is with caching and displaying all fields? Hope this may help. Alex.

-Original Message- From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl To: solr-user solr-user@lucene.apache.org Sent: Mon, Jul 16, 2012 10:04 am Subject: Re: Grouping performance problem

I have a server with 24GB RAM. I have 4 shards on it, each of them with 4GB RAM for java: JAVA_OPTIONS=-server -Xms4096M -Xmx4096M. The size is about 15GB for one shard (I use an ssd disk for index data). Agnieszka

2012/7/16 alx...@aim.com What are the RAM of your server and the size of the data folder?

-Original Message- From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl To: solr-user solr-user@lucene.apache.org Sent: Mon, Jul 16, 2012 6:16 am Subject: Re: Grouping performance problem

Hi Pavel, I tried with group.ngroups=false but didn't notice a big improvement. The times were still about 4000 ms, so it doesn't solve my problem. Maybe this is because of my index shape: I have millions of documents but only about 20 000 groups. Cheers Agnieszka

2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com Hi Agnieszka, if you don't need the number of groups, you can try leaving out the group.ngroups=true param. In this case Solr apparently skips calculating all groups and delivers results much faster. At least for our application the difference in performance with/without group.ngroups=true is significant (I have to say, we use Solr 3.6). WBR, Pavel

On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl wrote: Hi, Is there any way to make grouping searches more efficient? My queries look like:

/select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

For an index with 3 mln documents, a query for all docs with group=true takes almost 4000ms. Because queryResultCache is not used, subsequent queries take a long time as well. When I remove group=true and leave only faceting, the query for all docs takes much less time: ~700ms the first time, and only 200ms on subsequent runs because queryResultCache is used. So with group=true the query is about 20 times slower than without it. Is it possible, or is there any way, to improve performance with grouping? My application needs the grouping feature and all of the queries use it, but their performance is too low for production use. I use Solr 4.x from trunk. Agnieszka Kukalowicz
Re: Broken pipe error
I had the same problem with jetty. It turned out that a broken pipe happens when the application disconnects from jetty. In my case I was using a php client, and it had a 10 sec restriction in the curl request. When solr took more than 10 sec to respond, curl automatically disconnected from jetty. Hope this can help. Alex.

-Original Message- From: Jason hialo...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Jul 2, 2012 7:41 pm Subject: Broken pipe error

Hi, all. We're independently running three search servers. One of the three servers has a bigger index size and more connected users than the others. Except for that, all configurations are the same. The problem is that this server sometimes throws a broken pipe error, but I don't know what the problem is. Please give some ideas. Thanks in advance. Jason

Error message below... === 2012-07-03 10:42:56,753 [http-8080-exec-3677] ERROR org.apache.solr.servlet.SolrDispatchFilter - null:ClientAbortException: java.io.IOException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:432) at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309) at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288) at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212) at org.apache.solr.util.FastWriter.flush(FastWriter.java:115) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:402) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:279) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:732) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2262) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69) at sun.nio.ch.IOUtil.write(IOUtil.java:40) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:116) at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:93) at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:156) at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:460) at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:804) at org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:644) at 
org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:46) at org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:829) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126) at org.apache.coyote.http11.InternalNioOutputBuffer.doWrite(InternalNioOutputBuffer.java:610) at org.apache.coyote.Response.doWrite(Response.java:560) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353) ... 25 more -- View this message in context:
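The failure mode described above — a client-side timeout shorter than the server's response time, so the client hangs up and the server later sees a broken pipe when it writes — can be reproduced in a few lines. This is a minimal illustrative sketch, not related to the actual Solr/Tomcat code; the 2-second server delay and 0.5-second client timeout stand in for the slow query and the php curl 10-second limit.

```python
import socket
import threading
import time

def slow_server(sock, delay):
    # Accept one connection, read the request, then respond too slowly.
    conn, _ = sock.accept()
    conn.recv(1024)
    time.sleep(delay)                  # the "slow Solr query"
    try:
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nresults")
    except OSError:
        pass                           # what surfaces as "Broken pipe" in server logs
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=slow_server, args=(server, 2.0), daemon=True).start()

# The client allows only 0.5 s, like curl with a hard timeout.
client = socket.create_connection(server.getsockname(), timeout=0.5)
client.sendall(b"GET /solr/select?q=test HTTP/1.0\r\n\r\n")
timed_out = False
try:
    client.recv(4096)
except socket.timeout:
    timed_out = True                   # client gives up and disconnects
client.close()
print("client timed out:", timed_out)
```

So the broken pipe in the server log is a symptom, not the cause: the fix is either to raise the client timeout or to make the query faster.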
Re: Removing old documents
I use the jetty that comes with solr. I use solr's dedupe chain:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">url</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

and because of this the id is not the url itself but its encoded signature. I see solrclean uses the url to delete a document. Is it possible that the issue is caused by this mismatch? Thanks. Alex. -Original Message- From: Paul Libbrecht p...@hoplahup.net To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 11:43 pm Subject: Re: Removing old documents With which client? paul Le 2 mai 2012 à 01:29, alx...@aim.com a écrit : all caching is disabled and I restarted jetty. The same results.
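If the uniqueKey really holds a Lookup3 signature of the url rather than the url itself, a delete-by-id with the raw url can never match. One workaround (a sketch under that assumption, not something from this thread) is to delete by query against the stored url field instead. The snippet below only builds the Solr update message; posting it to the update handler is left to whatever client you use.

```python
from xml.sax.saxutils import escape

def delete_by_url(url):
    # Build a delete-by-query update message that matches the stored "url"
    # field, so it works even though the uniqueKey "id" is a signature.
    return '<delete><query>url:"%s"</query></delete>' % escape(url)

# The example URL is purely illustrative.
payload = delete_by_url('http://example.com/old-page')
print(payload)
# POST this to the core's /update handler, then send a commit, so the
# deletion becomes visible to searches.
```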
Re: Removing old documents
Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ both without and with -noCommit, and restarted the solr server. The log shows that 5 documents were removed, but they are still in the search results. Is this a bug, or is something missing? I use nutch-1.4 and solr 3.5. Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirect documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way, but we do it via a delete query and, where possible, we update the doc under the same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that now generate 404 errors, etc. Is there an automatic method or do I have to do it manually? Thanks. -- Markus Jelsma - CTO - Openindex
Re: Removing old documents
all caching is disabled and I restarted jetty. The same results. Thanks. Alex. -Original Message- From: Lance Norskog goks...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 2:57 pm Subject: Re: Removing old documents Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default and so when you do queries and changes your browser does not fetch your changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted solr server Log shows that 5 documents were removed but they are still in the search results. Is this a bug or something is missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc. Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
Re: term frequency outweighs exact phrase match
Hello Hoss, Here are the explain tags for two doc str name=a0127d8e70a6d523 0.021646015 = (MATCH) sum of: 0.021646015 = (MATCH) sum of: 0.02141003 = (MATCH) max plus 0.01 times others of: 2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of: 0.0029881175 = queryWeight(content:apache^0.5), product of: 0.5 = boost 4.3554416 = idf(docFreq=126092, maxDocs=3613605) 0.0013721307 = queryNorm 0.09510804 = (MATCH) fieldWeight(content:apache in 3578), product of: 2.236068 = tf(termFreq(content:apache)=5) 4.3554416 = idf(docFreq=126092, maxDocs=3613605) 0.009765625 = fieldNorm(field=content, doc=3578) 0.021407187 = (MATCH) weight(title:apache^1.2 in 3578), product of: 0.01371095 = queryWeight(title:apache^1.2), product of: 1.2 = boost 8.327043 = idf(docFreq=2375, maxDocs=3613605) 0.0013721307 = queryNorm 1.5613205 = (MATCH) fieldWeight(title:apache in 3578), product of: 1.0 = tf(termFreq(title:apache)=1) 8.327043 = idf(docFreq=2375, maxDocs=3613605) 0.1875 = fieldNorm(field=title, doc=3578) 2.359865E-4 = (MATCH) max plus 0.01 times others of: 2.359865E-4 = (MATCH) weight(content:solr^0.5 in 3578), product of: 0.004071705 = queryWeight(content:solr^0.5), product of: 0.5 = boost 5.9348645 = idf(docFreq=25986, maxDocs=3613605) 0.0013721307 = queryNorm 0.05795766 = (MATCH) fieldWeight(content:solr in 3578), product of: 1.0 = tf(termFreq(content:solr)=1) 5.9348645 = idf(docFreq=25986, maxDocs=3613605) 0.009765625 = fieldNorm(field=content, doc=3578) /strstr name=d89380e313c64aa5 0.021465056 = (MATCH) sum of: 1.8154096E-4 = (MATCH) sum of: 6.354771E-5 = (MATCH) max plus 0.01 times others of: 6.354771E-5 = (MATCH) weight(content:apache^0.5 in 638040), product of: 0.0029881175 = queryWeight(content:apache^0.5), product of: 0.5 = boost 4.3554416 = idf(docFreq=126092, maxDocs=3613605) 0.0013721307 = queryNorm 0.021266805 = (MATCH) fieldWeight(content:apache in 638040), product of: 1.0 = tf(termFreq(content:apache)=1) 4.3554416 = idf(docFreq=126092, maxDocs=3613605) 
0.0048828125 = fieldNorm(field=content, doc=638040) 1.1799325E-4 = (MATCH) max plus 0.01 times others of: 1.1799325E-4 = (MATCH) weight(content:solr^0.5 in 638040), product of: 0.004071705 = queryWeight(content:solr^0.5), product of: 0.5 = boost 5.9348645 = idf(docFreq=25986, maxDocs=3613605) 0.0013721307 = queryNorm 0.02897883 = (MATCH) fieldWeight(content:solr in 638040), product of: 1.0 = tf(termFreq(content:solr)=1) 5.9348645 = idf(docFreq=25986, maxDocs=3613605) 0.0048828125 = fieldNorm(field=content, doc=638040) 0.021283515 = (MATCH) weight(content:apache solr~1^30.0 in 638040), product of: 0.42358932 = queryWeight(content:apache solr~1^30.0), product of: 30.0 = boost 10.290306 = idf(content: apache=126092 solr=25986) 0.0013721307 = queryNorm 0.050245635 = fieldWeight(content:apache solr in 638040), product of: 1.0 = tf(phraseFreq=1.0) 10.290306 = idf(content: apache=126092 solr=25986) 0.0048828125 = fieldNorm(field=content, doc=638040) /str Although the second doc has the exact phrase match, it is ranked after the first one, which does not have the exact match. 
I use the following request handler

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2 anchor^1.2</str>
    <str name="pf">content^30</str>
    <str name="fl">url,id, site ,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

and the query is as follows http://localhost:8983/solr/select/?q=apache solr&version=2.2&start=0&rows=10&indent=on&qt=search&debugQuery=true Thanks. Alex. -Original Message- From: Chris Hostetter hossman_luc...@fucit.org To: solr-user solr-user@lucene.apache.org Sent: Thu, Apr 12, 2012 7:43 pm Subject: Re: term frequency outweighs exact phrase match : I use solr 3.5 with edismax. I have the following issue with phrase : search. For example if I have three documents with content like : : 1.apache apache : 2. solr solr :
Re: term frequency outweighs exact phrase match
In that case documents 1 and 2 will not be in the results. We need them to also be shown in the results, but ranked after the docs with the exact match. I think omitting term frequency when scoring phrase queries would solve this issue, but I do not see such a parameter in the configs. I see omitTermFreqAndPositions=true, but I am not sure it is the setting I need, because its description is too vague. Thanks. Alex. -Original Message- From: Erick Erickson erickerick...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Wed, Apr 11, 2012 8:23 am Subject: Re: term frequency outweighs exact phrase match Consider boosting on the phrase with a SHOULD clause, something like field:"apache solr"^2. Best Erick On Tue, Apr 10, 2012 at 12:46 PM, alx...@aim.com wrote: Hello, I use solr 3.5 with edismax. I have the following issue with phrase search. For example if I have three documents with content like 1. apache apache 2. solr solr 3. apache solr then a search for apache solr displays the documents in the order 1, 2, 3 instead of 3, 2, 1, because term frequency in the first and second documents is higher than in the third document. We want the results displayed in the order 3, 2, 1, since the third document has the exact match. My request handler is as follows. 
<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">host^30 content^20 title^22</str>
    <str name="fl">url,id, site ,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any ideas how to fix this issue? Thanks in advance. Alex.
term frequency outweighs exact phrase match
Hello, I use solr 3.5 with edismax. I have the following issue with phrase search. For example if I have three documents with content like 1. apache apache 2. solr solr 3. apache solr then a search for apache solr displays the documents in the order 1, 2, 3 instead of 3, 2, 1, because term frequency in the first and second documents is higher than in the third document. We want the results displayed in the order 3, 2, 1, since the third document has the exact match. My request handler is as follows.

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">host^30 content^20 title^22</str>
    <str name="fl">url,id, site ,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any ideas how to fix this issue? Thanks in advance. Alex.
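For intuition on why repetition beats the phrase match here (a sketch, not part of the thread): classic Lucene/Solr 3.x scoring uses tf = sqrt(term frequency), which is exactly the 2.236068 = tf(termFreq=5) factor visible in the debugQuery explain earlier in this thread.

```python
import math

# Classic Lucene TF: square root of the raw term frequency.
def tf(freq):
    return math.sqrt(freq)

# Doc 1 repeats "apache" five times; doc 3 contains the phrase exactly once.
repeated_term = round(tf(5), 6)   # matches the explain output's 2.236068
single_phrase = round(tf(1), 6)   # a phrase occurring once contributes tf = 1.0
print(repeated_term, single_phrase)
# Unless the pf (phrase) boost is large enough to outweigh this gap,
# the repetitive doc ranks first - the problem described above.
```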
data/index/segments_u (No such file or directory)
Hello, I have copied solr's data folder from a dev linux box to the prod one. When starting solr I get this error on the prod server; on dev, solr starts successfully. Caused by: java.io.FileNotFoundException: /home/apache-solr-3.5.0/example/solr/data/index/segments_u (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:70) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:97) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:92) at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:265) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:79) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75) at org.apache.lucene.index.IndexReader.open(IndexReader.java:462) at org.apache.lucene.index.IndexReader.open(IndexReader.java:405) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1092) There is no segments_u file or folder on the dev box. Thanks in advance. Alex.
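A missing segments_N file after copying an index usually means the copy was incomplete or raced a commit on the source. A quick sanity check before starting Solr is to diff the file sets of the two index directories. The sketch below is illustrative (the directory names and file names stand in for the real dev and prod index dirs):

```python
import os
import tempfile

def index_diff(src, dst):
    # Return files present in the source index dir but missing from the copy.
    # Any missing segments_* file will prevent Lucene from opening the index.
    return sorted(set(os.listdir(src)) - set(os.listdir(dst)))

# Throwaway directories standing in for the dev and prod index dirs:
dev = tempfile.mkdtemp()
prod = tempfile.mkdtemp()
for name in ("segments_u", "segments.gen", "_0.cfs"):
    open(os.path.join(dev, name), "w").close()
for name in ("segments.gen", "_0.cfs"):   # segments_u never made it over
    open(os.path.join(prod, name), "w").close()

missing = index_diff(dev, prod)
print(missing)   # ['segments_u']
```

If the diff is non-empty, re-copy with the source index quiesced (or from a snapshot) rather than from a live index directory.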
Re: Help with duplicate unique IDs
Take a look at updateRequestProcessorChain name="dedupe". I think you must use dedupe to solve this issue. -Original Message- From: Thomas Dowling tdowl...@ohiolink.edu To: solr-user solr-user@lucene.apache.org Cc: Mikhail Khludnev mkhlud...@griddynamics.com Sent: Fri, Mar 2, 2012 1:10 pm Subject: Re: Help with duplicate unique IDs Thanks. In fact, the behavior I want is overwrite=true. I want to be able to reindex documents, with the same id string, and automatically overwrite the previous version. Thomas On 03/02/2012 04:01 PM, Mikhail Khludnev wrote: Hello Thomas, I guess you could just specify overwrite=false http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22 On Fri, Mar 2, 2012 at 11:23 PM, Thomas Dowling tdowl...@ohiolink.edu wrote: In a Solr index of journal articles, I thought I was safe reindexing articles because their unique ID would cause the new record in the index to overwrite the old one. (As stated at http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)
Re: spellcheck configuration not providing suggestions or corrections
You have put this: <str name="buildOnOptimize">true</str> Maybe you need to put <str name="buildOnCommit">true</str> instead. Alex. -Original Message- From: Dyer, James james.d...@ingrambook.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Feb 13, 2012 12:43 pm Subject: RE: spellcheck configuration not providing suggestions or corrections That would be it, I think. Your request goes to /select, but you've put spellchecking into /search. Try /search instead. Also, I doubt it's the problem, but try removing the trailing CRLFs from your query. Also, typically you'd still query against the main field (itemDesc in your case) and just use itemDescSpell to build your dictionary. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Monday, February 13, 2012 2:28 PM To: solr-user@lucene.apache.org Subject: RE: spellcheck configuration not providing suggestions or corrections hello, thank you for the suggestion - however this did not work. i went into solrconfig and changed the count to 20, then restarted the server and did a reimport. is it possible that i am not firing the request handler that i think i am firing? 
<requestHandler name="/search" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">20</str>
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

query sent to server: http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemDescSpell%3Agusket%0D%0A&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true

results:

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="spellcheck">true</str><str name="indent">on</str><str name="start">0</str><str name="q">itemDescSpell:gusket </str><str name="spellcheck.build">true</str><str name="rows">10</str><str name="version">2.2</str></lst></lst><result name="response" numFound="0" start="0"/></response>

-- View this message in context: http://lucene.472066.n3.nabble.com/spellcheck-configuration-not-providing-suggestions-or-corrections-tp3740877p3741521.html Sent from the Solr - User mailing list archive at Nabble.com.
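James's diagnosis has two parts: the query goes to /select while the spellcheck component is registered on the /search handler, and the q value carries a trailing CRLF (the %0D%0A in the URL). A small sketch of building the corrected request URL (the host and core path are taken from the message above; this only constructs the URL, it does not contact the server):

```python
from urllib.parse import urlencode

# Hit the handler that actually has the spellcheck component (/search),
# and strip the stray CRLF from the query value before encoding.
base = "http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/search/"
params = {
    "q": "itemDescSpell:gusket\r\n".strip(),  # trailing CRLF removed
    "spellcheck": "true",
    "spellcheck.build": "true",
    "rows": 10,
}
url = base + "?" + urlencode(params)
print(url)
```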
Re: can solr automatically search for different punctuation of a word
Hi Chantal, In the readme file at solr/contrib/analysis-extras/README.txt it says to add the ICU library (in lib/). Do I also need to add the dependency... and where? Thanks. Alex. -Original Message- From: Chantal Ackermann chantal.ackerm...@btelligent.de To: solr-user solr-user@lucene.apache.org Sent: Fri, Jan 13, 2012 1:52 am Subject: Re: can solr automatically search for different punctuation of a word Hi Alex, for me, ICUFoldingFilterFactory works very well. It does lowercasing and removes diacritics (this is what umlauts and accented letters are called - punctuation means commas, periods etc.). It will work for any language, not only German. And it will also handle apostrophes as in C'est bien. ICU requires additional libraries in the classpath. For a built-in solr solution, have a look at ASCIIFoldingFilterFactory. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory Example configuration:

<fieldType name="text_sort" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
  </analyzer>
</fieldType>

And dependencies (example for Maven) in addition to solr-core:

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-icu</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-analysis-extras</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>

Cheers, Chantal On Fri, 2012-01-13 at 00:09 +0100, alx...@aim.com wrote: Hello, I would like to know if solr has a functionality to automatically search for a different punctuation of a word. For example, if a user searches for the word Uber, and the stemmer is german lang, then solr looks for both Uber and Über, like in synonyms. 
Is it possible to give a file with a list of possible substitutions of letters to solr and have it search for all possible punctuations? Thanks. Alex.
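The folding idea Chantal describes — lowercase and strip diacritics so Uber and Über collapse to the same token — can be illustrated without ICU. This sketch is only an analogue of what ASCIIFoldingFilterFactory/ICUFoldingFilterFactory do at analysis time, not their actual implementation:

```python
import unicodedata

def ascii_fold(text):
    # Lowercase, decompose to NFD, then drop combining marks, so
    # accented and unaccented forms index to the same token.
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(ascii_fold("Über"))        # uber
print(ascii_fold("Crème"))       # creme
print(ascii_fold("C'est bien"))  # c'est bien (apostrophes are untouched)
```

Because the same filter runs at both index and query time, a user typing either spelling matches both documents — no substitution file needed.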
can solr automatically search for different punctuation of a word
Hello, I would like to know if solr has a functionality to automatically search for a different punctuation of a word. For example, if a user searches for the word Uber, and the stemmer is german lang, then solr looks for both Uber and Über, like in synonyms. Is it possible to give a file with a list of possible substitutions of letters to solr and have it search for all possible punctuations? Thanks. Alex.
Re: How to apply relevant Stemmer to each document
Hi Erick, Why would querying be wrong? It is my understanding that if I have, let's say, 3 docs and each of them has been indexed with its own language stemmer, then sending a query will search all docs and return matching results. Let's say the query is driving and one of the docs has drive and was stemmed by the English stemmer; then it would return 1 result, whereas if I had applied the Russian stemmer to all docs the result would be 0 docs. Am I missing something? Thanks. Alex. -Original Message- From: Erick Erickson erickerick...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Dec 22, 2011 11:06 am Subject: Re: How to apply relevant Stemmer to each document Not really. And it's hard to make sense of how this would work in practice, because stemming the document (even if you could) is only half the battle. How would querying work then? No matter what language you used for your stemming, it would be wrong for all the documents that used a different stemmer (or a stemmer based on a different language). So I wouldn't hold out too much hope here. Best Erick On Wed, Dec 21, 2011 at 4:09 PM, alx...@aim.com wrote: Hello, I would like to know if in the latest version of solr it is possible to apply the relevant stemmer to each doc depending on its lang field. I searched the solr-user mailing lists and found this thread http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html but not sure if it was developed into a jira ticket. Thanks. Alex.
Re: two word phrase search using dismax
Hi Erick, After reading more about the pf param I increased its boosts a few times, and this solved cases 2, 3 and 4, but not 1. As an example, for the phrase newspaper latimes, latimes.com is not even in the results, so there is nothing to boost to the first place; and changing the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str> solves only 1 and 4, but not 2 and 3. Thanks. Alex. -Original Message- From: Erick Erickson erickerick...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Dec 5, 2011 5:52 am Subject: Re: two word phrase search using dismax Have you looked at the pf (phrase fields) parameter of edismax? http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29 Best Erick On Sat, Dec 3, 2011 at 7:04 PM, alx...@aim.com wrote: Hello, Here is my request handler

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">site^1.5 content^0.5 title^1.2</str>
    <str name="pf">site^1.5 content^0.5 title^1.2</str>
    <str name="fl">id,title, site</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">300</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

I have made a few tests with debugQuery and realised that for two word phrases, solr takes the first word and gives it a score according to the qf param, then takes the second word and gives it a score, and so on, but never scores the whole phrase. That is why, if one of the words is in the title and one of them is in the content, that doc is given a higher score than one that has both words in the content but none in the title. Ideally, I want to achieve the following order. 1. If one (or both) of the words are in the site field, the doc must be given a higher score. 2. Then come docs with both words in the title. 3. 
Next, docs with both words in the content. 4. And finally docs having either of the words in the title or content. I tried to change the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str> This allows me to achieve 1 and 4, but not 2 and 3. Thanks. Alex. -Original Message- From: Chris Hostetter hossman_luc...@fucit.org To: solr-user solr-user@lucene.apache.org Sent: Thu, Nov 17, 2011 2:17 pm Subject: Re: two word phrase search using dismax : After putting the same score for title and content in the qf field, docs : with both words in content moved to fifth place. The doc in the first, : third and fourth places still have only one of the words in content and : title. The doc in the second place has one of the words in the title and : both words in the content, but in different places, not together. details matter -- if you send further followup mails, the full details of your dismax options and the score explanations from debugQuery are necessary to be sure people understand what you are describing (a snapshot of reality is far more valuable than a vague description of reality) off hand, what you are describing sounds correct -- this is what the dismax parser is really designed to do. even if you have given both title and content equal boosts, your title field is probably shorter than your content field, so words matching once in title are likely to score higher than the same word matching once in content due to length normalization -- and unless you set the tie param to something really high, the score contribution from the highest scoring field (in this case title) will be the dominant factor in the score (it's disjunction *max* by default ... if you make tie=1 then it's disjunction *sum*) you haven't mentioned anything about the pf param at all, which i can only assume means you aren't using it -- the pf param is how you configure that scores should be increased if/when all of the words in the query string appear together. 
I would suggest putting all of the fields in your qf param in your pf param as well. -Hoss
less search results in prod
Hello, I have built a solr-3.4.0 data folder on a dev server and copied it to a prod server. Made a search for a keyword, then modified the qf and pf params in solrconfig.xml. Searched for the same keywords, then restored the qf and pf params to their original values. Now solr returns far fewer docs for the same keywords in comparison with the dev server. Tried other keywords; the issue is the same. Copied solrconfig.xml from the dev server, but nothing changed. Took a look at the statistics; the numDocs and maxDoc values are the same on both servers. Any ideas how to debug this issue? Thanks in advance. Alex.
Re: two word phrase search using dismax
Hello, Here is my request handler

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">site^1.5 content^0.5 title^1.2</str>
    <str name="pf">site^1.5 content^0.5 title^1.2</str>
    <str name="fl">id,title, site</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">300</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

I have made a few tests with debugQuery and realised that for two word phrases, solr takes the first word and gives it a score according to the qf param, then takes the second word and gives it a score, and so on, but never scores the whole phrase. That is why, if one of the words is in the title and one of them is in the content, that doc is given a higher score than one that has both words in the content but none in the title. Ideally, I want to achieve the following order. 1. If one (or both) of the words are in the site field, the doc must be given a higher score. 2. Then come docs with both words in the title. 3. Next, docs with both words in the content. 4. And finally docs having either of the words in the title or content. I tried to change the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str> This allows me to achieve 1 and 4, but not 2 and 3. Thanks. Alex. -Original Message- From: Chris Hostetter hossman_luc...@fucit.org To: solr-user solr-user@lucene.apache.org Sent: Thu, Nov 17, 2011 2:17 pm Subject: Re: two word phrase search using dismax : After putting the same score for title and content in the qf field, docs : with both words in content moved to fifth place. The doc in the first, : third and fourth places still have only one of the words in content and : title. 
The doc in the second place has one of the words in the title and : both words in the content, but in different places, not together. details matter -- if you send further followup mails, the full details of your dismax options and the score explanations from debugQuery are necessary to be sure people understand what you are describing (a snapshot of reality is far more valuable than a vague description of reality) off hand, what you are describing sounds correct -- this is what the dismax parser is really designed to do. even if you have given both title and content equal boosts, your title field is probably shorter than your content field, so words matching once in title are likely to score higher than the same word matching once in content due to length normalization -- and unless you set the tie param to something really high, the score contribution from the highest scoring field (in this case title) will be the dominant factor in the score (it's disjunction *max* by default ... if you make tie=1 then it's disjunction *sum*) you haven't mentioned anything about the pf param at all, which i can only assume means you aren't using it -- the pf param is how you configure that scores should be increased if/when all of the words in the query string appear together. I would suggest putting all of the fields in your qf param in your pf param as well. -Hoss
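The mm spec that keeps coming up in this thread, 2&lt;-1 5&lt;-2 6&lt;90%, reads as a series of conditions on the number of optional clauses in the query. The sketch below is a paraphrase of the documented dismax semantics, not Solr's actual code, and it omits edge cases such as unsorted specs or negative percentages:

```python
def min_should_match(spec, num_clauses):
    # Each "n<v" pair applies when there are MORE than n optional clauses.
    # v is either a count (negative = that many clauses may be missing)
    # or a percentage of clauses that must match (integer-truncated).
    # Pairs are assumed sorted ascending, so the last matching pair wins.
    result = num_clauses          # default: every clause required
    for pair in spec.split():
        n, v = pair.split("<")
        if num_clauses > int(n):
            if v.endswith("%"):
                result = (num_clauses * int(v[:-1])) // 100
            else:
                val = int(v)
                result = num_clauses + val if val < 0 else val
    return result

spec = "2<-1 5<-2 6<90%"
required = [min_should_match(spec, n) for n in (1, 2, 3, 5, 6, 10)]
print(required)   # [1, 2, 2, 4, 4, 9]
```

So with this spec a 1-2 term query requires every term, a 3-5 term query may miss one, a 6 term query may miss two, and longer queries require 90% of the terms — which is why loosening the first pair to 1&lt;-1 changed which documents matched at all.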
Re: spellcheck in dismax
It seems you forgot this: <str name="spellcheck">true</str> -Original Message- From: Ruixiang Zhang rxzh...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Nov 22, 2011 11:54 am Subject: spellcheck in dismax I put the following into the dismax requestHandler, but no suggestion field is returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss something? Thanks Richard
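For reference, the working variant needs the spellcheck switch itself among the handler defaults, e.g. (a sketch; the spellcheck component itself must also be declared elsewhere in solrconfig.xml):

```xml
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>
```

Without `spellcheck=true` the component is attached but never activated, which matches the "works as a separate handler" symptom if that handler happened to pass the parameter.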
jetty error, broken pipe
Hello, I use solr 3.4 with the jetty that is included in it. Periodically I see this error in the jetty output:

SEVERE: org.mortbay.jetty.EofException
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
    at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
    at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    ...
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
    at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
    at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
    ... 25 more
2011-11-19 20:50:00.060:WARN::Committed before 500 null||org.mortbay.jetty.EofException|?at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at sun.nio.cs.StreamEncoder.implFlush(S

I searched the web and the only advice I found was to upgrade to jetty 6.1, but I think the version included in solr already is 6.1.26. Any advice is appreciated. Thanks. Alex.
Re: jetty error, broken pipe
I found out that the curl timeout was set to 10 seconds, and for queries taking longer than 10 sec it was closing the connection to jetty. I noticed that when the number of docs found is large, solr takes about 20 sec to return results. This is too long. I set caching to off but it did not help. I think solr spends too much time finding the total number of docs. Is there a way to turn off this count? Thanks. Alex. -Original Message- From: Fuad Efendi f...@efendi.ca To: solr-user solr-user@lucene.apache.org Cc: solr-user solr-user@lucene.apache.org Sent: Sat, Nov 19, 2011 7:24 pm Subject: Re: jetty error, broken pipe It's not Jetty. It is a broken TCP pipe due to the client side. It happens when the client closes the TCP connection. I even had this problem with recent Tomcat 6. The problem disappeared after I explicitly tuned keep-alive at Tomcat, and started using a monitoring thread with HttpClient and SOLRJ... Fuad Efendi http://www.tokenizer.ca Sent from my iPad On 2011-11-19, at 9:14 PM, alx...@aim.com wrote: Hello, I use solr 3.4 with the jetty that is included in it. Periodically I see this error in the jetty output: SEVERE: org.mortbay.jetty.EofException ...
Re: two word phrase search using dismax
Hello, Thanks for your letter. I investigated further and found out that we had title boosted more than content in the qf field, and the docs in the first places have one of the words in the title but not both. The doc in the first place has only one of the words in the content. Docs with both words in the content are placed after them, at around 20th place. After putting the same boost for title and content in the qf field, docs with both words in the content moved to fifth place. The docs in the first, third and fourth places still have only one of the words in content and title. The doc in the second place has one of the words in the title and both words in the content, but in different places, not together. Thanks. Alex. -Original Message- From: Michael Kuhlmann k...@solarier.de To: solr-user solr-user@lucene.apache.org Sent: Tue, Nov 15, 2011 12:20 am Subject: Re: two word phrase search using dismax On 14.11.2011 21:50, alx...@aim.com wrote: Hello, I use solr3.4 and nutch 1.3. In the request handler we have <str name="mm">2<-1 5<-2 6<90%</str> As far as I know this means that for a two-word phrase search the match must be 100%. However, I noticed that in most cases documents with both words are ranked around 20th place. In the first places are documents with only one of the words in the phrase. Any ideas why this is happening and is it possible to fix it? Hi, are you sure that only one of the words matched in the found documents? Have you checked all fields that are listed in the qf parameter? And did you check for stemmed versions of your search terms? If all this is true, you maybe want to give an example. And AFAIK the mm parameter does not affect the ranking.
Re: how to achieve google.com like results for phrase queries
Solr can also query link (url) text and rank it higher if we specify url in the qf field. The only problem is why it does not rank pages with both words higher when mm is set to 1<-1. It seems to me that this is a bug. Thanks. Alex. -Original Message- From: Ted Dunning ted.dunn...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Sat, Nov 5, 2011 8:59 pm Subject: Re: how to achieve google.com like results for phrase queries Google achieves their results by using data not found in the web pages themselves. This additional data critically includes link text, but is also derived from behavioral information. On Sat, Nov 5, 2011 at 5:07 PM, alx...@aim.com wrote: Hi Erick, The term newspaper latimes is not found on latimes.com. However, google places it in first place. My guess is that the mm parameter must not be set as 2<-1 in order to achieve google.com-like ranking for two-word phrase queries. My goal is to set the mm parameter in such a way that latimes.com is ranked in the 1st-3rd places and sites with both words are placed after it. As I wrote in my previous letter, setting mm as 1<-1 solves this issue partially. The problem in this case is that sites with both words are placed at the bottom or are not in the search results at all. Thanks. Alex. -Original Message- From: Erick Erickson erickerick...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Sat, Nov 5, 2011 9:01 am Subject: Re: how to achieve google.com like results for phrase queries First, the default query operator is ignored by edismax, so that's not doing anything. Why would you expect newspaper latimes to be found at all in latimes.com? What proof do you have that the two terms are even in the latimes.com document? You can look at the Query Elevation Component to force certain known documents to the top of the results based on the search terms, but that's not a very elegant solution. What business requirement are you trying to accomplish here?
Because, as asked, there's really not enough information to provide a meaningful suggestion. Best Erick On Thu, Nov 3, 2011 at 7:30 PM, alx...@aim.com wrote: Hello, I use nutch-1.3 crawled results in solr-3.4. I noticed that for two-word phrases like newspaper latimes, latimes.com is not in the results at all. This may be due to the dismax defType that I use in the request handler:

<str name="defType">dismax</str>
<str name="qf">url^1.5 id^1.5 content^ title^1.2</str>
<str name="pf">url^1.5 id^1.5 content^0.5 title^1.2</str>

with mm as <str name="mm">2<-1 5<-2 6<90%</str> However, changing it to <str name="mm">1<-1 2<-1 5<-2 6<90%</str> and q.op to OR or AND does not solve the problem. In this case latimes.com is ranked higher, but still is not in the first place. Also in this case results with both words are ranked very low, almost at the end. We need latimes.com placed first, then results with both words, and so on. Any ideas how to modify the config to this end? Thanks in advance. Alex.
Re: how to achieve google.com like results for phrase queries
Hi Erick, The term newspaper latimes is not found on latimes.com. However, google places it in first place. My guess is that the mm parameter must not be set as 2<-1 in order to achieve google.com-like ranking for two-word phrase queries. My goal is to set the mm parameter in such a way that latimes.com is ranked in the 1st-3rd places and sites with both words are placed after it. As I wrote in my previous letter, setting mm as 1<-1 solves this issue partially. The problem in this case is that sites with both words are placed at the bottom or are not in the search results at all. Thanks. Alex. -Original Message- From: Erick Erickson erickerick...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Sat, Nov 5, 2011 9:01 am Subject: Re: how to achieve google.com like results for phrase queries First, the default query operator is ignored by edismax, so that's not doing anything. Why would you expect newspaper latimes to be found at all in latimes.com? What proof do you have that the two terms are even in the latimes.com document? You can look at the Query Elevation Component to force certain known documents to the top of the results based on the search terms, but that's not a very elegant solution. What business requirement are you trying to accomplish here? Because, as asked, there's really not enough information to provide a meaningful suggestion. Best Erick On Thu, Nov 3, 2011 at 7:30 PM, alx...@aim.com wrote: Hello, I use nutch-1.3 crawled results in solr-3.4. I noticed that for two-word phrases like newspaper latimes, latimes.com is not in the results at all. This may be due to the dismax defType that I use in the request handler:

<str name="defType">dismax</str>
<str name="qf">url^1.5 id^1.5 content^ title^1.2</str>
<str name="pf">url^1.5 id^1.5 content^0.5 title^1.2</str>

with mm as <str name="mm">2<-1 5<-2 6<90%</str> However, changing it to <str name="mm">1<-1 2<-1 5<-2 6<90%</str> and q.op to OR or AND does not solve the problem.
In this case latimes.com is ranked higher, but still is not in the first place. Also in this case results with both words are ranked very low, almost at the end. We need latimes.com placed first, then results with both words, and so on. Any ideas how to modify the config to this end? Thanks in advance. Alex.
how to achieve google.com like results for phrase queries
Hello, I use nutch-1.3 crawled results in solr-3.4. I noticed that for two-word phrases like newspaper latimes, latimes.com is not in the results at all. This may be due to the dismax defType that I use in the request handler:

<str name="defType">dismax</str>
<str name="qf">url^1.5 id^1.5 content^ title^1.2</str>
<str name="pf">url^1.5 id^1.5 content^0.5 title^1.2</str>

with mm as <str name="mm">2<-1 5<-2 6<90%</str> However, changing it to <str name="mm">1<-1 2<-1 5<-2 6<90%</str> and q.op to OR or AND does not solve the problem. In this case latimes.com is ranked higher, but still is not in the first place. Also in this case results with both words are ranked very low, almost at the end. We need latimes.com placed first, then results with both words, and so on. Any ideas how to modify the config to this end? Thanks in advance. Alex.
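The mm specs traded back and forth in this thread read as a list of `threshold<value` rules: the rule with the highest threshold below the actual number of optional clauses wins, a negative value means "all but that many" must match, and a percentage is rounded down. A small sketch of that reading (based on the Solr documentation of mm, not on Solr's actual SolrPluginUtils code):

```python
def min_should_match(spec: str, n: int) -> int:
    """How many of n optional clauses must match under a dismax mm
    spec such as "2<-1 5<-2 6<90%" (rules listed in ascending order)."""
    required = n  # with no applicable rule, every clause is required
    for rule in spec.split():
        threshold, value = rule.split("<")
        if n > int(threshold):  # later (higher) thresholds overwrite earlier ones
            calc = n * int(value[:-1]) // 100 if value.endswith("%") else int(value)
            required = n + calc if calc < 0 else calc
    return required
```

Under `2<-1 ...` a two-word query still requires both words, while prefixing `1<-1` drops the requirement to a single word, which is consistent with the flood of one-word matches described above.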
apply filter to spell field
Hello, I have implemented the spellchecker in two ways. 1. Adding a textspell type to schema.xml and making a copy field from the original content field, which is of type text. 2. Without adding a new type and copy field, simply adding the name of the spell field, content, to solrconfig.xml. I have an issue in both cases. In case 1 the data folder becomes twice as big, and the feed comes with an additional copy field which is an exact copy of the content field and is unnecessary data. In case 2, suggestions are lower-cased versions of the search keywords, i.e. if a user searches for Jessica Alba, solr suggests jessica alba. So my question is: is it possible to resolve this issue without adding an additional type and copy field to schema.xml? Thanks. Alex.
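On case 2: the suggestions come back lower-cased because the analyzer of the field feeding the spellchecker lower-cases tokens before they reach the dictionary. The usual fix does require a dedicated spell type whose analyzer omits the lower-casing step, e.g. (a sketch with hypothetical type/filter choices; adjust to your schema):

```xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- no LowerCaseFilterFactory here, so "Jessica Alba" keeps its case -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

Pointing the spellchecker at the raw content field (case 2) cannot preserve case, since that field's analysis chain already lower-cases everything.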
Re: pagination with grouping
Is case #2 planned to be coded in future releases? Thanks. Alex. -Original Message- From: Bill Bell billnb...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Sep 8, 2011 10:17 pm Subject: Re: pagination with grouping There are 2 use cases: 1. rows=10 means 10 groups. 2. rows=10 means 10 results (regardless of groups). I thought there was a total number of groups (ngroups) for case #1. I don't believe case #2 has been coded. On 9/8/11 2:22 PM, alx...@aim.com wrote: Hello, When trying to implement pagination as in the case without grouping, I see two issues. 1. With rows=10 the solr feed displays 10 groups, not 10 results. 2. There is no total number of results with grouping, so I cannot show the last page. In detail: 1. I need to display only 10 results on one page. For example, if I have group.limit=5 and the first group has 5 docs, the second 3 and the third 2, then only these 3 groups must be displayed on the first page. Currently specifying rows=10 shows 10 groups, and if we have 5 docs in each group then the first page will have 50 docs. 2. I need to show the last page, for which I need the total number of results with grouping. For example, if I have 5 groups with 5, 4, 3, 2, 1 docs, then this total must be 15. Any ideas how to achieve this? Thanks in advance. Alex.
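For the group total in case #1, Bill's ngroups hint corresponds to the group.ngroups request parameter, which asks Solr to return the number of matching groups alongside the grouped results (a sketch of such a request; availability depends on your Solr version):

```
/select?q=*:*&group=true&group.field=site&group.limit=5&group.ngroups=true&rows=10
```

This only addresses the "total number of groups" half of the question; rows still counts groups, not flattened documents, so case #2 pagination would have to be computed client-side.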
pagination with grouping
Hello, When trying to implement pagination as in the case without grouping, I see two issues. 1. With rows=10 the solr feed displays 10 groups, not 10 results. 2. There is no total number of results with grouping, so I cannot show the last page. In detail: 1. I need to display only 10 results on one page. For example, if I have group.limit=5 and the first group has 5 docs, the second 3 and the third 2, then only these 3 groups must be displayed on the first page. Currently specifying rows=10 shows 10 groups, and if we have 5 docs in each group then the first page will have 50 docs. 2. I need to show the last page, for which I need the total number of results with grouping. For example, if I have 5 groups with 5, 4, 3, 2, 1 docs, then this total must be 15. Any ideas how to achieve this? Thanks in advance. Alex.
grouping by alpha-numeric field
Hello, I try to group by a field of type string. In the results I see groupValue entries that are only parts of the group field value. Any ideas how to fix this? Thanks. Alex.
spellchecking in nutch solr
Hello, I have tried to implement a spellchecker based on the index in nutch-solr by adding a spell field to schema.xml and making it a copy of the content field. However, this doubled the size of the data folder, and the spell field, being a copy of the content field, appears in the xml feed, which is not necessary. Is it possible to implement the spellchecker without this issue? Thanks. Alex.
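One way to keep the copy field out of the feed, and out of the stored part of the index, is to declare the spell field as indexed but not stored; the spellchecker only needs the indexed terms. A sketch (hypothetical field and type names):

```xml
<field name="spell" type="textSpell" indexed="true" stored="false"/>
<copyField source="content" dest="spell"/>
```

With stored="false" the field no longer appears in query responses, and the stored-data portion of the data folder should stop doubling, though the indexed terms still take some space.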
Re: how to manually add data to indexes generated by nutch-1.0 using solr
I forgot to say that when I do

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit waitFlush="false" waitSearcher="false"/>'

I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">453</int></lst>
</response>

and a search for the added keywords gives 0 results. Does status 0 mean that the addition was successful? Thanks. Alex. -Original Message- From: Erik Hatcher e...@ehatchersolutions.com To: solr-user@lucene.apache.org Sent: Tue, 12 May 2009 6:48 pm Subject: Re: how to manually add data to indexes generated by nutch-1.0 using solr send a <commit/> request afterwards, or you can add ?commit=true to the /update request with the adds. Erik On May 12, 2009, at 8:57 PM, alx...@aim.com wrote: Tried to add a new record using

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<add>
<doc boost="2.5">
<field name="segment">20090512170318</field>
<field name="digest">86937aaee8e748ac3007ed8b66477624</field>
<field name="boost">0.21189615</field>
<field name="url">test.com</field>
<field name="title">test test</field>
<field name="tstamp">20090513003210909</field>
</doc>
</add>'

I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst>
</response>

and the added records are not found in the search. Any ideas what went wrong? Thanks. Alex. -Original Message- From: alx...@aim.com To: solr-u...@lucene.apache.org Sent: Mon, 11 May 2009 12:14 pm Subject: how to manually add data to indexes generated by nutch-1.0 using solr Hello, I had Nutch-1.0 crawl, fetch and index a lot of files. Then I needed to index a few more files. I know the keywords for those files and their locations, and need to add them manually. I took a look at two tutorials on the wiki but did not find any info about this issue. Is there a tutorial on the step-by-step procedure of adding data to a nutch index using solr manually? Thanks in advance. Alex.
Re: how to manually add data to indexes generated by nutch-1.0 using solr
Tried to add a new record using

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<add>
<doc boost="2.5">
<field name="segment">20090512170318</field>
<field name="digest">86937aaee8e748ac3007ed8b66477624</field>
<field name="boost">0.21189615</field>
<field name="url">test.com</field>
<field name="title">test test</field>
<field name="tstamp">20090513003210909</field>
</doc>
</add>'

I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst>
</response>

and the added records are not found in the search. Any ideas what went wrong? Thanks. Alex. -Original Message- From: alx...@aim.com To: solr-user@lucene.apache.org Sent: Mon, 11 May 2009 12:14 pm Subject: how to manually add data to indexes generated by nutch-1.0 using solr Hello, I had Nutch-1.0 crawl, fetch and index a lot of files. Then I needed to index a few more files. I know the keywords for those files and their locations, and need to add them manually. I took a look at two tutorials on the wiki but did not find any info about this issue. Is there a tutorial on the step-by-step procedure of adding data to a nutch index using solr manually? Thanks in advance. Alex.
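The same update message can also be generated programmatically, which avoids quoting and escaping mistakes in hand-written XML (a sketch using Python's standard library; the field names are the ones from the curl example above):

```python
import xml.etree.ElementTree as ET

def build_add_xml(fields: dict, boost: str = "2.5") -> str:
    """Build a Solr <add><doc>...</doc></add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", boost=boost)
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")

payload = build_add_xml({"url": "test.com", "title": "test test"})
# POST `payload` to http://localhost:8983/solr/update?commit=true
# (the ?commit=true makes the add visible without a separate <commit/>)
```

This is only the message construction; posting it (with curl or an HTTP client) and committing are still required before the document shows up in searches.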
how to manually add data to indexes generated by nutch-1.0 using solr
Hello, I had Nutch-1.0 crawl, fetch and index a lot of files. Then I needed to index a few more files. I know the keywords for those files and their locations, and need to add them manually. I took a look at two tutorials on the wiki but did not find any info about this issue. Is there a tutorial on the step-by-step procedure of adding data to a nutch index using solr manually? Thanks in advance. Alex.