Re: Error while initializing EmbeddedSolrServer
Hi, Did you try to figure out which artifact contains org.apache.lucene.codecs.Codec? I guess it should be something like lucene-codecs or so. Also, mixing different versions (3.6.2 vs 4.7.2 vs 3.0.3) is a dead end. Happy OSGiing!

On Sun, Nov 23, 2014 at 6:14 AM, Danesh Kuruppu dknkuru...@gmail.com wrote:

Hi all, I am using Solr version 4.7.2 and need to use EmbeddedSolrServer. I am getting the following error while initializing the CoreContainer:

Exception in thread "Thread-15" java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.codecs.Codec
    at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:186)
    at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:122)
    at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:236)
    at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:136)

In my case, I create an OSGi bundle for Solr. When I check the bundle, the class is in the bundle. The set of dependencies added:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers</artifactId>
  <version>3.6.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-highlighter</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-memory</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queries</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-snowball</artifactId>
  <version>3.0.3</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-misc</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-spellchecker</artifactId>
  <version>3.6.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>4.7.2</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-codecs</artifactId>
  <version>4.7.2</version>
</dependency>

Code:

CoreContainer coreContainer = new CoreContainer(solrHome.getPath());
coreContainer.load();
this.server = new EmbeddedSolrServer(coreContainer, "");

I could not find what is wrong. Please help me. Thanks, Danesh

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
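Following the point about mixed versions being a dead end, here is a sketch of what an aligned dependency set might look like. Note this is an assumption to verify against Maven Central: in Lucene 4.x the 3.x-era artifacts were reorganized (lucene-analyzers became lucene-analyzers-common, and the old lucene-spellchecker/lucene-snowball functionality lives in lucene-suggest and lucene-analyzers-common), so the 3.6.2/3.0.3 artifacts above would be dropped in favor of their 4.x successors:

```xml
<!-- Pin every Lucene artifact to the same line as solr-core. The 3.x-only
     artifacts (lucene-analyzers, lucene-spellchecker, lucene-snowball) are
     replaced by their assumed 4.x successors below. -->
<properties>
  <lucene.version>4.7.2</lucene.version>
</properties>

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers-common</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-suggest</artifactId>
  <version>${lucene.version}</version>
</dependency>
<!-- keep lucene-core, lucene-codecs, lucene-misc, lucene-memory,
     lucene-queries, lucene-highlighter at ${lucene.version} as well -->
```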
ClassCastException using elevations with cursorMark.
Hi Folks, I'd like to switch to using cursorMark for pagination, but I can't get it to work with elevations (Solr 4.10.2 / jdk7 / osx). I try a query like:

q=foo&sort=score+desc,id+asc&elevateIds=1234567&cursorMark=*

and get an exception:

java.lang.ClassCastException: java.lang.Float cannot be cast to org.apache.lucene.util.BytesRef
    at org.apache.solr.schema.FieldType.marshalStringSortValue(FieldType.java:1012)
    at org.apache.solr.schema.StrField.marshalSortValue(StrField.java:87)
    at org.apache.solr.search.CursorMark.getSerializedTotem(CursorMark.java:257)

full trace here: https://gist.github.com/pandacalculus/e5cc8fef13e4b8372bd2

I get an identical exception when using static elevations (i.e. elevate.xml) with cursorMark. It appears the problem is in the calculation of the next cursorMark. Has anyone encountered this? Thanks, -Brendan
Indexing problems with BBoxField
Hi all, I just downloaded Solr 4.10.2 and wanted to try out the new BBoxField type, but couldn't get it to work. The error (with status 400) I get is:

ERROR: [doc=foo] Error adding field 'bboxs_field_location_area'='ENVELOPE(25.89, 41.13, 47.07, 35.31)' msg=java.lang.IllegalStateException: instead call createFields() because isPolyField() is true

Which, of course, is rather unhelpful for a user. The relevant portions of my schema.xml look like this (largely copied from [1]):

<fieldType name="bbox" class="solr.BBoxField" geo="true" units="degrees" numberType="_bbox_coord" />
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" stored="false" />
<dynamicField name="bboxs_*" type="bbox" indexed="true" stored="false" multiValued="false"/>

[1] https://cwiki.apache.org/confluence/display/solr/Spatial+Search

And the request I send is this:

<add>
  <doc>
    <field name="id">foo</field>
    <field name="bboxs_field_location_area">ENVELOPE(25.89, 41.13, 47.07, 35.31)</field>
  </doc>
</add>

Does anyone have any idea what could be going wrong here? Thanks a lot in advance, Thomas
Re: Error while initializing EmbeddedSolrServer
Those SPI classes rely on a configuration file that gets stored in the META-INF folder. I'm not familiar with how OSGi works, but I'm pretty sure that failure is because the file META-INF/services/org.apache.lucene.codecs.Codec (you'll see it in the lucene-core jar) can't be found. -Mike

On 11/22/2014 10:14 PM, Danesh Kuruppu wrote:
[quoted message trimmed]
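Whether that services file is visible can be checked from inside the bundle with a few lines of plain JDK code. This is a diagnostic sketch, nothing Solr-specific; run it with the same classloader the bundle code uses. If it reports zero hits, the SPI registration file is not reachable, even when the Codec class itself is:

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class SpiCheck {
    public static void main(String[] args) throws IOException {
        // Lucene's SPI reload ultimately asks the classloader for this
        // resource. If the bundle's classloader cannot see it, Codec
        // initialization fails even though the class is on the classpath.
        String spi = "META-INF/services/org.apache.lucene.codecs.Codec";
        List<URL> hits = Collections.list(
                Thread.currentThread().getContextClassLoader().getResources(spi));
        System.out.println("SPI registrations visible: " + hits.size());
    }
}
```

On a classpath without lucene-core this prints a count of 0; inside a correctly wired bundle it should list at least one URL.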
Re: Indexing problems with BBoxField
A difference I see in your snippet from the example is that you don't have docValues="true" on the coordinate field type. You wrote:

<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" stored="false" />

But the example is:

<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>

Also, maybe try a static field rather than a dynamic field, although the latter should work anyway. Please file a Jira to request that Solr give a user-sensible error, not a Lucene-level error. I mean, the Solr user has no ability to directly invoke the createFields method. And now... let's see what David Smiley has to say about all of this!

-- Jack Krupansky

-----Original Message----- From: Thomas Seidl Sent: Sunday, November 23, 2014 6:33 AM To: solr-user@lucene.apache.org Subject: Indexing problems with BBoxField
[quoted message trimmed]
Re: Indexing problems with BBoxField
Thanks a lot for your reply! I had »docValues=true« in there before, but then thought I'd try removing it to see if that helped. It didn't, and I forgot to re-add it before copying it into the mail. So, unfortunately, that's not it. However, the other suggestion seems to bring us a step closer to the solution: after adding

<field name="bboxs_field_location_area" type="bbox" indexed="true" stored="false" multiValued="false"/>

(even without removing the dynamic field), this works indeed just fine! So, the question is what causes this, and it seems more and more like a bug instead of a user error. But I'll wait for a bit more feedback before filing a Jira.

On 2014-11-23 14:10, Jack Krupansky wrote:
[quoted message trimmed]
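For reference, a sketch of the combination reported to work in this thread (static field plus docValues on the coordinate type, per the wiki example; the field names are simply the ones used above):

```xml
<fieldType name="bbox" class="solr.BBoxField" geo="true" units="degrees" numberType="_bbox_coord"/>
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>
<field name="bboxs_field_location_area" type="bbox" indexed="true" stored="false" multiValued="false"/>
```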
Re: Error while initializing EmbeddedSolrServer
Thanks Mikhail/Michael. I checked both lucene-core.jar and lucene-codecs.jar; org.apache.lucene.codecs.Codec is there in both files. Seems like the problem is in the OSGi bundling: I need to take care of the service loader files in the OSGi bundle, which is not done. Thanks for the information. Danesh

On Sun, Nov 23, 2014 at 5:46 PM, Michael Sokolov msoko...@safaribooksonline.com wrote:
[quoted message trimmed]
Too much data after closed for HttpChannelOverHttp
Hi there, I have deployed Solr with Jetty, and I'm trying to index a quite large amount of items (300K), retrieved from a MySQL database (unfortunately I'm not using DIH; I'm doing it manually, by getting items from MySQL and then indexing them in Solr). But I'm not indexing all of those items at the same time; I'm indexing them in chunks of 3K. So, I get the first 3K, index them, then go to the next 3K chunk and index it. Here is the error I got in the Jetty logs; I guess it has nothing to do with MySQL:

*Does anyone know the meaning of the error 'badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@5432494a'?*

Thanks for your help; if anything isn't very precise, please tell me to explain it (and sorry for my bad english). -- Cordialement, Best regards, Hakim Benoudjit
analyzer of user queries in SOLR 4.10?
This could be a dumb question. At index time, we can specify the fieldType of different fields; thus, we know the analyzers for those fields. In schema.xml, I do not see how to configure the fieldType (and thus the analyzer) for runtime user queries. Can anyone help explain this? Thanks, DL
Re: analyzer of user queries in SOLR 4.10?
Query time analysis depends on the query parser in play. If a query parser chooses to analyze some or all of the query, it will use the same analysis as index time unless specified separately (in the field type definition itself, too). Erik

On Nov 23, 2014, at 13:08, David Lee seek...@gmail.com wrote:
[quoted message trimmed]
Re: analyzer of user queries in SOLR 4.10?
You can find an example of a split definition as well as the full list of analyzers available to you at: http://www.solr-start.com/info/analyzers/ (the example is well towards the bottom) Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 23 November 2014 at 13:21, Erik Hatcher erik.hatc...@gmail.com wrote:
[quoted message trimmed]
Re: Too much data after closed for HttpChannelOverHttp
Most probably just a request that's too large. Have you tried dropping down to 500 items and seeing what happens? Are you using SolrJ to send content to Solr? Or a direct HTTP request? Regards, Alex. P.s. You may also find it useful to read up on Solr commits, and hard vs. soft commits. Check solrconfig.xml in the example distribution.

On 23 November 2014 at 12:31, Hakim Benoudjit h.benoud...@gmail.com wrote:
[quoted message trimmed]
Re: Too much data after closed for HttpChannelOverHttp
Actually I'm using a PHP client (I think it sends an HTTP request to Solr), but you're right; tomorrow, once I get to the office, I'll set the chunk size to a smaller value and will tell you if that was the reason. Thanks.

2014-11-23 19:35 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com:
[quoted message trimmed]

-- Cordialement, Best regards, Hakim Benoudjit
Using stored value of a field to build suggester index
Hi, I am trying to build a suggester for a field which is both indexed and stored. The field is whitespace-tokenized, lowercased, stemmed etc. while indexing. It looks like the indexed terms are used as the source for building the suggester index, which is what the following line in the suggester documentation also mentions (https://wiki.apache.org/solr/Suggester): "field - if sourceLocation is empty then terms from this field in the index will be used when building the trie." I want to display the suggested value in the UI; is it possible to use the stored value of the field rather than the indexed terms to build the index? Here are the relevant definitions from solrconfig.xml and schema.xml. Thanks. Faisal

solrconfig.xml:

<searchComponent class="solr.SpellCheckComponent" name="infix_suggest_analyzing">
  <lst name="spellchecker">
    <str name="name">infix_suggest_analyzing</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
    <str name="buildOnCommit">false</str>
    <!-- Suggester properties -->
    <str name="suggestAnalyzerFieldType">autosuggest_fieldType</str>
    <str name="dictionaryImpl">org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory</str>
    <str name="field">DisplayName</str>
  </lst>
  <!-- specify a fieldtype using keywordtokenizer + lowercase + cleanup -->
  <str name="queryAnalyzerFieldType">phrase_suggest</str>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">infix_suggest_analyzing</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">200</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="components">
    <str>infix_suggest_analyzing</str>
  </arr>
</requestHandler>

schema.xml:

<fieldType name="autosuggest_fieldType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<fieldtype name="phrase_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+"
            replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="DisplayName" type="text" indexed="true" stored="true" required="true" multiValued="false" />
Re: analyzer of user queries in SOLR 4.10?
Thanks Erik. I am actually using the edismax query parser in Solr. I can explicitly specify the fieldType (e.g., text_general or text_en) for different fields (e.g., title or description). But I do not see how to specify the fieldType (and thus the analyzer) for runtime queries. Thanks, DL

On Sun, Nov 23, 2014 at 10:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
[quoted message trimmed]

-- SeekWWW: the Search Engine of Choice www.seekwww.com
Re: analyzer of user queries in SOLR 4.10?
On 11/23/2014 2:13 PM, David Lee wrote:
[quoted message trimmed]

The query analysis is chosen by the field that you are querying. If the request sent to your edismax parser is configured to query multiple fields (qf, pf, etc.), then multiple analysis chains might get used -- each field uses its own analysis chain. Setting the debugQuery parameter to true will show you exactly how a query was analyzed. The same thing can happen when you use multiple field:value clauses in your query. Thanks, Shawn
Random sorting and result consistency across successive calls based on seed
Hi, We are using the random dynamic type (solr.RandomSortField) with SolrCloud, 1 shard and 1 replica. Our use case is searching and displaying items for sale in two major types of ad - premium and standard. We don't want all recently-updated or recently-created ads sorted to the top by default; instead we use the random type to allow random distribution of results within premium and standard ads, as a way to not give preference to anyone (other explicit sorting options like date, price, etc. also exist). The random dynamic type allows a seed to be specified as part of the field name: in the default configuration you can sort by random_{seed}, and we set the {seed} to the current date (yymmdd) to enforce consistency across pagination of results for the day. For example, sort=random_141124+desc. This works well enough and allows pagination using the same seed, with the following two caveats: 1. The shard and the replica give different ordering of the results. 2. Any changes to the underlying index alter the ordering of the results. We solved issue #1 by ensuring stickiness to whichever Solr instance the web application first talked to (they're sitting behind a load balancer, so we use cookie stickiness provided by the load balancer). For issue #2, we looked at the source for RandomSortField (https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/schema/RandomSortField.java) and noticed that the seed is actually comprised of the hashcode of the seed we specify in the field name, along with the index version (plus whatever context.docBase is):

private static int getSeed(String fieldName, LeafReaderContext context) {
  final DirectoryReader top = (DirectoryReader) ReaderUtil.getTopLevelContext(context).reader();
  return fieldName.hashCode() + context.docBase + (int) top.getVersion();
}

Clearly that is the cause of the random ordering changing whenever docs are added to the index.

As a test we created our own random field type based on RandomSortField, but implemented getSeed differently:

private static int getSeed(String fieldName, LeafReaderContext context) {
  return fieldName.hashCode();
}

And after compiling, adding the JAR, updating solrconfig.xml to use it in place of the existing class for the random field type, and testing - it seems to work. We now have consistently random results across pagination when using the same seed, even when adding documents to the index. Incidentally this does not fix different results across the shard versus the replica; haven't worked that out. Does anyone know why context.docBase and the index version are part of the seed in the first place? I wonder what we're missing out on by removing them from our random class, or any other side effects. Thanks, Adam.
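The effect of mixing the index version into the seed can be illustrated with plain java.util.Random, independent of Solr. This is a toy, stdlib-only model (not Solr code): shuffling a fixed doc list stands in for the random sort, and the seed arithmetic mirrors the two getSeed variants above, with made-up version numbers:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeedDemo {
    // The sort order is a pure function of the seed: same seed, same order.
    static List<Integer> order(int seed) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) docs.add(i);
        Collections.shuffle(docs, new Random(seed));
        return docs;
    }

    public static void main(String[] args) {
        String fieldName = "random_141124"; // seed embedded in the field name

        // Stock RandomSortField mixes the index version into the seed, and the
        // version bumps on every commit, so the order shifts as docs are added.
        List<Integer> beforeCommit = order(fieldName.hashCode() + 100);
        List<Integer> afterCommit  = order(fieldName.hashCode() + 101);
        System.out.println("order stable across commits: "
                + beforeCommit.equals(afterCommit));

        // Seeding from the field name alone keeps pagination consistent.
        System.out.println("order stable with fixed seed: "
                + order(fieldName.hashCode()).equals(order(fieldName.hashCode())));
    }
}
```

The second line is guaranteed to print true; the first will almost certainly differ between the two "versions", which is exactly the pagination drift described above.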
Re: analyzer of user queries in SOLR 4.10?
Yes, my edismax parser is configured to query multiple fields, including qf, pf, pf2 and pf3. Is there any online documentation on "multiple analysis chains might get used -- each field uses its own analysis chain"? Thanks, DL

On Sun, Nov 23, 2014 at 1:34 PM, Shawn Heisey apa...@elyograg.org wrote:
[quoted message trimmed]
Re: Using stored value of a field to build suggester index
You can't build the suggester from the stored values, it's constructed from indexed terms only. You probably want to create a copyField to a less-analyzed (indexed) field and suggest from _that_. You'll probably want to do things like remove punctuation, perhaps lowercase and the like but not stem etc. Best, Erick On Sun, Nov 23, 2014 at 12:25 PM, Faisal Mansoor faisal.mans...@gmail.com wrote: Hi, I am trying to build a suggester for a field which is both index and stored. The field is whitespace tokenized, lowercased, stemmed etc while indexing. It looks like that the indexed terms are used as a source for building the suggester index. Which is what the following line in the suggester documentation also mentions. https://wiki.apache.org/solr/Suggester - field - if sourceLocation is empty then terms from this field in the index will be used when building the trie. I want to display the suggested value in UI, is it possible to use the stored value of the field rather than the indexed terms to build the index. Here are the relevant definitions from solrconfig.xml and schema.xml. Thanks. 
Faisal

solrconfig.xml:

<searchComponent class="solr.SpellCheckComponent" name="infix_suggest_analyzing">
  <lst name="spellchecker">
    <str name="name">infix_suggest_analyzing</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
    <str name="buildOnCommit">false</str>
    <!-- Suggester properties -->
    <str name="suggestAnalyzerFieldType">autosuggest_fieldType</str>
    <str name="dictionaryImpl">org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory</str>
    <str name="field">DisplayName</str>
  </lst>
  <!-- specify a fieldtype using keywordtokenizer + lowercase + cleanup -->
  <str name="queryAnalyzerFieldType">phrase_suggest</str>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">infix_suggest_analyzing</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">200</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="components">
    <str>infix_suggest_analyzing</str>
  </arr>
</requestHandler>

schema.xml:

<fieldType name="autosuggest_fieldType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<fieldtype name="phrase_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+"
            replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="DisplayName" type="text" indexed="true" stored="true" required="true" multiValued="false"/>
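Erick's copyField approach might look like the sketch below; the field name DisplayNameSuggest is hypothetical, and reusing autosuggest_fieldType (tokenized and lowercased, but not stemmed) is an assumption rather than part of the original config:

```xml
<!-- Hypothetical lightly-analyzed copy of DisplayName: tokenized and
     lowercased but not stemmed, so suggestions read like the stored text -->
<field name="DisplayNameSuggest" type="autosuggest_fieldType"
       indexed="true" stored="false" multiValued="false"/>
<copyField source="DisplayName" dest="DisplayNameSuggest"/>
```

The suggester's <str name="field"> entry would then point at DisplayNameSuggest rather than DisplayName, so the trie is built from terms much closer to what you want to display.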
Re: analyzer of user queries in SOLR 4.10?
Hi, Same as index time. You can define a query-time analyzer in schema.xml. Here is an example taken from the default schema.xml:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    ...
  </analyzer>
</fieldType>

On Monday, November 24, 2014 12:05 AM, David Lee seek...@gmail.com wrote: Yes, my edismax parser is configured to query multiple fields, including qf, pf, pf2 and pf3. Is there any online documentation on how multiple analysis chains might get used -- each field using its own analysis chain? Thanks, DL

On Sun, Nov 23, 2014 at 1:34 PM, Shawn Heisey apa...@elyograg.org wrote: On 11/23/2014 2:13 PM, David Lee wrote: Thanks Erik. I am actually using the edismax query parser in Solr. I can explicitly specify the fieldType (e.g., text_general or text_en) for different fields (e.g., title or description). But I do not see how to specify the fieldType (and thus the analyzer) for runtime queries. The query analysis is chosen by the field that you are querying. If the request sent to your edismax parser is configured to query multiple fields (qf, pf, etc.), then multiple analysis chains might get used -- each field uses its own analysis chain. Setting the debugQuery parameter to true will show you exactly how a query was analyzed. The same thing can happen when you use multiple field:value clauses in your query. Thanks, Shawn -- SeekWWW: the Search Engine of Choice www.seekwww.com
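Filled out, a fieldType with distinct index-time and query-time chains might look like the following sketch; the filters chosen here are illustrative, not taken from the poster's schema:

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- query-time-only step: expand synonyms at search time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With edismax, each field listed in qf/pf is analyzed with its own query-time chain; adding debugQuery=true to a request shows the analyzed query per field.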
RE: Too much data after closed for HttpChannelOverHttp
For what it's worth, depending on the type of PC/Mac you're using, you can use Wireshark to look at the actual HTTP headers (sent and received) being created for the request. https://www.wireshark.org/ I don't have any financial interest in them, but the stuff works! Steve

Date: Sun, 23 Nov 2014 20:47:05 +0100 Subject: Re: Too much data after closed for HttpChannelOverHttp From: h.benoud...@gmail.com To: solr-user@lucene.apache.org Actually I'm using a PHP client (I think it sends an HTTP request to Solr), but you're right; tomorrow once I get to the office, I'll set the chunk size to a smaller value and will tell you if that was the reason. Thanks.

2014-11-23 19:35 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com: Most probably just a request that's too large. Have you tried dropping down to 500 items and seeing what happens? Are you using SolrJ to send content to Solr? Or a direct HTTP request? Regards, Alex. P.s. You may also find it useful to read up on Solr commits and hard vs. soft commits. Check solrconfig.xml in the example distribution. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 23 November 2014 at 12:31, Hakim Benoudjit h.benoud...@gmail.com wrote: Hi there, I have deployed Solr with Jetty, and I'm trying to index quite a large number of items (300K) retrieved from a MySQL database (unfortunately I'm not using DIH; I'm doing it manually, by getting items from MySQL and then indexing them in Solr). But I'm not indexing all of those items at the same time; I'm indexing them in chunks of 3K. So I get the first 3K, index them, then go to the next 3K chunk and index it.
Here is the error I got in the Jetty logs; I guess it has nothing to do with MySQL: *Does anyone know the meaning of the error 'badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@5432494a'?* Thanks for your help; if anything isn't clear, please tell me and I'll explain (and sorry for my bad English). -- Cordialement, Best regards, Hakim Benoudjit
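The chunked-indexing loop described above can be sketched in plain Java. This is a minimal illustration of the batching logic only (the actual solrServer.add(...) call and document types are assumed and shown as comments), using Alexandre's suggested smaller chunk size of 500:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    // Split a large list of documents into fixed-size chunks so each
    // update request sent to Solr stays small.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for the 300K rows fetched from MySQL.
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 300_000; i++) docs.add(i);

        List<List<Integer>> chunks = partition(docs, 500);
        // 300,000 docs in chunks of 500 -> 600 update requests
        System.out.println(chunks.size());

        // For each chunk you would send one update request, e.g. (hypothetical):
        //   solrServer.add(toSolrInputDocuments(chunk));
        // followed by a single commit once all chunks are sent.
    }
}
```

Smaller chunks keep each HTTP request body well under any server-side limits, which is the first thing to rule out for the "too much data after closed" error.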
Re: Too much data after closed for HttpChannelOverHttp
Good point on that one Steve. Wireshark is both a hammer and a power drill of network troubleshooting. Takes steady hands to hold it right (it has a bit of a learning curve) but it is a great tool. I swore by it (well, Ethereal back then) in my tech support days. So, seconded to try using that if the simple approach fails outright. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 23 November 2014 at 20:31, steve sc_shep...@hotmail.com wrote: (quoted text trimmed)
Re: Duplicate facets when the handler configuration specifies facet fields
I can reproduce it. I added your parameters to the defaults section of the config and then ran the following:

curl "http://localhost:8983/solr/schemaless/select?q=*:*&rows=0&wt=json&indent=true&facet=true&facet.field=primaryId2&facet.limit=10&echoParams=all"

I get:
--
"params":{
    "f.typedef.facet.limit":"15",
    "facet.field":"primaryId2",
    "df":"_text",
    "f.subtype.facet.limit":"15",
    "echoParams":"all",
    "facet.mincount":"1",
    "rows":"0",
    "facet":"true",
    "q":"*:*",
    "facet.limit":"10",
    "facet.field":"primaryId2",
    "indent":"true",
    "echoParams":"all",
    "rows":"0",
    "wt":"json",
    "facet":"true"}},
--

This is against a Solr 5 build, but I think the bug is there all the way back to Solr 4.1. I think I traced the source of the bug too: request parameter and default names are just joined together, but with the first (override) value appearing both times, as shown above. Usually this makes no difference to anything, but it looks like the faceting component iterates over the elements rather than just getting them, so it gets bitten twice. I've created a JIRA for this issue: https://issues.apache.org/jira/browse/SOLR-6780 Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 21 November 2014 at 18:29, Alexandre Rafalovitch arafa...@gmail.com wrote: Could you add echoParams=all to the query and see what comes back? Currently you echo the params you sent; it would be good to see what they look like after they combine with the defaults. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 21 November 2014 18:04, Tom Zimmermann zimm.to...@gmail.com wrote: Brian and I are working together to diagnose this issue, so I can chime in quickly here as well. These values are defined as part of the defaults section of the config.
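A minimal configuration that reproduces the duplication might look like this sketch (a hypothetical handler; only the defaults section matters, and the field name primaryId2 is taken from the thread):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- When the request URL also carries facet.field=primaryId2, the merged
         parameter list ends up with the key twice (carrying the override value
         both times), so the faceting component processes the field twice. -->
    <str name="facet.field">primaryId2</str>
    <str name="facet.limit">15</str>
  </lst>
</requestHandler>
```

The workaround until SOLR-6780 is fixed is simply not to repeat in the request a facet.field that is already in the handler's defaults.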
Error occurred when getting solr-core
Hi all, I am using solr version 4.7.2. I am getting the following error when communicating with the Solr server:

org.apache.solr.common.SolrException: No such core: db
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:118)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)

I am using EmbeddedSolrServer, initialized as follows:

CoreContainer coreContainer = new CoreContainer(solrHome.getPath());
coreContainer.load();
this.server = new EmbeddedSolrServer(coreContainer, "db");

In solr-home there is a directory called db, and in the core.properties file I define the name and dataDir. I couldn't find the cause of this error. Please help. Thanks Danesh
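With core discovery (Solr 4.4+), CoreContainer.load() finds cores by scanning solr home for core.properties files; a minimal sketch for the db core, assuming solr-home/db is the core directory:

```properties
# solr-home/db/core.properties (paths assumed for illustration)
name=db
dataDir=data
```

A mismatch between this name value and the core name passed to EmbeddedSolrServer, or an old-style solr.xml that prevents discovery from running, are common causes of a "No such core" error, so those are worth checking first.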
ERROR StreamingSolrServers 4.10.2
Hi, I have a production SolrCloud setup which has been migrated from 4.2 to 4.10.2. Since then, at times, I'm getting this ERROR:

ERROR StreamingSolrServers org.apache.solr.common.SolrException: Bad Request
request: http://10.0.0.160:8080/solr/profiles/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.0.2.160%3A8080%2Fsolr%2Fprofiles%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

The cloud is a single-shard, single-replica setup with ZooKeeper 3.4.6, and the Java version used is java version 1.7.0_72, Java(TM) SE Runtime Environment (build 1.7.0_72-b14), Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode). It would be great if anyone could throw some light here. ~Thanks Joe