Solr 4.x EdgeNGramFilterFactory and highlighting
Hello,

We are using EdgeNGramFilterFactory to provide partial match on the search phrase for type-ahead/autocomplete.

Field type definition:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0"
            splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0"
            splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The field is defined as follows:

<field name="ac_source_id" type="edgytext" indexed="true" stored="true" multiValued="false"
       termVectors="true" termPositions="true" termOffsets="true"/>

Query string:

q=22&qt=%2Fedismax&qf=ac_source_id&fl=ac_source_id&hl=true&hl.mergeContiguous=true&hl.useFastVectorHighlighter=true&hl.fl=ac_source_id

This is what comes back in the highlighting response:

<arr name="ac_source_id">
  <str><em>2282372</em></str>
</arr>

What I expect is <em>22</em>82372, as it was in Solr 3.6. I also noticed that when I run Analysis on the field, the start and end positions are the same even though EdgeNGramFilterFactory generates multiple ngrams (see image attached).

Questions:
1. How do I accomplish highlighting on partial match in Solr 4.x?
2. Did the behavior of EdgeNGramFilterFactory change between 3.6 and 4.x?
Highlighting on partial match works fine in Solr 3.6.

Thank you,
Dmitriy Shvadskiy

http://lucene.472066.n3.nabble.com/file/n4114748/Solr_Analysis.png

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-tp4114748.html
Sent from the Solr - User mailing list archive at Nabble.com.
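For reference, the token stream the index-time EdgeNGramFilterFactory should produce can be sketched outside Solr. This is a simplified re-implementation of the side="front" behavior for illustration, not Solr's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramDemo {
    // Mimics EdgeNGramFilterFactory with side="front": for each token,
    // emit every prefix between minGram and maxGram characters long.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The indexed value "2282372" with minGramSize=1, maxGramSize=25
        // produces the gram "22", which is why q=22 matches the document.
        System.out.println(edgeNGrams("2282372", 1, 25));
        // prints [2, 22, 228, 2282, 22823, 228237, 2282372]
    }
}
```

Since "22" is indexed as its own term, the highlighter would be expected to mark only that gram's offsets, which is the <em>22</em>82372 behavior seen in 3.6.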
Re: Problem running Solr indexing in Amazon EMR
Michael,

The Amazon Hadoop distribution has Lucene 2.9.4 jars in its /lib directory, and they conflict with the Solr 4.4 we are using. Once we get past that problem we run into the conflict with Apache HttpComponents you describe. I think the best bet for us is to build our own AMI to avoid these dependency conflicts.

Dmitriy
Re: Problem running Solr indexing in Amazon EMR
Michael,

We replaced the Lucene jars but ran into a problem with an incompatible version of Apache HttpComponents. Still figuring it out.

Dmitriy
Re: Problem running Solr indexing in Amazon EMR
Erick,

Thank you for the reply. The Cloudera image includes Solr 4.3; I'm not sure which version Amazon EMR includes. We are not directly referencing or using their version of Solr; instead we build our jar against Solr 4.4 and include all dependencies in our jar file. Also, the error occurs not while reading an existing index but simply while creating an instance of EmbeddedSolrServer. I think there is a conflict between the jars the EMR process loads and the jars our map/reduce job requires, but I can't figure out which ones.

Dmitriy
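One generic way to pin down which jar a conflicting class is actually loaded from is to ask the JVM for the class's code source. A minimal sketch, not specific to EMR; inside the reduce task you would pass the class named in the VerifyError (e.g. "org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat") instead of the JDK class used in the demo:

```java
public class JarLocator {
    // Returns the jar (or directory) a class was loaded from, or a
    // placeholder for bootstrap-classpath classes, which have no code source.
    static String locate(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap classpath)" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Demo with a JDK class; in the EMR job, log the location of the
        // Lucene class from the stack trace to see which jar wins.
        System.out.println(locate("java.lang.String"));
    }
}
```

Logging this from the reducer would show whether the Lucene 2.9.4 jars from EMR's /lib or the bundled Solr 4.4 jars are being picked up first.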
Re: Problem running Solr indexing in Amazon EMR
Erick,

It is actually supposed to be just one version of Solr, the one bundled with our map/reduce jar. To be clear: the map/reduce job is generating a new index, not reading an existing one. But it fails even before the EmbeddedSolrServer instance is created, at the first line of the following code:

CoreContainer coreContainer = new CoreContainer(solrhomedir);
coreContainer.load();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "collection1");

Dmitriy
Problem running Solr indexing in Amazon EMR
Hello,

We are trying to utilize Amazon Elastic Map Reduce to build Solr indexes. We are using embedded Solr in the Reduce phase to create the actual index. However we run into the following error and are not sure what is causing it. The Solr version is 4.4. The job runs fine locally in a Cloudera CDH 4.3 VM.

Thanks,
Dmitriy

2013-08-09 14:52:02,602 FATAL org.apache.hadoop.mapred.Child (main): Error running child:
java.lang.VerifyError: (class: org/apache/lucene/codecs/lucene40/Lucene40FieldInfosReader, method: read signature: (Lorg/apache/lucene/store/Directory;Ljava/lang/String;Lorg/apache/lucene/store/IOContext;)Lorg/apache/lucene/index/FieldInfos;) Incompatible argument to function
    at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.<init>(Lucene40FieldInfosFormat.java:99)
    at org.apache.lucene.codecs.lucene40.Lucene40Codec.<init>(Lucene40Codec.java:49)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:374)
    at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
    at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
    at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:185)
    at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:121)
    at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:235)
    at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:149)
    at org.finra.ss.solr.SolrIndexingReducer.getEmbeddedSolrServer(SolrIndexingReducer.java:195)
    at org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:94)
    at org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:33)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Dismax mm per field
Hello,

Is there a way to apply the (e)dismax mm parameter per field? If I have a query field1:(blah blah) AND field2:(foo bar), is there a way to apply mm only to field2?

Thanks,
Dmitriy
Re: Dismax mm per field
Thanks Jonathan. I thought it would be possible via nested queries but somehow could not get it to work. I'll give it another shot.

On Wed, Aug 3, 2011 at 12:32 PM, Jonathan Rochkind [via Lucene] <ml-node+3222792-952640420-221...@n3.nabble.com> wrote:

> There is not, and the way dismax works makes it not really that feasible in theory, sadly.
>
> One thing you could do instead is combine multiple separate dismax queries using the nested query syntax. This will affect your relevancy ranking, possibly in odd ways, but anything that accomplishes "mm per field" will necessarily not really be using dismax's disjunction-max relevancy ranking in the way it's intended.
>
> Here's how you could combine two separate dismax queries:
>
> defType=lucene
> q=_query_:"{!dismax qf=field1 mm=100%}blah blah" AND _query_:"{!dismax qf=field2 mm=80%}foo bar"
>
> That whole q value would need to be properly URI-escaped, which I haven't done here for human readability.
>
> Dismax always has an mm; there's no way to not have an mm with dismax, but mm=100% might be what you mean. Of course, one of those queries could also not be dismax at all, but the ordinary lucene query parser or anything else. And of course you could use the same query text in both nested queries, e.g. "blah blah" in both.
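The combined q value in the answer above still needs to be URI-escaped before it goes on the request. A small sketch of producing the escaped value, using the field names and mm values from the example:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NestedDismax {
    // Builds the URL-encoded q parameter for two nested dismax queries
    // with different mm values, as described in the thread.
    static String buildQ() {
        String q = "_query_:\"{!dismax qf=field1 mm=100%}blah blah\""
                 + " AND _query_:\"{!dismax qf=field2 mm=80%}foo bar\"";
        try {
            return URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println("q=" + buildQ());
    }
}
```

The braces, bang, quotes, and percent signs in the local-params syntax are exactly the characters that break when left unescaped, which is why the raw form in the email cannot be pasted into a URL directly.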
Boosting non synonyms result
Hello,

Is there a way to boost a result that is an exact match over a synonym match when using query-time synonyms? Given the query John Smith and the synonyms Jonathan,Jonathan,John,Jon,Nat,Nathan, I'd like a result containing John Smith to be ranked higher than Jonathan Smith. My thinking was to do it by defining two fields, one with query-time synonyms and one without, and sorting by a function query on the non-synonym field. Is it even possible? I can't quite figure out the syntax for this. I'm using Solr 3.1.

Thanks,
Dmitriy
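A sketch of the two-field layout described above, with hypothetical field and type names (text_syn would include SynonymFilterFactory in its query analyzer; text_plain would not):

```xml
<!-- Copy of the source field with query-time synonyms applied -->
<field name="name_syn"   type="text_syn"   indexed="true" stored="false"/>
<!-- Same content without synonyms, for exact-match boosting -->
<field name="name_exact" type="text_plain" indexed="true" stored="false"/>
<copyField source="name" dest="name_syn"/>
<copyField source="name" dest="name_exact"/>
```

Rather than sorting on a function query, one option is to search both fields with (e)dismax and boost the exact field, e.g. qf=name_syn name_exact^5, so a document containing the literal John Smith scores above a synonym-only match while synonym matches still qualify.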
Specifying returned fields
Hello,

I know you can explicitly specify the list of fields returned via fl=field1,field2,field3. Is there a way to say "return all fields except field1 and field2"?

Thanks,
Dmitriy
Re: Specifying returned fields
Thanks Gora.

The workaround of loading the fields via the LukeRequestHandler and building fl from them will work for what we need. However it takes 15 seconds per core, and we have 15 cores. The query I'm running is /admin/luke?show=schema. Is there a way to limit the query to return just the fields?

Thanks,
Dmitriy
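Once the field names have been fetched from the Luke handler, building the fl value is simple set arithmetic. A minimal sketch with hypothetical field names:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FlBuilder {
    // Builds an fl parameter value containing every schema field
    // except the excluded ones.
    static String flExcluding(Collection<String> allFields, Set<String> excluded) {
        return allFields.stream()
                .filter(f -> !excluded.contains(f))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("id", "title", "body", "field1", "field2");
        Set<String> skip = new HashSet<>(Arrays.asList("field1", "field2"));
        System.out.println(flExcluding(all, skip)); // prints id,title,body
    }
}
```

Since the schema rarely changes, the Luke call can be made once per core and the resulting fl string cached, avoiding the 15-second hit on every request.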
Best way to check Solr index for completeness
Hello,

What would be the best way to check a Solr index against the original system (a database) to make sure the index is up to date? I can use Solr fields like id and timestamp to check against the corresponding fields in the database. Our index currently contains over 2 million documents across several cores, and pulling all documents from the Solr index via search (1,000 docs at a time) is very slow. Is there a better way to do it?

Thanks,
Dmitriy
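Assuming id/timestamp pairs can be pulled from both sides (from Solr, requesting only fl=id,timestamp rather than whole documents keeps the transfer small), the comparison itself is cheap. A sketch with hypothetical types, using Long millisecond timestamps:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IndexAudit {
    // Given id -> lastModified maps from the database and from Solr,
    // returns the ids that are missing from the index or stale in it.
    static Set<String> staleOrMissing(Map<String, Long> db, Map<String, Long> solr) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Long> e : db.entrySet()) {
            Long indexed = solr.get(e.getKey());
            if (indexed == null || indexed < e.getValue()) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> db = Map.of("1", 100L, "2", 200L, "3", 300L);
        Map<String, Long> solr = Map.of("1", 100L, "2", 150L); // "2" stale, "3" missing
        System.out.println(staleOrMissing(db, solr)); // prints [2, 3]
    }
}
```

Comparing ids the other way (present in Solr but gone from the database) catches deletions as well; running the check per core keeps each map bounded.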