Solr 4.x EdgeNGramFilterFactory and highlighting

2014-01-31 Thread Dmitriy Shvadskiy
Hello,

We are using EdgeNGramFilterFactory to provide partial matching on the search
phrase for type-ahead/autocomplete. The field type definition:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="0"
            splitOnCaseChange="0"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="0"
            splitOnCaseChange="0"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The field is defined as follows:
<field name="ac_source_id" type="edgytext" indexed="true" stored="true"
       multiValued="false" termVectors="true" termPositions="true"
       termOffsets="true"/>

Query string
q=22&qt=%2Fedismax&qf=ac_source_id&fl=ac_source_id&hl=true&hl.mergeContiguous=true&hl.useFastVectorHighlighter=true&hl.fl=ac_source_id

This is what comes back in the highlighting response:
<arr name="ac_source_id">
  <str><em>2282372</em></str>
</arr>

What I expect is <em>22</em>82372, as it was in Solr 3.6. I also noticed that
when I run Analysis on the field, the start and end positions are the same even
though EdgeNGramFilterFactory generates multiple n-grams (see image attached).
Questions:
1. How do I accomplish highlighting on a partial match in Solr 4.x?
2. Did the behavior of EdgeNGramFilterFactory change between 3.6 and 4.x?
Highlighting on a partial match works fine in Solr 3.6.
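For reference, the index-side analysis above produces front edge n-grams of each
token. A minimal sketch of that gram generation (not Lucene's actual code, just
an illustration of minGramSize=1 / maxGramSize=25 / side=front):

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Front-side edge n-grams, mirroring the EdgeNGramFilterFactory settings above."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# "22" is one of the indexed grams, which is why q=22 matches the document;
# per the Analysis observation above, in 4.x every gram appears to keep the
# offsets of the original token, so the highlighter marks the entire stored value.
print(edge_ngrams("2282372"))  # ['2', '22', '228', '2282', '22823', '228237', '2282372']
```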

Thank you,
Dmitriy Shvadskiy
http://lucene.472066.n3.nabble.com/file/n4114748/Solr_Analysis.png 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-tp4114748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem running Solr indexing in Amazon EMR

2013-08-12 Thread Dmitriy Shvadskiy
Michael,

The Amazon Hadoop distribution has Lucene 2.9.4 jars in its /lib directory, and
they conflict with the Solr 4.4 we are using. Once we get past that problem, we
run into the Apache HttpComponents conflict you describe. I think the best bet
would be for us to build our own AMI to avoid these dependencies.

Dmitriy





Re: Problem running Solr indexing in Amazon EMR

2013-08-12 Thread Dmitriy Shvadskiy
Michael,
We replaced the Lucene jars but ran into a problem with an incompatible version
of Apache HttpComponents. Still figuring it out.

Dmitriy 





Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Dmitriy Shvadskiy
Erick,

Thank you for the reply. The Cloudera image includes Solr 4.3. I'm not sure what
version Amazon EMR includes. We are not directly referencing or using their
version of Solr; instead we build our jar against Solr 4.4 and include all
dependencies in our jar file. Also, the error occurs not while reading an
existing index but simply while creating an instance of EmbeddedSolrServer. I
think there is a conflict between the jars that the EMR process loads and the
ones our map/reduce job requires, but I can't figure out what it is.
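One way to hunt for that kind of conflict is to scan every jar on the task
classpath for the Lucene classes involved. A rough sketch (the directory path
and class name below are just examples, not the actual EMR layout):

```python
import pathlib
import zipfile

def jars_containing(entry, lib_dir):
    """Return the names of jars under lib_dir that contain the given zip entry,
    e.g. 'org/apache/lucene/index/FieldInfos.class'."""
    hits = []
    for jar in sorted(pathlib.Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar.name)
    return hits
```

Running this against both the node's Hadoop lib directory and the bundled
map/reduce jar would show any class that is present in both, which is the usual
source of a VerifyError like the one below.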

Dmitriy





Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Dmitriy Shvadskiy
Erick,

There is actually supposed to be just one version of Solr: the one bundled with
our map/reduce jar. To be clear: the map/reduce job is generating a new index,
not reading an existing one. But it fails even earlier, while the instance of
EmbeddedSolrServer is being created, at the first line of the following code.

CoreContainer coreContainer = new CoreContainer(solrhomedir);
coreContainer.load();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "collection1");

Dmitriy





Problem running Solr indexing in Amazon EMR

2013-08-09 Thread Dmitriy Shvadskiy
Hello,
We are trying to utilize Amazon Elastic MapReduce to build Solr indexes. We are
using embedded Solr in the reduce phase to create the actual index. However, we
run into the following error and are not sure what is causing it. The Solr
version is 4.4. The job runs fine locally in the Cloudera CDH 4.3 VM.

Thanks,
Dmitriy


2013-08-09 14:52:02,602 FATAL org.apache.hadoop.mapred.Child (main): Error
running child : java.lang.VerifyError: (class:
org/apache/lucene/codecs/lucene40/Lucene40FieldInfosReader, method: read
signature:
(Lorg/apache/lucene/store/Directory;Ljava/lang/String;Lorg/apache/lucene/store/IOContext;)Lorg/apache/lucene/index/FieldInfos;)
Incompatible argument to function
    at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.<init>(Lucene40FieldInfosFormat.java:99)
    at org.apache.lucene.codecs.lucene40.Lucene40Codec.<init>(Lucene40Codec.java:49)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:374)
    at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
    at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
    at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:185)
    at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:121)
    at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:235)
    at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:149)
    at org.finra.ss.solr.SolrIndexingReducer.getEmbeddedSolrServer(SolrIndexingReducer.java:195)
    at org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:94)
    at org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:33)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)






Dismax mm per field

2011-08-03 Thread Dmitriy Shvadskiy
Hello,
Is there a way to apply the (e)dismax mm parameter per field? If I have the query
field1:(blah blah) AND field2:(foo bar)

is there a way to apply mm only to field2?

Thanks,
Dmitriy



Re: Dismax mm per field

2011-08-03 Thread Dmitriy Shvadskiy
Thanks Jonathan. I thought it would be possible via nested queries but
somehow could not get it to work.
I'll give it another shot.

On Wed, Aug 3, 2011 at 12:32 PM, Jonathan Rochkind [via Lucene] 
ml-node+3222792-952640420-221...@n3.nabble.com wrote:

 There is not, and the way dismax works makes it not really that feasible
 in theory, sadly.

 One thing you could do instead is combine multiple separate dismax
 queries using the nested query syntax. This will affect your relevancy
 ranking, possibly in odd ways, but anything that accomplishes 'mm per
 field' will necessarily not really be using dismax's disjunction-max
 relevancy ranking in the way it's intended.

 Here's how you could combine two separate dismax queries:

 defType=lucene
 q=_query_:"{!dismax qf=field1 mm=100%}blah blah" AND _query_:"{!dismax
 qf=field2 mm=80%}foo bar"

 That whole q value would need to be properly URI escaped, which I
 haven't done here for human-readability.

 Dismax has always got an mm, there's no way to not have an mm with
 dismax, but mm 100% might be what you mean. Of course, one of those
 queries could also not be dismax at all, but ordinary lucene query
 parser or anything else. And of course you could have the same query
 text for nested queries repeating eg blah blah in both.
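The URI escaping mentioned above can be sketched like this (the parameter values
are taken from the example query; a Solr client library would normally do this
for you):

```python
from urllib.parse import urlencode

# The nested-query q value from the example, before escaping.
q = ('_query_:"{!dismax qf=field1 mm=100%}blah blah" AND '
     '_query_:"{!dismax qf=field2 mm=80%}foo bar"')

# urlencode percent-escapes the braces, bang, quotes, and % signs.
params = urlencode({"defType": "lucene", "q": q})
print(params)  # ready to append after /select?
```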



 On 8/3/2011 11:24 AM, Dmitriy Shvadskiy wrote:

  Hello,
  Is there a way to apply (e)dismax mm parameter per field? If I have a
 query
  field1:(blah blah) AND field2:(foo bar)
 
  is there a way to apply mm only to field2?
 
  Thanks,
  Dmitriy
 
 






Boosting non synonyms result

2011-05-17 Thread Dmitriy Shvadskiy
Hello,
Is there a way to boost a result that is an exact match, as opposed to a
synonym match, when using query-time synonyms?
Given the query John Smith and the synonyms
Jonathan,Jonathan,John,Jon,Nat,Nathan

I'd like the result containing John Smith to be ranked higher than Jonathan
Smith.
My thinking was to define two fields, one with query-time synonyms and one
without, and sort by a function query on the non-synonym field. Is it even
possible? I can't quite figure out the syntax for this.
I'm using Solr 3.1.
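For reference, a common sketch of that two-field setup (all field and type names
here are hypothetical, and this boosts via dismax qf rather than sorting on a
function query, which is the more usual route):

```xml
<!-- hypothetical: one copy analyzed with query-time synonyms, one without -->
<field name="name_syn"   type="text_with_synonyms" indexed="true" stored="false"/>
<field name="name_exact" type="text_no_synonyms"   indexed="true" stored="false"/>
<copyField source="name" dest="name_syn"/>
<copyField source="name" dest="name_exact"/>
```

A query such as defType=dismax&qf=name_syn name_exact^10 then lets a document
that matches the non-synonym field outscore one that matches only via synonyms.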

Thanks,
Dmitriy


Specifying returned fields

2011-01-12 Thread Dmitriy Shvadskiy
Hello,

I know you can explicitly specify the list of fields returned via
fl=field1,field2,field3

Is there a way to specify returning all fields except field1 and field2?
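For reference, a client-side workaround is to fetch all fields and drop the
unwanted ones from each document; a minimal sketch (field names hypothetical):

```python
def drop_fields(doc, excluded=("field1", "field2")):
    """Return a copy of a Solr document dict without the excluded fields."""
    return {k: v for k, v in doc.items() if k not in excluded}

doc = {"id": "1", "field1": "x", "field2": "y", "title": "t"}
print(drop_fields(doc))  # {'id': '1', 'title': 't'}
```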

Thanks,
Dmitriy


Re: Specifying returned fields

2011-01-12 Thread Dmitriy Shvadskiy

Thanks, Gora.
The workaround of loading the fields via the LukeRequestHandler and building fl
from them will work for what we need. However, it takes 15 seconds per core, and
we have 15 cores.
The query I'm running is /admin/luke?show=schema
Is there a way to limit the query to return just the fields?
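A sketch of the "build fl from Luke" step described above, assuming wt=json and
a schema/fields nesting in the response (the exact response shape should be
verified against your Solr version):

```python
def fl_from_luke(luke_response, excluded=()):
    """Build an fl parameter from a parsed /admin/luke?show=schema&wt=json
    response. The 'schema' -> 'fields' nesting is assumed, not guaranteed."""
    fields = luke_response["schema"]["fields"]
    return ",".join(f for f in sorted(fields) if f not in excluded)

# Toy response with made-up field names, just to show the shape.
luke = {"schema": {"fields": {"id": {}, "title": {}, "body": {}}}}
print(fl_from_luke(luke, excluded=("body",)))  # id,title
```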

Thanks,
Dmitriy


Best way to check Solr index for completeness

2010-09-28 Thread Dmitriy Shvadskiy
Hello,
What would be the best way to check the Solr index against the original system
(a database) to make sure the index is up to date? I can use Solr fields like id
and timestamp to check against the corresponding fields in the database. Our
index currently contains over 2 million documents across several cores. Pulling
all documents from the Solr index via search (1000 docs at a time) is very slow.
Is there a better way to do it?
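One cheaper sketch is to compare only the IDs as sets rather than pulling whole
documents, assuming the database can export its id column and Solr can return
just fl=id:

```python
def index_diff(db_ids, solr_ids):
    """Compare database IDs with indexed IDs and report the discrepancies."""
    db, solr = set(db_ids), set(solr_ids)
    return {
        "missing_from_solr": db - solr,  # in the database but never indexed
        "stale_in_solr": solr - db,      # indexed but gone from the database
    }

print(index_diff(["1", "2", "3"], ["2", "3", "4"]))
```

Timestamps could then be compared only for the IDs present on both sides.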

Thanks,
Dmitriy