Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread shamik
Thanks Eric.

Here's the part I'm not able to understand. For example, I have sources A, B,
C and D in the index. Each source contains n documents. Out of
these, a bunch of documents in A and B are tagged with MediaType. I took the
following steps:

1. Delete all documents tagged with MediaType for A and B. Documents from C
and D are not touched.

2. Re-Index documents which were tagged with MediaType

3. Run Optimization

Still, I keep seeing this exception. Does this mean, content from C and D
are impacted even though they are not tagged with MediaType ?

I'll follow your recommendation of creating a new collection, do a full
index and delete original collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread shamik
I didn't use the REST API, instead updated the schema manually.

Can you be specific on removing the data directory content ? I certainly
don't want to wipe out the index. I've four Solr instances, 2 shards with a
replica each. Are you suggesting clearing the index and re-indexing from
scratch ?





Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-23 Thread Shamik Bandopadhyay
)
... 37 more


Here's the current field definition :

<field name="MediaType" type="string" indexed="true" stored="true"
    multiValued="true" required="false" omitNorms="true" />

I've re-indexed the documents and ran optimization on all four instances,
but I'm still seeing the same error. I'm a bit puzzled as to the
root cause. Do I need to delete the documents tagged with MediaType and
re-index ? I'm getting results back if I don't use result grouping.

Any pointers will be appreciated.

- Thanks,

Shamik


Combining two MLT queries

2015-07-22 Thread Shamik Bandopadhyay
Just wondering if it's possible to combine two separate MLT queries (based
on a filtering condition) into a single one. I'm trying to combine the
results of these two queries:

http://localhost:8983/solr/collection1/mlt?q=title:ABC&fq=Source:(Test1 OR Test3 OR Test4)
http://localhost:8983/solr/collection1/mlt?q=title:ABC&fq=Source:(Test2)

The Source field filter differs in the two cases. What I'm looking for is to
combine the top 4 from query 1 and the top 4 from query 2. I was exploring the
option of combining them into a single query instead of two. Is it possible ?
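
One client-side way to approach this, as a rough sketch rather than a built-in Solr feature, is to issue the two MLT requests separately and concatenate the top 4 of each. The helper names `mlt_url` and `merge_top` are made up for illustration, and the sketch only builds the URLs and merges already-parsed result lists; fetching and parsing the responses is left out.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/collection1/mlt"

def mlt_url(q, fq, rows=4):
    """Build one MLT request URL for a given filter query."""
    return BASE + "?" + urlencode({"q": q, "fq": fq, "rows": rows, "wt": "json"})

def merge_top(first, second, n=4):
    """Take the top n docs of each result list, skipping ids already seen."""
    merged, seen = [], set()
    for doc in first[:n] + second[:n]:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            merged.append(doc)
    return merged

urls = [
    mlt_url("title:ABC", "Source:(Test1 OR Test3 OR Test4)"),
    mlt_url("title:ABC", "Source:(Test2)"),
]
```

Each list passed to `merge_top` is assumed to be the `docs` array of one response, already in relevance order.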

Any pointer will be appreciated.

-Thanks,

Shamik


Re: Issue with German search

2015-05-19 Thread shamik
Anyone ?





Re: Issue with German search

2015-05-19 Thread shamik
Thanks Doug. I'm using eDismax

Here's my Solr query :

http://localhost:8983/solr/testhandlerdeu?debugQuery=true&q=title_deu:Software%20und%20Downloads

Here's my request handler.

<requestHandler name="/testhandlerdeu" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>

    <str name="v.layout">layout</str>
    <str name="v.channel">testhandler</str>
    <str name="title">Test Request Handler German</str>

    <str name="defType">edismax</str>
    <str name="q.op">AND</str>

    <str name="q.alt">*:*</str>
    <str name="rows">15</str>
    <str name="fl">*,score</str>
    <str name="qf">name_deu^1.2 title_deu^10.0 description_deu^5.0</str>
    <str name="df">text_deu</str>

    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">-1</str>
    <str name="facet.sort">index</str>
    <str name="facet.method">enum</str>
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.field">content_type</str>
    <str name="facet.field">author</str>

    <str name="hl">true</str>
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
    <str name="hl.fl">name subject description_deu name_deu title_deu</str>
    <str name="hl.encoder">html</str>
    <str name="f.subject.hl.fragsize">20</str>
    <str name="f.description_fra.hl.fragsize">20</str>
    <str name="f.name_fra.hl.fragsize">20</str>
    <str name="hl.usePhraseHighlighter">false</str>
    <str name="hl.useFastVectorHighlighter">true</str>
    <str name="hl.boundaryScanner">breakIterator</str>
    <str name="hl.bs.type">SENTENCE</str>

    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>







Re: Issue with German search

2015-05-19 Thread shamik
Thanks a ton Doug, I should have figured this out, pretty stupid of me.

Appreciate your help.





Issue with German search

2015-05-18 Thread Shamik Bandopadhyay
Hi,

I'm having an issue with searching for a term in German. Here are the keyword(s)
I'm trying to search -- Software und Downloads

I've a document indexed in German with the same title -- Software und
Downloads

I'm expecting that the search on Software und Downloads will return this
document, unfortunately it's not happening.

Here's my sample test scenario from my local machine.

In schema, I've defined these three fields.

<field name="title_deu" type="adsktext_deu" indexed="true" stored="true"
    multiValued="true" />
<field name="name_deu" type="adsktext_deu" indexed="true" stored="true"
    termVectors="true" termPositions="true" termOffsets="true"/>
<field name="description_deu" type="adsktext_deu" indexed="true"
    stored="true" termVectors="true" termPositions="true" termOffsets="true"/>


Field Type definition :

<!-- German language specific definitions -->
<fieldType name="adsktext_deu" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
        mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="lang/stopwords_de.txt" format="snowball" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictionary="lang/dictionary_de.txt" />
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>

  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
        mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="lang/stopwords_de.txt" format="snowball" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictionary="lang/dictionary_de.txt" />
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>


When I ran a sample analysis of Software und Downloads,  the term is
indexed as

softwar softoft   download   ad

During query, it's getting searched as

softwar  download

Not sure, why it's not returning the document.


Here's the sample data indexed through solr.xml under example docs.

<doc>
<field name="id">12234!SOLR11092212</field>
<field name="name_deu">Test Name</field>
<field name="title_deu">Software und Downloads</field>
<field name="description_deu">div#actcontain { width: 100%; min-width:
220px; display: block; float: left; padding: 0 8px 0 0; } div#actcopy {
width: 48%; min-width: 230px; min-height: 120px; float: left; display:
inline-block; padding: 0 28px 0 0; margin: 10px 0 0 0; overflow: hidden; }
Häufige ThemenDownload-Verfahren im Autodesk-KontoDownload-Verfahren für
Education Community (Schüler, Studenten und Lehrkräfte)Suchen von Service
Packs, Hotfixes und SprachpaketenSoftware-Lizenzen im Autodesk-Store
kaufenSuchen kostenloser Testversion-Downloads
 Download-VerfahrenHerunterladen von Software aus verschiedenen
Speicherorten, abhängig von Ihrem Konto oder dem Subscription-TypenNutzung
am Heimarbeitsplatz für AbonnentenDesktop Subscription können lizenzierte
Software zur Verwendung auf ihrem Computer zu Hause
erhaltenProdukterweiterungen für AbonnentenExklusiver Zugriff auf die
neueste Software für einige Autodesk-ProdukteBestellen einer Software-DVDSo
bestellen Sie eine DVD oder einen USB-Stick für Ihre SoftwareAktuelle
Versionen für AbonnentenSubscription-Kunden haben Zugriff auf
Produkt-Updates, die während der Vertragslaufzeit verfügbar
sind.VorgängerversionenErfahren Sie, wie Sie eine Vorgängerversion Ihrer
Autodesk-Software erhaltenSprachoptionenHerunterladen der lizenzierten
Software in einer anderen Sprache oder Erhalten eines Sprachpakets.
</field>
<field name="author">Bob</field>
</doc>

Any pointers will be appreciated.

-Thanks,
Shamik


Re: Grouping Performance Optimation

2015-04-23 Thread shamik
You should look at CollapsingQParserPlugin. It's much faster than a
grouping query.

https://wiki.apache.org/solr/CollapsingQParserPlugin

It has a limitation though, check the following JIRA if it might affect your
use-case.

https://issues.apache.org/jira/browse/SOLR-6143
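
For comparison, here is a sketch of how the request parameters differ between the two approaches. The field name `ADSKDedup`, the query `cat:search` and the helper names are only placeholders; `{!collapse field=...}` is the filter-query form described on the wiki page above.

```python
from urllib.parse import urlencode

def grouping_params(q, field):
    """Classic result grouping: one group per distinct field value."""
    return {"q": q, "group": "true", "group.field": field}

def collapsing_params(q, field):
    """CollapsingQParserPlugin: collapse through a filter query instead."""
    return {"q": q, "fq": "{!collapse field=%s}" % field}

print(urlencode(grouping_params("cat:search", "ADSKDedup")))
print(urlencode(collapsing_params("cat:search", "ADSKDedup")))
```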





Re: Unable to update config file using zkcli or RELOAD

2015-04-03 Thread shamik
Ok, I figured out the steps, in case someone needs a reference. It required both
zkcli and RELOAD to update the changes.

1. Use zkcli to load the changes. I ran it from the node which used the
bootstrapping.

sh zkcli.sh -cmd upconfig -zkhost  zoohost1:2181 -confname myconf -solrhome 
/mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/ 

2. Use the same node to run the RELOAD

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1
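
The two steps could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the flags are the ones from the zkcli command above, and the Solr base URL is passed in rather than hard-coded.

```python
import subprocess
from urllib.parse import urlencode
from urllib.request import urlopen

def upconfig_cmd(zkhost, confname, solrhome, confdir):
    """Argument list for step 1 (zkcli upconfig)."""
    return ["sh", "zkcli.sh", "-cmd", "upconfig", "-zkhost", zkhost,
            "-confname", confname, "-solrhome", solrhome, "-confdir", confdir]

def reload_url(solr_base, core):
    """URL for step 2 (core RELOAD)."""
    return solr_base + "/admin/cores?" + urlencode({"action": "RELOAD", "core": core})

def push_and_reload(solr_base, core, **zk):
    """Run both steps in order; raises if zkcli exits with an error."""
    subprocess.run(upconfig_cmd(**zk), check=True)       # step 1
    with urlopen(reload_url(solr_base, core)) as resp:   # step 2
        return resp.status
```

Nothing runs until `push_and_reload` is called, so the two builder functions can be checked on their own first.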





Re: Unable to update config file using zkcli or RELOAD

2015-04-03 Thread shamik
Thanks Shawn for the pointer, really appreciate it.





Unable to update config file using zkcli or RELOAD

2015-04-02 Thread Shamik Bandopadhyay
Hi,

  I'm facing a weird issue. I've a solr cloud cluster with 2 shards having
a replica each. I started the cluster
using -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf. After the cluster is up and running, I
added a new request handler (newhandler) and wanted to push it without
restarting the server. First, I tried the RELOAD option. I ran

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1

The command was successful, but when I logged in to the admin screen, the
solrconfig didn't show the request handler. Next I tried the zkcli script
on shard 1.

sh zkcli.sh -cmd upconfig -zkhost  zoohost1:2181 -confname myconf -solrhome
/mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/

The script ran successfully and I could see the updated solrconfig file in
Solr admin. But then, when I tried

http://54.151.xx.xxx:8983/solr/collection1/newhandler

I got a 404. Not sure what I'm doing wrong. Do I need to run the zkcli
script on each node? I'm using Solr 5.0.

Regards,
Shamik


Uneven index distribution using composite router

2015-03-26 Thread Shamik Bandopadhyay
Hi,

   I'm using a three level composite router in a solr cloud environment,
primarily for multi-tenant and field collapsing. The format is as follows.

*language!topic!url*.

An example would be :

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The Solr Cloud cluster contains 2 shards, each having 3 replicas. After
indexing around 10 million documents, I'm observing that the index size in
shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is
getting indexed in shard 1. Since 60% of the documents are English, I expect
the index size to be higher on one shard, but the difference seems a little
too high.

The idea is to make sure that all ENU!12345 documents are routed to one
shard so that distributed field collapsing works. Is there something I can
do differently here to make a better distribution ?

Any pointers will be appreciated.

Regards,
Shamik


Re: Uneven index distribution using composite router

2015-03-26 Thread shamik
Thanks for your reply Eric.

In my case, I've 14 languages, and 50% of the documents are in
English. German and CHS will probably constitute another 25%. I'm not using
copyField; rather, each language has its own dedicated fields such as title_enu,
text_enu, title_ger, text_ger, etc. Since I know the language prior to index
time, this works for me.

I've added one more sample key in the example. 

ENU!12345!www.testurl.com/enu/doc1 
ENU!12345!www.testurl.com/enu/doc10 
GER!12345!www.testurl.com/ger/doc2 
CHS!67890!www.testurl.com/chs/doc3 

As you can see, there are 2 documents in English having the same topic id
(12345). I added the topic id as part of the key to make sure that they
reside in the same shard, so that field collapsing works on the topic
id. I can perhaps drop the topic id from the composite key and only have
language and url, something like,

ENU!www.testurl.com/enu/doc1

But that'll probably not solve the distribution issue. You mentioned that when
you take over routing, making sure the distribution is even is now your
responsibility. I'm wondering, what's the best practice to make it happen ?
I can get away from the composite router and manually assign a bunch of languages
to a dedicated shard, both at index and query time. But I'm not sure
keeping a map is an efficient way of dealing with it.
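
As far as I understand the compositeId router, a three-level key takes the top 8 bits of the hash from the first part, the next 8 from the second and the remaining 16 from the third, so with only two shards the language prefix alone decides the shard. Here is a small sketch of that effect. It uses `zlib.crc32` as a stand-in for Solr's MurmurHash3 and a simplified unsigned hash range; both are assumptions, so the actual shard numbers will differ.

```python
import zlib

def _h(part):
    # Stand-in 32-bit hash (assumption: Solr itself uses MurmurHash3).
    return zlib.crc32(part.encode("utf-8")) & 0xFFFFFFFF

def composite_hash(doc_id):
    """Combine 8 bits of language, 8 bits of topic and 16 bits of url."""
    lang, topic, url = doc_id.split("!", 2)
    return (_h(lang) & 0xFF000000) | (_h(topic) & 0x00FF0000) | (_h(url) & 0x0000FFFF)

def shard_for(doc_id, num_shards=2):
    """Map the hash onto equal, contiguous slices of the 32-bit space."""
    return (composite_hash(doc_id) * num_shards) >> 32

for key in ("ENU!12345!www.testurl.com/enu/doc1",
            "ENU!67890!www.testurl.com/enu/doc10",
            "GER!12345!www.testurl.com/ger/doc2"):
    print(key, "-> shard", shard_for(key))
```

Under these assumptions every ENU key lands on the same shard regardless of topic or url, which would be consistent with one shard holding most of the data when English dominates.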





Uneven data distribution with composite router

2015-03-25 Thread Shamik Bandopadhyay
Hi,

   I'm using a three level composite router in a solr cloud environment,
primarily for multi-tenant and field collapsing. The format is as follows.

*language!topic!url*.

An example would be :

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The Solr Cloud cluster contains 2 shards, each having 3 replicas. After
indexing around 10 million documents, I'm observing that the index size in
shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is
getting indexed in shard 1. Since 60% of the documents are English, I expect
the index size to be higher on one shard, but the difference seems a little
too high.

The idea is to make sure that all ENU!12345 documents are routed to one
shard so that distributed field collapsing works. Is there something I can
do differently here to make a better distribution ?

Any pointers will be appreciated.

Regards,
Shamik


Problem with Terms Query Parser

2015-03-24 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use the Terms Query Parser for one of my use cases where I use
an implicit filter on a bunch of sources.

When I'm trying to run the following query,

fq={!terms f=Source}help,documentation,sfdc

I'm getting the following error.

<lst name="error"><str name="msg">Unknown query parser 'terms'</str><int
name="code">400</int></lst>

What am I missing here ? I'm using Solr 5.0.

Any pointers will be appreciated.

Regards,
Shamik


Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping

2015-03-12 Thread Shamik Bandopadhyay
Hi,

   I've a field which is being used for result grouping. Here's the field
definition.

<field name="ADDedup" type="string" indexed="true" stored="true"
    multiValued="false" required="false" omitNorms="true" docValues="true"/>

This started once I did a rolling update from 4.7 to 5.0. I started getting
the error on any group by query -- SolrDispatchFilter
null:java.lang.IllegalStateException: unexpected docvalues type NONE for
field 'ADSKDedup' (expected=SORTED). Use UninvertingReader or index with
docvalues.

Does this mean that I need to re-index documents to get over this error ?

Regards,
Shamik


Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping

2015-03-12 Thread shamik
Wow, optimize worked like a charm. This really addressed the docvalues
issue. A follow-up question: is it recommended to run optimize on a
production Solr index ? Also, in SolrCloud mode, do we need to run
optimize on each instance / each shard / any instance ?
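
For reference, an optimize can be triggered through the update handler. Here is a minimal sketch of building that request URL, assuming a collection named `collection1` on localhost (both placeholders, as is the helper name `optimize_url`):

```python
from urllib.parse import urlencode

def optimize_url(solr_base, collection, max_segments=1):
    """URL that asks the update handler to optimize the index."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return "%s/%s/update?%s" % (solr_base, collection, urlencode(params))

print(optimize_url("http://localhost:8983/solr", "collection1"))
```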

Appreciate your help Alex.





Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping

2015-03-12 Thread shamik
Well, I think I've narrowed down the issue. The error happens when I'm
trying to do a rolling update from Solr 4.7 (which is our current version)
to 5.0. I've been able to reproduce this a couple of times. If I do a fresh index
on 5.0, it works. Not sure if there's any other way to mitigate it.

I'd appreciate it if someone could share their experience with this.





Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping

2015-03-11 Thread shamik
Looks like it's happening for any field which is using docvalues.

java.lang.IllegalStateException: unexpected docvalues type NONE for field
'title_sort' (expected=SORTED). Use UninvertingReader or index with
docvalues.

Any idea ? 





Re: Solr 5.0 -- IllegalStateException: unexpected docvalues type NONE on result grouping

2015-03-11 Thread shamik
Thanks for your reply. Initially, I was under the impression that the issue
was related to grouping, as group queries were failing. Later, when I looked
further, I found that it's happening for any field for which docValues
is turned on. The second example I took was from another field. Here's a
full stack trace for another field using docvalues.

Field definition :

<field name="DocumentType" type="string" indexed="true" stored="true"
    multiValued="false" required="false" omitNorms="true" docValues="true" />


3/11/2015, 2:14:30 PM   ERROR   SolrDispatchFilter 
null:java.lang.IllegalStateException: unexpected docvalues type NONE for
field 'DocumentType' (expected=SORTED). Use UninvertingReader or index with
docvalues.

null:java.lang.IllegalStateException: unexpected docvalues type NONE for
field 'DocumentType' (expected=SORTED). Use UninvertingReader or index with
docvalues.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getSortedDocValues(FieldComparator.java:757)
at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:762)
at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
at org.apache.lucene.search.TopFieldCollector$NonScoringCollector.getLeafCollector(TopFieldCollector.java:141)
at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:99)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:583)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:284)
at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:231)
at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1766)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1502)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:586)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:511)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:227)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 

Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread shamik
The other way you can do that is to specify the startup parameters in
solr.in.sh. 

Example :

SOLR_MODE=solrcloud

ZK_HOST=zoohost1:2181,zoohost2:2181,zoohost3:2181

SOLR_PORT=4567

You can simply start solr by running ./solr start





Re: Does DocValues improve Grouping performance ?

2015-02-05 Thread shamik
Joel,

  To give you some context, we are running queries against 6 million
documents in a Solr cloud environment. The grouping is done to de-duplicate
content based on a unique field. Unfortunately, due to some requirement
constraints, the only way for us to run the de-duplication is at query
time.

The group numbers are pretty high in our case. The average number of distinct
groups is around 1000. The total number of distinct groups for the field is
around 10k. Phrase queries are especially bad, averaging a response time of
10-12 secs. Having said that, CollapsingQParserPlugin makes a huge difference in
performance, the only caveat being the lack of support for a group.facet
equivalent. I had this discussion with you earlier, where you had confirmed
it

http://lucene.472066.n3.nabble.com/RE-SOLR-6143-Bad-facet-counts-from-CollapsingQParserPlugin-td4140455.html#a4146645

Are there any plans to address this ? Not sure if it's a big change at your
end, but if there's something we can contribute to add it, I'm more than happy to
help. I know there are a bunch of people who are looking forward to this.





Include stopwords in phrase search

2015-02-04 Thread Shamik Bandopadhyay
Hi,

  I'm having an issue running phrase queries with stopwords. Looks like Solr
is ignoring the stopword during search. Here's my search term.

cannot open device

When I execute title:"cannot open device", it brings back titles
with "Find Open Devices". Here's my field definition for title :

<field name="title" type="adsktext" indexed="true" stored="true"
    multiValued="true"/>

<fieldType name="adsktext" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Sample text :

<doc>
  <field name="id">111!SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="title">Find Open Devices</field>
</doc>
<doc>
  <field name="id">333!SOLR1002</field>
  <field name="name">ElasticSearch Server</field>
  <field name="title">Cannot open device</field>
</doc>

I have "cannot" in my stopword list.

The weird part is, when I analyze the phrase in Solr admin, it's getting
indexed as the following three tokens :

cannot open devic

I'm on Solr 4.7, so I'm not sure if enablePositionIncrements="true" is making
any difference.

Any feedback will be appreciated.

Thanks,
Shamik


Re: Include stopwords in phrase search

2015-02-04 Thread shamik
Well, I somehow made it work by using CommonGramsFilterFactory.

<filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
    ignoreCase="true"/>

Just wondering if it's the right approach ?





Issue with Solr multiple sort

2015-01-21 Thread Shamik Bandopadhyay
Hi,

  I'm facing a problem with multiple field sort in Solr. I'm using the
following fields in sort :

PublishDate asc,DocumentType asc

The sort is only happening on PublishDate; DocumentType seems to be completely
ignored. Here's my field definition.

<field name="PublishDate" type="tdate" indexed="true" stored="true"
    default="NOW"/>
<field name="DocumentType" type="string" indexed="true" stored="true"
    multiValued="false" required="false" omitNorms="true"/>

Here's the sample query:

http://localhost:8983/solr/select?sort=PublishDate+desc%2CDocumentType+desc&q=cat:search&fl=PublishDate,DocumentType&debugQuery=true

Here's the output :

<result name="response" numFound="8" start="0">
  <doc>
    <date name="PublishDate">2015-01-17T00:00:00Z</date>
    <str name="DocumentType">Hotfixes</str>
  </doc>
  <doc>
    <date name="PublishDate">2014-11-17T00:00:00Z</date>
    <str name="DocumentType">Hotfixes</str>
  </doc>
  <doc>
    <date name="PublishDate">2013-01-17T00:00:00Z</date>
    <str name="DocumentType">Tutorials</str>
  </doc>
  <doc>
    <date name="PublishDate">2012-10-17T00:00:00Z</date>
    <str name="DocumentType">Service Packs</str>
  </doc>
  <doc>
    <date name="PublishDate">2012-01-17T00:00:00Z</date>
    <str name="DocumentType">Tutorials</str>
  </doc>
  <doc>
    <date name="PublishDate">2011-01-17T00:00:00Z</date>
    <str name="DocumentType">Tutorials</str>
  </doc>
  <doc>
    <date name="PublishDate">2006-01-17T00:00:00Z</date>
    <str name="DocumentType">Object Enablers</str>
  </doc>
  <doc>
    <date name="PublishDate">2006-01-17T00:00:00Z</date>
    <str name="DocumentType">Hotfixes</str>
  </doc>
</result>

As you can see, the sorting happened only on PublishDate. I'm using Solr
4.7.

Not sure what I'm missing here, any pointers will be appreciated.

Thanks,
Shamik


Re: Issue with Solr multiple sort

2015-01-21 Thread shamik
Thanks Hoss for clearing up my doubt. I was confused with the ordering. So I
guess, the first field is always the primary sort field followed by
secondary.
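
To illustrate that ordering outside Solr, here is a small sketch with a few rows taken from my sample output. The function name is made up; the point is only that the secondary field breaks ties among equal primary values and nothing else.

```python
docs = [
    ("2015-01-17T00:00:00Z", "Hotfixes"),
    ("2006-01-17T00:00:00Z", "Hotfixes"),
    ("2012-10-17T00:00:00Z", "Service Packs"),
    ("2006-01-17T00:00:00Z", "Object Enablers"),
]

def multi_sort_desc(rows):
    # Tuples compare element by element, so DocumentType is only
    # consulted when two rows share the same PublishDate.
    return sorted(rows, reverse=True)

for publish_date, document_type in multi_sort_desc(docs):
    print(publish_date, document_type)
```

Only the two 2006-01-17 rows are affected by DocumentType, which matches what the query above returned.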

Thanks again.





Re: Conditions in function query

2015-01-15 Thread shamik
This one worked.

if(termfreq(Source,'A'),sum(Likes,3),if(termfreq(Source,'B'),sum(Likes,3),0))
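
Since the nesting gets hard to read with more sources, the expression can be generated. Here is a sketch, with a hypothetical helper name and the field names from the function above as defaults:

```python
def conditional_boost(boosts, source_field="Source", value_field="Likes", default="0"):
    """Nest one if(termfreq(...)) per source, falling through to default."""
    expr = default
    # Build from the innermost else-branch outwards.
    for source, addend in reversed(list(boosts.items())):
        expr = "if(termfreq(%s,'%s'),sum(%s,%d),%s)" % (
            source_field, source, value_field, addend, expr)
    return expr

print(conditional_boost({"A": 3, "B": 3}))
```

With `{"A": 3, "B": 3}` this produces the same string as the function above.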





Does DocValues improve Grouping performance ?

2015-01-15 Thread Shamik Bandopadhyay
Hi,

   Does the use of DocValues provide any performance improvement for Grouping ?
I looked into the blog which mentions improving Grouping performance
through DocValues.

https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/

Right now, group by queries (which I sadly can't avoid) have become a huge
bottleneck. They have an overhead of 60-70% compared to the same query sans
group by. Unfortunately, I'm not able to use CollapsingQParserPlugin as it
doesn't have support for anything similar to the group.facet feature.

My understanding of DocValues is that it's intended for faceting and
sorting. Just wondering if anyone has tried DocValues for Grouping and seen
any improvements ?

-Thanks,
Shamik


Conditions in function query

2015-01-14 Thread Shamik Bandopadhyay
Hi,

   Just wanted to know if it's possible to provide conditions with a
function query. Right now, I'm using the following functions to boost on
Likes data.

bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0 sum(Likes,2)

What I would like to do is to apply the boost on Likes based on source.
For e.g.

if Source=A or B or C, then sum(Likes,4)
if Source=D then sum(Likes,3)
if Source=E then sum(Likes,2).

Is it possible to do this using a function ?

Any pointers will be appreciated.
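
For reference, a small sketch of what the recip term in the bf above evaluates to. Solr's recip(x,m,a,b) computes a / (m*x + b); with m=3.16e-11 (roughly one over the number of milliseconds in a year) and a=b=1, a document published today gets 1.0 and a one-year-old document gets about 0.5. The helper names below are made up for illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    # Same formula as Solr's recip(x, m, a, b) function: a / (m * x + b).
    return a / (m * x + b)


def date_boost(age_days):
    # Mirrors recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1) for a document
    # that is age_days old (before the ^2.0 weight is applied).
    return recip(age_days * MS_PER_DAY, 3.16e-11, 1, 1)


for days in (0, 365, 730):
    print(days, round(date_boost(days), 4))
```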

Regards,
Shamik


Re: Conditions in function query

2015-01-14 Thread shamik
Thanks Eric, I did take a look at the if condition earlier, but I'm not sure
how it can be used for multiple conditions. It works for a single
condition:

 if(termfreq(Source2,'A'),sum(Likes,3),0)

But for multiple conditions, I'm struggling to find the right syntax. I tried
combining them with OR, but that hasn't worked so far.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Conditions-in-Boost-function-query-tp4179687p4179696.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Anyone ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4174069.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Ted,

  Here's the query I'm using and the debug info. It's still returning all 5
results, as if it's simply looking for either of the terms with q.op set
to OR (the default).

http://localhost:8983/solr/autophrase?q=text:seat+cushions&wt=xml&debugQuery=true

Debug

lst name=debug
str name=rawquerystringtext:seat cushions/str
str name=querystringtext:seat cushions/str
str name=parsedquerytext:seat text:cushion/str
str name=parsedquery_toStringtext:seat text:cushion/str
lst name=explain
str name=2
0.430151 = (MATCH) sum of:
  0.11124363 = (MATCH) weight(text:seat in 1) [DefaultSimilarity], result
of:
0.11124363 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
  0.5085423 = queryWeight, product of:
1.0 = idf(docFreq=5, maxDocs=6)
0.5085423 = queryNorm
  0.21875 = fieldWeight in 1, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
1.0 = idf(docFreq=5, maxDocs=6)
0.21875 = fieldNorm(doc=1)
  0.31890735 = (MATCH) weight(text:cushion in 1) [DefaultSimilarity], result
of:
0.31890735 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
  0.86103696 = queryWeight, product of:
1.6931472 = idf(docFreq=2, maxDocs=6)
0.5085423 = queryNorm
  0.37037593 = fieldWeight in 1, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
1.6931472 = idf(docFreq=2, maxDocs=6)
0.21875 = fieldNorm(doc=1)
/str
str name=6
0.430151 = (MATCH) sum of:
  0.11124363 = (MATCH) weight(text:seat in 5) [DefaultSimilarity], result
of:
0.11124363 = score(doc=5,freq=1.0 = termFreq=1.0
), product of:
  0.5085423 = queryWeight, product of:
1.0 = idf(docFreq=5, maxDocs=6)
0.5085423 = queryNorm
  0.21875 = fieldWeight in 5, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
1.0 = idf(docFreq=5, maxDocs=6)
0.21875 = fieldNorm(doc=5)
  0.31890735 = (MATCH) weight(text:cushion in 5) [DefaultSimilarity], result
of:
0.31890735 = score(doc=5,freq=1.0 = termFreq=1.0
), product of:
  0.86103696 = queryWeight, product of:
1.6931472 = idf(docFreq=2, maxDocs=6)
0.5085423 = queryNorm
  0.37037593 = fieldWeight in 5, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
1.6931472 = idf(docFreq=2, maxDocs=6)
0.21875 = fieldNorm(doc=5)
/str
str name=1
0.06356779 = (MATCH) product of:
  0.12713557 = (MATCH) sum of:
0.12713557 = (MATCH) weight(text:seat in 0) [DefaultSimilarity], result
of:
  0.12713557 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
0.5085423 = queryWeight, product of:
  1.0 = idf(docFreq=5, maxDocs=6)
  0.5085423 = queryNorm
0.25 = fieldWeight in 0, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  1.0 = idf(docFreq=5, maxDocs=6)
  0.25 = fieldNorm(doc=0)
  0.5 = coord(1/2)
/str
str name=3
0.06356779 = (MATCH) product of:
  0.12713557 = (MATCH) sum of:
0.12713557 = (MATCH) weight(text:seat in 2) [DefaultSimilarity], result
of:
  0.12713557 = score(doc=2,freq=1.0 = termFreq=1.0
), product of:
0.5085423 = queryWeight, product of:
  1.0 = idf(docFreq=5, maxDocs=6)
  0.5085423 = queryNorm
0.25 = fieldWeight in 2, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  1.0 = idf(docFreq=5, maxDocs=6)
  0.25 = fieldNorm(doc=2)
  0.5 = coord(1/2)
/str
str name=5
0.055621814 = (MATCH) product of:
  0.11124363 = (MATCH) sum of:
0.11124363 = (MATCH) weight(text:seat in 4) [DefaultSimilarity], result
of:
  0.11124363 = score(doc=4,freq=1.0 = termFreq=1.0
), product of:
0.5085423 = queryWeight, product of:
  1.0 = idf(docFreq=5, maxDocs=6)
  0.5085423 = queryNorm
0.21875 = fieldWeight in 4, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  1.0 = idf(docFreq=5, maxDocs=6)
  0.21875 = fieldNorm(doc=4)
  0.5 = coord(1/2)
/str
/lst
str name=QParserLuceneQParser/str
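
To make the explain output above easier to read, here is a rough sketch that recomputes two of the scores from the numbers shown. It assumes the classic DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)) and takes queryNorm and fieldNorm straight from the debug output; the function names are made up for illustration.

```python
import math


def idf(doc_freq, max_docs):
    # DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # Per-term score = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


QUERY_NORM = 0.5085423  # copied from the debug output
MAX_DOCS = 6

# Document "2" matches both seat and cushion, so both terms are summed.
doc2 = (term_score(1.0, 5, MAX_DOCS, QUERY_NORM, 0.21875)
        + term_score(1.0, 2, MAX_DOCS, QUERY_NORM, 0.21875))

# Document "1" matches only seat, so its score is scaled by coord(1/2).
doc1 = term_score(1.0, 5, MAX_DOCS, QUERY_NORM, 0.25) * 0.5

print(round(doc2, 6), round(doc1, 8))
```

The point is that the parsed query is a plain OR of text:seat and text:cushion: documents with only one of the two terms still match, just with a lower score because of the coord factor.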


Sample data

 add
  doc
field name=id1/field
field name=nameDoc 1/field
field name=textThis has a rear window defroster and really cool
bucket seats./field
  /doc
  doc
field name=id2/field
field name=nameDoc 2/field
field name=textThis one has rear seat cushions and air conditioning
– what a ride!/field
  /doc
  doc
field name=id3/field
field name=nameDoc 3/field
field name=textThis one has gold seat belts front and rear./field
  /doc
  doc
field name=id4/field
field name=nameDoc 4/field
field name=textThis one has front and side air bags and a heated
seat.The fan belt never breaks./field
  /doc
doc
field name=id5/field
field name=nameDoc 5/field
field name=textThis one has big rear wheels and a seat cushion.It
doesn't have a timing belt./field
  /doc
   

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Jim,

  Thanks for your response. I've tried including
AutoPhrasingTokenFilterFactory as part of the query analyzer, but it didn't
make any difference.

fieldType name=text_autophrase class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.LowerCaseFilterFactory /
filter 
class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory
phrases=autophrases.txt includeTokens=true /
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.SynonymFilterFactory 
synonyms=synonyms.txt
ignoreCase=true expand=true /
filter class=solr.PorterStemFilterFactory/
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.LowerCaseFilterFactory /
filter 
class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory
phrases=autophrases.txt includeTokens=true /
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.PorterStemFilterFactory/
/analyzer
/fieldType

I'll try out your version and post my observations. Just curious, what version
of Solr are you using?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4174096.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-12 Thread shamik
Ted,

  Thanks a lot, I had gone through your blogs but the white space issue
slipped out of my mind. replaceWhitespaceWith addressed the issue. I think
it's a great filter to have, surely takes care of an important use case.
Appreciate your help.
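
For anyone reading later, a conceptual sketch of why the whitespace replacement matters. This is not the filter's actual implementation, only an illustration of the idea: a known multi-word phrase is collapsed into a single token, with the whitespace replaced by a character (an underscore here, an arbitrary choice), so that a whitespace-splitting query parser cannot break it apart again.

```python
def autophrase(tokens, phrases, replace_whitespace_with="_"):
    """Collapse known multi-word phrases in a token list into single tokens."""
    # Longest phrases first, so that longer matches win over shorter ones.
    phrase_tuples = sorted(
        (tuple(p.split()) for p in phrases if p.split()),
        key=len,
        reverse=True,
    )
    out, i = [], 0
    while i < len(tokens):
        for phrase in phrase_tuples:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.append(replace_whitespace_with.join(phrase))
                i += len(phrase)
                break
        else:
            # No phrase starts at this position; keep the token as it is.
            out.append(tokens[i])
            i += 1
    return out


print(autophrase("rear seat cushions and air".split(), ["seat cushions"]))
```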

-Shamik 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4174113.html
Sent from the Solr - User mailing list archive at Nabble.com.


Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-11 Thread shamik
Hi, 

  I'm trying to use AutoPhrasingTokenFilterFactory, which seems to be a 
great solution to our phrase query issues. But it doesn't seem to work as 
described in the blog : 

https://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

The filter is working as expected at index time, where it's 
preserving the phrases as a single token based on the text file. Here's my 
field definition : 

fieldType name=text_autophrase class=solr.TextField 
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.LowerCaseFilterFactory /
filter 
class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory 
phrases=autophrases.txt includeTokens=true /
filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.SynonymFilterFactory 
synonyms=synonyms.txt 
ignoreCase=true expand=true /
filter class=solr.KStemFilterFactory /
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.LowerCaseFilterFactory /
filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.KStemFilterFactory /
/analyzer
/fieldType

On analyzing, I can see the phrase seat cushions (defined in 
autophrases.txt) is being indexed as seat, seat cushions and cushion. 

The problem is during the query time. As per the blog, the request handler 
needs to use a custom query parser to achieve the result. Here's my entry 
in solrconfig. 

requestHandler name=/autophrase class=solr.SearchHandler
lst name=defaults

str name=wtvelocity/str
str name=v.templatebrowse/str
str name=v.layoutlayout/str
str name=titleSolritas/str

str name=echoParamsexplicit/str
int name=rows10/int
str name=dftext/str
str name=defTypeautophrasingParser/str
/lst
/requestHandler

queryParser name=autophrasingParser 
class=com.lucidworks.analysis.AutoPhrasingQParserPlugin 
str name=phrasesautophrases.txt/str
/queryParser

But if I query seat cushions using this request handler, it seems to 
treat the query as two separate terms and returns all results 
matching seat and cushion. Not sure what I'm missing here. I'm using 
Solr 4.10. 

The other question I had is whether 
com.lucidworks.analysis.AutoPhrasingQParserPlugin supports the edismax 
features, since edismax is my default parser. 

I'd appreciate it if anyone could provide feedback. 

-Thanks 
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808.html
Sent from the Solr - User mailing list archive at Nabble.com.


Has anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2014-12-10 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use AutoPhrasingTokenFilterFactory, which seems to be a
great solution to our phrase query issues. But it doesn't seem to work as
described in the blog :

https://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

The filter is working as expected at index time, where it's
preserving the phrases as a single token based on the text file. Here's my
field definition :

fieldType name=text_autophrase class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.LowerCaseFilterFactory /
filter class=com.lucidworks.analysis.AutoPhrasingTokenFilterFactory
phrases=autophrases.txt includeTokens=true /
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
ignoreCase=true expand=true /
filter class=solr.KStemFilterFactory /
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.LowerCaseFilterFactory /
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.KStemFilterFactory /
/analyzer
/fieldType

On analyzing, I can see the phrase seat cushions (defined in
autophrases.txt) is being indexed as seat, seat cushions and cushion.

The problem is during the query time. As per the blog, the request handler
needs to use a custom query parser to achieve the result. Here's my entry
in solrconfig.

requestHandler name=/autophrase class=solr.SearchHandler
lst name=defaults
!-- VelocityResponseWriter settings --
str name=wtvelocity/str
str name=v.templatebrowse/str
str name=v.layoutlayout/str
str name=titleSolritas/str

str name=echoParamsexplicit/str
int name=rows10/int
str name=dftext/str
str name=defTypeautophrasingParser/str
/lst
/requestHandler

queryParser name=autophrasingParser
class=com.lucidworks.analysis.AutoPhrasingQParserPlugin 
str name=phrasesautophrases.txt/str
/queryParser

But if I query seat cushions using this request handler, it seems to
treat the query as two separate terms and returns all results
matching seat and cushion. Not sure what I'm missing here. I'm using
Solr 4.10.

The other question I had is whether
com.lucidworks.analysis.AutoPhrasingQParserPlugin supports the edismax
features, since edismax is my default parser.

I'd appreciate it if anyone could provide feedback.

-Thanks
Shamik


Highlighting simple.pre and simple.post values getting ignored

2014-11-10 Thread Shamik Bandopadhyay
Hi,

  I'm facing a weird issue where the specified hl.simple.pre and
hl.simple.post values for highlighting are getting ignored. In my test
handler, I have the following entry:

<!-- Highlighting defaults -->
<str name="hl">true</str>
<str name="hl.simple.pre"><![CDATA[<span class="vivbold qt0">]]></str>
<str name="hl.simple.post"><![CDATA[</span>]]></str>
<str name="hl.fl">name subject</str>
<str name="hl.encoder">html</str>
<str name="f.subject.hl.fragsize">200</str>
<str name="hl.usePhraseHighlighter">false</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>


 searchComponent class=solr.HighlightComponent name=highlight
highlighting
  fragmenter name=gap
  default=true
  class=solr.highlight.GapFragmenter
lst name=defaults
  int name=hl.fragsize100/int
/lst
  /fragmenter

  fragmenter name=regex
  class=solr.highlight.RegexFragmenter
lst name=defaults
  int name=hl.fragsize70/int
  float name=hl.regex.slop0.5/float
  str name=hl.regex.pattern[-\w ,/\n\quot;apos;]{20,200}/str
/lst
  /fragmenter

  formatter name=html
 default=true
 class=solr.highlight.HtmlFormatter
lst name=defaults
  str name=hl.simple.pre![CDATA[span class=vivbold
qt0]]/str
  str name=hl.simple.post![CDATA[/span]]/str
/lst
  /formatter

  encoder name=html
   class=solr.highlight.HtmlEncoder /

  fragListBuilder name=simple
   class=solr.highlight.SimpleFragListBuilder/

  fragListBuilder name=single
   class=solr.highlight.SingleFragListBuilder/

  fragListBuilder name=weighted
   default=true
   class=solr.highlight.WeightedFragListBuilder/

  !-- default tag FragmentsBuilder --
  fragmentsBuilder name=default
default=true
class=solr.highlight.ScoreOrderFragmentsBuilder
  /fragmentsBuilder

  !-- multi-colored tag FragmentsBuilder --
  fragmentsBuilder name=colored
class=solr.highlight.ScoreOrderFragmentsBuilder
lst name=defaults
  str name=hl.tag.pre![CDATA[
   b style=background:yellow,b
style=background:lawgreen,
   b style=background:aquamarine,b
style=background:magenta,
   b style=background:palegreen,b
style=background:coral,
   b style=background:wheat,b style=background:khaki,
   b style=background:lime,b
style=background:deepskyblue]]/str
  str name=hl.tag.post![CDATA[/b]]/str
/lst
  /fragmentsBuilder

  boundaryScanner name=default
   default=false
   class=solr.highlight.SimpleBoundaryScanner
lst name=defaults
  str name=hl.bs.maxScan10/str
  str name=hl.bs.chars.,!? #9;#10;#13;/str
/lst
  /boundaryScanner

  boundaryScanner name=breakIterator
   class=solr.highlight.BreakIteratorBoundaryScanner
lst name=defaults
  !-- type should be one of CHARACTER, WORD(default), LINE and
SENTENCE --
  str name=hl.bs.typeSENTENCE/str
  !-- language and country are used when constructing Locale
object.  --
  !-- And the Locale object will be used when getting instance of
BreakIterator --
  str name=hl.bs.languageen/str
  str name=hl.bs.countryUS/str
/lst
  /boundaryScanner
/highlighting
  /searchComponent

As you can see, I've specified the simple.pre and simple.post values in the
request handler as well as under the standard formatter.

But the search result always wraps the term with <em></em>, and I'm not sure
where this value is coming from. There's no reference to it in the solrconfig
file. Looks like it's ignoring the value from solrconfig and defaulting
to em.

Can someone provide any pointers? I'm using Solr 4.7.

Thanks,
Shamik


Re: Highlighting simple.pre and simple.post values getting ignored

2014-11-10 Thread shamik
Looks like this has to do with the selection of the fast vector highlighter and
breakIterator as the boundary scanner. I'm using them to make sure that the
highlighted snippet starts from the beginning of a sentence and not from
the middle.

<str name="hl.usePhraseHighlighter">false</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>

Now, if I don't use them, I'm getting the right pre and post tags.

str name=hlon/str
str name=hl.fltitle name/str
str name=hl.encoderhtml/str
str name=hl.simple.pre/str
str name=hl.simple.post/str
str name=f.title.hl.fragsize0/str
str name=f.title.hl.alternateFieldmanu/str
str name=f.name.hl.fragsize0/str
str name=f.name.hl.alternateFieldname/str
str name=f.content.hl.snippets3/str
str name=f.content.hl.fragsize200/str


Do I need a separate setting for breakIterator to support custom pre and
post tags?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-simple-pre-and-simple-post-values-getting-ignored-tp4168657p4168662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting simple.pre and simple.post values getting ignored

2014-11-10 Thread shamik
Found the issue: with FastVectorHighlighter, the pre and post tags are set
through different parameters:

str name=hl.tag.pre/str
str name=hl.tag.post/str

This worked out as expected.
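
A small sketch of the difference, for anyone hitting the same thing: the standard highlighter reads hl.simple.pre / hl.simple.post, while FastVectorHighlighter reads hl.tag.pre / hl.tag.post. The helper name highlight_params is made up, and the span markup is a guess at the tag from the original post, since the archive stripped the markup.

```python
def highlight_params(pre, post, use_fvh):
    """Return highlighting request parameters for the chosen highlighter."""
    params = {
        "hl": "true",
        "hl.useFastVectorHighlighter": str(use_fvh).lower(),
    }
    if use_fvh:
        # FastVectorHighlighter takes its tags from hl.tag.pre / hl.tag.post.
        params["hl.tag.pre"] = pre
        params["hl.tag.post"] = post
    else:
        # The standard highlighter uses hl.simple.pre / hl.simple.post.
        params["hl.simple.pre"] = pre
        params["hl.simple.post"] = post
    return params


fvh = highlight_params('<span class="vivbold qt0">', "</span>", use_fvh=True)
print(fvh)
```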



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-simple-pre-and-simple-post-values-getting-ignored-tp4168657p4168663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost Query (bq) syntax/usage

2014-10-01 Thread shamik
Thanks a lot Jack, it makes total sense. I checked the config and the default q.op
was set to OR, which was influencing the query.
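
For reference, a minimal sketch of building the request with each boost sent as its own bq parameter, which is the second form discussed in this thread. The helper name build_query is hypothetical; the field names and weights are the ones from the thread. Note that urlencode percent-encodes the ":" and "^" characters, which Solr decodes again on its side.

```python
from urllib.parse import urlencode


def build_query(q, boosts, debug=True):
    """Build a query string with one bq parameter per boost clause."""
    params = [("q", q)]
    # Each (field, value, weight) triple becomes its own bq=field:value^weight.
    params += [("bq", f"{field}:{value}^{weight}")
               for field, value, weight in boosts]
    if debug:
        params.append(("debugQuery", "true"))
    return urlencode(params)


qs = build_query(
    "Application Manager",
    [("Source2", "sfdc", 6), ("Source2", "downloads", 5), ("Source2", "topics", 3)],
)
print("http://localhost:8983/solr/testhandler?" + qs)
```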



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-Query-bq-syntax-usage-tp4161989p4162169.html
Sent from the Solr - User mailing list archive at Nabble.com.


Boost Query (bq) syntax/usage

2014-09-30 Thread shamik
Hi,

  I'm a little confused about the right syntax for defining boost queries. If I
use them in the following way:

http://localhost:8983/solr/testhandler?q=Application+Manager&bq=(Source2:sfdc^6
Source2:downloads^5 Source2:topics^3)&debugQuery=true

it gets translated to --

<arr name="parsed_boost_queries">
   <str>
   +Source2:sfdc^6.0 +Source2:downloads^5.0 +Source2:topics^3.0
   </str>
</arr>

Now, if I use the following query:

http://localhost:8983/solr/testhandler?q=Application+Manager&bq=Source2:sfdc^6&bq=Source2:downloads^5&bq=Source2:topics^3&debugQuery=true

it gets translated as --

<arr name="parsed_boost_queries">
<str>Source2:sfdc^6.0</str>
<str>Source2:downloads^5.0</str>
<str>Source2:topics^3.0</str>
</arr>

The two queries generate different results in terms of relevancy. Just wondering
what the right way of using bq is?

-Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-Query-bq-syntax-usage-tp4161988.html
Sent from the Solr - User mailing list archive at Nabble.com.


Boost Query (bq) syntax/usage

2014-09-30 Thread Shamik Bandopadhyay
Hi,

  I'm a little confused about the right syntax for defining boost queries. If I
use them in the following way:

http://localhost:8983/solr/testhandler?q=Application+Manager&bq=(Source2:sfdc^6
Source2:downloads^5 Source2:topics^3)&debugQuery=true

it gets translated to --

<arr name="parsed_boost_queries">
   <str>
   +Source2:sfdc^6.0 +Source2:downloads^5.0 +Source2:topics^3.0
   </str>
</arr>

Now, if I use the following query:

http://localhost:8983/solr/testhandler?q=Application+Manager&bq=Source2:sfdc^6&bq=Source2:downloads^5&bq=Source2:topics^3&debugQuery=true

it gets translated as --

<arr name="parsed_boost_queries">
<str>Source2:sfdc^6.0</str>
<str>Source2:downloads^5.0</str>
<str>Source2:topics^3.0</str>
</arr>

The two queries generate different results in terms of relevancy. Just
wondering what the right way of using bq is?

-Thanks


Re: Boost Query (bq) syntax/usage

2014-09-30 Thread shamik
Thanks a lot Jack, makes sense. Just curious, if we use the following bq
entry in the solrconfig xml 

<str name="bq">Source2:sfdc^6 Source2:downloads^5 Source2:topics^3</str> 

will it always be treated as an AND query? Some of my local results suggest
otherwise.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-Query-bq-syntax-usage-tp4161989p4161994.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr query field (qf) conditional boost

2014-09-29 Thread Shamik Bandopadhyay
Hi,

  I'm trying to check if it's possible to include conditional boosting in
the Solr qf parameter. For example, I have the following entry in the qf parameter.

<str name="qf">text^0.5 title^10.0 ProductLine^5</str>

What I'm looking for is to add the ProductLine boost only for a given Author
field, something along the lines of: boost ProductLine^5 if Author:Tom.

I've been using similar filtering in the appends section, but I'm not sure how
to do it in qf or whether it's possible.


<lst name="appends">
<str name="fq">Author:(Tom  +Solution:yes) </str>
</lst>

Any pointers will be appreciated.

Thanks,
Shamik


RE: Solr query field (qf) conditional boost

2014-09-29 Thread shamik
Thanks Markus. Well, I tried using a conditional if-else function, but it
doesn't seem to work for boosting field. What I'm trying to do is boost
ProductLine field by 5, if the result documents contain Author = 'Tom'.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-field-qf-conditional-boost-tp4161783p4161797.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr query field (qf) conditional boost

2014-09-29 Thread shamik
Thanks Markus, let me play around with the functions and see if I can achieve
the results.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-field-qf-conditional-boost-tp4161783p4161803.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to query certain fields filtered by a condition

2014-09-29 Thread Shamik Bandopadhyay
Hi,

  Just wanted to understand if it's possible to limit a searchable field
only to specific documents at query time. The following are my searchable
fields.

<str name="qf">text^0.5 title^10.0 country^1.0</str>

What I want is to make country a searchable field only for documents which
contain author:Robert. For the remaining documents, country should not be
considered a searchable field; only text and title will come into play.
So if I search for usa, it should bring back results from documents where
author=Robert (by matching the country field), but not from the remaining authors,
even if they have a country field with the value usa.

I don't know how this can be done at query time, or whether it's possible at all
through some function queries. The other option is to add the country value
as part of title or text for documents containing Author:Robert at
index time. But I would like to know if it's possible at query time.

Appreciate your feedback.

-Thanks,
Shamik


Re: How to query certain fields filtered by a condition

2014-09-29 Thread shamik
Thanks Jack for your reply ... I'm sorry but I'm not too clear on the
solution you proposed. Can you please provide a sample on what you suggested
?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-query-certain-fields-filtered-by-a-condition-tp4161815p4161827.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Czech stemmer

2014-09-11 Thread shamik
Lucas,
 
  Thanks for the information. I took the dictionary and used hunspell
stemmer. It worked for the use-case I had mentioned, i.e. posunout and
posunulo. But, it had an impact on other search terms. For e.g. a search
term ukončit or ukončí is not returning any result, though they work
with CzechStemFilterFactory. I know there'll be trade-offs with various
stemmers, but not sure which one fits the bill. Being an alien to Czech
language doesn't help the cause either.

Thanks,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Czech-stemmer-tp4157675p4158301.html
Sent from the Solr - User mailing list archive at Nabble.com.


Czech stemmer

2014-09-09 Thread Shamik Bandopadhyay
Hi,

  I'm facing stemming issues with the Czech language search. Solr/Lucene
currently provides CzechStemFilterFactory as the sole option. Snowball
Porter doesn't seem to be available for Czech. Here's the issue.

I'm trying to search for posunout (means move in English), which returns
results, but the search fails if I use posunulo (means moved in English). I used the
following text as the field content for the search.

Pomocí multifunkčních uzlů je možné odkazy mnoha způsoby upravovat. Můžete
přidat a odstranit odkazy, přidat a odstranit vrcholy, prodloužit nebo
přesunout prodloužení čáry nebo přesunout text odkazu. Přístup k požadované
možnosti získáte po přesunutí ukazatele myši na uzel. Z uzlu prodloužení
čáry můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout prodloužení
odkazové čáry. Délka prodloužení čáry: Umožňuje prodloužit prodloužení
čáry. Přidat odkaz: Umožňuje přidat jednu nebo více odkazových čar. Z uzlu
koncového bodu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje
posunout koncový bod odkazové čáry. Přidat vrchol: Umožňuje přidat vrchol k
odkazové čáře. Odstranit odkaz: Umožňuje odstranit vybranou odkazovou čáru.
Z uzlu vrcholu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje
posunout vrchol. Přidat vrchol: Umožňuje přidat vrchol na odkazovou čáru.
Odstranit vrchol: Umožňuje odstranit vrchol. 

Just wondering if there's a different stemmer available or a way to address
this.

Schema :

fieldType name=text_csy class=solr.TextField
positionIncrementGap=100 autoGeneratePhraseQueries=true 
analyzer  type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_cz.txt /
filter class=solr.SynonymFilterFactory synonyms=synonyms_csy.txt
ignoreCase=true expand=true/
filter class=solr.CzechStemFilterFactory/
/analyzer
analyzer type=query
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_cz.txt /
filter class=solr.CzechStemFilterFactory/
/analyzer
/fieldType

Any pointers will be appreciated.

- Thanks,
Shamik


Re: solr query gives different numFound upon refreshing

2014-09-04 Thread shamik
I've noticed similar behavior with our SolrCloud cluster for a while; it's
random though. We have 2 shards with 3 replicas each. At times, I've observed
that the same query, on refresh, will return a different result count (numFound) as
well as different content. The only way to mitigate it is to re-index
the documents until the nodes are in sync. I always use SolrJ, which talks to
Solr through ZooKeeper; even with that, it seemed to be unavoidable at times.
We are committing every 10 mins. I'm pretty sure there's a minor glitch
which creates a sync issue at times. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-query-gives-different-numFound-upon-refreshing-tp4155414p4157026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR-6143 Bad facet counts from CollapsingQParserPlugin

2014-07-10 Thread shamik
Are there any plans to release this fix anytime soon? I think this is
pretty important, as a lot of search use cases depend on the facet
counts being returned with the search result. This issue renders the
CollapsingQParserPlugin pretty much unusable. I'm now reverting to the
old group query (painfully slow) since I can't use the facet counts anymore.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/RE-SOLR-6143-Bad-facet-counts-from-CollapsingQParserPlugin-tp4140455p4146645.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does solrj support partial update for solr cloud?

2014-07-10 Thread shamik
Yes it does, and it's pretty straightforward.

Refer to the following URLs:

http://heliosearch.org/solr/atomic-updates/

http://www.mumuio.com/solrj-4-0-0-alpha-atomic-updates/
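
As a rough illustration of what an atomic (partial) update looks like on the wire, here is a sketch that builds the JSON body for Solr's update handler. The document id and the Likes field are placeholders, and the helper name atomic_update is made up; the set / add / inc modifiers are the standard atomic update operations. In SolrJ, the same shape is expressed by putting a one-entry map such as {"inc": 5} as the field value of a SolrInputDocument.

```python
import json


def atomic_update(doc_id, field, value, op="set"):
    """Build one atomic-update document: only the named field is modified."""
    if op not in ("set", "add", "inc"):
        raise ValueError(f"unsupported atomic update modifier: {op}")
    # The field value is a one-entry map of {modifier: value}.
    return {"id": doc_id, field: {op: value}}


# Placeholder id and field name, for illustration only.
payload = json.dumps([atomic_update("doc-1", "Likes", 5, op="inc")])
print(payload)
```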



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-solrj-support-partial-update-for-solr-cloud-tp4146654p4146660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get related facets using Solr query ?

2014-07-04 Thread shamik
Thanks for the pointer Eric. You are right, I forgot to include IJK under
AB. Also, the facet field names are different. Unfortunately, I'm using
SolrCloud and facet pivot doesn't seem to work in distributed mode. I do
get some results back if I use distrib=false, but then it's not the right
data.
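
For reference, a client-side sketch of the grouping I'm after, in case pivot faceting isn't an option. It assumes the documents are fetched with both facet fields and grouped in application code; the field names primary and secondary are placeholders, not the real schema names.

```python
from collections import defaultdict

# The three sample documents from the original question.
docs = [
    {"id": "A", "primary": ["AB"], "secondary": ["MNO"]},
    {"id": "B", "primary": ["CD"], "secondary": ["XYZ"]},
    {"id": "C", "primary": ["AB", "CD"], "secondary": ["IJK", "XYZ"]},
]


def related_facets(rows):
    """Map each primary facet value to the secondary values seen with it."""
    related = defaultdict(set)
    for row in rows:
        for value in row["primary"]:
            related[value].update(row["secondary"])
    # Sort keys and values so the output is stable.
    return {key: sorted(values) for key, values in sorted(related.items())}


grouped = related_facets(docs)
for key, values in grouped.items():
    print(key, ",".join(values))
```

This only works on the rows actually fetched, so it doesn't scale like server-side faceting does.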



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-related-facets-using-Solr-query-tp4145580p4145684.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MLT weird behaviour in Solrcloud

2014-07-03 Thread shamik
Anyone ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MLT-weird-behaviour-in-Solrcloud-tp4145066p4145502.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to get related facets using Solr query ?

2014-07-03 Thread Shamik Bandopadhyay
Hi,

   I'm trying to construct a facet query to organize related facets in the
response. Let me illustrate with a sample. Let's say I have the following
documents indexed in Solr.

1. Doc A --
  Facet:AB
  Facet:MNO

2. Doc B --
  Facet:CD
  Facet:XYZ

3. Doc C -- Facet:AB,CD
   Facet:IJK, XYZ


Now, I want the result organized as :

AB
MNO,XYZ
CD
IJK,XYZ

Is there a way to do this ?

Thanks,
Shamik


MLT weird behaviour in Solrcloud

2014-07-01 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use the mlt request handler in a SolrCloud cluster.
Apparently, it's showing some weird behavior: for the same query, it
returns results only some of the time. I'm using the SolrJ
client, which in turn communicates with the cluster through a ZooKeeper ensemble.
Here's my mlt request handler.

<!-- mlt request handler -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
<lst name="defaults">
<str name="omitHeader">true</str>
<str name="echoParams">explicit</str>
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.contentType">text/html;charset=UTF-8</str>
<str name="v.layout">layout</str>
<str name="v.channel">mlt</str>
<str name="title">Project Sunshine - Mlt</str>
<str name="mlt.fl">title,text,language,caaskey</str>
<int name="mlt.mintf">2</int>
<int name="mlt.mindf">1</int>
<int name="mlt.minwl">3</int>
<int name="mlt.maxwl">1000</int>
<int name="mlt.maxqt">50</int>
<int name="mlt.maxntp">5000</int>
<str name="rows">4</str>
<bool name="mlt.boost">true</bool>
<str name="mlt.qf">title,textlanguage,caaskey</str>
<!--<str name="mlt.interestingTerms">details</str>-->
<!-- Shard Tolerant -->
<str name="shards.tolerant">true</str>
<lst name="appends">
<str name="fq">Source2:(TestSource OR help</str>
</lst>
<str name="shards.qt">/mlt</str>
</lst>
</requestHandler>

Here's a sample query :

http://stage-int***.com/solr/mlt?fq=language:english&fq={!collapse
field=dedup}&q=caaskey:caas/documentation/files/GUID-EDC69C3&shards.qt=/mlt&shard.keys=enu/8!&wt=xml
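
A request like the one above is easier to keep intact when each parameter is passed separately and encoded by a library. A minimal Python sketch, assuming a placeholder host (`solr.example.com` and the helper name `build_mlt_url` are made up for illustration; the real host is elided above):

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real host is not shown in this thread.
BASE_URL = "http://solr.example.com/solr/mlt"

def build_mlt_url(doc_key, language="english", route="enu/8!"):
    """Build the /mlt request with every parameter kept separate."""
    params = [
        ("fq", "language:" + language),
        ("fq", "{!collapse field=dedup}"),
        ("q", "caaskey:" + doc_key),
        ("shards.qt", "/mlt"),
        ("shard.keys", route),
        ("wt", "xml"),
    ]
    # urlencode joins the pairs with '&' and percent-encodes the values.
    return BASE_URL + "?" + urlencode(params)

url = build_mlt_url("caas/documentation/files/GUID-EDC69C3")
print(url)
```

Dropping the collapse filter or the shard.keys entry from the list is then a one-line change, which helps when narrowing down which parameter affects the result.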

I've tried removing the collapse filter and the composite key from the query,
but it didn't make any difference. I have 2 shards with a replica each. The
weird part is that the same shard/replica that returns results for a given
request behaves differently the next time, i.e. returns no data at all. If I
use any other request handler, I get a response for the same query. So
something is not right with the mlt request handler.

Is this a known issue with solrcloud ? Any pointer will be appreciated.

Thanks,
Shamik


Re: MLT weird behaviour in Solrcloud

2014-07-01 Thread shamik
Sorry, that's a typo when I copied the mlt definition from my solrconfig, but
there's comma in my test environment. It's not the issue.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MLT-weird-behaviour-in-Solrcloud-tp4145066p4145145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can we do conditional boosting using edismax ?

2014-06-11 Thread Shamik Bandopadhyay
Hi,

  I'm using edismax parser to perform a runtime boosting. Here's my sample
request handler entry.

<str name="qf">text^2 title^3</str>
<str name="bq">Source:Blog^3 Source2:Videos^2</str>
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>

As you can see, I'm adding weights to text and title, as well as boosting
on source. What I'm trying to find out is whether there's a way to change the
weights based on Source. E.g. for source Blog, I would like to have the
boost text^3 title^2, while for source Videos I'd prefer
text^2 title^3.

Any pointers will be appreciated.

Thanks,
Shamik


Re: Can we do conditional boosting using edismax ?

2014-06-11 Thread shamik
Thanks Ahmet, I'll give it a shot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-we-do-conditional-boosting-using-edismax-tp4141131p4141268.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with French stopword filter

2014-05-28 Thread Shamik Bandopadhyay
 centre du cercle dont l'arc fait partie. Point de départ
Spécifiez le point de départ de l'arc. Extrémité Trace un arc dans le sens
trigonométrique depuis le point de départ (2) jusqu'au point situé sur une
demi-droite imaginaire tracée du centre (1) jusqu'au point final (3). Angle
Dessine un arc dans le sens trigonométrique à partir du point de départ
(2), en utilisant un centre (1), avec un angle décrit spécifié. Si l'ange
est négatif, un arc est tracé dans le sens horaire. Longueur de corde Trace
un grand ou un petit arc en respectant la distance en ligne droite entre le
point de départ et le point d'arrivée. Si la longueur de corde est
positive, le petit arc est tracé dans le sens trigonométrique à partir du
point de départ. Si la longueur de corde est négative, le grand arc est
tracé dans le sens trigonométrique. Tangente à la dernière ligne, à l'arc
ou à la polyligne Dessine un arc tangent à la dernière ligne, à la
polyligne ou à l'arc dessiné lorsque vous appuyez sur ENTREE à la première
invite. Extrémité de l'arc Spécifiez un point (1).
/field
/doc

Query
=
http://localhost:8983/solr/browse?q=arc de cercle

When I ran the query terms through the admin analysis page, the stopword
filter seemed to be working, but not when the actual search happens.

Any pointers will be appreciated.

Thanks,
Shamik


Re: Problem with French stopword filter

2014-05-28 Thread shamik
Turned out to be a weird exception. Apparently, the comments in
stopwords_fr.txt disrupt the stop filter factory. After I stripped out the
comments, it worked as expected.

Referred to this thread :
http://mail-archives.apache.org/mod_mbox/lucene-dev/201309.mbox/%3CJIRA.12668581.1379112889603.133757.1379118831671@arcas%3E



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-French-stopword-filter-tp4138545p4138550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with French stopword filter

2014-05-28 Thread shamik
I found the issue. It had to do with edismax qf entry in request handler. I
had the following entry :

<str name="qf">name_fra^1.2  title_fra^10.0 description_fra^5.0
author^1</str>

Except for author, all the other fields are of type adsktext_fra, while author
was of type text_general, which uses the English stop filter.
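
A quick way to spot this kind of mismatch is to group the qf fields by their field type. The sketch below hard-codes the types mentioned in this message; a real check would read them from the schema, and the helper name `group_qf_by_type` is only illustrative.

```python
from collections import defaultdict

# Field types as described in this thread (hard-coded for the sketch).
FIELD_TYPES = {
    "name_fra": "adsktext_fra",
    "title_fra": "adsktext_fra",
    "description_fra": "adsktext_fra",
    "author": "text_general",
}

def group_qf_by_type(qf, field_types):
    """Map each field type to the qf fields that use it."""
    groups = defaultdict(list)
    for entry in qf.split():
        field = entry.split("^", 1)[0]  # drop the boost, if any
        groups[field_types.get(field, "unknown")].append(field)
    return dict(groups)

qf = "name_fra^1.2  title_fra^10.0 description_fra^5.0 author^1"
groups = group_qf_by_type(qf, FIELD_TYPES)
for field_type, fields in groups.items():
    print(field_type, fields)
if len(groups) > 1:
    print("qf mixes field types; stopword handling may differ per field")
```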



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-French-stopword-filter-tp4138545p4138561.html
Sent from the Solr - User mailing list archive at Nabble.com.


Question on 3-level composite-id routing

2014-05-19 Thread Shamik Bandopadhyay
Hi,

  I need some clarification on multi-level composite-id routing in SolrCloud.
I'm currently using composite id routing with the pattern *topic!url*.
This is aimed at run-time de-duplication based on the topic field. As I'm
adding support for language search, I felt the need to include a language
parameter for better multi-tenancy. Here's the new key structure I'm
thinking of -- *language!topic!url*.
An example would be: english!12345!www.testurl.com

Now, at query time, I'll always have the language parameter at my disposal.
I was thinking of leveraging the shard.keys parameter to specify
*shard.keys=language!*, which will route the request to the right shard and
bring back english content. Is this a valid assumption ?

Also, as per my understanding, the three fields will default to 8, 8 and 16
bits of the routing hash. What would be a valid scenario for providing
a custom allocation of bits for these fields ? I was referring to the
following article http://searchhub.org/2014/01/06/10590/ , but was not
entirely sure about this section.


*At query time:*

*To query all records for myapp: shard.keys=myapp/8!*

*Note the explicit mention of 8 bits in case of querying by component 1
only i.e. app level. This is required because the usage of the router as 2
or 3 level isn’t implicit. Specifying ’8′ bits for the component highlights
the use of ’3′ level router*.

Any feedback will be much appreciated.

Thanks,
Shamik


Re: Question on 3-level composite-id routing

2014-05-19 Thread shamik
Awesome, thanks a lot Anshum, makes total sense now. Appreciate your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-3-level-composite-id-routing-tp4137044p4137071.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-05 Thread shamik
Thanks Nicole. Leveraging dynamic field definitions is a great idea. It'll
probably work for me as I have a bunch of fields which are indexed as String.
Just curious about the sharding: are you using Solr Cloud ? I thought of
taking the dedicated shard / core route, but since I'm using a composite key
(for dedup), managing a dedicated core can cause issues at times.

As far as the single field representation goes, thanks for validating my
concern. It's probably best used when you have to address multi-lingual search.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-are-the-best-practices-on-Multiple-Language-support-in-Solr-Cloud-tp4134006p4134743.html
Sent from the Solr - User mailing list archive at Nabble.com.


What are the best practices on Multiple Language support in Solr Cloud ?

2014-04-30 Thread Shamik Bandopadhyay
Hi,

  I'm trying to implement multiple language support in Solr Cloud (4.7).
Although we have different languages in the index, we were only supporting
English in terms of index and query. To provide some context, our current
index size is 35 GB with close to 15 million documents. We have two shards
with two replicas per shard. I'm using composite id to support
de-duplication, which puts the documents having the same field (dedup)
value on a specific shard.
The language is known in advance for every document being indexed, which
removes the need for runtime language detection. Similarly, at query time,
the language will be known as well. In other words, there's no need for
multi-lingual support.

Based on my understanding so far, there are three approaches which are
widely adopted: multi-field indexing, multi-core indexing and multiple
languages in one field (based on Solr in Action).

The first option seems easy to implement. But then, I have around 40 fields
which are getting indexed currently, though a majority of them are type=string
and not being analyzed. I'm planning to support around 10 languages, which
translates to 400 field definitions in the same schema. And this is poised
to grow with the addition of languages and fields. My apprehension is whether
this approach becomes a maintenance nightmare ? Does it affect overall
scalability ? Does it affect any existing features like Suggester,
Spellcheck, etc. ? I was thinking of including language as part of the id
key. It'll look like Language!Dedup_id!url so that documents are spread
across the two shards.
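
To get a feel for the field count, here is a rough Python sketch of the naming scheme, assuming that only the analyzed fields need a per-language variant while the type=string fields are shared. The language codes, field names and helper names are illustrative, not taken from the actual schema.

```python
LANGUAGES = ["eng", "fra", "deu"]           # illustrative subset
ANALYZED_FIELDS = ["title", "description"]  # fields needing per-language analysis

def localized_field(base, lang):
    """Return the per-language field name, e.g. title_fra."""
    return "{}_{}".format(base, lang)

def qf_for_language(lang, boosts):
    """Build a qf string that targets only the fields of one language."""
    return " ".join(
        "{}^{}".format(localized_field(base, lang), boost)
        for base, boost in boosts.items()
    )

all_fields = [localized_field(f, l) for l in LANGUAGES for f in ANALYZED_FIELDS]
print(len(all_fields), "language-specific fields")
print(qf_for_language("fra", {"title": 10.0, "description": 5.0}))
```

Under that assumption the schema grows with (languages x analyzed fields) rather than (languages x all fields), and since the language is known at query time, the qf can be chosen per request.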

The second option of a dedicated core sounds easy in terms of maintaining
config files. Also, routing requests will be fairly easy as the language
will always be known up-front, both during indexing and query time. But, as
I looked into the documents, 60% of our total index will be in English,
while the remaining 40% will be spread over 10-14 languages. Some languages
have only a few thousand documents, which perhaps doesn't merit a dedicated
core. On top of that, this approach has the potential of turning into a
complex infrastructure, which might be hard to maintain.

I read about the use of multiple languages in a single field in Trey
Grainger's book. It looks like a great approach, but I'm not sure if it is
meant to address my scenario. My first impression is that it's more geared
towards supporting multi-lingual search, but I may be completely wrong. Also,
this is not supported by Solr / Lucene out of the box.

I know there's a lot of people in this group who have excelled as far as
supporting multiple language in Solr is concerned. I'm trying to gather
their inputs / experience on the best practice to help me decide the right
approach. Any pointer on this will be highly appreciated.

Thanks,
Shamik


Solr 4.7 not showing parsedQuery / parsedquery_toString information

2014-04-24 Thread shamik
,f.ADSKDocumentType.facet.mincount=1,f.ADSKAudience.facet.limit=-1,isShard=true,f.ADSKProductLine.facet.limit=-1}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={ADSKProductLine={},ADSKContentGroup={},ADSKReleaseYear={},ADSKHelpTopic={},ADSKDocumentType={},ADSKAudience={}},facet_dates={},facet_ranges={}},debug={}}</str>
</lst>
 </lst>
  </lst>
  <lst name="explain" />
   </lst>

Here's the sample query :
http://localhost:8983/solr/adskhelpportal?q=How%20can%20I%20obtain%20local%20offline%20Help&wt=xml&debugQuery=true&rows=1

I'm using SolrCloud with 2 shards and a replica each. I'm getting
parsedQuery / parsedQueryString information if I use the earlier version.

Do I need to change something in the configuration ? Any pointers will be
appreciated.

Thanks,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-7-not-showing-parsedQuery-parsedquery-toString-information-tp4132964.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.7 not showing parsedQuery / parsedquery_toString information

2014-04-24 Thread Shamik Bandopadhyay
Help,f.ADSKDocumentType.facet.mincount=1,f.ADSKAudience.facet.limit=-1,isShard=true,f.ADSKProductLine.facet.limit=-1}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={ADSKProductLine={},ADSKContentGroup={},ADSKReleaseYear={},ADSKHelpTopic={},ADSKDocumentType={},ADSKAudience={}},facet_dates={},facet_ranges={}},debug={}}</str>
</lst>
 </lst>
  </lst>
  <lst name="explain" />
   </lst>

Here's the sample query :
http://localhost:8983/solr/adskhelpportal?q=How%20can%20I%20obtain%20local%20offline%20Help&wt=xml&debugQuery=true&rows=1

I'm using SolrCloud with 2 shards and a replica each. I'm getting
parsedQuery / parsedQueryString information if I use the earlier version.

Do I need to change something in the configuration ? Any pointers will be
appreciated.

Thanks,
Shamik


Re: CollapsingQParserPlugin returning different result set

2014-03-18 Thread shamik
Joel,

   I had a discussion with you earlier about the inconsistent ngroups count,
when you suggested using the composite id to make sure that documents with
identical (ADSKDedup) field values end up on the same shard.

Here's the thread --
http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-td4111331.html

After making that change, the number of results returned matched the
numFound value. I'm using the same setup after I upgraded to Solr 4.7
and started using the CollapsingQParserPlugin API.

I took a quick look at some of the ids, and the composite ids look to be
correct. One thing I've noticed is that the difference in relevance and
count seems to be directly proportional to the number of documents in the
result.

I'll try to create a small set of documents in a local Solr cloud and see if
I can replicate the problem. That way, it'll probably be easier for you to
take a look.

Regards,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-returning-different-result-set-tp4123716p4125290.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CollapsingQParserPlugin returning different result set

2014-03-17 Thread shamik
Hi Joel,

Thanks for taking a look into this. Here's the information you had
requested.

*ADSKDedup:*

I've attached separate files for debug information for each query.
Let me know if you need any information.

Regards,
Shamik

CollapsingQParserPlugin_Query_Debug.txt
http://lucene.472066.n3.nabble.com/file/n4124968/CollapsingQParserPlugin_Query_Debug.txt
  

Group_Query_Debug.txt
http://lucene.472066.n3.nabble.com/file/n4124968/Group_Query_Debug.txt  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-returning-different-result-set-tp4123716p4124968.html
Sent from the Solr - User mailing list archive at Nabble.com.

SolrCloud - inconsistent result for the same query

2014-03-17 Thread shamik
Hi,
   
   I'm using SolrCloud 4.4 with 2 shards having 2 replicas each.
Lately, I'm observing issues where an obsolete document will suddenly show
up in search results. I'm crawling a bunch of source systems on a daily basis
and updating the Solr index. Now, when I search for specific content
based on the url, it sometimes returns old content which was updated by
the last crawl. This behavior is inconsistent; it seems to randomly pick
the old or the new content. Here's the field definition in question.

<field name="ADSKCaasContent" type="string" indexed="false" stored="true"
multiValued="true" required="false"/>

The field is not indexed and being used for storing the data. I'm using a
composite key which distributes documents among shards based on a specific
field.

I can't think of any possible reason except for the Solr cache. Based on the
Solr logs, it looks like one of the shards/replicas is holding on to the
old value for some reason. I'm using haproxy to round-robin
requests across the 6 servers (2 shards, 4 replicas). Ideally, a full crawl
should have updated the cache with the new set of data. I even restarted
the instance, but the problem seems to persist.
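
One way to check whether a single replica is serving stale data is to query each core directly with distrib=false and compare what comes back. The Python sketch below only builds the per-core URLs and compares values; the host names, the `url` field used in the query and the sample values are all made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical core URLs; substitute the real shard and replica addresses.
REPLICAS = [
    "http://host1:8983/solr/collection1",
    "http://host2:8983/solr/collection1",
]

def replica_query_url(core_url, query, fields):
    """Build a query answered by one core only, without fanning out."""
    params = {"q": query, "fl": fields, "distrib": "false", "wt": "json"}
    return core_url + "/select?" + urlencode(params)

def find_disagreements(values_by_replica):
    """Return the replicas whose stored value differs from the first one."""
    items = list(values_by_replica.items())
    _, reference = items[0]
    return [name for name, value in items[1:] if value != reference]

urls = [replica_query_url(r, 'url:"http://example.com/page"', "id,ADSKCaasContent")
        for r in REPLICAS]
for u in urls:
    print(u)

# Values that would be read back from each response (made-up sample data).
sample = {"host1": "new content", "host2": "old content"}
print(find_disagreements(sample))
```

The comparison is only meaningful between replicas of the same shard, since different shards hold different documents.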

I'd appreciate it if someone could provide their feedback.

Regards,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-inconsistent-result-for-the-same-query-tp4125005.html
Sent from the Solr - User mailing list archive at Nabble.com.


CollapsingQParserPlugin returning different result set

2014-03-14 Thread shamik
Hi,

  I recently upgraded to 4.7, with the aim of replacing group queries with
CollapsingQParserPlugin. As I'm comparing results between the two APIs,
CollapsingQParserPlugin seems to be way off, in terms of relevancy and
result count. Here's an example :

*Group query*
http://test-dev.mydomain.com/solr/adskhelpportal?fq=language:(english)&wt=xml&rows=40&start=0&fq=(ContentGroup-local:Learn
 Explore OR ContentGroup-local:Getting Started OR
ContentGroup-local:Troubleshooting)&fq=Product:PRD&q=line&sort=score
desc&group=true&group.field=ADSKDedup&group.ngroups=true&fl=title,ADSKDedup,score&debugQuery=true

/Top 4 results/
<lst name="grouped">
  <lst name="ADSKDedup">
    <int name="matches">14593</int>
    <int name="ngroups">*13648*</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">fbfef4647e68c2300eba99028f2598a9</str>
        <result name="doclist" numFound="1" start="0">
          <doc>
            <str name="ADSKDedup">fbfef4647e68c2300eba99028f2598a9</str>
            <arr name="title">
              <str>LINE</str>
            </arr>
            <float name="score">8.517085</float>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">GUID-E8C1190C-A26C-484C-ADDD-DDF81666F69F</str>
        <result name="doclist" numFound="3" start="0">
          <doc>
            <arr name="title">
              <str>LINE (Command)</str>
            </arr>
            <str name="ADSKDedup">GUID-E8C1190C-A26C-484C-ADDD-DDF81666F69F</str>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">GUID-695722CD-A131-48DB-9AB8-162F0832FE04</str>
        <result name="doclist" numFound="4" start="0">
          <doc>
            <str name="ADSKDedup">GUID-695722CD-A131-48DB-9AB8-162F0832FE04</str>
            <arr name="title">
              <str>About Controlling Extension Lines</str>
            </arr>
            <float name="score">5.1433907</float>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">GUID-9084DAC2-D5B7-4727-A443-205007A79440</str>
        <result name="doclist" numFound="4" start="0">
          <doc>
            <arr name="title">
              <str>About Controlling Dimension Lines</str>
            </arr>
            <str name="ADSKDedup">GUID-9084DAC2-D5B7-4727-A443-205007A79440</str>
            <float name="score">5.1361656</float>
          </doc>
        </result>
      </lst>


*CollapsingQParserPlugin query*

http://test-dev.mydomain.com/solr/adskhelpportal?fq=language:(english)&wt=xml&rows=15&start=0&fq=(ContentGroup-local:Learn
 Explore OR ContentGroup-local:Getting Started OR
ContentGroup-local:Troubleshooting)&fq=ProductLine:PRD&q=line&sort=score
desc&fq={!collapse field=ADSKDedup}&fl=title,ADSKDedup,score&debugQuery=true

/Top 4 results/
<result name="response" numFound="27142" start="0" maxScore="8.517085">
  <doc>
    <str name="ADSKDedup">fbfef4647e68c2300eba99028f2598a9</str>
    <arr name="title">
      <str>LINE</str>
    </arr>
    <float name="score">8.517085</float>
  </doc>
  <doc>
    <str name="ADSKDedup">GUID-57CDDB6C-B12B-46CE-B9C5-22EFC17258FF</str>
    <arr name="title">
      <str>To Draw Lines</str>
    </arr>
    <float name="score">6.276938</float>
  </doc>
  <doc>
    <arr name="title">
      <str>Draw Lines</str>
    </arr>
    <str name="ADSKDedup">98b4a0e39400f0a216ff51a89922ce82</str>
    <float name="score">6.224089</float>
  </doc>
  <doc>
    <str name="ADSKDedup">4e51abdc0e8d30e77069505d93c1d4d4</str>
    <arr name="title">
      <str>Lines Tab</str>
    </arr>
    <float name="score">6.210026</float>
  </doc>

As you can see, the results are completely off, except for the first one.
Moreover, the number of results returned is different as well. The group query
has 13648 results while CollapsingQParserPlugin returns 27142, almost twice
as many.

I'm a little baffled why the two APIs are returning different results for the
same query. Are they fundamentally different ?
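
Note that the two URLs above are not identical apart from the grouping parameters: the product filter reads Product:PRD in one and ProductLine:PRD in the other, and rows differs (40 vs 15). One way to rule that out is to build both requests from a shared base, so that only the dedup mechanism changes. A rough Python sketch (the helper names are made up, and the ContentGroup filter is left out because its quoting is not recoverable from the URLs as posted):

```python
from urllib.parse import urlencode

def base_params(query, product_filter, rows):
    """Parameters that should be identical in both requests."""
    return [
        ("q", query),
        ("fq", "language:(english)"),
        ("fq", product_filter),
        ("sort", "score desc"),
        ("rows", str(rows)),
        ("start", "0"),
        ("fl", "title,ADSKDedup,score"),
        ("wt", "xml"),
    ]

def grouped(params, field):
    """Add Result Grouping parameters to a copy of the base list."""
    return params + [("group", "true"), ("group.field", field),
                     ("group.ngroups", "true")]

def collapsed(params, field):
    """Add the collapse filter query to a copy of the base list."""
    return params + [("fq", "{{!collapse field={}}}".format(field))]

shared = base_params("line", "ProductLine:PRD", 15)
group_query = grouped(shared, "ADSKDedup")
collapse_query = collapsed(shared, "ADSKDedup")

# Only the dedup mechanism should show up in the difference.
print(sorted(set(group_query) ^ set(collapse_query)))
print(urlencode(collapse_query))
```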

Any pointers will be appreciated.

-Thanks,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-returning-different-result-set-tp4123716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Weird behavior of stopwords in search query

2014-02-18 Thread Shamik Bandopadhyay
Hi,

  I'm observing a weird behavior while using stopwords as part of the
search query. I'm able to replicate it in a standalone Solr instance as well.
The issue pops up when I use the stopwords "other" and "and"
together in a query string. The query doesn't return any results, but it
works with any other combination. For e.g.

1. query yields no result --
http://localhost:8983/solr/collection1/browse?q=AWS+other+and+Search&debugQuery=true&wt=xml


Debug Query :


<str name="rawquerystring">AWS other and Search</str>

<str name="querystring">AWS other and Search</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 |
author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 |
keywords:search^5.0 | manu:search^1.1 | description:search^5.0 |
resourcename:search | name:search^1.2 | features:search |
sku:search^1.5/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5)
+(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5
| cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 |
description:search^5.0 | resourcename:search | name:search^1.2 |
features:search | sku:search^1.5))</str>





2. query yields result --
http://localhost:8983/solr/collection1/browse?q=AWS+other+an+Search&debugQuery=true&wt=xml

Debug Query
-

<str name="rawquerystring">AWS other an Search</str>

<str name="querystring">AWS other an Search</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 |
cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5)
(id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))</str>

Both "other" and "and" are part of the stopwords list.

I ran an analysis on the text_general field type; both stopwords were shown as
ignored at index and query time, but that is not what happens during the
actual search.

Not sure what I'm missing here, any pointers will be appreciated.

- Thanks,
Shamik


Re: Weird behavior of stopwords in search query

2014-02-18 Thread shamik
Jack, thanks for the pointer. I should have checked this closely. I'm using
edismax and here's my qf entry :

<str name="qf">
  id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
   </str>

As you can see, I was boosting id and cat, which are of type string and of
course don't go through the stopwords filter. Removing them returned one
result, which is based on the AND operator.
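
The surviving fields can be read straight off the parsedquery_toString output. A small Python sketch that extracts the field names from one clause; the two clauses are copied from the debug output in the original post, and the helper name `fields_for_term` is only illustrative.

```python
import re

def fields_for_term(clause):
    """List the fields of one DisjunctionMaxQuery clause, in order."""
    # A field name is a word followed by ':' and a term; boosts such as
    # ^10.0 never match because they are not followed by a colon.
    return re.findall(r"(\w+):[^\s|^)]+", clause)

other_clause = "(id:other^10.0 | cat:other^1.4 | sku:other^1.5)"
an_clause = "(id:an^10.0 | cat:an^1.4)"

print(fields_for_term(other_clause))
print(fields_for_term(an_clause))
```

A term that shows up only on a few fields was dropped by the analyzers of all the others, which is the pattern visible for "other" and "an" here.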

The part I'm not clear about is how "and" is being treated, even though it's a
stopword and the default operator is OR. Shouldn't it be ignored ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-behavior-of-stopwords-in-search-query-tp4118156p4118188.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread shamik
As Shawn pointed out, if you are using the CloudSolrServer client, you are
shielded from the scenario where a shard and its replica(s) go down. The
communication should ideally go through ZooKeeper and not to the Solr servers
directly. One thing you need to make sure of is to add the shards.tolerant
parameter, so that the query returns results from the shards which are alive,
though it'll fetch a partial result set.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fault-Tolerant-Technique-of-Solr-Cloud-tp4118003p4118196.html
Sent from the Solr - User mailing list archive at Nabble.com.


Weird issue with q.op=AND

2014-02-12 Thread Shamik Bandopadhyay
Hi,

  I'm facing a weird problem while using q.op=AND condition. Looks like it
gets into some conflict if I use multiple appends condition in
conjunction. It works as long as I've one filtering condition in appends.

<lst name="appends">
   <str name="fq">Source:TestHelp</str>
</lst>

Now, the moment I add an additional parameter, search stops returning any
result.

<lst name="appends">
   <str name="fq">Source:TestHelp | Source:TestHelp2</str>
</lst>

If I remove q.op=AND from request handler, I get results back. Data is
present for both the Source I'm using, so it's not a filtering issue. Even
a blank query fails to return data.

Here's my request handler.

<requestHandler name="/testhandler" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.contentType">text/html;charset=UTF-8</str>
<str name="v.layout">layout</str>
<str name="v.channel">testhandler</str>
<str name="defType">edismax</str>
<str name="q.op">AND</str>
<str name="q.alt">*:*</str>
<str name="rows">15</str>
<str name="fl">id,url,Source2,text</str>
<str name="qf">text^1.5 title^2</str>
<str name="bq">Source:TestHelp^3 Source:TestHelp2^0.85</str>
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>
<str name="df">text</str>

<!-- facets -->
<str name="facet">on</str>
<str name="facet.mincount">1</str>
<str name="facet.limit">100</str>
<str name="facet.field">language</str>
<str name="facet.field">Source</str>
<!-- Highlighting defaults -->
<str name="hl">true</str>
<str name="hl.fl">text title</str>
<str name="f.text.hl.fragsize">250</str>
<str name="f.text.hl.alternateField">ShortDesc</str>

<!-- Spell check settings -->
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>

<!-- Shard Tolerant -->
<str name="shards.tolerant">true</str>
</lst>
<lst name="appends">
<str name="fq">Source:TestHelp | Source2:TestHelp2</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>

Not sure what's going wrong. I'm using a SolrCloud environment with 2
shards having a replica each.

Any pointers will be appreciated.

Thanks,
Shamik


Re: Weird issue with q.op=AND

2014-02-12 Thread shamik
Thanks a lot Shawn. Changing the appends filtering based on your suggestion
worked. The part which confused me bigtime is the syntax I've been using so
far without an issue (barring the q.op part).


<lst name="appends">
 <str name="fq">Source:TestHelp | Source:downloads |
-AccessMode:internal | -workflowparentid:[* TO *]</str>
</lst>

This has been working as expected and applies the filter correctly. Just
curious, if it's an invalid syntax, how is Solr handling this ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117022.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Weird issue with q.op=AND

2014-02-12 Thread shamik
Thanks, I'll take a look at the debug data.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117047.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing question on individual field update

2014-02-11 Thread shamik
Eric,

  Thanks for your reply. I should have given better context. I'm currently
running a daily incremental crawl on this particular source and indexing
the documents. The incremental crawl looks for any change since the last
crawl date, based on the document publish date. But there's no way for me to
know if a document has been deleted. To handle that, I run a full crawl on a
weekend, which basically re-indexes the entire content. After the full index
is over, I call a purge script, which deletes any content that is more than
24 hours old, based on the indextimestamp field.
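
For reference, the purge step amounts to a delete-by-query on the timestamp field. A minimal Python sketch of the request body, assuming indextimestamp is a date field so that Solr date math applies; the helper name `purge_command` is made up, and sending the body to the update handler and committing are not shown.

```python
import json

def purge_command(timestamp_field="indextimestamp", max_age="24HOURS"):
    """Build a delete-by-query body for documents older than max_age."""
    query = "{}:[* TO NOW-{}]".format(timestamp_field, max_age)
    return json.dumps({"delete": {"query": query}})

body = purge_command()
print(body)
```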

The issue with atomic updates is that they don't alter the indextimestamp
field. So even if I run a full crawl with atomic updates, the timestamp will
stick to its old value. Unfortunately, I can't rely on another date field
coming from the source, as they are not consistent. That means
I can't remove stale content.

Let me know if I'm missing something here.

- Thanks,
Shamik





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116757.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing question on individual field update

2014-02-11 Thread shamik
Ok, I was wrong here. I can always set the indextimestamp field with current
time (NOW) for every atomic update. On a similar note, is there any
performance constraint with updates compared to add ?
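
As a rough illustration of that idea, an atomic update can set the timestamp alongside the field being changed. The Python sketch below builds such a payload; the document id is made up, the field names follow this thread, and it assumes indextimestamp is a date field that accepts NOW. Atomic updates also rely on the other fields being stored, so that Solr can rebuild the document.

```python
import json

def rating_update(doc_id, rating):
    """Build an atomic update that touches only rating and the timestamp."""
    doc = {
        "id": doc_id,
        "rating": {"set": rating},
        "indextimestamp": {"set": "NOW"},
    }
    return json.dumps([doc])

payload = rating_update("doc-1", 4)  # "doc-1" is a placeholder id
print(payload)
```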



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116772.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing question on individual field update

2014-02-11 Thread shamik
Thanks Eric and Shawn, appreciate your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing question on individual field update

2014-02-10 Thread Shamik Bandopadhyay
Hi,

  I'm currently indexing a bunch of fields for a given document. For e.g.
let's assume there's a field called "rating". The rating field is not part
of the original document during indexing, so the value is blank. The field
gets updated by an external service when the document is rated by users.
The service makes a partial Solr update and sets the appropriate rating
value. But when I re-index the same document, the rating field gets
overwritten and reset to blank. I understand that indexing in Solr is a
delete and add, but is there a way to make indexing conditional at the
field level, so that it keeps the value if it's already present in the index
for a given id ?

Any pointers will be appreciated.

Thanks,
Shamik


Re: SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-15 Thread shamik
Thanks Joel, really appreciate your help. I'll keep an eye on the 4.6.1
release.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-tp4111331p4111486.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-14 Thread Shamik Bandopadhyay
Hi,

  I'm planning to upgrade to Solr 4.6 to move from Result Grouping to
CollapsingQParserPlugin. I'm currently using SolrCloud; a couple of issues
with Result Grouping are :

1. Slow performance
2. Incorrect result count from ngroup

My understanding is that CollapsingQParserPlugin is aimed at addressing the
performance issue with Result Grouping. Based on the available
documentation, I'm not sure if CollapsingQParserPlugin addresses the result
count when the collapse field is spread across shards. The Result Grouping
ngroups count currently works if the groups are not distributed and are
confined to a dedicated shard. Just wondering if this applies to
CollapsingQParserPlugin as well ? Will
<result name="response" *numFound="6"* start="0"> be
incorrect if the collapsed field is distributed ?

I'll really appreciate if someone can provide pointers on this.

Thanks,
Shamik


Re: SolrCloud Result Grouping vs CollapsingQParserPlugin

2014-01-14 Thread shamik
Joel,

  Thanks for the pointer. I went through your blog on Document routing, very
informative. I do need some clarifications on the implementation. I'll try
to run it based on my use case. 

I'm indexing documents from multiple source systems, a bunch of which
contain duplicate content. I'm trying to remove the duplicates by applying
result grouping / CollapsingQParserPlugin. For e.g. let's say I've sources
ABC, MNO and XYZ. Now, the ABC and MNO sources contain the duplicate
documents, which are identified by a field, say adskdedup. I've a couple of
shards, the id being the url of the documents. Now, to make field collapsing
work, I need to update the id field to the form adskdedup!url . Documents
having identical adskdedup values should route to a dedicated shard, e.g.
shard1. The ones which are not identical will be routed to either shard1 or
shard2. After the indexing is done, shard1 should have all the documents on
which grouping needs to be applied.

During query time, depending on the query, results can be returned from both
shards. For e.g. a query
q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally
return data from both shards and apply the grouping on shard1 based on the
adskdedup field. This will also ensure that group.ngroups=true will return
the right count.
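
A rough sketch of the id scheme and query described above. The composite id simply prefixes the dedup key to the url with "!". The shard function is a toy stand-in (a CRC over the prefix) and is NOT Solr's actual router hashing; it only illustrates that ids sharing a prefix land together. The example urls are hypothetical.

```python
import zlib
from urllib.parse import urlencode


def composite_id(dedup_key, url):
    """Form an id of the shape adskdedup!url."""
    return f"{dedup_key}!{url}"


def toy_shard(doc_id, num_shards=2):
    """Toy routing: hash only the part before '!'. Not Solr's real hash."""
    prefix = doc_id.split("!", 1)[0]
    return zlib.crc32(prefix.encode("utf-8")) % num_shards


# Two hypothetical duplicate documents sharing one adskdedup value.
id_a = composite_id("ABCD-XYZ", "http://example.com/a")
id_b = composite_id("ABCD-XYZ", "http://example.com/b")
print(id_a, toy_shard(id_a))
print(id_b, toy_shard(id_b))

# The grouping query from the text, with parameters joined by '&'.
query = urlencode([
    ("q", "solr"),
    ("group", "true"),
    ("group.field", "adskdedup"),
    ("group.ngroups", "true"),
])
print(query)
```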

The other clarification I wanted was based on this statement : "When a
tenant is too large to fit on a single shard it can be spread across
multiple shards by specifying the number of bits to use from the shard key."
If we split shards, will Result Grouping / CollapsingQParserPlugin and
the number of results still work ?

Last but not the least, when are you planning to release 4.6.1 ?

Again, appreciate your help on this.

- Thanks





Re: Questionon CollapsingQParserPlugin

2014-01-14 Thread shamik
Thanks Joel, I found the issue. It had to do with the schema definition for
the adskdedup field. I had defined it as text_general, which was tokenizing
the value on the "-" character. After I changed it to type string, it worked
as expected. Thanks for looking into this.
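
To illustrate the difference described above, here is a toy comparison. The tokenizer below is only a rough stand-in for an analyzed text field (split on non-alphanumerics, lowercase) and is not Solr's analysis chain; the point is that a tokenized field no longer holds the whole value as a single term, while a string field does.

```python
import re


def toy_text_general(value):
    """Rough stand-in for an analyzed text field: split and lowercase."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", value) if tok]


def toy_string(value):
    """A string field keeps the value as one untouched term."""
    return [value]


for value in ("ABCD-XYZ", "MNOP-QRS"):
    print(value, toy_text_general(value), toy_string(value))
```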





Questionon CollapsingQParserPlugin

2014-01-13 Thread Shamik Bandopadhyay
Hi,

  I'm looking for some clarification on CollapsingQParserPlugin feature.

Here's what I tried. I downloaded 4.6, updated solr.xml under the exampledocs
folder and added the following entries. I've added a new field, adskdedup,
on which I'm planning to test field collapsing. As you can see, out of four
documents, three have the same adskdedup value while the last one is
different.

<doc>
  <field name="id">SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  <field name="adskdedup">ABCD-XYZ</field>
</doc>
<doc>
  <field name="id">SOLR1001</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  <field name="adskdedup">ABCD-XYZ</field>
</doc>
<doc>
  <field name="id">SOLR1002</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  <field name="adskdedup">ABCD-XYZ</field>
</doc>
<doc>
  <field name="id">SOLR1003</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
  <field name="adskdedup">MNOP-QRS</field>
</doc>

Here's my query :

http://localhost:8983/solr/collection1/select?q=solr&wt=xml&fq={!collapse%20field=adskdedup}

Based on my understanding of using group by, I was expecting a couple of
results from the query. One with id=SOLR1000 and the second with
id=SOLR1003. Instead, it's returning only 1 result based on the field
collapsing, i.e. id=SOLR1000.
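
The expectation above can be spelled out with a toy model of collapsing: keep one document per distinct adskdedup value. This is only an illustration of the expected outcome (it keeps the first document seen, whereas the real plugin picks a group head by its own rules), together with one way to build the same query string with the fq value properly encoded.

```python
from urllib.parse import urlencode, parse_qs

# The four sample documents, reduced to the two fields that matter here.
docs = [
    {"id": "SOLR1000", "adskdedup": "ABCD-XYZ"},
    {"id": "SOLR1001", "adskdedup": "ABCD-XYZ"},
    {"id": "SOLR1002", "adskdedup": "ABCD-XYZ"},
    {"id": "SOLR1003", "adskdedup": "MNOP-QRS"},
]


def toy_collapse(documents, field):
    """Keep the first document seen for each distinct value of `field`."""
    heads = {}
    for doc in documents:
        heads.setdefault(doc[field], doc)
    return list(heads.values())


collapsed_ids = [doc["id"] for doc in toy_collapse(docs, "adskdedup")]
print(collapsed_ids)

# Same request as in the text, with parameters separated by '&'.
query = urlencode([
    ("q", "solr"),
    ("wt", "xml"),
    ("fq", "{!collapse field=adskdedup}"),
])
print("http://localhost:8983/solr/collection1/select?" + query)
```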

Am I missing something here ?

Any pointer will be appreciated.

-Thanks


Re: Solr grouping performance porblem

2013-11-15 Thread shamik
Thanks for the update Shawn, will look forward to the release.





Re: Solr grouping performance porblem

2013-11-11 Thread shamik
Thanks Joel, appreciate your help. Is Solr 4.6 due this year ?





Solr grouping performance porblem

2013-10-30 Thread Shamik Bandopadhyay
Hi,

   I've recently upgraded to SolrCloud (4.4) from Master-Slave mode. One of
the changes I made in the queries was to add the group functionality to
remove duplicate results. The grouping is done on a specific field. But the
change seems to have had a huge effect on the query performance. The group
option decreased the performance by 10 times. For e.g. this query takes
1 sec to execute. The number of results is around 105387.

http://localhost:8083/solr/browse?fq=language:(english)&wt=xml&rows=10&start=0&fq=(ContentGroup-local:Learn
 Explore OR ADSKContentGroup-local:Getting Started)&q=line&sort=score
desc&group=true&group.field=dedup&group.ngroups=true

If I exclude the group option, it comes down to 190ms

http://localhost:8083/solr/browse?fq=language:(english)&wt=xml&rows=10&start=0&fq=(ContentGroup-local:Learn
 Explore OR ADSKContentGroup-local:Getting Started)&q=line

I'm running this query against an 8 million doc index. I've 2 shards with 1
replica each, running on m1x.large EC2 instances, each having 8gb of
allocated memory.
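
For reference, a small sketch that builds the grouped and ungrouped variants of the query and works out the ratio of the two timings quoted above (1 sec vs 190ms), taken at face value. Only a subset of the original parameters is used (the second fq is left out), so this is an illustration, not the exact request.

```python
from urllib.parse import urlencode

# Subset of the parameters from the queries above.
BASE = [
    ("q", "line"),
    ("fq", "language:(english)"),
    ("wt", "xml"),
    ("rows", "10"),
    ("start", "0"),
]
GROUPING = [
    ("group", "true"),
    ("group.field", "dedup"),
    ("group.ngroups", "true"),
]


def build_query(grouped):
    """Return the query string with or without the grouping parameters."""
    return urlencode(BASE + (GROUPING if grouped else []))


print(build_query(True))
print(build_query(False))

# Timings as reported in the message, in milliseconds.
grouped_ms = 1000
plain_ms = 190
slowdown = grouped_ms / plain_ms
print(f"grouped query is about {slowdown:.1f}x slower")
```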

Is this a known issue or am I missing something which is making this query
expensive ?

I bumped into this JIRA --
https://issues.apache.org/jira/browse/SOLR-5027 which
talks about CollapsingQParserPlugin as an alternative to grouping, but that
seems to be available only in 4.6. Just wondering if it can be an
alternative in my case and whether it's possible to apply it as a patch on
version 4.4.

Any pointer will be appreciated.

- Thanks,
Shamik


Re: Grouping performance problem

2013-10-30 Thread shamik
Bumping up this thread as I'm facing a similar issue. Any solution ?





Re: shards.tolerant throwing null pointer exception when spellcheck is on

2013-10-23 Thread shamik
Thanks for the information. I think it's good to have this issue fixed,
especially for cases where the spellcheck feature is on. I'll check out
the source code and take a look; even quickly suppressing the null
pointer exception might make a difference.





shards.tolerant throwing null pointer exception when spellcheck is on

2013-10-22 Thread Shamik Bandopadhyay
Hi,
  I'm trying to simulate a fault tolerance test where a shard and its
replica(s) go down, leaving the other shard(s) running. To test it, I added
<str name="shards.tolerant">true</str> in my request handler under the
defaults section. This is to make sure that the condition is added to each
query running against this request handler.

In my test environment, I have 2 shards with a replica each. I brought
down Shard 1 and Replica 1, then fired a query using SolrJ CloudSolrServer,
which internally talks to the zookeeper ensemble. In my request handler,
the spellcheck option is turned on. Due to this, the servers are throwing
a null pointer exception. Here's the stack trace.

[2013-10-22 20:24:43,875] INFO482886[qtp1783079124-15] -
org.apache.solr.core.SolrCore.execute(SolrCore.java:1909) - [collection1]
webapp=/solr path=/testhtmlhelp
params={spellcheck=on&q=xref&wt=xml&fq=TestProductLine:ADT&fq=TestProductRelease:ADT+2014&fq=language:english}
hits=157 status=500 QTime=70
[2013-10-22 20:24:43,876]ERROR482887[qtp1783079124-15] -
org.apache.solr.common.SolrException.log(SolrException.java:119) -
null:java.lang.NullPointerException
at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:619)


Here's the query detail from the server log, as you can see the spellcheck
is on.

[collection1] webapp=/solr path=/testhtmlhelp
params={facet=on&f.TestCategory.facet.limit=160&tie=0.01&shards.qt=/testhtmlhelp&fl=id,score&facet.field=Source2&fq=TestProductLine:ADT&fq=TestProductRelease:ADT+2014&fq=language:english&rows=150&defType=edismax&start=0&spellcheck=on&shards.tolerant=true&shard.url=localhost:8984/solr/collection1/|localhost:8983/solr/collection1/&q=xref&isShard=true}
hits=157 status=0 QTime=15

Now, if I comment out the spellcheck option in the request handler, the query
works as expected, even if the other shard and its replica are down.
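
A possible per-request variant of the same experiment, sketched under assumptions: instead of editing the handler, the request could carry shards.tolerant=true and switch spellcheck off explicitly. This assumes spellcheck is configured under defaults (not invariants), so a request parameter can override it; whether this avoids the exception has not been verified here.

```python
from urllib.parse import urlencode, parse_qs


def tolerant_query(q, spellcheck):
    """Build a query string with shards.tolerant on and spellcheck toggled."""
    params = [
        ("q", q),
        ("wt", "xml"),
        ("shards.tolerant", "true"),
        ("spellcheck", "true" if spellcheck else "false"),
    ]
    return urlencode(params)


with_spellcheck = tolerant_query("xref", spellcheck=True)
without_spellcheck = tolerant_query("xref", spellcheck=False)
print(with_spellcheck)
print(without_spellcheck)
```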

Is this a known bug in Solr 4.4 ? What'll be the recommended work-around to
address this issue ? Any pointers will be appreciated.

Thanks,
Shamik


Re: SolrCloud Performance Issue

2013-10-17 Thread shamik
Thanks Primoz, I was suspecting that too. But then, it's hard to imagine that
the query cache alone is responsible for the big performance hit. The same
setting applies to the old configuration, and it works pretty well even with
the low query cache hit rate.





Re: SolrCloud Performance Issue

2013-10-17 Thread shamik
I tried commenting out NOW in bq, but it didn't make any difference in the
performance. I do see a minor hit rate on the queryfiltercache, which is a
meager 0.02.

I'm really struggling to figure out the bottleneck. Are there any known pain
points I should be checking ?





SolrCloud Performance Issue

2013-10-16 Thread shamik
=spellchecktrue/str
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>

One thing I've noticed is that the queryresultcache hit rate is really low,
though I'm not sure our queries are always that unique. I'm using edismax
and there's a
<str name="bf">recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0</str> , can this
contribute ?
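
A small worked example of what that boost function computes, plus one reason NOW is often suspected for cache misses. recip(x,m,a,b) evaluates to a / (m*x + b); with m=3.16e-11 a document roughly one year old gets a value of about 0.5. Since NOW resolves to the current millisecond, two otherwise identical requests carry different values; rounding the timestamp down (for instance to the day) makes them equal. The sample timestamp is arbitrary, and whether this is the bottleneck here is an open question.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_days):
    """Value of recip(ms(NOW,PublishDate),3.16e-11,1,1) for a given age."""
    return recip(age_days * MS_PER_DAY, 3.16e-11, 1, 1)


def floor_to_day(epoch_ms):
    """Round a millisecond timestamp down to the start of its UTC day."""
    return epoch_ms - epoch_ms % MS_PER_DAY


for age in (0, 30, 365, 3650):
    print(age, round(recency_boost(age), 3))

# Two requests a little over a second apart (arbitrary sample timestamp).
t1 = 1381900000000
t2 = t1 + 1234
print(t1 == t2, floor_to_day(t1) == floor_to_day(t2))
```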

Sorry about the long post, but I'm struggling to nail down the issue here,
especially when queries are running fine in a master-slave environment with
similar hardware and network.

Any pointers will be highly appreciated.

Regards,
Shamik






SolrCloud Performance Issue

2013-10-16 Thread Shamik Bandopadhyay
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>

One thing I've noticed is that the queryresultcache hit rate is really low,
though I'm not sure our queries are always that unique. I'm using edismax
and there's a
<str name="bf">recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0</str> , can
this contribute ?

Sorry about the long post, but I'm struggling to nail down the issue here,
especially when queries are running fine in a master-slave environment with
similar hardware and network.

Any pointers will be highly appreciated.

Regards,
Shamik


RE: How to achieve distributed spelling check in SolrCloud ?

2013-10-08 Thread shamik
James,

  Thanks for your reply. The shards.qt did the trick. I read the
documentation earlier but was not clear on the implementation, now it
totally makes sense.
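
For anyone following along, a minimal sketch of the kind of request this refers to: shards.qt tells each shard sub-request which handler to use, so the spellcheck component is reached on every shard. The handler name "/spell" below is a hypothetical placeholder, not taken from this thread.

```python
from urllib.parse import urlencode, parse_qs


def distributed_spellcheck_params(q, handler):
    """Query parameters for a distributed request with spellcheck enabled."""
    return [
        ("q", q),
        ("spellcheck", "true"),
        # Handler that the per-shard sub-requests should be sent to.
        ("shards.qt", handler),
    ]


# "/spell" is a made-up handler name used only for illustration.
query = urlencode(distributed_spellcheck_params("xref", "/spell"))
print(query)
```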

Appreciate your help.

Regards,
Shamik




