Re: Facet to part of search results

2020-12-04 Thread Andy Webb
I wonder if you could increase the precision of your result set to reduce
its size? If you have 10M results for a query but only the first 10K
deserve to be represented by the faceting, what is it about those 10K that
makes them better than the other 9.99M? For example if some items are
boosted by some attribute(s) to get higher scores, can you filter out items
that don't have those attributes? Also, maybe setting mm to require more
terms to match could cut out unwanted results (that's not useful for the
"dog" query of course).

Andy

On Fri, 4 Dec 2020 at 06:43, Radu Gheorghe 
wrote:

>
> > On 3 Dec 2020, at 20:18, Shawn Heisey  wrote:
> >
> > On 12/3/2020 9:55 AM, Jae Joo wrote:
> >> Is there any way to apply facet to the partial search result?
> >> For ex, we have 10M returned by "dog" and would like to apply facets to the
> >> first 10K.
> >> Possible?
> >
> > The point of facets is to provide accurate numbers.
> >
> > What would it mean to only apply to the first 10K?  If there are 10
> million documents in the query results that contain "dog" then the facet
> should say 10 million, not 10K.  I do not understand what you're trying to
> do.
> >
>
> Maybe sampling? I’m not aware of a built-in way to do that. But you could
> index a random float between, say, 0 and 100 and then select a sample by
> filtering for number < some threshold.
> Or maybe you’d think that faceting on 10K would be enough (e.g. if you
> don’t need the numbers, just some unique values). But I really don’t see a
> good solution to that - you’d have to terminateEarly and do faceting
> somehow…
>
> Best regards,
> Radu
>
>
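To make the sampling idea above concrete, here is a rough SolrJ sketch. The
random float field "rand_f" (filled with a value between 0 and 100 at index
time) and the facet field "category" are hypothetical, and because the filter
also restricts the returned documents it is issued here as a facet-only
request (rows=0) that would run alongside the normal query:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SampledFacets {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            SolrQuery q = new SolrQuery("dog");
            // Keep roughly 1% of the matches by filtering on the random field
            q.addFilterQuery("rand_f:[0 TO 1]");
            q.setRows(0);            // facet counts only, no documents
            q.setFacet(true);
            q.addFacetField("category");
            QueryResponse rsp = solr.query(q);
            rsp.getFacetField("category").getValues()
               .forEach(c -> System.out.println(c.getName() + " -> " + c.getCount()));
        }
    }
}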


Why am I able to sort on a multiValued field?

2020-11-13 Thread Andy C
I am adding a new float field to my index that I want to perform range
searches and sorting on. It will only contain a single value.

I have an existing dynamic field definition in my schema.xml that I wanted
to use to avoid having to update the schema:




I went ahead and implemented this in a test system (recently updated to
Solr 8.7), but then it occurred to me that I am not going to be able to
sort on the field because it is defined as multiValued.

But to my surprise, sorting worked and gave the expected results. Why? Can
this behavior be relied on in future releases?

Appreciate any insights.

Thanks
- AndyC -


Re: Folding Repeated Letters

2020-10-08 Thread Andy Webb
How about something like this?

{
    "add-field-type": [
        {
            "name": "norepeat",
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {
                    "class": "solr.StandardTokenizerFactory"
                },
                "filters": [
                    {
                        "class": "solr.LowerCaseFilterFactory"
                    },
                    {
                        "class": "solr.PatternReplaceFilterFactory",
                        "pattern": "(.)\\1+",
                        "replacement": "$1"
                    }
                ]
            }
        }
    ]
}

This finds a match...
http://localhost:8983/solr/#/norepeat/analysis?analysis.fieldvalue=Yes&analysis.query=YyyeeEssSs&analysis.fieldtype=norepeat
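For anyone who prefers to register the type programmatically, here is a rough
SolrJ sketch of the same definition sent through the Schema API — the core
name in the URL is just a placeholder, and the definition classes assume a
recent SolrJ (8.x):

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.AnalyzerDefinition;
import org.apache.solr.client.solrj.request.schema.FieldTypeDefinition;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class AddNoRepeatFieldType {
    public static void main(String[] args) throws Exception {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("name", "norepeat");
        attrs.put("class", "solr.TextField");

        Map<String, Object> tokenizer = new LinkedHashMap<>();
        tokenizer.put("class", "solr.StandardTokenizerFactory");

        Map<String, Object> lowercase = new LinkedHashMap<>();
        lowercase.put("class", "solr.LowerCaseFilterFactory");

        Map<String, Object> collapseRepeats = new LinkedHashMap<>();
        collapseRepeats.put("class", "solr.PatternReplaceFilterFactory");
        collapseRepeats.put("pattern", "(.)\\1+");   // any run of a repeated character...
        collapseRepeats.put("replacement", "$1");    // ...becomes a single occurrence

        AnalyzerDefinition analyzer = new AnalyzerDefinition();
        analyzer.setTokenizer(tokenizer);
        analyzer.setFilters(Arrays.asList(lowercase, collapseRepeats));

        FieldTypeDefinition fieldType = new FieldTypeDefinition();
        fieldType.setAttributes(attrs);
        fieldType.setAnalyzer(analyzer);

        try (HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/norepeat").build()) {
            new SchemaRequest.AddFieldType(fieldType).process(solr);
        }
    }
}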

Andy



On Thu, 8 Oct 2020 at 23:02, Mike Drob  wrote:

> I'm looking for a way to transform words with repeated letters into the
> same token - does something like this exist out of the box? Do our stemmers
> support it?
>
> For example, say I would want all of these terms to return the same search
> results:
>
> YES
> YESSS
> YYYEEESSS
> YYEE[...]S
>
> I don't know how long a user would hold down the S key at the end to
> capture their level of excitement, and I don't want to manually define
> synonyms for every length.
>
> I'm pretty sure that I don't want PhoneticFilter here, maybe
> PatternReplace? Not a huge fan of how that one is configured, and I think
> I'd have to set up a bunch of patterns inline for it?
>
> Mike
>


Re: Term too complex for spellcheck.q param

2020-10-08 Thread Andy Webb
I added the maxQueryLength option to DirectSolrSpellchecker in
https://issues.apache.org/jira/browse/SOLR-14131 - that landed in 8.5.0 so
should be available to you.

Andy

On Wed, 7 Oct 2020 at 23:53, gnandre  wrote:

> Is there a way to truncate spellcheck.q param value from Solr side?
>
> On Wed, Oct 7, 2020, 6:22 PM gnandre  wrote:
>
> > Thanks. Is this going to be fixed in some future version?
> >
> > On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:
> >
> >> Right now the only solution is to use a shorter term.
> >>
> >> In a fuzzy query you could also try using a lower edit distance e.g.
> >> term~1
> >> (default is 2), but I’m not sure what the syntax for a spellcheck would
> >> be.
> >>
> >> Mike
> >>
> >> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
> >>
> >> > Hi,
> >> >
> >> > I am getting following error when I pass '
> >> > 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
> >> > ' in spellcheck.q param. How to avoid this error? I am using Solr
> 8.5.2
> >> >
> >> > {
> >> >   "error": {
> >> > "code": 500,
> >> > "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
> >> > 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
> >> > "trace":
> >> "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
> >> > Term too complex:
> >> > 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
> >> >
> >> >
> >>
> org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
> >> >
> >> >
> >>
> org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
> >> >
> >> >
> >>
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
> >> >
> >> >
> >>
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
> >> >
> >> >
> >>
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
> >> >
> >> >
> >>
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
> >> >
> >> >
> >>
> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
> >> >
> >> >
> >>
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
> >> >
> >> >
> >>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
> >> >
> >> >
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
> >> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
> >> >
> >>
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
> >> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
> >> >
> >> >
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
> >> >
> >> >
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
> >> >
> >> >
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
> >> >
> >> >
> >>

Re: Updating configset

2020-09-11 Thread Andy C
Don't know if this is an option for you but the SolrJ Java Client library
has support for uploading a config set. If the config set already exists it
will overwrite it, and automatically RELOAD the dependent collection.

See
https://lucene.apache.org/solr/8_5_0/solr-solrj/org/apache/solr/common/cloud/ZkConfigManager.html
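For reference, a rough sketch of that approach with SolrJ — the ZooKeeper
address, local config directory, configset name and collection name are all
placeholders, and an explicit RELOAD is included here in case it is not
triggered automatically in your version:

import java.nio.file.Paths;
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.ZkConfigManager;

public class UploadConfigSet {
    public static void main(String[] args) throws Exception {
        String zkHost = "localhost:2181";

        // Upload (and overwrite, if it already exists) the configset in ZooKeeper
        try (SolrZkClient zkClient = new SolrZkClient(zkHost, 30000)) {
            new ZkConfigManager(zkClient)
                .uploadConfigDir(Paths.get("/path/to/conf"), "myconfigset");
        }

        // Reload the collection that uses the configset so the changes take effect
        try (CloudSolrClient solr = new CloudSolrClient.Builder(
                Collections.singletonList(zkHost), Optional.empty()).build()) {
            CollectionAdminRequest.reloadCollection("mycollection").process(solr);
        }
    }
}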

On Fri, Sep 11, 2020 at 1:45 PM Jörn Franke  wrote:

> I would go for the Solr rest api ... especially if you have a secured zk
> (eg with Kerberos). Then you need to manage access for humans only in Solr
> and not also in ZK.
>
> > Am 11.09.2020 um 19:41 schrieb Erick Erickson :
> >
> > Bin/solr zk upconfig...
> > Bin/solr zk cp... For individual files.
> >
> > Not as convenient as a nice API, but might let you get by...
> >
> >> On Fri, Sep 11, 2020, 13:26 Houston Putman 
> wrote:
> >>
> >> I completely agree, there should be a way to overwrite an existing
> >> configSet.
> >>
> >> Looks like https://issues.apache.org/jira/browse/SOLR-10391 already
> >> exists,
> >> so the work could be tracked there.
> >>
> >> On Fri, Sep 11, 2020 at 12:36 PM Tomás Fernández Löbbe <
> >> tomasflo...@gmail.com> wrote:
> >>
> >>> I was in the same situation recently. I think it would be nice to have
> >> the
> >>> configset UPLOAD command be able to override the existing configset
> >> instead
> >>> of just fail (with a parameter such as override=true or something). We
> >> need
> >>> to be careful with the trusted/untrusted flag there, but that should
> be
> >>> possible.
> >>>
>  If we can’t modify the configset wholesale this way, is it possible to
> >>> create a new configset and swap the old collection to it?
> >>> You can create a new one and then call MODIFYCOLLECTION on the
> collection
> >>> that uses it:
> >>>
> >>>
> >>
> https://lucene.apache.org/solr/guide/8_6/collection-management.html#modifycollection-parameters
> >>> .
> >>> I've never used that though.
> >>>
> >>> On Fri, Sep 11, 2020 at 7:26 AM Carroll, Michael (ELS-PHI) <
> >>> m.carr...@elsevier.com> wrote:
> >>>
>  Hello,
> 
>  I am running SolrCloud in Kubernetes with Solr version 8.5.2.
> 
>  Is it possible to update a configset being used by a collection using
> a
>  SolrCloud API directly? I know that this is possible using the zkcli
> >> and
> >>> a
>  collection RELOAD. We essentially want to be able to checkout our
> >>> configset
>  from source control, and then replace everything in the active
> >> configset
> >>> in
>  SolrCloud (other than the schema.xml).
> 
>  We have a couple of custom plugins that use config files that reside
> in
>  the configset, and we don’t want to have to rebuild the collection or
>  access zookeeper directly if we don’t have to. If we can’t modify the
>  configset wholesale this way, is it possible to create a new configset
> >>> and
>  swap the old collection to it?
> 
>  Best,
>  Michael Carroll
> 
> >>>
> >>
>


Re: Error on searches containing specific character pattern

2020-09-07 Thread Andy @ BlueFusion

Thanks David, I'll set up  the techproducts schema and see what happens.

Kind regards,

Andy

On 4/09/20 4:09 pm, David Smiley wrote:

Hi,

I looked at the code at those line numbers and it seems simply impossible
that an ArrayIndexOutOfBoundsException could be thrown there because it's
guarded by a condition ensuring the array is of length 1.
https://github.com/apache/lucene-solr/blob/2752d50dd1dcf758a32dc573d02967612a2cf1ff/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L653

If you can reproduce this with the "techproducts" schema, please share the
complete query.  If there's a problem here, I suspect the synonyms you have
may be pertinent.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Sep 1, 2020 at 11:50 PM Andy @ BlueFusion 
wrote:


Hi All,

I have an 8.6.0 instance that is working well with one exception.

It returns an error when the search term follows a pattern of numbers &
alpha characters such as:

   * 1a1 aa
   * 1a1 1aa
   * 1a1 11

Similar patterns that don't error

   * 1a1 a
   * 1a1 1
   * 1a11 aa
   * 11a1 aa
   * 1a1aa
   * 11a11 aa

The error is:

|"trace":"java.lang.ArrayIndexOutOfBoundsException: 0\n\t at
org.apache.lucene.util.QueryBuilder.newSynonymQuery(QueryBuilder.java:653)\n\t

at
org.apache.solr.parser.SolrQueryParserBase.newSynonymQuery(SolrQueryParserBase.java:617)\n\t

at
org.apache.lucene.util.QueryBuilder.analyzeGraphBoolean(QueryBuilder.java:533)\n\t

at
org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:320)\n\t

at
org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:240)\n\t

at
org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:524)\n\t

at
org.apache.solr.parser.QueryParser.newFieldQuery(QueryParser.java:62)\n\t
at
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:1122)\n\t

at
org.apache.solr.parser.QueryParser.MultiTerm(QueryParser.java:593)\n\t
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:142)\n\t at
org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\t at
org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\t at
org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\t at
org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\t at
org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\t at
org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\t at
org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)\n\t
at
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:260)\n\t

at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)\n\t
at org.apache.solr.search.QParser.getQuery(QParser.java:174)\n\t at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:160)\n\t

at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)\n\t

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\t

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\t at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)\n\t
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)\n\t
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)\n\t

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)\n\t

at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\t

at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\t

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\t

at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\t

at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\t

at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\t

at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\t

at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\t

at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\t

at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\t

at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\t

at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)\n\t

at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\t

at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)\n\t

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\t

at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\t

at
org.eclipse.jetty.server.handler.HandlerColl

Error on searches containing specific character pattern

2020-09-01 Thread Andy @ BlueFusion
dPoint.java:117)\n\t 
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\t 
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\t 
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\t 
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\t 
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\t 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)\n\t 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)\n\t 
at java.lang.Thread.run(Thread.java:748)\n"

I haven't been able to find anything similar reported online so I'm thinking
it's a config issue and would be grateful for any pointers & solutions. Many
thanks in advance,


--
Andy Dopleach
/Director/
*BlueFusion <https://www.bluefusion.co.nz/>*
p:  03 328 8646  m: 021 255 7403 
w: 	www.bluefusion.co.nz <https://www.bluefusion.co.nz> e: 
a...@bluefusion.co.nz <mailto:a...@bluefusion.co.nz>



	Review BlueFusion on Google 
<https://search.google.com/local/writereview?placeid=ChIJC9VEfTMmMm0RyWB58kmqS7c>




Re: Solr with HDFS configuration example running in production/dev

2020-08-20 Thread Andy Hind
Hi

I would not go down this road. What is the use case?  Is this really the 
solution?

Go read all the relevant docs and configuration provided by 
Cloudera/HortonWorks and everything else related to SOLR and HDFS.

I am not inclined to help you down a road you do not want to travel. There be 
dragons!

Andy

> On 20 Aug 2020, at 07:25, Prashant Jyoti  wrote:
> 
> Hi Joe,
> These are the errors I am running into:
> 
> org.apache.solr.common.SolrException: Error CREATEing SolrCore
> 'newcollsolr2_shard1_replica_n1': Unable to create core
> [newcollsolr2_shard1_replica_n1] Caused by: Illegal char <:> at index 4:
> hdfs://
> hn1-pjhado.tvbhpqtgh3judk1e5ihrx2k21d.tx.internal.cloudapp.net:8020/user/solr-data/newcollsolr2/core_node3/data\
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1256)
> at
> org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:93)
> at
> org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:362)
> at
> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:842)
> at
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:808)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:559)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
> at
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:500)
> at
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)
> a

Re: Solrj client 8.6.0 issue special characters in query

2020-08-07 Thread Andy Webb
hi Jörn - something's decoding a UTF8 sequence using the legacy iso-8859-1
character set:

Jörn is J%C3%B6rn in UTF8
J%C3%B6rn misinterpreted as iso-8859-1 is JÃ¶rn
JÃ¶rn is J%C3%83%C2%B6rn in UTF8
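A small self-contained Java sketch (not SolrJ-specific) that reproduces this
double encoding:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) throws Exception {
        String term = "Jörn";

        // Correct: percent-encode the UTF-8 bytes directly -> J%C3%B6rn
        System.out.println(URLEncoder.encode(term, StandardCharsets.UTF_8.name()));

        // Broken: the UTF-8 bytes get re-read as ISO-8859-1 ("JÃ¶rn") and
        // percent-encoded again -> J%C3%83%C2%B6rn
        String mojibake = new String(term.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(URLEncoder.encode(mojibake, StandardCharsets.UTF_8.name()));
    }
}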

I hope this helps track down the problem!
Andy

On Fri, 7 Aug 2020 at 12:08, Jörn Franke  wrote:

> Hmm, setting -Dfile.encoding=UTF-8 solves the problem. I have to now check
> which component of the application screws it up, but at the moment I do NOT
> believe it is related to Solrj.
>
> On Fri, Aug 7, 2020 at 11:53 AM Jörn Franke  wrote:
>
> > Dear all,
> >
> > I have the following issues. I have a Solrj Client 8.6 (but it happens
> > also in previous versions), where I execute, for example, the following
> > query:
> > Jörn
> >
> > If I look into Solr Admin UI it finds all the right results.
> >
> > If I use Solrj client then it does not find anything.
> > Further, investigating in debug mode it seems that the URI to server gets
> > wrongly encoded.
> > Jörn becomes J%C3%83%C2%B6rn
> > It should become only J%C3%B6rn
> > any idea why this happens and why it add %83%C2 inbetween? Those do not
> > seem to be even valid UTF-8 characters
> >
> > I verified with various statements that I give to Solrj the correct
> > encoded String "Jörn"
> >
> > Can anyone help me here?
> >
> > Thank you.
> >
> > best regards
> >
>


Unable to get ICUFoldingFilterFactory class loaded in unsecured 8.4.1 SolrCloud

2020-01-29 Thread Andy C
I have a schema currently used with Solr 7.3.1 that uses the ICU contrib
extensions. Previously I used a <lib> directive in the solrconfig.xml to
load the icu4j and lucene-analyzers-icu jars.

The 8.4 upgrade notes indicate that this approach is no longer supported
for SolrCloud unless you enable authentication. So I removed the <lib>
directive from the solrconfig.xml.

I tried creating a 'lib' directory underneath solr-8.4.1\server\solr and
copying the jars there. However I get a ClassNotFoundException for
ICUFoldingFilterFactory class when I try creating a collection using the
uploaded configset. Adding an explicit "lib"
entry to the solr.xml (kept outside zookeeper), didn't help. (Note: both
these approaches work with a standalone 8.4.1 Solr instance).

I tried copying the 2 jars into the one of directories that are part of the
standard classpath, but that seems to cause problems with the class loader,
as I start getting a  NoClassDefFoundError :
org/apache/lucene/analysis/util/ResourceLoaderAware exception.

Any suggestions?

Thanks,
- Andy -


Re: Boolean Searches?

2019-03-14 Thread Andy C
Dave,

You don't mention what query parser you are using, but with the default
query parser you can field qualify all the terms entered in a text box by
surrounding them with parentheses. So if you want to search against the
'title' field and they entered:

train OR dragon

You could generate the Solr query:

title:(train OR dragon)

Historically however Solr has not processed queries that contain a mixture
of boolean operators as expected. The problem is described here:
http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/

There is an open JIRA for this (
https://issues.apache.org/jira/browse/SOLR-4023) so I assume the problem
still exists in the most recent releases.
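A minimal SolrJ sketch of that wrapping approach — the collection name
"films" is just a placeholder, and the mixed-operator caveat above still
applies to whatever the user types:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TitleBooleanSearch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/films").build()) {
            // Field-qualify everything the user typed by wrapping it in parentheses
            String userInput = "train OR dragon";
            SolrQuery q = new SolrQuery("title:(" + userInput + ")");
            System.out.println(solr.query(q).getResults().getNumFound() + " matches");
        }
    }
}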

On Thu, Mar 14, 2019 at 10:50 AM Dave Beckstrom 
wrote:

> Hi Everyone,
>
> I'm building a SOLR search application and the customer wants the search to
> work like google search.
>
>
> They want the user to be able to enter boolean searches like:
>
> train OR dragon.
>
> which would find any matches that has the word "train" or the word "dragon"
> in the title.
>
> I know that the SOLR search would like this:
>
> title:train OR title:dragon
>
> I am trying to avoid having to parse through what the user enters and build
> out complex search strings.
>
> Is there any way that I can build a search against the "title" field where
> if the user enters something like:
>
> train OR dragon AND 2
>
> it will honor the boolean AND/OR logic without my having to convert it into
> something nasty like:
>
> title:train OR title:dragon AND title:2
>
>
> Thank you!
>
> --
> *Fig Leaf Software, Inc.*
> https://www.figleaf.com/
> 
>
> Full-Service Solutions Integrator
>
>
>
>
>
>
>


Re: which Zookeper version for Solr 6.6.5

2018-12-14 Thread Andy C
Bernd,

I recently asked a similar question about Solr 7.3 and Zookeeper 3.4.11.

This is the response I found most helpful:

https://www.mail-archive.com/solr-user@lucene.apache.org/msg138910.html

- Andy -


On Fri, Dec 14, 2018 at 7:41 AM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> This question sounds simple but nevertheless it's spinning in my head.
>
> While using Solr 6.6.5 in Cloud mode which has Apache ZooKeeper 3.4.10
> in the list of "Major Components" is it possible to use
> Apache ZooKeeper 3.4.13 as stand-alone ensemble together with SolrCloud
> 6.6.5
> or do I have to recompile SolrCloud 6.6.5 with Zookeeper 3.4.13 libraries?
>
> Regards
> Bernd
>


Re: solr instance keep increasing thread count

2018-12-04 Thread andy
update about 3000 docs per minute, but the other solr instances are running
normally



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


solr instance keep increasing thread count

2018-12-04 Thread andy
we use solrcloud version 5.2.1 with openjdk 1.8. we have multiple solr
machines, and one of them keeps increasing its thread count. On the
http://xx:xxx/solr/#/~threads page I found a lot of
[java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@x] threads;
the detail info looks like this:

sun.misc.Unsafe.park​(Native Method)
java.util.concurrent.locks.LockSupport.park​(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt​(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared​(AbstractQueuedSynchronizer.java:967)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared​(AbstractQueuedSynchronizer.java:1283)
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock​(ReentrantReadWriteLock.java:727)
org.apache.solr.update.VersionInfo.lockForUpdate​(VersionInfo.java:110)
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit​(DistributedUpdateProcessor.java:1630)
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit​(DistributedUpdateProcessor.java:1612)
org.apache.solr.update.processor.LogUpdateProcessor.processCommit​(LogUpdateProcessorFactory.java:161)
org.apache.solr.handler.RequestHandlerUtils.handleCommit​(RequestHandlerUtils.java:69)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody​(ContentStreamHandlerBase.java:68)
org.apache.solr.handler.RequestHandlerBase.handleRequest​(RequestHandlerBase.java:143)
org.apache.solr.core.SolrCore.execute​(SolrCore.java:2064)
org.apache.solr.servlet.HttpSolrCall.execute​(HttpSolrCall.java:654)
org.apache.solr.servlet.HttpSolrCall.call​(HttpSolrCall.java:450)
org.apache.solr.servlet.SolrDispatchFilter.doFilter​(SolrDispatchFilter.java:227)
org.apache.solr.servlet.SolrDispatchFilter.doFilter​(SolrDispatchFilter.java:196)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter​(ServletHandler.java:1652)
org.eclipse.jetty.servlet.ServletHandler.doHandle​(ServletHandler.java:585)
org.eclipse.jetty.server.handler.ScopedHandler.handle​(ScopedHandler.java:143)
org.eclipse.jetty.security.SecurityHandler.handle​(SecurityHandler.java:577)
org.eclipse.jetty.server.session.SessionHandler.doHandle​(SessionHandler.java:223)
org.eclipse.jetty.server.handler.ContextHandler.doHandle​(ContextHandler.java:1127)
org.eclipse.jetty.servlet.ServletHandler.doScope​(ServletHandler.java:515)
org.eclipse.jetty.server.session.SessionHandler.doScope​(SessionHandler.java:185)
org.eclipse.jetty.server.handler.ContextHandler.doScope​(ContextHandler.java:1061)
org.eclipse.jetty.server.handler.ScopedHandler.handle​(ScopedHandler.java:141)
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle​(ContextHandlerCollection.java:215)
org.eclipse.jetty.server.handler.HandlerCollection.handle​(HandlerCollection.java:110)
org.eclipse.jetty.server.handler.HandlerWrapper.handle​(HandlerWrapper.java:97)
org.eclipse.jetty.server.Server.handle​(Server.java:497)
org.eclipse.jetty.server.HttpChannel.handle​(HttpChannel.java:310)
org.eclipse.jetty.server.HttpConnection.onFillable​(HttpConnection.java:257)
org.eclipse.jetty.io.AbstractConnection$2.run​(AbstractConnection.java:540)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob​(QueuedThreadPool.java:635)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run​(QueuedThreadPool.java:555)
java.lang.Thread.run​(Thread.java:745)


anyone know the probable reason for this?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Connection Problem with CloudSolrClient.Builder().build When passing a Zookeeper Addresses and RootParam

2018-06-18 Thread Andy C
From the error, I think the issue is with your zookeeperList definition.

Try changing:


zookeeperList.add("http://100.12.119.10:2281";);
zookeeperList.add("http://100.12.119.10:2282";);
zookeeperList.add("http://100.12.119.10:2283";);

to


zookeeperList.add("100.12.119.10:2281");
zookeeperList.add("100.12.119.10:2282");
zookeeperList.add("100.12.119.10:2283");

If you are not using a chroot in Zookeeper then just use chrootOption =
Optional.empty(); (as you have done).

Intent of my code was to support both using a chroot and not using a
chroot. The value of _zkChroot is read from a config file in code not shown.

- Andy -


Re: Connection Problem with CloudSolrClient.Builder().build When passing a Zookeeper Addresses and RootParam

2018-06-18 Thread Andy C
I am using the following (Solr 7.3.1) successfully:

import java.util.Optional;

 Optional<String> chrootOption = null;
 if (StringUtils.isNotBlank(_zkChroot))
 {
chrootOption = Optional.of(_zkChroot);
 }
 else
 {
chrootOption = Optional.empty();
 }
 CloudSolrClient client = new CloudSolrClient.Builder(_zkHostList,
chrootOption).build();

Adapted from code I found somewhere (unit test?). Intent is to support the
option of configuring a chroot or not (stored in "_zkChroot")

- Andy -

On Mon, Jun 18, 2018 at 12:53 PM, THADC 
wrote:

> Hello,
>
> I am using solr 7.3 and zookeeper 3.4.10. I have custom client code that is
> supposed to connect to a zookeeper cluster. For the sake of clarity, the
> main code focus:
>
>
> private synchronized void initSolrClient()
> {
> List<String> zookeeperList = new ArrayList<String>();
>
> zookeeperList.add("http://100.12.119.10:2281");
> zookeeperList.add("http://100.12.119.10:2282");
> zookeeperList.add("http://100.12.119.10:2283");
>
> String collectionName = "myCollection";
>
> log.debug("in initSolrClient(), collectionName: " +
> collectionName);
>
> try {
> solrClient = new 
> CloudSolrClient.Builder(zookeeperList,
> null).build();
>
> } catch (Exception e) {
> log.info("Exception creating solr client object.
> ");
> e.printStackTrace();
> }
> solrClient.setDefaultCollection(collectionName);
> }
>
> Before executing, I test that all three zoo nodes are running
> (./bin/zkServer.sh status zoo.cfg, ./bin/zkServer.sh status zoo2.cfg,
> ./bin/zkServer.sh status zoo3.cfg). The status shows the quorum is
> up and running, with one nodes as the leader and the other two as
> followers.
>
> When I execute my java client to connect to the zookeeper cluster, I get :
>
> java.lang.NullPointerException
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient$Builder.<init>(CloudSolrClient.java:1387)
>
>
> I am assuming it has a problem with my null value for zkChroot, but not
> certain. Th API says zkChroot is the path to the root ZooKeeper node
> containing Solr data. May be empty if Solr-data is located at the ZooKeeper
> root.
>
> I am confused on what exactly should go here, and when it can be null. I
> cannot find any coding examples.
>
> Any help greatly appreciated.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Performance if there is a large number of field

2018-05-11 Thread Andy C
Shawn,

Why are range searches more efficient than wildcard searches? I guess I
would have expected that they just provide different mechanisms for defining
the range of unique terms that are of interest, and that the merge
processing would be identical.

Would a search such as:

field:c*

be more efficient if rewritten as:

field:[c TO d}

then?

On Fri, May 11, 2018 at 10:45 AM, Shawn Heisey  wrote:

> On 5/10/2018 2:22 PM, Deepak Goel wrote:
>
>> Are there any benchmarks for this approach? If not, I can give it a spin.
>> Also wondering if there are any alternative approach (i guess lucene
>> stores
>> data in a inverted field format)
>>
>
> Here is the only other query I know of that can find documents missing a
> field:
>
> q=*:* -field:*
>
> The potential problem with this query is that it uses a wildcard.  On
> non-point fields with very low cardinality, the performance might be
> similar.  But if the field is a Point type, or has a large number of unique
> values, then performance would be a lot worse than the range query I
> mentioned before.  The range query is the best general purpose option.
>
> The *:* query, despite appearances, does not use wildcards.  It is special
> query syntax.
>
> Thanks,
> Shawn
>
>


Re: Upgrading to Solr 7.3 but Zookeeper 3.4.11 no longer available on Zookeeper mirror sites

2018-05-09 Thread Andy C
Thanks Shawn. That makes sense.

On Wed, May 9, 2018 at 5:10 PM, Shawn Heisey  wrote:

> On 5/9/2018 2:38 PM, Andy C wrote:
> > Was not quite sure from reading the JIRA why the Zookeeper team felt the
> > issue was so critical that they felt the need to pull the release from
> > their mirrors.
>
> If somebody upgrades their servers from an earlier 3.4.x release to
> 3.4.11, then 3.4.11 might be unable to properly read the existing data
> because it'll be looking in the wrong place.  Worst-case scenario could
> result in all data in a ZK ensemble disappearing, and the admin might
> have no idea why it all disappeared.  (the data would probably still be
> recoverable from the disk)
>
> That's why it was pulled.
>
> > It does present something of a PR issue for us, if we tell our customers
> to
> > use a ZK version that has been pulled from the mirrors. Any plans to move
> > to ZK 3.4.12 in future releases?
>
> There should be no issues with running 3.4.12 servers with the 3.4.11
> client in Solr.  Other version combinations are likely to work as well,
> though there are typically a lot of bugfixes included in later ZK
> releases, so running the latest stable release is recommended.
>
> The ZOOKEEPER-2960 problem is ONLY on the server side.  As I mentioned
> before, the ZK version information in the release notes is not a
> recommendation, it serves to inform users what version of ZK is included
> in Solr.
>
> Thanks,
> Shawn
>
>


Re: Upgrading to Solr 7.3 but Zookeeper 3.4.11 no longer available on Zookeeper mirror sites

2018-05-09 Thread Andy C
Thank Erick.

Was not quite sure from reading the JIRA why the Zookeeper team felt the
issue was so critical that they felt the need to pull the release from
their mirrors.

I guess the biggest issue is if you started out with a single ZK instance
and then implemented a ZK cluster that it would invert the dataDir and
dataLogDir directories.

It does present something of a PR issue for us, if we tell our customers to
use a ZK version that has been pulled from the mirrors. Any plans to move
to ZK 3.4.12 in future releases?

Thanks,
- Andy -

On Wed, May 9, 2018 at 4:09 PM, Erick Erickson 
wrote:

> That bug isn't all that critical, at worst you may have to invert
> where your two directories point.
>
> 3.4.11 is available from https://archive.apache.org/dist/zookeeper/
>
> Best,
> Erick
>
> On Wed, May 9, 2018 at 12:51 PM, Andy C  wrote:
> > According to the 7.3 release notes I should be using Zookeeper 3.4.11
> with
> > Solr 7.3.
> >
> > However it appears that Zookeeper has pulled Zookeeper 3.4.11 from their
> > mirror sites (this appears to be due to a serious bug in ZK 3.4.11 -
> > ZOOKEEPER-2960) <https://issues.apache.org/jira/browse/ZOOKEEPER-2960> .
> > Only 3.4.10 and 3.4.12 are available.
> >
> > Not quite sure how to proceed. Can I use ZK 3.4.10 or 3.4.12 with Solr
> 7.3?
> > Or should I try to find an archived version of ZK 3.4.11 somewhere?
> >
> > Will Solr 7.3.1 or 7.4 be integrated with ZK 3.4.12? If so, what is the
> > expected time frame for these releases?
> >
> > Would appreciate any guidance.
> >
> > Thanks,
> > - Andy -
>


Upgrading to Solr 7.3 but Zookeeper 3.4.11 no longer available on Zookeeper mirror sites

2018-05-09 Thread Andy C
According to the 7.3 release notes I should be using Zookeeper 3.4.11 with
Solr 7.3.

However it appears that Zookeeper has pulled Zookeeper 3.4.11 from their
mirror sites (this appears to be due to a serious bug in ZK 3.4.11 -
ZOOKEEPER-2960) <https://issues.apache.org/jira/browse/ZOOKEEPER-2960> .
Only 3.4.10 and 3.4.12 are available.

Not quite sure how to proceed. Can I use ZK 3.4.10 or 3.4.12 with Solr 7.3?
Or should I try to find an archived version of ZK 3.4.11 somewhere?

Will Solr 7.3.1 or 7.4 be integrated with ZK 3.4.12? If so, what is the
expected time frame for these releases?

Would appreciate any guidance.

Thanks,
- Andy -


Re: Adding Documents to Solr by using Java Client API is failed

2018-03-19 Thread Andy Tang
Erik,

Thank you so much!

On Sat, Mar 17, 2018 at 5:50 PM, Erick Erickson 
wrote:

> So if you're saying that the docs are successfully added, then you can
> ignore the SLF4J messages. They're just telling you that you don't have
> logging configured. If your client application wants to use a logging
> framework you have to do additional work.
>
> Solr (and SolrJ) allow you to use whatever SLF4J-compliant implementation
> you want for logging, but you must configure it. The referenced link will
> give
> you a start.
>
> But for test programs it's not _necessary_
>
> Best,
> Erick
>
>
> On Fri, Mar 16, 2018 at 2:02 PM, Andy Tang  wrote:
> > Erik,
> >
> > Thank you for reminding.
> > javac -cp
> > .:/opt/solr/solr-6.6.2/dist/*:/opt/solr/solr-6.6.2/dist/solrj-lib/*
> >  AddingDocument.java
> >
> > java -cp
> > .:/opt/solr/solr-6.6.2/dist/*:/opt/solr/solr-6.6.2/dist/solrj-lib/*
> >  AddingDocument
> >
> > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > SLF4J: Defaulting to no-operation (NOP) logger implementation
> > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> further
> > details.
> > Documents added
> >
> > All jars are included and documents added successfully. However, there
> are
> > some error message coming out.
> >
> > Thank you.
> >
> >
> > On Fri, Mar 16, 2018 at 12:43 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> this is the important bit:
> >>
> >> java.lang.NoClassDefFoundError: org/apache/http/Header
> >>
> >> That class is not defined in the Solr code at all, it's in
> >> httpcore-#.#.#.jar
> >>
> >> You probably need to include /opt/solr/solr-6.6.2/dist/solrj-lib in
> >> your classpath.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Mar 16, 2018 at 12:14 PM, Andy Tang 
> >> wrote:
> >> > I have the code to add document to Solr. I tested it in Both Solr
> 6.6.2
> >> and
> >> > Solr 7.2.1 and failed.
> >> >
> >> > import java.io.IOException;  import
> >> > org.apache.solr.client.solrj.SolrClient; import
> >> > org.apache.solr.client.solrj.SolrServerException; import
> >> > org.apache.solr.client.solrj.impl.HttpSolrClient; import
> >> > org.apache.solr.common.SolrInputDocument;
> >> > public class AddingDocument {
> >> >public static void main(String args[]) throws Exception {
> >> >
> >> > String urlString = "http://localhost:8983/solr/Solr_example";
> >> >  SolrClient Solr = new HttpSolrClient.Builder(urlString).build();
> >> >
> >> >   //Preparing the Solr document
> >> >   SolrInputDocument doc = new SolrInputDocument();
> >> >
> >> >   //Adding fields to the document
> >> >   doc.addField("id", "007");
> >> >   doc.addField("name", "James Bond");
> >> >   doc.addField("age","45");
> >> >   doc.addField("addr","England");
> >> >
> >> >   //Adding the document to Solr
> >> >   Solr.add(doc);
> >> >
> >> >   //Saving the changes
> >> >   Solr.commit();
> >> >   System.out.println("Documents added");
> >> >} }
> >> >
> >> > The compilation is successful like below.
> >> >
> >> > javac -cp .:/opt/solr/solr-6.6.2/dist/solr-solrj-6.6.2.jar
> >> > AddingDocument.java
> >> >
> >> > However, when I run it, it gave me some errors message confused.
> >> >
> >> > java -cp .:/opt/solr/solr-6.6.2/dist/solr-solrj-6.6.2.jar
> AddingDocument
> >> >
> >> > Exception in thread "main" java.lang.NoClassDefFoundError:
> >> > org/apache/http/Header
> >> > at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.
> >> build(HttpSolrClient.java:892)
> >> > at AddingDocument.main(AddingDocument.java:13)Caused by:
> >> > java.lang.ClassNotFoundException: org.apache.http.Header
> >> > at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
> >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> > ... 2 more
> >> >
> >> > What is wrong with it? Is this urlString correct?
> >> >
> >> > Any help is appreciated!
> >> > Andy Tang
> >>
>


Re: Adding Documents to Solr by using Java Client API is failed

2018-03-16 Thread Andy Tang
Erik,

Thank you for reminding.
javac -cp
.:/opt/solr/solr-6.6.2/dist/*:/opt/solr/solr-6.6.2/dist/solrj-lib/*
 AddingDocument.java

java -cp
.:/opt/solr/solr-6.6.2/dist/*:/opt/solr/solr-6.6.2/dist/solrj-lib/*
 AddingDocument

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
Documents added

All jars are included and documents added successfully. However, there are
some error message coming out.

Thank you.


On Fri, Mar 16, 2018 at 12:43 PM, Erick Erickson 
wrote:

> this is the important bit:
>
> java.lang.NoClassDefFoundError: org/apache/http/Header
>
> That class is not defined in the Solr code at all, it's in
> httpcore-#.#.#.jar
>
> You probably need to include /opt/solr/solr-6.6.2/dist/solrj-lib in
> your classpath.
>
> Best,
> Erick
>
> On Fri, Mar 16, 2018 at 12:14 PM, Andy Tang 
> wrote:
> > I have the code to add document to Solr. I tested it in Both Solr 6.6.2
> and
> > Solr 7.2.1 and failed.
> >
> > import java.io.IOException;  import
> > org.apache.solr.client.solrj.SolrClient; import
> > org.apache.solr.client.solrj.SolrServerException; import
> > org.apache.solr.client.solrj.impl.HttpSolrClient; import
> > org.apache.solr.common.SolrInputDocument;
> > public class AddingDocument {
> >public static void main(String args[]) throws Exception {
> >
> > String urlString = "http://localhost:8983/solr/Solr_example";
> >  SolrClient Solr = new HttpSolrClient.Builder(urlString).build();
> >
> >   //Preparing the Solr document
> >   SolrInputDocument doc = new SolrInputDocument();
> >
> >   //Adding fields to the document
> >   doc.addField("id", "007");
> >   doc.addField("name", "James Bond");
> >   doc.addField("age","45");
> >   doc.addField("addr","England");
> >
> >   //Adding the document to Solr
> >   Solr.add(doc);
> >
> >   //Saving the changes
> >   Solr.commit();
> >   System.out.println("Documents added");
> >} }
> >
> > The compilation is successful like below.
> >
> > javac -cp .:/opt/solr/solr-6.6.2/dist/solr-solrj-6.6.2.jar
> > AddingDocument.java
> >
> > However, when I run it, it gave me some errors message confused.
> >
> > java -cp .:/opt/solr/solr-6.6.2/dist/solr-solrj-6.6.2.jar AddingDocument
> >
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > org/apache/http/Header
> > at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.
> build(HttpSolrClient.java:892)
> > at AddingDocument.main(AddingDocument.java:13)Caused by:
> > java.lang.ClassNotFoundException: org.apache.http.Header
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> > ... 2 more
> >
> > What is wrong with it? Is this urlString correct?
> >
> > Any help is appreciated!
> > Andy Tang
>


Recovering from machine failure

2018-03-16 Thread Andy C
Running Solr 7.2 in SolrCloud mode with 5 Linux VMs. Each VM was a single
shard, no replication. Single Zookeeper instance running on the same VM as
one of the Solr instances.

IT was making changes, and 2 of the VMs won't reboot (including the VM
where Zookeeper is installed). There was a dedicated drive which Solr (and
Zookeeper for the one node) where installed on, and a dedicated drive where
the Solr indexes were created.

They believe the drives are still good. Their plan is to create 2 new VMs
and attach the drives from the old VMs to them. However the IP addresses of
the new VMs will be different.

In the solr.in.sh I had set the SOLR_HOST entry to the IP address of the
VM. Is this just an arbitrary name? Will Zookeeper still recognize the Solr
instance if the SOLR_HOST entry doesn't match the IP address.

Obviously I will need to adjust the ZK_HOST entries on all nodes to reflect
the new IP address of the VMs. But will that be sufficient?

Appreciate any guidance.

Thanks
- Andy -


Adding Documents to Solr by using Java Client API is failed

2018-03-16 Thread Andy Tang
I have the code to add document to Solr. I tested it in Both Solr 6.6.2 and
Solr 7.2.1 and failed.

import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddingDocument {
   public static void main(String args[]) throws Exception {

      String urlString = "http://localhost:8983/solr/Solr_example";
      SolrClient Solr = new HttpSolrClient.Builder(urlString).build();

      //Preparing the Solr document
      SolrInputDocument doc = new SolrInputDocument();

      //Adding fields to the document
      doc.addField("id", "007");
      doc.addField("name", "James Bond");
      doc.addField("age", "45");
      doc.addField("addr", "England");

      //Adding the document to Solr
      Solr.add(doc);

      //Saving the changes
      Solr.commit();
      System.out.println("Documents added");
   }
}

The compilation is successful like below.

javac -cp .:/opt/solr/solr-6.6.2/dist/solr-solrj-6.6.2.jar
AddingDocument.java

However, when I run it, it gave me some errors message confused.

java -cp .:/opt/solr/solr-6.6.2/dist/solr-solrj-6.6.2.jar AddingDocument

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/http/Header
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:892)
at AddingDocument.main(AddingDocument.java:13)Caused by:
java.lang.ClassNotFoundException: org.apache.http.Header
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more

What is wrong with it? Is this urlString correct?

Any help is appreciated!
Andy Tang


Re: SolrCloud 7.2.1 - UnsupportedOperationException thrown after query on specific environments

2018-03-05 Thread Andy Jolly
We were able to locate the exact issue after some more digging.  We added a
query to another collection that runs alongside the job we were executing
and we were missing the collection reference in the URL.  If the below query
is run by itself in at least Solr 7, the error will be reproduced.
http://localhost:8983/solr//select?q=*:*

Since the collection was left empty, collectionsList in HttpSolrCall.java
was being set to an immutable Collections.emptyList() by the
resolveCollectionListOrAlias method.  Then, when
collectionsList.add(collectionName) was called in the getRemotCoreUrl method
the error we are seeing is thrown as it is trying to add to an immutable
list.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrCloud 7.2.1 - UnsupportedOperationException thrown after query on specific environments

2018-03-02 Thread Andy Jolly
Erick Erickson wrote
> Maybe your remote job server is using a different set of jars than
> your local one? How does the remote job server work?

The remote job server is running the same code as our local, and both our
local and the remote job server are making queries against the same
SolrCloud cluster.  The main difference is we are running the job on our
local through a unit test that kicks off the entire job.

We have noticed that these errors are being thrown on all of our Solr nodes,
not just the node containing the collection that is being queried.


Erick Erickson wrote
> No log snippets came through BTW, so I'm guessing a bit. The Apache
> mail server is quite aggressive about stripping stuff

Here is the log snippet without any formatting.  Hopefully that should work.

2018-03-01 20:01:13.009 INFO  (qtp20671747-2258) [c:mycollection s:shard1
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request
[mycollection_shard1_replica_n1]  webapp=/solr path=/select
params={q=id:79ea39cb1fe01706a05d9595088fc0e04af7b5bf&defType=edismax&bf=recip(ms(NOW,published_on),3.16e-11,1,1)^2.0&start=0&fq=-excluded_tenants:(1)&fq=type:(News)&rows=1&version=2.2}
hits=1 status=0 QTime=0
2018-03-01 20:01:12.998 INFO  (qtp20671747-2231) [c:mycollection s:shard1
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request
[mycollection_shard1_replica_n1]  webapp=/solr path=/select
params={q=id:66d7fa7c716633e33aacf5b8514052f42889267f&defType=edismax&start=0&fq=type:(Job)&rows=1&version=2.2}
hits=0 status=0 QTime=0
2018-03-01 20:01:12.998 INFO  (qtp20671747-2257) [c:mycollection s:shard1
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request
[mycollection_shard1_replica_n1]  webapp=/solr path=/select
params={q=id:5f02f0d8034a15c4604baec33c40c1f48152ffdf&defType=edismax&start=0&fq=type:(Job)&rows=1&version=2.2}
hits=1 status=0 QTime=0
2018-02-28 20:00:11.713 ERROR (qtp20671747-314) [   ] o.a.s.s.HttpSolrCall
null:java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
at
org.apache.solr.servlet.HttpSolrCall.getRemotCoreUrl(HttpSolrCall.java:901)
at
org.apache.solr.servlet.HttpSolrCall.extractRemotePath(HttpSolrCall.java:432)
at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:289)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:470)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.j

SolrCloud 7.2.1 - UnsupportedOperationException thrown after query on specific environments

2018-03-01 Thread Andy Jolly
We are receiving an UnsupportedOperationException after making certain
requests.  The requests themselves do not seem to be causing the issue as
when we run the job that makes these requests locally against the same
SolrCloud cluster where the errors are being thrown, there are no errors. 
These errors only occur when we run this job from our remote job server.  



In the log snippet are the last few requests made before the error is
thrown.

This error started happening after we did an upgrade from solr 6.3.0 to Solr
7.2.1.  We believe the line causing the issue in Solr's HttpSolrCall.java is
/collectionsList.add(collectionName);/ in the getRemotCoreUrl method as
collectionsList is immutable (perhaps because it is set to
Collections.emptyList() in the resolveCollectionListOrAlias method).

We are not completely confident this is a bug in Solr, but are unsure what
could be causing this issue on our end.  Does anyone have any insights as to
what could cause this error?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


./fs-manager process run under solr

2018-01-10 Thread Andy Fake
Hi,

I use Solr 5.5, I recently notice a process a process ./fs-manager is run
under user solr that take quite high CPU usage. I don't think I see such
process before.

Is that a legitimate process from Solr?

Thanks.


Re: After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

2017-05-01 Thread Andy C
Thanks for the response Shawn.

Adding "*:*" in front of my filter query does indeed resolve the issue. It
seems odd to me that the fully negated query does work if I don't set
q.op=AND. I guess this must be "adding complexity". Actually I just
discovered that simply removing the extraneous outer parentheses
[ fq=-ctindex:({*
TO "MyId"} OR {"MyId" TO *}) ] also resolved the issue.

You state that the best-performing query that gives the desired results is:

> fq=ctindex:myId OR (*:* -ctindex:[* TO *])

Is this because there is some sort of optimization invoked when you use [* TO
*], or just because a single range will be more efficient than multiple
ranges ORed together?

I was considering generating an additional field "ctindex_populated" that
would contain true or false depending on whether a ctindex value is
present. And then changing the filter query to:

fq=ctindex_populated:false OR ctindex:myId

Would this be more efficient than your proposed filter query?

Thanks again,
- Andy -

On Mon, May 1, 2017 at 10:19 AM, Shawn Heisey  wrote:

> On 4/26/2017 1:04 PM, Andy C wrote:
> > I'm looking at upgrading the version of Solr used with our application
> from
> > 5.3 to 6.5.
> >
> > Having an issue with a change in the behavior of one of the filter
> queries
> > we generate.
> >
> > The field "ctindex" is only present in a subset of documents. It
> basically
> > contains a user id. For those documents where it is present, I only want
> > documents returned where the ctindex value matches the id of the user
> > performing the search. Documents with no ctindex value should be returned
> > as well.
> >
> > This is implemented through a filter query that excludes documents that
> > contain some other value in the ctindex field: fq=(-ctindex:({* TO
> "MyId"}
> > OR {"MyId" TO *}))
>
> I am surprised that this works in 5.3.  The crux of the problem is that
> fully negative query clauses do not actually work.
>
> Here's the best-performing query that gives you the results you want:
>
> fq=ctindex:myId OR (*:* -ctindex:[* TO *])
>
> The *:* is needed in the second clause to give the query a starting
> point of all documents, from which is subtracted all documents where
> ctindex has a value.  Without the "all docs" starting point, you are
> subtracting from nothing, which yields nothing.
>
> You may notice that this query works perfectly, and wonder why:
>
> fq=-ctindex:[* TO *]
>
> This works because on such a simple query, Solr is able to detect that
> it is fully negated, so it implicitly adds the *:* starting point for
> you.  As soon as you implement any kind of complexity (multiple clauses,
> parentheses, etc) that detection doesn't work.
>
> Thanks,
> Shawn
>
>


After upgrade to Solr 6.5, q.op=AND affects filter query differently than in older version

2017-04-26 Thread Andy C
I'm looking at upgrading the version of Solr used with our application from
5.3 to 6.5.

Having an issue with a change in the behavior of one of the filter queries
we generate.

The field "ctindex" is only present in a subset of documents. It basically
contains a user id. For those documents where it is present, I only want
documents returned where the ctindex value matches the id of the user
performing the search. Documents with no ctindex value should be returned
as well.

This is implemented through a filter query that excludes documents that
contain some other value in the ctindex field: fq=(-ctindex:({* TO "MyId"}
OR {"MyId" TO *}))

In 6.5 if q.op=AND I always get 0 results returned when the fq is used.
This wasn't the case in 5.3. If I remove the q.op parameter (or set it to
OR) I get the expected results.

I can reproduce this in the Solr Admin UI. If I enable debugQuery, the
parsed_filter_queries output is different with q.op=AND and with no q.op
parameter:

For q.op=AND I see: ["+(-(SolrRangeQuery(ctindex:{* TO MyId})
SolrRangeQuery(ctindex:{MyId TO *})))"]

With no q.op set I get: ["-(SolrRangeQuery(ctindex:{* TO MyId})
SolrRangeQuery(ctindex:{MyId TO *}))"]

In 5.3 I always get the same parsed_filter_queries output regardless of the
q.op setting: ["-(ctindex:{* TO MyId} ctindex:{MyId TO *})"]

Any idea what is going on, or how to make the behavior of this filter query
independent of the q.op setting?

More details:
- Using the standard query parser
- The fieldType of the ctindex field is "string"
- I upgraded to 6.5 by copying my 5.3 config files over, updating the
schema version to 1.6 in the schema.xml, updating the luceneMatchVersion to
6.5.0 in the solrconfig.xml, and building a brand new index.

Thanks,
- Andy -


Re: Overlapped Gap Facets

2016-11-17 Thread Andy C
You might want to look at using Interval Facets (
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting)
in combination with relative dates specified using the Date Math feature (
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates)

You would have to decide exactly what you mean by each of these intervals.
Does "Last 1 Day" mean  today (which could be specified by the interval
"[NOW/DAY, NOW/DAY+1DAYS)"), yesterday and today ("[NOW/DAY-1DAYS,
NOW/DAY+1DAYS)"), etc.

You could decide that you want it to mean the last 24 hours
("[NOW-1DAYS,NOW]"), but be aware that when you subsequently restrict your
query using one of these intervals using NOW without rounding has a
negative impact on the filter query cache (see
https://dzone.com/articles/solr-date-math-now-and-filter for a better
explanation than I could provide).
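
As a rough SolrJ sketch of the idea (the collection URL, the date field name
pub_date, and the exact bucket boundaries are placeholders to adjust to your
own definitions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class OverlappingDateIntervals {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycollection"); // placeholder
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);
    q.setFacet(true);
    q.add("facet.interval", "pub_date"); // assumed date field
    // Overlapping buckets are allowed for interval facets; NOW/DAY rounding keeps
    // the filter cache friendly if the same intervals are later reused as fq's.
    q.add("f.pub_date.facet.interval.set",
        "{!key=\"Last 1 Day\"}[NOW/DAY-1DAYS,NOW/DAY+1DAYS)",
        "{!key=\"Last 1 Week\"}[NOW/DAY-7DAYS,NOW/DAY+1DAYS)",
        "{!key=\"Last 1 Month\"}[NOW/DAY-1MONTHS,NOW/DAY+1DAYS)",
        "{!key=\"Older than 1 Year\"}[*,NOW/DAY-1YEARS)");
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getIntervalFacets());
  }
}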

- Andy -

On Thu, Nov 17, 2016 at 10:46 AM, David Santamauro <
david.santama...@gmail.com> wrote:

>
> I had a similar question a while back but it was regarding date
> differences. Perhaps that might give you some ideas.
>
> http://lucene.472066.n3.nabble.com/date-difference-faceting-td4249364.html
>
> //
>
>
>
>
> On 11/17/2016 09:49 AM, Furkan KAMACI wrote:
>
>> Is it possible to do such a facet on a date field:
>>
>>   Last 1 Day
>>   Last 1 Week
>>   Last 1 Month
>>   Last 6 Month
>>   Last 1 Year
>>   Older than 1 Year
>>
>> which has overlapped facet gaps?
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>>


Re: Zero value fails to match Positive, Negative, or Zero interval facet

2016-10-21 Thread Andy C
Upon further investigation this is a bug in Solr.

If I change the order of my interval definitions to be Negative, Zero,
Positive, instead of Negative, Positive, Zero it correctly assigns the
document with the zero value to the Zero interval.

I dug into the 5.3.1 code and the problem is in the
org.apache.solr.request.IntervalFacets class. When the getSortedIntervals()
method sorts the interval definitions for a field by their starting value
it doesn't take into account the startOpen property. When two intervals
have equal start values it needs to sort intervals where startOpen == false
before intervals where startOpen == true.

In the accumIntervalWithValue() method it checks which intervals each
document value should be considered a match for. It iterates through the
sorted intervals and stops checking subsequent intervals when
LOWER_THAN_START result is returned. If the Positive interval is sorted
before the Zero interval it never checks a zero value against the Zero
interval.

I modified the compareStart() implementation and it seems to work correctly
now (see below). I also compared the 5.3.1 version of the IntervalFacets
class against the 6.2.1 code, and it looks like the same issue will occur
in 6.2.1.

How should I proceed from here?

Thanks,
- Andy -

  private int compareStart(FacetInterval o1, FacetInterval o2) {
if (o1.start == null) {
  if (o2.start == null) {
return 0;
  }
  return -1;
}
if (o2.start == null) {
  return 1;
}
//return o1.start.compareTo(o2.start);
int startComparison = o1.start.compareTo(o2.start);
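    // Tie-break equal start values: a closed start ("[", startOpen == false) must sort
    // before an open start ("(") so that e.g. [0,0] is checked before (0,*].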
if (startComparison == 0) {
  if (o1.startOpen != o2.startOpen) {
if (!o1.startOpen) {
  return -1;
}
else {
  return 1;
}
  }
}
return startComparison;
  }

On Wed, Oct 19, 2016 at 2:47 PM, Andy C  wrote:

> I have a field called "SCALE_double" that is defined as multivalued with
> the fieldType "tdouble".
>
> "tdouble" is defined as:
>
>  omitNorms="true" positionIncrementGap="0"/>
>
> I have a document with the value "0" indexed for this field. I am able to
> successfully retrieve the document with the range query "SCALE_double:[0 TO
> 0]". However it doesn't match any of the interval facets I am trying to
> populate that match negative, zero, or positive values:
>
> "{!key=\"Negative\"}(*,0)",
> "{!key=\"Positive\"}(0,*]",
> "{!key=\"Zero\"}[0,0]"
>
> I assume this is some sort of precision issue with the TrieDoubleField
> implementation (if I change the Zero interval to
> "(-.01,+.01)" it now considers the document a match).
> However the range query works fine (I had assumed that the interval was
> just converted to a range query internally), and it fails to show up in the
> Negative or Positive intervals either.
>
> Any ideas what is going on, and if there is anything I can do to get this
> to work correctly? I am using Solr 5.3.1. I've pasted the output from the
> Solr Admin UI query below.
>
> Thanks,
> - Andy -
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 0,
> "params": {
>   "facet": "true",
>   "fl": "SCALE_double",
>   "facet.mincount": "1",
>   "indent": "true",
>   "facet.interval": "SCALE_double",
>   "q": "SCALE_double:[0 TO 0]",
>   "facet.limit": "100",
>   "f.SCALE_double.facet.interval.set": [
> "{!key=\"Negative\"}(*,0)",
> "{!key=\"Positive\"}(0,*]",
> "{!key=\"Zero\"}[0,0]"
>   ],
>   "_": "1476900130184",
>   "wt": "json"
> }
>   },
>   "response": {
> "numFound": 1,
> "start": 0,
> "docs": [
>   {
> "SCALE_double": [
>   0
> ]
>   }
> ]
>   },
>   "facet_counts": {
> "facet_queries": {},
> "facet_fields": {},
> "facet_dates": {},
> "facet_ranges": {},
> "facet_intervals": {
>   "SCALE_double": {
> "Negative": 0,
> "Positive": 0,
> "Zero": 0
>   }
> },
> "facet_heatmaps": {}
>   }
> }
>


Zero value fails to match Positive, Negative, or Zero interval facet

2016-10-19 Thread Andy C
I have a field called "SCALE_double" that is defined as multivalued with
the fieldType "tdouble".

"tdouble" is defined as:



I have a document with the value "0" indexed for this field. I am able to
successfully retrieve the document with the range query "SCALE_double:[0 TO
0]". However it doesn't match any of the interval facets I am trying to
populate that match negative, zero, or positive values:

"{!key=\"Negative\"}(*,0)",
"{!key=\"Positive\"}(0,*]",
"{!key=\"Zero\"}[0,0]"

I assume this is some sort of precision issue with the TrieDoubleField
implementation (if I change the Zero interval to
"(-.01,+.01)" it now considers the document a match).
However the range query works fine (I had assumed that the interval was
just converted to a range query internally), and it fails to show up in the
Negative or Positive intervals either.

Any ideas what is going on, and if there is anything I can do to get this
to work correctly? I am using Solr 5.3.1. I've pasted the output from the
Solr Admin UI query below.

Thanks,
- Andy -

{
  "responseHeader": {
"status": 0,
"QTime": 0,
"params": {
  "facet": "true",
  "fl": "SCALE_double",
  "facet.mincount": "1",
  "indent": "true",
  "facet.interval": "SCALE_double",
  "q": "SCALE_double:[0 TO 0]",
  "facet.limit": "100",
  "f.SCALE_double.facet.interval.set": [
"{!key=\"Negative\"}(*,0)",
"{!key=\"Positive\"}(0,*]",
"{!key=\"Zero\"}[0,0]"
  ],
  "_": "1476900130184",
  "wt": "json"
}
  },
  "response": {
"numFound": 1,
"start": 0,
"docs": [
  {
"SCALE_double": [
  0
]
  }
]
  },
  "facet_counts": {
"facet_queries": {},
"facet_fields": {},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {
  "SCALE_double": {
"Negative": 0,
"Positive": 0,
"Zero": 0
  }
},
"facet_heatmaps": {}
  }
}


Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Whelan, Andy
Hello,
I am guessing that what I am looking for is probably going to require extending 
StandardTokenizerFactory or ClassicTokenizerFactory. But I thought I would ask 
the group here before attempting this. We are indexing documents from an 
eclectic set of sources. There is, however, a heavy interest in computing and 
social media sources. So computer terminology and social media terms (terms 
beginning with hashes (#), @ symbols, etc.) are terms that we would like to 
have searchable.

We are considering the ClassicTokenizerFactory because we like the fact that it 
does not use the Unicode standard annex 
UAX#29 word boundary rules. 
It preserves email addresses, internet domain names, etc.  We would also like 
to use it as the tokenizer element of index and query analyzers that would 
preserve @<rest of token> or #<rest of token> patterns.

I have seen examples where folks are replacing the StandardTokenizerFactory in 
their analyzer with stream combinations made up of charFilters,  
WhitespaceTokenizerFactory, etc. as in the following article 
http://www.prowave.io/indexing-special-terms-using-solr/ to remedy such 
problems.



I am just wondering if anyone knows of a smart way (without extending classes)
to actually preserve most of the ClassicTokenizerFactory functionality without 
getting rid of leading special characters? The "Solr In Action" book (page 179) 
claims that it is hard to extend the StandardTokenizerFactory. I'm assuming 
this is the same for ClassicTokenizerFactory.

Thanks
-Andrew



Are there issues with the use of SolrCloud / embedded Zookeeper in non-HA deployments?

2016-07-28 Thread Andy C
We have integrated Solr 5.3.1 into our product. During installation
customers have the option of setting up a single Solr instance, or for high
availability deployments, multiple Solr instances in a master/slave
configuration.

We are looking at migrating to SolrCloud for HA deployments, but are
wondering if it makes sense to also use SolrCloud in non-HA deployments?

Our thought is that this would simplify things. We could use the same
approach for deploying our schema.xml and other configuration files on all
systems, we could always use the SolrJ CloudSolrClient class to communicate
with Solr, etc.

Would it make sense to use the embedded Zookeeper instance in this
situation? I have seen warnings that the embedded Zookeeper should not be
used in production deployments, but the reason generally given is that if
Solr goes down Zookeeper will also go down, which doesn't seem relevant
here. Are there other reasons not to use the embedded Zookeeper?

More generally, are there downsides to using SolrCloud with a single
Zookeeper node and single Solr node?

Would appreciate any feedback.

Thanks,
Andy


using lucene parser syntax with eDisMax

2016-07-15 Thread Whelan, Andy
Hello,

I am using the eDisMax parser and have the following question.
With the eDisMax parser we can pass a query, q="brown and mazda",  and 
configure a bunch of fields in a solrconfig.xml SearchHandler to query on as 
"qf". Let's say I have a SOLR schema.xml with the following fields:



and the following request handler in solrconfig.xml:


<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">color brand</str>
  </lst>
</requestHandler>


This makes boosting very easy.  I can execute a query "q=brown^2.0 and 
mazda^3.0" against the query handler "/select" above without specifying fields
in the query string.  I can do this without having to copy color and brand to a 
specific catch all field as I would with the "lucene" parser (which would be 
configured as the default field "df").
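
Put as a quick SolrJ sketch (the core URL is a placeholder; defType/qf could
equally come from the /select defaults shown above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EdismaxBoostExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/cars"); // placeholder
    SolrQuery q = new SolrQuery("brown^2.0 AND mazda^3.0"); // per-term boosts, no field prefixes
    q.set("defType", "edismax");
    q.set("qf", "color brand");
    System.out.println(solr.query(q).getResults().getNumFound());
  }
}
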
The documentation at 
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
 says that eDisMax "supports the full Lucene query parser syntax".
Does this mean that a query string "color:brown^2 and mazda" is legal with 
eDisMax too?  Notice that I am specifying the color field in the query (lucene 
parser syntax). If the answer is yes, does this mean that "brown" is only 
filtered against the color field and mazda will be filtered against both the 
color field and the brand field?
Thanks!



Facet in SOLR Cloud vs Core

2016-07-07 Thread Whelan, Andy
Hello,

I am somewhat of a novice when it comes to using SOLR in a distributed
SolrCloud environment. My team and I are doing development work with a SOLR 
core. We will shortly be transitioning over to a SolrCloud environment.

My question specifically has to do with Facets in a SOLR cloud/collection 
(distributed environment). The core I am working with has a field 
"dataSourceName" defined as following in its schema.xml file.



I am using the following facet query, which works fine in my Core-based index:

http://localhost:8983/solr/gamra/select?q=*:*&rows=0&facet=true&facet.field=dataSourceName

It returns counts for each distinct dataSourceName as follows (which is the 
desired behavior).


   
  169
  121
  68
   


I am wondering if this should work fine in the SOLR Cloud as well?  Will this 
method give me accurate counts out of the box in a SOLR Cloud configuration?

Thanks
-Andrew

PS: The reason I ask is because I know there is some estimating performed in 
certain cases for the Facet "unique" function (as is outlined here: 
http://yonik.com/solr-count-distinct/ ). So I guess I am wondering why folks 
wouldn't just do what I have done vs going throught the trouble of using the 
unique(dataSourceName) function?




Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

2015-05-30 Thread Andy Lee
I also ran into the same problem. Could you tell me why? Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-positions-and-offsets-vs-FieldType-IndexOptions-DOCS-AND-FREQS-AND-POSITIONS-AND-OFFSETS-tp4061354p4208875.html
Sent from the Solr - User mailing list archive at Nabble.com.


Duplicate scoring situation in DelegatingCollector

2014-11-14 Thread Andy Crossen
Hi folks,

I have a DelegatingCollector installed via a PostFilter (kind of like an
AnalyticsQuery) that needs the document score to a) add to a collection of
score-based stats, and b) decide whether to keep the document based on the
score.

If I keep the document, I call super.collect() (where super is a
TopScoreDocCollector), which re-scores the document in its collect method.
The scoring is custom and reasonably expensive.

Is there an easy way to avoid this?  Or do I have to stop calling
super.collect(), manage my own bitset/PQ, and pass the filtered results in
the DelegatingCollector's finish() method?
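
One idea I'm toying with is wrapping the scorer once in setScorer() so that both
my logic and the delegate read a cached score. A rough sketch against the Solr
4.x collector API (class name and threshold are made up):

import java.io.IOException;
import org.apache.lucene.search.ScoreCachingWrappingScorer;
import org.apache.lucene.search.Scorer;
import org.apache.solr.search.DelegatingCollector;

public class ScoreFilteringCollector extends DelegatingCollector {
  private Scorer cachingScorer;
  private final float minScore; // hypothetical keep/drop threshold

  public ScoreFilteringCollector(float minScore) {
    this.minScore = minScore;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    // Cache score() per document; the delegate gets the same wrapped instance.
    this.cachingScorer = new ScoreCachingWrappingScorer(scorer);
    super.setScorer(cachingScorer);
  }

  @Override
  public void collect(int doc) throws IOException {
    float score = cachingScorer.score(); // computed once
    // ... update score-based stats here ...
    if (score >= minScore) {
      super.collect(doc); // delegate re-reads the cached score instead of re-scoring
    }
  }
}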

There's a thread out there ("Configurable collectors for custom ranking")
that kind of talks about the above.  Seems cumbersome.

Thanks for any direction!


Re: Using hundreds of dynamic fields

2014-07-16 Thread Andy Crossen
Thanks, Jack and Jared, for your input on this.  I'm looking into whether
parent-child relationships via block or query time join will meet my
requirements.

Jack, I noticed in a bunch of other posts around the web that you've
suggested to use dynamic fields in moderation.  Is this suggestion based on
negative performance implications of having to read and rewrite all
previous fields for a document when doing atomic updates?  Or are there
additional inherent negatives to using lots of dynamic fields?

Andy


On Fri, Jun 27, 2014 at 11:46 AM, Jared Whiklo 
wrote:

> This is probably not the best answer, but my gut says that even if you
> changed your document to a simple 2 fields and have one as your metric and
> the other as a TrieDateField you would speed up and simplify your date
> range queries.
>
>
> --
> Jared Whiklo
>
>
>
> On 2014-06-27 10:10 AM, "Andy Crossen"  wrote:
>
> >Hi folks,
> >
> >My application requires tracking a daily performance metric for all
> >documents. I start tracking for an 18 month window from the time a doc is
> >indexed, so each doc will have ~548 of these fields.  I have in my schema
> >a
> >dynamic field to capture this requirement:
> >
> >
> >
> >Example:
> >metric_2014_06_24 : 15
> >metric_2014_06_25 : 21
> >…
> >
> >My application then issues a query that:
> >a) sorts documents by the sum of the metrics within a date range that is
> >variable for each query;
> >b) gathers stats on the metrics using the Statistics component.
> >
> >With this design, the app must unfortunately:
> >a) construct the sort as a long list of fields within the spec’d date
> >range
> >to accomplish the sum; e.g. sort=sum(metric_2014_06_24,metric_2014_06_25…)
> >desc
> >b) specify each field in the range independently to the Stats component;
> >e.g. stats.field=metric_2014_06_24&stats.field=metric_2014_06_25…
> >
> >Am I missing a cleaner way to accomplish this given the requirements
> >above?
> >
> >Thanks for any suggestions you may have.
>
>


Using hundreds of dynamic fields

2014-06-27 Thread Andy Crossen
Hi folks,

My application requires tracking a daily performance metric for all
documents. I start tracking for an 18 month window from the time a doc is
indexed, so each doc will have ~548 of these fields.  I have in my schema a
dynamic field to capture this requirement:



Example:
metric_2014_06_24 : 15
metric_2014_06_25 : 21
…

My application then issues a query that:
a) sorts documents by the sum of the metrics within a date range that is
variable for each query;
b) gathers stats on the metrics using the Statistics component.

With this design, the app must unfortunately:
a) construct the sort as a long list of fields within the spec’d date range
to accomplish the sum; e.g. sort=sum(metric_2014_06_24,metric_2014_06_25…)
desc
b) specify each field in the range independently to the Stats component;
e.g. stats.field=metric_2014_06_24&stats.field=metric_2014_06_25…

Am I missing a cleaner way to accomplish this given the requirements above?

Thanks for any suggestions you may have.


Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-19 Thread Andy
Congrats! Any idea when native faceting & off-heap fieldcache will be available
for multivalued fields? Most of my fields are multivalued so that's the big one 
for me.

Andy


On Thursday, June 19, 2014 3:46 PM, Yonik Seeley  wrote:
 


FYI, for those who want to try out the new native code faceting, this
is the first release containing it (for single valued string fields
only as of yet).

http://heliosearch.org/download/

Heliosearch v0.06

Features:
o  Heliosearch v0.06 is based on (and contains all features of)
Lucene/Solr 4.9.0
o  Native code faceting for single valued string fields.
    - Written in C++, statically compiled with gcc for Windows, Mac OS-X, Linux
    - static compilation avoids JVM hotspot warmup period,
mis-compilation bugs, and variations between runs
    - Improves performance over 2x
o  Top level Off-heap fieldcache for single valued string fields in nCache.
    - Improves sorting and faceting speed
    - Reduces garbage collection overhead
    - Eliminates FieldCache “insanity” that exists in Apache Solr from
faceting and sorting on the same field
o  Full request Parameter substitution / macro expansion, including
default value support.
o  frange query now only returns documents with a value.
     For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will
also return documents without a value since the numeric default value
of 0 lies within the range requested.
o  New JSON features via Noggit upgrade, allowing optional comments
(C/C++ and shell style), unquoted keys, and relaxed escaping that
allows one to backslash escape any character.


-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data

Apache Solr Configuration Problem (Japanese Language)

2014-03-05 Thread Andy Alexander
I am trying to pass a string of Japanese characters to an Apache Solr
query. The string in question is '製品'.

When a search is passed without any arguments, it brings up all of the
indexed information, including all of the documents that have this
particular string in them, however when this parameter is passed in as q=製品,
only one of the items is displayed.

Furthermore, when I have the query fq=ss_language:ja&q=製品 *three* items are
shown.

What would cause this peculiar behavior? The field in question where I am
searching for this string is indexed, and my assumption is that it should
bring up all documents with this string inside of them.

Here's the debug information:


rawquerystring: 製品
querystring: 製品
parsedquery: +DisjunctionMaxQuery((content:製品)~0.01)
parsedquery_toString: +(content:製品)~0.01

explain (doc 80): 0.41303736 = (MATCH) fieldWeight(content:製品 in 80), product of:
  1.4142135 = tf(termFreq(content:製品)=2)
  5.3405533 = idf(docFreq=3, maxDocs=307)
  0.0546875 = fieldNorm(field=content, doc=80)

explain (doc 66): 0.33378458 = (MATCH) fieldWeight(content:製品 in 66), product of:
  1.0 = tf(termFreq(content:製品)=1)
  5.3405533 = idf(docFreq=3, maxDocs=307)
  0.0625 = fieldNorm(field=content, doc=66)

explain (doc 46): 0.2529327 = (MATCH) fieldWeight(content:製品 in 46), product of:
  3.4641016 = tf(termFreq(content:製品)=12)
  5.3405533 = idf(docFreq=3, maxDocs=307)
  0.013671875 = fieldNorm(field=content, doc=46)

QParser: ExtendedDismaxQParser

filter_queries: ss_language:ja
parsed_filter_queries: ss_language:ja


timing: 1.0 total; prepare components all 0.0; process 1.0, with all components 0.0 except the final (debug) entry at 1.0






Sorting by a dynamically-generated field in a distributed context

2014-01-21 Thread Andy Crossen
Hi folks,

Using Solr 4.6.0 in a cloud configuration, I'm developing a SearchComponent
that generates a custom score for each document.  Its operational flow
looks like this:

1. The score is derived from an analysis of search results coming out of
the QueryComponent.  Therefore, the component is installed after
QueryComponent in the processing chain.
2. The scores are generated in the component's process method (i.e. at the
shard level), and a map of uniqueKey:score is attached to each shard's
response at this point.
3. The shard-wise maps are combined in handleResponses and the aggregate
map is attached to the top-level distributed query's response.
4. In the finishStage method at the coordinator node level (i.e. response
stage = Get Fields), I'm presented with the final list of search results
sorted by Lucene score.  My custom scores are now added as fields to their
corresponding documents based on a uniqueKey lookup in the aggregate score
map.

Now I need to sort the final document list (or do it at the shard level) by
the custom score, but I'm having trouble understanding how to accomplish
this.  Yes, I could just sort my list (which will never exceed 1K results)
in finishStage and be done with it, but I'm trying to learn Solr best
practices to see if there's a better way.  At the end of the day, I'd like
to be able to take advantage of the "sort" request parameter to effect my
sort.

Given the current operational flow, it seems like I'd need to add a new
SortField for my score in step 4 and reinvoke QueryComponent's mergeIds
sort routine now that my custom field is present in the document list.  Of
course, I can't do that since it's all private code; nor does it seem wise
from an extensibility perspective to copy that code into my component for
use in this manner.

Reading Sujit Pal's blog post on "Custom Sorting in Solr using External
Database Data", I started down the path of defining a custom
FieldType/FieldComparatorSource for my score, but I didn't see how that
would help since the sort is still applied in QueryComponent - before my
custom score is available.  Regardless, Sujit's example seems pretty close
to what I want.

I must be misusing/misunderstanding the distributed design here in some
way.  Can an expert on distributed search components weigh in here?

Thanks!


Highlight: simple.pre/post not being applied always

2013-10-31 Thread Andy Pickler
Solr: 4.5.1

I'm sending in a query of "july" and getting back the results and
highlighting I expect with one exception:




@@@hl@@@Julie@@@endhl@@@ A




#Month:July




The simple.pre of @@@hl@@@ and simple.post of @@@endhl@@@ is not being
applied to the one case of the field "#Month:July", even though it's
included in the highlighting section.  I've tried changing various
highlighting parameters to no avail.  Could someone help me know where to
look for why the pre/post aren't being applied?

Thanks,
Andy Pickler


Re: Join Query Behavior

2013-10-25 Thread Andy Pickler
If it helps to clarify any, here's the full query:

/select
?
q=*:*
&
fq=type:ProjectGroup
&
fq={!join from=project_id_i to=project_id_im}user_id_i:65615 -role_id_i:18
type:UserRole

We have two Solr servers that were indexed from the same database.  One of
the servers is running Solr 4.2, while the other (test server) is running
4.5.

Solr 4.2:


Solr 4.5.1:


Solr 4.2 returns the expected result with the project IDs "filtered" out
from the join query, while the 4.5 query shows *all* results (2642
records).  I can leave off the join query in 4.5 and get the same results,
which tells me obviously it is having no effect.

Is there a change to the join query behavior between these releases, or
could I have configured something differently in my 4.5.1 install?

Thanks,
Andy Pickler

On Thu, Oct 24, 2013 at 2:42 PM, Andy Pickler wrote:

> We're attempting to upgrade from Solr 4.2 to 4.5 but are finding that 4.5
> is not "honoring" this join query:
>
> ...
> &
> fq={!join from=project_id_i to=project_id_im}user_id_i:65615 -role_id_i:18
> type:UserRole
> &
> 
>
> On our Solr 4.2 instance adding/removing that query gives us different
> (and expected) results, while the query doesn't affect the results at all
> in 4.5.  Is there any known join query behavior differences/fixes between
> 4.2 and 4.5 that might explain this, or should I be looking at other
> factors?
>
> Thanks,
> Andy Pickler
>
>


Join Query Behavior

2013-10-24 Thread Andy Pickler
We're attempting to upgrade from Solr 4.2 to 4.5 but are finding that 4.5
is not "honoring" this join query:

...
&
fq={!join from=project_id_i to=project_id_im}user_id_i:65615 -role_id_i:18
type:UserRole
&


On our Solr 4.2 instance adding/removing that query gives us different (and
expected) results, while the query doesn't affect the results at all in
4.5.  Is there any known join query behavior differences/fixes between 4.2
and 4.5 that might explain this, or should I be looking at other factors?

Thanks,
Andy Pickler


Re: Schema Lint

2013-08-06 Thread Andy Lester

On Aug 6, 2013, at 9:55 AM, Steven Bower  wrote:

> Is there an easy way in code / command line to lint a solr config (or even
> just a solr schema)?

No, there's not.  I would love there to be one, especially for the DIH.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Email regular expression.

2013-07-30 Thread Andy Lester

On Jul 30, 2013, at 9:53 AM, Luis Cappa Banda  wrote:

> The syntax is the following:
> 
> *E-mail: *
> text:/[a-z0-9_\|-]+(\.[a-z0-9_\|-]|)*@[a-z0-9-]|(\.[a-z0-9-]|)*\.([a-z]{2,4})/

Please note that the question of "How do I write a regex to match an email 
address" is one of the most discussed on the Internet.  Googling for "email 
address regular expression" will give you many many many many hits discussing 
how to do it, and lots of hotly-contested debates.  The topic is not nearly as 
simple as you might think at first glance.

There is no "right" way to do it.  Every approach you take will involve 
tradeoffs.  Read up on this already well-discussed topic and decide what answer 
is best for you in your case.

xoa


--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Perl Solr help - doing *:* query

2013-07-09 Thread Andy Lester

On Jul 9, 2013, at 2:48 PM, Shawn Heisey  wrote:

> This is primarily to Andy Lester, who wrote the WebService::Solr module
> on CPAN, but I'll take a response from anyone who knows what I can do.
> 
> If I use the following Perl code, I get an error.

What error do you get?  Never say "I get an error."  Always say "I get this 
error: ."

>  If I try to build
> some other query besides *:* to request all documents, the script runs,
> but the query doesn't do what I asked it to do.

What DOES it do?


> http://apaste.info/3j3Q

For the sake of future readers, please put your code in the message.  This 
message will get archived, and future people reading the lists will not be able 
to read the code at some arbitrary paste site.

Shawn's code is:

use strict;
use WebService::Solr;
use WebService::Solr::Query;
use WebService::Solr::Response;



my $url = "http://idx.REDACTED.com:8984/solr/ncmain";
my $solr = WebService::Solr->new($url);
my $query = WebService::Solr::Query->new("*:*");
my $response = $solr->search($query, {'rows' => '0'});
my $numFound = $response->content->{response}->{numFound};

print "nf: $numFound\n";


xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-06 Thread Andy Pickler
That's exactly what turned out to be the problem.  We thought we had
already tried that permutation but apparently hadn't.  I know it's obvious
in retrospect.  Thanks for the suggestion.

Thanks,
Andy Pickler

On Wed, Jul 3, 2013 at 2:38 PM, Alexandre Rafalovitch wrote:

> On Tue, Jul 2, 2013 at 10:59 AM, Andy Pickler  >wrote:
>
> > SELECT
> >   br.other_content AS replyContent
> > FROM block_reply
> > ">
> >  *THIS DOESN'T
> WORK!*
> >
>
> shouldn't it be
> column="replyContent"
> since you are renaming it in SELECT?
>
> Regards,
>Alex.
>
>
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-02 Thread Andy Pickler
Thanks for the quick reply.  Unfortunately, I don't believe my company
would want me sharing our exact production schema in a public forum,
although I realize it makes it harder to diagnose the problem.  The
sub-entity is a multi-valued field that indeed does have a relationship to
the outer entity.  I just left off the 'where' clause from the sub-entity,
as I didn't believe it was helpful in the context of this problem.  We use
the convention of..

SELECT dbColumnName AS solrFieldName

...so that we can relate the database column name to what we what it to be
named in the Solr index.

I don't think any of this helps you identify my problem, but I tried to
address your questions.

Thanks,
Andy

On Tue, Jul 2, 2013 at 9:14 AM, Gora Mohanty  wrote:

> On 2 July 2013 20:29, Andy Pickler  wrote:
> > Solr 4.1.0
> >
> > We've been using the DIH to pull data in from a MySQL database for quite
> > some time now.  We're now wanting to strip all the HTML content out of
> many
> > fields using the HTMLStripTransformer (
> > http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer).
> >  Unfortunately, while it seems to be working fine for "top-level"
> entities,
> > we can't seem to get it to work for sub-entities:
> >
> > (not exact schema, reduced for example purposes)
>
> Please do not do that. This DIH configuration file does
> not make sense (please see comments below), and we
> are left guessing in the dark. If the file is too large,
> you can share it on something like pastebin.com
>
> >  > transformer="HTMLStripTransformer" query="
> >   SELECT
> > id as blockId,
> > name as blockTitle,
> > content as content
> >   FROM engagement_block
> >   ">
> > *THIS WORKS!*
> >> transformer="HTMLStripTransformer" query="
> > SELECT
> >   br.other_content AS replyContent
> > FROM block_reply
> > ">
> >  *THIS DOESN'T
> WORK!*
> [...]
>
> (a) You SELECT replyContent, but the column attribute
>  in the field is named "other_content". Nothing should
>  be getting indexed into the field.
> (b) Why are your entities nested if the inner entity has no
>  relationship to the outer one?
>
> Regards,
> Gora
>


DIH: HTMLStripTransformer in sub-entities?

2013-07-02 Thread Andy Pickler
Solr 4.1.0

We've been using the DIH to pull data in from a MySQL database for quite
some time now.  We're now wanting to strip all the HTML content out of many
fields using the HTMLStripTransformer (
http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer).
 Unfortunately, while it seems to be working fine for "top-level" entities,
we can't seem to get it to work for sub-entities:

(not exact schema, reduced for example purposes)


*THIS WORKS!*
  
 *THIS DOESN'T WORK!*
  


We've tried several different permutations of putting the sub-entity column
in different nest levels of the XML to no avail.  I'm curious if we're
trying something that is just not supported or whether we are just trying
the wrong things.

Thanks,
Andy Pickler


Re: Solr Security

2013-06-24 Thread Andy Lester

On Jun 24, 2013, at 12:51 AM, Aaron Greenspan  wrote:

>  all of them are terrible,

> it looks like you can edit some XML files (if you can find them) 

> The wiki itself is full of semi-useless information, which is pretty 
> infuriating since it's supposed to be the best source.

> Statements like "standard Java web security can be added by tuning the 
> container and the Solr web application configuration itself via web.xml" are 
> not helpful to me.

>  this giant mess,

> It's just common sense.

> Netscape Enterprise Server prompted you to do that a decade and a half ago

>  But either way, that's a pretty ridiculous solution.

> I don't know of any other server product that disregards security so 
> willingly.


Why are you wasting your time with such an inferior project?  Perhaps 
ElasticSearch is more to your liking.

xoxo,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-14 Thread Andy Brown
Bryan,

For specifics, I'll refer you back to my original email where I
specified all the fields/field types/handlers I use. Here's a general
overview. 
 
I really only have 3 fields that I index and search against: "name",
"description", and "content", all of which are just general text
(string) fields. I have a catch-all field called "text" that is only
used for querying. It's indexed but not stored. The "name",
"description", and "content" fields are copied into the "text" field. 
 
For partial word matching, I have 4 more fields: "name_par",
"description_par", "content_par", and "text_par". The "text_par" field
has the same relationship to the "*_par" fields as "text" does to the
others (only used for querying). Those partial word matching fields are
of type "text_general_partial" which I created. That field type is
analyzed different than the regular text field in that it goes through
an EdgeNGramFilterFactory with the minGramSize="2" and maxGramSize="7"
at index time. 
 
I query against both "text" and "text_par" fields using edismax deftype
with my qf set to "text^2 text_par^1" to give full word matches a higher
score. This part returns back very fast as previously stated. It's when
I turn on highlighting that I take the huge performance hit. 
 
Again, I'm using the FastVectorHighlighting. The hl.fl is set to "name
name_par description description_par content content_par" so that it
returns highlights for full and partial word matches. All of those
fields have indexed, stored, termPositions, termVectors, and termOffsets
set to "true". 
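
To make that concrete, the request is roughly equivalent to this SolrJ sketch
(the core URL, the query text, and the id field name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightSearchExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/docs"); // placeholder
    SolrQuery q = new SolrQuery("some user query");   // placeholder
    q.set("defType", "edismax");
    q.set("qf", "text^2 text_par^1");
    q.setRows(25);
    q.setFields("id");                                // only the doc ID is returned
    q.setHighlight(true);
    q.set("hl.fl", "name name_par description description_par content content_par");
    q.set("hl.useFastVectorHighlighter", "true");
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getHighlighting());
  }
}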
 
It all seems redundant just to allow for partial word
matching/highlighting but I didn't know of a better way. Does anything
stand out to you that could be the culprit? Let me know if you need any
more clarification. 
 
Thanks! 
 
- Andy 

-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] 
Sent: Wednesday, May 29, 2013 5:44 PM
To: solr-user@lucene.apache.org
Subject: RE: Slow Highlighter Performance Even Using
FastVectorHighlighter

Andy,

> I don't understand why it's taking 7 secs to return highlights. The
size
> of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set
to
> 1024 for this verification purpose and that should be more than
enough.
> The processor is plenty powerful enough as well.
>
> Running VisualVM shows all my CPU time being taken by mainly these 3
> methods:
>
>
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
> nfo.getStartOffset()
>
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
> nfo.getStartOffset()
>
org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap(
> )

That is a strange and interesting set of things to be spending most of
your CPU time on. The implication, I think, is that the number of term
matches in the document for terms in your query (or, at least, terms
matching exact words or the beginning of phrases in your query) is
extremely high . Perhaps that's coming from this "partial word match"
you
mention -- how does that work?

-- Bryan

> My guess is that this has something to do with how I'm handling
partial
> word matches/highlighting. I have setup another request handler that
> only searches the whole word fields and it returns in 850 ms with
> highlighting.
>
> Any ideas?
>
> - Andy
>
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent: Monday, May 20, 2013 1:39 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Slow Highlighter Performance Even Using
> FastVectorHighlighter
>
> My guess is that the problem is those 200M documents.
> FastVectorHighlighter is fast at deciding whether a match, especially
a
> phrase, appears in a document, but it still starts out by walking the
> entire list of term vectors, and ends by breaking the document into
> candidate-snippet fragments, both processes that are proportional to
the
> length of the document.
>
> It's hard to do much about the first, but for the second you could
> choose
> to expose FastVectorHighlighter's FieldPhraseList representation, and
> return offsets to the caller rather than fragments, building up your
own
> snippets from a separate store of indexed files. This would also
permit
> you to set stored="false", improving your memory/core size ratio,
which
> I'm guessing could use some improving. It would require some work, and
> it
> would require you to store a representation of what was indexed
outside
> the Solr core, in some constant-bytes-to-

Velocity / Solritas not works in solr 4.3 and Tomcat 6

2013-06-08 Thread andy tang
*Could anyone help me see why the Solritas page fails?*

*I can go to http://localhost:8080/solr without problem, but fail to go to
http://localhost:8080/solr/browse*

*Below is the status report. Any help is appreciated!*

*Thanks!*

*Andy*


*type* Status report

*message* *{msg=lazy loading
error,trace=org.apache.solr.common.SolrException: lazy loading error at
org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getWrappedWriter(SolrCore.java:2260)
at
org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getContentType(SolrCore.java:2279)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:623)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:372)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:879)
at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:617)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1760)
at java.lang.Thread.run(Unknown Source) Caused by:
org.apache.solr.common.SolrException: Error Instantiating Query Response
Writer, solr.VelocityResponseWriter failed to instantiate
org.apache.solr.response.QueryResponseWriter at
org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539) at
org.apache.solr.core.SolrCore.createQueryResponseWriter(SolrCore.java:604)
at org.apache.solr.core.SolrCore.access$200(SolrCore.java:131) at
org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getWrappedWriter(SolrCore.java:2255)
... 16 more Caused by: java.lang.ClassCastException: class
org.apache.solr.response.VelocityResponseWriter at
java.lang.Class.asSubclass(Unknown Source) at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:458)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518) ... 19
more ,code=500}*

*description* *The server encountered an internal error that prevented it
from fulfilling this request.*


Re: [blogpost] Memory is overrated, use SSDs

2013-06-06 Thread Andy
This is very interesting. Thanks for sharing the benchmark.

One question I have is did you precondition the SSD ( 
http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf )? SSD 
performance tends to take a very deep dive once all blocks are written at least 
once and the garbage collector kicks in. 



 From: Toke Eskildsen 
To: "solr-user@lucene.apache.org"  
Sent: Thursday, June 6, 2013 7:11 PM
Subject: [blogpost] Memory is overrated, use SSDs
 

Inspired by multiple Solr mailing list entries during the last month or two, I 
did some search performance testing on our 11M documents / 49GB index using 
logged queries on Solr 4 with MMapDirectory. It turns out that our setup with 
Solid State Drives and 8GB of RAM (which leaves 5GB for disk cache) performs 
nearly as well as having the whole index in disk cache; the SSD solution 
delivering ~425 q/s for non-faceted searches and the memory solution delivering 
~475 q/s (roughly estimated from the graphs, sorry). Going full memory cache 
certainly is faster if we ignore warmup, but those last queries/second are 
quite expensive.

http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Andy Lester

On May 29, 2013, at 9:46 AM, "Dyer, James"  wrote:

> Just an instanity check, I see I had misspelled "maxCollations" as 
> "maxCollation" in my prior response.  When you tested with this set the same 
> as "maxCollationTries", did you correct my spelling?

Yes, definitely.

Thanks for the ticket.  I am looking at the effects of turning on 
spellcheck.onlyMorePopular to true, which reduces the number of collations it 
seems to do, but doesn't affect the underlying question of "is the spellchecker 
doing FQs properly?"

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
Thanks for looking at this.

> What are the QTimes for the 0fq,1fq,2fq,4fq & 4fq cases with spellcheck 
> entirely turned off?  Is it about (or a little more than) half the total when 
> maxCollationTries=1 ?

With spellcheck off I get 8ms for 4fq query.


>  Also, with the varying # of fq's, how many collation tries does it take to 
> get 10 collations?

I don't know.  How can I tell?


> Possibly, a better way to test this is to set maxCollations = 
> maxCollationTries.  The reason is that it quits "trying" once it finds 
> "maxCollations", so if with 0fq's, lots of combinations can generate hits and 
> it doesn't need to try very many to get to 10.  But with more fq's, fewer 
> collations will pan out so now it is trying more up to 100 before (if ever) 
> it gets to 10.

It does just fine doing 100 collations so long as there are no FQs.  It seems 
to me that the FQs are taking an inordinate amount of extra time.  100 
collations in (roughly) the same amount of time as a single collation, so long 
as there are no FQs.  Why are the FQs such a drag on the collation process?


> (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.


>  So say with 2fq's it takes 10ms for the query to complete with spellcheck 
> off, and 20ms with "maxCollation = maxCollationTries = 1", then it will take 
> about 110ms with "maxCollation = maxCollationTries = 10".

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so 
long as I have FQs off.  Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so 
long as I have FQs off.  Add a single FQ and it becomes 62038ms.


> But I think you're just setting maxCollationTries too high.  You're asking it 
> to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 
tries.  That's a big difference to the user where it's trying to figure 
misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
I'm working on using spellcheck for giving suggestions, and collations
are giving me good results, but they turn out to be very slow if
my original query has any FQs in it.  We can do 100 maxCollationTries
in no time at all, but if there are FQs in the query, things get
very slow.  As maxCollationTries and the count of FQs increase,
things get very slow very quickly.

          1     10     20     50    100   MaxCollationTries
0FQs      8      9     10     11     10
1FQ      11    160    599   1597   1668
2FQs     20    346   1163   3360   3361
3FQs     29    474   1852   5039   5095
4FQs     36    589   2463   6797   6807

All times are QTimes in ms.

See that top row?  With no FQs, 50 MaxCollationTries comes back
instantly.  Add just one FQ, though, and things go bad, and they
get worse as I add more of the FQs.  Also note that things seem to
level off at 100 MaxCollationTries.

Here's a query that I've been using as a test:

df=title_tracings_t&
fl=flrid,nodeid,title_tracings_t&
q=bagdad+AND+diaries+AND+-parent_tracings:(bagdad+AND+diaries)&
spellcheck.q=bagdad+AND+diaries&
rows=4&
wt=xml&
sort=popular_score+desc,+grouping+asc,+copyrightyear+desc,+flrid+asc&
spellcheck=true&
spellcheck.dictionary=direct&
spellcheck.onlyMorePopular=false&
spellcheck.count=15&
spellcheck.extendedResults=false&
spellcheck.collate=true&
spellcheck.maxCollations=10&
spellcheck.maxCollationTries=50&
spellcheck.collateExtendedResults=true&
spellcheck.alternativeTermCount=5&
spellcheck.maxResultsForSuggest=10&
debugQuery=off&
fq=((grouping:"1"+OR+grouping:"2"+OR+grouping:"3")+OR+solrtype:"N")&
fq=((item_source:"F"+OR+item_source:"B"+OR+item_source:"M")+OR+solrtype:"N")&
fq={!tag%3Dgrouping}((grouping:"1"+OR+grouping:"2")+OR+solrtype:"N")&
fq={!tag%3Dlanguagecode}(languagecode:"eng"+OR+solrtype:"N")&

The only thing that changes between tests is the value of
spellcheck.maxCollationTries and how many FQs are at the end.

Am I doing something wrong?  Do the collation internals not handle
FQs correctly?  The lookup/hit counts on filterCache seem to be
increasing just fine.  It will do N lookups, N hits, so I'm not
thinking that caching is the problem.

We'd really like to be able to use the spellchecker but the results
with only 10-20 maxCollationTries aren't nearly as good as if we
can bump that up to 100, but we can't afford the slow response time.
We also can't do without the FQs.

Thanks,
Andy


--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: MoreLikeThis - No Results

2013-05-22 Thread Andy Pickler
Answered my own question...

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be
ignored in the source doc

Our "source doc" is a set of limited terms...not a large content field.  So
in our case I need to set that value to 1 (rather than the default of 2).
 Now I'm getting results...and they indeed are relevant.
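
For reference, a bare-bones SolrJ version of the adjusted request (the core URL
is a placeholder; the other params mirror the query in my earlier message):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MltMinTfExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/users"); // placeholder
    SolrQuery q = new SolrQuery("objectId:user91813");
    q.setRequestHandler("/mlt");
    q.set("mlt.fl", "competencyKeywords");
    q.set("mlt.mintf", "1"); // count terms even if they occur only once in the source doc
    q.set("mlt.interestingTerms", "details");
    q.set("mlt.match.include", "false");
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getResults());
  }
}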

Thanks,
Andy Pickler

On Wed, May 22, 2013 at 12:20 PM, Andy Pickler wrote:

> I'm a developing a recommendation feature in our app using the
> MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>,
> and so far it is doing a great job.  We're using a user's "competency
> keywords" as the MLT field list and the user's corresponding document in
> Solr as the "comparison document".  I have found that for one user I'm not
> receiving any recommendations, and I'm not sure why.
>
> Solr: 4.1.0
>
> *relevant schema*:
>
>  stored="true" multiValued="true" termVectors="true"/>
>
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
>   
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
>   
> 
>
> *user's values*:
>
> 
> Healthcare Cost Trends
> 
>
> Is it possible that among all the ~40,000 users in this index (about 500
of which have the same competency keywords), the words "healthcare",
"cost" and "trends" are just judged by Lucene to not be "significant"?  I
> realize that I may not understand how the MLT Handler is doing things under
> the covers...I've only been guessing until now based on the (otherwise
> excellent) results I've been seeing.
>
> Thanks,
> Andy Pickler
>
> P.S.  For some additional information, the following query:
>
>
> /mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false
>
> ...produces the following results...
>
> 
> 
> 0
> 2
> 
> 
> 
> 
> objectId:user91813
> objectId:user91813
> 
> 
> 
> 
> 
>


RE: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-22 Thread Andy Brown
After taking your advice on profiling, I didn't see any memory issues. I
wanted to verify this with a small set of data. So I created a new
sandbox core with the exact same schema and config file settings. I
indexed only 25 PDF documents with an average size of 2.8 MB, the
largest is approx 5 MB (39 pages). I run the exact same query on that
core and I'm seeing response times of 7 secs or more. Without
highlighting the response is usually 1 ms. 
 
I don't understand why it's taking 7 secs to return highlights. The size
of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to
1024 for this verification purpose and that should be more than enough.
The processor is plenty powerful enough as well. 
 
Running VisualVM shows all my CPU time being taken by mainly these 3
methods: 
 
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
nfo.getStartOffset() 
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseI
nfo.getStartOffset() 
org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap(
) 
 
My guess is that this has something to do with how I'm handling partial
word matches/highlighting. I have set up another request handler that
only searches the whole word fields and it returns in 850 ms with
highlighting. 
 
Any ideas? 

- Andy


-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] 
Sent: Monday, May 20, 2013 1:39 PM
To: solr-user@lucene.apache.org
Subject: RE: Slow Highlighter Performance Even Using
FastVectorHighlighter

My guess is that the problem is those 200M documents.
FastVectorHighlighter is fast at deciding whether a match, especially a
phrase, appears in a document, but it still starts out by walking the
entire list of term vectors, and ends by breaking the document into
candidate-snippet fragments, both processes that are proportional to the
length of the document.

It's hard to do much about the first, but for the second you could
choose
to expose FastVectorHighlighter's FieldPhraseList representation, and
return offsets to the caller rather than fragments, building up your own
snippets from a separate store of indexed files. This would also permit
you to set stored="false", improving your memory/core size ratio, which
I'm guessing could use some improving. It would require some work, and
it
would require you to store a representation of what was indexed outside
the Solr core, in some constant-bytes-to-character representation that
you
can use offsets with (e.g. UTF-16, or ASCII+entity references).

However, you may not need to do this -- it may be that you just need
more
memory for your search machine. Not JVM memory, but memory that the O/S
can use as a file cache. What do you have now? That is, how much memory
do
you have that is not used by the JVM or other apps, and how big is your
Solr core?

One way to start getting a handle on where time is being spent is to set
up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight
queries, and look at where the time is being spent. If it's mostly in
methods that are just reading from disk, buy more memory. If you're on
Linux, look at what top is telling you. If the CPU usage is low and the
"wa" number is above 1% more often than not, buy more memory (I don't
know
why that wa number makes sense, I just know that it has been a good rule
of thumb for us).

-- Bryan

> -Original Message-
> From: Andy Brown [mailto:andy_br...@rhoworld.com]
> Sent: Monday, May 20, 2013 9:53 AM
> To: solr-user@lucene.apache.org
> Subject: Slow Highlighter Performance Even Using FastVectorHighlighter
>
> I'm providing a search feature in a web app that searches for
documents
> that range in size from 1KB to 200MB of varying MIME types (PDF, DOC,
> etc). Currently there are about 3000 documents and this will continue
to
> grow. I'm providing full word search and partial word search. For each
> document, there are three source fields that I'm interested in
searching
> and highlighting on: name, description, and content. Since I'm
providing
> both full and partial word search, I've created additional fields that
> get tokenized differently: name_par, description_par, and content_par.
> Those are indexed and stored as well for querying and highlighting. As
> suggested in the Solr wiki, I've got two catch all fields text and
> text_par for faster querying.
>
> An average search results page displays 25 results and I provide
paging.
> I'm just returning the doc ID in my Solr search results and response
> times have been quite good (1 to 10 ms). The problem in performance
> occurs when I turn on highlighting. I'm already using the
> FastVectorHighlighter and depending on the query, it has taken as long
> as 15 seconds to get the highlight snippets. However, this i

MoreLikeThis - No Results

2013-05-22 Thread Andy Pickler
I'm developing a recommendation feature in our app using the
MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>, and
so far it is doing a great job.  We're using a user's "competency keywords"
as the MLT field list and the user's corresponding document in Solr as the
"comparison document".  I have found that for one user I'm not receiving
any recommendations, and I'm not sure why.

Solr: 4.1.0

*relevant schema*:




  





  
  





  


*user's values*:


Healthcare Cost Trends


Is it possible that among all the ~40,000 users in this index (about 500 of
which have the same competency keywords), the words "healthcare",
"cost" and "trends" are just judged by Lucene to not be "significant"?  I
realize that I may not understand how the MLT Handler is doing things under
the covers...I've only been guessing until now based on the (otherwise
excellent) results I've been seeing.
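One thing I have not ruled out is the handler's frequency cutoffs: if I'm
reading the wiki correctly, mlt.mintf defaults to 2 and mlt.mindf defaults to
5, so terms that appear only once in the source document (or in too few
documents) would be dropped before the "interesting terms" are chosen. A test
along these lines (the parameter values are just a guess on my part) might
tell me more:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.mintf=1&mlt.mindf=1&mlt.interestingTerms=details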

Thanks,
Andy Pickler

P.S.  For some additional information, the following query:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false

...produces the following results...



0
2




objectId:user91813
objectId:user91813







Slow Highlighter Performance Even Using FastVectorHighlighter

2013-05-20 Thread Andy Brown
I'm providing a search feature in a web app that searches for documents
that range in size from 1KB to 200MB of varying MIME types (PDF, DOC,
etc). Currently there are about 3000 documents and this will continue to
grow. I'm providing full word search and partial word search. For each
document, there are three source fields that I'm interested in searching
and highlighting on: name, description, and content. Since I'm providing
both full and partial word search, I've created additional fields that
get tokenized differently: name_par, description_par, and content_par.
Those are indexed and stored as well for querying and highlighting. As
suggested in the Solr wiki, I've got two catch all fields text and
text_par for faster querying. 
 
An average search results page displays 25 results and I provide paging.
I'm just returning the doc ID in my Solr search results and response
times have been quite good (1 to 10 ms). The problem in performance
occurs when I turn on highlighting. I'm already using the
FastVectorHighlighter and depending on the query, it has taken as long
as 15 seconds to get the highlight snippets. However, this isn't always
the case. Certain query terms result in 1 sec or less response time. In
any case, 15 seconds is way too long. 
 
I'm fairly new to Solr but I've spent days coming up with what I've got
so far. Feel free to correct any misconceptions I have. Can anyone
advise me on what I'm doing wrong or offer a better way to setup my core
to improve highlighting performance? 
 
A typical query would look like:
/select?q=foo&start=0&rows=25&fl=id&hl=true 
 
I'm using Solr 4.1. Below are the relevant core schema and config details: 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
   
 
 
 
   
   
 
 
 
 

  
 
 
 
   
 
 
 
 
   
   
 
 
 
 
   
 
 
 
 
 
 
  
   explicit 
   10 
   text 
   edismax 
   text^2 text_par^1
   true 
   true 
   true 
   true 
   breakIterator 
   2 
   name name_par description description_par
content content_par 
   162 
   simple 
   default 



   
  
  


Cheers!

- Andy



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:20 AM, Alexandre Rafalovitch  wrote:

> Hopefully, this is not a secret, but the RCs are built and available
> for download and announced on the dev mailing list.


Thanks for the link.

I don't think it's a secret, but I sure don't see anything that says "This is 
how the dev process works."

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:11 AM, Yago Riveiro  wrote:

> In attachment the change log of solr 4.3 RC3 
> 


And where would I find that?  I don't see anything at 
http://lucene.apache.org/solr/downloads.html to download?  Do I need to check 
out the Subversion repo?  Is there a page somewhere that describes the process set 
up?

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:03 AM, Yago Riveiro  wrote:

> The road map has this release note, but I think that most of it will be move 
> to 4.3.1 or 4.4
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128
>  

So, is there a way I can see what is currently pending to go in 4.3?

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 3:36 AM, "Jack Krupansky"  wrote:

> RC4 of 4.3 is available now. The final release of 4.3 is likely to be within 
> days.


How can I see the Changelog of what will be in it?

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Solr indexing

2013-04-18 Thread Andy Lester

On Apr 18, 2013, at 10:49 AM, hassancrowdc  wrote:

> Solr is not showing the dates i have in database. any help? is solr following
> any specific timezone? On my database my date is 2013-04-18 11:29:33 but
> solr shows me "2013-04-18T15:29:33Z".   Any help


Solr knows nothing of timezones.  Solr expects everything to be in UTC.  If you 
want time zone support, you'll have to convert local time to UTC before 
importing, and then convert back to local time from UTC when you read from Solr.
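A minimal sketch of that round trip in Java (nothing Solr-specific here, just
the JDK date classes; the pattern string is the format Solr expects):

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    // format Solr wants; format() renders in UTC because of setTimeZone()
    SimpleDateFormat solrFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    solrFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

    String forSolr = solrFormat.format(new Date());   // e.g. 2013-04-18T15:29:33Z

    // going the other way: parse the UTC string from Solr, then format it in
    // the JVM's local time zone for display (parse() throws ParseException)
    Date d = solrFormat.parse(forSolr);
    String forDisplay = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(d);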

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Top 10 Terms in Index (by date)

2013-04-02 Thread Andy Pickler
A key problem with those approaches as well as Lucene's HighFreqTerms class
(
http://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/HighFreqTerms.html)
is that none of them seem to have the ability to combine with a date range
query...which is key in my scenario.  I'm kinda thinking that what I'm
asking to do just isn't supported by Lucene or Solr, and that I'll have to
pursue another avenue.  If anyone has any other suggestions, I'm all ears.
I'm starting to wonder if I need to have some nightly batch job that
executes against my database and builds up "that day's top terms" in a
table or something.

Thanks,
Andy Pickler

On Tue, Apr 2, 2013 at 7:16 AM, Tomás Fernández Löbbe  wrote:

> Oh, I see, essentially you want to get the sum of the term frequencies for
> every term in a subset of documents (instead of the document frequency as
> the FacetComponent would give you). I don't know of an easy/out of the box
> solution for this. I know the TermVectorComponent will give you the tf for
> every term in a document, but I'm not sure if you can filter or sort on it.
> Maybe you can do something like:
> https://issues.apache.org/jira/browse/LUCENE-2393
> or what's suggested here:
> http://search-lucene.com/m/of5Fn1PUOHU/
> but I have never used something like that.
>
> Tomás
>
>
>
> On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler 
> wrote:
>
> > I need "total number of occurrences" across all documents for each term.
> > Imagine this...
> >
> > Post #1: "I think, therefore I am like you"
> > Reply #1: "You think too much"
> > Reply #2 "I think that I think much as you"
> >
> > Each of those "documents" are put into 'content'.  Pretending I don't
> have
> > stop words, the top term query (not considering dateCreated in this
> > example) would result in something like...
> >
> > "think": 4
> > "I": 4
> > "you": 3
> > "much": 2
> > ...
> >
> > Thus, just a "number of documents" approach doesn't work, because if a
> word
> > occurs more than one time in a document it needs to be counted that many
> > times.  That seemed to rule out faceting like you mentioned as well as
> the
> > TermsComponent (which as I understand also only counts "documents").
> >
> > Thanks,
> > Andy Pickler
> >
> > On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe <
> > tomasflo...@gmail.com
> > > wrote:
> >
> > > So you have one document per user comment? Why not use faceting plus
> > > filtering on the "dateCreated" field? That would count "number of
> > > documents" for each term (so, in your case, if a term is used twice in
> > one
> > > comment it would only count once). Is that what you are looking for?
> > >
> > > Tomás
> > >
> > >
> > > On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler 
> > > wrote:
> > >
> > > > Our company has an application that is "Facebook-like" for usage by
> > > > enterprise customers.  We'd like to do a report of "top 10 terms
> > entered
> > > by
> > > > users over (some time period)".  With that in mind I'm using the
> > > > DataImportHandler to put all the relevant data from our database
> into a
> > > > Solr 'content' field:
> > > >
> > > >  stored="false"
> > > > multiValued="false" required="true" termVectors="true"/>
> > > >
> > > > Along with the content is the 'dateCreated' for that content:
> > > >
> > > >  > > > multiValued="false" required="true"/>
> > > >
> > > > I'm struggling with the TermVectorComponent documentation to
> understand
> > > how
> > > > I can put together a query that answers the 'report' mentioned above.
> > >  For
> > > > each document I need each term counted however many times it is
> entered
> > > > (content of "I think what I think" would report 'think' as used
> twice).
> > > >  Does anyone have any insight as to whether I'm headed in the right
> > > > direction and then what my query would be?
> > > >
> > > > Thanks,
> > > > Andy Pickler
> > > >
> > >
> >
>


Re: Top 10 Terms in Index (by date)

2013-04-01 Thread Andy Pickler
I need "total number of occurrences" across all documents for each term.
Imagine this...

Post #1: "I think, therefore I am like you"
Reply #1: "You think too much"
Reply #2 "I think that I think much as you"

Each of those "documents" are put into 'content'.  Pretending I don't have
stop words, the top term query (not considering dateCreated in this
example) would result in something like...

"think": 4
"I": 4
"you": 3
"much": 2
...

Thus, just a "number of documents" approach doesn't work, because if a word
occurs more than one time in a document it needs to be counted that many
times.  That seemed to rule out faceting like you mentioned as well as the
TermsComponent (which as I understand also only counts "documents").

Thanks,
Andy Pickler

On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe  wrote:

> So you have one document per user comment? Why not use faceting plus
> filtering on the "dateCreated" field? That would count "number of
> documents" for each term (so, in your case, if a term is used twice in one
> comment it would only count once). Is that what you are looking for?
>
> Tomás
>
>
> On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler 
> wrote:
>
> > Our company has an application that is "Facebook-like" for usage by
> > enterprise customers.  We'd like to do a report of "top 10 terms entered
> by
> > users over (some time period)".  With that in mind I'm using the
> > DataImportHandler to put all the relevant data from our database into a
> > Solr 'content' field:
> >
> >  > multiValued="false" required="true" termVectors="true"/>
> >
> > Along with the content is the 'dateCreated' for that content:
> >
> >  > multiValued="false" required="true"/>
> >
> > I'm struggling with the TermVectorComponent documentation to understand
> how
> > I can put together a query that answers the 'report' mentioned above.
>  For
> > each document I need each term counted however many times it is entered
> > (content of "I think what I think" would report 'think' as used twice).
> >  Does anyone have any insight as to whether I'm headed in the right
> > direction and then what my query would be?
> >
> > Thanks,
> > Andy Pickler
> >
>


Top 10 Terms in Index (by date)

2013-04-01 Thread Andy Pickler
Our company has an application that is "Facebook-like" for usage by
enterprise customers.  We'd like to do a report of "top 10 terms entered by
users over (some time period)".  With that in mind I'm using the
DataImportHandler to put all the relevant data from our database into a
Solr 'content' field:



Along with the content is the 'dateCreated' for that content:



I'm struggling with the TermVectorComponent documentation to understand how
I can put together a query that answers the 'report' mentioned above.  For
each document I need each term counted however many times it is entered
(content of "I think what I think" would report 'think' as used twice).
 Does anyone have any insight as to whether I'm headed in the right
direction and then what my query would be?
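My best guess at a query so far (assuming the TermVectorComponent is
registered on the handler -- the parameter names are taken from the wiki, so
this may be off) is something like:

/select?q=dateCreated:[NOW-7DAYS TO NOW]&tv=true&tv.fl=content&tv.tf=true&rows=100

but as far as I can tell that only reports per-document term frequencies,
which still leaves the summing across documents to me.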

Thanks,
Andy Pickler


Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Andy Lester

On Mar 24, 2013, at 10:18 PM, Steve Rowe  wrote:

> The wiki at http://wiki.apache.org/solr/ has come under attack by spammers 
> more frequently of late, so the PMC has decided to lock it down in an attempt 
> to reduce the work involved in tracking and removing spam.
> 
> From now on, only people who appear on 
> http://wiki.apache.org/solr/ContributorsGroup will be able to 
> create/modify/delete wiki pages.
> 
> Please request either on the solr-user@lucene.apache.org or on 
> d...@lucene.apache.org to have your wiki username added to the 
> ContributorsGroup page - this is a one-time step.


Please add my username, AndyLester, to the approved editors list.  Thanks.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Facets with 5000 facet fields

2013-03-21 Thread Andy
But if I just add facet.method=fcs, wouldn't I just get fcs? Mark said this new 
method based on docvalues is better than fcs, so wouldn't I need to do 
something other than specifying fcs to enable this new method?




 From: Upayavira 
To: solr-user@lucene.apache.org 
Sent: Thursday, March 21, 2013 9:04 AM
Subject: Re: Facets with 5000 facet fields
 
as was said below, add facet.method=fcs to your query URL.

Upayavira

On Thu, Mar 21, 2013, at 09:41 AM, Andy wrote:
> What do I need to do to use this new per segment faceting method?
> 
> 
> 
>  From: Mark Miller 
> To: solr-user@lucene.apache.org 
> Sent: Wednesday, March 20, 2013 1:09 PM
> Subject: Re: Facets with 5000 facet fields
>  
> 
> On Mar 20, 2013, at 11:29 AM, Chris Hostetter 
> wrote:
> 
> > Not true ... per segment FIeldCache support is available in solr 
> > faceting, you just have to specify facet.method=fcs (FieldCache per 
> > Segment)
> 
> Also, if you use docvalues in 4.2, Robert tells me it is uses a new per
> seg faceting method that may have some better nrt characteristics than
> fcs. I have not played with it yet but hope to soon.
> 
> - Mark

Re: Facets with 5000 facet fields

2013-03-21 Thread Andy
What do I need to do to use this new per segment faceting method?



 From: Mark Miller 
To: solr-user@lucene.apache.org 
Sent: Wednesday, March 20, 2013 1:09 PM
Subject: Re: Facets with 5000 facet fields
 

On Mar 20, 2013, at 11:29 AM, Chris Hostetter  wrote:

> Not true ... per segment FIeldCache support is available in solr 
> faceting, you just have to specify facet.method=fcs (FieldCache per 
> Segment)

Also, if you use docvalues in 4.2, Robert tells me it is uses a new per seg 
faceting method that may have some better nrt characteristics than fcs. I have 
not played with it yet but hope to soon.

- Mark

Re: Facets with 5000 facet fields

2013-03-20 Thread Andy
That's impressive performance.

Are you doing NRT updates? I seem to recall that facet cache is not per segment 
so every time the index is updated the facet cache will need to be re-computed. 
And that's going to kill performance. Have you run into that problem?



 From: Toke Eskildsen 
To: "solr-user@lucene.apache.org" ; Andy 
 
Sent: Wednesday, March 20, 2013 4:06 AM
Subject: Re: Facets with 5000 facet fields
 
On Wed, 2013-03-20 at 07:19 +0100, Andy wrote:
> What about the case where there's only a small number of fields (a
> dozen or two) but each field has hundreds of thousands or millions of
> values? Would Solr be able to handle that?

We do that on a daily basis at State and University Library, Denmark:
One of our facet fields has 10766502 unique terms, another has 6636746.
This is for 11M documents and it has query response times clustering at
~150ms, ~750ms and ~1500ms (I'll have to look into why it clusters like
that).

This is with standard Solr faceting on a quad core Xeon L5420 server
with SSD. It has 16GB of RAM and runs two search instances, each with
~11M documents, one with a 52GB index, one with 71GB.

- Toke Eskildsen

Re: Facets with 5000 facet fields

2013-03-19 Thread Andy
Hoss,

What about the case where there's only a small number of fields (a dozen or 
two) but each field has hundreds of thousands or millions of values? Would Solr 
be able to handle that?




 From: Chris Hostetter 
To: solr-user@lucene.apache.org 
Sent: Tuesday, March 19, 2013 6:09 PM
Subject: Re: Facets with 5000 facet fields
 

: In order to support faceting, Solr maintains a cache of the faceted
: field. You need one cache for each field you are faceting on, meaning
: your memory requirements will be substantial, unless, I guess, your

1) you can consider trading ram for time by using "facet.method=enum" (and 
disabling your filterCache) ... it will prevent the need for hte 
FieldCaches but will probably be slower as it will compute the docset per 
value per field instead of generating the FieldCaches once and re-useing 
them.

2) the entire question seems suspicious...

: > We have configured solr for 5000 facet fields as part of request
: > handler.We
: > have 10811177 docs in the index.

...i have lots of experience dealing with indexes that had thousands of 
fields that were faceted on, but i've never seen any realistic usecase for 
faceting on more then a few hundred fields per search.  Can you please 
elaborate on your goals and usecases so we can offer better advice...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: Importing datetime

2013-03-19 Thread Andy Lester

On Mar 19, 2013, at 12:04 PM, Spadez  wrote:

> This is the datetime format SOLR requires as I understand it:
> 
> 1995-12-31T23:59:59Z
> 
> When I try to store this as a datetime field in MySQL it says it isn't
> valid. My question is, ideally I would want to keep a datetime in my
> database so I can sort by date rather than just making it a varchar, so I
> would store it like this:
> 
> 1995-12-31 23:59:59 
> 
> Can import date in this format into SOLR from MySQL?

Yes.  Don't change the storage type of your column in MySQL.  Changing to 
VARCHAR would be sad.

What you'll need to do is use a date formatting function in your SELECT out of 
the MySQL database to get the date into the format that Solr likes.

See 
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format
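For example (table and column names here are made up), the SELECT in your
import could format the column as

  SELECT id, DATE_FORMAT(last_modified, '%Y-%m-%dT%TZ') AS last_modified FROM books

which yields values like 2013-04-18T15:29:33Z; wrap the column in CONVERT_TZ()
first if it's stored in local time rather than UTC.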
 

xoa


--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-12 Thread Andy Lester

On Mar 12, 2013, at 1:21 PM, Chris Hostetter  wrote:

> How are these sets of flrids created/defined?  (undertsanding the source 
> of the filter information may help inspire alternative suggestsions, ie: 
> XY Problem)


It sounds like you're looking for patterns that could potentially provide 
groupings for these FLRIDs.  We've been down that road, too, but we don't see 
how there could be one.  The arbitrariness comes from the fact that the lists 
are maintained by users and can be changed at any time.

Each book in the database has an FLRID.  Each user can create lists of books.  
These lists can be modified at any time.  

That looks like this in Oracle:   USER   1->M   LIST   1->M   LISTDETAIL  M <- 
1  TITLE

The sizes we're talking about:  tens of thousands of users; hundreds of 
thousands of lists, with up to 100,000 items per list; tens of millions of 
listdetail.

We have a feature that lets the user do a keyword search on books within his 
list.  We can't update the Solr record to keep track of which lists it appears 
on because there may be, say, 20 people every second updating the contents of 
their lists, and those 20 people expect that their next search-within-a-list 
will have those new results.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: [Beginner] wants to contribute in open source project

2013-03-11 Thread Andy Lester

On Mar 11, 2013, at 11:14 AM, chandresh pancholi 
 wrote:

> I am beginner in this field. It would be great if you help me out. I love
> to code in java.
> can you guys share some link so that i can start contributing in
> solr/lucene project.


This article I wrote about getting started contributing to projects may give 
you some ideas.

http://blog.smartbear.com/software-quality/bid/167051/14-Ways-to-Contribute-to-Open-Source-without-Being-a-Programming-Genius-or-a-Rock-Star

I don't have tasks specifically for the Solr project (does Solr have such a 
list for newcomers to help on?) but I hope that you'll get some ideas.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Andy Lester
We've got an 11,000,000-document index.  Most documents have a unique ID called 
"flrid", plus a different ID called "solrid" that is Solr's PK.  For some 
searches, we need to be able to limit the searches to a subset of documents 
defined by a list of FLRID values.  The list of FLRID values can change between 
every search, and it will be rare enough to call it "never" for any two 
searches to have the same set of FLRIDs to limit on.

What we're doing right now is, roughly:

q=title:dogs AND 
(flrid:(123 125 139  34823) OR 
 flrid:(34837 ... 59091) OR 
 ... OR 
 flrid:(101294813 ... 103049934))

Each of those parentheticals can be 1,000 FLRIDs strung together.  We have 
to subgroup to get past Solr's limitations on the number of terms that can be 
ORed together.

The problem with this approach (besides that it's clunky) is that it seems to 
perform O(N^2) or so.  With 1,000 FLRIDs, the search comes back in 50ms or so.  
If we have 10,000 FLRIDs, it comes back in 400-500ms.  With 100,000 FLRIDs, 
that jumps up to about 75000ms.  We want it be on the order of 1000-2000ms at 
most in all cases up to 100,000 FLRIDs.

How can we do this better?

Things we've tried or considered:

* Tried: Using dismax with minimum-match mm:0 to simulate an OR query.  No 
improvement.
* Tried: Putting the FLRIDs into the fq instead of the q.  No improvement.
* Considered: dumping all the FLRIDs for a given search into another core and 
doing a join between it and the main core (sketched just after this list), but 
if we do five or ten searches per second, it seems like Solr would die from all 
the commits.  The set of FLRIDs is unique between searches so there is no reuse 
possible.
* Considered: Translating FLRIDs to SolrID and then limiting on SolrID instead, 
so that Solr doesn't have to hit the documents in order to translate 
FLRID->SolrID to do the matching.
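For reference, the join in that third bullet would look something like this
(core and field names are made up; the "flridlists" core would hold one tiny
document per list_id/flrid pair):

fq={!join fromIndex=flridlists from=flrid to=flrid}list_id:12345

The commit-rate worry above is about keeping that side core up to date.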

What we're hoping for:

* An efficient way to pass a long set of IDs, or for Solr to be able to pull 
them from the app's Oracle database.
* Have Solr do big ORs as a set operation not as (what we assume is) a naive 
one-at-a-time matching.
* A way to create a match vector that gets passed to the query, because strings 
of fqs in the query seems to be a suboptimal way to do it.

I've searched SO and the web and found people asking about this type of 
situation a few times, but no answers that I see beyond what we're doing now.

* 
http://stackoverflow.com/questions/11938342/solr-search-within-subset-defined-by-list-of-keys
* 
http://stackoverflow.com/questions/9183898/searching-within-a-subset-of-data-solr
* 
http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-td502245.html
* 
http://lucene.472066.n3.nabble.com/Search-within-a-subset-of-documents-td1680475.html

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: How to use SolrCloud in multi-threaded indexing

2013-02-04 Thread andy
Thanks man



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-SolrCloud-in-multi-threaded-indexing-tp4037641p4038482.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to use SolrCloud in multi-threaded indexing

2013-02-04 Thread andy

Thanks man



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-SolrCloud-in-multi-threaded-indexing-tp4037641p4038481.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to use SolrCloud in multi-threaded indexing

2013-01-31 Thread andy
Hi, 

I am going to upgrade to Solr 4.1 from version 3.6, and I want to set up two
shards.
I use ConcurrentUpdateSolrServer to index the documents in Solr 3.6.
I saw the CloudSolrServer API in 4.1, but:
1: CloudSolrServer uses LBHttpSolrServer to issue requests, but "LBHttpSolrServer
should NOT be used for indexing", as documented in the API:
http://lucene.apache.org/solr/4_1_0/solr-solrj/index.html
2: it seems CloudSolrServer does not support multi-threaded indexing.

So, how do I do multi-threaded indexing in Solr 4.1?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-SolrCloud-in-multi-threaded-indexing-tp4037641.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom solr sort

2013-01-07 Thread andy
Thank you, guys, I got the reason now: there's something wrong with the
compareBottom method in my source; it's not consistent with the compare method.
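In case anyone else hits this: the two methods have to order things the same
way, i.e. roughly like the following (using the same values/bottom fields and
getRelation() helper as in the code I posted earlier):

@Override
public int compare(int slot1, int slot2) {
    return Float.compare(values[slot1], values[slot2]);
}

@Override
public int compareBottom(int doc) throws IOException {
    // must rank doc against the current bottom exactly the way compare()
    // ranks two slots, or sorting breaks across pages
    return Float.compare(bottom, getRelation(doc));
}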




--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-solr-sort-tp4031014p4031444.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom solr sort

2013-01-07 Thread andy
Hi Upayavira,

The custom sort field is not stored in the index. I want to achieve a
requirement that different search users will get different search results
when they search the same keyword in my search engine; the search users have
a relationship with each result document in Solr, but the relationship
is provided by another team's REST service.
So the search sequence is as follows:
1. I add the search user's id to the Solr query (i.e.
query.setParam("uid", vo.getUserId());)
   and specify my own request handler "mysearch": query.setParam("qt",
"mysearch");

2. MySortComponent sets the custom sort as the first sort.
3. MyComparatorSource gets the uid, sends a request to the REST service, and
gets the relationship according to the uid.
4. Sort the result.

Do you have any suggestions?



Upayavira wrote
> Can you explain why you want to implement a different sort first? There
> may be other ways of achieving the same thing.
> 
> Upayavira
> 
> On Sun, Jan 6, 2013, at 01:32 AM, andy wrote:
>> Hi,
>> 
>> Maybe this is an old thread or maybe it's different with previous one.
>> 
>> I want to custom solr sort and  pass solr param from client to solr
>> server,
>> so I  implemented SearchComponent which named MySortComponent in my code,
>> and also implemented FieldComparatorSource and FieldComparator. when I
>> use
>> "mysearch" requesthandler(see following codes), I found that custom sort
>> just effect on the current page when I got multiple page results, but the
>> sort is expected when I sets the rows which contains  all the results.
>> Does
>> anybody know how to solve it or the reason?
>> 
>> code snippet:
>> 
>> public class MySortComponent extends SearchComponent implements
>> SolrCoreAware {
>>   
>> public void inform(SolrCore arg0) {
>> }
>> 
>> @Override
>> public void prepare(ResponseBuilder rb) throws IOException {
>> SolrParams params = rb.req.getParams();
>>  String uid = params.get("uid")
>>  private RestTemplate restTemplate = new RestTemplate();
>>  
>> MyComparatorSource comparator = new MyComparatorSource(uid);
>> SortSpec sortSpec = rb.getSortSpec();
>> if (sortSpec.getSort() == null) {
>> sortSpec.setSort(new Sort(new SortField[] {
>> new SortField("relation",
>> comparator),SortField.FIELD_SCORE }));
>>   
>> } else {
>>   
>> SortField[] current = sortSpec.getSort().getSort();
>> ArrayList
> 
>  sorts = new ArrayList
> 
> (
>> current.length + 1);
>> sorts.add(new SortField("relation", comparator));
>> for (SortField sf : current) {
>> sorts.add(sf);
>> }
>> sortSpec.setSort(new Sort(sorts.toArray(new
>> SortField[sorts.size()])));
>>   
>> }
>> 
>> }
>> 
>> @Override
>> public void process(ResponseBuilder rb) throws IOException {
>> 
>> }
>> 
>> //
>> -
>> // SolrInfoMBean
>> //
>> -
>> 
>> @Override
>> public String getDescription() {
>> return "Custom Sorting";
>> }
>> 
>> @Override
>> public String getSource() {
>> return "";
>> }
>> 
>> @Override
>> public URL[] getDocs() {
>> try {
>> return new URL[] { new URL(
>> "http://wiki.apache.org/solr/QueryComponent";) };
>> } catch (MalformedURLException e) {
>> throw new RuntimeException(e);
>> }
>> }
>> 
>> public class MyComparatorSource extends FieldComparatorSource {
>> private BitSet dg1;
>> private BitSet dg2;
>> private BitSet dg3;
>> 
>> public MyComparatorSource(String uid) throws IOException {
>> 
>> SearchResponse responseBody = restTemplate.postForObject(
>> "http://search.test.com/userid/search/"; + uid, null,
>> SearchResponse.class);
>> 
>> String d1 = responseBody.getOneDe();
>> String d2 = responseBody.getTwo

custom solr sort

2013-01-05 Thread andy
 void copy(int slot, int doc) throws IOException {
values[slot] = getRelation(doc);

}

@Override
public void setBottom(int slot) {
bottom = values[slot];
}

@Override
public FieldComparator setNextReader(
AtomicReaderContext ctx) throws IOException {
uidDoc = FieldCache.DEFAULT.getInts(ctx.reader(), "userID",
true);
return this;
}

@Override
public Float value(int slot) {
return new Float(values[slot]);
}

private float getRelation(int doc) throws IOException {
if (dg3.get(uidDoc[doc])) {
return 3.0f;
} else if (dg2.get(uidDoc[doc])) {
return 4.0f;
} else if (dg1.get(uidDoc[doc])) {
return 5.0f;
} else {
return 1.0f;
}
}

@Override
public int compareDocToValue(int arg0, Object arg1)
throws IOException {
// TODO Auto-generated method stub
return 0;
    }
}

}
}


and solrconfig.xml configuration is 




   
mySortComponent
  



Thanks
Andy




--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-solr-sort-tp4031014.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pause and resume indexing on SolR 4 for backups

2012-12-21 Thread Andy D'Arcy Jewell

On 20/12/12 20:19, alx...@aim.com wrote:

Depending on your architecture, why not index the same data into two machines? 
One will be your prod another your backup?
Because we're trying to keep costs and complexity low whilst in the 
development stage ;-)


But more seriously, this will obviously be a must sooner or later.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T:  0844 9918804
M:  07961605631
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 13:38, Upayavira wrote:

The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup a 10 seconds after that.


Hmm. Strange - the files created by the backup API don't seem to 
correlate exactly with the files stored under the solr data directory:


andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/_2vq.fdx
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
/tmp/snapshot.20121220155853703/segments_2vs
/tmp/snapshot.20121220155853703/_2vq_nrm.cfs
/tmp/snapshot.20121220155853703/_2vq.fnm
/tmp/snapshot.20121220155853703/_2vq_nrm.cfe
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
/tmp/snapshot.20121220155853703/_2vq.fdt
/tmp/snapshot.20121220155853703/_2vq.si
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
andydj@me-solr01:~$ find /var/lib/solr/data/index/
/var/lib/solr/data/index/
/var/lib/solr/data/index/_2w6_Lucene40_0.frq
/var/lib/solr/data/index/_2w6.si
/var/lib/solr/data/index/segments_2w8
/var/lib/solr/data/index/write.lock
/var/lib/solr/data/index/_2w6_nrm.cfs
/var/lib/solr/data/index/_2w6.fdx
/var/lib/solr/data/index/_2w6_Lucene40_0.tip
/var/lib/solr/data/index/_2w6_nrm.cfe
/var/lib/solr/data/index/segments.gen
/var/lib/solr/data/index/_2w6.fnm
/var/lib/solr/data/index/_2w6.fdt
/var/lib/solr/data/index/_2w6_Lucene40_0.tim

Am I correct in thinking that to restore from this backup, I would need 
to do the following?


1. Stop Tomcat (or maybe just solr)
2. Remove all files under /var/lib/solr/data/index/
3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
/var/lib/solr/data/index/

4. Restart Tomcat (or just solr)


Thanks everyone who's pitched in on this! Once I've got this working, 
I'll document it.

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 11:58, Upayavira wrote:

I've never used it, but the replication handler has an option:

   http://master_host:port/solr/replication?command=backup

Which will take you a backup.
I've looked at that this morning as suggested by Markus Jelsma. Looks 
good, but I'll have to work out how to use the resultant backup 
directory. I've been dealing with another unrelated issue in the 
mean-time and I haven't had a chance to look for any docu so far.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
That's the approach that Shawn Heisey proposed, and what I've been 
working towards,  but it still leaves open the question of how to 
*pause* SolR or prevent commits during the backup (otherwise we have a 
potential race condition).


-Andy


--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 10:24, Gora Mohanty wrote:


Unless I am missing something, the index is only being written to
when you are adding/updating the index. So, the question is how
is this being done in your case, and could you pause indexing for
the duration of the backup?

Regards,
Gora
It's attached to a web-app, which accepts uploads and will be available 
24/7, with a global audience, so "pausing" it may be rather difficult 
(tho I may put this to the developer - it may for instance be possible 
if he has a small number of choke points for input into SolR).


Thanks.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T:  0844 9918804
M:  07961605631
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

Hi all.

Can anyone advise me of a way to pause and resume SolR 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other "disaster" more 
quickly than a re-index operation would yield.


I can't yet afford the "extravagance" of a separate SolR replica just 
for backups, and I'm not sure if I'll ever have the luxury. I'm 
currently running with just one node, be we are not yet live.


I can think of the following ways to do this, each with various downsides:

1) Just backup the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/Start Tomcat
+ Easy
- Very slow and I/O, CPU intensive
- Client gets errors when trying to connect

3) Block/unblock SolR port with IpTables
+ Fast
- Client gets errors when trying to connect
- Have to wait for existing transactions to complete (not sure how, 
maybe watch socket FD's in /proc)


4) Pause/Restart SolR service
+ Fast ? (hopefully)
- Client gets errors when trying to connect

In any event, the web app will have to gracefully handle unavailability 
of SolR, probably by displaying a "down for maintenance" message, but 
this should preferably be only a very short amount of time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: solr searchHandler/searchComponent for query statistics

2012-12-06 Thread Andy Lester

On Dec 6, 2012, at 9:50 AM, "joe.cohe...@gmail.com"  
wrote:

> Is there an out-of-the-box or have anyone already implemented a feature for
> collecting statistics on queries?


What sort of statistics are you talking about?  Are you talking about 
collecting information in aggregate about queries over time?  Or for giving 
statistics about individual queries, like time breakouts for benchmarking?

For the latter, you want "debugQuery=true" and you get a raft of stats down in 
the debug section of the response.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Backing up SolR 4.0

2012-12-04 Thread Andy D'Arcy Jewell

On 03/12/12 18:04, Shawn Heisey wrote:


Serious production Solr installs require at least two copies of your 
index.  Failures *will* happen, and sometimes they'll be the kind of 
failures that will take down an entire machine.  You can plan for some 
failures -- redundant power supply and RAID are important for this.  
Some failures will cause downtime, though -- multiple disk failures, 
motherboard, CPU, memory, software problems wiping out your index, 
user error, etc.If you have at least one other copy of your index, 
you'll be able to keep the system operational while you fix the down 
machine.


Replication is a very good way to accomplish getting two or more 
copies of your index.  I would expect that most production Solr 
installations use either plain replication or SolrCloud.  I do my 
redundancy a different way that gives me a lot more flexibility, but 
replication is a VERY solid way to go.


If you are running on a UNIX/Linux platform (just about anything 
*other* than Windows), and backups via replication are not enough for 
you, you can use the hardlink capability in the OS to avoid taking 
Solr down while you make backups.  Here's the basic sequence:


1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target 
directory.

4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.

Making hardlinks is a near-instantaneous operation.  The way that 
Solr/Lucene works will guarantee that your hardlink copy will continue 
to be a valid index snapshot no matter what happens to the live 
index.  If you can make the backup and get the hardlinks deleted 
before your index undergoes a merge, the hardlinks will use very 
little extra disk space.


If you leave the hardlink copies around, eventually your live index 
will diverge to the point where the copy has different files and 
therefore takes up disk space.  If you have a *LOT* of extra disk 
space on the Solr server, you can keep multiple hardlink copies around 
as snapshots.


Recent versions of Windows do have features similar to UNIX links, so 
there may in fact be a way to do this on Windows.  I will leave that 
for someone else to pursue.


Thanks,
Shawn

Thanks Shawn, that's very informative. I get twitchy with anything where 
you "can't" back it up (memcached excepted). As an administrator, it's 
my job to recover from failures, and backups are kind of my comfort blanket.


I'm running on Linux (on Debian Squeeze) in a fully virtual 
environment.  Initially, I think I'll have to just schedule the backup 
for the early hours (local time) but as we grow, I can see I'll have to 
use replication to do it seamlessly. The system is necessarily small 
right now, as we haven't yet gone live, butwe are anticipating rapid 
growth, so replication has always been on the cards.


Is there an easy way to tell (say from a shell script) when "all commits 
and merges [are] complete"?


If I keep a replica solely for backup purposes, I assume I can "do what 
I like with it" - presumably replication will resume/catch-up when I 
resume it (I admit, I have a bit of reading to do wrt replication - I 
just skimmed that because it wasn't in my initial brief).


I'm assuming that because you're using hardlinks, that means SolR 
writes a "new" file when it updates (sort of copy-on-write style)? So we 
are relying on the principle that as long as you have at least one 
remaining reference to the data, it's not deleted...


Thanks once again!

-Andy



--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Backing up SolR 4.0

2012-12-03 Thread Andy D'Arcy Jewell

On 03/12/12 16:39, Erick Erickson wrote:

There's no real need to do what you ask.

First thing is that you should always be prepared, in the worst-case
scenario, to regenerate your entire index.

That said, perhaps the easiest way to back up Solr is just to use
master/slave replication. Consider having a machine that's a slave to the
master (but not necessarily searched against) and periodically poll your
master (say daily or whatever your interval is). You can configure Solr to
keep N copies of the index as extra insurance. These will be fairly static
so if you _really_ wanted to you could just copy the /data
directory somewhere, but I don't know if that's necessary.

See: http://wiki.apache.org/solr/SolrReplication

Best
Erick

Hi Erick,

Thanks for that, I'll take a look.

However, wouldn't re-creating the index on a large dataset take an 
inordinate amount of time? The system I will be backing up is likely to 
undergo rapid development and thus schema changes, so I need some kind 
of insurance against corruption if we need to roll-back after a change.


How should I go about creating multiple backup versions I can put aside 
(e.g. on tape) to hedge against the down-time which would be required to 
regenerate the indexes from scratch?


Regards,
-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Backing up SolR 4.0

2012-12-03 Thread Andy D'Arcy Jewell

Hi all.

I'm new to SolR, and I have recently had to set up a SolR server running 
4.0.


I've been searching for info on backing it up, but all I've managed to 
come up with is "it'll be different" or "you'll be able to do push 
replication" or using http and the command=backup parameter, which 
doesn't sound like it will be effective for a production setup (unless 
I've got that wrong)...



I was wondering if I can just stop or suspend the SolR server, then do 
an LVM snapshot of the data store, before bringing it back on line, but 
I'm not sure if that will cut it. I gather merely rsyncing the data 
files won't do...


Can anyone give me a pointer to that "easy-to-find" document I have so 
far failed to find? Or failing that, maybe some sound advice on how to 
proceed?


Regards,
-Andy




--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Permanently Full Old Generation...

2012-11-30 Thread Andy Kershaw
We are currently operating at reduced load which is why the ParNew
collections are not a problem. I don't know how long they were taking
before though. Thanks for the warning about index formats.

Our JVM is:

Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

We are currently running more tests but it takes a while before the issues
become apparent.

Andy Kershaw

On 29 November 2012 18:31, Walter Underwood  wrote:

> Several suggestions.
>
> 1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are
> already in an overload state and the variance of the response times goes
> way up. You'll have very noisy benchmark data.
>
> 2. Do not force manual GCs during a benchmark.
>
> 3. Do not force merge (optimise). That is a very expensive operation and
> will cause slowdowns.
>
> 4. Make eden big enough to hold all data allocated during a request for
> all simultaneous requests. All that stuff is garbage after the end of the
> request. If eden fills up, it will be allocated from the tenured space and
> cause that to grow unnecessarily. We use an 8GB heap and 2GB eden. I like
> setting the size better than setting ratios.
>
> 5. What version of the JVM are you using?
>
> wunder
>
> On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:
>
> > On 11/29/2012 10:44 AM, Andy Kershaw wrote:
> >> Annette is away until Monday so I am looking into this in the meantime.
> >> Looking at the times of the Full GC entries at the end of the log, I
> think
> >> they are collections we started manually through jconsole to try and
> reduce
> >> the size of the old generation. This only seemed to have an effect when
> we
> >> reloaded the core first though.
> >>
> >> It is my understanding that the eden size is deliberately smaller to
> keep
> >> the ParNew collection time down. If it takes too long then the node is
> >> flagged as down.
> >
> > Your ParNew collections are taking less than 1 second (some WAY less
> than one second) to complete and the CMS collections are taking far longer
> -- 6 seconds seems to be a common number in the GC log.  GC is unavoidable
> with Java, so if there has to be a collection, you definitely want it to be
> on the young generation (ParNew).
> >
> > Controversial idea coming up, nothing concrete to back it up.  This
> means that you might want to wait for a committer to weigh in:  I have seen
> a lot of recent development work relating to SolrCloud and shard stability.
>  You may want to check out branch_4x from SVN and build that, rather than
> use 4.0.  I don't have any idea what the timeline for 4.1 is, but based on
> what I saw for 3.x releases, it should be released relatively soon.
> >
> > The above advice is a bad idea if you have to be able to upgrade from
> one 4.1 snapshot to a later one without reindexing. There is a possibility
> that the 4.1 index format will change before release and require a reindex,
> it has happened at least twice already.
> >
> > Thanks,
> > Shawn
> >
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


-- 
Andy Kershaw

Technical Developer

ServiceTick Ltd



T: +44(0)1603 618326

M: +44 (0)7876 556833



Seebohm House, 2-4 Queen Street, Norwich, England, NR2 4SQ

www.ServiceTick.com <http://www.servicetick.com/>

www.SessionCam.com <http://www.sessioncam.com/>



*This message is confidential and is intended to be read solely by the
addressee. If you have received this message by mistake, please delete it
and do not copy it to anyone else. Internet communications are not secure
and may be intercepted or changed after they are sent. ServiceTick Ltd does
not accept liability for any such changes.*


Re: Permanently Full Old Generation...

2012-11-29 Thread Andy Kershaw
Thanks for responding Shawn.

Annette is away until Monday so I am looking into this in the meantime.
Looking at the times of the Full GC entries at the end of the log, I think
they are collections we started manually through jconsole to try and reduce
the size of the old generation. This only seemed to have an effect when we
reloaded the core first though.

It is my understanding that the eden size is deliberately smaller to keep
the ParNew collection time down. If it takes too long then the node is
flagged as down.

On 29 November 2012 15:28, Shawn Heisey  wrote:

> > My jvm settings:
> >
> >
> > -Xmx8192M -Xms8192M -XX:+CMSScavengeBeforeRemark -XX:NewRatio=2
> > -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> > -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70
> > -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSIncrementalPacing
> > -XX:CMSIncrementalDutyCycle=75
> >
> > I turned off IncrementalPacing, and enabled
> > CMSInitiatingOccupancyFraction,
> > after issues with nodes being reported as down due to large Garbage
> > collection pauses.  The problem with the memory profile was visible
> before
> > the drop down to 1.2GB (this was when I reloaded the core), my concern
> was
> > that the collection of the old generation didn't seem to free any of the
> > heap, and we went from occasionally collecting to always collecting the
> > old
> > gen.
> >
> > Please see the attached gc log.
>
> I am on the train for my morning commute, so I have some time, but no
> access to the log or graph.
>
> Confession time: GC logs make me go glassy eyed and babble incoherently,
> but I did take a look at it. I saw 18 CMS collections and three entries
> near the end that said Full GC. It looks like these collections take 6 to
> 8 seconds. That is pretty nasty, but probably unavoidable, so the goal is
> to make them happen extremely infrequently - do young generation
> collections instead.
>
> The thing that seems to make GC less of a problem for solr is maximizing
> the young generation memory pool. Based on the available info, I would
> start with making NewRatio 1 instead of 2.  This will increase the eden
> size and decrease the old gen size. You may even want to use an explicit
> -Xmn of 6144.  If that doesn't help, you might actually need 6GB or so of
> old gen heap, so try increasing the overall heap size to 9 or 10 GB and
> going back to a NewRatio of 2.
>
> Thanks,
> Shawn
>


Re: stopwords in solr

2012-11-27 Thread Andy Lester

On Nov 28, 2012, at 12:33 AM, Joe Zhang  wrote:

> that is really strange. so basic stopwords such as "a" "the' are not
> eliminated from the index?

There is no list of "basic stopwords" anywhere.  If you want stop words, you 
have to put them in the file yourself.  There are not really any sensible 
defaults for stopwords, so Solr doesn't provide them.

Just add them to the stopwords.txt and reindex your core.
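For example, stopwords.txt is just one term per line:

a
an
the

and whichever analyzer in your fieldType references it (typically a
StopFilterFactory with words="stopwords.txt") will start dropping those terms
once you reindex.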

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Andy Lester

On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic  
wrote:

> My favourite topic ;)  See my sig below for SPM for Solr. At my last
> company we used Cacti but it felt very 1990s almost. Some ppl use zabbix,
> some graphite, some newrelic, some SPM, some nothing!


SPM looks mighty tasty, but we must have it in-house on our own servers, for 
monitoring internal dev systems, and we'd like it to be open source.

We already have Cacti up and running, but it's possible we could use something 
else.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance


