Re: Concat Fields in JSON Facet

2017-01-17 Thread Zheng Lin Edwin Yeo
Hi Evans,

Thanks for your reply.

I would like to show both the ItemId and the ItemName together in the same
JSON output bucket.
Currently I'm only able to show one of them in one bucket. If I want to
show both, they are shown in two separate buckets, like the output below,
which will probably cause the output to double in size.

   "facets":{
     "count":1,
     "itemNo":{
       "buckets":[{
         "val":"",
         "count":3591}]},
     "itemName":{
       "buckets":[{
         "val":"item",
         "count":3591}]}

I'm using Solr 6.2.1.

Regards,
Edwin
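One way to get both values into a single bucket (a hedged sketch using the standard JSON Facet API; field names are taken from the output above) is to nest a terms subfacet on itemName inside the itemNo facet, so each itemNo bucket carries its matching itemName:

```json
{
  "itemNo": {
    "type": "terms",
    "field": "itemNo",
    "limit": 10,
    "facet": {
      "itemName": {
        "type": "terms",
        "field": "itemName",
        "limit": 1
      }
    }
  }
}
```

Another option sometimes used is indexing a combined field such as itemId|itemName at index time and faceting on that single field.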


On 18 January 2017 at 00:15, Tom Evans  wrote:

> On Mon, Jan 16, 2017 at 2:58 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > I have been using JSON Facet, but I am facing some constraints in
> > displaying the field.
> >
> > For example, I have 2 fields, itemId and itemName. However, when I do the
> > JSON Facet, I can only get it to show one of them in the output, and I
> > could not get it to show both together.
> > I will like to show both the ID and Name together, so that it will be
> more
> > meaningful and easier for user to understand, without having to refer to
> > another table to determine the match between the ID and Name.
>
> I don't understand what you mean. If you have these three documents in
> your index, what data do you want in the facet?
>
> [
>   {itemId: 1, itemName: "Apple"},
>   {itemId: 2, itemName: "Android"},
>   {itemId: 3, itemName: "Android"},
> ]
>
> Cheers
>
> Tom
>


Re: SolrCloud: ClusterState says we are the leader but locally we don't think so

2017-01-17 Thread Kelly, Frank
We bounced the ZooKeeper nodes one by one, but no change.

Since this is our Prod server (100M+ docs) we don't want to have to
reindex from scratch (takes 7+ days).
So we're considering editing /collections//state.json via
zkcli.sh

Thoughts?

-Frank
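If you do go the state.json route, the round trip is typically getfile / edit / putfile with zkcli.sh. The sketch below (Python, untested against a live cluster) shows the edit step, under the assumption that the file follows the usual Solr 5.x layout of collection -> shards -> replicas with a "leader" property on the current leader; verify the exact zkcli.sh flags against your install before running anything.

```python
def clear_leader_flags(state, shard):
    """Remove the 'leader' property from every replica of the given shard,
    so that a restart can trigger a fresh leader election.

    `state` is the parsed state.json of one collection.  The layout assumed
    here (collection -> "shards" -> shard -> "replicas") is typical of
    Solr 5.x, but treat it as an assumption and diff before writing back."""
    for coll in state.values():
        for replica in coll["shards"][shard]["replicas"].values():
            replica.pop("leader", None)
    return state

# Roughly (commands from memory -- check `zkcli.sh -help` first):
#   zkcli.sh -zkhost zk1:2181 -cmd getfile /collections/<coll>/state.json s.json
#   ... edit s.json using clear_leader_flags ...
#   zkcli.sh -zkhost zk1:2181 -cmd putfile /collections/<coll>/state.json s.json
```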

 


On 1/17/17, 5:49 PM, "Pushkar Raste"  wrote:

>Try bouncing the overseer for your cluster.
>
>On Jan 17, 2017 12:01 PM, "Kelly, Frank"  wrote:
>
>> Solr Version: 5.3.1
>>
>> Configuration: 3 shards, 3 replicas each
>>
>> After running out of heap memory recently (cause unknown) we've been
>> successfully restarting nodes to recover.
>>
>> Finally we did one restart and one of the nodes now says the following
>> 2017-01-17 16:57:16.835 ERROR (qtp1395089624-17)
>> [c:prod_us-east-1_here_account s:shard3 r:core_node26
>> x:prod_us-east-1_here_account_shard3_replica3] o.a.s.c.SolrCore
>> org.apache.solr.common.SolrException: ClusterState says we are the
>>leader
>> 
>>(http://10.255.6.196:8983/solr/prod_us-east-1_here_account_shard3_replica3),
>> but locally we don't think so. Request came from null
>>
>> How can we recover from this (for Solr 5.3.1)?
>> Is there someway to force a new leader (I know the following feature
>> exists but in 5.4.0
>>https://issues.apache.org/jira/browse/SOLR-7569)
>>
>> Thanks!
>>
>> -Frank
>>
>>
>>
>>
>> *Frank Kelly*
>>
>> *Principal Software Engineer*
>>
>>
>>
>> HERE
>>
>> 5 Wayside Rd, Burlington, MA 01803, USA
>>
>> *42° 29' 7" N 71° 11' 32" W*
>>
>>
>>



Re: SolrCloud: ClusterState says we are the leader but locally we don't think so

2017-01-17 Thread Pushkar Raste
Try bouncing the overseer for your cluster.

On Jan 17, 2017 12:01 PM, "Kelly, Frank"  wrote:

> Solr Version: 5.3.1
>
> Configuration: 3 shards, 3 replicas each
>
> After running out of heap memory recently (cause unknown) we’ve been
> successfully restarting nodes to recover.
>
> Finally we did one restart and one of the nodes now says the following
> 2017-01-17 16:57:16.835 ERROR (qtp1395089624-17)
> [c:prod_us-east-1_here_account s:shard3 r:core_node26
> x:prod_us-east-1_here_account_shard3_replica3] o.a.s.c.SolrCore
> org.apache.solr.common.SolrException: ClusterState says we are the leader
> (http://10.255.6.196:8983/solr/prod_us-east-1_here_account_shard3_replica3),
> but locally we don't think so. Request came from null
>
> How can we recover from this (for Solr 5.3.1)?
> Is there someway to force a new leader (I know the following feature
> exists but in 5.4.0 https://issues.apache.org/jira/browse/SOLR-7569)
>
> Thanks!
>
> -Frank
>
>
>
>
> *Frank Kelly*
>
> *Principal Software Engineer*
>
>
>
> HERE
>
> 5 Wayside Rd, Burlington, MA 01803, USA
>
> *42° 29' 7" N 71° 11' 32" W*
>
>
> 
>


RE: Search for ISBN-like identifiers

2017-01-17 Thread Moenieb Davids
Hi Guys

Just a quick question on search, which is not related to this post:

I have a few cores which are based on a mainframe extract, 1 core per extracted 
file, each of which resembles a "DB table".
The cores are all somehow linked via 1-to-many fields, with a structure similar 
to a normal ERD.

Is it possible to return the result from a query that joins, let's say, 3 cores 
in the following format:

"core1_id":"XXX",
"_childDocuments_":[
{
  "core2_id":"yyy",
  "core_2_fieldx":"ABC",
  "_childDocuments_":[
  {
    "core3_id":"zzz",
    "core_3_fieldx":"ABC",
    "core3_fieldy":"123",
  }],
  "core2_fieldy":"123",
}]

Regards
Moenieb Davids

-Original Message-
From: Josh Lincoln [mailto:josh.linc...@gmail.com] 
Sent: 05 January 2017 08:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Search for ISBN-like identifiers

Sebastian,
You may want to try adding autoGeneratePhraseQueries="true" to the fieldtype.
With that setting, a query for 978-3-8052-5094-8 will behave just like "978
3 8052 5094 8" (with the quotes)

A few notes about autoGeneratePhraseQueries
a) it used to be set to true by default, but that was changed several years ago
b) does NOT require a reindex, so very easy to test
c) apparently not recommended for non-whitespace delimited languages (CJK, 
etc), but maybe that's not an issue in your use case.
d) i'm unsure how it'll impact wildcard queries on that field. E.g. will
978-3-8052* match 978-3-8052-5094-8? At the very least, partial ISBNs (e.g.
978-3-8052) would match full ISBN without needing to use the wildcard. I'm just 
not sure what happens if the user includes the wildcard.

Josh
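For reference, enabling it looks roughly like this on the fieldtype (a sketch; the analyzer chain here is illustrative, not Sebastian's actual config):

```xml
<!-- autoGeneratePhraseQueries makes a query like 978-3-8052-5094-8
     behave as the phrase "978 3 8052 5094 8" at query time -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```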

On Thu, Jan 5, 2017 at 1:41 PM Sebastian Riemer  wrote:

> Thank you very much for taking the time to help me!
>
> I'll definitely have a look at the link you've posted.
>
> @ShawnHeisey Thanks too for shedding light on the wildcard behaviour!
>
> Allow me one further question:
> - Assuming that I define a separate field for storing the ISBNs, using 
> the awesome analyzer provided by Mr. Bill Dueber: how do I get that 
> field copied into my general text field, which is used by my 
> QuickSearch input?
> Won't that field be processed again by the analyser defined on the 
> text field?
> - Should I alternatively add more fields to the q-Parameter? As for 
> now, I always have set q=text: but I 
> guess one could try something like 
> q=text:+isbnspeciallookupfield: want_to_search>
>
> I don't really know about that last idea though, since the searches 
> are probably OR-combined, which is not what I like to have.
>
> The third option would be to pre-process, in my application, the decision 
> of where to look in Solr. I.e. everything matching a regex containing only 
> numbers and hyphens with length 13 -> don't query on the field text; 
> instead use the field isbnspeciallookupfield
>
>
> Many thanks again, and have a nice day!
> Sebastian
>
>
> -Ursprüngliche Nachricht-
> Von: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Gesendet: Donnerstag, 5. Januar 2017 19:10
> An: solr-user@lucene.apache.org
> Betreff: Re: Search for ISBN-like identifiers
>
> Sebastian -
>
> There’s some precedent out there for ISBN’s.  Bill Dueber and the 
> UMICH/code4lib folks have done amazing work, check it out here -
>
> https://github.com/mlibrary/umich_solr_library_filters < 
> https://github.com/mlibrary/umich_solr_library_filters>
>
>   - Erik
>
>
> > On Jan 5, 2017, at 5:08 AM, Sebastian Riemer 
> wrote:
> >
> > Hi folks,
> >
> >
> > TL;DR: Is there an easy way, to copy ISBNs with hyphens to the 
> > general
> text field, respectively configure the analyser on that field, so that 
> a search for the hyphenated ISBN returns exactly the matching document?
> >
> > Long version:
> > I've defined a field "text" of type "text_general", where I copy all 
> > my other fields to, to be able to do a "quick search" where I set 
> > q=text
> >
> > The definition of the type text_general is like this:
> >
> >
> >
> >  > positionIncrementGap="100">
> >
> >  
> >
> >
> >
> > > words="stopwords.txt" />
> >
> >
> >
> >  
> >
> >  
> >
> >
> >
> > > words="stopwords.txt" />
> >
> > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >
> >
> >
> >  
> >
> >
> >
> >
> > I now face the problem, that searching for a book with
> > text:978-3-8052-5094-8* does not return the single result I expect.
> > However searching for text:9783805250948* instead returns a result.
> > Note, that I am adding a wildcard at the end automatically, to 
> > further broaden the resultset. Note also, that it does not seem to 
> > matter whether I put backslashes in front of the hyphen or not (to 
> > be exact, when sending via SolrJ from my application, I put in the 
> > 
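Sebastian's third option, routing in the application, can be as small as a regex check before building the query string. A sketch (Python; the field name isbnspeciallookupfield is taken from the mail above, and the regex is a rough shape check for hyphenated ISBN-13s, not a full ISBN validator):

```python
import re

# Rough shape check for a hyphenated ISBN-13, e.g. 978-3-8052-5094-8.
ISBN13_HYPHENATED = re.compile(r"^97[89](-\d+){3}-\d$")

def build_query(user_input: str) -> str:
    """Route ISBN-looking input to the dedicated ISBN field,
    everything else to the general text field."""
    term = user_input.strip()
    if ISBN13_HYPHENATED.match(term):
        return f'isbnspeciallookupfield:"{term}"'
    return f"text:{term}"
```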

Re: Solr schema design: fitting time-series data

2017-01-17 Thread map reduced
That's a good point Alex, about indexed vs stored. Since all my queries are
exact match, I can just have the fields stored=false to save space. I believe
that helps, since there are billions of rows, and it'll hopefully save quite
a lot of space.
But nothing can be done to squeeze the dates into the same document, right?

Thanks for the reply.


On Tue, Jan 17, 2017 at 10:50 AM, Alexandre Rafalovitch 
wrote:

> On 16 January 2017 at 00:54, map reduced  wrote:
> > some way to squeeze timestamps in single
> > document so that it doesn't increase the number of document by a lot and
> I
> > am still able to range query on 'ts'.
>
> Would DateRangeField be useful here?
> https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>
> Also, if the fields are indexed and not stored, more records with the
> same values is not such a big deal because they effectively just add
> more indexes from the token tables. Of course, I am not sure whether
> this advice scales as much as your specific use case requires, but it
> is just something to keep in mind.
>
> Regards,
>Alex.
>
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>


Re: Solr schema design: fitting time-series data

2017-01-17 Thread Alexandre Rafalovitch
On 16 January 2017 at 00:54, map reduced  wrote:
> some way to squeeze timestamps in single
> document so that it doesn't increase the number of document by a lot and I
> am still able to range query on 'ts'.

Would DateRangeField be useful here?
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

Also, if the fields are indexed and not stored, more records with the
same values is not such a big deal because they effectively just add
more indexes from the token tables. Of course, I am not sure whether
this advice scales as much as your specific use case requires, but it
is just something to keep in mind.

Regards,
   Alex.



http://www.solr-start.com/ - Resources for Solr users, new and experienced
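A DateRangeField setup might look roughly like this (a sketch; field and type names are illustrative):

```xml
<!-- schema -->
<fieldType name="dateRange" class="solr.DateRangeField"/>
<field name="ts" type="dateRange" indexed="true" stored="false"/>

<!-- query: docs whose ts falls within Jan 13-15, 2017 -->
<!-- q=ts:[2017-01-13T00:00:00Z TO 2017-01-15T23:59:59Z] -->
```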


Re: Solr schema design: fitting time-series data

2017-01-17 Thread map reduced
Does anyone have any idea?

On Sun, Jan 15, 2017 at 9:54 PM, map reduced  wrote:

> I may have used the wrong terminology; by complex types I meant non-primitive
> types. Multivalued can be conceptualized as a list of values, for instance
> in your example myint = [32, 77] etc., which you can possibly analyze and
> query upon. What I was trying to ask is whether a complex type can be
> multi-valued, or something along those lines that can be supported by range
> queries.
>
> For instance: Below rows will have to be individual docs in Solr (in my
> knowledge) -  If I want to range query from ts=Jan 12 to ts=Jan 15 give me
> sum of 'unique' where 'contentId=1,product=mobile'
>
> contentId=1,product=mobile ts=Jan15 total=12,unique=5
> contentId=1,product=mobile ts=Jan14 total=10,unique=3
> contentId=1,product=mobile ts=Jan13 total=15,unique=2
> contentId=1,product=mobile ts=Jan12 total=17,unique=4
> ..
>
> This increases number of documents in Solr by a lot. Only if there was a
> way to do something like:
>
> {
>   contentId=1
>   product=mobile
>   ts = [
>     { time = Jan15, total = 12, unique = 15 },
>     { time = Jan16, total = 10, unique = 3 },
>     ..
>     ..
>   ]}
>
> Of course the above isn't allowed, but I'm looking for some way to squeeze
> timestamps into a single document so that it doesn't increase the number of
> documents by a lot and I am still able to range query on 'ts'.
>
> For some rows (combinations of fields) the timestamps may go up to the last
> 3-6 months!
>
> Let me know if I am still being unclear.
>
> On Sun, Jan 15, 2017 at 8:04 PM, Erick Erickson 
> wrote:
>
>> bq: I know multivalued fields don't support complex data types
>>
>> Not sure what you're talking about here. multiValued actually has
>> nothing to do with data types. You can have text fields which
>> are analyzed and produce multiple tokens and are multiValued.
>> You can have primitive types (string, int/long/float/double,
>> boolean etc.) that are multiValued, or they can be single valued.
>>
>> All "multiValued" means is that the _input_ can have the same field
>> repeated, i.e.
>> <doc>
>>   <field name="mytext">some stuff</field>
>>   <field name="mytext">more stuff</field>
>>   <field name="myint">77</field>
>> </doc>
>>
>> This doc would fail if mytext or myint were multiValued=false, but
>> succeed if multiValued=true at index time.
>>
>> There are some subtleties with text (analyzed) multivalued fields having
>> to do with token offsets, but that's not germane.
>>
>> Does that change your problem? Your document could have a dozen
>> timestamps
>>
>> However, there isn't a good way to query across multiple multivalued
>> fields
>> in parallel. That is, a doc like
>>
>> myint=1
>> myint=2
>> myint=3
>> mylong=4
>> mylong=5
>> mylong=6
>>
>> there's no good way to say "only match this document if myint=1 AND
>> mylong=4 AND they_are_both_in_the_same_position".
>>
>> That is, asking for myint=1 AND mylong=6 would match the above. Is
>> that what you're
>> wondering about?
>>
>> --
>> I expect you're really asking to do the second above, in which case you
>> might
>> want to look at StreamingExpressions and/or ParallelSQL in Solr 6.x
>>
>> Best,
>> Erick
>>
>> On Sun, Jan 15, 2017 at 7:31 PM, map reduced  wrote:
>> > Hi,
>> >
>> > I am trying to fit the following data in Solr to support flexible
>> queries
>> > and would like to get your input on the same. I have data about users
>> say:
>> >
>> > contentID (assume uuid),
>> > platform (eg. website, mobile etc),
>> > softwareVersion (eg. sw1.1, sw2.5, ..etc),
>> > regionId (eg. us144, uk123, etc..)
>> > 
>> >
>> > and few more other such fields. This data is partially pre aggregated
>> (read
>> > Hadoop jobs): so let’s assume for "contentID = uuid123 and platform =
>> > mobile and softwareVersion = sw1.2 and regionId = ANY" I have data in
>> > format:
>> >
>> > timestamp  pre-aggregated data [ uniques, total]
>> >  Jan 15[ 12, 4]
>> >  Jan 14[ 4, 3]
>> >  Jan 13[ 8, 7]
>> >  ......
>> >
>> > And then I also have less granular data say "contentID = uuid123 and
>> > platform = mobile and softwareVersion = ANY and regionId = ANY (These
>> > values will be more than above table since granularity is reduced)
>> >
>> > timestamp : pre-aggregated data [uniques, total]
>> >  Jan 15[ 100, 40]
>> >  Jan 14[ 45, 30]
>> >  ...   ...
>> >
>> > I'll get queries like "contentID = uuid123 and platform = mobile" , give
>> > sum of 'uniques' for Jan15 - Jan13 or for "contentID=uuid123 and
>> > platform=mobile and softwareVersion=sw1.2", give sum of 'total' for
>> Jan15 -
>> > Jan01.
>> >
>> > I was thinking of simple schema where documents will be like (first
>> example
>> > above):
>> >
>> > {
>> >   "contentID": "uuid12349789",
>> >   "platform" : "mobile",
>> >   "softwareVersion": "sw1.2",
>> >   "regionId": "ANY",
>> >   "ts" : "2017-01-15T01:01:21Z",
>> >   "unique": 12,
>> >   "total": 4
>> > }
>> >
>> > second example from above:
>> >
>> > {
>> >   

RE: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

2017-01-17 Thread Dyer, James
This sounds a lot like SOLR-4489.  However, it looks like this was fixed prior 
to your version (in 4.5).  So it could be you found another case where this bug 
still exists.

The other thing is that the default query converter cannot handle all cases, and 
it could be that the query you are sending is beyond its abilities.  Even in that 
case, it'd be nice if it failed more gracefully than this.

Could you provide the query parameters you are sending and also how you have 
spellcheck configured?

James Dyer
Ingram Content Group


-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Thursday, January 05, 2017 8:22 AM
To: 'solr-user@lucene.apache.org' 
Subject: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

I am seeing many exceptions like this in my Solr [5.4.1] log:
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -2
at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
at java.lang.StringBuilder.replace(StringBuilder.java:262)
at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:236)
at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:93)
at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:238)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:203)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
...
at java.lang.Thread.run(Thread.java:745)

What am I potentially facing here?

Thx
Clemens


RE: Can't get spelling suggestions to work properly

2017-01-17 Thread Dyer, James
Jimi,

Generally speaking, spellcheck does not work well against fields with stemming, 
or other "heavy" analysis.  I would  to a field that is tokenized 
on whitespace with little else, and use that field for spellcheck.

By default, the spellchecker does not suggest for words that are in the index.  So 
if the user misspells a word but the misspelling is actually some other word that 
is indexed, it will never suggest.  You can override this behavior by 
specifying "spellcheck.alternativeTermCount" with a value >0.  This is how 
many suggestions it should give for words that indeed exist in the index.  This 
can be the same value as "spellcheck.count", but you may wish to set it to a 
lower value.

I do not recommend using "spellcheck.onlyMorePopular".  It is similar to 
"spellcheck.alternativeTermCount", but in my opinion, the latter gives a better 
experience.

You might also wish to set "spellcheck.maxResultsForSuggest".  If you set this, 
then the spellchecker will not suggest anything if more results are returned 
than the value you specify.  This is helpful in providing "did you mean"-style 
suggestions for queries that return few results.

If you would like to ensure the suggestions combine nicely into a re-written 
query that returns results, then specify both "spellcheck.collate=true" and 
"spellcheck.maxCollationTries" to a value >0 (possibly 5-10).  This will cause 
it to internally check the re-written queries (aka. Collations) and report back 
on how many results you get for each.  If you are using "q.op=OR" or a low 
value for "mm", then you will likely want to override this with something like 
"spellcheck.collateParam.mm=0".  Otherwise every combination will get reported 
as returning results.

I hope this and the other comments you've gotten help demystify spellcheck 
configuration.  I do agree it is fairly complicated and frustrating to get it 
just right.

James Dyer
Ingram Content Group
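Pulled together, the parameters discussed above might look like this on a request (illustrative starting values, not tuned recommendations):

```
spellcheck=true
spellcheck.count=10
spellcheck.alternativeTermCount=5
spellcheck.maxResultsForSuggest=10
spellcheck.collate=true
spellcheck.maxCollationTries=5
```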

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

I just noticed why setting maxResultsForSuggest to a high value was not a good 
thing: now it shows spelling suggestions even for correctly spelled words.

I think what I would need is the logic of 
SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of 
it being hard-coded to 0, i.e. just as maxQueryFrequency works.

/Jimi

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

Hi Alessandro,

Thanks for your explanation. It helped a lot. Although setting 
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I 
also had to set "spellcheck.alternativeTermCount". With that done, I now get 
suggestions when searching for 'mycet' (a misspelling of the Swedish word 
'mycket', that didn't return suggestions before).

Although, I'm still not able to fully understand how to configure this 
properly, because with this change there are now other misspelled searches that 
no longer give suggestions. The problem here is stemming, I suspect, because 
the main search fields use stemming, so that in some cases one can get lots of 
results for spellings that don't exist in the index at all (or, at least, not 
in the spelling field). How can I configure this component so that those 
suggestions are still included? Do I need to set maxResultsForSuggest to a 
really high number, like Integer.MAX_VALUE? I feel that such a setting would 
defeat the purpose of that parameter, in a way. But I'm not sure how else to 
solve this.

Also, there is one other thing I wonder about the spelling suggestions that 
you might have the answer to. Is there a way to make the logic case 
insensitive, but the presentation case sensitive? For example, a search for 
'georg washington' now would return 'george washington' as a suggestion, but 
'Georg Washington' would be even better.

Regards
/Jimi


-Original Message-
From: alessandro.benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) we don't provide misspelled suggestions if we set the param to 1 and we 
have a minimum doc freq of 1 for the term.

2) we don't provide misspelled suggestions if the doc frequency of the term is 
greater than the max limit set.

Let us explore the code :

if (suggestMode==SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
  return new SuggestWord[0];
}
/// If we are working in "Not in Index Mode" , with a document frequency >0 we 
get 

SolrCloud: ClusterState says we are the leader but locally we don't think so

2017-01-17 Thread Kelly, Frank
Solr Version: 5.3.1

Configuration: 3 shards, 3 replicas each

After running out of heap memory recently (cause unknown) we've been 
successfully restarting nodes to recover.

Finally we did one restart and one of the nodes now says the following
2017-01-17 16:57:16.835 ERROR (qtp1395089624-17) [c:prod_us-east-1_here_account 
s:shard3 r:core_node26 x:prod_us-east-1_here_account_shard3_replica3] 
o.a.s.c.SolrCore org.apache.solr.common.SolrException: ClusterState says we are 
the leader 
(http://10.255.6.196:8983/solr/prod_us-east-1_here_account_shard3_replica3), 
but locally we don't think so. Request came from null

How can we recover from this (for Solr 5.3.1)?
Is there someway to force a new leader (I know the following feature exists but 
in 5.4.0 https://issues.apache.org/jira/browse/SOLR-7569)

Thanks!

-Frank




Frank Kelly

Principal Software Engineer



HERE

5 Wayside Rd, Burlington, MA 01803, USA

42° 29' 7" N 71° 11' 32" W



Re: problem with data import handler delta import due to use of multiple datasource

2017-01-17 Thread amylindan
Did you solve the problem? I'm stuck with exactly the same problem now. Let
me know if you already have a solution, please.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-data-import-handler-delta-import-due-to-use-of-multiple-datasource-tp4093698p4314273.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Concat Fields in JSON Facet

2017-01-17 Thread Tom Evans
On Mon, Jan 16, 2017 at 2:58 PM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> I have been using JSON Facet, but I am facing some constraints in
> displaying the field.
>
> For example, I have 2 fields, itemId and itemName. However, when I do the
> JSON Facet, I can only get it to show one of them in the output, and I
> could not get it to show both together.
> I will like to show both the ID and Name together, so that it will be more
> meaningful and easier for user to understand, without having to refer to
> another table to determine the match between the ID and Name.

I don't understand what you mean. If you have these three documents in
your index, what data do you want in the facet?

[
  {itemId: 1, itemName: "Apple"},
  {itemId: 2, itemName: "Android"},
  {itemId: 3, itemName: "Android"},
]

Cheers

Tom


indexing error - 6.3.0

2017-01-17 Thread Joe Obernberger
While indexing a large number of records in SolrCloud 6.3.0 with a 5-node 
configuration, I received an error.  I'm using Java code / SolrJ to 
perform the indexing by creating a list of SolrInputDocuments, 1000 at a 
time, and then calling CloudSolrClient.add(list).  The records are small - 
about 6 fields of short strings and numbers.


If I do 100 at a time, I can't replicate the error, but 1000 at a time 
consistently causes the exception below.  The index is 
stored in a shared HDFS.
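One common workaround while the root cause is investigated is to cap the batch size and fall back to smaller chunks on failure. A sketch (in Python for brevity; the same chunking applies to the SolrJ list before CloudSolrClient.add, and `send_batch` is a hypothetical stand-in for client.add, since this sketch has no live Solr):

```python
def chunked(docs, size):
    """Yield successive fixed-size batches from a list of documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_with_fallback(send_batch, docs, size=1000, min_size=100):
    """Try large batches first; when a batch fails, retry it in
    progressively smaller pieces down to `min_size`."""
    for batch in chunked(docs, size):
        try:
            send_batch(batch)
        except Exception:
            if size <= min_size:
                raise  # even the smallest batch failed; surface the error
            index_with_fallback(send_batch, batch, size // 2, min_size)
```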


2017-01-17 04:21:00.022 ERROR (qtp606548741-21) [c:Worldline s:shard5 
r:core_node1 x:Worldline_shard5_replica1] o.a.s.h.RequestHandlerBase 
org.apache.solr.common.SolrException: Exception writing document id 
6228601a-8756-4b16-bdc3-ad026754b225 to the index; possible analysis error.
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:957)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1112)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:738)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:275)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:240)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:158)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)

Re: A feature idea for discussion -- fields that can only be explicitly retrieved

2017-01-17 Thread alessandro.benedetti
Shawn Heisey wrote
> If the data for a field in the results comes from docValues instead of 
> stored fields, I don't think it is compressed, which hopefully means 
> that if a field is NOT requested, the corresponding docValues data is 
> never read. 

I think we need to make a consideration here.
DocValues is a per-field (column-style) data structure,
and it is compressed on disk (
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene70/Lucene70DocValuesFormat.java
).
I took a brief look at the code, and I was wondering: is it possible that the
index reader reads the entire segment content for all the docValues per
type (like all the numeric docValues, all the binary, etc.) and then puts
only the docValues related to the requested fields into a map in the Solr
process heap?
So the OS would memory-map the whole content for the segment, and then when
Solr requests a specific field, it accesses the entry in the map in the heap
(really a brief look into Lucene70DocValuesProducer, so I may be completely
wrong).
If this is correct, it means that we read the entire docValues content from
the segment, even the portion related to a field that is not requested.
The only difference would be that the content of a not-requested docValues
field will not be stored in the Solr process heap.
Is there any place to read more about this? (apart from the source code)

Cheers






--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-feature-idea-for-discussion-fields-that-can-only-be-explicitly-retrieved-tp4313890p4314288.html
Sent from the Solr - User mailing list archive at Nabble.com.