Re: field set up help

2016-11-17 Thread Will Martin
don't give up yet, Kris.

q={!prefix f=metatag.date}2016-10&debugQuery=true

g'luck

will

On 11/17/2016 5:56 PM, Kris Musshorn wrote:

This q={!prefix f=metatag.date}2016-10 returns zero records

-Original Message-
From: KRIS MUSSHORN [mailto:mussho...@comcast.net]
Sent: Thursday, November 17, 2016 3:00 PM
To: solr-user@lucene.apache.org
Subject: Re: field set up help

so if the field was named metatag.date q={!prefix f=metatag.date}2016-10

- Original Message -

From: "Erik Hatcher" 
To: solr-user@lucene.apache.org
Sent: Thursday, November 17, 2016 2:46:32 PM
Subject: Re: field set up help

Given what you’ve said, my hunch is you could make the query like this:

q={!prefix f=field_name}2016-10

tada!  ?!

there’s nothing wrong with indexing dates as text like that, as long as your 
queries still perform well.   And in the case of the query type you 
mentioned, the text/string’ish indexing you’ve done is suited quite well to 
prefix queries to grab dates by year, year-month, and year-month-day.   But if 
you need to get more sophisticated with date queries (DateRangeField is my new 
favorite), you can leverage ParseDateFieldUpdateProcessorFactory without 
having to change the incoming format.

Erik






On Nov 17, 2016, at 1:55 PM, KRIS MUSSHORN 
 wrote:


I have a field in solr 5.4.1 that has values like:
2016-10-15
2016-09-10
2015-10-12
2010-09-02

Yes it is a date being stored as text.

I am getting the data onto solr via nutch and the metatag plug in.

The data is coming directly from the website I am crawling and I am not able to 
change the data at the source to something more palatable.

The field is set in solr to be of type TextField that is indexed, tokenized, 
stored, multivalued and norms are omitted.

Both the index and query analysis chains contain just the whitespace tokenizer 
factory and the lowercase filter factory.

I need to be able to query for 2016-10 and only match 2016-10-15.

Any ideas on how to set this up?

TIA

Kris










Re: Is it possible to do pivot grouping in SOLR?

2016-11-17 Thread Will Martin
well, a quickly formulated query against some strange kind of 
endpoint...

collapse and expand; with expand.sort

look it up; it's in the ref guide.
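
A rough sketch of the kind of request Will means, reusing the category/color
fields from the question quoted below as stand-ins (the collapse field must be
single-valued; the exact names depend on the schema):

    q=*:*&fq={!collapse field=category}&expand=true&expand.sort=color asc&expand.rows=100

The collapse filter keeps one representative document per category, and the
expand component then returns the rest of each group sorted by the second field.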

On 11/17/2016 1:42 PM, bbarani wrote:
> Is there a way to do pivot grouping (group within a group) in SOLR?
>
> We initially group the results by category and in turn we are trying to group
> the data under one category based on another field. Is there a way to do
> that?
>
> Categories (group by)
> |--Shop
>|---Color (group by)
> |--Support
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Is-it-possible-to-do-pivot-grouping-in-SOLR-tp4306352.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: field set up help

2016-11-17 Thread Erick Erickson
because you have this as an analyzed field rather than a string field I think.

Best,
Erick
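
A hedged sketch of one way around that (field names invented, and the stock
"string" type is assumed to be defined in the schema): keep an untokenized copy
of the field, reindex, and point the prefix query at the copy.

    <field name="metatag.date_str" type="string" indexed="true" stored="false" multiValued="true"/>
    <copyField source="metatag.date" dest="metatag.date_str"/>

    q={!prefix f=metatag.date_str}2016-10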

On Thu, Nov 17, 2016 at 2:56 PM, Kris Musshorn  wrote:
> This q={!prefix f=metatag.date}2016-10 returns zero records
>
> -Original Message-
> From: KRIS MUSSHORN [mailto:mussho...@comcast.net]
> Sent: Thursday, November 17, 2016 3:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: field set up help
>
> so if the field was named metatag.date q={!prefix f=metatag.date}2016-10
>
> - Original Message -
>
> From: "Erik Hatcher" 
> To: solr-user@lucene.apache.org
> Sent: Thursday, November 17, 2016 2:46:32 PM
> Subject: Re: field set up help
>
> Given what you’ve said, my hunch is you could make the query like this:
>
> q={!prefix f=field_name}2016-10
>
> tada!  ?!
>
> there’s nothing wrong with indexing dates as text like that, as long as your
> queries still perform well.   And in the case of the query type you
> mentioned, the text/string’ish indexing you’ve done is suited quite well to
> prefix queries to grab dates by year, year-month, and year-month-day.   But if
> you need to get more sophisticated with date queries (DateRangeField is my new
> favorite), you can leverage ParseDateFieldUpdateProcessorFactory without
> having to change the incoming format.
>
> Erik
>
>
>
>
>> On Nov 17, 2016, at 1:55 PM, KRIS MUSSHORN  wrote:
>>
>>
>> I have a field in solr 5.4.1 that has values like:
>> 2016-10-15
>> 2016-09-10
>> 2015-10-12
>> 2010-09-02
>>
>> Yes it is a date being stored as text.
>>
>> I am getting the data onto solr via nutch and the metatag plug in.
>>
>> The data is coming directly from the website I am crawling and I am not able 
>> to change the data at the source to something more palatable.
>>
>> The field is set in solr to be of type TextField that is indexed, tokenized, 
>> stored, multivalued and norms are omitted.
>>
>> Both the index and query analysis chains contain just the whitespace 
>> tokenizer factory and the lowercase filter factory.
>>
>> I need to be able to query for 2016-10 and only match 2016-10-15.
>>
>> Any ideas on how to set this up?
>>
>> TIA
>>
>> Kris
>>
>
>
>


RE: field set up help

2016-11-17 Thread Kris Musshorn
This q={!prefix f=metatag.date}2016-10 returns zero records

-Original Message-
From: KRIS MUSSHORN [mailto:mussho...@comcast.net] 
Sent: Thursday, November 17, 2016 3:00 PM
To: solr-user@lucene.apache.org
Subject: Re: field set up help

so if the field was named metatag.date q={!prefix f=metatag.date}2016-10 

- Original Message -

From: "Erik Hatcher"  
To: solr-user@lucene.apache.org 
Sent: Thursday, November 17, 2016 2:46:32 PM 
Subject: Re: field set up help 

Given what you’ve said, my hunch is you could make the query like this: 

q={!prefix f=field_name}2016-10 

tada!  ?! 

there’s nothing wrong with indexing dates as text like that, as long as your 
queries still perform well.   And in the case of the query type you 
mentioned, the text/string’ish indexing you’ve done is suited quite well to 
prefix queries to grab dates by year, year-month, and year-month-day.   But if 
you need to get more sophisticated with date queries (DateRangeField is my new 
favorite), you can leverage ParseDateFieldUpdateProcessorFactory without 
having to change the incoming format. 

Erik 




> On Nov 17, 2016, at 1:55 PM, KRIS MUSSHORN  wrote: 
> 
> 
> I have a field in solr 5.4.1 that has values like: 
> 2016-10-15 
> 2016-09-10 
> 2015-10-12 
> 2010-09-02 
>   
> Yes it is a date being stored as text. 
>   
> I am getting the data onto solr via nutch and the metatag plug in. 
>   
> The data is coming directly from the website I am crawling and I am not able 
> to change the data at the source to something more palatable. 
>   
> The field is set in solr to be of type TextField that is indexed, tokenized, 
> stored, multivalued and norms are omitted. 
>   
> Both the index and query analysis chains contain just the whitespace 
> tokenizer factory and the lowercase filter factory. 
>   
> I need to be able to query for 2016-10 and only match 2016-10-15. 
>   
> Any ideas on how to set this up? 
>   
> TIA 
>   
> Kris   
>   





Re: highlighting on child document

2016-11-17 Thread Yangrui Guo
Thanks. Does Solr plan to add highlighting on children in future?

On Thursday, November 17, 2016, vstrugatsky  wrote:

> It appears that highlighting works for fields in the parent documents only.
> https://issues.apache.org/jira/browse/LUCENE-5929 only fixed a bug when
> trying to highlight fields in a parent document when using Block Join
> Parser.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/highlighting-on-child-document-tp4238236p4306375.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: highlighting on child document

2016-11-17 Thread vstrugatsky
It appears that highlighting works for fields in the parent documents only.
https://issues.apache.org/jira/browse/LUCENE-5929 only fixed a bug when
trying to highlight fields in a parent document when using Block Join
Parser.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighting-on-child-document-tp4238236p4306375.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fwd: Standard highlighting doesn't work for Block Join

2016-11-17 Thread vstrugatsky
My understanding is that the fields you are highlighting need to be in a
parent document, and that there is no support for highlighting fields in a
child document at the moment.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fwd-Standard-highlighting-doesn-t-work-for-Block-Join-tp4260784p4306374.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: field set up help

2016-11-17 Thread KRIS MUSSHORN
so if the field was named metatag.date q={!prefix f=metatag.date}2016-10 

- Original Message -

From: "Erik Hatcher"  
To: solr-user@lucene.apache.org 
Sent: Thursday, November 17, 2016 2:46:32 PM 
Subject: Re: field set up help 

Given what you’ve said, my hunch is you could make the query like this: 

    q={!prefix f=field_name}2016-10 

tada!  ?! 

there’s nothing wrong with indexing dates as text like that, as long as your 
queries still perform well.   And in the case of the query type you 
mentioned, the text/string’ish indexing you’ve done is suited quite well to 
prefix queries to grab dates by year, year-month, and year-month-day.   But if 
you need to get more sophisticated with date queries (DateRangeField is my new 
favorite), you can leverage ParseDateFieldUpdateProcessorFactory without 
having to change the incoming format. 

Erik 




> On Nov 17, 2016, at 1:55 PM, KRIS MUSSHORN  wrote: 
> 
> 
> I have a field in solr 5.4.1 that has values like: 
> 2016-10-15 
> 2016-09-10 
> 2015-10-12 
> 2010-09-02 
>   
> Yes it is a date being stored as text. 
>   
> I am getting the data onto solr via nutch and the metatag plug in. 
>   
> The data is coming directly from the website I am crawling and I am not able 
> to change the data at the source to something more palatable. 
>   
> The field is set in solr to be of type TextField that is indexed, tokenized, 
> stored, multivalued and norms are omitted. 
>   
> Both the index and query analysis chains contain just the whitespace 
> tokenizer factory and the lowercase filter factory. 
>   
> I need to be able to query for 2016-10 and only match 2016-10-15. 
>   
> Any ideas on how to set this up? 
>   
> TIA 
>   
> Kris   
>   




Re: field set up help

2016-11-17 Thread Shawn Heisey
On 11/17/2016 11:55 AM, KRIS MUSSHORN wrote:
> I have a field in solr 5.4.1 that has values like: 
> 2016-10-15 
> 2016-09-10 
> 2015-10-12 
> 2010-09-02 
>   
> Yes it is a date being stored as text. 

> I need to be able to query for 2016-10 and only match 2016-10-15. 

I think your best bet is to use DateRangeField instead of TextField. 
You will need to reindex after changing your schema.

https://lucidworks.com/blog/2016/02/13/solrs-daterangefield-perform/
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates#WorkingwithDates-DateRangeFormatting
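
Roughly, the schema change might look like this (the type name "dateRange" is
just a label; the values shown in the original mail already parse as dates):

    <fieldType name="dateRange" class="solr.DateRangeField"/>
    <field name="metatag.date" type="dateRange" indexed="true" stored="true" multiValued="true"/>

After reindexing, a truncated date is treated as the whole period it covers, so
q=metatag.date:2016-10 should match 2016-10-15 and nothing else in the sample data.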

Thanks,
Shawn



Re: field set up help

2016-11-17 Thread Erik Hatcher
Given what you’ve said, my hunch is you could make the query like this:

q={!prefix f=field_name}2016-10

tada!  ?!

there’s nothing wrong with indexing dates as text like that, as long as your 
queries still perform well.   And in the case of the query type you 
mentioned, the text/string’ish indexing you’ve done is suited quite well to 
prefix queries to grab dates by year, year-month, and year-month-day.   But if 
you need to get more sophisticated with date queries (DateRangeField is my new 
favorite), you can leverage ParseDateFieldUpdateProcessorFactory without 
having to change the incoming format.
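
For reference, that processor is configured with the input format(s) it should
recognize; a minimal sketch (the chain name is invented and the field-selection
rules are left at their defaults):

    <updateRequestProcessorChain name="parse-date">
      <processor class="solr.ParseDateFieldUpdateProcessorFactory">
        <arr name="format">
          <str>yyyy-MM-dd</str>
        </arr>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>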

Erik




> On Nov 17, 2016, at 1:55 PM, KRIS MUSSHORN  wrote:
> 
> 
> I have a field in solr 5.4.1 that has values like: 
> 2016-10-15 
> 2016-09-10 
> 2015-10-12 
> 2010-09-02 
>   
> Yes it is a date being stored as text. 
>   
> I am getting the data onto solr via nutch and the metatag plug in. 
>   
> The data is coming directly from the website I am crawling and I am not able 
> to change the data at the source to something more palatable. 
>   
> The field is set in solr to be of type TextField that is indexed, tokenized, 
> stored, multivalued and norms are omitted. 
>   
> Both the index and query analysis chains contain just the whitespace 
> tokenizer factory and the lowercase filter factory. 
>   
> I need to be able to query for 2016-10 and only match 2016-10-15. 
>  
> Any ideas on how to set this up? 
>   
> TIA 
>   
> Kris  
>   



field set up help

2016-11-17 Thread KRIS MUSSHORN

I have a field in solr 5.4.1 that has values like: 
2016-10-15 
2016-09-10 
2015-10-12 
2010-09-02 
  
Yes it is a date being stored as text. 
  
I am getting the data onto solr via nutch and the metatag plug in. 
  
The data is coming directly from the website I am crawling and I am not able to 
change the data at the source to something more palatable. 
  
The field is set in solr to be of type TextField that is indexed, tokenized, 
stored, multivalued and norms are omitted. 
  
Both the index and query analysis chains contain just the whitespace tokenizer 
factory and the lowercase filter factory. 
  
I need to be able to query for 2016-10 and only match 2016-10-15. 
  
Any ideas on how to set this up? 
  
TIA 
  
Kris  
  


Re: Multiple search-queries in 1 http request ?

2016-11-17 Thread Mikhail Khludnev
Hello,
There is nothing like that in Solr.

On Thursday, November 17, 2016, Dorian Hoxha  wrote:

> Hi,
>
> I couldn't find anything in core for "multiple separate queries in 1 http
> request" like elasticsearch
>  current/search-multi-search.html>
> ? I found this
> 
> blog-post though I thought there is/should/would be something in core ?
>
> Thank You
>


-- 
Sincerely yours
Mikhail Khludnev


Is it possible to do pivot grouping in SOLR?

2016-11-17 Thread bbarani
Is there a way to do pivot grouping (group within a group) in SOLR?

We initially group the results by category and in turn we are trying to group
the data under one category based on another field. Is there a way to do
that?

Categories (group by)
|--Shop
  |---Color (group by)
|--Support



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-do-pivot-grouping-in-SOLR-tp4306352.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Overlapped Gap Facets

2016-11-17 Thread Andy C
You might want to look at using Interval Facets (
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting)
in combination with relative dates specified using the Date Math feature (
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates)

You would have to decide exactly what you mean by each of these intervals.
Does "Last 1 Day" mean today (which could be specified by the interval
"[NOW/DAY, NOW/DAY+1DAYS)"), yesterday and today ("[NOW/DAY-1DAYS,
NOW/DAY+1DAYS)"), etc.?

You could decide that you want it to mean the last 24 hours
("[NOW-1DAYS,NOW]"), but be aware that when you subsequently restrict your
query using one of these intervals, using NOW without rounding has a
negative impact on the filter query cache (see
https://dzone.com/articles/solr-date-math-now-and-filter for a better
explanation than I could provide).
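
Putting the two together, a request could look something like this (the field
name pub_date is invented and assumed to have docValues; the keys and boundaries
are only one interpretation of the buckets in the original question):

    facet=true
    &facet.interval=pub_date
    &f.pub_date.facet.interval.set={!key=last_day}[NOW/DAY-1DAYS,NOW]
    &f.pub_date.facet.interval.set={!key=last_week}[NOW/DAY-7DAYS,NOW]
    &f.pub_date.facet.interval.set={!key=last_year}[NOW/DAY-1YEARS,NOW]
    &f.pub_date.facet.interval.set={!key=older_than_1year}[*,NOW/DAY-1YEARS)

Because interval sets may overlap, a document can be counted in several of them,
which is what plain range faceting with a single gap cannot do.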

- Andy -

On Thu, Nov 17, 2016 at 10:46 AM, David Santamauro <
david.santama...@gmail.com> wrote:

>
> I had a similar question a while back but it was regarding date
> differences. Perhaps that might give you some ideas.
>
> http://lucene.472066.n3.nabble.com/date-difference-faceting-td4249364.html
>
> //
>
>
>
>
> On 11/17/2016 09:49 AM, Furkan KAMACI wrote:
>
>> Is it possible to do such a facet on a date field:
>>
>>   Last 1 Day
>>   Last 1 Week
>>   Last 1 Month
>>   Last 6 Month
>>   Last 1 Year
>>   Older than 1 Year
>>
>> which has overlapped facet gaps?
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>>


Re: Using solr(cloud) as source-of-truth for data (with no backing external db)

2016-11-17 Thread Walter Underwood
I agree, it is a bad idea.

Solr is missing nearly everything you want in a repository, because it is
not designed to be a repository.

Does not have:

* access control
* transactions
* transactional backup
* dump and load
* schema migration
* versioning

And so on.

Also, I’m glad to share a one-line curl command that will delete all the 
documents
in your collection.
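
He presumably means something along these lines (host and collection name
invented), which is exactly the point about missing access control:

    curl "http://localhost:8983/solr/yourcollection/update?commit=true" \
      -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"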

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 17, 2016, at 1:20 AM, Alexandre Rafalovitch  wrote:
> 
> I've heard of people doing it but it is not recommended.
> 
> One of the biggest implementation breakthroughs is that - after the
> initial learning curve - you will start mapping your input data to
> signals. Those signals will not look very much like your original data
> and therefore are not terribly suitable to be the source of it.
> 
> We are talking copyFields, UpdateRequestProcessor pre-processing,
> fields that are not stored, nested documents flattening,
> denormalization, etc. Getting back from that to original shape of data
> is painful.
> 
> Regards,
>   Alex.
> 
> Solr Example reading group is starting November 2016, join us at
> http://j.mp/SolrERG
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 17 November 2016 at 18:46, Dorian Hoxha  wrote:
>> Hi,
>> 
>> Anyone use solr for source-of-data with no `normal` db (of course with
>> normal backups/replication) ?
>> 
>> Are there any drawbacks ?
>> 
>> Thank You



Re: How to stop long running/memory eating query

2016-11-17 Thread Erick Erickson
Right. Each shard has to sort over 30M documents, ship the candidate
30M to the aggregator which sorts into the final top 10 (assuming
rows=10). gah...

You want to see either the cursorMark stuff or the export handler,
depending on whether the goal is to return one page at a time or the
entire set. Note that export has some restrictions (i.e. it only
returns docValues fields).

The cursormark capability was explicitly added to handle this case,
although it does _not_ handle something like "go to last page", rather
it handles paging through to the last page (which for something like
this would only be a program of some sort).

BTW, an interesting trick for "go to last page" is to reverse the sort
order, i.e. sort by score _ascending_. Then last becomes first..
In general, though, "go to last page" isn't all that useful
considering what it takes to support it.

https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
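
The flow is roughly this (a sketch assuming the uniqueKey field is named id):
the first request passes cursorMark=*, and every follow-up request passes back
the nextCursorMark value from the previous response, until that value stops
changing.

    q=*:*&sort=score desc,id asc&rows=10&cursorMark=*
    q=*:*&sort=score desc,id asc&rows=10&cursorMark=<nextCursorMark from previous response>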

Best,
Erick

On Thu, Nov 17, 2016 at 8:39 AM, Susheel Kumar  wrote:
> Hi Erick, you got it.  I missed putting in the rest of the query; the
> parameter which caused the issue is the start parameter.  The start parameter
> for this query was set to 30+ million by the user due to bad UI design
> (deep pagination issue), bringing the whole cluster down.
>
> Thnx
>
> On Thu, Nov 17, 2016 at 11:08 AM, Erick Erickson 
> wrote:
>
>> That query frankly doesn't seem like it'd lead to OOM or run for a
>> very long time unless there are (at least) hundreds of terms and a
>> _lot_ of documents. Or you're trying to return a zillion rows. Or
>> you're faceting on a high cardinality field. Or
>>
>> The terms should be being kept in MMapDirectory space.
>>
>> My guess is that you aren't showing the part that's really causing the
>> problem, perhaps try peeling parts of the query off until you find the
>> culprit?
>>
>> And if you're sorting, faceting or the like docValues will help
>> prevent OOM problems.
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 17, 2016 at 7:17 AM, Davis, Daniel (NIH/NLM) [C]
>>  wrote:
>> > Mikhail,
>> >
>> > If the query is not asynchronous, it would certainly be OK to stop the
>> long-running query if the client socket is disconnected.   I know that is a
>> feature of the niche indexer used in the products of www.indexengines.com,
>> because I wrote it.   We did not have asynchronous queries, and because of
>> the content and query-time deduplication, some queries could take hours
>> -that's 72 billion objects on a 2U box for you.   Hope they've added better
>> index-time deduplication by now.
>> >
>> > Thanks,
>> >
>> > -dan
>> >
>> > -Original Message-
>> > From: Mikhail Khludnev [mailto:m...@apache.org]
>> > Sent: Thursday, November 17, 2016 6:55 AM
>> > To: solr-user 
>> > Subject: Re: How to stop long running/memory eating query
>> >
>> > There is a circuit breaker
>> > https://cwiki.apache.org/confluence/display/solr/
>> Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
>> > If I'm right, it does not interrupt faceting.
>> >
>> > On Thu, Nov 17, 2016 at 2:07 PM, Susheel Kumar 
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> We found a query which was running forever and thus causing OOM (
>> >> q=+AND++AND+Tom+AND+Jerry...).  Is there any way, similar to the SQL/NoSQL
>> >> world, where we can watch currently executing queries and be able to kill
>> >> them?
>> >> This can be a desirable feature in these situations and avoid the whole
>> >> cluster going down. Is there an existing JIRA, or should I create one?
>> >>
>> >> Also what would be the different ways we can examine and stop such
>> >> queries from executing?
>> >>
>> >> Thanks,
>> >> Susheel
>> >>
>> >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>>


Re: How to stop long running/memory eating query

2016-11-17 Thread Susheel Kumar
Hi Erick, you got it.  I missed putting in the rest of the query; the
parameter which caused the issue is the start parameter.  The start parameter
for this query was set to 30+ million by the user due to bad UI design
(deep pagination issue), bringing the whole cluster down.

Thnx

On Thu, Nov 17, 2016 at 11:08 AM, Erick Erickson 
wrote:

> That query frankly doesn't seem like it'd lead to OOM or run for a
> very long time unless there are (at least) hundreds of terms and a
> _lot_ of documents. Or you're trying to return a zillion rows. Or
> you're faceting on a high cardinality field. Or
>
> The terms should be being kept in MMapDirectory space.
>
> My guess is that you aren't showing the part that's really causing the
> problem, perhaps try peeling parts of the query off until you find the
> culprit?
>
> And if you're sorting, faceting or the like docValues will help
> prevent OOM problems.
>
> Best,
> Erick
>
> On Thu, Nov 17, 2016 at 7:17 AM, Davis, Daniel (NIH/NLM) [C]
>  wrote:
> > Mikhail,
> >
> > If the query is not asynchronous, it would certainly be OK to stop the
> long-running query if the client socket is disconnected.   I know that is a
> feature of the niche indexer used in the products of www.indexengines.com,
> because I wrote it.   We did not have asynchronous queries, and because of
> the content and query-time deduplication, some queries could take hours
> -that's 72 billion objects on a 2U box for you.   Hope they've added better
> index-time deduplication by now.
> >
> > Thanks,
> >
> > -dan
> >
> > -Original Message-
> > From: Mikhail Khludnev [mailto:m...@apache.org]
> > Sent: Thursday, November 17, 2016 6:55 AM
> > To: solr-user 
> > Subject: Re: How to stop long running/memory eating query
> >
> > There is a circuit breaker
> > https://cwiki.apache.org/confluence/display/solr/
> Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
> > If I'm right, it does not interrupt faceting.
> >
> > On Thu, Nov 17, 2016 at 2:07 PM, Susheel Kumar 
> > wrote:
> >
> >> Hello,
> >>
> >> We found a query which was running forever and thus causing OOM (
> >> q=+AND++AND+Tom+AND+Jerry...).  Is there any way, similar to the SQL/NoSQL
> >> world, where we can watch currently executing queries and be able to kill
> >> them?
> >> This can be a desirable feature in these situations and avoid the whole
> >> cluster going down. Is there an existing JIRA, or should I create one?
> >>
> >> Also what would be the different ways we can examine and stop such
> >> queries from executing?
> >>
> >> Thanks,
> >> Susheel
> >>
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>


Re: Question about synonym's behavior with NGramTokenizer

2016-11-17 Thread Erick Erickson
Wouldn't it be better to put the synonym filter in front of the
NGramTokenizerFactory and just let the SynonymFilter take care of
ngramming the injected tokens just like the other tokens like this?



  


That said, I urge you to use the admin/analysis page to see the effects
of various tweaks you can do to the analysis chain; it'll help make
sense of all the interactions. Hint: unless you care to see _lots_ of
detail, uncheck the "verbose" checkbox.
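
One guess at the kind of chain the stripped snippet above showed, and something
easy to try on that analysis page (the tokenizer choice is an assumption, and
this uses NGramFilterFactory after the synonyms rather than an NGram tokenizer):
tokenize normally, inject synonyms, and only then ngram, so injected tokens get
ngrammed the same way as the original ones.

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms-index.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
    </analyzer>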

Also, please describe exactly _what_ doesn't work. We need to know
what behavior you expect, what behavior you're seeing and, if
possible, some example data, queries and results you'd like to see.

Best,
Erick

On Thu, Nov 17, 2016 at 3:21 AM, Yutaka Nakajima  wrote:
> Hi,
>
> I have a question about Solr synonym's behavior with NGramTokenizer.
>
> I'm using the setting below but it does not work well. Synonyms don't work.
> Please can someone help me.
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   <analyzer type="index">
>     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms-index.txt"
>             tokenizerFactory="solr.NGramTokenizerFactory"
>             tokenizerFactory.minGramSize="2"
>             tokenizerFactory.maxGramSize="2"
>             luceneMatchVersion="3.3"
>             ignoreCase="true" expand="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>   </analyzer>
> </fieldType>
>
> Thanks,
> Yutaka Nakajima


Re: Updating documents with docvalues (not stored), commit question

2016-11-17 Thread Erick Erickson
I'm pretty sure that atomic updates use Real Time Get which means they'll
pull the values from in-memory structures for docs that haven't been
committed yet.

And as Shawn says, docValues isn't relevant here.

Best,
Erick

On Thu, Nov 17, 2016 at 5:52 AM, Shawn Heisey  wrote:
> On 11/17/2016 6:26 AM, Dorian Hoxha wrote:
>> Looks like you can update documents even using just doc-values
>> (without stored). While I understand the columnar-format, my issue
>> with this is that docValues are added when a 'commit' is done
>> (right?). Does that mean that it will force a commit (which is a slow
>> operation) when updating with docValues or does it do something more
>> smart ?
>
> The presence  or absence of docValues does not change commits at all.  A
> commit is a separate operation from indexing, although you can send
> commit=true with an indexing request and it would be started as soon as
> all the indexing for that request is done.
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> The URL above says "SolrCloud" but what it says also applies to
> non-cloud installs.
>
> Thanks,
> Shawn
>


Re: How to stop long running/memory eating query

2016-11-17 Thread Erick Erickson
That query frankly doesn't seem like it'd lead to OOM or run for a
very long time unless there are (at least) hundreds of terms and a
_lot_ of documents. Or you're trying to return a zillion rows. Or
you're faceting on a high cardinality field. Or

The terms should be being kept in MMapDirectory space.

My guess is that you aren't showing the part that's really causing the
problem, perhaps try peeling parts of the query off until you find the
culprit?

And if you're sorting, faceting or the like docValues will help
prevent OOM problems.

Best,
Erick

On Thu, Nov 17, 2016 at 7:17 AM, Davis, Daniel (NIH/NLM) [C]
 wrote:
> Mikhail,
>
> If the query is not asynchronous, it would certainly be OK to stop the 
> long-running query if the client socket is disconnected.   I know that is a 
> feature of the niche indexer used in the products of www.indexengines.com, 
> because I wrote it.   We did not have asynchronous queries, and because of 
> the content and query-time deduplication, some queries could take hours 
> -that's 72 billion objects on a 2U box for you.   Hope they've added better 
> index-time deduplication by now.
>
> Thanks,
>
> -dan
>
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Thursday, November 17, 2016 6:55 AM
> To: solr-user 
> Subject: Re: How to stop long running/memory eating query
>
> There is a circuit breaker
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
> If I'm right, it does not interrupt faceting.
>
> On Thu, Nov 17, 2016 at 2:07 PM, Susheel Kumar 
> wrote:
>
>> Hello,
>>
>> We found a query which was running forever and thus causing OOM (
>> q=+AND++AND+Tom+AND+Jerry...).  Is there any way, similar to the SQL/NoSQL
>> world, where we can watch currently executing queries and be able to kill them?
>> This can be a desirable feature in these situations and avoid the whole
>> cluster going down. Is there an existing JIRA, or should I create one?
>>
>> Also what would be the different ways we can examine and stop such
>> queries from executing?
>>
>> Thanks,
>> Susheel
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: potential issue/bug reported by Jetty team? was: Re: Change Solr contextPath =“/“ ?

2016-11-17 Thread Shawn Heisey
On 11/16/2016 10:22 AM, matthew grisius wrote:
> Looks like a little bug on solr. In jetty 9.1.something we changed the
> definition of the webdefault.xml file to avoid the "Uncovered http
> methods" warning.

I have opened an issue for this.

https://issues.apache.org/jira/browse/SOLR-9781

When I find some free time, I will look at all the Jetty config files
and try to refresh them with current defaults.

Thanks,
Shawn



Re: Overlapped Gap Facets

2016-11-17 Thread David Santamauro


I had a similar question a while back but it was regarding date 
differences. Perhaps that might give you some ideas.


http://lucene.472066.n3.nabble.com/date-difference-faceting-td4249364.html

//



On 11/17/2016 09:49 AM, Furkan KAMACI wrote:

Is it possible to do such a facet on a date field:

  Last 1 Day
  Last 1 Week
  Last 1 Month
  Last 6 Month
  Last 1 Year
  Older than 1 Year

which has overlapped facet gaps?

Kind Regards,
Furkan KAMACI



RE: How to stop long running/memory eating query

2016-11-17 Thread Davis, Daniel (NIH/NLM) [C]
Mikhail,

If the query is not asynchronous, it would certainly be OK to stop the 
long-running query if the client socket is disconnected.   I know that is a 
feature of the niche indexer used in the products of www.indexengines.com, 
because I wrote it.   We did not have asynchronous queries, and because of the 
content and query-time deduplication, some queries could take hours -that's 72 
billion objects on a 2U box for you.   Hope they've added better index-time 
deduplication by now.

Thanks,

-dan

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Thursday, November 17, 2016 6:55 AM
To: solr-user 
Subject: Re: How to stop long running/memory eating query

There is a circuit breaker
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
If I'm right, it does not interrupt faceting.

On Thu, Nov 17, 2016 at 2:07 PM, Susheel Kumar 
wrote:

> Hello,
>
> We found a query which was running forever and thus causing OOM ( 
> q=+AND++AND+Tom+AND+Jerry...).  Is there any way, similar to the SQL/NoSQL 
> world, where we can watch currently executing queries and be able to kill them?
> This can be a desirable feature in these situations and avoid the whole 
> cluster going down. Is there an existing JIRA, or should I create one?
>
> Also what would be the different ways we can examine and stop such 
> queries from executing?
>
> Thanks,
> Susheel
>



--
Sincerely yours
Mikhail Khludnev


Overlapped Gap Facets

2016-11-17 Thread Furkan KAMACI
Is it possible to do such a facet on a date field:

 Last 1 Day
 Last 1 Week
 Last 1 Month
 Last 6 Month
 Last 1 Year
 Older than 1 Year

which has overlapped facet gaps?

Kind Regards,
Furkan KAMACI


Re: Hardware size in solrcloud

2016-11-17 Thread Toke Eskildsen
On Thu, 2016-11-17 at 06:28 -0700, Mugeesh Husain wrote:
> > If I have 15 documents, according to the above formula
> 
> 1 document = 7 bytes per document
> 
> 15 documents = 15MB

That math does not make sense.


I doubt it is even possible to create a Solr index where each document
only takes up 7 bytes. Each document taking up 1MB (your 15 docs =
15MB statement) is more realistic. But in that case your full index
would be 1MB * 10^9 documents = 10^9 MB = 10^15 bytes = 1 petabyte.


- Toke Eskildsen, State and University Library, Denmark


Re: Hardware size in solrcloud

2016-11-17 Thread Shawn Heisey
On 11/17/2016 6:28 AM, Mugeesh Husain wrote:
>> 7GB/1 billion = 7 bytes per document? That would be basically 7
>> characters?
> If I have 15 documents, according to the above formula
>
> 1 document = 7 bytes per document
>
> 15 documents = 15MB
>
> if I have 1 billion documents then what would be the configuration for this
> kind of data?
>
> 1. OS(32/64bit):
> 2. Processor:
> 3. RAM:
> 4. No of physical servers/systems :
>
> Please suggest.

Did you not read Kevin's response fully, and visit the URL you were
provided?  There are so many variables involved that unless you actually
TRY it, you won't know how your install will behave.  The nature of your
documents and how you have configured Solr will affect what hardware you
need.  The nature of your queries, and how many queries per second you
send, will affect what hardware you need.  Here is that URL again:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

We do not know what hardware you will need, and cannot make calculations
based on information you provide.  Even if you provide more information,
we can only make guesses.  Those guesses will be very much on the high
end, and may be more than the minimum required.

Memory tends to be the most critical resource for the performance of
individual queries.  CPU becomes more important as query load
increases.  Here's some information I've put together, which mostly
concerns itself with memory:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Changing route and still ending on the same shard (or increasing the % of shards a tenant is distrbuted on without reindexing)

2016-11-17 Thread Dorian Hoxha
Hi,

Assuming I use `tenant1/4!doc50` for the id (which means 1/16th of the shards), and
I later change it to `tenant1/2!doc50` (which means 1/8), is it guaranteed
that the document will go to the same shard? (It would be nice, but I
don't think so.) Meaning, when you change the `/x!`, do you have to
reindex all the data for that tenant? (If not, is there a way without fully
reindexing a tenant?) This probably will also fail because the id changes,
but what if another field is used for routing and the id stays the same?

Thank You


Re: Updating documents with docvalues (not stored), commit question

2016-11-17 Thread Shawn Heisey
On 11/17/2016 6:26 AM, Dorian Hoxha wrote:
> Looks like you can update documents even using just doc-values
> (without stored). While I understand the columnar-format, my issue
> with this is that docValues are added when a 'commit' is done
> (right?). Does that mean that it will force a commit (which is a slow
> operation) when updating with docValues or does it do something more
> smart ? 

The presence  or absence of docValues does not change commits at all.  A
commit is a separate operation from indexing, although you can send
commit=true with an indexing request and it would be started as soon as
all the indexing for that request is done.

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The URL above says "SolrCloud" but what it says also applies to
non-cloud installs.

Thanks,
Shawn



Multilevel sorting in JSON-facet

2016-11-17 Thread Wonderful Little Things
Hi,

I want to do the sorting on multiple fields using the JSON-facet API, so is
this available? And if it is, then what would be the syntax?

Thanks,
Aman Tandon


Bkd tree numbers/geo on solr 6.3 ?

2016-11-17 Thread Dorian Hoxha
Hi,

I've read that Lucene 6 has a fancy BKD-tree implementation for numbers. But
on the latest cwiki I only see Trie numeric fields. Aren't they implemented, or did I
miss something (the docs still mention "indexing multiple values for
range-queries", which is the old way)?

Thank You


Multiple search-queries in 1 http request ?

2016-11-17 Thread Dorian Hoxha
Hi,

I couldn't find anything in core for "multiple separate queries in 1 http
request" like elasticsearch

? I found this

blog-post though I thought there is/should/would be something in core ?

Thank You


Re: Hardware size in solrcloud

2016-11-17 Thread Mugeesh Husain
>
> First question: is your initial sizing correct?
>
Yes

>
> 7GB/1 billion = 7 bytes per document? That would be basically 7
> characters?
>
>
If I have 15 documents, according to the above formula

1 document = 7 bytes per document

15 documents = 15MB

if I have 1 billion documents then what would be the configuration for this
kind of data?

1. OS(32/64bit):
2. Processor:
3. RAM:
4. No of physical servers/systems :

Please suggest.


*Thanks,*
*Mugeesh Husain   *




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hardware-size-in-solrcloud-tp4306169p4306282.html
Sent from the Solr - User mailing list archive at Nabble.com.

Updating documents with docvalues (not stored), commit question

2016-11-17 Thread Dorian Hoxha
Looks like you can update documents even using just doc-values (without
stored). While I understand the columnar-format, my issue with this is that
docValues are added when a 'commit' is done (right?). Does that mean that
it will force a commit (which is a slow operation) when updating with
docValues or does it do something more smart ?

Thank You


Re: Parent child relationship, where children aren't nested but separate (like elasticsearch)

2016-11-17 Thread Dorian Hoxha
It's not mentioned on that page, but I'm assuming the join should work on
solrcloud when joining the same collection with the same routing (example:
users and user_events both routed by user_id (and joining on user_id))


On Thu, Nov 17, 2016 at 10:23 AM, Alexandre Rafalovitch 
wrote:

> You want just the usual join (not the block-join). That's the way it
> was before nested documents became supported.
> https://cwiki.apache.org/confluence/display/solr/Other+
> Parsers#OtherParsers-JoinQueryParser
>
> Also, Elasticsearch - as far as I remember - stores the original
> document structure (including children) as a special field and then
> flattens all the children into parallel fields within parent. Which
> causes interesting hidden ranking issues, but that's an issue for a
> different day.
>
> Rgards,
>Alex.
> 
> Solr Example reading group is starting November 2016, join us at
> http://j.mp/SolrERG
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 17 November 2016 at 18:08, Dorian Hoxha  wrote:
> > Hi,
> >
> > I'm not finding a way to support parent-child like es does (using
> > blockjoin)? I've seen some blogs
> >  nested-documents-in-apache-solr>
> > with having children as nested inside the parent-document, but I want to
> > freely CRUD children/parents as separate documents (I know that nested also
> > writes separate documents) and have a special field to link them +
> manually
> > route them to the same shard.
> >
> > Is this possible/available ?
> >
> > Thank You
>


Re: How to stop long running/memory eating query

2016-11-17 Thread Mikhail Khludnev
There is a circuit breaker
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
If I'm right, it does not interrupt faceting.
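
For example (the value is illustrative), adding timeAllowed to a request caps
how long result collection may run; if the limit is hit, Solr returns what it
has and sets partialResults=true in the response header:

    q=Tom AND Jerry&timeAllowed=5000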

On Thu, Nov 17, 2016 at 2:07 PM, Susheel Kumar 
wrote:

> Hello,
>
> We found a query which was running forever and thus causing OOM (
> q=+AND++AND+Tom+AND+Jerry...).  Is there any way, similar to the SQL/NoSQL
> world, where we can watch currently executing queries and be able to kill them?
> This can be a desirable feature in these situations and avoid the whole cluster
> going down. Is there an existing JIRA, or should I create one?
>
> Also what would be the different ways we can examine and stop such queries
> from executing?
>
> Thanks,
> Susheel
>



-- 
Sincerely yours
Mikhail Khludnev


Question about synonym's behavior with NGramTokenizer

2016-11-17 Thread Yutaka Nakajima
Hi,

I have a question about Solr synonym's behavior with NGramTokenizer.

I'm using the setting below but it does not work well. Synonyms don't work.
Please can someone help me.

<fieldType name="..." class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-index.txt"
            tokenizerFactory="solr.NGramTokenizerFactory"
            tokenizerFactory.minGramSize="2"
            tokenizerFactory.maxGramSize="2"
            luceneMatchVersion="3.3"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>

Thanks,
Yutaka Nakajima


How to stop long running/memory eating query

2016-11-17 Thread Susheel Kumar
Hello,

We found a query which was running forever and thus causing OOM (
q=+AND++AND+Tom+AND+Jerry...).  Is there any way, similar to the SQL/NoSQL
world, where we can watch currently executing queries and be able to kill them?
This can be a desirable feature in these situations and avoid the whole cluster
going down. Is there an existing JIRA, or should I create one?

Also what would be the different ways we can examine and stop such queries
from executing?

Thanks,
Susheel


Re: "add and limit" update modifier or scripted update like elasticsearch

2016-11-17 Thread Dorian Hoxha
Hi Alex,

Yes, I saw the update-modifiers, but there isn't an add-and-limit() thing.
The update request processors should work.

Thanks

On Thu, Nov 17, 2016 at 10:26 AM, Alexandre Rafalovitch 
wrote:

> Solr has an partial update support, though you need to be careful to
> have all fields retrievable (stored or docvalue).
> https://cwiki.apache.org/confluence/display/solr/
> Updating+Parts+of+Documents
>
> Solr also has UpdateRequestProcessor which can do many things,
> including scripting.
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
> I believe you would need to place it AFTER DistributedUpdateProcessor
> if you want to apply it on the whole reconstructed "updated" document
> as opposed to just on changes sent.
>
> Regards,
>Alex.
> 
> Solr Example reading group is starting November 2016, join us at
> http://j.mp/SolrERG
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 17 November 2016 at 18:06, Dorian Hoxha  wrote:
> > Hi,
> >
> > Is there an "add and limit" update modifier (couldn't find in docs) ? If
> > not, can I run a script to update a document (still couldn't find
> anything)
> > ? If not, how should I do that  (custom plugin? )?
> >
> > Thank You
>


Re: "add and limit" update modifier or scripted update like elasticsearch

2016-11-17 Thread Alexandre Rafalovitch
Solr has an partial update support, though you need to be careful to
have all fields retrievable (stored or docvalue).
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
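
A minimal sketch of such a partial update (collection, id, and field names
invented; tags would be a multi-valued field); the "add" modifier appends a
value, though as noted there is no built-in way to also cap the list length:

    curl "http://localhost:8983/solr/mycollection/update?commit=true" \
      -H "Content-Type: application/json" \
      -d '[{"id":"doc1","tags":{"add":"new-tag"}}]'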

Solr also has UpdateRequestProcessor which can do many things,
including scripting.
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
I believe you would need to place it AFTER DistributedUpdateProcessor
if you want to apply it on the whole reconstructed "updated" document
as opposed to just on changes sent.

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 November 2016 at 18:06, Dorian Hoxha  wrote:
> Hi,
>
> Is there an "add and limit" update modifier (couldn't find in docs) ? If
> not, can I run a script to update a document (still couldn't find anything)
> ? If not, how should I do that  (custom plugin? )?
>
> Thank You


Index time sorting and per index mergePolicyFactory

2016-11-17 Thread Dorian Hoxha
Hi,

I know this is done in lucene, but I don't see it in solr (by searching +
docs on collections).

I see
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
but it's not mentioned for index-time-sorting.

So, is it possible and definable for each index ? I want to have some
sorted by 'x' field, some by 'y' field, and some staying as default.

Thank You


Re: Parent child relationship, where children aren't nested but separate (like elasticsearch)

2016-11-17 Thread Alexandre Rafalovitch
You want just the usual join (not the block-join). That's the way it
was before nested documents became supported.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
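
A hedged sketch of such a join, borrowing the users/user_events naming that
comes up elsewhere in this thread (field names are assumptions). Run against
the users collection, it selects users that have at least one matching event;
note that in SolrCloud the "from" collection has placement restrictions, so
check the page above:

    q={!join from=user_id to=id fromIndex=user_events}event_type:purchase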

Also, Elasticsearch - as far as I remember - stores the original
document structure (including children) as a special field and then
flattens all the children into parallel fields within parent. Which
causes interesting hidden ranking issues, but that's an issue for a
different day.

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 November 2016 at 18:08, Dorian Hoxha  wrote:
> Hi,
>
> I'm not finding a way to support parent-child like es does (using
> blockjoin)? I've seen some blogs
> 
> with having children as nested inside the parent-document, but I want to
> freely CRUD children/parents as separate documents (I know that nested also
> writes separate documents) and have a special field to link them + manually
> route them to the same shard.
>
> Is this possible/available ?
>
> Thank You


Re: Using solr(cloud) as source-of-truth for data (with no backing external db)

2016-11-17 Thread Alexandre Rafalovitch
I've heard of people doing it but it is not recommended.

One of the biggest implementation breakthroughs is that - after the
initial learning curve - you will start mapping your input data to
signals. Those signals will not look very much like your original data
and therefore are not terribly suitable to be the source of it.

We are talking copyFields, UpdateRequestProcessor pre-processing,
fields that are not stored, nested documents flattening,
denormalization, etc. Getting back from that to original shape of data
is painful.

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 November 2016 at 18:46, Dorian Hoxha  wrote:
> Hi,
>
> Anyone use solr for source-of-data with no `normal` db (of course with
> normal backups/replication) ?
>
> Are there any drawbacks ?
>
> Thank You


Re: compilation error

2016-11-17 Thread Midas A
Sorry,

I am using Solr version 5.2.1.

On Thu, Nov 17, 2016 at 2:22 PM, Daniel Collins 
wrote:

> Also, remember a significant number of the people on this group are in the
> US.  Asking for a rapid response at 1am is a pretty harsh SLA
> expectation...
>
> On 17 November 2016 at 08:51, Daniel Collins 
> wrote:
>
> > Can you be more specific?  What version are you compiling, what command
> do
> > you use?  That looks to me like maven output, not ant?
> >
> > On 17 November 2016 at 06:30, Midas A  wrote:
> >
> >> Please reply?
> >>
> >> On Thu, Nov 17, 2016 at 11:31 AM, Midas A  wrote:
> >>
> >> > getting the following error while compiling:
> >> >  .
> >> > org.apache.avro#avro;1.7.5: configuration not found in
> >> > org.apache.avro#avro;1.7.5: 'master'. It was required from
> >> > org.apache.solr#morphlines-core;
> >> >
> >> >
> >> > and I am not able to resolve it. Please help in resolving it.
> >> >
> >>
> >
> >
>


Re: compilation error

2016-11-17 Thread Daniel Collins
Also, remember a significant number of the people on this group are in the
US.  Asking for a rapid response at 1am is a pretty harsh SLA expectation...

On 17 November 2016 at 08:51, Daniel Collins  wrote:

> Can you be more specific?  What version are you compiling, what command do
> you use?  That looks to me like maven output, not ant?
>
> On 17 November 2016 at 06:30, Midas A  wrote:
>
>> Please reply?
>>
>> On Thu, Nov 17, 2016 at 11:31 AM, Midas A  wrote:
>>
>> > getting the following error while compiling:
>> >  .
>> > org.apache.avro#avro;1.7.5: configuration not found in
>> > org.apache.avro#avro;1.7.5: 'master'. It was required from
>> > org.apache.solr#morphlines-core;
>> >
>> >
>> > and I am not able to resolve it. Please help in resolving it.
>> >
>>
>
>


Re: compilation error

2016-11-17 Thread Daniel Collins
Can you be more specific?  What version are you compiling, what command do
you use?  That looks to me like maven output, not ant?

On 17 November 2016 at 06:30, Midas A  wrote:

> Please reply?
>
> On Thu, Nov 17, 2016 at 11:31 AM, Midas A  wrote:
>
> > getting the following error while compiling:
> >  .
> > org.apache.avro#avro;1.7.5: configuration not found in
> > org.apache.avro#avro;1.7.5: 'master'. It was required from
> > org.apache.solr#morphlines-core;
> >
> >
> > and I am not able to resolve it. Please help in resolving it.
> >
>