Re: Range faceting on timestamp field

2020-12-24 Thread Erick Erickson
Then you need to form your start relative to your timezone.

What I’d actually recommend is that if you need to bucket by day,
you index the day in a separate field. Of course, if you have to
bucket by day in arbitrary timezones, that won’t work…

Best,
Erick
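
A minimal sketch of the first suggestion (forming "start" relative to the
viewer's timezone), assuming a fixed UTC offset and hypothetical dates. The
helper name local_day_range is made up for illustration; the field name and
the 86400 gap come from the original request. A fixed offset ignores daylight
saving time, so days on which the offset changes would still be bucketed
imperfectly.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400


def local_day_range(first_day, last_day, utc_offset_hours):
    """Return (start, end) Unix timestamps covering whole local days.

    first_day and last_day are (year, month, day) tuples in local time;
    the range ends at local midnight after last_day.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(*first_day, tzinfo=tz)
    end = datetime(*last_day, tzinfo=tz) + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


# Hypothetical example: 1-9 June 2012 as seen from a UTC+2 clock.
start, end = local_day_range((2012, 6, 1), (2012, 6, 9), utc_offset_hours=2)

# Range facet whose bucket boundaries fall on local midnight.
range_facet = {
    "type": "range",
    "field": "timestamp_s",
    "start": start,
    "end": end,
    "gap": SECONDS_PER_DAY,
}
```

The second suggestion (indexing the local day in its own field) avoids the
arithmetic at query time, but only works for one fixed timezone, as noted
above.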




Range faceting on timestamp field

2020-12-24 Thread ufuk yılmaz
Hello all,

I have a plong field in my schema representing a Unix timestamp



I’m doing a range facet over this field to find which event occurred on which
day. I’m setting “start” to some date at 00:00, “end” to another, and
setting “gap” to 86400 (the number of seconds in a day):
...
"type": "range",
"field": "timestamp_s",
"start": 1338498000,
"end": 1339275600,
"gap": 86400,
...

Let’s say that an event occurred at 19:00 GMT+00. This facet puts it in the
bucket of that day, which starts at 00:00. I live in the GMT+2 timezone, so
my clock showed 21:00 and the event occurred on the same day for me, which is
all good and correct.

Another event occurred at 23:00 GMT+00 on Day 2. At that time, it was 01:00 on
Day 3 here. Faceting puts the event in the bucket that starts at Day 2 00:00,
which, when converted to my timezone, still places the event on Day 2. But it
was Day 3 here when the event happened...

I hope I didn’t bore the hell out of you. Do you have any suggestions to solve
this problem? Unfortunately my timestamp field is not a date field and I need
to show the results from my perspective, not in universal time.

Have a nice day!
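
The effect described above can be reproduced with plain integer arithmetic.
This is only an illustration under assumptions: the calendar date is
hypothetical, bucket_start is a made-up helper, and the bucket rule (start
plus a whole number of gaps) is the usual behaviour of a numeric range facet
rather than something taken from Solr's source.

```python
GAP = 86400  # seconds in a day, as in the request above


def bucket_start(ts, start, gap=GAP):
    """Return the lower bound of the range bucket that contains ts."""
    return start + ((ts - start) // gap) * gap


# Hypothetical values: Day 1 is 2012-06-01, and the viewer's clock is UTC+2.
day1_utc_midnight = 1338508800                      # 2012-06-01T00:00:00Z
day1_local_midnight = day1_utc_midnight - 2 * 3600  # 2012-06-01T00:00:00+02:00

# The second event from the question: 23:00 UTC on Day 2.
event = day1_utc_midnight + GAP + 23 * 3600

# Buckets aligned to UTC midnight: the event lands in the Day 2 bucket.
utc_bucket = bucket_start(event, day1_utc_midnight)

# Buckets aligned to local midnight: the event lands in the Day 3 bucket.
local_bucket = bucket_start(event, day1_local_midnight)

print((utc_bucket - day1_utc_midnight) // GAP + 1)      # prints 2
print((local_bucket - day1_local_midnight) // GAP + 1)  # prints 3
```

Shifting "start" by the UTC offset is therefore enough to move the event into
the expected day, as long as the offset stays constant over the faceted range.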




Re: Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread Tulsi Das
Hi,
Try adding debug=true or debug=query to the URL and look at the parsed query
at the end of the response.
That should show you why the results are different.
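
A rough sketch of how the two parsed queries could be compared, assuming both
installations answer on the standard /select handler with JSON output. The
function names are made up for illustration, and the base URLs, collection
name and exact query string are placeholders to be filled in by the reader.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def debug_query_url(base_url, collection, query):
    """Build a select URL that asks Solr for query debug output only."""
    params = {"q": query, "rows": 0, "wt": "json", "debug": "query"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def parsed_query(response):
    """Pull the parsed query string out of a decoded Solr JSON response."""
    return response.get("debug", {}).get("parsedquery")


def fetch_parsed_query(base_url, collection, query):
    """Run the query against a live Solr instance and return its parsed form."""
    with urlopen(debug_query_url(base_url, collection, query)) as resp:
        return parsed_query(json.load(resp))
```

Calling fetch_parsed_query once per Solr version with the same query and
comparing the two strings should show whether the query is being parsed as a
phrase, as separate terms, or as something else.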


On Thu, 24 Dec, 2020, 8:05 pm nettadalet,  wrote:

> Hello,
>
> I have the same field type defined in Solr 4.6 and Solr 7.5. When I
> search with both versions, I get different results, and I don't know why.
>
> I have the following *field type definition in Solr 4.6*:
>  positionIncrementGap="1000">
> 
> 
> 
>  words="stopwords.txt" />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
> 
> 
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords.txt"
> />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
>
>
> I have the following *field type definition in Solr 7.5*:
>  positionIncrementGap="1000">
> 
> 
> 
>  words="stopwords.txt" />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
> 
> 
> 
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> ignoreCase="true"
>words="stopwords.txt"
>/>
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
>
> * I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
> solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
> but the result was the same.
>
> I have the following *6 values set for field text1 of type text_type1 for 6
> different documents* (the type(s) from above):
> KI_d5e7b43a
> KI_b7c490bd
> KI_7df2f026
> KI_fa7d129d
> KI_5867aec7
> KI_7c3c0b93
>
>
> My query is *text1=KI_7*.
> Using Solr 4.6, I get 2 results: KI_7df2f026, KI_7c3c0b93
> Using Solr 7.5, I get all 6 results.
>
> Questions:
> 1. How come I get different results with the same data, when my fields
> definitions are the same (as far as I can tell)?
>
> 2. What are the expected results?
> I think that the results Solr 7.5 returns are the correct ones, since at
> the end of the analysis I get *KI* as a term and *7* as a term, both
> during the indexing analysis and the query analysis, so, to my
> understanding, all 6 results should be found.
> Is this correct? If not, what am I missing? What don't I understand
> correctly?
>
> I would very much appreciate a full/partial answer, but even a link that
> could explain at least the expected results part would be great.
>
> Thanks in advance, I know this might be a tough one to answer [Hope not
> :)]
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread nettadalet
Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why.

I have the following *field type definition in Solr 4.6*:



















I have the following *field type definition in Solr 7.5*:



















* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
but the result was the same.

I have the following *6 values set for field text1 of type text_type1 for 6
different documents* (the type(s) from above):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93


My query is *text1=KI_7*.
Using Solr 4.6, I get 2 results: KI_7df2f026, KI_7c3c0b93
Using Solr 7.5, I get all 6 results.

Questions:
1. How come I get different results with the same data, when my fields
definitions are the same (as far as I can tell)?

2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the
end of the analysis I get *KI* as a term and *7* as a term, both
during the indexing analysis and the query analysis, so, to my
understanding, all 6 results should be found.
Is this correct? If not, what am I missing? What don't I understand
correctly?
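
The two result counts can be reproduced with a very rough approximation of the
word-part and number-part splitting, which may help when reading the debug
output. This is not Solr's analyzer: it ignores catenateWords, stop words,
synonyms and any lowercasing, and all function names are made up. It does not
establish which interpretation either Solr version actually uses; it only
shows that matching the query tokens as a phrase and matching them as
independent terms give 2 and 6 hits respectively on the six values above.

```python
import re

VALUES = [
    "KI_d5e7b43a", "KI_b7c490bd", "KI_7df2f026",
    "KI_fa7d129d", "KI_5867aec7", "KI_7c3c0b93",
]


def split_parts(text):
    """Split on non-alphanumerics and on letter/digit boundaries."""
    return re.findall(r"[A-Za-z]+|[0-9]+", text)


def matches_all_terms(doc_tokens, query_tokens):
    """True if every query token occurs somewhere in the document."""
    return all(tok in doc_tokens for tok in query_tokens)


def matches_phrase(doc_tokens, query_tokens):
    """True if the query tokens occur consecutively in the document."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


query = split_parts("KI_7")  # ['KI', '7']
as_terms = [v for v in VALUES if matches_all_terms(split_parts(v), query)]
as_phrase = [v for v in VALUES if matches_phrase(split_parts(v), query)]

print(len(as_terms), as_terms)    # all six values
print(len(as_phrase), as_phrase)  # only KI_7df2f026 and KI_7c3c0b93
```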

I would very much appreciate a full/partial answer, but even a link that
could explain at least the expected results part would be great. 

Thanks in advance, I know this might be a tough one to answer [Hope not  :)]



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: increasing number of threads for faceting in JSON format

2020-12-24 Thread Arturas Mazeika
Hi Christine,

Thanks a lot for the posts. Very impressive information (the article as well
as the YouTube video!)

Thanks a lot, Merry Xmas and Happy New Year!

Cheers,
Arturas


Re: increasing number of threads for faceting in JSON format

2020-12-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello again Arturas.

I meant to reply before but somehow lost track of it ... The "Lifecycle of a 
Solr Search Request" slides [1] and/or talk [2] may be of interest to you.

Regards,
Christine

[1] https://home.apache.org/~hossman/rev2017/
[2] https://youtu.be/qItRilJLj5o

From: solr-user@lucene.apache.org At: 12/10/20 21:42:19 To:
solr-user@lucene.apache.org
Subject: Re: increasing number of threads for faceting in JSON format

Hi Christine Munendra et al,

Wow, you dug into the code and checked whether threads are being used in
range and term queries! I wish one day to be able to do the same myself.

How does one get to the level where one can check the code herself? Is there
a nice primer or crash course, a Solr 101 so to speak, a "things you did not
learn in school about Solr, but wish you had" web page? Well,
I'll take this opportunity to scroll through the lines on GitHub. Your
answer is very helpful.

Cheers,
Arturas

On Thu, Dec 10, 2020 at 7:08 PM Munendra S N 
wrote:

> Thank you Christine.
> Yeah, JSON facet does not support specifying threads.
>
>
> On Thu, Dec 10, 2020, 11:15 PM Christine Poerschke (BLOOMBERG/ LONDON) <
> cpoersc...@bloomberg.net> wrote:
>
> > Hello Arturas and Munendra!
> >
> > In the "Currently, JSON facets have support for specifying the number of
> > threads." sentence, I wonder if perhaps a "does not" got inadvertently
> > omitted i.e. "Currently, JSON facets does not have support for specifying
> > the number of threads." was intended?
> >
> > Let me share what I learnt from digging into the code:
> >
> > * "facet.threads" is for field value faceting [1] [2] but you're
> > interested in (JSON) field range faceting as well as JSON field value
> > faceting.
> >
> > * The area of the code [3] that does the JSON field range faceting shows
> > no obvious threading or parallelisation.
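
For contrast, a sketch of where "facet.threads" does apply: a classic
(non-JSON) field value faceting request. The field name fcomp and the value
20 are taken from this thread; everything else is an assumption for
illustration. As far as the reference guide [1] describes it, the parallelism
is across the faceted fields, so a single facet.field may not benefit much.

```python
from urllib.parse import urlencode

# Classic (non-JSON) field value faceting request parameters.
params = [
    ("q", "*:*"),
    ("rows", 0),
    ("facet", "true"),
    ("facet.field", "fcomp"),
    ("facet.threads", 20),  # only honoured by classic field value faceting
]

# Query string to append to a /select URL of the reader's own installation.
query_string = urlencode(params)
print(query_string)
```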
> >
> > Hope that helps?
> >
> > Regards,
> >
> > Christine
> >
> > [1] https://lucene.apache.org/solr/guide/8_7/faceting.html#field-value-faceting-parameters
> > [2] https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/request/SimpleFacets.java
> > [3] https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L112-L113
> >
> > From: solr-user@lucene.apache.org At: 12/03/20 22:47:35 To:
> > solr-user@lucene.apache.org
> > Subject: Re: increasing number of threads for faceting in JSON format
> >
> > Hi Munendra,
> >
> > It is great that I can get results faster by reducing the number of buckets
> > and by increasing the number of threads. I know how to reduce the number of
> > buckets: one can replace "gap": "+1HOUR" with "gap": "+1MONTH". What should
> > I change in the text below to increase the number of threads from one to 20?
> >
> > Cheers,
> > Arturas
> >
> > On Thu, Dec 3, 2020 at 1:54 PM Munendra S N 
> > wrote:
> >
> > > Hi,
> > >
> > > Currently, JSON facets have support for specifying the number of threads.
> > > In the above request, the range facet is computed over 2 years with a gap
> > > of 1 hour. By reducing the number of buckets, computation should become
> > > much faster.
> > >
> > > Regards,
> > > Munendra S N
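
To put numbers on the bucket count, here is a small calculation using the
start and end values from the request in this thread. The helper names are
made up, and the counting rule (buckets are added until the end is covered) is
an assumption about how the range is filled rather than something checked
against Solr. The counts are per value of the outer terms facet.

```python
from datetime import datetime, timedelta, timezone

START = datetime(2018, 5, 2, 17, 0, tzinfo=timezone.utc)
END = datetime(2020, 11, 16, 21, 0, tzinfo=timezone.utc)


def add_months(moment, months):
    """Shift a datetime by whole months (safe here: the day of month is 2)."""
    total = moment.year * 12 + (moment.month - 1) + months
    return moment.replace(year=total // 12, month=total % 12 + 1)


def count_hourly_buckets(start, end):
    """Number of +1HOUR buckets needed to cover [start, end)."""
    hours, remainder = divmod(end - start, timedelta(hours=1))
    return hours + (1 if remainder else 0)


def count_monthly_buckets(start, end):
    """Number of +1MONTH buckets needed to cover [start, end)."""
    count = 0
    while add_months(start, count) < end:
        count += 1
    return count


hourly = count_hourly_buckets(START, END)
monthly = count_monthly_buckets(START, END)
print(hourly, monthly)  # prints 22300 31
```

Going from "+1HOUR" to "+1MONTH" therefore cuts the number of range buckets
per term from tens of thousands to a few dozen.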
> > >
> > >
> > >
> > > On Thu, Dec 3, 2020 at 1:52 PM Arturas Mazeika 
> > wrote:
> > >
> > > > Hi Solr-Users,
> > > >
> > > > I am trying to better understand the Solr capabilities, how one can
> > > > formulate queries in JSON format as well as tweak parameters. Currently I
> > > > have a logs collection (ca. 6 GB large) with a dozen attributes running in
> > > > single server mode (F:\solr_deployment\solr-8.7.0\bin\solr.cmd start -h
> > > > localhost -p  -m 4g)
> > > >
> > > > I am playing with the faceting functionality in Solr and querying a couple
> > > > of attributes there. My typical query is:
> > > >
> > > > GET http://localhost:/solr/db/query HTTP/1.1
> > > > content-type: application/json
> > > >
> > > > {
> > > >   "query": "*:*",
> > > >   "limit": 0,
> > > >   "facet": {
> > > >     "t": {
> > > >       "type":  "terms",
> > > >       "field": "fcomp",
> > > >       "sort":  "index",
> > > >       "facet": {
> > > >         "t_buckets": {
> > > >           "type":  "range",
> > > >           "field": "t",
> > > >           "sort":  { "t": "asc" },
> > > >           "start": "2018-05-02T17:00:00.000Z",
> > > >           "end":   "2020-11-16T21:00:00.000Z",
> > > >           "gap":   "+1HOUR"
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > Not surprisingly, it takes a while to compute the result, so I tried to
> > > > increase the number of threads. How do I do it in JSON format? I tried
> > > > adding
> > > >
> > > > {
> > > > "params": {
> > > >