Re: Histogram facet?

2014-05-06 Thread Romain Rigaux
This looks nice!

The only missing piece for more interactivity would be to be able to map
multiple field values into the same bucket.

e.g.

http://localhost:8983/solr/query?
   q=*:*
   facet=true
   facet.field=*round(date, '15MINUTES')*
   facet.stat=sum(retweetCount)

This is a bit similar to SOLR-4772
(https://issues.apache.org/jira/browse/SOLR-4772) for the rounding.

Then we could zoom out just by changing the size of the bucket, without any
index change, e.g.:
http://localhost:8983/solr/query?
   q=*:*
   facet=true
   facet.field=*round(date, '1HOURS')*
   facet.stat=sum(retweetCount)
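
To illustrate what the proposed round(date, ...) bucketing would compute, here is a minimal client-side sketch in Python. It is not Solr or Heliosearch code: the helper names floor_to_bucket and sum_by_bucket and the sample timestamps are hypothetical, chosen only for illustration. Each date is floored to the start of its bucket and retweetCount is summed per bucket, so zooming out only means passing a larger bucket size.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def floor_to_bucket(ts, size):
    """Round ts down to the start of its bucket of width size."""
    return ts - (ts - EPOCH) % size

def sum_by_bucket(tweets, size):
    """Sum retweetCount per bucket; tweets is an iterable of (date, retweetCount)."""
    totals = defaultdict(int)
    for date, retweet_count in tweets:
        totals[floor_to_bucket(date, size)] += retweet_count
    return dict(totals)

# Hypothetical sample data, not taken from the thread.
tweets = [
    (datetime(2014, 5, 6, 10, 5), 100),
    (datetime(2014, 5, 6, 10, 20), 100),
    (datetime(2014, 5, 6, 11, 40), 100),
]

per_quarter = sum_by_bucket(tweets, timedelta(minutes=15))  # fine-grained view
per_hour = sum_by_bucket(tweets, timedelta(hours=1))        # zoomed-out view
```

The same input yields three 15-minute buckets or two hourly buckets, with no change to the stored data.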

Romain

On Tue, May 6, 2014 at 10:09 AM, Yonik Seeley yo...@heliosearch.com wrote:

 On Mon, May 5, 2014 at 6:18 PM, Romain romain@gmail.com wrote:
  Hi,
 
  I am trying to plot a non-date field by time in order to draw a
  histogram showing its evolution during the week.
 
  For example, if I have a tweet index:
 
  Tweet:
    date
    retweetCount
 
  3 tweets indexed:
  Tweet | Date  | Retweet
  A     | 01/01 | 100
  B     | 01/01 | 100
  C     | 01/02 | 100
 
  If I want to plot the number of tweets by day: easy with a date range
  facet:
  Day 1: 2
  Day 2: 1
 
  But now counting the number of retweets by day is not possible natively:
  Day 1: 200
  Day 2: 100

 Check out facet functions in Heliosearch (an experimental fork of Solr):
 http://heliosearch.org/solr-facet-functions/

 All you would need to do is add:
 facet.stat=sum(retweetCount)

 -Yonik
 http://heliosearch.org - solve Solr GC pauses with off-heap filters
 and fieldcache



Re: Histogram facet?

2014-05-06 Thread Romain Rigaux
This is super nice, I tried (even without subfacets) and it works! Thanks a
lot!

Romain

facet=true&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=100&facet.stat=avg(popularity)


"facets": {
  "price": {
    "buckets": [
      { "val": 0.0, "avg(popularity)": 3.5714285714285716 },
      { "val": 100.0, "avg(popularity)": 5.5 },
      { "val": 200.0, "avg(popularity)": 6 },
      { "val": 300.0, "avg(popularity)": 7.667 },
      { "val": 400.0, "avg(popularity)": 7 },
      { "val": 500.0, "avg(popularity)": NaN },
      { "val": 600.0, "avg(popularity)": 7 },
      { "val": 700.0, "avg(popularity)": NaN },
      { "val": 800.0, "avg(popularity)": NaN },
      { "val": 900.0, "avg(popularity)": NaN } ],
    "gap": 100, "start": 0, "end": 1000 }
}
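
When plotting a response like the one above, note that empty ranges come back with avg(popularity) set to NaN. Below is a minimal Python sketch of turning the buckets into plot points, assuming the response has already been parsed into a dict of the same shape. The helper name plot_points is hypothetical, and only a few of the buckets are reproduced.

```python
import math

def plot_points(range_facet, stat):
    """Return (bucket start, stat value) pairs, skipping buckets whose stat is NaN."""
    points = []
    for bucket in range_facet["buckets"]:
        value = bucket[stat]
        if isinstance(value, float) and math.isnan(value):
            continue  # empty bucket: there is nothing to average
        points.append((bucket["val"], value))
    return points

# A few buckets copied from the response above; the others are omitted here.
price_facet = {
    "buckets": [
        {"val": 0.0, "avg(popularity)": 3.5714285714285716},
        {"val": 100.0, "avg(popularity)": 5.5},
        {"val": 500.0, "avg(popularity)": float("nan")},
        {"val": 600.0, "avg(popularity)": 7},
    ],
    "gap": 100,
    "start": 0,
    "end": 1000,
}

points = plot_points(price_facet, "avg(popularity)")
```

Skipping NaN buckets is only one option; a chart that should show gaps explicitly could keep them and map NaN to a missing value instead.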


On Tue, May 6, 2014 at 3:15 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Tue, May 6, 2014 at 5:30 PM, Romain Rigaux rom...@cloudera.com wrote:
  This looks nice!
 
  The only missing piece for more interactivity would be to be able to map
  multiple field values into the same bucket.
 
  e.g.
 
  http://localhost:8983/solr/query?
 q=*:*
 facet=true
 facet.field=*round(date, '15MINUTES')*
 facet.stat=sum(retweetCount)
 
  This is a bit similar to SOLR-4772
  (https://issues.apache.org/jira/browse/SOLR-4772) for the rounding.
 
  Then we could zoom out just by changing the size of the bucket, without
  any index change, e.g.:
  http://localhost:8983/solr/query?
 q=*:*
 facet=true
 facet.field=*round(date, '1HOURS')*
 facet.stat=sum(retweetCount)

 For this specific example, I think "map multiple field values into the
 same bucket" equates to a range facet?

 facet.range=mydatefield
 facet.range.start=...
 facet.range.end=...
 facet.range.gap=+1HOURS
 facet.stat=sum(retweetCount)

 And then if you need additional breakouts by time range, you can use
 subfacets:

 subfacet.mydatefield.field=mycategoryfield

 That will provide retweet counts broken out by mycategoryfield for
 every bucket produced by the range query.

 See http://heliosearch.org/solr-subfacets/
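
As a sketch of how a client might assemble the range-facet request described above, here is a small Python helper built on the standard urllib.parse module. The function name range_facet_url is hypothetical, and the start and end bounds are left as arguments because the thread elides them. facet.stat and subfacet.mydatefield.field are the Heliosearch parameters quoted above, not stock Solr parameters. One practical detail: urlencode escapes the + in +1HOURS as %2B, which matters because a raw + in a query string is decoded as a space.

```python
from urllib.parse import urlencode

def range_facet_url(base_url, start, end, gap="+1HOURS", subfacet_field=None):
    """Build a range-facet URL that sums retweetCount per time bucket."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.range", "mydatefield"),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("facet.stat", "sum(retweetCount)"),
    ]
    if subfacet_field is not None:
        # Optional breakout of every time bucket by another field.
        params.append(("subfacet.mydatefield.field", subfacet_field))
    return base_url + "?" + urlencode(params)

# NOW/DAY-7DAYS and NOW are placeholder bounds chosen for illustration only.
url = range_facet_url("http://localhost:8983/solr/query", "NOW/DAY-7DAYS", "NOW")
```

Zooming in or out then amounts to calling the helper again with a different gap, for example "+15MINUTES".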

 -Yonik
 http://heliosearch.org - facet functions, subfacets, off-heap
 filters & fieldcache



Re: Histogram facet?

2014-05-05 Thread Romain Rigaux
The dates won't match unless you truncate all of them to the day. But then
slots of 15 minutes won't work, as you would need to truncate the dates to
15 minutes in the index.

In ES, they have 1 field to make the slots and 1 field to insert into the
bucket, e.g.:

{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "date_histogram" : {
                "key_field" : "timestamp",
                "value_field" : "price",
                "interval" : "day"
            }
        }
    }
}

Romain


On Mon, May 5, 2014 at 9:05 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, I _think_ pivot faceting works here. One dimension would be day
 and the other retweet count. The response will have the number of
 retweets per day, you'd have to sum them up I suppose.

 Best,
 Erick

 On Mon, May 5, 2014 at 3:18 PM, Romain romain@gmail.com wrote:
  Hi,
 
  I am trying to plot a non-date field by time in order to draw a
  histogram showing its evolution during the week.
 
  For example, if I have a tweet index:
 
  Tweet:
    date
    retweetCount
 
  3 tweets indexed:
  Tweet | Date  | Retweet
  A     | 01/01 | 100
  B     | 01/01 | 100
  C     | 01/02 | 100
 
  If I want to plot the number of tweets by day: easy with a date range
  facet:
  Day 1: 2
  Day 2: 1
 
  But now counting the number of retweets by day is not possible natively:
  Day 1: 200
  Day 2: 100
 
  One current workaround would be to do a date range facet to get the date
  slots, ask only for the retweet field, and compute the sums in the
  client. We could compute other stats, like the average, the same way.
 
  The closest I could see was
  https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
  slightly different.
 
  Basically I am trying to do something very similar to the Date Histogram
  Facet in ES:
  http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet
 
  Is there a way to move the counting logic to the Solr server?
 
  Thanks!
 
  Romain