Re: Histogram facet?
This looks nice! The only missing piece for more interactivity would be the
ability to map multiple field values into the same bucket, e.g.:

    http://localhost:8983/solr/query?
      q=*:*
      facet=true
      facet.field=*round(date, '15MINUTES')*
      facet.stat=sum(retweetCount)

This is a bit similar to SOLR-4772
(https://issues.apache.org/jira/browse/SOLR-4772) for the rounding.

Then we could zoom out just by changing the size of the bucket, without any
index change, e.g.:

    http://localhost:8983/solr/query?
      q=*:*
      facet=true
      facet.field=*round(date, '1HOURS')*
      facet.stat=sum(retweetCount)

Romain

On Tue, May 6, 2014 at 10:09 AM, Yonik Seeley yo...@heliosearch.com wrote:

> On Mon, May 5, 2014 at 6:18 PM, Romain romain@gmail.com wrote:
>> Hi,
>>
>> I am trying to plot a non-date field by time in order to draw a
>> histogram showing its evolution during the week.
>>
>> For example, if I have a tweet index:
>>
>> Tweet:
>>   date
>>   retweetCount
>>
>> 3 tweets indexed:
>>
>> Tweet | Date  | Retweet
>> A     | 01/01 | 100
>> B     | 01/01 | 100
>> C     | 01/02 | 100
>>
>> If I want to plot the number of tweets by day: easy with a date range
>> facet:
>>
>> Day 1: 2
>> Day 2: 1
>>
>> But now counting the number of retweets by day is not possible natively:
>>
>> Day 1: 200
>> Day 2: 100
>
> Check out facet functions in Heliosearch (an experimental fork of Solr):
> http://heliosearch.org/solr-facet-functions/
>
> All you would need to do is add:
> facet.stat=sum(retweetCount)
>
> -Yonik
> http://heliosearch.org - solve Solr GC pauses with off-heap filters and
> fieldcache
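For readers following along, the round(date, '15MINUTES') idea in this message can be approximated on the client. The following is only a sketch in Python, not how Solr or Heliosearch implement anything: the helper names floor_to_bucket and sum_by_bucket are hypothetical, the sample timestamps are made up for illustration, and the rounding assumes the bucket size divides a day evenly.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_bucket(ts: datetime, bucket: timedelta) -> datetime:
    """Round ts down to the start of its bucket, counted from midnight.

    Assumes the bucket size divides 24 hours evenly (15 minutes, 1 hour, ...).
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    return midnight + (elapsed // bucket) * bucket


def sum_by_bucket(rows, bucket: timedelta) -> dict:
    """Sum the value of every (timestamp, value) row that falls in a bucket."""
    totals = defaultdict(int)
    for ts, value in rows:
        totals[floor_to_bucket(ts, bucket)] += value
    return dict(sorted(totals.items()))


# Hypothetical sample data: (tweet date, retweetCount).
rows = [
    (datetime(2014, 1, 1, 10, 5), 100),
    (datetime(2014, 1, 1, 10, 12), 100),
    (datetime(2014, 1, 1, 10, 40), 100),
]

# "Zooming out" only changes the bucket size; the data stays untouched.
quarter_hour = sum_by_bucket(rows, timedelta(minutes=15))
hourly = sum_by_bucket(rows, timedelta(hours=1))

for start, total in quarter_hour.items():
    print(start.isoformat(), total)
```

Changing the timedelta is the client-side counterpart of swapping '15MINUTES' for '1HOURS' in the request: nothing has to be re-indexed.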
Re: Histogram facet?
This is super nice, I tried (even without subfacets) and it works!

Thanks a lot!

Romain

    facet=true
    facet.range=price
    facet.range.start=0
    facet.range.end=1000
    facet.range.gap=100
    facet.stat=avg(popularity)

    facets: {
      price: {
        buckets: [
          { val: 0.0,   avg(popularity): 3.5714285714285716 },
          { val: 100.0, avg(popularity): 5.5 },
          { val: 200.0, avg(popularity): 6 },
          { val: 300.0, avg(popularity): 7.667 },
          { val: 400.0, avg(popularity): 7 },
          { val: 500.0, avg(popularity): NaN },
          { val: 600.0, avg(popularity): 7 },
          { val: 700.0, avg(popularity): NaN },
          { val: 800.0, avg(popularity): NaN },
          { val: 900.0, avg(popularity): NaN }
        ],
        gap: 100,
        start: 0,
        end: 1000
      }

On Tue, May 6, 2014 at 3:15 PM, Yonik Seeley yo...@heliosearch.com wrote:

> On Tue, May 6, 2014 at 5:30 PM, Romain Rigaux rom...@cloudera.com wrote:
>> This looks nice! The only missing piece for more interactivity would be
>> the ability to map multiple field values into the same bucket, e.g.:
>>
>>     http://localhost:8983/solr/query?
>>       q=*:*
>>       facet=true
>>       facet.field=*round(date, '15MINUTES')*
>>       facet.stat=sum(retweetCount)
>>
>> This is a bit similar to SOLR-4772
>> (https://issues.apache.org/jira/browse/SOLR-4772) for the rounding.
>>
>> Then we could zoom out just by changing the size of the bucket, without
>> any index change, e.g.:
>>
>>     http://localhost:8983/solr/query?
>>       q=*:*
>>       facet=true
>>       facet.field=*round(date, '1HOURS')*
>>       facet.stat=sum(retweetCount)
>
> For this specific example, I think "map multiple field values into the
> same bucket" equates to a range facet?
>
>     facet.range=mydatefield
>     facet.range.start=...
>     facet.range.end=...
>     facet.range.gap=+1HOURS
>     facet.stat=sum(retweetCount)
>
> And then if you need additional breakouts by time range, you can use
> subfacets:
>
>     subfacet.mydatefield.field=mycategoryfield
>
> That will provide retweet counts broken out by mycategoryfield for every
> bucket produced by the range query.
>
> See http://heliosearch.org/solr-subfacets/
>
> -Yonik
> http://heliosearch.org - facet functions, subfacets, off-heap filters &
> fieldcache
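The range-facet request from this message can also be assembled programmatically. This is a minimal Python sketch under a few assumptions: it reuses the localhost:8983/solr/query endpoint and q=*:* shown earlier in the thread, the helper names range_stat_params and non_empty are hypothetical, and it assumes that empty buckets show up as a float NaN once the response is parsed, which the thread does not confirm. The three sample buckets are copied from the output above.

```python
import math
from urllib.parse import urlencode


def range_stat_params(field, start, end, gap, stat):
    """Build the parameters for a range facet with a per-bucket statistic."""
    return [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("facet.stat", stat),
    ]


def non_empty(buckets, stat):
    """Drop buckets whose statistic is NaN (assumed to mean 'no documents')."""
    return [b for b in buckets if not math.isnan(b[stat])]


# Values taken from the price/popularity example in this message.
params = range_stat_params("price", 0, 1000, 100, "avg(popularity)")
url = "http://localhost:8983/solr/query?" + urlencode(params)
print(url)

# A few buckets from the response above, written as Python dicts.
buckets = [
    {"val": 0.0, "avg(popularity)": 3.5714285714285716},
    {"val": 400.0, "avg(popularity)": 7},
    {"val": 500.0, "avg(popularity)": float("nan")},
]
kept = non_empty(buckets, "avg(popularity)")
```

Filtering the NaN buckets on the client is just one way to handle empty ranges before plotting; keeping them as gaps in the chart may be equally reasonable.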
Re: Histogram facet?
The dates won't match unless you truncate all of them to the day. But then
if you want to have slots of 15 minutes it won't work, as you would need to
truncate the dates to 15-minute boundaries in the index.

In ES, they have 1 field to make the slots and 1 field to insert into the
bucket, e.g.:

    {
      query : {
        match_all : {}
      },
      facets : {
        histo1 : {
          date_histogram : {
            key_field : timestamp,
            value_field : price,
            interval : day
          }
        }
      }
    }

Romain

On Mon, May 5, 2014 at 9:05 PM, Erick Erickson erickerick...@gmail.com wrote:

> Hmmm, I _think_ pivot faceting works here. One dimension would be day
> and the other retweet count. The response will have the number of
> retweets per day, you'd have to sum them up I suppose.
>
> Best,
> Erick
>
> On Mon, May 5, 2014 at 3:18 PM, Romain romain@gmail.com wrote:
>> Hi,
>>
>> I am trying to plot a non-date field by time in order to draw a
>> histogram showing its evolution during the week.
>>
>> For example, if I have a tweet index:
>>
>> Tweet:
>>   date
>>   retweetCount
>>
>> 3 tweets indexed:
>>
>> Tweet | Date  | Retweet
>> A     | 01/01 | 100
>> B     | 01/01 | 100
>> C     | 01/02 | 100
>>
>> If I want to plot the number of tweets by day: easy with a date range
>> facet:
>>
>> Day 1: 2
>> Day 2: 1
>>
>> But now counting the number of retweets by day is not possible natively:
>>
>> Day 1: 200
>> Day 2: 100
>>
>> One current workaround would be to do a date range facet to get the date
>> slots, ask only for the retweet field, and compute the sums in the
>> client. We could compute other stats like average, etc. too.
>>
>> The closest I could see was
>> https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
>> slightly different.
>>
>> Basically I am trying to do something very similar to the Date Histogram
>> Facet
>> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet)
>> in ES.
>>
>> Is there a way to move the counting logic to the Solr server?
>>
>> Thanks!
>>
>> Romain
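The client-side workaround described in the original question (fetch only the date and retweet fields, then aggregate locally) might look roughly like the Python sketch below. It uses the three example tweets from the thread; the field names in the dicts and the helper name stats_by_day are assumptions for illustration, and the dates are kept as the plain "01/01" strings from the example rather than parsed.

```python
from collections import defaultdict

# The three example tweets from the thread, as they might come back
# from a query that returns only the date and retweet fields.
tweets = [
    {"id": "A", "date": "01/01", "retweetCount": 100},
    {"id": "B", "date": "01/01", "retweetCount": 100},
    {"id": "C", "date": "01/02", "retweetCount": 100},
]


def stats_by_day(docs):
    """Group documents by day and compute count, sum and average per day."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["date"]].append(doc["retweetCount"])
    return {
        day: {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
        for day, values in sorted(groups.items())
    }


per_day = stats_by_day(tweets)
for day, stats in per_day.items():
    print(day, stats["count"], stats["sum"], stats["avg"])
```

This reproduces the Day 1: 200 and Day 2: 100 totals from the question, but it has to pull every matching document to the client, which is the cost the question is trying to avoid by moving the computation to the server.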