Re: Division with Stats Component when Grouping in Solr
I think I have this just about working with the analytics component. It seems to fill in all the gaps that the stats component and the JSON facet don't support. It solved the following problems for me:

- I am able to perform math on stats to form other stats, and then sort on those as needed.
- When I perform math on stats, it uses the summed totals per group rather than doing it per row.
- I am able to set an offset and a number of rows to handle paging.

I am confused why this module isn't built into Solr. This functionality is vital for any ad hoc querying on time series data. Pretty much any scenario like the SQL query I provided would need all of these things.

The only thing I couldn't figure out is how to get the total number of buckets, or in other words the distinct count of keywords. If anyone is able to help with this, I could really use it in order to provide a total record count to the user (e.g. Showing records 1-10 of 2939).

Here is what I have in case this helps someone:

olap=true&o.r1.ff=keyword_s&o.r1.s.visits=sum(visits_i)&o.r1.s.bounces=sum(bounces_i)&o.r1.s.bounce_rate=div(sum(bounces_i),sum(visits_i))&o.r1.ff.keyword_s.sortstatistic=bounce_rate&o.r1.ff.keyword_s.sortdirection=desc&o.r1.ff.keyword_s.offset=0&o.r1.ff.keyword_s.limit=10

Also, if anyone has access to the original documentation from Bloomberg mentioned in the stats component PDF, I'd love to have it :)
https://issues.apache.org/jira/secure/attachment/12606793/Search%20Analytics%20Component.pdf
All the links for detailed documentation are now broken.

--
View this message in context: http://lucene.472066.n3.nabble.com/Division-with-Stats-Component-when-Grouping-in-Solr-tp4211402p4211751.html
Sent from the Solr - User mailing list archive at Nabble.com.
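For anyone assembling these parameters from code rather than by hand, here is a minimal sketch in Python. It only rebuilds the query string shown in the message above, assuming the parameters are joined with `&`. The helper names (`build_analytics_params`, `build_query_string`) are made up for illustration, and nothing here checks whether a given Solr build accepts the `o.r1.*` shorthand.

```python
from urllib.parse import urlencode


def build_analytics_params(request="r1", facet_field="keyword_s", offset=0, limit=10):
    """Return the analytics parameters from the message as ordered (key, value) pairs.

    The parameter names are copied from the message; this function is a
    hypothetical helper, not part of any Solr client library.
    """
    prefix = f"o.{request}"
    facet = f"{prefix}.ff.{facet_field}"
    return [
        ("olap", "true"),
        (f"{prefix}.ff", facet_field),
        (f"{prefix}.s.visits", "sum(visits_i)"),
        (f"{prefix}.s.bounces", "sum(bounces_i)"),
        # Ratio of the per-group sums, not a per-row division.
        (f"{prefix}.s.bounce_rate", "div(sum(bounces_i),sum(visits_i))"),
        (f"{facet}.sortstatistic", "bounce_rate"),
        (f"{facet}.sortdirection", "desc"),
        (f"{facet}.offset", str(offset)),
        (f"{facet}.limit", str(limit)),
    ]


def build_query_string(params):
    """Join the pairs with '&', leaving parentheses and commas readable."""
    return urlencode(params, safe="(),")


if __name__ == "__main__":
    # Third page of ten rows: offset 20, limit 10.
    print(build_query_string(build_analytics_params(offset=20, limit=10)))
```

Paging then comes down to changing `offset` in steps of `limit`. The total bucket count asked about above is not produced by this sketch.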
Re: Division with Stats Component when Grouping in Solr
Why it isn't in core Solr... Because it doesn't (and probably can't) support distributed mode. The Streaming aggregation stuff, and the (in trunk Real Soon Now) Parallel SQL support are where the effort is going to support this kind of stuff.

https://issues.apache.org/jira/browse/SOLR-7560
https://issues.apache.org/jira/browse/SOLR-7082

Best,
Erick
Re: Division with Stats Component when Grouping in Solr
I was able to get the new version of Solr installed. This query gets me really close, but it is averaging the rows BEFORE the grouping, so it's not totally accurate. I need it to sum the visits and bounces by keyword and then perform the division.

The avg here probably seems confusing and pointless, but it wouldn't let me just put the div directly in the facet without wrapping it with a function. So instead of summing all the rows into one group and performing the divide, it is dividing each row one by one and then averaging the results together, which creates skewed results since one day may have more data than another. It seems dividing would be possible if only I could tell it to divide the grouped-by-keyword result and not the individual rows.

Here is what I have (granted, it's a simplified version for testing):

json.facet={
  keywords:{
    type:terms,
    limit:10,
    field:keyword,
    facet:{
      bounces_sum:sum(bounces),
      visits_sum:sum(visits),
      bounce_rate:avg(div(sum(bounces),sum(visits)))
    }
  }
}

What I really want is:

bounce_rate: div(bounces_sum, visits_sum)

... but this doesn't work.
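The skew described above is easy to see with two rows. The sketch below, in Python, uses made-up numbers (one busy day and one quiet day for the same keyword) and compares averaging the per-row ratios against dividing the summed totals. It is plain arithmetic for illustration and does not call Solr.

```python
# Hypothetical daily rows for a single keyword; the numbers are invented.
rows = [
    {"keyword": "solr", "visits": 1000, "bounces": 100},  # busy day, 10% bounce
    {"keyword": "solr", "visits": 10, "bounces": 9},      # quiet day, 90% bounce
]


def average_of_ratios(rows):
    """Divide per row, then average: every day counts equally, whatever its traffic."""
    return sum(r["bounces"] / r["visits"] for r in rows) / len(rows)


def ratio_of_sums(rows):
    """Sum per group, then divide once: what SUM(bounces) / SUM(visits) does in SQL."""
    return sum(r["bounces"] for r in rows) / sum(r["visits"] for r in rows)


if __name__ == "__main__":
    print(f"average of ratios: {average_of_ratios(rows):.4f}")
    print(f"ratio of sums:     {ratio_of_sums(rows):.4f}")
```

With these rows the average of ratios is about 0.5, while the ratio of sums is 109/1010, roughly 0.108. The quiet day dominates the first figure even though it contributes only ten visits.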
Re: Division with Stats Component when Grouping in Solr
Not sure why but half of my posts are showing up as not accepted by the mailing list. I've made a few replies to others that haven't gone through. I am not sure if it's because I'm replying via email or what the issue is.
Re: Division with Stats Component when Grouping in Solr
kingofhypocrites: Usually that's because your e-mail is formatted as HTML or some other non-plain-text format. Try sending your replies as plain text.
Re: Division with Stats Component when Grouping in Solr
@Billnbell What did you conclude about the Analytics component? It sounds like you are saying it does the same thing as the stats component, but has several other features that the stats component doesn't support. I'd love to have a talk with you offline if possible.
Re: Division with Stats Component when Grouping in Solr
@Yonik, Thanks for this! I was actually just looking at your blog earlier today and thinking that the JSON facet feature may be just what I need. I'm using Solr 4.3 currently, as that is what comes with DataStax, so I'm trying to create a new build with the latest Solr version so I can test this feature.

For the sort, I am assuming this would be sorting on sum(visits) for the given keyword, correct? Also, can you confirm whether it's possible to do a division in the facet? Something like:

facet: { bouncerate: div(sum(bounces) / sum(visits)) }

Because of the large number of results, I would need to precalculate this (division operation) if they happen to sort on it. I don't see anything like this mentioned in the API docs, so maybe it's not possible.
Re: Division with Stats Component when Grouping in Solr
This looks very promising, if only I could get it to work:

https://issues.apache.org/jira/browse/SOLR-5302
https://issues.apache.org/jira/secure/attachment/12606793/Search%20Analytics%20Component.pdf

Various links it points to are broken now and I can't find anything about it online, but the PDF indicates I can set olap=true to turn it on, although this doesn't seem to do anything. The docs say it supports limiting the results and doing math operations on statistics, which is exactly what I need. I'm not clear whether I need to install this or whether this component is even used anymore.

On Fri, Jun 12, 2015 at 12:00 PM Joel Bernstein [via Lucene] wrote:

https://issues.apache.org/jira/browse/SOLR-7560 will almost support this in Solr 5.3. The compound function support won't be there yet though. But it will be there in the near future.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 12, 2015 at 9:30 AM, kingofhypocrites wrote:

I am migrating a database from SQL Server to Cassandra. Currently I have a setup as follows:

- Log data in Cassandra
- Summarize data in Spark and put into Cassandra summary tables
- Query data in Solr

Everything fits beautifully until I need to do stats on groups. I am hoping to get this to work with Solr so I can stick to one database, but I am not sure it's possible. If I had it in SQL Server, I could do it like so:

SELECT site_id,
       keyword,
       SUM(visits) as visits,
       CONVERT(DECIMAL(13, 3), SUM(bounces)) / SUM(visits) as bounce_rate,
       SUM(pageviews) as pageviews,
       CONVERT(DECIMAL(13, 3), SUM(pageviews)) / SUM(visits) as avg_pages_per_visit
FROM report_all_keywords_daily
WHERE site_id = 55
  AND date_key >= '20150606'
  AND date_key <= '20150608'
GROUP BY site_id, keyword
ORDER BY visits DESC

Now I need to replicate this in Solr. The closest I could get to this is by using the Stats component and then using field collapsing:

group=true&group.field=keyword&stats=true&stats.field=visits&stats.facet=keyword

And here are some results I get back: http://pastebin.com/raw.php?i=Fxhe2RA0

However, I need to be able to divide certain metrics. I tried including functions in the stats.field, such as div(sum(bounce_rate), sum(visits)), but it doesn't recognize the functions. Also, it seems to be ignoring the paging for the stats results and returns all groups regardless. Ultimately I'd like something like this, which is what I would get in SQL: http://lucene.472066.n3.nabble.com/file/n4211402/pic.png

Is this possible, or do I have to give up on the prospect of using Solr? I have to query this data dynamically, so I can't pre-summarize all of it. To clarify, I am having the following two problems:

- Paging is ignored for stats data.
- I can't figure out how to divide two stats together to get a third stat. Note: in some cases I would need to be able to sort on this combined stat.
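As a reference for what the SQL in the original post computes, here is a small Python sketch that groups rows by (site_id, keyword), sums the metrics, derives the two ratios from the summed totals, sorts, and pages. The sample rows and the `summarize` helper are invented for illustration. This is a client-side model of the desired result, not a Solr feature, and it would not scale to the data volumes discussed in the thread.

```python
from collections import defaultdict

METRICS = ("visits", "bounces", "pageviews")

# Hypothetical daily rows, already filtered by site_id and date range.
SAMPLE_ROWS = [
    {"site_id": 55, "keyword": "solr", "visits": 100, "bounces": 20, "pageviews": 300},
    {"site_id": 55, "keyword": "solr", "visits": 50, "bounces": 25, "pageviews": 100},
    {"site_id": 55, "keyword": "lucene", "visits": 40, "bounces": 10, "pageviews": 80},
]


def summarize(rows, sort_key="visits", offset=0, limit=10):
    """Return (total_groups, page) where page is a list of per-keyword summaries."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for row in rows:
        bucket = totals[(row["site_id"], row["keyword"])]
        for metric in METRICS:
            bucket[metric] += row[metric]

    results = []
    for (site_id, keyword), t in totals.items():
        visits = t["visits"]
        results.append({
            "site_id": site_id,
            "keyword": keyword,
            "visits": visits,
            # Ratios are taken from the summed totals, as in the SQL.
            "bounce_rate": t["bounces"] / visits if visits else 0.0,
            "pageviews": t["pageviews"],
            "avg_pages_per_visit": t["pageviews"] / visits if visits else 0.0,
        })

    results.sort(key=lambda r: r[sort_key], reverse=True)  # ORDER BY ... DESC
    return len(results), results[offset:offset + limit]


if __name__ == "__main__":
    total, page = summarize(SAMPLE_ROWS, sort_key="bounce_rate", limit=10)
    print(f"Showing records 1-{len(page)} of {total}")
    for record in page:
        print(record)
```

Returning the group count next to the page is what makes a "Showing records 1-10 of N" label possible. Sorting on a derived stat such as `bounce_rate` only works because the ratio is computed for every group before the page is cut.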
Re: Division with Stats Component when Grouping in Solr
OK, more info:

<requestHandler name="standard" class="solr.StandardRequestHandler">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>analytics</str>
    <str>highlight</str>
    <str>debug</str>
    <str>expand</str>
  </arr>
</requestHandler>

<searchComponent name="analytics" class="org.apache.solr.handler.component.AnalyticsComponent" />

I am going to try that after adding it to solrconfig.xml.
Re: Division with Stats Component when Grouping in Solr
Note: you need to enable docValues to get the range stuff to work. Set docValues="true" on the field.
Re: Division with Stats Component when Grouping in Solr
OK. Kinda like pivoting stats...

http://localhost:8983/solr/select?q=*%3A*&wt=json&indent=true&olap=true&olap.req1.fieldfacet=overall_score&facet=true&facet.field=overall_score&olap.req1.statistic.count=count(overall_score)

Basically this does the same thing in olap and facet.

{
  "response": { "numFound": 63061, "start": 0, "docs": [] },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "overall_score": [ "1", 40138, "5", 17487, "2", 2299, "4", 1810, "3", 1314 ]
    },
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {},
    "facet_heatmaps": {}
  },
  "stats": [
    "req1", [
      "count", 63048,
      "fieldFacets", [
        "overall_score", [
          "1", [ "count", 40138 ],
          "2", [ "count", 2299 ],
          "3", [ "count", 1314 ],
          "4", [ "count", 1810 ],
          "5", [ "count", 17487 ]
        ]
      ],
      "rangeFacets", [],
      "queryFacets", []
    ]
  ]
}
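Both parts of the response above use flat lists that alternate name and value, which are awkward to compare by eye. The following Python sketch folds such lists into dictionaries and checks that the facet counts and the olap counts agree. The helper names are hypothetical, and the sample data assumes the facet values come back as strings, which the flattened response does not confirm.

```python
def pairs_from_flat_list(flat):
    """Turn [name1, value1, name2, value2, ...] into {name1: value1, ...}."""
    if len(flat) % 2:
        raise ValueError("flat name/value list must have an even length")
    return dict(zip(flat[0::2], flat[1::2]))


def field_facet_counts(flat_buckets):
    """Turn [value, [stat, number, ...], ...] into {value: {stat: number}}."""
    return {
        value: pairs_from_flat_list(stats)
        for value, stats in pairs_from_flat_list(flat_buckets).items()
    }


# Numbers copied from the response in the message above.
facet_field = ["1", 40138, "5", 17487, "2", 2299, "4", 1810, "3", 1314]
olap_buckets = [
    "1", ["count", 40138],
    "2", ["count", 2299],
    "3", ["count", 1314],
    "4", ["count", 1810],
    "5", ["count", 17487],
]

if __name__ == "__main__":
    facet_counts = pairs_from_flat_list(facet_field)
    olap_counts = {k: v["count"] for k, v in field_facet_counts(olap_buckets).items()}
    print(facet_counts == olap_counts)
```

The facet list is ordered by count and the olap list by value, so converting both to dictionaries is what makes the comparison order-independent.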
Re: Division with Stats Component when Grouping in Solr
Same here. What do we need to add to solrconfig.xml to get it to work?

SOLR-5302: https://issues.apache.org/jira/browse/SOLR-5302

--
Bill Bell
Re: Division with Stats Component when Grouping in Solr
OK. That works with one more change. lib dir=../../../dist/ regex=solr-analytics-.*\.jar / lib dir=../../../dist/ regex=solr-analysis-.*\.jar / http://localhost:8983/solr/select?q=*%3A*wt=jsonindent=truestats=trueolap=trueolap.overall_score.statistic.sum=sum(overall_score) On Sat, Jun 13, 2015 at 1:16 PM, William Bell billnb...@gmail.com wrote: OK more info requestHandler name=standard class=solr.StandardRequestHandler arr name=components strquery/str strfacet/str stranalytics/str strhighlight/str strdebug/str strexpand/str /arr /requestHandler searchComponent name=analytics class=org.apache.solr.handler.component.AnalyticsComponent / I am going to try that after adding it to solrconfig.xml. On Sat, Jun 13, 2015 at 1:11 PM, William Bell billnb...@gmail.com wrote: Same here. What do we need to add to solrconfig.xml to get it to work? 1. SOLR-5302 https://issues.apache.org/jira/browse/SOLR-5302 2. 3. Help/ On Sat, Jun 13, 2015 at 8:34 AM, kingofhypocrites kingofhypocri...@gmail.com wrote: This looks very promising if only I could get it to work: https://issues.apache.org/jira/browse/SOLR-5302 https://issues.apache.org/jira/secure/attachment/12606793/Search%20Analytics%20Component.pdf Various links it points to are broken now and i can't find anything about it online, but the PDF indicates I can set olap=true to turn it on, although this doesn't seem to do anything. The docs say it supports limiting the results and doing math operations on statistics which is exactly what I need. I'm not clear if I need to install this or if this component is even used anymore. On Fri, Jun 12, 2015 at 12:00 PM Joel Bernstein [via Lucene] ml-node+s472066n4211422...@n3.nabble.com wrote: https://issues.apache.org/jira/browse/SOLR-7560, will almost support this in Solr 5.3. The compound function support won't be there yet though. But it will be there in the near future. 
Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Jun 12, 2015 at 9:30 AM, kingofhypocrites [hidden email] http:// /user/SendEmail.jtp?type=nodenode=4211422i=0 wrote: I am migrating a database from SQL Server to Cassandra. Currently I have a setup as follows: - Log data in Cassandra - Summarize data in Spark and put into Cassandra summary tables - Query data in Solr Everything fits beautifully until I need to do stats on groups. I am hoping to get this to work with Solr so I can stick to one database, but I am not sure it's possible. If I had it in SQL Server, I could do it like so: SELECT site_id, keyword, SUM(visits) as visits, CONVERT(DECIMAL(13, 3), SUM(bounces)) / SUM(visits) as bounce_rate, SUM(pageviews) as pageviews, CONVERT(DECIMAL(13, 3), SUM(pageviews)) / SUM(visits) as avg_pages_per_visit FROM report_all_keywords_daily WHERE site_id = 55 AND date_key = '20150606' AND date_key = '20150608' GROUP BY site_id, keyword ORDER BY visits DESC Now I need to replicate this in Solr. The closest I could get to this is by using the Stats component and then using field collapsing. group=truegroup.field=keywordstats=truestats.field=visitsstats.facet=keyword And here are some results I get back: http://pastebin.com/raw.php?i=Fxhe2RA0 However, I need to do able to divide certain metrics. I tried including functions in the stats.field such as div(sum(bounce_rate), (sum(visits)) but it doesn't recognize the functions. Also it seems to ignoring the paging for the stats results and returns all groups regardless. Ultimately I'd like something like this which is what I would get in SQL: http://lucene.472066.n3.nabble.com/file/n4211402/pic.png Is this possible or do I have to give up on the prospect of using Solr? I have to query this data dynamically so I can't pre-summarize all of it. To clarify I having the following two problems: - Paging is ignored for stats data - I can't figure out how to divide two stats together to get a third stat. 
Note: In some cases I would need to be able to sort on this combined stat
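For anyone assembling these analytics requests in code, the sketch below builds the same query string as the working URL above, letting the standard library insert the `&` separators and percent-encode the values. The helper name `build_select_url` is made up for illustration; the parameter names and values are copied from the URL in this thread, and nothing here has been checked against a running Solr instance.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, params):
    """Return base_url plus a percent-encoded query string (hypothetical helper)."""
    return base_url + "?" + urlencode(params)


# Parameters taken from the request quoted in the thread.
params = [
    ("q", "*:*"),
    ("wt", "json"),
    ("indent", "true"),
    ("stats", "true"),
    ("olap", "true"),
    ("olap.overall_score.statistic.sum", "sum(overall_score)"),
]

url = build_select_url("http://localhost:8983/solr/select", params)

# Round-trip the query string to confirm every parameter survived encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["olap.overall_score.statistic.sum"])
```

Passing the parameters as a list of pairs keeps their order and allows repeated keys, which matters for parameters such as stats.field that can appear more than once.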
Re: Division with Stats Component when Grouping in Solr
Having a hard time getting this to work:

http://localhost:8983/solr/select?q=*%3A*&wt=json&indent=true&olap=true&olap.req1.fieldfacet=overall_score

and even tried... I made sure docValues was set for overall_score too.

http://hgsolr2devmstr:8983/solr/survey/select?q=*%3A*&wt=json&indent=true&olap=true&olap.fieldfacet=overall_score

  <field name="overall_score" type="int" indexed="true" stored="true" docValues="true" />

On Sat, Jun 13, 2015 at 2:02 PM, William Bell billnb...@gmail.com wrote:

Note: you need to enable docValues to get range stuff to work. Set docValues="true" on the field.
Re: Division with Stats Component when Grouping in Solr
On Fri, Jun 12, 2015 at 10:30 AM, kingofhypocrites kingofhypocri...@gmail.com wrote:

I am migrating a database from SQL Server to Cassandra. Currently I have a setup as follows:

- Log data in Cassandra
- Summarize data in Spark and put into Cassandra summary tables
- Query data in Solr

Everything fits beautifully until I need to do stats on groups. I am hoping to get this to work with Solr so I can stick to one database, but I am not sure it's possible. If I had it in SQL Server, I could do it like so:

  SELECT site_id, keyword,
         SUM(visits) as visits,
         CONVERT(DECIMAL(13, 3), SUM(bounces)) / SUM(visits) as bounce_rate,
         SUM(pageviews) as pageviews,
         CONVERT(DECIMAL(13, 3), SUM(pageviews)) / SUM(visits) as avg_pages_per_visit
  FROM report_all_keywords_daily
  WHERE site_id = 55 AND date_key >= '20150606' AND date_key <= '20150608'
  GROUP BY site_id, keyword
  ORDER BY visits DESC

This is the closest we can get with the JSON Facet API today:

  json.facet={
    sites: {
      type : terms,
      field : site_id,
      sort : "visits desc",
      facet : {
        visits : "sum(visits)",
        bounces : "sum(bounces)",
        pageviews : "sum(pageviews)"
      }
    }
  }

That doesn't take into account keyword when sorting the buckets. You could nest a keyword facet inside a site facet and thus calculate the stats for the top N keywords per site:

  json.facet={
    sites: {
      type : terms,
      field : site_id,
      facet : {
        keywords: {
          type : terms,
          field : keyword,
          sort : "visits desc",
          facet : {
            visits : "sum(visits)",
            bounces : "sum(bounces)",
            pageviews : "sum(pageviews)"
          }
        }
      }
    }
  }

More info here: http://yonik.com/json-facet-api/

-Yonik
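The nested facet leaves the two ratios from the SQL query (bounce_rate and avg_pages_per_visit) to the client. Below is a minimal sketch of that step, assuming the response nests buckets as `facets -> sites -> buckets -> keywords -> buckets`, with each bucket carrying `val`, `count`, and the named sums. The `flatten_facets` helper and the sample numbers are made up for illustration and were not produced by a real Solr instance.

```python
import json

# The nested facet request from the message above, expressed as a Python dict.
facet_request = {
    "sites": {
        "type": "terms",
        "field": "site_id",
        "facet": {
            "keywords": {
                "type": "terms",
                "field": "keyword",
                "sort": "visits desc",
                "facet": {
                    "visits": "sum(visits)",
                    "bounces": "sum(bounces)",
                    "pageviews": "sum(pageviews)",
                },
            }
        },
    }
}

# Value to send as the json.facet request parameter.
json_facet_param = json.dumps(facet_request)


def flatten_facets(facets):
    """Turn nested site/keyword buckets into flat rows with derived ratios."""
    rows = []
    for site in facets["sites"]["buckets"]:
        for kw in site["keywords"]["buckets"]:
            visits = kw["visits"]
            rows.append({
                "site_id": site["val"],
                "keyword": kw["val"],
                "visits": visits,
                # Ratios use the per-bucket sums, matching SUM(x) / SUM(visits).
                "bounce_rate": kw["bounces"] / visits if visits else None,
                "avg_pages_per_visit": kw["pageviews"] / visits if visits else None,
            })
    return rows


# Made-up response fragment in the assumed shape.
sample_facets = {
    "count": 2,
    "sites": {"buckets": [{
        "val": 55, "count": 2,
        "keywords": {"buckets": [
            {"val": "solr", "count": 1, "visits": 200.0, "bounces": 50.0, "pageviews": 600.0},
            {"val": "lucene", "count": 1, "visits": 100.0, "bounces": 10.0, "pageviews": 150.0},
        ]},
    }]},
}

rows = flatten_facets(sample_facets)
for row in rows:
    print(row)
```

Dividing the sums per bucket, rather than averaging per-document ratios, is what makes the result agree with the SQL version.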
Re: Division with Stats Component when Grouping in Solr
It would be cool to be able to set two GROUP BY fields with facets, as in GROUP BY site_id, keyword.

Bill Bell

On Jun 13, 2015, at 2:28 PM, Yonik Seeley ysee...@gmail.com wrote:

GROUP BY site_id, keyword
Re: Division with Stats Component when Grouping in Solr
https://issues.apache.org/jira/browse/SOLR-7560, will almost support this in Solr 5.3. The compound function support won't be there yet though. But it will be there in the near future. Joel Bernstein http://joelsolr.blogspot.com/
Re: Division with Stats Component when Grouping in Solr
If you are a java programmer you may want to look at plugging in your own custom Streams into the Streaming API. The SQL stuff is built on top of the Streaming API. http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html Joel Bernstein http://joelsolr.blogspot.com/
Re: Division with Stats Component when Grouping in Solr
: However, I need to be able to divide certain metrics. I tried including
: functions in the stats.field such as div(sum(bounce_rate), (sum(visits)) but
: it doesn't recognize the functions. Also it seems to be ignoring the paging for
: the stats results and returns all groups regardless.

i'm lost on what your goal is regarding grouping and what you mean by "ignoring the paging" but FWIW stats.field does support functions (or query scores) -- you just need to use local params to make it clear that you are passing in a function name and not a field name...

https://cwiki.apache.org/confluence/display/solr/The+Stats+Component

Example...

http://localhost:8983/solr/techproducts/select?q=*:*&stats=true&stats.field={!func}termfreq('text','memory')&stats.field=price&stats.field=popularity&rows=0&indent=true

: Ultimately I'd like something like this which is what I would get in SQL:
: http://lucene.472066.n3.nabble.com/file/n4211402/pic.png

at first glance, making some assumptions about your data, this looks like pivot faceting with some stats hanging off of it -- ie:

  facet.pivot={!stats=nest}site_id,keyword
  stats.field={!tag=nest sum=true}visits
  stats.field={!tag=nest sum=true}bounces
  stats.field={!tag=nest sum=true}pageviews

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-CombiningStatsComponentWithPivots

...that will give you the sum of each of the specified fields for each top keyword (by doc count) for each top site_id (by doc count). (Computing the bounce_rate and avg_pages_per_visit is simple client side division)

: - Paging is ignored for stats data

How/Why exactly do you want/expect paging to affect stats computation? stats are over entire result sets -- if you wanted stats just over a single page that's trivial to do in the client.

: - I can't figure out how to divide two stats together to get a third stat.
: Note: In some cases I would need to be able to sort on this combined stat

Yeah, unfortunately sorting pivot facet results currently only works by either the doc count or the term, not an arbitrary stat on the docs in the pivot subset (that's a really hard problem to solve for arbitrary functions in a distributed setup) ... the new JSON faceting stuff might do what you want, but i don't really know enough about it to say...

https://cwiki.apache.org/confluence/display/solr/JSON+Request+API

-Hoss
http://www.lucidworks.com/
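Since pivot results cannot be sorted by a computed stat on the server, one workaround is to fetch every pivot bucket and do the division, sorting, and paging in the client. The sketch below assumes each pivot entry carries `value`, an optional nested `pivot` list, and sums under `stats -> stats_fields -> <field> -> sum`. The helper names and the sample numbers are hypothetical. It is only correct if all buckets are returned (for example with facet.limit=-1), which may be expensive on high-cardinality fields.

```python
def pivot_rows(pivot):
    """Flatten site_id,keyword pivot entries into rows with a computed bounce_rate."""
    rows = []
    for site in pivot:
        for kw in site.get("pivot", []):
            sums = {name: s["sum"] for name, s in kw["stats"]["stats_fields"].items()}
            visits = sums["visits"]
            rows.append({
                "site_id": site["value"],
                "keyword": kw["value"],
                "visits": visits,
                "bounce_rate": sums["bounces"] / visits if visits else 0.0,
            })
    return rows


def page_rows(rows, sort_key, offset, limit, descending=True):
    """Sort by any column, then slice one page; also return the total row count."""
    ordered = sorted(rows, key=lambda r: r[sort_key], reverse=descending)
    return ordered[offset:offset + limit], len(ordered)


# Made-up pivot fragment in the assumed response shape.
sample_pivot = [{
    "field": "site_id", "value": 55, "count": 6,
    "pivot": [
        {"field": "keyword", "value": "a", "count": 3,
         "stats": {"stats_fields": {"visits": {"sum": 100.0}, "bounces": {"sum": 40.0}}}},
        {"field": "keyword", "value": "b", "count": 2,
         "stats": {"stats_fields": {"visits": {"sum": 50.0}, "bounces": {"sum": 30.0}}}},
        {"field": "keyword", "value": "c", "count": 1,
         "stats": {"stats_fields": {"visits": {"sum": 10.0}, "bounces": {"sum": 1.0}}}},
    ],
}]

all_rows = pivot_rows(sample_pivot)
first_page, total = page_rows(all_rows, "bounce_rate", offset=0, limit=2)
print("Showing records 1-%d of %d" % (len(first_page), total))
```

The total returned by `page_rows` is the distinct bucket count, which is the number needed for a "Showing records 1-10 of N" display.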