Re: Get all results from a solr query
: stores, just a portion of it. Currently, I need to get 16 records at
: once, not just the 10 that show. So I have the rows set to "99" for
: the testing phase, and I can increase it later. I just wanted to have
: a better way of getting all the results that didn't require hard
: coding a value. I don't foresee the results ever getting to the
: thousands -- and if it grows to become larger then I will do paging on
: the results.

If you don't foresee it getting bigger than the thousands, use rows=999
and add an assertion that the result count isn't bigger than that. That
way, if you don't foresee correctly, you won't get back more data than
you can handle.

: It seems that Solr doesn't have the feature that I need. I'll make do

This is intentional...

http://wiki.apache.org/solr/FAQ#How_can_I_get_ALL_the_matching_documents_back.3F_..._How_can_I_return_an_unlimited_number_of_rows.3F

-Hoss
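
A minimal sketch of the capped-rows-plus-assertion idea, in Ruby to match the script elsewhere in this thread. It assumes the Solr response has already been parsed into a Hash with string keys; MAX_ROWS and docs_within_cap are hypothetical names, not part of any Solr client.

```ruby
# Hypothetical cap; pick whatever your client can comfortably handle.
MAX_ROWS = 999

# Returns the docs from a parsed Solr response, but raises if Solr
# reports more matches than the cap, so a growing index fails loudly
# instead of silently truncating the result set.
# `response` is assumed to look like:
#   {"response" => {"numFound" => 16, "docs" => [...]}}
def docs_within_cap(response, max_rows = MAX_ROWS)
  found = response.fetch("response").fetch("numFound")
  if found > max_rows
    raise "numFound=#{found} exceeds rows=#{max_rows}; switch to paging"
  end
  response["response"]["docs"]
end
```

The request itself would still be sent with rows set to the same cap; the check only runs on the response.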
Re: Get all results from a solr query
Look up _docid_ on the Solr wiki. It lets you walk the entire index
about as fast as possible.

On Fri, Sep 17, 2010 at 8:47 AM, Christopher Gross wrote:
> Thanks for being so helpful! You really helped me to answer my
> question! You aren't condescending at all!
>
> I'm not using it to pull down *everything* that the Solr instance
> stores, just a portion of it. Currently, I need to get 16 records at
> once, not just the 10 that show. So I have the rows set to "99" for
> the testing phase, and I can increase it later. I just wanted to have
> a better way of getting all the results that didn't require hard
> coding a value. I don't foresee the results ever getting to the
> thousands -- and if it grows to become larger then I will do paging on
> the results.
>
> Doing multiple queries isn't an option -- the results are getting
> processed with an XSLT and then immediately being displayed, hence my
> need to just do this in one shot.
>
> It seems that Solr doesn't have the feature that I need. I'll make do
> with what I have for now, unless they end up adding something to
> return all rows. I appreciate the ideas, thanks to everyone who
> posted something useful!
>
> -- Chris

--
Lance Norskog
goks...@gmail.com
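
A sketch of what a _docid_ walk request might look like, assuming your Solr version supports sorting on the _docid_ pseudo-field (worth checking on an install as old as 1.2). docid_walk_params is a hypothetical helper; it only builds the query string and does not contact Solr.

```ruby
require "uri"

# Builds the query string for one page of a full-index walk.
# sort=_docid_ asc asks Solr to return documents in internal index
# order, which skips relevance scoring and sorting work.
def docid_walk_params(start, rows)
  URI.encode_www_form(
    "q"     => "*:*",
    "sort"  => "_docid_ asc",
    "start" => start,
    "rows"  => rows
  )
end

# Query string for the first 100 documents; append it to .../select?
first_page = docid_walk_params(0, 100)
```

This still pages with start and rows, so it does not remove the need for a rows value; it only makes each page cheaper to produce.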
Re: Get all results from a solr query
Thanks for being so helpful! You really helped me to answer my
question! You aren't condescending at all!

I'm not using it to pull down *everything* that the Solr instance
stores, just a portion of it. Currently, I need to get 16 records at
once, not just the 10 that show. So I have the rows set to "99" for
the testing phase, and I can increase it later. I just wanted to have
a better way of getting all the results that didn't require hard
coding a value. I don't foresee the results ever getting to the
thousands -- and if it grows to become larger then I will do paging on
the results.

Doing multiple queries isn't an option -- the results are getting
processed with an XSLT and then immediately being displayed, hence my
need to just do this in one shot.

It seems that Solr doesn't have the feature that I need. I'll make do
with what I have for now, unless they end up adding something to
return all rows. I appreciate the ideas, thanks to everyone who
posted something useful!

-- Chris

On Fri, Sep 17, 2010 at 11:19 AM, Walter Underwood wrote:
> Go ahead and put an absurdly large value as the rows parameter.
>
> Then wait, because that query is going to take a really long time, it can
> interfere with every other query on the Solr server (denial of service), and
> quite possibly cause your client to run out of memory as it parses the result.
>
> After you break your system with the query, you can go back to paged results.
>
> wunder
Re: Get all results from a solr query
Go ahead and put an absurdly large value as the rows parameter.

Then wait, because that query is going to take a really long time, it can
interfere with every other query on the Solr server (denial of service), and
quite possibly cause your client to run out of memory as it parses the result.

After you break your system with the query, you can go back to paged results.

wunder

On Sep 17, 2010, at 5:23 AM, Christopher Gross wrote:
> @Markus Jelsma - the wiki confirms what I said before:
>
> rows
>
> This parameter is used to paginate results from a query. When
> specified, it indicates the maximum number of documents from the
> complete result set to return to the client for every request. (You
> can consider it as the maximum number of result appear in the page)
>
> The default value is "10"
>
> ...So it defaults to 10, which is my problem.
>
> @Shashi Kant - I was hoping that there was a way to get everything in
> one shot, hence trying to override the rows parameter without having
> to put in an absurdly large number (that I might have to
> replace/change if the collection size grows above it).
>
> @Scott Gonyea - It's a 10-net anyways, I'd have to be on your network
> to do any damage. ;)
>
> -- Chris
Re: Get all results from a solr query
Chris, I agree, having the ability to make rows something like -1 to
bring back everything would be convenient. However, the two-call approach
(q=blah&rows=0 followed by q=blah&rows=numFound) isn't that slow, and
does give you more information up front. You can optimize your Array or
List<> sizes in advance, you can make sure that it isn't a runaway query
that is about to overload you with data, and you can split it up into
parallel processes, i.e.:

Thread(q=blah&start=0&rows=numFound/4)
Thread(q=blah&start=numFound/4&rows=numFound/4)
Thread(q=blah&start=(numFound/4*2)&rows=numFound/4)
Thread(q=blah&start=(numFound/4*3)&rows=numFound/4)

(Not sure my math is right, did it quickly, but you get the point.)

Anyway, having that number can be very useful for more than just knowing
max results.

Ken
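
On the math: with plain integer division, numFound/4 drops the remainder, so numFound=10 would give four slices of 2 and miss the last two documents. A small Ruby sketch of one way to compute slices that cover everything, using ceiling division; slices is a hypothetical helper name and it only does the arithmetic, not the requests.

```ruby
# Splits `total` documents into at most `parts` contiguous slices,
# each returned as [start, rows]. Ceiling division keeps the slices
# from dropping the remainder; the last slice may be shorter.
def slices(total, parts)
  raise ArgumentError, "parts must be >= 1" if parts < 1
  return [] if total <= 0

  size = (total + parts - 1) / parts # ceiling of total / parts
  (0...total).step(size).map do |start|
    [start, [size, total - start].min]
  end
end

slices(10, 4) # => [[0, 3], [3, 3], [6, 3], [9, 1]]
```

Each pair maps directly onto the start and rows parameters of one of the parallel requests.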
Re: Get all results from a solr query
@Markus Jelsma - the wiki confirms what I said before:

rows

This parameter is used to paginate results from a query. When
specified, it indicates the maximum number of documents from the
complete result set to return to the client for every request. (You
can consider it as the maximum number of result appear in the page)

The default value is "10"

...So it defaults to 10, which is my problem.

@Shashi Kant - I was hoping that there was a way to get everything in
one shot, hence trying to override the rows parameter without having
to put in an absurdly large number (that I might have to
replace/change if the collection size grows above it).

@Scott Gonyea - It's a 10-net anyways, I'd have to be on your network
to do any damage. ;)

-- Chris

On Thu, Sep 16, 2010 at 5:57 PM, Scott Gonyea wrote:
> lol, note to self: scratch out IPs. Good thing firewalls exist to
> keep my stupidity at bay.
>
> Scott
Re: Get all results from a solr query
lol, note to self: scratch out IPs. Good thing firewalls exist to
keep my stupidity at bay.

Scott

On Thu, Sep 16, 2010 at 2:55 PM, Scott Gonyea wrote:
> If you want to do it in Ruby, you can use this script as scaffolding:
> require 'rsolr' # run `gem install rsolr` to get this
> solr = RSolr.connect(:url => 'http://ip-10-164-13-204:8983/solr')
> [...]
Re: Get all results from a solr query
If you want to do it in Ruby, you can use this script as scaffolding:

require 'rsolr' # run `gem install rsolr` to get this

solr = RSolr.connect(:url => 'http://ip-10-164-13-204:8983/solr')
# rows => 0 returns only the match count, no documents
total = solr.select({:q => '*:*', :rows => 0})["response"]["numFound"]
rows = 10
query = {
  :q => '*:*', # match everything; swap in your own query
  :rows => rows,
  :start => 0
}
pages = (total.to_f / rows.to_f).ceil # round up
(1..pages).each do |page|
  query[:start] = (page-1) * rows
  results = solr.select(query)
  docs = results["response"]["docs"]
  # Do stuff here
  #
  docs.each do |doc|
    doc["content"] = "IN UR SOLR MESSIN UP UR CONTENT!#{doc["content"]}"
  end
  # Add it back in to Solr
  solr.add(docs)
  solr.commit
end

Scott

On Thu, Sep 16, 2010 at 2:27 PM, Shashi Kant wrote:
> Start with a *:*, then the “numFound” attribute of the <result>
> element should give you the rows to fetch by a 2nd request.
Re: Get all results from a solr query
Start with a *:*, then the “numFound” attribute of the <result> element
should give you the rows to fetch by a 2nd request.

On Thu, Sep 16, 2010 at 4:49 PM, Christopher Gross wrote:
> That will still just return 10 rows for me. Is there something else in
> the configuration of solr to have it return all the rows in the
> results?
>
> -- Chris
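
The two-request approach can be sketched in Ruby roughly as follows. To keep it independent of any particular client library, `fetch` is a hypothetical callable that takes a params Hash and returns the parsed Solr response as a Hash with string keys; fetch_all is likewise an illustrative name.

```ruby
# Fetches every matching document in two requests.
# `fetch` is a stand-in for your Solr client (an assumption, not a
# real library call): it accepts a params Hash and returns a Hash like
#   {"response" => {"numFound" => 16, "docs" => [...]}}
def fetch_all(fetch, query)
  # First request: rows=0 returns only the match count, no documents.
  count = fetch.call("q" => query, "rows" => 0)
  total = count["response"]["numFound"]
  return [] if total.zero?

  # Second request: ask for exactly that many rows.
  fetch.call("q" => query, "rows" => total)["response"]["docs"]
end
```

If the index can change between the two requests, the second response may hold fewer documents than the first count suggested, so callers should not assume the sizes match exactly.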
RE: Re: Get all results from a solr query
Not according to the wiki:
http://wiki.apache.org/solr/CommonQueryParameters#rows

But you could always create an issue for this one.

-Original message-
From: Christopher Gross
Sent: Thu 16-09-2010 22:50
To: solr-user@lucene.apache.org
Subject: Re: Get all results from a solr query

That will still just return 10 rows for me. Is there something else in
the configuration of solr to have it return all the rows in the
results?

-- Chris
Re: Get all results from a solr query
That will still just return 10 rows for me. Is there something else in
the configuration of solr to have it return all the rows in the
results?

-- Chris

On Thu, Sep 16, 2010 at 4:43 PM, Shashi Kant wrote:
> q=*:*
Re: Get all results from a solr query
q=*:*

On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross wrote:
> I have some queries that I'm running against a solr instance (older,
> 1.2 I believe), and I would like to get *all* the results back (and
> not have to put an absurdly large number as a part of the rows
> parameter).
>
> Is there a way that I can do that? Any help would be appreciated.
>
> -- Chris