My big question is: how do you loop over 1M records, sum up one or more fields, and then sort on that sum? Do you do it all in memory (which could use too much RAM)? Or in a temporary index (which could take a while, since a lot of documents would have to be rewritten into a new index)?
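To make the in-memory option concrete, here is a rough sketch of what I picture. It is only my guess at the general shape, not DBSight's actual implementation, and it uses no Lucene API at all. The class `SumThenSort`, the method `topBySum`, and the two parallel arrays (`keys[i]` = group key of matching record i, `amounts[i]` = the numeric field to sum) are all hypothetical names I made up for illustration; I'm assuming the field values have already been loaded into arrays somehow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or DBSight.
class SumThenSort {

    // keys[i] and amounts[i] describe matching record i. Assumption: both
    // arrays were preloaded into memory, so no stored documents are read here.
    static List<Map.Entry<String, Long>> topBySum(String[] keys, long[] amounts, int topN) {
        // Pass 1: a single loop over all records, keeping a running sum per key.
        Map<String, Long> sums = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            sums.merge(keys[i], amounts[i], Long::sum);
        }

        // Pass 2: sort only the aggregated entries (one per distinct key),
        // largest sum first; ties are broken by key so the order is stable.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(sums.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));

        // Copy the first topN entries so the result does not depend on the full list.
        return new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));
    }
}
```

With this shape the sort only touches one entry per distinct key rather than one per record, so the extra memory is proportional to the number of distinct keys. Whether that is small enough depends on the data, which is really what I'm asking about.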

- Mike
aka...@gmail.com

On Thu, Apr 1, 2010 at 5:31 PM, Chris Lu <chris...@gmail.com> wrote:

> Thanks. Not really trying to sell DBSight here since most people here are
> Lucene experts.
> Just to confirm that this "challenge" has been done via Lucene for quite a
> while.
>
> The technique for it is very similar to how facet search is done, which has
> several ways also.
> Millions of rows are not really "that" big when everything is properly
> warmed up.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
> Michel Nadeau wrote:
>
>> I'm sure the DBSight feature is great, but we already have a system in
>> place and we're not throwing it away -- it's closely integrated with our
>> whole platform. We're way past the point to switch our solution to
>> DBSight. We'd be more than happy to use the DBSight feature if it were
>> open source, but unfortunately it's not, so we won't even consider it.
>>
>> Chris: are you a developer at DBSight? Can you tell us more about how it
>> works? Because I don't really see how it can be "fast" when dealing with
>> millions of records... as it has to loop through them, compute, store
>> everything (in a temp index? memory?) and then re-sort.
>>
>> - Mike
>> aka...@gmail.com
>>
>> On Thu, Apr 1, 2010 at 5:02 PM, Chris Lu <chris...@gmail.com> wrote:
>>
>>> For DBSight, the aggregated values are computed during run time.
>>> And the sorting on the computed aggregated values is done when
>>> displaying the results.
>>>
>>> The idea is, after the aggregation, the number of aggregated values is
>>> much, much smaller.
>>>
>>> prasenjit mukherjee wrote:
>>>
>>>> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <chris...@gmail.com> wrote:
>>>>
>>>>> No need for Hadoop. It's even slower. Lucene can do it easily.
>>>>>
>>>>> This has been implemented in DBSight.
>>>>> The implementation is very similar to Facet search. Just need a way to
>>>>> load the field quickly, like put it in memory or some data structure,
>>>>> and count the sum/min/max during searching.
>>>>
>>>> This will ONLY compute the aggregated value (sum, count, min, max etc.).
>>>> I guess what Mike wants is to use the aggregated value to sort the
>>>> entries. Dynamically maintaining a sorted list while searching could
>>>> be extremely expensive.
>>>>
>>>>> prasenjit mukherjee wrote:
>>>>>
>>>>>> This looks like a use case more suited for Pig (over Hadoop).
>>>>>>
>>>>>> It could be difficult for Lucene to do sort and sum simultaneously, as
>>>>>> sorting itself depends upon the summed value.
>>>>>>
>>>>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <aka...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Well that's my problem: we have a lot of records of all types
>>>>>>> (affiliates, sales) so looping tons of records each time isn't
>>>>>>> possible.
>>>>>>>
>>>>>>> - Mike
>>>>>>> aka...@gmail.com
>>>>>>>
>>>>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>>>>> <prasen....@gmail.com> wrote: