On Mon, Jul 10, 2017 at 6:15 AM, Dylan Hutchison <dhutc...@cs.washington.edu> wrote:
> You might be able to take a batched approach, using server-side iterators
> to gather as many S's from POS rows as possible at each tablet server, up
> to a memory budget, and then querying the SPO table from inside those
> iterators. (As long as you are mindful of tablet server thread limits, you
> can scan another table from inside a server-side iterator.) This will
> likely query the same SPO data multiple times, which may or may not be
> acceptable.
>
> Another alternative is a MapReduce job.
>
> By the way, you don't necessarily need to sort the S's in order to query
> the SPO table. It depends on how you do the query, such as by providing a
> collection of ranges to a Scanner / BatchScanner or doing server-side
> filtering.
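The batched approach, and the point that the S's need not be sorted, can be modeled in a few lines. This is a minimal sketch in plain Python, not Accumulo code: sorted lists of row keys stand in for the POS and SPO tables, `prefix_scan` stands in for scanning one Range, `budget` stands in for the per-tablet-server memory budget, and the `|` separator and every function name are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical model: sorted lists of row keys stand in for the POS and SPO
# tables; SEP is an assumed separator for (Subject|Predicate|Object) rows.
SEP = "|"


def prefix_scan(sorted_rows: list[str], prefix: str) -> Iterator[str]:
    """Yield every row that starts with `prefix` (stands in for scanning one Range)."""
    i = bisect_left(sorted_rows, prefix)  # rows sharing a prefix are contiguous
    while i < len(sorted_rows) and sorted_rows[i].startswith(prefix):
        yield sorted_rows[i]
        i += 1


def subjects_for(pos_rows: list[str], predicate: str, obj: str) -> Iterator[str]:
    """Yield the S suffix of each POS row matching the given P-O pair."""
    prefix = f"{predicate}{SEP}{obj}{SEP}"
    for row in prefix_scan(pos_rows, prefix):
        yield row[len(prefix):]


def lookup_subjects(spo_rows: list[str], subjects: Iterable[str]) -> Iterator[str]:
    """Look up each subject as its own "S|" prefix range; input order does not matter."""
    for s in subjects:
        yield from prefix_scan(spo_rows, f"{s}{SEP}")


def batched_spo_lookup(pos_rows: list[str], spo_rows: list[str],
                       predicate: str, obj: str, budget: int) -> Iterator[str]:
    """Gather at most `budget` subjects at a time, then query SPO for that batch."""
    subjects = subjects_for(pos_rows, predicate, obj)
    while batch := list(islice(subjects, budget)):  # memory bounded by `budget`
        yield from lookup_subjects(spo_rows, batch)
```

Because each subject becomes an independent prefix range, a batch never has to be sorted. In Accumulo terms, the analogous step would be handing a collection of ranges to a BatchScanner, which does not return entries in a guaranteed order anyway.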
+1 to Dylan's suggestions. Dropping the requirement to get a sorted list
of subjects for a given P-O pair would make a server-side filter much
easier. You can also play tricks like doing a "limited" deduplication
server-side: hold up to N subjects server-side to avoid running out of
memory, and then perform a final deduplication client-side.

> Cheers,
> Dylan
>
> On Thu, Jul 6, 2017 at 3:05 AM, damodaram.sunda...@harman.com <damodaram.sunda...@harman.com> wrote:
>
>> Thanks for your reply, Dylan.
>>
>> *Are your range queries small enough to fit in memory?* Not likely. A
>> given condition on the POS table might return a few hundred thousand
>> results, since my table will be around 100M in size. Hence, I might not
>> be able to hold them all in memory for sorting, and I might run into
>> memory issues.
>>
>> My tables are built with the POS triple as the RowId, not in the column
>> family, since I am mapping each cell of my relational data to a single
>> row in Accumulo.
>>
>> The 'S values' will be used to query the SPO table, which is stored as
>> (Subject|Predicate|Object), with a prefix filter on S. If my subjects are
>> in sorted order, then I would not need to put much effort into querying
>> with an "ordered set of subjects".
>>
>> --
>> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Sorted-RowId-suffix-retrieval-using-Server-Side-Iterators-tp21787p21791.html
>> Sent from the Developers mailing list archive at Nabble.com.
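The "limited" server-side deduplication mentioned above can be sketched the same way. This is a minimal Python model, not an Accumulo iterator: `limited_dedup` stands in for the per-tablet server-side step, `max_held` plays the role of N, and clearing the held set when it fills is just one simple eviction policy (an LRU would be another). `client_dedup` does the exact final pass. All names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def limited_dedup(subjects: Iterable[str], max_held: int) -> Iterator[str]:
    """Hypothetical server-side step: drop duplicates using at most `max_held` entries.

    When the held set is full it is cleared, so a duplicate that straddles a
    reset can still be emitted; the client-side pass catches those."""
    if max_held < 1:
        raise ValueError("max_held must be at least 1")
    held: set[str] = set()
    for s in subjects:
        if s in held:
            continue  # duplicate caught server-side, never sent to the client
        if len(held) >= max_held:
            held.clear()  # bound memory; earlier subjects may now slip through again
        held.add(s)
        yield s


def client_dedup(streams: Iterable[Iterable[str]]) -> list[str]:
    """Hypothetical client-side step: exact deduplication across all server-side streams."""
    seen: set[str] = set()
    out: list[str] = []
    for stream in streams:
        for s in stream:
            if s not in seen:
                seen.add(s)
                out.append(s)
    return out
```

Server-side memory stays bounded by `max_held`, at the cost of letting some duplicates through. For example, `limited_dedup(["a", "a", "b", "c", "a", "b"], 2)` suppresses only the second `"a"`, which is why the final client-side pass is still needed.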