Re: [Maria-developers] Extending storage engine API for random-row extraction for histogram collection (and others)

Sergei Golubchik Tue, 11 Dec 2018 05:02:32 -0800

Hi, Vicențiu!

On Dec 11, Vicențiu Ciorbaru wrote:
> On Tue, 11 Dec 2018 at 14:33 Sergei Golubchik <s...@mariadb.org> wrote:
> >
> > But then I was thinking, why do you need to specify an index at all?
> > Shouldn't it be just "get me a random row"? Index or whatever -
> > that's an engine implementation detail. For example, MyISAM with
> > fixed-size rows can just read from
> > lseek(floor((file_size/row_size)*rand())*row_size).
> 
> I agree that the need for an index seems a bit much. My reasoning was
> that I wanted to allow random sampling on a particular range. This
> could help for example when one wants to collect histograms for a
> multi-distribution dataset, to get individual distributions (if the
> indexed column is able to separate them).
> 
> A more generic idea would be if one could pass some conditions for
> random row retrieval to the storage engine, but it feels like this
> would complicate storage engine implementation by quite a bit.
> 
> For the first iteration, after considering your input, I'd go with
> "init function", "get random row", "end function", without imposing an
> index, but somehow passing a (COND or similar) arg to the init
> function.
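
To make the fixed-size-row example quoted above concrete, here is a minimal sketch of the offset computation. The helper name random_row_offset and its parameters are hypothetical and not taken from MyISAM or the handler API. The sketch assumes the data file contains nothing but fixed-size rows, and it ignores slots that hold deleted rows, which a real engine would have to skip or retry.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not a MyISAM or MariaDB function): returns the byte
// offset of a uniformly chosen row in a file made up of fixed-size rows.
// A trailing partial row, if any, is never selected.
std::uint64_t random_row_offset(std::uint64_t file_size,
                                std::uint64_t row_size,
                                std::mt19937_64 &rng)
{
  assert(row_size > 0 && file_size >= row_size);

  // Integer division plays the role of floor(file_size / row_size).
  const std::uint64_t row_count = file_size / row_size;

  // Pick a row index in [0, row_count - 1], then scale it back to bytes,
  // so the result is always aligned to a row boundary.
  std::uniform_int_distribution<std::uint64_t> pick(0, row_count - 1);
  return pick(rng) * row_size;
}
```

Drawing an integer row index first, rather than multiplying a floating-point rand() by the file size, avoids rounding problems on very large files and guarantees the offset never points past the last complete row.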


For the first iteration I'd go without a condition. You probably
shouldn't add an API that you won't use, and in the first iteration you
won't use it, right? It can be added later when needed.
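
A rough sketch of what such a condition-free, three-call interface might look like follows. Every name in it (RandomRowSource, sample_init, sample_next, sample_end, VectorEngine, collect_sample) is made up for illustration and is not part of the existing handler API. The toy engine keeps its rows in memory and samples with replacement, which is an assumption of the sketch and not something decided in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical three-call sampling interface. No index and no condition
// argument: how a random row is found is left entirely to the engine.
class RandomRowSource
{
public:
  virtual ~RandomRowSource() = default;
  virtual int sample_init() = 0;                    // "init function"
  virtual int sample_next(std::int64_t *row) = 0;   // "get random row"
  virtual int sample_end() = 0;                     // "end function"
};

// Toy engine used only for illustration: each "row" is a single integer.
class VectorEngine : public RandomRowSource
{
public:
  VectorEngine(std::vector<std::int64_t> rows, std::uint64_t seed)
    : rows_(std::move(rows)), rng_(seed) {}

  int sample_init() override { active_ = true; return 0; }

  // Returns 0 on success, non-zero if sampling is not active or the
  // table is empty.
  int sample_next(std::int64_t *row) override
  {
    assert(row != nullptr);
    if (!active_ || rows_.empty())
      return 1;
    std::uniform_int_distribution<std::size_t> pick(0, rows_.size() - 1);
    *row = rows_[pick(rng_)];
    return 0;
  }

  int sample_end() override { active_ = false; return 0; }

private:
  std::vector<std::int64_t> rows_;
  std::mt19937_64 rng_;
  bool active_ = false;
};

// Caller side, e.g. a histogram collector: draw up to n rows.
std::vector<std::int64_t> collect_sample(RandomRowSource &src, std::size_t n)
{
  std::vector<std::int64_t> sample;
  if (src.sample_init() != 0)
    return sample;
  for (std::size_t i = 0; i < n; ++i)
  {
    std::int64_t row;
    if (src.sample_next(&row) != 0)
      break;
    sample.push_back(row);
  }
  src.sample_end();
  return sample;
}
```

If a condition turns out to be needed later, it could be added as an extra argument to the init call without changing the other two calls.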

Regards,
Sergei

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp
