On Thu, Jan 14, 2016 at 2:30 PM, Nathaniel Smith <n...@pobox.com> wrote:
> The reason I didn't suggest dask is that I had the impression that
> dask's model is better suited to bulk/streaming computations with
> vectorized semantics ("do the same thing to lots of data" kinds of
> problems, basically), whereas it sounded like the OP's algorithm
> needed lots of one-off unpredictable random access.
>
> Obviously even if this is true then it's useful to point out both
> because the OP's problem might turn out to be a better fit for dask's
> model than they indicated -- the post is somewhat vague :-).
>
> But, I just wanted to check, is the above a good characterization of
> dask's strengths/applicability?

Yes, dask is definitely designed around setting up a large streaming
computation and then executing it all at once. But it is pretty flexible
in terms of what those specific computations are, and can also work for
non-vectorized computation (especially via dask imperative). It's worth
taking a look at dask's collections for a sense of what it can do here.
The recently refreshed docs provide a nice overview:
http://dask.pydata.org/

Cheers,
Stephan
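P.S. For concreteness, here is a minimal sketch of the dask imperative style (exposed as `dask.delayed` in current releases). Wrapping plain Python functions with `delayed` makes calls build a task graph instead of running immediately, and the whole graph then executes in one `compute()` call. The `load`/`process`/`combine` functions are made up purely for illustration; this assumes dask is installed.

```python
# Sketch of non-vectorized computation via dask imperative (dask.delayed).
# Assumes dask is installed; load/process/combine are illustrative stand-ins
# for arbitrary one-off Python functions.
from dask import delayed

def load(i):
    return list(range(i))

def process(data):
    return sum(data)

def combine(results):
    return sum(results)

# Nothing runs here -- delayed() just records each call in a task graph.
partials = [delayed(process)(delayed(load)(i)) for i in range(1, 5)]
total = delayed(combine)(partials)

# The entire graph executes at once on compute().
print(total.compute())  # -> 10
```

The point is that the individual tasks need not be vectorized array operations; any Python function can be a node in the graph, and dask schedules the whole thing when you ask for the result.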
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion