Re: PRAM Distributed Sorting

Atri Sharma Tue, 14 Jul 2015 08:04:09 -0700

While I do agree with you in principle, I am not sure about the startup
costs and node transfer costs.


This is pretty experimental so I might be reinventing the wheel :)
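
To make the chunk-sort idea from the quoted proposal below concrete, here is
a minimal sketch in plain Python. It does not use any Ignite API: `chunks`
and `chunk_sort` are hypothetical names, a thread pool stands in for the
worker nodes, and an in-memory iterable stands in for the data streamer. Each
chunk is sorted independently and the sorted runs are then combined with a
k-way merge, which plays the role of the "reduce" step suggested below. It
says nothing about startup or node transfer costs.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(stream, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def chunk_sort(stream, chunk_size=4, workers=4, key=None):
    """Sort each chunk on a worker thread, then k-way merge the sorted runs.

    Assumption: every sorted run fits in memory; a real engine would
    spill runs to a cache or to disk instead of keeping a list.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is sorted independently of the others.
        runs = list(pool.map(lambda c: sorted(c, key=key),
                             chunks(stream, chunk_size)))
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return heapq.merge(*runs, key=key)


if __name__ == "__main__":
    print(list(chunk_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
```

The merge is lazy, so results can be handed back to the caller as a stream
rather than as one large list. In CPython the threads mostly illustrate the
structure; real parallel speedup would need processes or separate nodes.
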
On 14 Jul 2015 19:18, "Gianfranco Murador" <[email protected]>
wrote:

> I believe that a "reduce" function is appropriate for this type of task
> and is generic enough to sort by any criterion.
> Maybe I'm wrong, but that's just my opinion.
> Regards,
>   Gianfranco
>
> 2015-07-14 15:11 GMT+02:00 Atri Sharma <[email protected]>:
>
> > So, consider a relational database, like Postgres. A major component of
> > sorting performance comes from the in-memory sorting that happens for
> > this case. Normally, something like an external sort would be used in
> > conjunction with the disk files. However, a big data analytical
> > production use case has this requirement: the memory available to
> > Postgres for sorting is pretty huge *but* so is the data, the response
> > time has to be really fast and, oh, the data has to be streamed from the
> > database given certain events.
> >
> > So what I was thinking was on these lines:
> >
> > 1) Add a sorting module to the engine.
> > 2) Allow the sorting module to get the data streamed through data
> > streamers.
> > 3) Give sorting module access to the cache.
> > 4) Make a sort API which an external engine can use to push a sort into
> > Ignite in chunks, using streamers to stream the data, distributing the
> > sort across multiple threads, and getting sorted results back.
> >
> > Note: This is actually more of a use case for Ignite. The reasons I
> > proposed adding it to core were: 1) Direct interaction with the data
> > streamer and cache is needed. 2) It would be a good use case demo. 3) It
> > might allow Ignite to be used as a pure-play sorting engine, thus allowing
> > existing databases to work with it.
> >
> > Thoughts?
> >
> > On Tue, Jul 14, 2015 at 4:49 PM, Gianfranco Murador <
> > [email protected]> wrote:
> >
> > > I would say that in the case of a distributed algorithm, complexity
> > > lies not only in the size of the input data but also, and even more, in
> > > the number of messages exchanged between nodes to achieve the result.
> > > I agree with maintaining a certain principle of locality for related
> > > data, or leaving this task to a system that already has a data model
> > > suitable for scaling sorting (RDBMS?).
> > > Regards,
> > > Gianfranco
> > >
> > >
> > > 2015-07-14 12:14 GMT+02:00 Atri Sharma <[email protected]>:
> > >
> > > > Hi Roman,
> > > >
> > > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik <[email protected]> wrote:
> > > >
> > > > > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[email protected]> wrote:
> > > > >
> > > > >
> > > > > What's the interconnect for this system?
> > > > >
> > > >
> > > > Not sure I got what you meant here.
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Atri
> > > > *l'apprenant*
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
> >
>
