While I do agree with you in principle, I am not sure about the startup costs and node transfer costs.
This is pretty experimental, so I might be reinventing the wheel :)

On 14 Jul 2015 19:18, "Gianfranco Murador" <[email protected]> wrote:

> I believe that a "reduce" function is appropriate for this type of task
> and is generic enough to sort by any criteria.
> Maybe I'm wrong, but that's just my opinion.
> Regards,
> Gianfranco
>
> 2015-07-14 15:11 GMT+02:00 Atri Sharma <[email protected]>:
>
> > So, consider a relational database, like postgres. A major component of
> > sorting performance comes from the in-memory sorting that happens for
> > this case. Normally, something like an external sort would be used in
> > conjunction with the disk files. However, a big data analytical
> > production use case has this requirement that the available memory to
> > postgres for sorting is pretty huge *but* so is the data and the
> > response time has to be really fast and oh, the data has to be streamed
> > from the database given certain events.
> >
> > So what I was thinking was on these lines:
> >
> > 1) Add a sorting module to the engine.
> > 2) Allow the sorting module to get the data streamed through data
> > streamers.
> > 3) Give sorting module access to the cache.
> > 4) Make a sort API which can be used by an external engine to chunk sort
> > into ignite, using streamers to stream data and distribute sort across
> > multiple threads, and give sorted results back.
> >
> > Note: This is actually more of a use case for Ignite. The reasons I
> > proposed adding it to core were: 1) Since direct interaction with data
> > streamer and cache is needed. 2) It would be a good use case demo. 3) It
> > might allow Ignite to be used as a pure play sorting engine thus allowing
> > existing databases to work with it.
> >
> > Thoughts?
> >
> > On Tue, Jul 14, 2015 at 4:49 PM, Gianfranco Murador <
> > [email protected]> wrote:
> >
> > > I would say that in the case of a distributed algorithm, complexity
> > > lies not only in the number of input data, but also, and more so, in
> > > the number of messages exchanged between nodes to achieve the result.
> > > I agree to maintain a certain principle of locality for related data,
> > > or leave this task to a system that already has a data model suitable
> > > to scale sorting (RDBMS?).
> > > Regards,
> > > Gianfranco
> > >
> > > 2015-07-14 12:14 GMT+02:00 Atri Sharma <[email protected]>:
> > >
> > > > Hi Roman,
> > > >
> > > > On Tue, Jul 14, 2015 at 12:32 AM, Roman Shaposhnik <[email protected]>
> > > > wrote:
> > > >
> > > > > On Sun, Jul 12, 2015 at 11:41 PM, Atri Sharma <[email protected]>
> > > > > wrote:
> > > > >
> > > > > What's the interconnect for this system?
> > > >
> > > > Not sure I got what you meant here.
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Atri
> > > > *l'apprenant*
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
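
PS: To make the "chunk sort" idea from point 4 of the quoted proposal a bit more concrete, here is a rough single-JVM sketch. It uses only the plain JDK and no Ignite API at all: the class and method names (ChunkSortSketch, chunkSort, merge) are made up for illustration, threads stand in for nodes, and it says nothing about the startup or node transfer costs I mentioned above. Each chunk is sorted on its own thread, then the sorted runs are combined with a k-way merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: not part of Ignite, plain JDK only.
public class ChunkSortSketch {

    /** Splits data into chunks, sorts each chunk on its own thread, then merges the sorted runs. */
    static int[] chunkSort(int[] data, int chunks) throws Exception {
        if (data.length == 0) return new int[0];
        chunks = Math.max(1, Math.min(chunks, data.length));
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            int size = (data.length + chunks - 1) / chunks; // ceiling division
            List<Future<int[]>> futures = new ArrayList<>();
            for (int start = 0; start < data.length; start += size) {
                final int from = start;
                final int to = Math.min(start + size, data.length);
                // Each task copies its slice and sorts it independently.
                futures.add(pool.submit(() -> {
                    int[] chunk = Arrays.copyOfRange(data, from, to);
                    Arrays.sort(chunk);
                    return chunk;
                }));
            }
            List<int[]> runs = new ArrayList<>();
            for (Future<int[]> f : futures) runs.add(f.get());
            return merge(runs, data.length);
        } finally {
            pool.shutdown();
        }
    }

    /** K-way merge of already sorted runs using a min-heap. */
    static int[] merge(List<int[]> runs, int total) {
        // Heap entries are {value, runIndex, positionInRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (runs.get(r).length > 0) heap.add(new int[] {runs.get(r)[0], r, 0});
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[n++] = top[0];
            int[] run = runs.get(top[1]);
            int next = top[2] + 1;
            // Refill the heap from the run the smallest element came from.
            if (next < run.length) heap.add(new int[] {run[next], top[1], next});
        }
        return out;
    }
}
```

In a distributed version the merge step is where the message exchange Gianfranco mentioned would show up, since every sorted run has to travel to whoever does the merging.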
