Joel, is this ticket an attempt to solve that? SOLR-7560

On Wed, May 20, 2015 at 11:08 PM, Joel Bernstein <[email protected]> wrote:

> The Streaming Expressions language is a DSL to process docs and emit
> processed data. The parallel SQL engine will also fit into this category.
> Both of these languages compile to the Streaming API which is basically a
> real-time map-reduce framework that runs on SolrCloud worker nodes.
>
> The Streaming API has excellent data locality for a Map-Reduce engine
> because it performs the map stage and sorting and partitioning of result
> sets inside of Solr before tuples are streamed.  Sorted and partitioned
> tuples are then sent directly to the correct worker nodes to be reduced.
> The Streaming API doesn't follow a strict map/reduce model though. Streams
> are merged and manipulated by wrapping decorator streams around each other.
> So the Streaming API is much more flexible than old-style map/reduce.
>
> But the Streaming API is not designed for parallel iterative algorithms
> like gradient descent. For the parallel iterative case it's much faster to
> leave the data in place and run an embedded algorithm inside of Solr.
>
> At this point data must cross the network if you have multiple worker
> nodes.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 20, 2015 at 5:57 PM, Noble Paul <[email protected]> wrote:
>
>>
>>
>> On Wed, May 20, 2015 at 10:17 PM, Yonik Seeley <[email protected]> wrote:
>>
>>> On Wed, May 20, 2015 at 12:04 PM, Noble Paul <[email protected]>
>>> wrote:
>>> >
>>> > On Wed, May 20, 2015 at 8:41 PM, Yonik Seeley <[email protected]>
>>> wrote:
>>> >>
>>> >> On Wed, May 20, 2015 at 11:06 AM, Noble Paul <[email protected]>
>>> wrote:
>>> >> > The problem with streaming is data locality. Data needs to be
>>> >> > transferred across the network to do the processing.
>>> >>
>>> >> Nothing saying that you can't process data before it's streamed out,
>>> >> right?
>>> >
>>> > Yes, if our query language is expressive enough. Sometimes you need a
>>> > little programming language to achieve that.
>>>
>>> Right - and different languages can go on top of the base streaming
>>> stuff... either before or after the streaming step.
>>> There's no reason we can't stream derived data - it doesn't need to be
>>> just documents.
>>>
>> Yes, but is there a way to do it now? If we can have a DSL which can
>> process docs and emit the processed data, then the streaming API may be
>> able to do without data locality.
>>
>> I guess the streaming API runs as a standalone program. Can it not be
>> running somewhere in the Solr cluster itself?
>>
>>>
>>> -Yonik
>>>
>>>
>>>
>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul
>>
>
>


-- 
-----------------------------------------------------
Noble Paul
