Re: [Architecture] [C5] Spark/Lucene Integration in Stream Processor

Sachith Withana Sun, 23 Oct 2016 21:34:20 -0700

Hi all,

We will have to pack the Spark and Solr connectors into the product anyway.
But packing the binaries is optional, as you have mentioned.
It makes sense to offer them as separate downloadables.


But for the internal product analytics, if we include the binaries as
well, it could lead to a bigger pack (Spark alone is ~190 MB).
And when we ship the products with internal analytics, the total pack size
can become huge.

But if we separate the binaries from the internal analytics, it would not
play well with the story of analytics working out of the box.

In that case, we'll have to have analytics for the products as separate
downloadables.
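
To make the optional-binaries idea a bit more concrete, here is a rough
sketch of how a centralized launcher could behave: start the Spark processes
only if the separately downloaded binaries are present, and then start the
stream processor. Everything named here is an assumption for illustration:
the `spark` directory, the `bin/stream-processor.sh` script and the master
URL are hypothetical, and the `sbin/start-master.sh` / `sbin/start-slave.sh`
names follow the Spark standalone scripts as I remember them, so please
check them against the Spark version we end up shipping.

```python
import os
import subprocess
from typing import Callable, List

# All paths and names below are assumptions made for illustration only.
SPARK_HOME = "spark"                                # hypothetical dir of the optional Spark download
STREAM_PROCESSOR_CMD = ["bin/stream-processor.sh"]  # hypothetical server start script
SPARK_MASTER_URL = "spark://localhost:7077"         # assumed standalone master URL


def spark_commands(spark_home: str) -> List[List[str]]:
    """Commands that would start a standalone Spark master and one worker."""
    return [
        [os.path.join(spark_home, "sbin", "start-master.sh")],
        [os.path.join(spark_home, "sbin", "start-slave.sh"), SPARK_MASTER_URL],
    ]


def startup_plan(spark_home: str = SPARK_HOME,
                 is_installed: Callable[[str], bool] = os.path.isdir) -> List[List[str]]:
    """Return the commands to run, in order.

    Spark comes first, but only when its binaries are actually present;
    the stream processor is always started last.
    """
    plan: List[List[str]] = []
    if is_installed(spark_home):
        plan.extend(spark_commands(spark_home))
    plan.append(list(STREAM_PROCESSOR_CMD))
    return plan


def launch(plan: List[List[str]],
           run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Run each command in order; `run` can be swapped out for a dry run."""
    for cmd in plan:
        run(cmd)
```

Calling `launch(startup_plan(), run=print)` would only print the planned
commands, which is a harmless way to see what a given pack would start.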

Thanks,
Sachith


On Sun, Oct 23, 2016 at 8:18 PM, Niranda Perera <nira...@wso2.com> wrote:

> +1 for this approach. This would be a much cleaner way to integrate with
> Spark.
> So now, rather than trying to customize Spark to work with our own
> clustering, we can focus on a more generic approach and then maybe
> contribute it to the community as well!
>
> @Nirmal & Suho,
> I still think we would need the Spark binaries at runtime. It's just that
> we would not have to meddle with the internals of Spark clustering etc.,
> which we are handling internally at the moment.
>
> On Sat, Oct 22, 2016 at 2:48 PM, Sriskandarajah Suhothayan <s...@wso2.com>
> wrote:
>
>>
>>
>> On Sat, Oct 22, 2016 at 10:45 AM, Nirmal Fernando <nir...@wso2.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Oct 21, 2016 at 2:00 PM, Anjana Fernando <anj...@wso2.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> So we are starting on porting the earlier DAS-specific functionality to
>>>> C5. With this, we are planning not to embed the Spark server
>>>> functionality in the primary binary itself, but rather to run it
>>>> separately from another script in the same distribution. So basically,
>>>> when running the server in standalone mode, a centralized script will
>>>> start the Spark processes and then the main stream processor server. And
>>>> in a clustered setup, we will start the Spark processes separately and
>>>> use the clustering that is native to Spark, which currently works by
>>>> integrating with ZooKeeper.
>>>>
>>>
>>> Does this mean we still keep Spark binaries inside Stream Processor? If
>>> not, how are we planning to start a Spark process from Stream Processor?
>>>
>>
>> We don't need to have Spark binaries in Stream Processor, and I believe
>> it's wrong to, as it's not part of its core functionality. But when it
>> comes to Product Analytics we may ship them. We need to decide on that.
>>
>>
>>>> So basically, for the minimum H/A setup, we would need two stream
>>>> processing nodes and also ZK to build up the cluster, if we are also
>>>> using Spark. With C5, since we are anyway not using Hazelcast, we can
>>>> use ZK for other general coordination operations as well, since it is
>>>> already a requirement for Spark. And we have the added benefit of
>>>> avoiding the issues that come with a peer-to-peer coordination library,
>>>> such as split-brain scenarios.
>>>>
>>>> Also, aligning with the above approach, we are considering integrating
>>>> directly with Solr running external to the stream processor, rather than
>>>> doing the indexing in embedded mode. Even now in DAS, we have a
>>>> separate indexing mode (profile), so rather than using that, we can use
>>>> Solr directly. One of the main reasons for using it is that it has
>>>> additional functionality over base Lucene: it provides OOTB features
>>>> such as aggregates, for which we don't have full functionality at the
>>>> moment. So the suggestion is that Solr will also come as a separate
>>>> profile (script) with the distribution, and it will be started up if
>>>> indexing scenarios are required for the stream processor; we can either
>>>> start it automatically or start it selectively. Solr clustering is
>>>> also done with ZK, which we will have anyway with the new Spark
>>>> clustering approach we are using.
>>>>
>>>> So the aim of running the non-WSO2-specific servers externally, without
>>>> embedding them, is the simplicity it brings to our codebase, since we do
>>>> not have to maintain the integration code that is required to embed
>>>> them, and those servers can use their own recommended deployment
>>>> patterns. For example, Spark isn't designed to be embedded into other
>>>> servers, so we had to mess around with some things to embed and cluster
>>>> it internally. Also, upgrading such dependencies becomes very
>>>> straightforward, since they are external to the base binary.
>>>>
>>>> Cheers,
>>>> Anjana.
>>>> --
>>>> *Anjana Fernando*
>>>> Associate Director / Architect
>>>> WSO2 Inc. | http://wso2.com
>>>> lean . enterprise . middleware
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>>
>>>
>>
>>
>> --
>>
>> *S. Suhothayan*
>> Associate Director / Architect & Team Lead of WSO2 Complex Event
>> Processor
>> *WSO2 Inc.* http://wso2.com
>> lean . enterprise . middleware
>>
>>
>> *cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ |
>> twitter: http://twitter.com/suhothayan | linked-in:
>> http://lk.linkedin.com/in/suhothayan*
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
> https://pythagoreanscript.wordpress.com/
>


-- 
Sachith Withana
Software Engineer; WSO2 Inc.; http://wso2.com
E-mail: sachith AT wso2.com
M: +94715518127
Linked-In: https://lk.linkedin.com/in/sachithwithana
