Hello Val

First of all, thanks for your answer. Let me explain our use case.

*What we are doing*

Our company provides a monitoring solution for machines in the manufacturing
industry. A hardware logger attached to each machine collects up to 6
different metrics (like power or piece count). These metrics are sampled
once per second and sent to our cloud every minute. The data is currently
stored in a Cassandra cluster.

*Doing the math*

One metric sampled once per second generates about 31.5 million data points
per year, so all six metrics together come to roughly 190 million data
points per machine per year. Given that we have about 2,000 machines out
there, it's obvious that we are talking about terabytes of metric data.
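
To make that concrete (assuming, as a rough guess on my side, about 16
bytes per stored point):

  1 point/s x 86,400 s/day x 365 days  ≈ 31.5 million points per metric per year
  6 metrics x 31.5 million             ≈ 190 million points per machine per year
  2,000 machines x 190 million         ≈ 380 billion points per year
  380 billion points x ~16 bytes       ≈ 6 TB of raw metric data per year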

*The goal*

We need to run analytics on this data to provide reports for our customers.
That requires all kinds of transformations, filtering and joins on the
data, and we also need support for secondary indexes and grouping! This is
why we chose Spark for the job. We want to speed up the Spark calculations
with Ignite to provide a better experience for our customers.

My idea was to use Ignite as a read-through cache in front of our Cassandra
cluster and combine it with Spark SQL. The data for a calculation should
only stay in the cache while the calculation runs and can easily be
discarded afterwards.
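
Roughly what I have in mind, as an untested sketch (the cache name
"metrics", the MetricSample type and its fields are invented for
illustration, and the Cassandra store wiring is only indicated by a
comment):

import org.apache.ignite.cache.query.annotations.QuerySqlField
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

import scala.annotation.meta.field

// Hypothetical value type for one sample; the fields are annotated so
// Ignite can expose them to SQL.
case class MetricSample(
    @(QuerySqlField @field)(index = true) machineId: String,
    @(QuerySqlField @field)(index = true) metric: String,
    @(QuerySqlField @field) ts: Long,
    @(QuerySqlField @field) reading: Double)

object MetricsReport {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("metrics-report"))

    // Starts an embedded Ignite node on each Spark worker; the closure
    // builds the node configuration.
    val ic = new IgniteContext(sc, () => {
      val cacheCfg =
        new CacheConfiguration[java.lang.Long, MetricSample]("metrics")
      cacheCfg.setReadThrough(true) // key lookups fall through to the store
      cacheCfg.setIndexedTypes(classOf[java.lang.Long], classOf[MetricSample])
      // Plug in the CassandraCacheStoreFactory from the ignite-cassandra-store
      // module here (data source + persistence settings) so cache misses are
      // loaded from Cassandra.
      new IgniteConfiguration().setCacheConfiguration(cacheCfg)
    })

    // Note: Ignite SQL only sees entries that are already in memory, so the
    // cache has to be warmed up (e.g. via IgniteCache#loadCache) before this
    // query returns anything; read-through alone only covers key lookups.
    val avgPower = ic.fromCache[java.lang.Long, MetricSample]("metrics")
      .sql("select machineId, avg(reading) from MetricSample " +
        "where metric = ? group by machineId", "power")
    avgPower.show()

    ic.close() // the cached data can be dropped once the report is produced
    sc.stop()
  }
}

With this setup the data would only live in Ignite while the job runs,
which matches the discard-after-calculation idea.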


Now I need some information on how to set up my cluster correctly for that
use case. I don't know how many nodes I need, how much RAM they should
have, or whether I should put my Ignite nodes on the Spark workers or
create a separate cluster. I need this information for cost estimates.

Hope that helps a bit

Thanks

2017-09-22 5:12 GMT+02:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:

> Hello Patrick,
>
> See my comments below.
>
> Most of your questions don't have a generic answer and would heavily
> depend on your use case. Would you mind giving some more details about it
> so that I can give more specific suggestions?
>
> -Val
>
> On Thu, Sep 21, 2017 at 8:24 AM, Patrick Brunmayr <
> patrick.brunm...@kpibench.com> wrote:
>
>> Hello
>>
>>
>>    - What is currently the best practice of deploying Ignite with Spark?
>>
>>
>>    - Should the Ignite node sit on the same machine as the Spark
>>    executor?
>>
>>
> Ignite can run either on the same boxes where Spark runs or as a separate
> cluster, and both approaches have their pros and cons.
>
>
>> According to this documentation
>> <https://spark.apache.org/docs/latest/hardware-provisioning.html> Spark
>> should be given 75% of the machine memory, but what is left for Ignite then?
>>
>>> In general, Spark can run well with anywhere from *8 GB to hundreds of
>>> gigabytes* of memory per machine. In all cases, we recommend allocating
>>> only at most 75% of the memory for Spark; leave the rest for the operating
>>> system and buffer cache.
>>
>>
> The documentation states that you should give *at most* 75% to make sure
> the OS has a safe cushion for its own purposes. If Ignite runs alongside
> Spark, the amount of memory allocated to Spark should of course be less
> than that maximum.
>
>
>>
>>    - Don't they battle for memory?
>>
>>
> You should configure both Spark and Ignite so that they never try to
> consume more memory than is physically available, also leaving some for
> the OS. This way there will be no conflict.
>
>>
>>    - Should I give the memory to Ignite or Spark?
>>
>>
> Again, this heavily depends on the use case and on how intensively you use
> both Spark and Ignite.
>
>
>>    - Would Spark even benefit from Ignite if the Ignite nodes were
>>    hosted on other machines?
>>
>>
> There are definitely use cases where this can be useful, although in
> others it is better to run Ignite separately.
>
>
>>
>> We currently have hundreds of GB for analytics, and we want to use
>> Ignite to speed things up.
>>
>> Thank you
>>
>
