Ignite and Spark for deep learning

2019-01-02 Thread mehdi sey
Hi. Both platforms (Spark and Ignite) use memory for computing. Instead of
loading data into the Ignite cache, we can also load data into Spark memory and
cache it on the Spark nodes. If we can do this (caching on Spark nodes), why
would we load data into the Ignite cache? Does loading into the Ignite cache
only have benefits for sharing RDDs between Spark jobs and for indexed queries?
I want to integrate Spark and Ignite as a deep learning platform, with DL4J
(Deep Learning 4 Java) as the deep learning framework. I want to run DL4J on
the Spark nodes and integrate those nodes with Ignite. Is there any speedup in
this idea? If I use Ignite, can I use it only to cache data for Spark, or can
I use Spark and Ignite as processing engines simultaneously?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite Persistence

2019-01-02 Thread Denis Magda
Hello,

Please consider applying the generic optimization techniques for the
persistence layer:
https://apacheignite.readme.io/docs/durable-memory-tuning

In the meantime:

   - Keep investigating the cause of the GC pause unless you are 100% sure
   it's caused by rebalancing.
   - Increase IgniteConfiguration.failureDetectionTimeout to 20 seconds to
   prevent nodes from being shut down on long GC pauses.
   - Have you tuned your JVM settings?
   https://apacheignite.readme.io/docs/jvm-and-system-tuning
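
For reference, the failureDetectionTimeout recommendation translates to a
single property in the standard Spring XML configuration; a minimal sketch
(untested, verify against your Ignite version):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Tolerate pauses of up to 20 seconds (in ms) before a node is
         considered failed and dropped from the topology. -->
    <property name="failureDetectionTimeout" value="20000"/>
</bean>
```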

As for FSYNC vs. LOG_ONLY, the former protects you from global cluster
outages where all the nodes go down at the same time. If that is an unlikely
event in your situation, it's OK to relax the mode to LOG_ONLY as long as
you have backup copies on other nodes.
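
If relaxing the mode is acceptable, the WAL mode is a single property on the
data storage configuration; a hedged sketch using standard Ignite 2.x names
(untested, verify against your version):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <!-- LOG_ONLY flushes WAL records to the OS on commit: it survives
                 a process crash, but not a simultaneous power loss on all
                 nodes holding the data. -->
            <property name="walMode" value="LOG_ONLY"/>
        </bean>
    </property>
</bean>
```

Pair this with at least one backup copy per partition so that a single-node
OS crash cannot lose committed data.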

--
Denis

On Tue, Jan 1, 2019 at 8:23 PM Ignite Enthusiast wrote:

> Question on Ignite Persistence:
>
> On a deployed Ignite (3-node) cluster, I see one node being taken out
> of the cluster because it encounters GC pauses. Worse, when this node
> leaves the cluster, a Rebalance is initiated (and re-initiated when the
> node joins back).
>
> Note: Data that Ignite Cluster holds is fully transactional. We cannot put
> up with Data Loss.
>
> From the logs :
>
> [14:32:01,643][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager]
> Copied file
> [src=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0006.wal,
> dst=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0306.wal]
>
> [14:32:02,830][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager]
> Starting to copy WAL segment [absIdx=307, segIdx=7,
> origFile=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0007.wal,
> dstFile=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0307.wal]
>
> [14:32:17,999][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
> too long JVM pause: 15044 milliseconds.
>
> It is clear that WAL writes (FSYNC in this case) always precede GC Pauses.
>
> Question:
>
> The only advantage of FSYNC vs. LOG_ONLY seems to be surviving OS-level
> crashes. With a journaled filesystem like ext4, do I really need FSYNC?
> Can't I get by with LOG_ONLY?
>
> If not, how do I minimise the perf bottlenecks of using FSYNC?
>


Re: Remote debugging

2019-01-02 Thread Denis Magda
Check the following pages:

   - Ignite tooling: https://apacheignite-sql.readme.io/docs/sql-tooling
   - Performance and debugging:
   https://apacheignite-sql.readme.io/docs/performance-and-debugging
   - Ignite console: console.gridgain.com

--
Denis

On Wed, Jan 2, 2019 at 5:49 AM Lokesh Sharma wrote:

> Is there a way to query Ignite database that is deployed remotely?
>
> I use H2 Debug Console locally but this doesn't work for remote purposes.
>


Re: ignite questions

2019-01-02 Thread Denis Magda
Yes, a custom affinity function is what you need to control entry
distribution across physical machines. It's feasible to do. I worked with
one of Ignite's customers who did something similar for their needs; that
code is not open source.
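
The attribute-matching idea can be sketched outside the Ignite API. The class
and method below are purely illustrative, not Ignite types (a real
implementation would implement org.apache.ignite.cache.affinity.AffinityFunction
and read attributes via ClusterNode); they only demonstrate the assignment logic:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- NOT the Ignite API.
public class RegionAffinity {
    /**
     * Picks the owning node for an affinity key. {@code nodes} maps a node id
     * to the value of a hypothetical "REGION" attribute set in that node's
     * XML config (null when the attribute is absent).
     */
    public static String assignNode(String affinityKey, Map<String, String> nodes) {
        // Dedicated node first: a node whose REGION attribute matches the key.
        for (Map.Entry<String, String> e : nodes.entrySet())
            if (affinityKey.equals(e.getValue()))
                return e.getKey();

        // Otherwise fall back to deterministic hash-based distribution.
        List<String> ids = new ArrayList<>(nodes.keySet());
        Collections.sort(ids);
        return ids.get(Math.abs(affinityKey.hashCode()) % ids.size());
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new HashMap<>();
        nodes.put("node1", "India"); // the super-sized VM from the thread
        nodes.put("node2", null);
        nodes.put("node3", null);
        System.out.println(assignNode("India", nodes)); // prints "node1"
    }
}
```

The fallback branch matters: keys with no dedicated node must still map
somewhere deterministically, or the nodes cannot agree on partition ownership.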

--
Denis

On Wed, Jan 2, 2019 at 10:17 AM Mikael  wrote:

> Hi!
>
> By default you cannot assign a specific affinity key to a specific node,
> but I think that could be done with a custom affinity function; you can do
> pretty much whatever you want with that. For example, set an attribute in
> the XML file and use it to match a specific affinity key value, so that a
> node with attribute x is assigned all affinity keys with value y.
>
> I never tried it but I do not see any reason why it would not work.
>
> Mikael
>
>
> Den 2019-01-02 kl. 17:13, skrev Clay Teahouse:
>
> Thanks Mikael.
>
> I did come across that link before, but I am not sure it addresses my
> concern. I want to see how I need to size my physical VMs based on affinity
> keys. How would I say, for the India affinity key, use this super-sized VM,
> and for the others use the smaller ones, so the data doesn't get shuffled
> around? Maybe there is no way, and I just have to wait for Ignite to
> rebalance the partitions and fit things where they should be based on the
> affinity key.
>
> On Wed, Jan 2, 2019 at 8:32 AM Mikael  wrote:
>
>> You can find some information about capacity planning here:
>>
>> https://apacheignite.readme.io/docs/capacity-planning
>>
>> About your India example you can use affinity keys to keep data together
>> in groups to avoid network traffic.
>>
>> https://apacheignite.readme.io/docs/affinity-collocation
>>
>> Mikael
>> Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:
>>
>> Thanks Naveen.
>>
>> -- Cache Groups: When should I start considering cache groups? My system
>> is growing, and sooner or later I will have to add caches, so I need to
>> know: 1) should I start grouping now (I'd think yes)? 2) If not, at what
>> number of caches?
>> -- Capacity Planning: So, there are no guidelines on how to size the nodes
>> and the physical storage they reside on? How do I make sure all the
>> related data fits on the same VM? It can't be the case that I have to come
>> up with hundreds of super-sized VMs just because I have one instance with
>> a huge set of entries. For example, if I have millions of entries for
>> India and only a few for other countries, how do I make sure all the
>> India-related data fits on the same VM (to avoid the network) and have the
>> data for all the small countries fit on the same VM?
>> -- Pinning data to the cache: does data pinned to the on-heap cache never
>> get evicted from memory? I want to see if there is something similar to
>> Oracle's memory pinning.
>> -- Read-through: How do I know whether something is in the cache or on
>> disk (when using native persistence)?
>> 5) Service chaining: Is there an example of service chaining that you can
>> point me to?
>>
>> 6) How do I implement service pipelining in apache ignite? Would
>> continuous query be the mechanism? Any examples?
>>
>> 7) Streaming: Are there examples on how to define watermarks, i.e., input
>> completeness with regard to the event timestamp?
>>
>> thank you
>> Clay
>>
>> On Tue, Jan 1, 2019 at 11:29 PM Naveen  wrote:
>>
>>> Hello
>>> A couple of things I would like to share from my experience:
>>>
>>> 1. Cache Groups: With around 100 caches, I do not think you need to go for
>>> cache groups; as you mentioned, cache groups will have an impact on your
>>> reads/writes. However, changing the partition count to 128 from the
>>> default 1024 would improve your cluster restart time.
>>>
>>> 2. I doubt Ignite has any settings for this.
>>>
>>> 3. The only option I can think of is to keep the data on-heap if the data
>>> size is not huge.
>>>
>>> 4. Read-through: with native persistence enabled, a read will go to the
>>> disk and load the cache, but that read is much slower than a read from
>>> RAM, and by default Ignite does not pre-load the data. If you want to
>>> avoid this, you can pre-load the data programmatically into memory, which
>>> helps even SQL SELECTs. With 3rd-party persistence, you must pre-load the
>>> data to make your reads work for SQL SELECT.
>>>
>>> Thanks
>>> Naveen
>>>
>>>
>>>
>>>
>>


Re: Ignite cache read performance optimization

2019-01-02 Thread Denis Magda
Hello,

Please share the actual queries and the execution plan for each. Use
Ignite's "EXPLAIN SELECT" (not the H2 version).
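
As an illustration, with a hypothetical table mirroring the query shape from
the original post:

```sql
-- Run via the Ignite JDBC/ODBC driver or SQLLine, not the H2 debug console,
-- so the plan reflects Ignite's distributed execution:
EXPLAIN SELECT * FROM MyTable WHERE fieldA = ? AND fieldB = ?;
```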

--
Denis

On Wed, Jan 2, 2019 at 10:58 AM JayX  wrote:

> Hello!
>
> I have been working with Ignite and comparing its performance to traditional
> databases. However, the performance was not significantly better, contrary
> to my expectations. Any advice on how to improve the performance would be
> greatly appreciated!
>
> The dataset I have is around 10M records, stored on a cluster with 4
> machines. I'm mainly concerned with cache read performance as the service
> is
> expected to handle read requests frequently.
>
> I have been using Ignite's SqlQuery, which gives the best performance. The
> way I implement it is to generate a SQL string from the request parameters,
> create a SqlQuery instance with that string, use cache.query() to get a
> QueryCursor instance, and iterate through the QueryCursor to obtain the
> results. Everything is done following the Ignite API.
>
> The most commonly used sql string here is a simple string of conditions
> like
> "fieldA = a AND fieldB = b ...". I have also added individual indexes for
> each of the fields that are commonly queried on. The execution plan I
> observed in H2 debug console showed that the indexes were being used.
>
> I have pretty much tried everything in this link:
> https://apacheignite.readme.io/docs/performance-tips
> but none of it seems to help significantly.
>
> I also tried firing multiple queries on different partitions of the cache
> in parallel to increase IO throughput. However, I think this could
> potentially starve other requests to the same cache.
>
> In addition, using ScanQuery instead of SqlQuery made execution several
> times slower...
>
> I would really appreciate help on whether I have made any mistakes, and any
> suggestions on what I could try to see if it results in better performance.
> Thank you very much!
>
>
>
>


Re: Migrate from 2.6 to 2.7

2019-01-02 Thread Denis Magda
Are you using JDBC/ODBC drivers? Just want to know why it's hard to execute
SQL queries outside of transactions.

Can you switch to pessimistic transactions instead?

--
Denis

On Wed, Jan 2, 2019 at 7:24 AM whiteman  wrote:

> Hi guys,
>
> As far as I am concerned, this is breaking behaviour. In Apache Ignite
> 2.5 it was possible to have a SQL query inside an optimistic serializable
> transaction. The point here is that the SQL query might not be part of the
> transaction (no guarantees), but it was at least performed. In 2.7 this
> code won't work at all. The advice to move all SQL queries outside of
> transactions is not possible in the real world; it would greatly increase
> the complexity of the codebase. My question is whether there is a switch
> for enabling the pre-2.7 behaviour.
>
> Thanks,
> Cheers,
> D.
>
>
>
>


sql table names and annotations

2019-01-02 Thread Scott Cote
Is there a way to name a table differently from the type without resorting to
the use of DDL? I'm using the annotation approach to creating a table, and I
seem to be restricted to having the table named after the annotated value
class.
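
For context: outside of annotations, Ignite's QueryEntity-based configuration
does expose an explicit table name. A hedged sketch (the value class name is
hypothetical, and this is the XML/QueryEntity route rather than an annotation,
so it may not fit every setup):

```xml
<bean class="org.apache.ignite.cache.QueryEntity">
    <property name="keyType" value="java.lang.Long"/>
    <property name="valueType" value="com.example.PersonRecord"/>
    <!-- Decouples the SQL table name from the value class name. -->
    <property name="tableName" value="PEOPLE"/>
</bean>
```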


Ignite cache read performance optimization

2019-01-02 Thread JayX
Hello!

I have been working with Ignite and comparing its performance to traditional
databases. However, the performance was not significantly better, contrary to
my expectations. Any advice on how to improve the performance would be greatly
appreciated!

The dataset I have is around 10M records, stored on a cluster with 4
machines. I'm mainly concerned with cache read performance as the service is
expected to handle read requests frequently. 

I have been using Ignite's SqlQuery, which gives the best performance. The way
I implement it is to generate a SQL string from the request parameters, create
a SqlQuery instance with that string, use cache.query() to get a QueryCursor
instance, and iterate through the QueryCursor to obtain the results. Everything
is done following the Ignite API.

The most commonly used sql string here is a simple string of conditions like
"fieldA = a AND fieldB = b ...". I have also added individual indexes for
each of the fields that are commonly queried on. The execution plan I
observed in H2 debug console showed that the indexes were being used.

I have pretty much tried everything in this link:
https://apacheignite.readme.io/docs/performance-tips
but none of it seems to help significantly.

I also tried firing multiple queries on different partitions of the cache
in parallel to increase IO throughput. However, I think this could
potentially starve other requests to the same cache.

In addition, using ScanQuery instead of SqlQuery made execution several
times slower...

I would really appreciate help on whether I have made any mistakes, and any
suggestions on what I could try to see if it results in better performance.
Thank you very much!





Re: ignite questions

2019-01-02 Thread Mikael

Hi!

By default you cannot assign a specific affinity key to a specific node,
but I think that could be done with a custom affinity function; you can do
pretty much whatever you want with that. For example, set an attribute in
the XML file and use it to match a specific affinity key value, so that a
node with attribute x is assigned all affinity keys with value y.


I never tried it but I do not see any reason why it would not work.

Mikael


Den 2019-01-02 kl. 17:13, skrev Clay Teahouse:

Thanks Mikael.

I did come across that link before, but I am not sure it addresses my
concern. I want to see how I need to size my physical VMs based on affinity
keys. How would I say, for the India affinity key, use this super-sized VM,
and for the others use the smaller ones, so the data doesn't get shuffled
around? Maybe there is no way, and I just have to wait for Ignite to
rebalance the partitions and fit things where they should be based on the
affinity key.


On Wed, Jan 2, 2019 at 8:32 AM Mikael wrote:


You can find some information about capacity planning here:

https://apacheignite.readme.io/docs/capacity-planning

About your India example you can use affinity keys to keep data
together in groups to avoid network traffic.

https://apacheignite.readme.io/docs/affinity-collocation

Mikael

Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:

Thanks Naveen.

-- Cache Groups: When should I start considering cache groups? My system
is growing, and sooner or later I will have to add caches, so I need to
know: 1) should I start grouping now (I'd think yes)? 2) If not, at what
number of caches?
-- Capacity Planning: So, there are no guidelines on how to size the
nodes and the physical storage they reside on? How do I make sure all
the related data fits on the same VM? It can't be the case that I have
to come up with hundreds of super-sized VMs just because I have one
instance with a huge set of entries. For example, if I have millions of
entries for India and only a few for other countries, how do I make sure
all the India-related data fits on the same VM (to avoid the network)
and have the data for all the small countries fit on the same VM?
-- Pinning data to the cache: does data pinned to the on-heap cache
never get evicted from memory? I want to see if there is something
similar to Oracle's memory pinning.
-- Read-through: How do I know whether something is in the cache or on
disk (when using native persistence)?
5) Service chaining: Is there an example of service chaining that
you can point me to?

6) How do I implement service pipelining in apache ignite? Would
continuous query be the mechanism? Any examples?

7) Streaming: Are there examples on how to define watermarks,
i.e., input completeness with regard to the event timestamp?

thank you
Clay

On Tue, Jan 1, 2019 at 11:29 PM Naveen <naveen.band...@gmail.com> wrote:

Hello
A couple of things I would like to share from my experience:

1. Cache Groups: With around 100 caches, I do not think you need to go
for cache groups; as you mentioned, cache groups will have an impact on
your reads/writes. However, changing the partition count to 128 from
the default 1024 would improve your cluster restart time.

2. I doubt Ignite has any settings for this.

3. The only option I can think of is to keep the data on-heap if the
data size is not huge.

4. Read-through: with native persistence enabled, a read will go to the
disk and load the cache, but that read is much slower than a read from
RAM, and by default Ignite does not pre-load the data. If you want to
avoid this, you can pre-load the data programmatically into memory,
which helps even SQL SELECTs. With 3rd-party persistence, you must
pre-load the data to make your reads work for SQL SELECT.

Thanks
Naveen






Re: Do we require to set MaxDirectMemorySize JVM parameter?

2019-01-02 Thread colinc
Thanks for the responses. To summarise:

* JVM Heap (Xmx) - Not normally used by Ignite for caching data.
* MaxDirectMemorySize - Used by Ignite for some file operations but not for
caching data. As per above, 256m is usually sufficient.
* DataRegion maxSize - Used by Ignite to determine how much memory to
allocate using some combination of durable, swap and off-heap (Java unsafe)
RAM.

In my case, I have configured swap storage
(https://apacheignite.readme.io/docs/swap-space) but *not* Ignite durable
memory. If DataRegion maxSize is say 100GB and my physical RAM is 50GB then
the swap file will be 100GB but Ignite will also use some portion (<50GB) of
the available physical RAM for off-heap cache data storage.

My question is how to limit the size of this portion while still allowing
the DataRegion to specify a large swap file for use as an overflow for less
regularly accessed data.

For example, say I want my node to use 8GB of off-heap physical RAM and a
100GB swap file on a machine that has a total of 50GB of physical RAM (shared
with other processes).

What parameter would I use to configure the 8GB limit? Is it possible to
control this?
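
For reference, the setup described above is expressed through two properties
on the data region; a sketch with the example's numbers (region name is
illustrative, untested):

```xml
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
    <property name="name" value="bigRegion"/>
    <!-- Upper bound for the region; with swapPath set, this sizes the
         memory-mapped swap file rather than guaranteeing physical RAM. -->
    <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
    <property name="swapPath" value="/swapspace"/>
</bean>
```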

Thanks,
Colin.





Re: ignite questions

2019-01-02 Thread Clay Teahouse
Thanks Mikael.

I did come across that link before, but I am not sure it addresses my
concern. I want to see how I need to size my physical VMs based on affinity
keys. How would I say, for the India affinity key, use this super-sized VM,
and for the others use the smaller ones, so the data doesn't get shuffled
around? Maybe there is no way, and I just have to wait for Ignite to
rebalance the partitions and fit things where they should be based on the
affinity key.

On Wed, Jan 2, 2019 at 8:32 AM Mikael  wrote:

> You can find some information about capacity planning here:
>
> https://apacheignite.readme.io/docs/capacity-planning
>
> About your India example you can use affinity keys to keep data together
> in groups to avoid network traffic.
>
> https://apacheignite.readme.io/docs/affinity-collocation
>
> Mikael
> Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:
>
> Thanks Naveen.
>
> -- Cache Groups: When should I start considering cache groups? My system is
> growing, and sooner or later I will have to add caches, so I need to know:
> 1) should I start grouping now (I'd think yes)? 2) If not, at what number
> of caches?
> -- Capacity Planning: So, there are no guidelines on how to size the nodes
> and the physical storage they reside on? How do I make sure all the related
> data fits on the same VM? It can't be the case that I have to come up with
> hundreds of super-sized VMs just because I have one instance with a huge
> set of entries. For example, if I have millions of entries for India and
> only a few for other countries, how do I make sure all the India-related
> data fits on the same VM (to avoid the network) and have the data for all
> the small countries fit on the same VM?
> -- Pinning data to the cache: does data pinned to the on-heap cache never
> get evicted from memory? I want to see if there is something similar to
> Oracle's memory pinning.
> -- Read-through: How do I know whether something is in the cache or on disk
> (when using native persistence)?
> 5) Service chaining: Is there an example of service chaining that you can
> point me to?
>
> 6) How do I implement service pipelining in apache ignite? Would
> continuous query be the mechanism? Any examples?
>
> 7) Streaming: Are there examples on how to define watermarks, i.e., input
> completeness with regard to the event timestamp?
>
> thank you
> Clay
>
> On Tue, Jan 1, 2019 at 11:29 PM Naveen  wrote:
>
>> Hello
>> A couple of things I would like to share from my experience:
>>
>> 1. Cache Groups: With around 100 caches, I do not think you need to go for
>> cache groups; as you mentioned, cache groups will have an impact on your
>> reads/writes. However, changing the partition count to 128 from the
>> default 1024 would improve your cluster restart time.
>>
>> 2. I doubt Ignite has any settings for this.
>>
>> 3. The only option I can think of is to keep the data on-heap if the data
>> size is not huge.
>>
>> 4. Read-through: with native persistence enabled, a read will go to the
>> disk and load the cache, but that read is much slower than a read from
>> RAM, and by default Ignite does not pre-load the data. If you want to
>> avoid this, you can pre-load the data programmatically into memory, which
>> helps even SQL SELECTs. With 3rd-party persistence, you must pre-load the
>> data to make your reads work for SQL SELECT.
>>
>> Thanks
>> Naveen
>>
>>
>>
>>
>


Re: Migrate from 2.6 to 2.7

2019-01-02 Thread whiteman
Hi guys,

As far as I am concerned, this is breaking behaviour. In Apache Ignite
2.5 it was possible to have a SQL query inside an optimistic serializable
transaction. The point here is that the SQL query might not be part of the
transaction (no guarantees), but it was at least performed. In 2.7 this code
won't work at all. The advice to move all SQL queries outside of transactions
is not possible in the real world; it would greatly increase the complexity
of the codebase. My question is whether there is a switch for enabling the
pre-2.7 behaviour.

Thanks,
Cheers,
D.





Re: ignite questions

2019-01-02 Thread Mikael

You can find some information about capacity planning here:

https://apacheignite.readme.io/docs/capacity-planning

About your India example you can use affinity keys to keep data together 
in groups to avoid network traffic.


https://apacheignite.readme.io/docs/affinity-collocation

Mikael

Den 2019-01-02 kl. 14:44, skrev Clay Teahouse:

Thanks Naveen.

-- Cache Groups: When should I start considering cache groups? My
system is growing, and sooner or later I will have to add caches, so I
need to know: 1) should I start grouping now (I'd think yes)? 2) If
not, at what number of caches?
-- Capacity Planning: So, there are no guidelines on how to size the
nodes and the physical storage they reside on? How do I make sure all
the related data fits on the same VM? It can't be the case that I have
to come up with hundreds of super-sized VMs just because I have one
instance with a huge set of entries. For example, if I have millions
of entries for India and only a few for other countries, how do I make
sure all the India-related data fits on the same VM (to avoid the
network) and have the data for all the small countries fit on the same
VM?
-- Pinning data to the cache: does data pinned to the on-heap cache
never get evicted from memory? I want to see if there is something
similar to Oracle's memory pinning.
-- Read-through: How do I know whether something is in the cache or on
disk (when using native persistence)?
5) Service chaining: Is there an example of service chaining that you 
can point me to?


6) How do I implement service pipelining in apache ignite? Would 
continuous query be the mechanism? Any examples?


7) Streaming: Are there examples on how to define watermarks, i.e., 
input completeness with regard to the event timestamp?


thank you
Clay

On Tue, Jan 1, 2019 at 11:29 PM Naveen <naveen.band...@gmail.com> wrote:


Hello
A couple of things I would like to share from my experience:

1. Cache Groups: With around 100 caches, I do not think you need to go
for cache groups; as you mentioned, cache groups will have an impact on
your reads/writes. However, changing the partition count to 128 from
the default 1024 would improve your cluster restart time.

2. I doubt Ignite has any settings for this.

3. The only option I can think of is to keep the data on-heap if the
data size is not huge.

4. Read-through: with native persistence enabled, a read will go to the
disk and load the cache, but that read is much slower than a read from
RAM, and by default Ignite does not pre-load the data. If you want to
avoid this, you can pre-load the data programmatically into memory,
which helps even SQL SELECTs. With 3rd-party persistence, you must
pre-load the data to make your reads work for SQL SELECT.

Thanks
Naveen






Remote debugging

2019-01-02 Thread Lokesh Sharma
Is there a way to query Ignite database that is deployed remotely?

I use H2 Debug Console locally but this doesn't work for remote purposes.


Re: ignite questions

2019-01-02 Thread Clay Teahouse
Thanks Naveen.

-- Cache Groups: When should I start considering cache groups? My system is
growing, and sooner or later I will have to add caches, so I need to know:
1) should I start grouping now (I'd think yes)? 2) If not, at what number
of caches?
-- Capacity Planning: So, there are no guidelines on how to size the nodes
and the physical storage they reside on? How do I make sure all the related
data fits on the same VM? It can't be the case that I have to come up with
hundreds of super-sized VMs just because I have one instance with a huge
set of entries. For example, if I have millions of entries for India and
only a few for other countries, how do I make sure all the India-related
data fits on the same VM (to avoid the network) and have the data for all
the small countries fit on the same VM?
-- Pinning data to the cache: does data pinned to the on-heap cache never
get evicted from memory? I want to see if there is something similar to
Oracle's memory pinning.
-- Read-through: How do I know whether something is in the cache or on disk
(when using native persistence)?
5) Service chaining: Is there an example of service chaining that you can
point me to?

6) How do I implement service pipelining in apache ignite? Would continuous
query be the mechanism? Any examples?

7) Streaming: Are there examples on how to define watermarks, i.e., input
completeness with regard to the event timestamp?

thank you
Clay

On Tue, Jan 1, 2019 at 11:29 PM Naveen  wrote:

> Hello
> A couple of things I would like to share from my experience:
>
> 1. Cache Groups: With around 100 caches, I do not think you need to go for
> cache groups; as you mentioned, cache groups will have an impact on your
> reads/writes. However, changing the partition count to 128 from the default
> 1024 would improve your cluster restart time.
>
> 2. I doubt Ignite has any settings for this.
>
> 3. The only option I can think of is to keep the data on-heap if the data
> size is not huge.
>
> 4. Read-through: with native persistence enabled, a read will go to the
> disk and load the cache, but that read is much slower than a read from RAM,
> and by default Ignite does not pre-load the data. If you want to avoid
> this, you can pre-load the data programmatically into memory, which helps
> even SQL SELECTs. With 3rd-party persistence, you must pre-load the data to
> make your reads work for SQL SELECT.
>
> Thanks
> Naveen
>
>
>
>


Re: ignite-cassandra-store module has incorrect dependencies

2019-01-02 Thread Serg


Unfortunately, I could not just change my pom, because we use Ignite in
Docker and the module is part of the Docker image.
Of course, as a solution I can build my own image, but this is not very
convenient.

Also, you can check that the tests of the Cassandra module fail:
https://github.com/apache/ignite/tree/master/modules/cassandra. Even if I
update the dependencies, the tests cannot run embedded Cassandra in my
environment.

I updated the Cassandra driver to 3.6.0 and added netty-resolver directly in
the Docker image, which solved the problem.







Re: ignite-cassandra-store module has incorrect dependencies

2019-01-02 Thread Dmitriy Pavlov
Hi,

Apache Ignite 2.7 updated the versions of a number of dependencies. This
was done for security reasons, to remove possibly vulnerable versions of
components from the Apache Ignite distribution.

Perhaps you know: which version of Guava is compatible with the Apache
Cassandra store? Are there any newer versions of Cassandra compatible with
newer Guava?

I'm not an expert in Apache Cassandra, but one thing seems strange to me:
why does the stack trace with the exception start from com.datastax and not
from org.apache.cassandra?

As a workaround, you can always try to re-define Guava in your pom.

Sincerely,
Dmitriy Pavlov


On Wed, Jan 2, 2019 at 13:56, Serg wrote:

> Hi All
>
> I got exceptions in Ignite after updating to 2.7.0:
>
> 2019-01-02 10:09:52,824 ERROR [cassandra-cache-loader-#101]
> log4j.Log4JLogger (Log4JLogger.java:586) - Failed to execute Cassandra
> loadContactsCache operation
> class org.apache.ignite.IgniteException: Failed to execute Cassandra
> loadContactsCache operation
>
> Caused by: class org.apache.ignite.IgniteException: Failed to establish
> session with Cassandra database
> at
>
> org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl.session(CassandraSessionImpl.java:586)
> at
>
> org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl.execute(CassandraSessionImpl.java:394)
> ... 6 more
> Caused by: java.lang.NoSuchMethodError:
>
> com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
> at
> com.datastax.driver.core.policies.Policies$Builder.build(Policies.java:285)
> at
>
> com.datastax.driver.core.Cluster$Builder.getConfiguration(Cluster.java:1246)
> at com.datastax.driver.core.Cluster.<init>(Cluster.java:116)
> at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:181)
> at
> com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1264)
> at
>
> org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl.session(CassandraSessionImpl.java:581)
> ... 7 more
>
>
> Looks like the problem is in the parent dependencies, which pin Guava to
> version 25, but the cassandra-driver is incompatible with this version of
> Guava.
>
>
>
>
>
>
>
>


ignite-cassandra-store module has incorrect dependencies

2019-01-02 Thread Serg
Hi All 

I got exceptions in Ignite after updating to 2.7.0:

2019-01-02 10:09:52,824 ERROR [cassandra-cache-loader-#101]
log4j.Log4JLogger (Log4JLogger.java:586) - Failed to execute Cassandra
loadContactsCache operation
class org.apache.ignite.IgniteException: Failed to execute Cassandra
loadContactsCache operation

Caused by: class org.apache.ignite.IgniteException: Failed to establish
session with Cassandra database
at
org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl.session(CassandraSessionImpl.java:586)
at
org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl.execute(CassandraSessionImpl.java:394)
... 6 more
Caused by: java.lang.NoSuchMethodError:
com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
at
com.datastax.driver.core.policies.Policies$Builder.build(Policies.java:285)
at
com.datastax.driver.core.Cluster$Builder.getConfiguration(Cluster.java:1246)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:116)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:181)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1264)
at
org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl.session(CassandraSessionImpl.java:581)
... 7 more


It looks like the problem is in the parent dependencies, which pin Guava to
version 25. But the cassandra-driver is incompatible with this version of
Guava.
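A typical workaround for this kind of clash (a sketch, assuming a Maven build; the exact version below is an assumption) is to pin Guava to a release the DataStax driver still supports. `com.google.common.base.Objects.firstNonNull` was removed in Guava 21, so any version up to 20.0 still has the method:

```xml
<!-- Illustrative pom.xml fragment: force a Guava version that still
     contains com.google.common.base.Objects.firstNonNull
     (the method was removed in Guava 21). -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>19.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

The effective Guava version can be checked with `mvn dependency:tree -Dincludes=com.google.guava`.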







--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Loading data from Spark Cluster to Ignite Cache to perform Ignite ML

2019-01-02 Thread Stephen Darlington
Where does the data in your Spark DataFrame come from? As I understand it, that 
would all be in Spark’s memory anyway?

Anyway, I didn’t test this exact scenario, but it seems that writing directly
to an Ignite DataFrame should work. Why did you think it wouldn’t? I can’t say
whether it would be the most efficient way of doing it, but it would certainly
be more efficient than your code below.

ds.write()
   .mode("append")
   .format(IgniteDataFrameSettings.FORMAT_IGNITE())
   .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), igniteCfgFile)
   .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PARAMETERS(),
           "backups=1,key_type=Integer")
   .save();

As a general point for bulk-loading data, using putAll with a collection is
more efficient than calling put in a loop. You might also consider the
IgniteDataStreamer API.
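Both options could look roughly like the following untested sketch (the cache name, config file, and row layout are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class BulkLoadSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("ignite-config.xml")) {
            IgniteCache<Integer, Object[]> cache =
                ignite.getOrCreateCache("rows");

            Map<Integer, Object[]> batch = new HashMap<>();
            batch.put(0, new Object[] {"alice", 1});
            batch.put(1, new Object[] {"bob", 2});

            // Option 1: a single putAll instead of put in a loop.
            cache.putAll(batch);

            // Option 2: IgniteDataStreamer buffers and batches entries
            // per node, which usually wins for large loads.
            try (IgniteDataStreamer<Integer, Object[]> streamer =
                     ignite.dataStreamer("rows")) {
                batch.forEach(streamer::addData);
            } // close() flushes any remaining buffered entries
        }
    }
}
```

Note the try-with-resources on the streamer: entries are not guaranteed to be in the cache until it is flushed or closed.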

Regards,
Stephen

> On 26 Dec 2018, at 13:38, zaleslaw  wrote:
> 
> Hi, Igniters!
> 
> I am looking for a way to load data from a Spark RDD or DataFrame into an
> Ignite cache declared as IgniteCache<Integer, Object[]> dataCache,
> in order to run Ignite ML algorithms on it.
> 
> As I understand it, the current Ignite-Spark integration mechanism helps to
> store Spark RDDs/DataFrames in Ignite to improve the performance of Spark
> jobs, so this implementation couldn't help me. Am I correct?
> 
> Do you know how to make this small ETL more efficient, without collecting
> the data on one node as in the example below?
> 
> IgniteCache<Integer, Object[]> cache = getCache(ignite);
> 
>    SparkSession spark = SparkSession
>        .builder()
>        .appName("SparkForIgnite")
>        .master("local")
>        .config("spark.executor.instances", "2")
>        .getOrCreate();
> 
>    Dataset<Row> ds = ...;
> 
>    ds.show();
> 
>    List<Row> data = ds.collectAsList(); // stupid solution
> 
>    Object[] parsedRow = new Object[14];
>    for (int i = 0; i < data.size(); i++) {
>        for (int j = 0; j < 14; j++)
>            parsedRow[j] = data.get(i).get(j);
>        cache.put(i, parsedRow);
>    }
> 
>    spark.stop();
> 
> 
> 
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/




Re: Ignite ML withKeepBinary cache

2019-01-02 Thread Stephen Darlington
That’s a great investigation! I think the developer mailing list 
(http://apache-ignite-developers.2346864.n4.nabble.com) would be a better place 
to discuss the best way to fix it, though.

Regards,
Stephen

> On 2 Jan 2019, at 07:20, otorreno  wrote:
> 
> Hi everyone,
> 
> After the new release (2.7.0), I have been playing around with the machine
> learning algorithms a bit. We have some data in a cache created with the
> "withKeepBinary()" option, and I wanted to test whether the machine learning
> algorithms would work with such a cache. I tried, but it fails with the
> following stack trace:
> 
> org.apache.ignite.IgniteException: testType
>at
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1858)
>at
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
>at
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6816)
>at
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
>at
> org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
>at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ignite.binary.BinaryInvalidTypeException: testType
>at
> org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:707)
>at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1757)
>at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
>at
> org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:798)
>at
> org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143)
>at
> org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinary(CacheObjectUtils.java:177)
>at
> org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinaryIfNeeded(CacheObjectUtils.java:39)
>at
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.advance(GridCacheQueryManager.java:3063)
>at
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.onHasNext(GridCacheQueryManager.java:2965)
>at
> org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
>at
> org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
>at
> org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.computeCount(ComputeUtils.java:313)
>at
> org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.computeCount(ComputeUtils.java:300)
>at
> org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$initContext$9b68d858$1(ComputeUtils.java:222)
>at
> org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$affinityCallWithRetries$b46c4136$1(ComputeUtils.java:90)
>at
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1855)
>... 8 common frames omitted
> Caused by: java.lang.ClassNotFoundException: testType
>at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:348)
>at
> org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8771)
>at
> org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:349)
>at
> org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:698)
>... 23 common frames omitted
> 
> While debugging, I found the source of the error: at some point you just take
> the name of the upstreamCache (where the data resides) and create a new
> IgniteCache object from that name before copying the data to a dataset cache.
> However, you do not carry over the keepBinary property of the original cache.
> I hardcoded "withKeepBinary()" into the following lines:
> https://github.com/apache/ignite/blob/2.7.0/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/util/ComputeUtils.java#L162
> https://github.com/apache/ignite/blob/2.7.0/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/util/ComputeUtils.java#L215
> https://github.com/apache/ignite/blob/2.7.0/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/CacheBasedDatasetBuilder.java#L99
> 
> The previous made it work. I tried to retrieve the keep