Re: Issues running Ignite with Cassandra and spark.
I fixed the issue with the DataFrame API and am getting all the columns now. However, I am not able to perform grouping + UDAF operations, as Ignite tries to execute them itself. Setting OPTION_DISABLE_SPARK_SQL_OPTIMIZATION = true does not help. How do we tell Ignite to just fetch the data and let Spark perform all other operations?
Re: Issues running Ignite with Cassandra and spark.
Hi,

Thanks for the answer. Unfortunately, we cannot remove Cassandra, as it is used elsewhere as well. We will have to write directly to Ignite and sync with Cassandra.

We hit a few other issues while reading the data from Spark:

1) cacherdd.sql("select * from table") gives me heap memory (GC) issues, but reading the same data using spark.read.format() works fine. Why is that?

2) In my persistence settings I have IndexedTypes with key and value POJO classes. The key class corresponds to the key in Cassandra, with partition and clustering keys defined. When querying with SQL (select * from value_class), I get all the columns of the table. However, when querying using spark.read.format(...).option(OPTION_TABLE, value_class).load(), I only get the columns stored in the value class. How do I fetch all the columns using the DataFrame API?

Thanks,
Shrey

On Fri, 28 Sep 2018, 08:43 Alexey Kuznetsov, wrote:
> Hi, Shrey!
>
> Just as an idea: Ignite now has its own persistence (see
> https://apacheignite.readme.io/docs/distributed-persistent-store);
> maybe you can completely replace Cassandra with Ignite?
>
> In that case the data will always be up to date, with no need to sync
> with an external DB.
>
> --
> Alexey Kuznetsov
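Regarding question 2, the pattern that typically exposes key columns in Ignite's SQL schema is declaring them explicitly as query fields: either with @QuerySqlField on the key POJO's fields, or via a QueryEntity whose keyFields list names them. A configuration sketch, with hypothetical class and field names rather than the actual schema from this thread:

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="myCache"/>
    <property name="queryEntities">
        <list>
            <bean class="org.apache.ignite.cache.QueryEntity">
                <property name="keyType" value="com.example.KeyClass"/>
                <property name="valueType" value="com.example.ValueClass"/>
                <!-- All SQL columns, key and value fields alike. -->
                <property name="fields">
                    <map>
                        <entry key="partKey" value="java.lang.Long"/>
                        <entry key="clusterKey" value="java.lang.Long"/>
                        <entry key="someValue" value="java.lang.String"/>
                    </map>
                </property>
                <!-- The subset of "fields" that belongs to the key class. -->
                <property name="keyFields">
                    <set>
                        <value>partKey</value>
                        <value>clusterKey</value>
                    </set>
                </property>
            </bean>
        </list>
    </property>
</bean>
```

With the key fields declared this way they become part of the table schema that the DataFrame reader sees, not just the value-class columns.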
Re: Issues running Ignite with Cassandra and spark.
Hi, Shrey!

Just as an idea: Ignite now has its own persistence (see https://apacheignite.readme.io/docs/distributed-persistent-store); maybe you can completely replace Cassandra with Ignite?

In that case the data will always be up to date, with no need to sync with an external DB.

--
Alexey Kuznetsov
Re: Issues running Ignite with Cassandra and spark.
Hello!

1) There is no generic way of pulling updates from a 3rd-party database, and usually there is no API support for it, so it is not obvious how we could implement that even if we wanted to.

2) By default the cache store will process data in parallel on all nodes. However, it will not align data distribution with that of Cassandra, and I would say that implementing that would be infeasible. Still, you could check whether there are ways to speed up loadCache() by tuning the Ignite and/or cache configurations.

Regards,

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Issues running Ignite with Cassandra and spark.
Hi,

We are using Ignite as a cache layer over Cassandra for faster read queries from Spark. Our cluster has 10 nodes, each running an instance of Cassandra and Ignite. However, we came across a few issues:

1) We currently store the data from Spark into Cassandra, so to load data we need to call .loadCache(). I know there are ways for data written to Ignite to be synced with Cassandra (writeBehind, writeThrough). However, we want the opposite: load into Cassandra and have it reflected in the cache, which can then be queried by Spark. Is there a way to do so?

2) To load data into the cache from Cassandra, I start a new client on another machine and call the .loadCache() method. However, it takes almost 45 minutes to load the data (around 30 million rows with 20 columns each). Is there a way to make this faster by ensuring that data from a particular node in the Cassandra cluster is loaded in parallel into the Ignite instance on the same node? I have defined my partition and clustering columns in my Spring persistence settings.

Thanks,
Shrey

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
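For question 1, the usual building blocks are the read-through/write-through flags combined with Ignite's Cassandra store module. A minimal configuration sketch (the cache name and the cassandraDataSource and persistenceSettings bean names are hypothetical placeholders):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="myCache"/>
    <!-- Read-through pulls a missing key from Cassandra on demand;
         write-through pushes Ignite writes back to Cassandra. -->
    <property name="readThrough" value="true"/>
    <property name="writeThrough" value="true"/>
    <property name="cacheStoreFactory">
        <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
            <property name="dataSourceBean" value="cassandraDataSource"/>
            <property name="persistenceSettingsBean" value="persistenceSettings"/>
        </bean>
    </property>
</bean>
```

Note that read-through only fetches individual keys on demand; there is no built-in push of Cassandra-side writes into the cache, which is why a bulk preload still needs loadCache().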