Re: count not updated through SQL using DataStreamer

2019-05-16 Thread Ivan Pavlukhina
Hi Calvin,

Cache.size and SELECT COUNT(*) are not always equal in Ignite. Could you 
please tell us what arguments you passed to the IgniteDataStreamer.addData 
method?
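
In case it is useful, below is a minimal sketch of streaming into a 
SQL-created cache so that SQL sees the rows. It assumes the table was created 
with explicit types, e.g. CREATE TABLE FOO (ID BIGINT PRIMARY KEY, VAL 
VARCHAR) WITH "KEY_TYPE=java.lang.Long,VALUE_TYPE=FooVal"; the FooVal type 
name and the VAL column are illustrative, not taken from your message.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;

Ignition.setClientMode(true);
try (Ignite ignite = Ignition.start("config.xml")) {
    try (IgniteDataStreamer<Long, BinaryObject> stmr =
             ignite.dataStreamer("SQL_PUBLIC_FOO")) {
        stmr.keepBinary(true); // SQL rows are stored as binary objects
        for (long i = 0; i < 100; i++) {
            BinaryObject val = ignite.binary().builder("FooVal") // VALUE_TYPE
                .setField("VAL", "row-" + i)
                .build();
            stmr.addData(i, val); // key = the primary key column value
        }
    } // close() flushes the remaining buffered entries
}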

Sent from my iPhone

> On 16 May 2019, at 23:40, Banias H  wrote:
> 
> Hello Ignite experts,
> 
> I am very new to Ignite. I am trying to ingest 15M rows of data using 
> DataStreamer into a table in a two-node Ignite cluster (v2.7), but I ran 
> into a problem: the ingested data does not show up when I query it with SQL 
> in DBeaver.
> 
> Here is the list of steps I took:
> 
> 1. Start up two nodes using the following xml.
> 
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <beans xmlns="http://www.springframework.org/schema/beans"
>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>        xsi:schemaLocation="
>        http://www.springframework.org/schema/beans
>        http://www.springframework.org/schema/beans/spring-beans.xsd">
>   <bean class="org.apache.ignite.configuration.IgniteConfiguration">
>     <property name="dataStorageConfiguration">
>       <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>         <property name="defaultDataRegionConfiguration">
>           <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>             <!-- region settings were lost in the archive formatting -->
>           </bean>
>         </property>
>       </bean>
>     </property>
>     <property name="discoverySpi">
>       <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>         <property name="ipFinder">
>           <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>             <property name="addresses">
>               <list>
>                 <value>IP1</value>
>                 <value>IP2</value>
>               </list>
>             </property>
>           </bean>
>         </property>
>       </bean>
>     </property>
>   </bean>
> </beans>
> 
> 
> 2. Use the Python thin client to get the cache SQL_PUBLIC_FOO and insert 
> ten rows of data. After this step, both the thin client and the DBeaver SQL 
> client report the same count:
> 
> - thin client:
> 
> from pyignite import Client
> 
> nodes = [
>     ("IP1", 10800),
>     ("IP2", 10800),
> ]
> client = Client()
> client.connect(nodes)
> cache = client.get_cache("SQL_PUBLIC_FOO")
> print(cache.get_size())
> 
> returns 10
> 
> - SQL through DBeaver
> 
> SELECT COUNT(*) FROM FOO 
> 
> returns 10
> 
> 3. However, when I tried using DataStreamer to ingest 100 rows into the 
> cache SQL_PUBLIC_FOO, only the thin client showed the new count; SQL still 
> returned the old count:
> 
> - ingesting through DataStreamer
> // I ran the jar on one of the Ignite nodes
> String CONFIG_FILE = ...; // path elided in the original message
> Ignition.setClientMode(true);
> Ignite ignite = Ignition.start(CONFIG_FILE);
> IgniteDataStreamer stmr = ignite.dataStreamer("SQL_PUBLIC_FOO");
> stmr.addData(rowCount, value);
> 
> - thin client:
> 
> from pyignite import Client
> 
> nodes = [
>     ("IP1", 10800),
>     ("IP2", 10800),
> ]
> client = Client()
> client.connect(nodes)
> cache = client.get_cache("SQL_PUBLIC_FOO")
> print(cache.get_size())
> 
> returns 110
> 
> - SQL through DBeaver
> 
> SELECT COUNT(*) FROM FOO 
> 
> returns 10
> 
> Would anyone shed some light on what I did wrong? I would love to use 
> DataStreamer to put much more data into the cache, and I want to be able to 
> query it through SQL.
> 
> Thanks for the help. I appreciate it.
> 
> Regards,
> Calvin
> 


Re: H2 SQL query optimiser strategy in Ignite

2019-05-09 Thread Ivan Pavlukhina
Hi Jose,

Yes, cost-based optimization of query fragments executed locally on each 
node, using local node statistics, sounds like a good idea. There might be 
other options as well. Unfortunately, none of them has been implemented in 
Ignite yet.
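
For completeness: until such an optimizer exists, the join order can at 
least be pinned by hand. A minimal sketch (the table and column names are 
illustrative):

import org.apache.ignite.cache.query.SqlFieldsQuery;

// Without cost-based planning, H2 may reorder joins heuristically.
// setEnforceJoinOrder(true) makes every node join in exactly the order
// written in the FROM clause, so a hand-tuned order applies cluster-wide.
SqlFieldsQuery qry = new SqlFieldsQuery(
        "SELECT p.name, o.total FROM Person p JOIN OrderTbl o ON p.id = o.personId")
    .setEnforceJoinOrder(true);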

Sent from my iPhone

> On 9 May 2019, at 08:48, joseheitor  wrote:
> 
> Hi Ignite Team,
> 
> Have the system architects explored the option of maintaining table
> statistics on each node (as Postgres and other legacy SQL engines do), and
> then distributing the raw SQL query to each node and letting each node
> execute the query planner locally, optimising the query based on the
> statistics on hand for the given node...?
> 
> Would this not optimise the overall performance of the query, and eliminate 
> the need for developers and DBAs to have to guess the optimum JOIN order?
> (which may in fact vary on each node...?)
> 
> Thanks,
> Jose
> 
> 
> 
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite many to many relationship

2019-05-09 Thread Ivan Pavlukhina
Hi Stéphane,

Yes, it sounds like an extra non-primary-key column is the straightforward 
solution.

I believe the limitation is caused by peculiarities of Ignite SQL. But the 
storage overhead should be measured in order to see how critical it really 
is. It is also worth searching Ignite Jira for a relevant ticket and 
creating one if none exists.

On the bright side, it may turn out in practice that such an additional 
column has some business meaning. A minimal sketch of a link table with a 
composite primary key plus one such column follows (all names are 
illustrative).
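
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Sketch: DDL executed through the SQL API of any existing cache.
// CREATED is the one non-primary-key column Ignite requires; the composite
// primary key (PERSON_ID, GROUP_ID) gives the link its uniqueness.
void createLinkTable(Ignite ignite) {
    ignite.getOrCreateCache("ddl").query(new SqlFieldsQuery(
        "CREATE TABLE PERSON_GROUP (" +
        " PERSON_ID BIGINT," +
        " GROUP_ID BIGINT," +
        " CREATED TIMESTAMP," +
        " PRIMARY KEY (PERSON_ID, GROUP_ID))"
    )).getAll();
}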

Sent from my iPhone

> On 5 May 2019, at 17:34, Stéphane Thibaud  wrote:
> 
> After some more thought about this: I really think that the constraint of 
> requiring at least one 'non primary key' column is a severe limitation. In 
> combination with the fact that it is also not possible to create a table 
> without a primary key, it means that a space-wasting column has to be 
> created for a many-to-many relationship. Am I missing something?
> 
> 
> Kind regards,
> 
> Stéphane Thibaud
> 
> On Sat, 4 May 2019 at 22:46, Stéphane Thibaud  wrote:
>> Hello Apache Ignite users,
>> 
>> I was wondering what would be the best way to model a many-to-many 
>> relationship. Since uniqueness constraints do not exist in Ignite, I 
>> thought of setting a primary key constraint on the two columns in my 
>> linking table. However, Ignite complains that at least one non-primary-key 
>> column needs to exist in this case. I could maybe come up with a column 
>> like 'created date', but it seems like a waste of space if not truly 
>> necessary. Otherwise I might have to accept duplicates and filter on 
>> selection.
>> What approach would you suggest?
>> 
>> 
>> Kind regards,
>> 
>> Stéphane Thibaud


Re: Behavior of Ignite during PME on new nodes added to the cluster

2019-05-07 Thread Ivan Pavlukhina
Evangelos,

Thank you for your feedback!

Regarding your questions:
1. That is quite a broad question, so to avoid misunderstanding I will only 
give a general hint. The decision whether to rebalance data is made after 
PME, so it is true that not every PME leads to a rebalance (one more example 
where no rebalance is needed is a change of ownership of an empty 
partition). You can also observe rebalance decisions in the logs (at INFO 
level, I believe), or programmatically, as in the sketch after this list. If 
something still needs clarification, feel free to ask.
2. Yes, it is true, at least for the latest versions.
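
A minimal sketch of watching rebalance decisions through the events API; it 
assumes the rebalance event types were enabled on the node via 
IgniteConfiguration.setIncludeEventTypes(...):

import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.EventType;

void watchRebalance(Ignite ignite) {
    // Fires locally whenever this node starts or finishes rebalancing a cache.
    ignite.events().localListen(evt -> {
        CacheRebalancingEvent e = (CacheRebalancingEvent) evt;
        System.out.println("Rebalance: " + e.name() + ", cache=" + e.cacheName());
        return true; // keep the listener subscribed
    }, EventType.EVT_CACHE_REBALANCE_STARTED, EventType.EVT_CACHE_REBALANCE_STOPPED);
}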

Sent from my iPhone

> On 4 May 2019, at 00:49, Evangelos Morakis  wrote:
> 
> 
> Dear Ivan, 
> Thank you very much for your comprehensive answer. May I just say that 
> Ignite is definitely my favorite “beast” amongst the existing solutions, 
> due to its versatility and the power it delivers when it comes to designing 
> complex distributed solutions, as in my specific use case. In any case, 
> based on your answer, things are clearer now regarding Ignite’s operation, 
> but could you confirm two points to validate my understanding?
> 1) PME does NOT always result in data rebalancing among nodes, since, as 
> you mention, Ignite is clever enough to keep the primary partitions that 
> existed prior to the PME (caused by a new server node joining the cluster) 
> on the same nodes as before. In addition, if backup partitions have not 
> been defined in the config, then there should not be any data rebalancing 
> happening. Is that correct?
> 2) The behavior you describe regarding data rebalancing during PME applies 
> to the case where a new server node joins. What happens if the joining node 
> is not a server but a client node (meaning no data is ever stored locally 
> on that node)? Am I correct to assume that in such a case there will NOT be 
> any data rebalancing triggered?
> 
> Thank you in advance for your time and effort. 
> Kind regards
> 
> Dr. Evangelos Morakis
> Software Architect 
> 
>> On 3 May 2019, at 17:35, Павлухин Иван  wrote:
>> 
>> Hi Evangelos and Matt,
>> 
>> As far as I know, there were issues with the join of a client node in
>> previous Ignite versions. In new versions a joining client should not
>> cause any spikes.
>> 
>> In fact, PME is (unfortunately) a widely known beast in the Ignite
>> world. Fundamentally, PME can (and should) perform smoothly when new
>> server nodes do not join the cluster very frequently. I will give some
>> details on what happens when a new server node joins the cluster. I hope
>> this will help to answer question 3 from the first message in this
>> thread.
>> 
>> As its name hints, PME is a process by which all nodes agree on the data
>> distribution in the cluster after an event which leads to a
>> redistribution, e.g. a node joining. The data distribution is the
>> knowledge that a partition i is located on a node j, and for correct
>> cluster operation each node must agree on the same distribution
>> (consensus). So, it is all about a consistent data distribution.
>> 
>> Consequently, some data has to be rebalanced after the nodes come to an
>> agreement on a distribution. Ignite uses a clever trick to allow
>> operations while data is being rebalanced. When a new node joins:
>> 1. PME occurs and the nodes agree on the same data distribution. In that
>> distribution all primary partitions stay on the same nodes they belonged
>> to before the PME. Also, temporary backup partitions are assigned to the
>> new node, which will become the primary node for those partitions (keep
>> reading).
>> 2. Rebalance starts and delivers data to the temporary backup
>> partitions* mentioned before. The cluster is fully operational
>> meanwhile.
>> 3. Once rebalance completes, one more PME happens. Now the temporary
>> backups become primary (and other, now redundant, partitions are marked
>> for unload).
>> * it is worth noting here that a partition which was empty and is being
>> loaded during rebalance is marked as MOVING. It is not readable because it
>> does not contain all the data yet, but all writes go to this partition as
>> well, in order to make it up to date when rebalance completes.
>> (In Ignite the described trick is sometimes called "late affinity
>> assignment")
>> 
>> So, PME should not be very heavy, because it is mainly about
>> establishing an agreement on the data distribution. The heavier data
>> rebalance process happens while the cluster is fully operational. But PME
>> still requires a period of silence while the agreement is being
>> established. As you might know, PME and write operations use a mechanism
>> similar to a read-write lock. Write operations are guarded by that lock
>> in shared mode; PME acquires it in exclusive mode. So, at any moment we
>> can have either several running write operations or only one running PME.
>> It means that PME has to wait for all write operations to complete before
>> it can start, and it also blocks all new write operations from starting.
>> Therefore, long-running transactions blocking PME can lead to a prolonged
>> "silence" period.
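>> 
>> A minimal configuration sketch of one mitigation,
>> txTimeoutOnPartitionMapExchange (available since Ignite 2.5); the timeout
>> value is illustrative:
>> 
>> import org.apache.ignite.configuration.IgniteConfiguration;
>> import org.apache.ignite.configuration.TransactionConfiguration;
>> 
>> // Transactions that hold the exchange back longer than the timeout are
>> // rolled back so that PME can proceed.
>> IgniteConfiguration cfg = new IgniteConfiguration();
>> TransactionConfiguration txCfg = new TransactionConfiguration();
>> txCfg.setTxTimeoutOnPartitionMapExchange(20_000); // 20 seconds
>> cfg.setTransactionConfiguration(txCfg);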