Re: count not updated through SQL using DataStreamer
Hi Calvin,

Cache.size and SELECT COUNT(*) are not always equal in Ignite. Could you
please tell us what arguments you passed to the IgniteDataStreamer.addData
method?

Sent from my iPhone

> On 16 May 2019, at 23:40, Banias H wrote:
>
> Hello Ignite experts,
>
> I am very new to Ignite. I am trying to ingest 15M rows of data using
> DataStreamer into a table in a two-node Ignite cluster (v2.7) but run into
> problems of not getting the data through running SQL on DBeaver.
>
> Here is the list of steps I took:
>
> 1. Start up two nodes using the following XML:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <beans xmlns="http://www.springframework.org/schema/beans"
>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>        xsi:schemaLocation="http://www.springframework.org/schema/beans
>                            http://www.springframework.org/schema/beans/spring-beans.xsd">
>   <bean class="org.apache.ignite.configuration.IgniteConfiguration">
>     ...
>     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>       ...
>     </bean>
>     ...
>     <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>       <property name="addresses">
>         <list>
>           <value>IP1</value>
>           <value>IP2</value>
>         </list>
>       </property>
>     </bean>
>     ...
>   </bean>
> </beans>
>
> 2. Use the Python thin client to get the cache SQL_PUBLIC_FOO and insert
> ten rows of data. After this step, both the thin client and the DBeaver
> SQL client report the same count:
>
> - thin client:
>
>     nodes = [
>         (IP1, 10800),
>         (IP2, 10800),
>     ]
>     client = Client()
>     client.connect(nodes)
>     cache = client.get_cache("SQL_PUBLIC_FOO")
>     print(cache.get_size())
>
>     returns 10
>
> - SQL through DBeaver:
>
>     SELECT COUNT(*) FROM FOO
>
>     returns 10
>
> 3.
> However, when I tried using DataStreamer to ingest 100 rows into the cache
> SQL_PUBLIC_FOO, only the thin client showed the new count; SQL still
> returned the old count:
>
> - ingesting through DataStreamer (I ran the jar on one of the Ignite nodes):
>
>     String CONFIG_FILE = ;
>     Ignition.setClientMode(true);
>     Ignite ignite = Ignition.start(CONFIG_FILE);
>     IgniteDataStreamer stmr = ignite.dataStreamer("SQL_PUBLIC_FOO");
>     stmr.addData(rowCount, value);
>
> - thin client:
>
>     nodes = [
>         (IP1, 10800),
>         (IP2, 10800),
>     ]
>     client = Client()
>     client.connect(nodes)
>     cache = client.get_cache("SQL_PUBLIC_FOO")
>     cache.get_size()
>
>     returns 110
>
> - SQL through DBeaver:
>
>     SELECT COUNT(*) FROM FOO
>
>     returns 10
>
> Would anyone shed some light on what I did wrong? I would love to use
> DataStreamer to put much more data into the cache, and I want to be able
> to query it through SQL.
>
> Thanks for the help. I appreciate it.
>
> Regards,
> Calvin
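[Editorial note: a common cause of this symptom is that the key/value pairs passed to addData do not match the key and value types the SQL table expects, so the entries land in the cache (visible to Cache.size) but are invisible to SQL. A minimal sketch of one documented approach, streaming binary objects whose type name matches the table's value type; this assumes the table was created with an explicit `value_type` (e.g. `CREATE TABLE FOO (ID INT PRIMARY KEY, NAME VARCHAR) WITH "value_type=FooVal"`), and the table/field names are illustrative, not taken from the thread:]

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;

public class StreamIntoSqlTable {
    public static void main(String[] args) {
        // Assumes FOO was created via DDL WITH "value_type=FooVal".
        Ignite ignite = Ignition.start("config.xml");

        try (IgniteDataStreamer<Integer, BinaryObject> stmr =
                 ignite.dataStreamer("SQL_PUBLIC_FOO")) {
            // Keep entries in binary form; no deployed value class needed.
            stmr.keepBinary(true);

            for (int i = 0; i < 100; i++) {
                // Build a value whose binary type name matches the
                // table's value_type, so SQL indexes pick it up.
                BinaryObject row = ignite.binary().builder("FooVal")
                    .setField("NAME", "name-" + i)
                    .build();

                // Key type must match the table's PRIMARY KEY type
                // (a plain Integer for a single INT key column).
                stmr.addData(i, row);
            }
        }
    }
}
```

With mismatched types (for example a String key or a plain POJO value type unknown to the table), SELECT COUNT(*) keeps reporting only the rows inserted through SQL, exactly as in the report above.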
Re: H2 SQL query optimiser strategy in Ignite
Hi Jose,

Yes, cost-based optimization of query fragments executed locally on nodes,
using local node statistics, sounds like a good idea. There might be other
options as well. Unfortunately, neither has been implemented in Ignite yet.

> On 9 May 2019, at 08:48, joseheitor wrote:
>
> Hi Ignite Team,
>
> Have the system architects explored the option of maintaining table
> statistics on each node (as Postgres and other legacy SQL engines do), and
> then distributing the raw SQL query to each node and letting each node
> execute the query planner locally, optimising the query based on the
> statistics on hand for the given node...?
>
> Would this not optimise overall performance of the query, and eliminate
> the need for developers and DBAs to have to guess the optimum JOIN order?
> (which may in fact vary on each node...?)
>
> Thanks,
> Jose
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
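[Editorial note: in the absence of cost-based statistics, Ignite lets developers pin the JOIN order themselves via SqlFieldsQuery.setEnforceJoinOrder, which instructs H2 to join tables exactly in the order they appear in the FROM clause. A sketch, with illustrative table/column names and assuming `cache` is an IgniteCache obtained from a running node:]

```java
import java.util.List;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Put the most selective table first, then enforce that order so the
// planner does not reorder the join on each node.
SqlFieldsQuery qry = new SqlFieldsQuery(
        "SELECT p.NAME, o.TOTAL " +
        "FROM PERSON p JOIN ORDERS o ON p.ID = o.PERSON_ID")
    .setEnforceJoinOrder(true);

try (QueryCursor<List<?>> cursor = cache.query(qry)) {
    for (List<?> row : cursor)
        System.out.println(row);
}
```

This is exactly the manual tuning the question hopes per-node statistics would make unnecessary.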
Re: Ignite many to many relationship
Hi Stéphane,

Yes, it sounds like an extra non-primary-key column is the most
straightforward solution. I believe the limitation is caused by
peculiarities of Ignite SQL. The storage overhead should be measured,
though, in order to see how significant it really is. It is also worth
searching the Ignite Jira for a relevant ticket and creating one if none
exists. On the bright side, in practice such an additional column can carry
some business meaning.

> On 5 May 2019, at 17:34, Stéphane Thibaud wrote:
>
> After some more thought about this: I really think that the constraint of
> having one 'non primary key' column is a severe limitation. In combination
> with the fact that it is also not possible to create a table without a
> primary key, it means that a column that wastes space has to be created
> for a many-to-many relationship. Am I missing something?
>
> Kind regards,
>
> Stéphane Thibaud
>
> On Sat, 4 May 2019 at 22:46, Stéphane Thibaud wrote:
>> Hello Apache Ignite users,
>>
>> I was wondering what would be the best way to model a many-to-many
>> relationship. Since uniqueness constraints do not exist in Ignite, I
>> thought of setting a primary key constraint on the two columns in my
>> linking table. However, Ignite complains that at least one non-primary-key
>> column needs to exist in this case. I could maybe come up with a column
>> like 'created date', but it seems like a waste of space if not truly
>> necessary. Otherwise I might have to accept duplicates and filter on
>> selection. What approach would you suggest?
>>
>> Kind regards,
>>
>> Stéphane Thibaud
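[Editorial note: the workaround discussed above can be written as DDL like the following hypothetical link table. The composite primary key enforces uniqueness of each (person, group) pair, and the one required non-key column is given audit duty so the space is not entirely wasted; all names are illustrative:]

```sql
-- Link table for a many-to-many relationship between PERSON and GROUPS.
-- Ignite requires at least one non-primary-key column, so CREATED_AT
-- exists to satisfy that rule and doubles as audit metadata.
CREATE TABLE PERSON_GROUP (
  PERSON_ID  BIGINT,
  GROUP_ID   BIGINT,
  CREATED_AT TIMESTAMP,
  PRIMARY KEY (PERSON_ID, GROUP_ID)
);
```

Inserting the same (PERSON_ID, GROUP_ID) pair twice then updates the existing row rather than creating a duplicate, which gives the uniqueness the original question asked for.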
Re: Behavior of Ignite during PME on new nodes added to the cluster
Evangelos,

Thank you for your feedback! Regarding your questions:

1. That is quite a broad question, so to avoid misunderstanding I will give
only a general hint. The decision whether to rebalance data is actually
made after PME, so it is true that not every PME leads to a rebalance
(another example where rebalance is not needed is a change of ownership of
an empty partition). You can also observe rebalance decisions in the logs
(at INFO level, I believe). If something still needs clarification, feel
free to ask.

2. Yes, that is true, at least for the latest versions.

> On 4 May 2019, at 00:49, Evangelos Morakis wrote:
>
> Dear Ivan,
> Thank you very much for your comprehensive answer. May I just say that
> Ignite is definitely my favorite "beast" amongst the existing solutions,
> due to its versatility and the power it delivers when it comes to
> designing complex distributed solutions, as in my specific use case. Based
> on your answer, things are clearer now regarding Ignite's operation, but
> could you confirm two points in order to validate my understanding?
> 1) PME does NOT always result in data rebalancing among nodes, since, as
> you mention, Ignite is clever enough to keep primary partitions of
> existing nodes prior to a PME (caused by a new server node joining the
> cluster) on the same nodes as before. In addition, if backup partitions
> have not been defined in the config, then there should not be any data
> rebalancing happening. Is that correct?
> 2) The behavior you describe regarding data rebalancing during PME is for
> the case where a new server node joins. What happens if the node is not a
> server but a client node (meaning no data is ever stored locally on that
> node)? Am I correct to assume that in such a case there will NOT be any
> data rebalancing triggered?
>
> Thank you in advance for your time and effort.
> Kind regards
>
> Dr.
> Evangelos Morakis
> Software Architect
>
>> On 3 May 2019, at 17:35, Павлухин Иван wrote:
>>
>> Hi Evangelos and Matt,
>>
>> As far as I know, there were issues with the join of a client node in
>> previous Ignite versions. In newer versions a joining client should not
>> cause any spikes.
>>
>> In fact, PME is (unfortunately) a widely known beast in the Ignite world.
>> Fundamentally, PME can (and should) perform smoothly as long as new
>> server nodes do not join the cluster very frequently. I will give some
>> details on what happens when a new server node joins the cluster. I hope
>> it will help to answer question 3 from the first message in this thread.
>>
>> As its name hints, PME is a process by which all nodes agree on the data
>> distribution in the cluster after an event which leads to a
>> redistribution; node join is one such event. The data distribution is the
>> knowledge that a partition i is located on a node j, and for correct
>> cluster operation every node must agree on the same distribution
>> (consensus). So, it is all about a consistent data distribution.
>>
>> Consequently, some data must be rebalanced after the nodes come to an
>> agreement on a distribution. Ignite uses a clever trick to allow
>> operations to proceed while data is being rebalanced. When a new node
>> joins:
>> 1. PME occurs and the nodes agree on the same data distribution. In that
>> distribution, all primary partitions stay on the same nodes they belonged
>> to before the PME. Additionally, temporary backup partitions are assigned
>> to the new node, which will become the primary node for those partitions
>> (keep reading).
>> 2. Rebalance starts and delivers data to the temporary backup partitions*
>> mentioned above. The cluster remains fully operational meanwhile.
>> 3. Once rebalance completes, another PME happens. Now the temporary
>> backups become primaries (and other, now redundant, partitions are marked
>> for unload).
>> * It is worth noting here that a partition which was empty and is being
>> loaded during rebalance is marked as MOVING. It is not readable, because
>> it does not yet contain all the data, but all writes go to this partition
>> as well, in order to make it up to date when rebalance completes.
>> (In Ignite the described trick is sometimes called "late affinity
>> assignment".)
>>
>> So, PME should not be very heavy, because it is mainly about establishing
>> an agreement on the data distribution. The heavier data rebalance process
>> happens while the cluster is fully operational. But PME still requires a
>> period of silence while the agreement is established. As you might know,
>> PME and write operations use a mechanism similar to a read-write lock.
>> Write operations are guarded by that lock in shared mode; PME acquires it
>> in exclusive mode. So, at any moment we can have either several running
>> write operations or a single running PME. This means that PME has to wait
>> for all write operations to complete before it can start, and it also
>> blocks new write operations from starting. Therefore, long-running
>> transactions that block PME can lead to a prolonged "silence" period.
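[Editorial note: the read-write-lock analogy above can be sketched with a plain ReentrantReadWriteLock. This is an illustrative model, not Ignite's actual implementation: cache writes take the lock in shared ("read") mode, so many can run concurrently, while PME takes it in exclusive ("write") mode, waiting for in-flight writes and blocking new ones:]

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the PME/write-operation locking scheme described above.
class PmeLockModel {
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();

    // Cache writes hold the lock in SHARED mode: many may run at once,
    // but none may overlap with an exchange.
    public void cacheWrite(Runnable op) {
        gate.readLock().lock();
        try {
            op.run();
        } finally {
            gate.readLock().unlock();
        }
    }

    // PME holds the lock in EXCLUSIVE mode: it waits for all in-flight
    // writes to finish and blocks new writes until it completes. A
    // long-running write therefore prolongs the "silence" period.
    public void exchange(Runnable pme) {
        gate.writeLock().lock();
        try {
            pme.run();
        } finally {
            gate.writeLock().unlock();
        }
    }
}
```

The model shows why a long transaction delays PME, and why PME in turn stalls every new write until it finishes.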