Alexey,
Thanks for the details!
Common replication infra suggestion looks great!
Agree with your points regarding per-page replication, but I still have a
feeling that this protocol can be made compact enough, e.g. by sending only
deltas. As for entry processors, we can decide on what to send - if
Hey Valentin!
Any design docs/wiki for 1, 4 and 5 so far?
Yakov Zhdanov
Hi!
I am back!
Here are several ideas off the top of my head for Ignite 3.0
1. Client nodes should take the config from servers. Basically it should be
enough to provide some cluster identifier or any known IP address to start
a client.
2. Thread per partition. Again. I strongly recommend taking a
I think there should be a link on the page - Suggest edits
--Yakov
Lukas, would you be so kind as to suggest edits for the mentioned
documentation page? Your suggestions will be reviewed and incorporated.
Thanks!
--Yakov
We still need to differentiate between nulls and empty strings.
--Yakov
Stan, I think we will never know the truth here. Imagine you have never dealt
with distributed systems. You just copy the distributions to the intended
machines and start up a distributed cluster out of the box. Isn't that good?
If we make the change we
should expect many questions to user@ asking why nodes do not see each
Guys,
I remember we did the opposite change some time ago - switched VM IP finder
to multicast. That was done so that a user could start a cluster spanning
multiple machines using the examples configuration. With this change you
removed all the working samples for starting really distributed
Guys, I remember we did this to
--Yakov
Wed, Feb 27, 2019 at 14:46, Dmitrii Ryabov :
> Hello, Igniters!
>
> Code is ready and reviewed, tests are passed.
>
> Can we make final decision about this change? Do we really need it? [1]
>
> Pros:
>
> * Multicast ipFinder adds some instability when
> How big the message worker's queue may grow until it becomes a problem?
Denis, you never know. Imagine a node being flooded with messages because of
increased timeouts and network problems. I remember some cases with
hundreds of messages in queue on large topologies. Please, no O(n)
Denis, what if we remove priority difference for messages and always add
new to the end of the queue?
As for traversing the queue - I don't like O(n) approaches =). So, with
adding all messages to the end of the queue (removing prio difference) I
would suggest that we save latest 1st lap
Yakov Zhdanov created IGNITE-10698:
--
Summary: Get rid of @MXBeanParametersNames and
@MXBeanParametersDescriptions
Key: IGNITE-10698
URL: https://issues.apache.org/jira/browse/IGNITE-10698
Project
Zhenya,
Vladimir suggested not restricting anything. However, my opinion is to
throw an exception on duplicate indexes. We had better add the ability to
rename an index if it can be useful for anyone. Having the same field set
indexed with the same index type is pretty strange and adds a lot of risk for
Vladimir, can you please take a look at
https://issues.apache.org/jira/browse/IGNITE-10376?
--Yakov
Max, correct link to ticket is
https://issues.apache.org/jira/browse/IGNITE-10258
I would agree with you and Vladimir that parameters in mentioned case may
appear in any order.
--Yakov
Denis, you can go even further. E.g. you can start the topology once for the
full set of single-threaded full API cache tests. Each test should start its
cache dynamically and run its logic.
As for me, I would think of splitting RunAll to 2 steps - one containing
basic tests and another with more complex
Nikolay, let me take a look at the changes. I will do it possibly over
weekend.
Thanks!
--Yakov
2018-11-08 17:20 GMT+03:00 Nikolay Izhikov :
> Hello, Igniters.
>
> Please, respond if anyone wish to do the additional review of this
> improvement.
>
> I think it's ready to be merged, so if no one
Wayne, can you please share a reproducer for this problem that can be
launched from IDE?
--Yakov
No, I meant under Ignite's git so any change to resource file arrives with
project workspace updates and gets automatically picked up by plugin.
Makes sense?
--Yakov
Agree with Vyacheslav - reviewers can either fix the issues or ask to fix
them. After several PRs new contributors will get used to the project
requirements.
As for one-time contributions, they are usually pretty simple and should
not take any significant time to fix. If one-time contributor
Denis, there were email notifications from wiki on corresponding edits =)
--Yakov
Ivan, I removed "lic" from the list. Thanks for the catch!
Agree with Andrey. After several code reviews newcomers will get used to
abbreviations.
Andrey, try searching for "fut" and make sure to have "Word" checked. You
will see plenty of usages. "f" is also ok for future in case it does not
bring
Igniters,
I have shortened the list of abbreviation rules and edited our wiki page -
https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules.
Thanks to Vladimir Ozerov and Alexey Goncharuk for their useful feedback.
My idea was to leave only "common sense" abbreviations and those
Hi Mike!
Thanks for the reproducer. Now I understand the problem. The NIO worker reads
chunks from the network and notifies the parser on data read. The parser
expects each chunk to be complete and to contain all the data needed to read
an entire message, but this is not guaranteed and a single message can arrive
in several
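
To illustrate the point about messages arriving in several chunks, here is a
minimal sketch of a parser that buffers incomplete data between reads. It is
not the actual Ignite code: the class name ChunkedFrameDecoder and the framing
(a 4-byte big-endian length followed by the payload) are assumptions made only
for this example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical decoder, for illustration only. Frames are assumed to be
 * a 4-byte big-endian length followed by that many payload bytes.
 */
class ChunkedFrameDecoder {
    /** Bytes received so far that do not yet form a complete frame. */
    private byte[] pending = new byte[0];

    /** Feeds one network chunk and returns every frame completed by it. */
    List<byte[]> onChunk(byte[] chunk) {
        // Append the new chunk to whatever was left over from earlier reads.
        byte[] data = new byte[pending.length + chunk.length];
        System.arraycopy(pending, 0, data, 0, pending.length);
        System.arraycopy(chunk, 0, data, pending.length, chunk.length);

        List<byte[]> frames = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);

        while (buf.remaining() >= 4) {
            buf.mark();
            int len = buf.getInt();
            if (len < 0)
                throw new IllegalStateException("Negative frame length: " + len);
            if (buf.remaining() < len) {
                // Payload is incomplete: rewind to the length prefix and wait.
                buf.reset();
                break;
            }
            byte[] frame = new byte[len];
            buf.get(frame);
            frames.add(frame);
        }

        // Keep the unconsumed tail for the next call.
        pending = new byte[buf.remaining()];
        buf.get(pending);
        return frames;
    }
}
```

The parser never assumes that one read equals one message: a chunk may carry
part of a frame, exactly one frame, or several frames plus the start of the
next one.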
Guys, I am sorry I missed this discussion. Apparently, abbreviations use is
far from being the biggest problem in the project. I think everyone agrees
here.
I vote for leaving abbreviations mandatory, and would be strongly against
making them optional since we will end up in a situation where
Michael, can you please share a reproducer? Is it possible to snapshot a
packet that causes the error and just emulate packet send with manually
opened socket bypassing Redis client lib?
--Yakov
Maxim,
Thanks for response, let's do it the way you suggested.
Please consider adding more checks
- line endings. I think we should only have \n
- ensure blank line in the end of file
All these are code review issues I pointed out many times when reviewing
contributions. It would be cool if we
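
The two checks listed above could look roughly like this. It is only a sketch:
the class and method names are hypothetical, and "blank line in the end of
file" is interpreted here as "the file ends with a newline character", which
is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical source file checks, for illustration only. */
class SourceFileChecks {
    /** Returns a list of problems found in the given file content. */
    static List<String> check(String content) {
        List<String> problems = new ArrayList<>();

        // Any carriage return means CRLF or CR line endings are present.
        if (content.indexOf('\r') >= 0)
            problems.add("non-LF line ending");

        // Assumption: a non-empty file must end with a newline character.
        if (!content.isEmpty() && !content.endsWith("\n"))
            problems.add("no newline at end of file");

        return problems;
    }
}
```

An empty result means the file passes both checks.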
Andrey,
Probability of an OOM kill will be much lower if offheap is pretouched. What
do you mean by JVM internal needs? In my understanding if user enables
option to pretouch heap and fixes the heap to prevent jvm releasing memory
back to OS, then OOM killing is very unlikely.
I would agree that
Agree with Petr.
Maxim, what are our next steps? Can we add check for
- line length
- indents (tabs vs spaces)
This may require some effort (will it and how much?), but can we add check
for:
- log messages structure
- log.warn() vs U.warn()
- abbreviations for local variables and fields.
And
> > > > > > I think we should find exact answers to these questions:
> > > > > >
> > > > > > 1. What `critical` issue exactly is?
> > > > > >
Agree with David. We need to have an opportunity to set a backups count
threshold (at runtime also!) that will not allow any automatic stop if there
would be a data loss. Andrey, what do you think?
--Yakov
ence we can terminate the
> next node as well. Eventually we can collapse the entire cluster. Is it a
> possible scenario?
>
> Fri, Sep 7, 2018 at 18:44, Yakov Zhdanov :
>
> > Andrey,
> >
> > I don't understand your point. My opinion, the idea of these changes is
> to
Andrey,
I don't understand your point. In my opinion, the idea of these changes is to
make the cluster more stable and responsive by eliminating hung nodes. I
would not make too much difference between threads trapped in deadlock and
threads hanging on fsync calls for too long. Both situations lead to
Great news, Vladimir! Congratulations!
--Yakov
2018-08-30 15:15 GMT+03:00 Vladimir Ozerov :
> Igniters,
>
> I am glad to announce that we finally merged MVCC and transactional SQL
> support to master branch.
>
> This long journey started more than a year ago with multiple design
> brainstorm
Nikolay,
I think we should have 2 weeks after code freeze which by the way may
include RC1 voting stage. This way I would like us to agree that release
candidate should be sent to vote on Oct, 11th and we can release on Oct,
15th.
What do you think?
--Yakov
Igniters,
We should definitely expand the list of important tickets, adding tickets
related to (1) Partition Map Exchange speed up and (2) SQL memory
optimization. Alex Goncharuk, Vladimir, can you please add labels
to the corresponding tickets?
Also we have several blocker tickets (mostly
Guys, what percentage of time does CRC calculation take in the WAL logging
process?
--Yakov
2018-08-14 13:37 GMT+03:00 Dmitriy Pavlov :
> Hi Alex, thank you for this idea.
>
> Evgeniy, Alex, would you like to submit the patch with bypassing
> implementation differences to keep compatibility?
>
> Sincerely,
Hi Yriy!
Go ahead! You were added!
--Yakov
2018-08-07 17:19 GMT+03:00 Юрий :
> Hello, Ignite Community!
>
> My name is Iurii. I want to contribute to Apache Ignite.
> my JIRA user name is jooger. Any help on this will be appreciated.
>
> Thanks!
>
> --
> Live with a smile! :D
>
Sergei, you are welcome! I have added you to contributors. Please go ahead!
--Yakov
2018-08-07 16:18 GMT+03:00 s v :
> Hello, Ignite Community!
>
> My name is Sergei. I want to contribute to Apache Ignite and want to start
> with this issue - https://issues.apache.org/jira/browse/IGNITE-9141 ,
st VmIpFinder
> and
> > in other test Multicast without any reason or comment.
> > If you care about test coverage of MulticastIpFinder you can pick several
> > suites where number of starting/stopping is most frequent and leave
> > multicast there, but in general it's not necessary
Maxim, did we have .NET tests failing before the merge? If yes, how come we
merged it?
--Yakov
2018-08-02 18:03 GMT+03:00 Maxim Muzafarov :
> Folks,
>
> Seems like rebalancing changes leads us to hanging TestRebalance() test for
> .NET.
> Created issue [1], I will try to investigate it.
>
> [1]
It should be true, otherwise we would have nodes from all agents
intersecting. No?
And multicast IP finder is the default one, so I would not reduce its test
volume.
Yakov Zhdanov
www.gridgain.com
2018-08-02 0:32 GMT+03:00 Dmitriy Pavlov :
> Hi Yakov,
>
> Regarding Each TC agen
I disagree. Probably, no change is required. Each TC agent uses its own
multicast group so nodes do not intersect. If any of the tests does not
properly clean up and leaves nodes running, this should be flagged as a test
failure, which is the case.
Please provide strong reasons to start with this.
--Yakov
Agree with Dmitry P.
Guys! Even if you file a ticket for yourself please spend an additional 5 mins
to add a minimal description to give an idea of the intended changes to the
rest of our community or those who will have to deal with your changes in
future.
--Yakov
Guys, multicast IP finder gives a new user an opportunity to run tests on
several machines with zero config changes. And you want to change this,
which is not good in my view.
Probably, we need to output a warning pointing out that the user can change
the multicast group to avoid undesired discovery and isolate
are
> going to deprecate getAll() method and set keepAll flag to false for scan
> query.
>
> Agree?
>
> Mon, Jul 16, 2018 at 23:40, Dmitriy Setrakyan :
>
> > On Mon, Jul 16, 2018 at 5:42 PM, Yakov Zhdanov
> > wrote:
> >
> > > Dmitry, let's
>
> May be we use GridCachePartitionExchangeManager#forceRebalance (or may
> be forceReassign) if we fail rebalance all that retries. What do you think?
>
>
>
> Mon, Jul 16, 2018 at 21:12, Yakov Zhdanov :
>
> > Maxim, I looked at the code you provided. I think we need to add some
Dmitry, I think we can do this change right away. All we need is to add a
proper error message on cache config validation in order to tell the user that
the default changed and manual configuration is needed for compatibility.
--Yakov
2018-07-16 15:47 GMT+03:00 Dmitry Karachentsev :
> Created a ticket
the 1st one.
You can use the following message
Failed to wait for supply message from node within 30 secs [cache=C,
partId=XX]
Alex Goncharuk, do you have comments here?
Yakov Zhdanov
www.gridgain.com
2018-07-14 19:45 GMT+03:00 Maxim Muzafarov :
> Yakov,
>
> Yes, you're right. Whole re
I think you need to sign up for Apache JIRA and let us know your user ID so
we can add you to contributors. Dmitry Pavlov, can you please help?
--Yakov
2018-07-12 18:54 GMT+03:00 vgrigorev :
> Hi colleagues!
>
> I would like to move the topic to a suitable place.
>
> Please only clarify how to do it:
> In a
. Would be nice to receive some feedback from
> > other community members, though, because this is formally a breaking
> > change.
> >
> > Mon, Jul 16, 2018 at 16:40, Yakov Zhdanov :
> >
> > > Guys, it seems we need to deprecate getAll() and remove it in 3.0.
Guys, it seems we need to deprecate getAll() and remove it in 3.0. I think
it is usable only for queries that return 1 row. Every other case needs
iteration. So having getFirst() seems to be better. Thoughts?
As for ScanQuery, I think we can properly initialize keepAll to false on
scan query
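
A rough sketch of the getFirst() idea follows. The helper below is
hypothetical and is not part of the Ignite API; it works on any Iterable
(QueryCursor is one), takes the first row and never materializes the rest,
which is the contrast with getAll(). Rows are assumed to be non-null.

```java
import java.util.Iterator;
import java.util.Optional;

/** Hypothetical helper, for illustration only. */
class QueryResults {
    /** Returns the first row, if any, without loading the remaining rows. */
    static <T> Optional<T> getFirst(Iterable<T> cursor) {
        Iterator<T> it = cursor.iterator();

        // Assumption: rows are never null, so Optional.of is safe here.
        return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
    }
}
```

Every other case would simply iterate over the cursor row by row.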
Maxim, I do not understand the problem. Imagine I do not have any ordering
but rebalancing of some cache fails to start - so in my understanding
overall rebalancing progress becomes blocked. Is that true?
Can you please provide a reproducer for your problem?
--Yakov
2018-07-09 16:42 GMT+03:00
If events are working when the grid is not active and adding/removing
listeners is also possible, then I agree.
--Yakov
Hi!
Can you please move your proposal to Apache Ignite Wiki as a new IEP? So that
community can discuss and comment? Eventually we will end up with a plan on
creating the guide. What do you think?
--Yakov
2018-07-12 12:24 GMT+03:00 vgrigorev :
> Very many developers have desire to participate in
Its counter event on_after_deactivated cannot be fired through event
notification support. Therefore, it is better to gather such events in
LifeCycleEventType.
--Yakov
Ivan, yes. We can go with reflection through configuration and SPIs and,
you are right, the suppressed list should be manually defined.
--Yakov
Ivan, I would think of some test that will randomly generate configs for
nodes and run some logic.
--Yakov
Stan, feel free to file the ticket. Just make sure to add detailed
description to it. Your suggestion seems to make sense.
--Yakov
> What is the difference between a lifecycle event and regular events?
Lifecycle events should be used when there is no opportunity for Ignite to
fire regular event, e.g. node stops or is not started yet. Please see
Igor,
I can't say if I agree with any of the suggestions. I would like us to
start from answering the question - what is data streamer used for?
First of all, for initial data loading. This should be super fast mode
probably ignoring all transactional semantics, but providing certain
guarantees
Guys, I created ticket for config params validation -
https://issues.apache.org/jira/browse/IGNITE-8951. Feel free to comment.
Yakov Zhdanov
www.gridgain.com
2018-07-04 10:51 GMT+03:00 Andrew Medvedev :
> Hi Nikolay
>
> No, we have been beaten by
> https://issues.apache.org/jira/b
Yakov Zhdanov created IGNITE-8951:
-
Summary: Need to validate nodes configuration across cluster and
warn on different parameters value
Key: IGNITE-8951
URL: https://issues.apache.org/jira/browse/IGNITE-8951
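
A minimal sketch of what such a validation could do, assuming each node's
configuration is available as a flat map of parameter name to value. The
class name, method name and this representation are hypothetical and are not
taken from the ticket.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical cross-node configuration check, for illustration only. */
class ConfigConsistency {
    /**
     * Returns parameter name -> distinct values seen, limited to parameters
     * whose values differ between nodes. A parameter missing on some node
     * is not treated as a mismatch in this sketch.
     */
    static Map<String, Set<String>> findMismatches(Map<String, Map<String, String>> paramsByNode) {
        Map<String, Set<String>> seen = new TreeMap<>();

        // Collect every distinct value reported for each parameter.
        for (Map<String, String> params : paramsByNode.values())
            for (Map.Entry<String, String> e : params.entrySet())
                seen.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                    .add(String.valueOf(e.getValue()));

        // Keep only parameters with at least two different values.
        seen.values().removeIf(vals -> vals.size() < 2);

        return seen;
    }
}
```

Each entry of the result could then be turned into one warning in the log.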
We can start with separate discussion on dev list. Or can you point to
existing one? I think I need some details here.
--Yakov
Ilya,
In my view putting @Deprecated is enough for now for things that should be
removed. When we come closer to the 3.0 release we will look through all
deprecated stuff and remove it. This applies to
IGNITE_BINARY_SORT_OBJECT_FIELDS.
Can you please annotate it?
As far as list of breaking
This is for offline WAL analysis. So skipping a record with a proper message
is also a solution. If it is possible, the iterator should output a suggestion
on what is missing in the classpath. An option to suppress warnings should
also be present.
Makes sense?
And final question - did we look at similar utilities
Andrey, your suggestions look good to me. Let maintainer review your patch.
--Yakov
Alexey Goncharuk, I remember we started working on async connection
establishment. This should fix the latency issue related to network, which I
believe contributes the most to overall latency. Mapping logic and
other stuff can be ignored as it can very rarely be an issue at least on
stable
Guys, how about we release 2.5 in the nearest future after adding proper
usability log messages that will explain to the user what to do and also
output a link to readme.io with the first BLT related message during node
uptime.
This should not take much time and we can use the same messages when we
have
Guys,
How do we update counter right now?
If we move to fair thread-per-partition we can update counter only if we
add new key and skip if we add or remove a version. Thoughts?
--Yakov
2018-04-25 12:07 GMT+03:00 Vladimir Ozerov :
> This is interesting question. Full-scan
Alexey Goncharuk, Vladimir Ozerov, what do you think about these tests?
I believe they were created as a part of various optimization and profiling
activities. I also think we can remove them since nobody has cared about them
for a long time.
Thoughts?
Yakov Zhdanov
Wed, Apr 18, 2018, 16:42 Ilya
Of course, no guarantees, but at least an effort.
--Yakov
Guys,
We have an ongoing activity to implement a set of mechanisms to handle
critical issues on nodes (IEP-14 - [1]).
I have an idea to spread a message about critical issues to nodes through
the entire topology and put it into the logs of all nodes. In my view this
will add much more clarity. Imagine all nodes output
+1 here
Always wanted to remove ForceKeysRequest =)
--Yakov
How about skipWalOnRebalancing or disableWalOnRebalancing? I like the first
one better and both are shorter.
--Yakov
2018-04-09 19:55 GMT+03:00 Denis Magda :
> I would enable this option only after we confirm it's stable. Until it
> happens it should be treated as an
Guys, has anybody checked with INFRA if we can have module structure? Denis?
--Yakov
Dmitry, I think Anton meant AtomicsConfiguration, not atomic caches.
However, I would make sure we validate all conf parameters.
Anton, can you please share junit test that shows the problem?
Yakov Zhdanov
Sat, Apr 7, 2018, 6:12 Dmitriy Setrakyan <dsetrak...@apache.org>:
> I
Cross posting to dev.
Vladimir Ozerov, can you please take a look at NPE from query processor
(see below - GridQueryProcessor.typeByValue(GridQueryProcessor.java:1901))?
--Yakov
2018-03-29 0:19 GMT+03:00 smurphy :
> Code works in Ignite 2.1.0. Upgrading to 2.4.0 produces
Andrey, I understand your point but you are trying to build one more
mechanism and introduce abstractions that are already here. Again, please
take a look at segmentation policy and event types we already have.
Thanks!
Yakov
If Java runs into an OOME then you cannot guarantee anything, including
calling runtime.halt().
My point is about a consistent approach throughout the project. I think
developing a new mechanism with a separate interface is incorrect.
Yakov
Awesome article, Dmitry!
Denis Magda, should we put a link to it from apacheignite.readme.io?
--Yakov
Andrey Gura,
Why should we have any FailureHandler abstraction? We already have it -
this is EventListener. In my view it is better (and cleaner design) to add
events (similar to, for
example, org.apache.ignite.events.EventType#EVT_NODE_SEGMENTED) like
EVT_IGNITE_OOME, EVT_SYS_WORKER_FAILED and
Great progress, guys! Keep going!
--Yakov
Alexey, generally I agree. However, I don't understand what exactly you
suggest. Can you please list the parameters you want to deprecate
(1), internal logic changes (2) and updates to the javadocs/description of
the params you want to keep (3)?
--Yakov
to
them are already fsynced.
Correct?
Yakov Zhdanov,
www.gridgain.com
2018-02-13 21:29 GMT+03:00 Ivan Rakov <ivan.glu...@gmail.com>:
> Yakov,
>
> I see the only one problem with your suggestion - number of
> "uncheckpointed" segments is potentially unlimited.
> Right
I meant we still will be copying segment once and then will be moving it to
archive which should not affect file system much.
Thoughts?
--Yakov
2018-02-13 21:19 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
> Alex,
>
> I remember we had some confusing behavior for WAL archiv
Alex,
I remember we had some confusing behavior for WAL archive when archived
segments were required for successful recovery.
Is the issue still present?
If yes, what if we copy "uncheckpointed" segments to a directory under the wal
directory and then move the segments to archive after checkpoint? Will
Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx)
anywhere you like to show the incorrect behavior you describe.
--Yakov
+1 for deprecation
--Yakov
2018-01-30 1:06 GMT+03:00 Valentin Kulichenko :
> +1
>
> On Mon, Jan 29, 2018 at 8:31 AM, Andrey Mashenkov <
> andrey.mashen...@gmail.com> wrote:
>
> > Vyacheslav,
> >
> > +1 for dropping @CacheLocalStore.
> > Ignite have no support
Guys,
When running tests from core module I see that Ignite has 2 plugins
configured by default (because they are available in classpath):
-TestReconnectPlugin 1.0
-StanByClusterTestProvider 1.0
It seems they were introduced by Dmitry Karachentsev and Dmitry Govorukhin.
Guys, can you please
Pavel,
I tried this out recently and had a few minor issues.
1. due to path issues the mvn executable was not found, but the process
continued to the .NET build
2. after I set the path for mvn it was still failing due to incorrect java home
Should we report maven build issues and stop the overall process with
> >>> I reviewed your contribution and left you some comments on the pr.
> >>> Thanks!
> >>>
> >>> Vladisav
> >>>
> >>> On Wed, Jan 17, 2018 at 10:14 PM, Vladisav Jelisavcic <
> >>> vladis...@gmail.com> wrote:
>
Did we add what Pavel suggested to README.txt and readme.io documentation?
Yakov Zhdanov,
www.gridgain.com
2018-01-25 14:27 GMT-08:00 Denis Magda <dma...@apache.org>:
> Pavel, it’s a good point.
>
> Peter, could you ensure that all Ignite scripts (ignite.sh/bat,
> control.sh/
No. This is not 100% consistent, since operations started on the previous
version after a node has left (but the system has not got the event yet) would
succeed. For me consistent behavior is to throw an exception for
"select avg(x) from bla"
if data is currently missing or any data loss occurs in the middle of the
I'm still not sure what Val has suggested. Dmitry, Val, do you have any
concrete API/algorithm in mind?
--Yakov
Val, your computation fails once it reaches the absent partition. Agree
with the point that no new computation should start. Guys, any ideas
on how to achieve that? I would think of scan/sql query checking that there
is no data loss on the current topology version prior to start. Val, please
note
I do not see
> any point in reducing cluster availability when operations can be safely
> completed.
>
> 2018-01-23 2:22 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
> > Val,
> >
> > Your suggestion to prohibit any cache operation on partition loss does
Alex Goncharuk, can you please take a look and comment? Test seems to be
valid from my standpoint.
Yakov Zhdanov,
www.gridgain.com
2018-01-22 23:14 GMT-08:00 ALEKSEY KUZNETSOV <alkuznetsov...@gmail.com>:
>
> created ticket with reproducer [1]
>
> I think the fix should b
Val,
Your suggestion to prohibit any cache operation on partition loss does not
make sense to me. Why should I care about some partition during particular
operation if I don't access it? Imagine I use data on nodes A and B
performing reads and writes and node C crashes in the middle of tx. Should