[Meetup] Native persistence storage overview. April 27 2021

2021-04-19 Thread Anton,Kalashnikov

Hi Igniters,

There is a meetup next week where I want to share my knowledge about the 
Ignite Persistence module. It will be a high-level overview of the main 
components and the main ideas behind the Persistence module. So if you 
are a new Ignite contributor, or you don't feel confident enough in 
Ignite persistence, it can be helpful for you.


If you are interested, you can join at 8 AM (PST) on April 27. More info - 
https://www.meetup.com/ru-RU/Apache-Ignite-Virtual-Meetup/events/277298901.


--
Best regards,
Anton Kalashnikov


[jira] [Created] (IGNITE-14197) Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work

2021-02-17 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14197:
--

 Summary: Checkpoint thread can't take checkpoint write lock 
because it waits for parked threads to complete their work
 Key: IGNITE-14197
 URL: https://issues.apache.org/jira/browse/IGNITE-14197
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


When write throttling is enabled and the node parks, for example, a data 
streamer thread, that thread still holds the checkpoint read lock, which leads 
to long pauses while waiting for the checkpoint write lock:
[2020-07-23 07:09:21,614][INFO 
][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started 
[checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer 
[idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, 
*checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, 
checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, 
writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, 
pages=10516815, reason='too big size of WAL without checkpoint']
All operations at this moment are blocked.
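The interaction can be reproduced with a plain ReentrantReadWriteLock (a simplified model of the checkpoint lock, not Ignite's actual API; class and method names here are illustrative):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDemo {
    /** Returns whether a "checkpoint" writer can take the write lock while a parked reader holds the read lock. */
    public static boolean writerCanAcquireWhileReaderParked() throws InterruptedException {
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();

        // "Data streamer" thread: takes the checkpoint read lock, then is parked
        // by the throttle while still holding it.
        Thread streamer = new Thread(() -> {
            cpLock.readLock().lock();
            try {
                Thread.sleep(500); // stands in for LockSupport.parkNanos() in the throttle
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            } finally {
                cpLock.readLock().unlock();
            }
        });
        streamer.start();
        Thread.sleep(100); // let the streamer grab the read lock first

        // "Checkpoint" thread: the write lock cannot be taken until the parked
        // reader releases the read lock.
        boolean acquired = cpLock.writeLock().tryLock(50, TimeUnit.MILLISECONDS);
        if (acquired)
            cpLock.writeLock().unlock();

        streamer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("write lock acquired while reader parked: " + writerCanAcquireWhileReaderParked());
    }
}
```

In the real bug the "park" lasts minutes (see the timeout above), so the checkpoint thread is stuck for just as long.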

Sometimes, it can lead to a complete disaster:
Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
{quote}“data-streamer-stripe-78-#175” #209 prio=5 os_prio=0 
tid=0x7f6161d6a800 nid=0xf932 waiting on condition [0x7f5c292d1000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
at 
org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
at 
org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
at 
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
at 
org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecut

[jira] [Created] (IGNITE-14110) Create networking module

2021-02-02 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14110:
--

 Summary: Create networking module
 Key: IGNITE-14110
 URL: https://issues.apache.org/jira/browse/IGNITE-14110
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


We need to create a networking module with an API and a simple implementation 
for further improvement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14092) Design network address resolver

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14092:
--

 Summary: Design network address resolver
 Key: IGNITE-14092
 URL: https://issues.apache.org/jira/browse/IGNITE-14092
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to design a network address resolver / IP finder / discovery mechanism 
that helps choose the right IP/port for a connection. Perhaps we don't need 
such a service at all, but that should be explicitly agreed upon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14091) Implement messaging service

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14091:
--

 Summary: Implement messaging service
 Key: IGNITE-14091
 URL: https://issues.apache.org/jira/browse/IGNITE-14091
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to implement the ability to send/receive messages to/from network 
members:
 * there is a requirement to be able to send idempotent messages with very 
weak guarantees:

 ** no delivery guarantees required;

 ** multiple copies of the same message might be sent;

 ** no need for any kind of acknowledgement;

 * there is another requirement for the common case:

 ** a message must be sent exactly once, with an acknowledgement that it has 
actually been received (not necessarily processed);

 ** messages must be received in the same order they were sent.
These types of messages might utilize the current recovery protocol with acks every 
32 (or so) messages. This setting must be flexible enough so that we won't get 
an OOM in big topologies.
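The ack-every-N idea above can be sketched as follows (an illustration of the windowing concept, not Ignite's actual recovery protocol; all names are invented):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sender-side buffer: keeps sent messages until acked, caps memory via an ack window. */
public class AckWindowSender {
    private final int ackWindow;          // receiver acks every `ackWindow` messages
    private final Deque<String> unacked = new ArrayDeque<>();
    private long nextMsgId;

    public AckWindowSender(int ackWindow) { this.ackWindow = ackWindow; }

    /** Buffer the message until it is acknowledged; real code would also write it to the socket. */
    public long send(String payload) {
        unacked.addLast(payload);
        return nextMsgId++;
    }

    /** Receiver acknowledged everything up to and including `ackedMsgId`. */
    public void onAck(long ackedMsgId) {
        long firstBuffered = nextMsgId - unacked.size();
        while (firstBuffered <= ackedMsgId && !unacked.isEmpty()) {
            unacked.removeFirst();
            firstBuffered++;
        }
    }

    /** True when the receiver owes us an ack; real code would pause sending here to avoid OOM. */
    public boolean ackOverdue() { return unacked.size() >= ackWindow; }

    public int unackedCount() { return unacked.size(); }
}
```

Making `ackWindow` configurable is what bounds per-connection memory in big topologies.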



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14090) Networking API

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14090:
--

 Summary: Networking API
 Key: IGNITE-14090
 URL: https://issues.apache.org/jira/browse/IGNITE-14090
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to design a convenient public API for the networking module that 
allows getting information about network members and sending/receiving 
messages to/from them.

Draft:

{noformat}

public interface NetworkService {
    static NetworkService create(NetworkConfiguration cfg);

    void shutdown() throws ???;

    NetworkMember localMember();

    Collection remoteMembers();

    void weakSend(NetworkMember member, Message msg);

    Future guaranteedSend(NetworkMember member, Message msg);

    void listenMembers(MembershipListener lsnr);

    void listenMessages(Consumer lsnr);
}

public interface MembershipListener {
    void onAppeared(NetworkMember member);

    void onDisappeared(NetworkMember member);

    void onAcceptedByGroup(List remoteMembers);
}

public interface NetworkMember {
    UUID id();
}

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14089) Override scalecube internal message by custom one

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14089:
--

 Summary: Override scalecube internal message by custom one
 Key: IGNITE-14089
 URL: https://issues.apache.org/jira/browse/IGNITE-14089
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


There is some custom logic in the networking module, like a specific handshake, 
message recovery, etc., which requires specific messages; at the same time, the 
default ScaleCube behaviour should keep working correctly. So we need to 
implement one logic over the other.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14088) Implement scalecube transport API over netty

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14088:
--

 Summary: Implement scalecube transport API over netty
 Key: IGNITE-14088
 URL: https://issues.apache.org/jira/browse/IGNITE-14088
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


ScaleCube has its own Netty inside, but the idea is to integrate our extended 
Netty into it. That will help us support more features, like our own handshake, 
marshalling, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14086) Implement retry of establishing connection if it was lost

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14086:
--

 Summary: Implement retry of establishing connection if it was lost
 Key: IGNITE-14086
 URL: https://issues.apache.org/jira/browse/IGNITE-14086
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to implement a retry of establishing the connection. It is not clear 
which way is better to implement this, because the current implementation is 
too difficult to configure (number of retries, several properties for the retry 
time). So we need to think of a better way to configure it, and then implement 
it.

Perhaps ScaleCube (the gossip protocol) already does all the work and we should 
do nothing here. This needs to be rechecked.
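One way to collapse the several loose retry properties into a single, easier-to-configure policy (a sketch under invented names, not a proposal for the actual Ignite API):

```java
/** Illustrative single-object retry policy: one attempt cap and one backoff curve. */
public class RetryPolicy {
    private final int maxAttempts;
    private final long initialDelayMs;
    private final long maxDelayMs;

    public RetryPolicy(int maxAttempts, long initialDelayMs, long maxDelayMs) {
        this.maxAttempts = maxAttempts;
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Exponential backoff with a ceiling; `attempt` is 0-based. */
    public long delayBeforeAttempt(int attempt) {
        long delay = initialDelayMs << Math.min(attempt, 62); // doubles each attempt
        return Math.min(delay <= 0 ? maxDelayMs : delay, maxDelayMs); // guard against overflow
    }

    public boolean shouldRetry(int attemptsMade) { return attemptsMade < maxAttempts; }
}
```

Three values replace the current scattered properties, and the backoff curve is derived rather than configured piecewise.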



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14085) Implement message recovery protocol over handshake

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14085:
--

 Summary: Implement message recovery protocol over handshake
 Key: IGNITE-14085
 URL: https://issues.apache.org/jira/browse/IGNITE-14085
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


The central idea of the recovery protocol is the same as in the current 
implementation. So we need to implement a similar idea with a recovery 
descriptor. This means information about the last sent/received messages should 
be exchanged during the handshake, and based on this information, messages which 
were not received should be sent one more time.
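The exchange described above can be sketched like this (an illustration of the idea, not the actual Ignite recovery descriptor; all names are invented):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative recovery descriptor: tracks per-connection counters and replays unacked messages on reconnect. */
public class RecoveryDescriptor {
    private final List<String> sentUnacked = new ArrayList<>();
    private long lastSentId = -1;
    private long lastReceivedId = -1;

    public void onSend(String msg) {
        sentUnacked.add(msg);
        lastSentId++;
    }

    public void onReceive() { lastReceivedId++; }

    /** The value this side reports to the peer during the handshake. */
    public long handshakeLastReceived() { return lastReceivedId; }

    /** On reconnect the peer tells us its lastReceived id; resend everything after it. */
    public List<String> messagesToReplay(long peerLastReceived) {
        // Index of the first message the peer has not seen, within the unacked buffer.
        int firstIdx = (int) (peerLastReceived - (lastSentId - sentUnacked.size())) ;
        return new ArrayList<>(sentUnacked.subList(Math.max(firstIdx, 0), sentUnacked.size()));
    }
}
```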



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14084) Integrate direct marshalling to networking

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14084:
--

 Summary: Integrate direct marshalling to networking
 Key: IGNITE-14084
 URL: https://issues.apache.org/jira/browse/IGNITE-14084
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Direct marshalling can be extracted from Ignite 2.x and integrated into Ignite 
3.0. It helps to avoid an extra data copy when sending/receiving messages.
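The copy that direct marshalling avoids, shown schematically (this is not Ignite's actual marshaller API, just a comparison of the two write paths):

```java
import java.nio.ByteBuffer;

public class MarshalDemo {
    /** Indirect path: serialize into a temporary array, then copy that array into the network buffer. */
    static void indirectWrite(ByteBuffer netBuf, int a, long b) {
        byte[] tmp = ByteBuffer.allocate(12).putInt(a).putLong(b).array(); // extra allocation + copy
        netBuf.put(tmp);
    }

    /** Direct path: write fields straight into the (possibly off-heap) network buffer. */
    static void directWrite(ByteBuffer netBuf, int a, long b) {
        netBuf.putInt(a).putLong(b); // no intermediate array
    }
}
```

Both produce identical bytes; the direct path simply skips the temporary allocation and copy.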



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14083) Add SSL support to networking

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14083:
--

 Summary: Add SSL support to networking
 Key: IGNITE-14083
 URL: https://issues.apache.org/jira/browse/IGNITE-14083
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to add the ability to establish an SSL connection. It looks like it 
should not be a problem, but at the very least we need to design a 
configuration that allows managing SSL (path to the certificate, password, etc.).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14082) Implementation of handshake for new connection

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14082:
--

 Summary: Implementation of handshake for new connection
 Key: IGNITE-14082
 URL: https://issues.apache.org/jira/browse/IGNITE-14082
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to implement the handshake after Netty establishes the connection. 
Perhaps it makes sense to use Netty handlers. During the handshake, the 
endpoints need to exchange their instanceIds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14081) Networking module

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14081:
--

 Summary: Networking module
 Key: IGNITE-14081
 URL: https://issues.apache.org/jira/browse/IGNITE-14081
 Project: Ignite
  Issue Type: New Feature
Reporter: Anton Kalashnikov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send message' & 'handshake timeout'

2021-01-25 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14055:
--

 Summary: Deadlock in timeoutObjectProcessor between 'send message' 
& 'handshake timeout'
 Key: IGNITE-14055
 URL: https://issues.apache.org/jira/browse/IGNITE-14055
 Project: Ignite
  Issue Type: Bug
    Reporter: Anton Kalashnikov
    Assignee: Anton Kalashnikov


Cluster hangs after JVM pauses on one of the server nodes.
Scenario:
1. Start three server nodes with put operations using StartServerWithTxPuts.
2. Emulate JVM freezes on one server node by running the attached script: 
{{sh freeze.sh}}
3. Wait until the script has finished.

Result:
The cluster hangs on tx put operations.

The first server node continuously prints:
{noformat}
[2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
[2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
[2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
[2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
[2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
[2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
{noformat}
The second node keeps printing long-running transactions in the prepared state, 
ignoring the default tx timeout:

{noformat}
[2020-11-03 09:36:46,199][WARN 
][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
[startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
[futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
[mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
[entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
[key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
[arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, 
entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry 
[super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, 
hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, 
nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, 
nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, 
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, 
xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], 
explicitLock

[jira] [Created] (IGNITE-13972) Clear the item id before moving the page to the reuse bucket

2021-01-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13972:
--

 Summary: Clear the item id before moving the page to the reuse bucket
 Key: IGNITE-13972
 URL: https://issues.apache.org/jira/browse/IGNITE-13972
 Project: Ignite
  Issue Type: Task
Reporter: Anton Kalashnikov


There is an assert - 'Incorrectly recycled pageId in reuse 
bucket:' (org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList#takeEmptyPage).
 This assert sometimes fails. The reason is not clear, because the same 
condition is checked before putting the page into the reuse bucket. (Perhaps we 
have more than one link to this page?)

There is an idea to reset the item ID to 1 before putting the page into the 
reuse bucket, in order to reduce the number of possible invariants that can 
break this assert. This is already true for all data pages, but the item ID can 
still be more than 1 if it is not a data page (e.g. an inner page).

After that, we can change this assert from checking a range to checking 
equality to 1, which should theoretically help us detect the problem faster.

Maybe it is also not a bad idea to set the item ID to an impossible value (e.g. 
0 or 255). Then we can add an assert on every take from the free list that 
checks that the item ID is more than 0; if that is false, it means we have a 
link to a reuse-bucket page from a bucket that is not the reuse bucket, which 
is a bug.
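The item-ID reset can be illustrated with bit operations. Note the layout below (top 8 bits = item ID, remaining 56 bits = page ID) is assumed for illustration only; Ignite's actual link layout may differ:

```java
/** Illustrative link layout: top 8 bits = item ID, low 56 bits = page ID. NOT the actual Ignite layout. */
public class LinkUtils {
    static final int ITEM_ID_SHIFT = 56;
    static final long PAGE_ID_MASK = (1L << ITEM_ID_SHIFT) - 1;

    /** Pack a page ID and an item ID into a single link. */
    static long link(long pageId, int itemId) {
        return ((long) itemId << ITEM_ID_SHIFT) | (pageId & PAGE_ID_MASK);
    }

    static int itemId(long link) { return (int) (link >>> ITEM_ID_SHIFT); }

    /** The proposed normalization: force the item ID to 1 before the page goes to the reuse bucket. */
    static long recycle(long link) { return link(link & PAGE_ID_MASK, 1); }
}
```

With this normalization, `takeEmptyPage` could assert `itemId(link) == 1` (equality) instead of a range check.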



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Metastorage limitations

2020-12-22 Thread Anton Kalashnikov
> How does it differ from the current implementation?
There are no differences in implementation. But according to the original 
topic, plugins or other external things can write their own classes to the 
metastore. I just said that, according to the current architecture, it is not a 
good idea to do that, because these POJOs are not from the core.

> Why do you think that a «good» solution should exist for this kind of issue?
I don't think so. I just want to emphasize my concern about local filtering 
(the usage of a system property) in this solution, because it can lead to 
different behavior on different nodes. But perhaps you are right, and for such 
a bug we can use such a solution.

> Maybe we should make the metastore fully lazy, so any stored key will not be 
> deserialized before it is explicitly queried.
I meant approximately the same thing. We should think about that.

In conclusion (my opinions):

In the current design, inside plugins (etc.) it makes sense to use only 
primitives or POJOs from the core. (This is my answer to the original topic.)
It makes sense to think about on-demand deserialization inside the metastorage 
rather than in discovery.
I have no good solution for resolving the bug with removed classes. Perhaps we 
can use Nikolay's fix. Only, maybe it makes sense to remove such records (with 
all their history) instead of filtering them.
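The "fully lazy" metastore mentioned above could look roughly like this (a sketch of the concept, not an actual Ignite API; all names are invented):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative lazy metastore: values stay as raw bytes until a caller explicitly decodes them. */
public class LazyMetastore {
    private final Map<String, byte[]> raw = new HashMap<>();

    /** Store raw bytes; no class is needed on this node to hold the value. */
    public void put(String key, byte[] bytes) { raw.put(key, bytes); }

    /** Nodes that lack the class behind `key` simply never call this, so they never fail to deserialize. */
    public <T> T get(String key, Function<byte[], T> decoder) {
        byte[] bytes = raw.get(key);
        return bytes == null ? null : decoder.apply(bytes);
    }
}
```

Deserialization errors then surface only at the caller that actually needs the value, instead of failing the node on receive.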

-- 
Best regards,
Anton Kalashnikov



22.12.2020, 14:44, "Nikolay Izhikov" :
> Hello, Anton.
>
>>  or use the POJO from the ignite core.
>
> How does it differ from the current implementation?
>
>>  As far as I can see, you have tests only for one node, but what happens if 
>> different nodes have different filters?
>
> The error will happen.
>
> Please, don’t forget that we are talking about two scenarios:
>
> 1. Blocker bug - we delete some class that was written to metastore from the 
> new version.
> 2. Migration between custom Ignite distributions where one of them has a 
> custom class and the other doesn’t.
>
> It’s a very rare incident in my experience.
>
> Why do you think that a «good» solution should exist for this kind of issue?
> I don’t think we should limit internal architecture to cover these cases.
>
> Maybe we should make the metastore fully lazy, so any stored key will not be 
> deserialized before it is explicitly queried.
>
>>  On 22 Dec 2020, at 14:30, Anton Kalashnikov wrote:
>>
>>  Hello everyone,
>>
>>  In my opinion, it is indeed better to limit what is stored in the metastore 
>> to primitive types (a map or list is also possible) or POJOs from the 
>> Ignite core.
>>  As Kirill correctly noticed, right now the problem is not inside the 
>> distributed metastore but inside discovery.
>>  In fact, we can rewrite the sending of metastore data in such a way that 
>> discovery would think it is just a simple byte array which shouldn't 
>> be deserialized. Right now, it understands that it is a serialized Java 
>> object and tries to deserialize it after receiving it. But this way 
>> requires more investigation of possible corner cases.
>>
>>  Nikolay, I am also not sure that your fix handles the metastorage history 
>> correctly.
>>  As far as I can see, you have tests only for one node, but what happens if 
>> different nodes have different filters?
>>  Or what if we need to send history to a joining node but some of the keys 
>> don't pass the filter?
>>  Maybe I am wrong, but at first glance it can lead to different 
>> results/histories on different nodes, which is a problem.
>>  I just briefly looked at your PR (so maybe I didn't understand something); 
>> I will try to look more carefully in the near future.
>>
>>  --
>>  Best regards,
>>  Anton Kalashnikov
>>
>>  18.12.2020, 15:33, "Mekhanikov Denis" :
>>>  Nikolay,
>>>
>>>  Thanks for your reply!
>>>
>>>  I encountered a similar case to what you've described in point #1. I used 
>>> a private plugin that writes some information to the metastorage.
>>>  After that I decided to get rid of that plugin, while the information had 
>>> already been written to the metastorage.
>>>  Following the approach that you described and implemented in the PR, I'll 
>>> need to work with the flag to ignore certain keys in the metastorage 
>>> forever. That's quite inconvenient.
>>>  Wouldn't it be better if we just limited the set of allowed types that can 
>>> be stored in the metastorage? Instead of a POJO, a Map will be accepted.
>>>
>>>  Denis
>>>
>>>  On 18.12.2020, 13:59, "ткаленко кирилл"  wrote:
>>>

Re: Metastorage limitations

2020-12-22 Thread Anton Kalashnikov
Hello everyone,

In my opinion, it is indeed better to limit what is stored in the metastore to 
primitive types (a map or list is also possible) or POJOs from the Ignite 
core.
As Kirill correctly noticed, right now the problem is not inside the 
distributed metastore but inside discovery.
In fact, we can rewrite the sending of metastore data in such a way that 
discovery would think it is just a simple byte array which shouldn't be 
deserialized. Right now, it understands that it is a serialized Java object and 
tries to deserialize it after receiving it. But this way requires more 
investigation of possible corner cases.

Nikolay, I am also not sure that your fix handles the metastorage history 
correctly.
As far as I can see, you have tests only for one node, but what happens if 
different nodes have different filters?
Or what if we need to send history to a joining node but some of the keys 
don't pass the filter?
Maybe I am wrong, but at first glance it can lead to different 
results/histories on different nodes, which is a problem.
I just briefly looked at your PR (so maybe I didn't understand something); I 
will try to look more carefully in the near future.

-- 
Best regards,
Anton Kalashnikov



18.12.2020, 15:33, "Mekhanikov Denis" :
> Nikolay,
>
> Thanks for your reply!
>
> I encountered a similar case to what you've described in point #1. I used a 
> private plugin that writes some information to the metastorage.
> After that I decided to get rid of that plugin, while the information had 
> already been written to the metastorage.
> Following the approach that you described and implemented in the PR, I'll 
> need to work with the flag to ignore certain keys in the metastorage forever. 
> That's quite inconvenient.
> Wouldn't it be better if we just limited the set of allowed types that can be 
> stored in the metastorage? Instead of a POJO, a Map will be accepted.
>
> Denis
>
> On 18.12.2020, 13:59, "ткаленко кирилл"  wrote:
>
> Hello everybody!
>
> If you look at the stack trace, the error is that deserialized objects are 
> being sent to listeners.
> It may be more correct to send raw byte arrays, and each plugin 
> will be able to process them if needed.
>
> 18.12.2020, 12:18, "Nikolay Izhikov" :
> > Hello, Denis.
> >
> > It’s a known issue for me.
> > Metastore is a private API, isn’t it?
> > AFAICU it can occur for two reasons:
> >
> > * User migrates from custom Ignite fork that has private improvements 
> or plugins that write to the metastore.
> > * We have a blocker bug and just remove some internal class that can be 
> written into metastore from distribution.
> >
> > I planned to fix it with some system flag.
> > During startup administrator just sets a list of the metastore items 
> that should be ignored.
> > Please, take a look at the PR [1]
> >
> > WDYT?
> >
> >> it’s better to limit the metastorage with storing primitives only
> >
> > I think that ability to write object is very useful and should stay.
> >
> > [1] https://github.com/apache/ignite/pull/8221
> >
> >> 18 дек. 2020 г., в 12:06, Mekhanikov Denis  
> написал(а):
> >>
> >> Hi everyone!
> >>
> >> Ignite has a limitation that it can’t work with custom classes put 
> into metastorage: https://issues.apache.org/jira/browse/IGNITE-13642
> >> If you put a POJO into the metastorage, then Ignite will try to 
> deserialize it using the classes it finds on the classpath. If it can’t do 
> the deserialization, then the node will fail.
> >>
> >> There is an opinion that the metastorage wasn’t designed for the case when 
> classes can disappear from the Ignite distribution.
> >> If we follow this path, then it’s better to limit the metastorage with 
> storing primitives only, so that it’s impossible to occasionally put anything 
> breaking.
> >> If a piece of configuration is put into the metastorage by a plugin, 
> then the plugin will be in charge of deserializing the configuration, and not 
> Ignite.
> >>
> >> Alternatively we can try to fix the metastorage and make it ignore 
> deserialization errors when they occur.
> >>
> >> What do you think?
> >>
> >> Denis


[jira] [Created] (IGNITE-13843) Wrapper/Converter for primitive configuration

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13843:
--

 Summary: Wrapper/Converter for primitive configuration 
 Key: IGNITE-13843
 URL: https://issues.apache.org/jira/browse/IGNITE-13843
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Do we need the ability to use a complex type, such as InternetAddress, as a 
wrapper for some string property?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13842) Creating the new configuration on old cluster

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13842:
--

 Summary: Creating the new configuration on old cluster
 Key: IGNITE-13842
 URL: https://issues.apache.org/jira/browse/IGNITE-13842
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Do we need the ability to create a new configuration/property on a working 
cluster?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13841) Cluster bootstrapping

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13841:
--

 Summary: Cluster bootstrapping 
 Key: IGNITE-13841
 URL: https://issues.apache.org/jira/browse/IGNITE-13841
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


What should cluster bootstrapping look like? What is the format of the files? 
What is the right moment for applying the configuration? What is the state of 
the cluster before applying it?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13840) Rethink API of Init*, change* classes

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13840:
--

 Summary: Rethink API of Init*, change* classes
 Key: IGNITE-13840
 URL: https://issues.apache.org/jira/browse/IGNITE-13840
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Right now, the API of the Init*/change* classes looks too heavy and contains a 
lot of boilerplate code. We need to think about how to simplify it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13837) Configuration initialization

2020-12-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13837:
--

 Summary: Configuration initialization
 Key: IGNITE-13837
 URL: https://issues.apache.org/jira/browse/IGNITE-13837
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to think about what the first initialization of a node/cluster should 
look like. What is the format of the initial properties (JSON/HOCON, etc.)? How 
should they be handled?





[jira] [Created] (IGNITE-13836) Multiple property roots support

2020-12-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13836:
--

 Summary: Multiple property roots support
 Key: IGNITE-13836
 URL: https://issues.apache.org/jira/browse/IGNITE-13836
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Right now, the Configurator is able to manage only one root, which does not 
look sufficient. The current idea is to provide the ability to maintain 
multiple property roots, allowing other modules to create their own roots as 
needed.

ex.:
 * indexing.query.bufferSize
 * persistence.pageSize

NB! There is no local/cluster root because it looks like the local/cluster 
split shouldn't be there at all. Perhaps it should be a storage-specific 
feature rather than part of the property path.
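A minimal sketch of how a configurator with several registrable roots might behave. Note that MultiRootConfigurator and its methods are hypothetical illustrations for this proposal, not the actual Ignite API:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical multi-root configurator; not the real Ignite Configurator. */
class MultiRootConfigurator {
    /** Root name (e.g. "indexing", "persistence") -> flat property map. */
    private final Map<String, Map<String, Object>> roots = new HashMap<>();

    /** Called by a module to register its own root. */
    void registerRoot(String rootName) {
        roots.putIfAbsent(rootName, new HashMap<>());
    }

    /** Sets a property by a dotted path whose first segment is the root name. */
    void set(String path, Object value) {
        int dot = path.indexOf('.');
        Map<String, Object> props = roots.get(path.substring(0, dot));
        if (props == null)
            throw new IllegalArgumentException("Unknown root: " + path);
        props.put(path.substring(dot + 1), value);
    }

    Object get(String path) {
        int dot = path.indexOf('.');
        Map<String, Object> props = roots.get(path.substring(0, dot));
        return props == null ? null : props.get(path.substring(dot + 1));
    }
}
```

With this shape, the indexing module would register the "indexing" root and the persistence module the "persistence" root, giving paths like the ones listed above.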





Re: [DISCUSSION] Modules organization in Ignite 3

2020-12-10 Thread Anton Kalashnikov
Ilya, it is a good point about the jar name. If it really is an unresolvable 
problem, we should leave the module names as is (I do not insist on renaming, 
it was just a proposal).

I also didn't really get your point about 'zillion JARs'. Do you think it will 
be a problem if the number of JARs increases a little? If so, can you clarify 
it with some specific examples?

-- 
Best regards,
Anton Kalashnikov



09.12.2020, 18:12, "Ilya Kasnacheev" :
> Hello!
>
> When you do mvn dependencies:copy, you will now end up with
> "ignite-core-X.Y.Z.jar", "ignite-indexing-X.Y.Z.jar"
> But if we remove "ignite" from artifact name, user will end up with cryptic
> "configuration.jar", "raft.jar", etc, etc.
>
> Remember that group name is discarded in file names. Therefore, every
> project out there keeps stuff like "spring-" or "hibernate-" in artifact
> name.
>
> I also hope that our users will not end up with zillion JARs.
>
> Regards,
> --
> Ilya Kasnacheev
>
> Wed, Dec 9, 2020 at 16:16, Anton Kalashnikov :
>
>>  Hello,
>>
>>  I totally agree that we should start to think about the module
>>  organization.
>>  I suggest making a new Confluence page where we define new rules on how
>>  to develop modules.
>>  In my opinion, at least, the following topics should be covered there (it
>>  makes sense to discuss every topic separately, not here):
>>  * In which cases new modules are required
>>  * The naming of modules and packages
>>  * Class dependency management (inversion of control, no more context)
>>  * Test organization (a module contains only unit and module tests; all
>>  integration tests are in a separate module)
>>  * Feature/Module lifecycle - experimental, stable, unstable, deprecated
>>
>>  Let's get back to the original topic. I agree with the package naming
>>  rule:
>>  if the module name is configuration, the package name should be
>>  org.apache.ignite.configuration, with any other subpackages below that.
>>  Also, I'm not sure that we need the ignite- prefix in the module name
>>  because it doesn't carry any extra information:
>>
>>    <groupId>org.apache.ignite</groupId>
>>    <artifactId>ignite-configuration</artifactId>
>>
>>  we don't lose anything if we convert it to
>>
>>    <groupId>org.apache.ignite</groupId>
>>    <artifactId>configuration</artifactId>
>>
>>  I also hope that Jigsaw can help us somehow with class visibility.
>>  But if not, we can adopt a convention that, for example, an 'internal'
>>  package shouldn't be touched outside of its module -
>>  of course, using the proper class access level is also a solution (where
>>  it is possible):
>>  org.apache.ignite.configuration.Configurator // it has access to the
>>  internal package but it should return classes only from the public
>>  package, never from the internal one.
>>  org.apache.ignite.configuration.DynamicProperty //interface
>>  org.apache.ignite.configuration.internal.properties.IntDynamicProperty
>>
>>  The user API makes sense only for one end module - ignite (or ignite-core)
>>  - which depends on all other modules,
>>  does some integration (adapters), and provides the final API for the user.
>>  So I agree that a separate ignite-api module with zero dependencies will be
>>  a good solution.
>>
>>  configuration module:
>>    Configurator.baseline().enabled() -> DynamicProperties
>>
>>  ignite-api module:
>>    BaselineConfiguration.enabled() -> boolean //interface
>>
>>  ignite module:
>>    BaselineConfigurationImpl implements BaselineConfiguration{
>>  Configurator configurator;
>>  public boolean enabled(){
>> return configurator.baseline().enabled().value();
>>  }
>>    }
>>
>>  So maybe my example is not that good, but I want to show that the end-user
>>  API will be defined only in ignite-api,
>>  and you need to adapt it in the ignite module, which leads to some
>>  overhead (especially in my example),
>>  but it makes development pretty manageable/predictable -
>>  you can easily implement a new module without any worries that users start
>>  to use it prematurely.
>>  It becomes available only after making changes in ignite-api.
>>  The major advantage here is the small size of ignite-api, which allows us
>>  to carefully review every change
>>  and keep the Ignite API at a higher quality (I hope, at least).
>>
>>  Nikolay, maybe it is better to discuss your question in a separate thread?
>>  It looks like a topic worth its own discussion.
>>
>>  --
>>  Best regards,
>>  Anton

Re: [DISCUSSION] Modules organization in Ignite 3

2020-12-09 Thread Anton Kalashnikov
Hello,

I totally agree that we should start to think about the module organization. 
I suggest making a new Confluence page where we define new rules on how to 
develop modules. 
In my opinion, at least, the following topics should be covered there (it makes 
sense to discuss every topic separately, not here):
* In which cases new modules are required
* The naming of modules and packages
* Class dependency management (inversion of control, no more context)
* Test organization (a module contains only unit and module tests; all 
integration tests are in a separate module)
* Feature/Module lifecycle - experimental, stable, unstable, deprecated


Let's get back to the original topic. I agree with the package naming rule: 
if the module name is configuration, the package name should be 
org.apache.ignite.configuration, with any other subpackages below that.
Also, I'm not sure that we need the ignite- prefix in the module name because 
it doesn't carry any extra information:

  <groupId>org.apache.ignite</groupId>
  <artifactId>ignite-configuration</artifactId>

we don't lose anything if we convert it to

  <groupId>org.apache.ignite</groupId>
  <artifactId>configuration</artifactId>


I also hope that Jigsaw can help us somehow with class visibility. 
But if not, we can adopt a convention that, for example, an 'internal' package 
shouldn't be touched outside of its module - 
of course, using the proper class access level is also a solution (where it is 
possible):
org.apache.ignite.configuration.Configurator // it has access to the internal 
package but it should return classes only from the public package, never from 
the internal one.
org.apache.ignite.configuration.DynamicProperty //interface
org.apache.ignite.configuration.internal.properties.IntDynamicProperty


The user API makes sense only for one end module - ignite (or ignite-core) - 
which depends on all other modules, does some integration (adapters), and 
provides the final API for the user. 
So I agree that a separate ignite-api module with zero dependencies would be a 
good solution.

configuration module:
  Configurator.baseline().enabled() -> DynamicProperties

ignite-api module:
  BaselineConfiguration.enabled() -> boolean //interface

ignite module:
  BaselineConfigurationImpl implements BaselineConfiguration{
    Configurator configurator;
    public boolean enabled(){
       return configurator.baseline().enabled().value();
    }
  }

So maybe my example is not that good, but I want to show that the end-user API 
will be defined only in ignite-api, 
and you need to adapt it in the ignite module, which leads to some 
overhead (especially in my example),
but it makes development pretty manageable/predictable - 
you can easily implement a new module without any worries that users will start 
to use it prematurely. 
It becomes available only after making changes in ignite-api. 
The major advantage here is the small size of ignite-api, which allows us to 
carefully review every change 
and keep the Ignite API at a higher quality (I hope, at least).

Nikolay, maybe it is better to discuss your question in a separate thread? 
It looks like a topic worth its own discussion.

-- 
Best regards,
Anton Kalashnikov



09.12.2020, 10:31, "Nikolay Izhikov" :
> Hello, Zhenya, Ivan.
>
>>  Hello Nikolay, if i find out introduced features structure in some project, 
>> i would prefer to choose different one )
>
> Many, of the real world users disagree with you.
> Please, take a look at some examples from widely used projects:
>
> Kafka - 
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/annotation/InterfaceStability.java#L28
> - Stable, Evolving, Unstable
>
> Spark - 
> https://github.com/apache/spark/tree/master/common/tags/src/main/java/org/apache/spark/annotation
> - AlphaComponent, DeveloperApi, Evolving, Experimental, Private, 
> Stable, Unstable
>
>>  Having officially "unstable" features doesn't sound good for product 
>> reputation.
>
> Can’t agree with you.
>
> Forcing ourselves to make perfect API from the first try we just put too much 
> pressure on every decision.
> Every developer making mistakes.
> The product is evolving and the API too - it’s totally OK.
>
> For every new feature time required to be adopted and used in real-world 
> production.
> I believe, slight API changes is totally fine for early adopters.
> Moreover, I think, that we should warn our users that some feature is very 
> fresh and can have issues.
>
> So, Why Kafka and Spark is good enough to have unstable API and Ignite not? :)
>
>>  Dec 9, 2020, at 10:08, Ivan Bessonov wrote:
>>
>>  Conversation shifted into an unintended direction, but I agree.
>>
>>  I think that if API can (or will) be changed then it should be deprecated.
>>  For that
>>  we can introduce @IgniteDeprecated that will contain Ignite version when
>>

[jira] [Created] (IGNITE-13720) Defragmentation parallelism implementation

2020-11-18 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13720:
--

 Summary: Defragmentation parallelism implementation
 Key: IGNITE-13720
 URL: https://issues.apache.org/jira/browse/IGNITE-13720
 Project: Ignite
  Issue Type: Sub-task
  Components: persistence
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Defragmentation is executed in a single thread right now. It makes sense to 
execute the defragmentation of partitions of one group in parallel.

Several parameters will be added to the defragmentation configuration:
 * checkpointThreadPoolSize - the size of the thread pool used by the 
checkpointer for writing defragmented pages to disk.
 * executionThreadPoolSize - the size of the thread pool, i.e. the maximum 
number of partitions that can be defragmented at the same time.
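To illustrate the role of executionThreadPoolSize, here is a self-contained sketch of bounding partition defragmentation with a fixed thread pool. The ParallelDefragmenter class and its body are purely illustrative, not the actual implementation:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: defragments partitions of one group in parallel. */
class ParallelDefragmenter {
    private final int executionThreadPoolSize;

    ParallelDefragmenter(int executionThreadPoolSize) {
        this.executionThreadPoolSize = executionThreadPoolSize;
    }

    /**
     * Returns the number of partitions processed; at most
     * executionThreadPoolSize partitions are defragmented at the same time.
     */
    int defragment(List<Integer> partitionIds) {
        ExecutorService pool = Executors.newFixedThreadPool(executionThreadPoolSize);
        AtomicInteger done = new AtomicInteger();

        for (int partId : partitionIds) {
            pool.submit(() -> {
                // A real implementation would copy live pages of partition
                // 'partId' into a new file and flush it via the checkpointer.
                done.incrementAndGet();
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

The separate checkpointThreadPoolSize would similarly bound the checkpointer's write parallelism.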





[jira] [Created] (IGNITE-13684) Rewrite PageIo resolver from static to explicit dependency

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13684:
--

 Summary: Rewrite PageIo resolver from static to explicit dependency
 Key: IGNITE-13684
 URL: https://issues.apache.org/jira/browse/IGNITE-13684
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Ivan Bessonov


Right now, Ignite has a static PageIo resolver, which does not allow 
substituting a different implementation when needed. The current implementation 
should be rewritten to pass the resolver as an explicit dependency.
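The refactoring direction can be sketched as passing the resolver through the constructor instead of a static lookup. All names below are illustrative stand-ins, not the actual Ignite classes:

```java
/** Resolves a page IO handler by page type; illustrative interface. */
interface PageIoResolver {
    String resolve(int pageType);
}

/** Default resolver, analogous to the former static lookup table. */
class DefaultPageIoResolver implements PageIoResolver {
    @Override public String resolve(int pageType) {
        return "io-" + pageType;
    }
}

/**
 * A component that previously called a static resolver now receives it
 * explicitly, so tests (or defragmentation) can substitute another one.
 */
class PageMemorySketch {
    private final PageIoResolver resolver;

    PageMemorySketch(PageIoResolver resolver) {
        this.resolver = resolver;
    }

    String ioFor(int pageType) {
        return resolver.resolve(pageType);
    }
}
```

Since PageIoResolver has a single abstract method, a test can inject a fake with a lambda.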





[jira] [Created] (IGNITE-13683) Added MVCC validation to ValidateIndexesClosure

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13683:
--

 Summary: Added MVCC validation to ValidateIndexesClosure
 Key: IGNITE-13683
 URL: https://issues.apache.org/jira/browse/IGNITE-13683
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Semyon Danilov


MVCC indexes validation should be added to ValidateIndexesClosure





[jira] [Created] (IGNITE-13682) Added generic to maintenance mode feature

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13682:
--

 Summary: Added generic to maintenance mode feature
 Key: IGNITE-13682
 URL: https://issues.apache.org/jira/browse/IGNITE-13682
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


MaintenanceAction has no generic type parameter right now, which leads to 
raw-type (unparameterized) usage problems.
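A sketch of what adding the type parameter might look like; the interface shape and the sample action below are assumptions for illustration, not the actual Ignite API:

```java
/** Before: the action returned a raw Object and callers had to cast. */
interface RawMaintenanceAction {
    Object execute();
}

/** After: a type parameter makes the result type-safe. */
interface MaintenanceAction<T> {
    T execute();

    String name();
}

/** Hypothetical sample action returning a typed result. */
class CleanCorruptedIndexAction implements MaintenanceAction<Boolean> {
    @Override public Boolean execute() {
        // Illustrative body: report that the (pretend) cleanup succeeded.
        return true;
    }

    @Override public String name() {
        return "clean-corrupted-index";
    }
}
```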





[jira] [Created] (IGNITE-13681) Non markers checkpoint implementation

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13681:
--

 Summary: Non markers checkpoint implementation
 Key: IGNITE-13681
 URL: https://issues.apache.org/jira/browse/IGNITE-13681
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


We need to implement a new version of the checkpoint which will be simpler than 
the current one. The main differences compared to the current checkpoint:
* It doesn't perform any write operations to the WAL.
* It doesn't create checkpoint markers.
* It should be possible to configure a checkpoint listener on an exact data 
region only.
This checkpoint will be helpful for defragmentation and for recovery (it is not 
possible to use the current checkpoint during recovery right now).
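The per-data-region listener idea can be sketched as follows; the dispatcher class and method names are hypothetical, and the real checkpointer is of course far more involved:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative dispatcher that runs checkpoint listeners of one region only. */
class RegionCheckpointDispatcher {
    private final Map<String, List<Runnable>> listenersByRegion = new HashMap<>();

    void addListener(String dataRegion, Runnable listener) {
        listenersByRegion.computeIfAbsent(dataRegion, r -> new ArrayList<>()).add(listener);
    }

    /**
     * Triggers a lightweight checkpoint for a single region: no WAL writes,
     * no markers, and only that region's listeners are notified.
     * Returns the number of listeners invoked.
     */
    int checkpointRegion(String dataRegion) {
        List<Runnable> listeners = listenersByRegion.getOrDefault(dataRegion, List.of());
        listeners.forEach(Runnable::run);
        return listeners.size();
    }
}
```

Defragmentation would register its listeners only on the region being defragmented, leaving other regions untouched.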





[jira] [Created] (IGNITE-13569) disable archiving + walCompactionEnabled probably broke reading from wal on server restart

2020-10-09 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13569:
--

 Summary: disable archiving + walCompactionEnabled probably broke 
reading from wal on server restart
 Key: IGNITE-13569
 URL: https://issues.apache.org/jira/browse/IGNITE-13569
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


* Start a cluster with 4 server nodes
* Preload data
* Start 4 clients
* Start transactional loading
* Wait 10 sec
While loading:
For each node in server nodes:
   kill -9 the node
   Wait 20 sec
   Bring the node back
   Wait 20 sec

Wal + Wal_archive - lab40, lab41 - 
/storage/hdd/aromantsov/GG-18739

Looks like the node can't read all WAL files that were generated before the 
node was brought back.

{noformat}
[12:50:27,001][SEVERE][wal-file-compressor-%null%-1-#71][FileWriteAheadLogManager]
 Compression of WAL segment [idx=0] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,001][SEVERE][wal-file-compressor-%null%-0-#69][FileWriteAheadLogManager]
 Compression of WAL segment [idx=2] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0002.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.access$4800(FileWriteAheadLogManager.java:2019)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressor.body(FileWriteAheadLogManager.java:1995)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,001][SEVERE][wal-file-compressor-%null%-3-#73][FileWriteAheadLogManager]
 Compression of WAL segment [idx=3] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0003.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,001][SEVERE][wal-file-compressor-%null%-2-#72][FileWriteAheadLogManager]
 Compression of WAL segment [idx=1] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0001.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,002][SEVERE][wal-file-compressor-%null%-1-#71][FileWriteAheadLogManager]
 Compression of WAL segment [idx=4] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0004.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,002][SEVERE][wal-file-compressor-%null%-0-#69][FileWriteAheadLogMa

Re: Broken test in master: BasicIndexTest

2020-10-09 Thread Anton Kalashnikov
Hi everyone,

I believe I have a fix for this bug - 
https://issues.apache.org/jira/browse/IGNITE-13500, so, Zhenya, you can leave 
this problem to me.

-- 
Best regards,
Anton Kalashnikov



09.10.2020, 10:19, "Sergey Chugunov" :
> Max,
>
> Thanks for spotting this, great catch!
>
> Zhenya, could you please file a ticket of at least Critical priority?
>
> On Fri, Oct 9, 2020 at 9:24 AM Zhenya Stanilovsky
>  wrote:
>
>>  Thanks Maxim, the test is correct no need for removal.
>>  I checked 2.9 too, but looks it all ok there. I will take a look.
>>  >Hi, Igniters!
>>  >
>>  >I was discovering how indexes work and found a failed test.
>>  >BasicIndexTest#testInlineSizeChange is broken in master and it's not a
>>  >flaky case [1]. But it has been failing since 25/09 only.
>>  >
>>  >I discovered that it happened after the IGNITE-13207 ticket merged
>>  >(Checkpointer code refactoring) [2]. I'm not sure about the expected
>>  >behaviour of the inline index and how checkpointer affects it. But let's
>>  >fix it if it is a bug or completely remove this test.
>>  >
>>  >[1]
>>  >
>>  
>> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=6131871779633595667&branch=%3Cdefault%3E&tab=testDetails
>>  >
>>  >[2] https://issues.apache.org/jira/browse/IGNITE-13207
>>  >


[jira] [Created] (IGNITE-13562) Prototype dynamic configuration

2020-10-08 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13562:
--

 Summary: Prototype dynamic configuration
 Key: IGNITE-13562
 URL: https://issues.apache.org/jira/browse/IGNITE-13562
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Semyon Danilov


The main target is to add a new configuration module with a framework that 
allows us to create dynamic properties (node-local and cluster-wide?).

The framework should provide the following:
* Describing a rule for the schema from which the public and private property 
classes would be generated
* Implementing generation of the public and private classes from the schema
* Describing a view of the public POJO (update/insert/get) to interact with 
properties in a type-safe way
* Converting the property from HOCON to the inner view
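As a rough illustration of the type-safe dynamic property idea, here is a minimal sketch. The class name matches the DynamicProperty mentioned elsewhere in this thread, but its shape is my assumption, not the eventual generated classes:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Sketch of a dynamic property: a typed value plus update listeners. */
class DynamicProperty<T> {
    private final String key;
    private final AtomicReference<T> value;
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();

    DynamicProperty(String key, T initialValue) {
        this.key = key;
        this.value = new AtomicReference<>(initialValue);
    }

    T value() {
        return value.get();
    }

    /** Updates the value and notifies listeners, e.g. on a cluster-wide change. */
    void propagate(T newValue) {
        value.set(newValue);
        listeners.forEach(l -> l.accept(newValue));
    }

    void listen(Consumer<T> listener) {
        listeners.add(listener);
    }

    String key() {
        return key;
    }
}
```

The generated public classes would expose such typed properties, while the HOCON layer would convert raw strings into T before calling propagate().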









[jira] [Created] (IGNITE-13511) Unified configuration

2020-10-02 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13511:
--

 Summary: Unified configuration
 Key: IGNITE-13511
 URL: https://issues.apache.org/jira/browse/IGNITE-13511
 Project: Ignite
  Issue Type: New Feature
Reporter: Anton Kalashnikov


https://cwiki.apache.org/confluence/display/IGNITE/IEP-55+Unified+Configuration





[jira] [Created] (IGNITE-13500) Checkpoint read lock fail if it is taking under write lock during the stopping node

2020-09-30 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13500:
--

 Summary: Checkpoint read lock fail if it is taking under write 
lock during the stopping node
 Key: IGNITE-13500
 URL: https://issues.apache.org/jira/browse/IGNITE-13500
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov


org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testDynamicIndexesDropWithPersistence

{noformat}
[2020-09-30 
15:09:26,085][ERROR][db-checkpoint-thread-#371%index.BasicIndexTest0%][Checkpointer]
 Runtime error caught during grid runnable execution: GridWorker 
[name=db-checkpoint-thread, igniteInstanceName=index.BasicIndexTest0, 
finished=false, heartbeatTs=1601467766063, hashCode=963964001, 
interrupted=false, runner=db-checkpoint-thread-#371%index.BasicIndexTest0%]
class org.apache.ignite.IgniteException: Failed to perform cache update: node 
is stopping.
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:396)
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.IgniteException: Failed to perform cache 
update: node is stopping.
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:128)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1298)
at 
org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
at 
org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
... 3 more
Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to 
perform cache update: node is stopping.
... 9 more
{noformat}





Re: Checkpoint refactoring(part of IEP-47)

2020-09-15 Thread Anton Kalashnikov
Hello,

I have finished the last checkpoint refactoring 
ticket (https://issues.apache.org/jira/browse/IGNITE-13207).
In general, several new classes were created (mostly extracted from 
GridCacheDatabaseSharedManager). You can find a more detailed description in 
the ticket.
If somebody is interested, they can take a look at the pull request and maybe 
suggest a different class segregation.

-- 
Best regards,
Anton Kalashnikov



22.06.2020, 11:52, "Anton Kalashnikov" :
> In fact, It is also one of my targets. I believe it will be possible when 
> checkpoint's classes will be restructured to smaller classes with more clear 
> responsibilities. So if everything goes good we can do it after step 3 which 
> I described above.
>
> --
> Best regards,
> Anton Kalashnikov
>
> 19.06.2020, 17:28, "Ivan Pavlukhin" :
>>  Hi Anton,
>>
>>  A side question. Do you feel that it is possible to cover extracted
>>  classes with unit tests (I mean unit tests which do not start Ignite
>>  nodes)?
>>
>>  2020-06-19 16:59 GMT+03:00, Anton Kalashnikov :
>>>   Hi Igniters,
>>>
>>>   IEP-47(Native persistence defragmentation) contains a part that implies
>>>   refactoring of checkpoint(with the goal of reusing this feature in
>>>   defragmentation).
>>>
>>>   I just to want to emphasize this part(refactoring) here and share my view 
>>> to
>>>   implementation
>>>   I want to split this job to at least 2(but maybe 3) separated tasks:
>>>   1. Extracting checkpoint related classes from
>>>   GridCacheDatabaseSharedManager(ex. Checkpoint, Checkpointer,
>>>   WriteCheckpointPages, etc.) -
>>>   https://issues.apache.org/jira/browse/IGNITE-13151(almost done)
>>>   2. Simplifying result code - I don't sure it is possible, but right now I
>>>   see some code which on the first eye has duplication and redundancy
>>>   3. Reorganizing code - There is a lot of work which Checkpointer do right
>>>   now, I believe at least this class should be split.
>>>
>>>   Perhaps, 2 and 3 items will be done in one ticket.
>>>   I believe you understand that I suggested several tickets instead of one 
>>> in
>>>   the target of simplification of review and find bugs.
>>>
>>>   Any objections?
>>>
>>>   --
>>>   Best regards,
>>>   Anton Kalashnikov
>>
>>  --
>>
>>  Best regards,
>>  Ivan Pavlukhin


Re: [DISCUSSION] Add autocompletion for commands in control.sh

2020-09-15 Thread Anton Kalashnikov
Hello,

This idea looks great in my opinion. I agree that a more common approach would 
be better, but like Kirill, I haven't found one. So from my point of view, this 
library is better than nothing, and if nobody recommends a different one, we 
can take picocli right now.

-- 
Best regards,
Anton Kalashnikov



14.09.2020, 13:15, "Данилов Семён" :
> Hello!
>
> I've looked through the picocli manual and it looks really great.
> We will be able easily add i18n and stuff like this.
> Nevertheless, we will get rid of the manual formatting of examples and help 
> texts.
>
> Kind regards,
> Semyon.
>
> 14.09.2020, 13:00, "Alexey Goncharuk" :
>>  Hi folks,
>>
>>  Despite the autocompletion support only for bash, I see the following
>>  benefits from this change:
>>   * It may unify all the CLI tooliing in Ignite, providing a better user
>>  experience
>>   * The library has an ability to generate man pages, which may be nice
>>   * I see there is an open issue for adding support to powershell, so
>>  Windows platform will be also somewhat covered
>>
>>  Overall, I support this idea.
>>
>>  Tue, Sep 8, 2020 at 10:43, ткаленко кирилл :
>>
>>>   Hello, Ilya!
>>>
>>>   I agree that it would be better if we found a common approach, but I
>>>   haven't found one yet.
>>>
>>>   07.09.2020, 18:50, "Ilya Kasnacheev" :
>>>   > Hello!
>>>   >
>>>   > Not everyone is using bash, which leads me to question whether there's
>>>   any
>>>   > common approach where we can hint a shell what our executable can do so
>>>   > that it can discover and auto-complete our control.sh
>>>   >
>>>   > Regards,
>>>   > --
>>>   > Ilya Kasnacheev
>>>   >
>>>   > Mon, Sep 7, 2020 at 17:47, ткаленко кирилл :
>>>   >
>>>   >> Hello, folks!
>>>   >>
>>>   >> I spent time to analyze the possibility of adding auto completion for
>>>   the
>>>   >> "control.sh" with the [1].
>>>   >>
>>>   >> To do this, at the beginning, we need to adapt the "control.sh" code to
>>>   >> [1], then we can automatically create a "bash completion script" via
>>>   [2],
>>>   >> and then install it, for example, with the "source" command and the
>>>   >> "control.sh" script itself via "install".
>>>   >>
>>>   >> This is only possible for nix systems.
>>>   >>
>>>   >> It is theoretically possible to add the "control.sh" extension via
>>>   plugins
>>>   >> and auto-generate "bash completion script".
>>>   >>
>>>   >> Thus, I propose a plan:
>>>   >> 1)Adapt "control.sh" to [1];
>>>   >> 2)Automatic creation of "bash completion script" for the release build;
>>>   >> 3)Adding extensibility "control.sh" and automatic re-creation of "bash
>>>   >> completion script". (optional)
>>>   >>
>>>   >> What do you think, comments?
>>>   >>
>>>   >> [1] - https://picocli.info/
>>>   >> [2] -
>>>   >>
>>>   
>>> https://picocli.info/autocomplete.html#_completion_script_generation_details


Re: Command line interface to manage distributed properties

2020-09-08 Thread Anton Kalashnikov
Hi everyone,

I think I agree with Nikolay that we should make all our properties available, 
not only some of them. Also, maybe it makes sense to split this task into two 
tasks in the following way: publishing all distributed properties through 
public interfaces (control.sh, JMX, etc.), and giving the possibility to add 
permissions to some properties.

In my opinion, we need the following changes (this is just a high-level 
overview of my ideas):

1) Publishing distributed properties
- DistributedConfigurationProcessor new methods: propertyList(): 
List<DistributedProperty<?>>, get(propertyName): DistributedProperty<?>
- DistributedProperty new methods: stringView(), 
propagateFromString(valueAsString) - this is debatable, I'm still not sure it 
is a great place for such methods. Perhaps we should have some converter inside 
control.sh or wherever.

Usage:
List<DistributedProperty<?>> allClusterProperties = 
distributedConfigurationProcessor.propertyList();
allClusterProperties.forEach(prop -> System.out.println(prop.stringView()));

DistributedProperty<?> baselineAutoAdjustEnabled = 
distributedConfigurationProcessor.get("baselineAutoAdjustEnabled");
baselineAutoAdjustEnabled.propagateFromString("true"); // baselineAutoAdjustEnabled.propagate(true)

Open questions:
- How to update complex objects (any object which is not a primitive)?
- Is it OK to convert the object to a string inside the DistributedProperty?
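One possible shape of the stringView()/propagateFromString() pair, as a self-contained sketch. The parser-based design and the StringConvertibleProperty class are my assumptions, not the actual DistributedProperty API:

```java
import java.util.function.Function;

/** Sketch: a property that can render and parse its value as a string. */
class StringConvertibleProperty<T> {
    private final String key;
    private final Function<String, T> parser;
    private volatile T value;

    StringConvertibleProperty(String key, T initial, Function<String, T> parser) {
        this.key = key;
        this.value = initial;
        this.parser = parser;
    }

    /** Human-readable form for `--property list`-style output. */
    String stringView() {
        return key + " = " + value;
    }

    /** Parses the string and stores the typed value; this keeps the
     *  string-to-type converter next to the property itself. */
    void propagateFromString(String raw) {
        value = parser.apply(raw);
    }

    T value() {
        return value;
    }
}
```

The alternative mentioned above, keeping converters inside control.sh, would instead leave the property purely typed and put a parser registry on the tooling side.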

2) Permissions for distributed properties
- A new class for permissions - unfortunately, it breaks the whole hierarchy 
(DistributedLongProperty, DistributedBooleanProperty, etc.) but maybe it is not 
a big problem:
class PermissibleDistributedProperty<T> extends 
SimpleDistributedProperty<InnerPermissionWrapper<T>> {
   PermissibleDistributedProperty(key, realValue, readPermission, 
writePermission) {
      super(key, new InnerPermissionWrapper<>(realValue, readPermission, 
writePermission));
    }
}

- I don't know a lot about Ignite security, so I'm not sure where we should 
check the permission in that case - it could be a new special processor or just 
inside a job.

One more idea - instead of creating the PermissibleDistributedProperty, we can 
store some mapping (property name -> permissions) separately (it can be a 
static mapping or it can be stored in some new DistributedProperty). But in 
this case, it is possible to lose permissions after a property is renamed.

-- 
Best regards,
Anton Kalashnikov



05.09.2020, 10:09, "Nikolay Izhikov" :
> Hello, Taras.
>
> One more thing:
>
>>  --property list - prints list of the available properties with description, 
>> e.g.:
>
> We have a convenient API to show Ignite internal objects - System Views [1]
>
> Any system view available via SQL and JMX.
> It seems we should have METASTORAGE view instead of this option.
>
> P.S. Should we add some CMD interface for system views?
>
> [1] https://apacheignite.readme.io/docs/system-views
>
>>  On Sep 3, 2020, at 10:37, Nikolay Izhikov wrote:
>>
>>  Hello, Taras.
>>
>>>  I guess some properties (may be future properties) shouldn't be published 
>>> through generic cmd line interface.
>>
>>  With a marker interface, the user has to wait for a new release to expose an 
>> unpublished property.
>>  A new release is a very long way to go to fix one tiny configuration value.
>>
>>  Also, we shouldn’t hide anything from the administrator.
>>
>>  I’m sure that hiding any internals from our users is always a bad idea and 
>> masks some issue in the codebase.
>>  Let’s do it in Apache Way? :) - «Not restriction but common sense»
>>
>>  We can have some kind of `IgniteSystemProperty` with a default read-only list 
>> and a description for it -
>>  «User, you edit these properties entirely at your own risk. We can’t predict the results 
>> of such edits»
>>  So the user can extend this list manually.
>>  WDYT?
>>
>>>  On Sep 3, 2020, at 10:21, Taras Ledkov wrote:
>>>
>>>  Hi,
>>>
>>>>  Why do we want to restrict property management somehow?
>>>  I guess some properties (maybe future properties) shouldn't be published 
>>> through the generic command-line interface.
>>>  Maybe they require separate, more complex command-line commands; some 
>>> properties may have dependencies and require complex management, not only 
>>> set/get.
>>>  In this case we can use a distributed property without publishing it via the 
>>> simple command-line interface.
>>>
>>>  On 03.09.2020 10:05, Nikolay Izhikov wrote:
>>>>  Hello, Taras.
>>>>
>>>>  It's a shame we don’t have a well-written guide for the development of the 
>>>> Ignite management interfaces at the moment.
>>>>  For now, we have a dozen management APIs - Java, JMX, SQL, 
>>>> control.sh, visorcmd.s

Re: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-08-28 Thread Anton Kalashnikov
Hi Guys,

As I understand, we will be merging some tickets into the release. May I suggest 
also adding ticket [1] to the 2.9 release?

There are not a lot of code changes, but it's a critical fix for the ability 
to launch Ignite in a lambda on Azure (there is no workaround).

So if nobody minds, let's merge it into 2.9.

[1] https://issues.apache.org/jira/browse/IGNITE-13013

-- 
Best regards,
Anton Kalashnikov



28.08.2020, 11:16, "Alex Plehanov" :
> Guys,
>
> We have benchmarked 2.9 without IGNITE-13060 and IGNITE-12568 (reverted them
> locally) and got the same performance as on 2.8.1.
>
> IGNITE-13060 (Tracing) - some code was added to hot paths in order to trace
> them, so it's clear why we have a performance drop here.
>
> IGNITE-12568 (MessageFactory refactoring) - a switch/case block was
> refactored into an array of message suppliers. The message factory is on the
> hot path, which explains why this commit has an impact on total
> performance.
> I've checked the JIT assembly output, done some JMH microbenchmarks, and found
> that the old implementation of MessageFactory.create() is about 30-35% faster than
> the new one. The reason: the switch/case approach can effectively inline the
> message creation code, but with an array of suppliers the relatively heavy
> "invokeinterface" cannot be skipped. I've tried to rewrite the code using
> an abstract class for suppliers instead of an interface (to
> replace "invokeinterface" with "invokevirtual"), but that gives back only
> 10% of the method's performance, and in this case the code looks ugly (lambdas can't
> be used). Currently, I can't find any more ways to optimize the current
> approach (except returning to the switch/case block). Andrey Gura, as the
> author of IGNITE-12568, maybe you have some ideas about optimization?
>
> Perhaps we should revert IGNITE-12568, but there are some metrics already
> created, which can't be rewritten using old message factory implementation
> (IGNITE-12756). Guys, WDYT?
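For reference, the two factory shapes being compared boil down to the following sketch (the types are illustrative stand-ins; the real factory produces Ignite Message instances keyed by a short type code):

```java
import java.util.function.Supplier;

class MessageFactoryShapes {
    // Old shape: a switch on the type code. The JIT can inline the
    // construction at the call site, which is why it benchmarks faster.
    static Object createBySwitch(short type) {
        switch (type) {
            case 1: return new StringBuilder();
            case 2: return new java.util.ArrayList<>();
            default: throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    // New shape: a table of suppliers indexed by type code. Every call
    // pays for an invokeinterface dispatch that the JIT cannot always
    // devirtualize when the table holds many different supplier classes.
    private static final Supplier<?>[] SUPPLIERS = {
        null, StringBuilder::new, java.util.ArrayList::new
    };

    static Object createByTable(short type) {
        Supplier<?> s = SUPPLIERS[type];
        if (s == null)
            throw new IllegalArgumentException("Unknown type: " + type);
        return s.get();
    }
}
```

The table version wins on maintainability (message types can be registered at runtime, which is what the metrics in IGNITE-12756 rely on), while the switch version wins on raw dispatch cost; that trade-off is the subject of the discussion above.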
>
> Fri, Aug 28, 2020 at 01:52, Denis Magda :
>
>>  Looks beautiful and easy to use, thanks, Artem! Could you please add the
>>  following copyright to the footer of the pages?
>>
>>  *© 2020 The Apache Software Foundation.*
>>  *Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are
>>  either registered trademarks or trademarks of The Apache Software
>>  Foundation. *
>>  *Privacy Policy*
>>
>>  -
>>  Denis
>>
>>  On Thu, Aug 27, 2020 at 5:20 AM Artem Budnikov <
>>  a.budnikov.ign...@gmail.com> wrote:
>>
>>>  Hi everyone,
>>>
>>>  We published the draft of Ignite 2.9 documentation on the Apache Ignite
>>>  web-site. The docs are available via the following link:
>>>
>>>  https://ignite.apache.org/docs/2.9.0/installation/installing-using-docker
>>>
>>>  Alex,
>>>
>>>  Is there an estimate for the release date?
>>>
>>>  -Artem
>>>
>>>  On 26.08.2020 17:47, Alex Plehanov wrote:
>>>  > Denis,
>>>  >
>>>  > Currently, we are mostly running IgnitePutTxImplicitBenchmark without
>>>  > persistence. For other benchmarks the drop is lower, and it's harder to find
>>>  > the problematic commit.
>>>  >
>>>  > Wed, Aug 26, 2020 at 17:34, Denis Magda :
>>>  >
>>>  >> Alex,
>>>  >>
>>>  >> Thanks for sending an update. The drop is quite big. What are the
>>>  types of
>>>  >> benchmarks you are observing the degradation for (atomic puts,
>>>  >> transactions, sql, etc.)?
>>>  >>
>>>  >> Let us know if any help by particular committers is required.
>>>  >>
>>>  >> -
>>>  >> Denis
>>>  >>
>>>  >>
>>>  >> On Wed, Aug 26, 2020 at 12:26 AM Alex Plehanov <
>>>  plehanov.a...@gmail.com>
>>>  >> wrote:
>>>  >>
>>>  >>> Hello, guys!
>>>  >>>
>>>  >>> We finally have some benchmark results. It looks like there is more than
>>>  one
>>>  >>> commit with a performance drop. The detected drops for those commits are only
>>>  >>> slightly higher than the measurement error, so it was hard to find them
>>>  and
>>>  >> we
>>>  >>> are not completely sure we found them all, or found them right.
>>>  >>>
>>>  >>> Drops detected:
>>>  >>> 2-3% drop on commit 99b0e0143e0 (IGNITE-13060 Tracing:

Re: [MTCGA]: new failures in builds [5539465] needs to be handled

2020-08-18 Thread Anton Kalashnikov
Hi,

It's my fault. I will fix it.
https://issues.apache.org/jira/browse/IGNITE-13368

-- 
Best regards,
Anton Kalashnikov



17.08.2020, 22:52, "dpavlov.ta...@gmail.com" :
> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than 
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were a 
> volunteer to make the contribution to this project, but things change and you 
> may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and fix 
> test failures or step down and some committer may revert you commit.
>
>  * New test failure in master PagesWriteThrottleSmokeTest.testThrottle 
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=2808794487465215609&branch=%3Cdefault%3E&tab=testDetails
>  Changes may lead to failure were done by
>  - ivan rakov  
> https://ci.ignite.apache.org/viewModification.html?modId=905838
>  - ymolochkov  
> https://ci.ignite.apache.org/viewModification.html?modId=905833
>  - anton kalashnikov  
> https://ci.ignite.apache.org/viewModification.html?modId=905850
>  - ibessonov  
> https://ci.ignite.apache.org/viewModification.html?modId=905840
>
>  - Here's a reminder of what contributors were agreed to do 
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 22:44:23 17-08-2020


[jira] [Created] (IGNITE-13368) Speed base throttling unexpectedly degraded to zero

2020-08-18 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13368:
--

 Summary: Speed base throttling unexpectedly degraded to zero
 Key: IGNITE-13368
 URL: https://issues.apache.org/jira/browse/IGNITE-13368
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


New test failure in master PagesWriteThrottleSmokeTest.testThrottle 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=2808794487465215609&branch=%3Cdefault%3E&tab=testDetails

Throttling degraded to zero.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Ignite compilation fails in IntelliJ IDEA (IgniteLinkTaglet)

2020-08-10 Thread Anton Kalashnikov
Hi Pavel,

You can do the same in the Project tree -> right-click on java11 -> Mark 
Directory as -> Excluded.

-- 
Best regards,
Anton Kalashnikov



10.08.2020, 11:34, "Ivan Bessonov" :
> Hi Pavel,
>
> this issue is unrelated to your problem, but yes, it wouldn't let you
> save the changes.
> This sucks. You should remove one of those modules in your settings; they
> point at
> the same pom.xml, which means the same module appears twice in your
> settings.
> Mon, Aug 10, 2020 at 11:24, Pavel Tupitsyn :
>
>>  Ivan,
>>
>>  Thanks for the suggestion. Unfortunately, it does not help -
>>  Idea does not let me apply the changes:
>>
>>  `Content root "/home/pavel/w/ignite" is defined for modules "apache-ignite"
>>  and "ignite".
>>  Two modules in a project cannot share the same content root.`
>>
>>  Clearly I'm doing something wrong - maybe I'm importing the project into
>>  Idea in a wrong way?
>>  Or should I use a different JDK? Which version is best for Ignite
>>  development right now?
>>  (I'm using OpenJDK 8 just out of habit)
>>
>>  On Mon, Aug 10, 2020 at 10:02 AM Ivan Bessonov 
>>  wrote:
>>
>>  > Hi Pavel,
>>  >
>>  > please go to "Project Structure | Project Settings | Modules",
>>  > find module "ignite-tools", open tab "Sources" and mark folder
>>  > "src/main/java11" as "excluded". Should help.
>>  >
>>  > This happens from time to time if you switch from a very old branch
>>  > (like "ignite-2.5") to a fresh branch like "master".
>>  >
>>  > Sun, Aug 9, 2020 at 21:18, Pavel Tupitsyn :
>>  >
>>  > > Igniters,
>>  > >
>>  > > The project does not seem to compile in IDEA:
>>  > > there are two IgniteLinkTaglet versions for Java 8 and Java 9+,
>>  > > and both files get picked up by the IDE for some reason, resulting
>>  > > in build errors.
>>  > >
>>  > > I've done all the usual things (fresh clone, invalidate caches).
>>  > > java-8 profile is enabled, java-9+ disabled, only JDK 8 is installed.
>>  > > Maven build is fine, only IDEA gives me errors.
>>  > >
>>  > > I've seen some people just delete one of the IgniteLinkTaglet files,
>>  > > this works for me too but is quite inconvenient.
>>  > > Is there any trick to this?
>>  > >
>>  >
>>  >
>>  > --
>>  > Sincerely yours,
>>  > Ivan Bessonov
>>  >
>
> --
> Sincerely yours,
> Ivan Bessonov


Re: [DISCUSSION] Complete Discontinuation of IGFS and Hadoop Accelerator

2020-07-22 Thread Anton Kalashnikov
Sorry, I was mistaken: we cannot keep these methods, because at least 
FileSystemConfiguration and HadoopConfiguration require the corresponding classes 
that were deleted. So I think we should just remove them right now.

-- 
Best regards,
Anton Kalashnikov



22.07.2020, 18:56, "Anton Kalashnikov" :
> Hi,
>
> All of these methods are from IgniteConfiguration:
> Hadoop configuration:
> - HadoopConfiguration getHadoopConfiguration()
> - IgniteConfiguration setHadoopConfiguration(HadoopConfiguration hadoopCfg)
>
> IGFS (Ignite In-Memory File System) configurations:
> - FileSystemConfiguration[] getFileSystemConfiguration
> - IgniteConfiguration setFileSystemConfiguration(FileSystemConfiguration... 
> igfsCfg)
>
> thread pool size that will be used to process outgoing IGFS messages:
> - IgniteConfiguration setIgfsThreadPoolSize(int poolSize)
> - int getIgfsThreadPoolSize()
>
> Of course, I can leave these methods intact, but they will be doing nothing, so 
> the API formally wouldn't be changed but, in fact, the features would be removed. 
> Does that make sense? I don't think so, and in my opinion it is perhaps OK to 
> remove these methods right now if we are ready to remove these features right 
> now. (But again, if there are some concerns about it, I can easily leave 
> these methods with empty implementations.)
>
> --
> Best regards,
> Anton Kalashnikov
>
> 22.07.2020, 17:47, "Denis Magda" :
>>  Hi Alex,
>>
>>  It's been a year since we voted to discontinue this integration [1] and it
>>  wasn't removed from the source code earlier only because of the internal
>>  dependencies with the ML component. Now all the dependencies are gone and
>>  Ignite 2.9 is the right version to finish the discontinuation process. It
>>  would make sense to wait for Ignite 3.0 only if there were some breaking
>>  changes in the APIs that will stay in Ignite.
>>
>>   @Anton Kalashnikov , you mentioned that you
>>  removed some methods from the configuration. Could you please list them
>>  here? Are they Hadoop-specific or generic?
>>
>>  [1]
>>  
>> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42405.html
>>
>>  -
>>  Denis
>>
>>  On Wed, Jul 22, 2020 at 1:52 AM Alex Plehanov 
>>  wrote:
>>
>>>   Guys,
>>>
>>>   Any updates here? Looks like we still don't have a consensus about release
>>>   version for this patch (already mention it in the release thread).
>>>   Currently, the ticket is still targeted to 2.9.
>>>
>>>   Wed, Jul 15, 2020 at 00:40, Denis Magda :
>>>
>>>   > I don't think it's required to wait until Ignite 3.0 to make this 
>>> happen.
>>>   > If I'm not mistaken, we stopped releasing Hadoop binaries and sources a
>>>   > long time ago (at least you can no longer find them on the downloads
>>>   page).
>>>   > Also, we removed all the mentioning from the documentation and website.
>>>   > Nobody complained or requested for a maintenance release since that 
>>> time.
>>>   > Thus, I would remove the integration in 2.9. If anybody shows up later
>>>   then
>>>   > they can use the sources in the 2.8 branch and do whatever they want.
>>>   >
>>>   > -
>>>   > Denis
>>>   >
>>>   >
>>>   > On Thu, Jul 9, 2020 at 3:52 AM Pavel Tupitsyn 
>>>   > wrote:
>>>   >
>>>   > > We are breaking backwards compatibility,
>>>   > > so this can be only done for Ignite 3.0, am I right?
>>>   > >
>>>   > > On Thu, Jul 9, 2020 at 1:46 PM Anton Kalashnikov 
>>>   > > wrote:
>>>   > >
>>>   > > > Hi everyone,
>>>   > > >
>>>   > > > The task of removal IGFS and Hadoop accelerator is ready to review.(
>>>   > > > https://issues.apache.org/jira/browse/IGNITE-11942)
>>>   > > > I've already asked some guys to take a look at it but if somebody
>>>   > > familiar
>>>   > > > with this part of code, feel free to take a look at the changes
>>>   > > > too(especially scripts changes).
>>>   > > >
>>>   > > > I also think it is good to decide which release it should be planned
>>>   > on.
>>>   > > > This task planned for 2.9 right now but I should notice that first 
>>> of
>>>   > all
>>>

Re: [DISCUSSION] Complete Discontinuation of IGFS and Hadoop Accelerator

2020-07-22 Thread Anton Kalashnikov
Hi,

All of these methods are from IgniteConfiguration:
Hadoop configuration:
- HadoopConfiguration getHadoopConfiguration()
- IgniteConfiguration setHadoopConfiguration(HadoopConfiguration hadoopCfg)

IGFS (Ignite In-Memory File System) configurations:
- FileSystemConfiguration[] getFileSystemConfiguration
- IgniteConfiguration setFileSystemConfiguration(FileSystemConfiguration... 
igfsCfg)

thread pool size that will be used to process outgoing IGFS messages:
- IgniteConfiguration setIgfsThreadPoolSize(int poolSize)
- int getIgfsThreadPoolSize()


Of course, I can leave these methods intact, but they will be doing nothing, so 
the API formally wouldn't be changed but, in fact, the features would be removed. Does 
that make sense? I don't think so, and in my opinion it is perhaps OK to remove 
these methods right now if we are ready to remove these features right now. 
(But again, if there are some concerns about it, I can easily leave these 
methods with empty implementations.)

-- 
Best regards,
Anton Kalashnikov



22.07.2020, 17:47, "Denis Magda" :
> Hi Alex,
>
> It's been a year since we voted to discontinue this integration [1] and it
> wasn't removed from the source code earlier only because of the internal
> dependencies with the ML component. Now all the dependencies are gone and
> Ignite 2.9 is the right version to finish the discontinuation process. It
> would make sense to wait for Ignite 3.0 only if there were some breaking
> changes in the APIs that will stay in Ignite.
>
>  @Anton Kalashnikov , you mentioned that you
> removed some methods from the configuration. Could you please list them
> here? Are they Hadoop-specific or generic?
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42405.html
>
> -
> Denis
>
> On Wed, Jul 22, 2020 at 1:52 AM Alex Plehanov 
> wrote:
>
>>  Guys,
>>
>>  Any updates here? Looks like we still don't have a consensus about release
>>  version for this patch (already mention it in the release thread).
>>  Currently, the ticket is still targeted to 2.9.
>>
>>  Wed, Jul 15, 2020 at 00:40, Denis Magda :
>>
>>  > I don't think it's required to wait until Ignite 3.0 to make this happen.
>>  > If I'm not mistaken, we stopped releasing Hadoop binaries and sources a
>>  > long time ago (at least you can no longer find them on the downloads
>>  page).
>>  > Also, we removed all the mentioning from the documentation and website.
>>  > Nobody complained or requested for a maintenance release since that time.
>>  > Thus, I would remove the integration in 2.9. If anybody shows up later
>>  then
>>  > they can use the sources in the 2.8 branch and do whatever they want.
>>  >
>>  > -
>>  > Denis
>>  >
>>  >
>>  > On Thu, Jul 9, 2020 at 3:52 AM Pavel Tupitsyn 
>>  > wrote:
>>  >
>>  > > We are breaking backwards compatibility,
>>  > > so this can be only done for Ignite 3.0, am I right?
>>  > >
>>  > > On Thu, Jul 9, 2020 at 1:46 PM Anton Kalashnikov 
>>  > > wrote:
>>  > >
>>  > > > Hi everyone,
>>  > > >
>>  > > > The task of removal IGFS and Hadoop accelerator is ready to review.(
>>  > > > https://issues.apache.org/jira/browse/IGNITE-11942)
>>  > > > I've already asked some guys to take a look at it but if somebody
>>  > > familiar
>>  > > > with this part of code, feel free to take a look at the changes
>>  > > > too(especially scripts changes).
>>  > > >
>>  > > > I also think it is good to decide which release it should be planned
>>  > on.
>>  > > > This task planned for 2.9 right now but I should notice that first of
>>  > all
>>  > > > there are a lot of changes and secondly there are some changes in
>>  > public
>>  > > > API(removed some methods from configuration). So maybe it makes sense
>>  > to
>>  > > > move this ticket to the next release. What do you think?
>>  > > >
>>  > > > --
>>  > > > Best regards,
>>  > > > Anton Kalashnikov
>>  > > >
>>  > > >
>>  > > > 10.02.2020, 15:45, "Alexey Zinoviev" :
>>  > > > > Thank you so much! Will wait :)
>>  > > > >
>>  > > > > Mon, Feb 10, 2020 at 15:13, Alexey Goncharuk <
>>  > > > alexey.goncha...@gmail.com>:
>>  > > > 

Re: [DISCUSSION] Complete Discontinuation of IGFS and Hadoop Accelerator

2020-07-09 Thread Anton Kalashnikov
Hi everyone,

The task of removing IGFS and the Hadoop accelerator is ready for 
review (https://issues.apache.org/jira/browse/IGNITE-11942).
I've already asked some guys to take a look at it, but if somebody is familiar with 
this part of the code, feel free to take a look at the changes too (especially 
the script changes).

I also think it would be good to decide which release it should be planned for. This 
task is planned for 2.9 right now, but I should note that, first of all, there are 
a lot of changes and, secondly, there are some changes in the public API (some 
methods were removed from the configuration). So maybe it makes sense to move this 
ticket to the next release. What do you think?

-- 
Best regards,
Anton Kalashnikov


10.02.2020, 15:45, "Alexey Zinoviev" :
> Thank you so much! Will wait :)
>
> Mon, Feb 10, 2020 at 15:13, Alexey Goncharuk :
>
>>  Got it, then no need to rush, let's wait for the TF-IGFS decoupling.
>>
>>  Mon, Feb 10, 2020 at 13:15, Alexey Zinoviev :
>>
>>  > The TensorFlow integration uses IGFS; if you have any idea how to store files
>>  > in memory another way, please suggest something.
>>  > I hope to decouple the Ignite-TF integration into a separate repository
>>  before
>>  > the 2.9 release, with its own file system over Ignite caches.
>>  >
>>  > Mon, Feb 10, 2020 at 12:49, Ivan Pavlukhin :
>>  >
>>  > > Is not it blocked by
>>  > > https://issues.apache.org/jira/browse/IGNITE-10292 as stated in JIRA?
>>  > >
>>  > > @Alex Zinoviev could you please shed some light on this?
>>  > >
>>  > > Best regards,
>>  > > Ivan Pavlukhin
>>  > >
>>  > > Mon, Feb 10, 2020 at 12:46, Anton Kalashnikov :
>>  > >
>>  > > >
>>  > > > I found the correct ticket for such activity -
>>  > > https://issues.apache.org/jira/browse/IGNITE-11942
>>  > > >
>>  > > > --
>>  > > > Best regards,
>>  > > > Anton Kalashnikov
>>  > > >
>>  > > >
>>  > > > 10.02.2020, 12:16, "Anton Kalashnikov" :
>>  > > > > Hello.
>>  > > > >
>>  > > > > I created a ticket for this activity -
>>  > > https://issues.apache.org/jira/browse/IGNITE-12647. And if we are
>>  still
>>  > > in consensus I'll do it at the nearest time(I've already had the
>>  prepared
>>  > > code).
>>  > > > >
>>  > > > > --
>>  > > > > Best regards,
>>  > > > > Anton Kalashnikov
>>  > > > >
>>  > > > > 10.02.2020, 12:07, "Alexey Goncharuk" :
>>  > > > >> Folks,
>>  > > > >>
>>  > > > >> I think there is a consensus here, but we did not remove IGFS
>>  > > neither in
>>  > > > >> 2.7 nor in 2.8, did we? Should we schedule a corresponding ticket
>>  > > for 2.9?
>>  > >
>>  >


[jira] [Created] (IGNITE-13207) Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager and Checkpointer

2020-07-02 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13207:
--

 Summary: Checkpointer code refactoring: Splitting 
GridCacheDatabaseSharedManager and Checkpointer
 Key: IGNITE-13207
 URL: https://issues.apache.org/jira/browse/IGNITE-13207
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Checkpoint refactoring(part of IEP-47)

2020-06-22 Thread Anton Kalashnikov
In fact, it is also one of my targets. I believe it will be possible once the 
checkpoint classes are restructured into smaller classes with clearer 
responsibilities. So if everything goes well, we can do it after step 3, which I 
described above.

-- 
Best regards,
Anton Kalashnikov


19.06.2020, 17:28, "Ivan Pavlukhin" :
> Hi Anton,
>
> A side question. Do you feel that it is possible to cover extracted
> classes with unit tests (I mean unit tests which do not start Ignite
> nodes)?
>
> 2020-06-19 16:59 GMT+03:00, Anton Kalashnikov :
>>  Hi Igniters,
>>
>>  IEP-47(Native persistence defragmentation) contains a part that implies
>>  refactoring of checkpoint(with the goal of reusing this feature in
>>  defragmentation).
>>
>>  I just to want to emphasize this part(refactoring) here and share my view to
>>  implementation
>>  I want to split this job to at least 2(but maybe 3) separated tasks:
>>  1. Extracting checkpoint related classes from
>>  GridCacheDatabaseSharedManager(ex. Checkpoint, Checkpointer,
>>  WriteCheckpointPages, etc.) -
>>  https://issues.apache.org/jira/browse/IGNITE-13151(almost done)
>>  2. Simplifying result code - I don't sure it is possible, but right now I
>>  see some code which on the first eye has duplication and redundancy
>>  3. Reorganizing code - There is a lot of work which Checkpointer do right
>>  now, I believe at least this class should be split.
>>
>>  Perhaps, 2 and 3 items will be done in one ticket.
>>  I believe you understand that I suggested several tickets instead of one in
>>  the target of simplification of review and find bugs.
>>
>>  Any objections?
>>
>>  --
>>  Best regards,
>>  Anton Kalashnikov
>
> --
>
> Best regards,
> Ivan Pavlukhin


Checkpoint refactoring(part of IEP-47)

2020-06-19 Thread Anton Kalashnikov
Hi Igniters,

IEP-47 (Native persistence defragmentation) contains a part that implies 
refactoring of the checkpoint (with the goal of reusing this feature in 
defragmentation).

I just want to emphasize this part (the refactoring) here and share my view of the 
implementation.
I want to split this job into at least two (but maybe three) separate tasks:
1. Extracting checkpoint-related classes from 
GridCacheDatabaseSharedManager (e.g. Checkpoint, Checkpointer, 
WriteCheckpointPages, etc.) - 
https://issues.apache.org/jira/browse/IGNITE-13151 (almost done)
2. Simplifying the resulting code - I'm not sure it is possible, but right now I see 
some code which at first glance has duplication and redundancy.
3. Reorganizing the code - there is a lot of work which Checkpointer does right now; 
I believe at least this class should be split.

Perhaps items 2 and 3 will be done in one ticket.
I believe you understand that I suggested several tickets instead of one with the 
goal of simplifying review and finding bugs. 

Any objections?

-- 
Best regards,
Anton Kalashnikov



[jira] [Created] (IGNITE-13080) Incorrect hash calculation for binaryObject in case of deduplication

2020-05-27 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13080:
--

 Summary: Incorrect hash calculation for binaryObject in case of 
deduplication
 Key: IGNITE-13080
 URL: https://issues.apache.org/jira/browse/IGNITE-13080
 Project: Ignite
  Issue Type: Bug
  Components: binary
Reporter: Anton Kalashnikov


Let's suppose we have the two following classes (the implementation of SubKey doesn't 
matter here):
{noformat}
public static class Key {   
private SubKey subKey;
}

public static class Value {
private SubKey subKey;
private Key key;
}
{noformat}
If subKey is the same object in Key and Value, and we do the following:
{noformat}
SubKey subKey = new SubKey();
Key key = new Key(subKey);
Value value = new Value(subKey, key);

cache.put(key, value); 

assert cache.size() == 1; //true

BinaryObject keyAsBinaryObject = cache.get(key).field("key");

cache.put(keyAsBinaryObject, value); // cache.size() should be 1, but it will be 2

assert cache.size() == 1; // false: we now have two different keys, 
which is wrong
{noformat}
We get two different records instead of one.

Reason:
When we put the raw Key class into the cache, Ignite converts it to a binary object 
(literally a byte array), then calculates the hash over this byte array and stores 
it in the object.

When we put the raw Value class, the same thing happens, but since we have two 
references to the same object (subKey) inside Value, deduplication occurs. This 
means that the first time we meet an object, we save it as is and remember 
its location; then, if we meet the same object again, instead of saving all of 
its bytes we mark this place as a HANDLE and record only the offset at 
which the saved object can be found.
After that, when we extract an object (Key) from the BinaryObject of Value, we 
don't get a new BinaryObject with a new byte array; instead, we get a 
BinaryObject backed by the same byte array plus an offset which shows where the 
requested value (Key) can be found. And when we try to store this object in the 
cache, Ignite does it incorrectly - first of all, the byte array contains a HANDLE 
marker with an offset instead of the real bytes of the inner object, which is already 
wrong, but beyond that we also calculate the hash incorrectly.

Problem:
Right now, Ignite isn't able to store a BinaryObject that contains a HANDLE. And as 
I understand, it's not so easy to fix. Maybe it makes sense to explicitly 
forbid working with a BinaryObject such as described above, but of course this is 
debatable.

Workaround:
we can change the order of fields in Value, like this:
{noformat}
public static class Value {   
private Key key;
private SubKey subKey;
}
{noformat}
After that, subKey is inlined inside key, and the subKey field of Value is 
represented as a HANDLE.

Also, we can rebuild the object like this:
{noformat}
keyAsBinaryObject.toBuilder().build();
{noformat}
During this procedure, all HANDLEs are restored to real objects.
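The back-reference scheme described above is the same idea Java's own serialization uses, so it can be observed with plain JDK code and no Ignite at all: writing the same instance twice emits a short handle the second time, while two equal but distinct instances are both written in full.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class HandleDemo {
    // Serialize several objects into one stream and return the raw bytes.
    static byte[] serialize(Serializable... objs) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                for (Serializable o : objs)
                    oos.writeObject(o);
            }
            return bos.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String subKey = new String("subKey");

        // Same instance twice: the second write is just a back-reference,
        // so the stream is shorter than writing two distinct instances.
        byte[] deduplicated = serialize(subKey, subKey);
        byte[] full = serialize(subKey, new String("subKey"));

        System.out.println(deduplicated.length < full.length);
    }
}
```

ObjectOutputStream keys its handle table by reference identity, which mirrors the Ignite behavior above: only the identical subKey instance triggers the back-reference, not an equal copy.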




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13041) PDS (Indexing) is failed with 137 code

2020-05-20 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13041:
--

 Summary: PDS (Indexing) is failed with 137 code
 Key: IGNITE-13041
 URL: https://issues.apache.org/jira/browse/IGNITE-13041
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Process exited with code 137

https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_PdsIndexing?branch=%3Cdefault%3E&buildTypeTab=overview&mode=builds



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Moving binary metadata to PDS storage folder

2020-05-15 Thread Anton Kalashnikov
Hello,

I agree with the described disadvantages (which Maxim mentioned), and I think it 
is a good idea to discuss a new way of PDS migration in a separate ticket. But 
also, I agree with Denis that a script would add ambiguity (it's not clear 
which script should be used and when) if we add it. In my opinion, a better way 
may be to store some migration meta-information in the Ignite metastorage 
and, in case this information is not found, execute some migration code - but 
of course, there is a lot to discuss.

As for this task, I don't see any problem with leaving it as is, because the 
overhead is minimal right now (only on start, and only if the old folder exists). 
Also, we can easily drop (or move) this piece of code if we implement a new 
migration approach at some point.
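A minimal sketch of the metastorage-marker idea, with a plain Map standing in for the metastorage (all names here are hypothetical): each migration is keyed, runs at most once, and records its completion so restarts skip it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MigrationRunner {
    // Stand-in for the Ignite metastorage: key -> "migration done" marker.
    private final Map<String, Boolean> metastorage = new ConcurrentHashMap<>();

    // Runs the migration only if its completion marker is absent;
    // returns true if the migration actually ran on this call.
    boolean runOnce(String migrationKey, Runnable migration) {
        if (Boolean.TRUE.equals(metastorage.get(migrationKey)))
            return false; // Already migrated on a previous start.

        migration.run();

        // Record completion so restarts do not repeat the work.
        metastorage.put(migrationKey, Boolean.TRUE);
        return true;
    }
}
```

Node-crash handling mid-migration (the concern Maxim raised) still has to be addressed separately; the marker only guarantees that a completed migration is never re-run.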

-- 
Best regards,
Anton Kalashnikov


13.05.2020, 21:59, "Denis Mekhanikov" :
> Maxim,
>
> This way we'll introduce a migration procedure between versions, which
> we currently don't have. Different Ignite versions are compatible by
> persistence storage. If we add a migration script, we need to decide,
> whether we need to run it every time when the version is upgraded, or
> only some specific versions are affected.
>
> I suggest having a procedure that will look for metadata in the work
> directory, and if it finds it there, then the node will use it.
> Otherwise the persistence directory is used.
>
> Denis
>
> On 13.05.2020 21:40, Maxim Muzafarov wrote:
>>  Folks,
>>
>>  I think it's important to discuss the following question regarding this 
>> thread:
>>  Should we consider moving the migration procedure from the java
>>  production code to migration scripts?
>>
>>   From my understanding, keeping all such things in java production
>>  source code has some disadvantages:
>>  1. It executes only once at the migration stage.
>>  2. It affects the complexity of the source code and code maintenance.
>>  3. Node crash cases must be covered during the migration procedure.
>>  4. It affects the production usage e.g. the process doesn't have the
>>  right access to the old directory (migration already completed) and
>>  will fail the node start.
>>
>>  The right behavior from my point should be:
>>  1. Change the default path of binary/marshaller directory to the new one.
>>  2. Provide migration scripts for users.
>>
>>  WDYT?
>>
>>  On Wed, 13 May 2020 at 21:10, Denis Mekhanikov  
>> wrote:
>>>  Sounds great!
>>>
>>>  It happens pretty frequently that users migrate to a new version of
>>>  Ignite and preserve persistence files only without caring too much about
>>>  the work folder. But it turns out, that the work folder actually has
>>>  some important stuff.
>>>  This improvement should help with this issue.
>>>
>>>  What about in-memory mode? As far as I know, we write binary metadata
>>>  to disk even when no persistence is configured. Do you plan to address
>>>  it in any way?
>>>
>>>  Denis
>>>
>>>  On 12.05.2020 15:56, Sergey Antonov wrote:
>>>>  Hello Semyon,
>>>>
>>>>  This is a good idea!
>>>>
>>>>  вт, 12 мая 2020 г. в 15:53, Вячеслав Коптилин :
>>>>
>>>>>  Hello Semyon,
>>>>>
>>>>>  This is a good and long-awaited improvement! Thank you for your efforts!
>>>>>
>>>>>  Thanks,
>>>>>  S.
>>>>>
>>>>>  вт, 12 мая 2020 г. в 15:11, Данилов Семён :
>>>>>
>>>>>>  Hello!
>>>>>>
>>>>>>  I would like to propose moving /binary_meta and /marshaller folders to
>>>>>  the
>>>>>>  PDS folder.
>>>>>>
>>>>>>  Motivation: data, directly related to the persistence, is stored outside
>>>>>>  the persistence dir, which can lead to various issues and also is not
>>>>>  very
>>>>>>  convenient to use. In particular, with k8s, deployment disk that is
>>>>>>  attached to a container can not be accessed from other containers or
>>>>>>  outside of k8s. If support ever needs to drop everything except the
>>>>>>  persistence data, there will be no way to recover, because binary
>>>>>>  metadata is required to process PDS files.
>>>>>>
>>>>>>  I created an issue (https://issues.apache.org/jira/browse/IGNITE-12994)
>>>>>  and a
>>>>>>  pull request(https://github.com/apache/ignite/pull/7792) that fixes the
>>>>>>  issue.
>>>>>>
>>>>>>  In that PR I made the following:
>>>>>>
>>>>>>  * store binary meta and marshaller data inside db/ folder
>>>>>>  * if binary meta or marshaller data are found in "legacy" locations --
>>>>>>  safely move them to new locations during the node startup
>>>>>>
>>>>>>  Kind regards,
>>>>>>
>>>>>>  Semyon Danilov.


[jira] [Created] (IGNITE-12817) Streamer threads don't update timestamp

2020-03-20 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12817:
--

 Summary: Streamer threads don't update timestamp
 Key: IGNITE-12817
 URL: https://issues.apache.org/jira/browse/IGNITE-12817
 Project: Ignite
  Issue Type: Bug
  Components: streaming
Reporter: Anton Kalashnikov


Scenario:
1. Start 3 data nodes
2. Start streamer load on 6 clients
3. Start restarting the data nodes

Result:
Keys weren't loaded into all (1000) caches.
In the server node log I see:
{noformat}
[2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
[2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread 
[name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, 
waitCnt=169964]
[2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error 
detected. Will be handled accordingly to configured handler 
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, 
igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, 
igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) 
~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568)
 [ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866)
 [ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) 
[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506)
 [ignite-core-2.5.9.jar:2.5.9]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-2.5.9.jar:2.5.9]
{noformat}



The problem is in the data streamer threads: they never update their progress 
timestamps, so the blocked-worker watchdog eventually flags them as blocked.
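The fix pattern can be modeled with plain JDK code (a hypothetical sketch, not Ignite's actual GridWorker): a long-running worker must refresh its heartbeat timestamp inside its processing loop so the blocked-worker watchdog does not flag it:

```java
// Minimal model of a worker that the watchdog monitors via a heartbeat
// timestamp. The missing piece in the bug above is the refresh after
// each processed batch.
public class HeartbeatWorker {
    private volatile long heartbeatTs = System.currentTimeMillis();

    public long heartbeatTs() {
        return heartbeatTs;
    }

    /** Processes one batch and refreshes the heartbeat afterwards. */
    public void processBatch(Runnable batch) {
        batch.run();
        heartbeatTs = System.currentTimeMillis(); // the missing update
    }

    /** Demo: the heartbeat advances after a batch is processed. */
    static boolean demo() {
        try {
            HeartbeatWorker w = new HeartbeatWorker();
            long before = w.heartbeatTs();

            Thread.sleep(20); // simulate time passing between batches
            w.processBatch(() -> { /* simulate streamer batch */ });

            return w.heartbeatTs() > before;
        }
        catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("heartbeat advanced: " + demo());
    }
}
```

A watchdog thread would periodically compare `heartbeatTs()` against the configured timeout instead of guessing whether the worker is alive.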



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12801) Possible extra page release when throttling and checkpoint thread store its concurrently

2020-03-18 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12801:
--

 Summary: Possible extra page release when throttling and 
checkpoint threads store it concurrently
 Key: IGNITE-12801
 URL: https://issues.apache.org/jira/browse/IGNITE-12801
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


* A user thread acquires the page on write release
* The checkpoint thread sees that the page was acquired
* The throttling thread sees that the page was acquired
* The checkpoint thread saves the page to disk and releases it
* The throttling thread detects that the page was already saved but nonetheless 
releases it again, which is incorrect.
{noformat}
java.lang.AssertionError: null
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1181)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1160)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$WriteCheckpointPages.writePages(GridCacheDatabaseSharedManager.java:4868)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$WriteCheckpointPages.run(GridCacheDatabaseSharedManager.java:4792)
... 3 common frames omitted
{noformat}
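The race can be avoided by claiming the release atomically, so that only one of the two threads actually releases the page. A minimal plain-JDK model of that idea (not Ignite's actual PageMemoryImpl):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Model of the race: both the checkpoint thread and the throttling
// thread may try to release the same page, so the release has to be
// claimed atomically via compare-and-set.
public class SingleRelease {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private int releaseCount;

    /** Returns true only for the one caller that wins the CAS. */
    public boolean tryRelease() {
        if (released.compareAndSet(false, true)) {
            releaseCount++; // the actual page release would happen here
            return true;
        }

        return false; // page was already released by the other thread
    }

    public int releaseCount() {
        return releaseCount;
    }

    public static void main(String[] args) {
        SingleRelease page = new SingleRelease();

        boolean first = page.tryRelease();  // e.g. checkpoint thread
        boolean second = page.tryRelease(); // e.g. throttling thread

        // Only one caller wins; the page is released exactly once.
        System.out.println(first + " " + second + " " + page.releaseCount());
    }
}
```

With the guard in place, the losing thread simply observes that the page was already stored and skips its own release instead of tripping the assertion.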





[jira] [Created] (IGNITE-12714) Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED TIMEOUT

2020-02-21 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12714:
--

 Summary: Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
 Key: IGNITE-12714
 URL: https://issues.apache.org/jira/browse/IGNITE-12714
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Scenario:
1. Start 3 data nodes
2. Start streamer load on 6 clients
3. Start restarting the data nodes

Result:
Keys weren't loaded into all (1000) caches.
In the server node log I see:
{noformat}
[2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
[2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread 
[name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, 
waitCnt=169964]
[2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error 
detected. Will be handled accordingly to configured handler 
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, 
igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, 
igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) 
~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804)
 ~[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568)
 [ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866)
 [ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) 
[ignite-core-2.5.9.jar:2.5.9]
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506)
 [ignite-core-2.5.9.jar:2.5.9]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-2.5.9.jar:2.5.9]
{noformat}

Logs: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23
Log with dumps: 
ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23/servers/172.25.1.12/poc-tester-server-172.25.1.12-id-0-2019-07-17-16-46-58.log-1-2019-07-17.log.gz


*Solution:*
Increase the timeout 
org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT to 2 minutes.
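For reference, this is a JVM system property (value in milliseconds, as I understand it), so it can be set either on the command line with `-DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=120000` or programmatically before the node starts. A minimal sketch:

```java
// Sets the blocked-worker timeout to 2 minutes before node startup.
// Uses only the JDK; the property name matches
// IgniteSystemProperties.IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT.
public class WorkerTimeoutConfig {
    public static void main(String[] args) {
        // 2 minutes, expressed in milliseconds
        System.setProperty("IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT", "120000");

        // An Ignite node started after this point would pick it up.
        System.out.println(
            System.getProperty("IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT"));
    }
}
```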





[jira] [Created] (IGNITE-12713) [Suite] PDS 1 flaky failed on TC

2020-02-21 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12713:
--

 Summary: [Suite] PDS 1 flaky failed on TC
 Key: IGNITE-12713
 URL: https://issues.apache.org/jira/browse/IGNITE-12713
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


IgnitePdsTestSuite: 
BPlusTreeReuseListPageMemoryImplTest.testIterateConcurrentPutRemove_2
IgnitePdsTestSuite: 
BPlusTreeReuseListPageMemoryImplTest.testMassiveRemove2_false





[jira] [Created] (IGNITE-12712) NPE in checkpoint thread

2020-02-21 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12712:
--

 Summary: NPE in checkpoint thread
 Key: IGNITE-12712
 URL: https://issues.apache.org/jira/browse/IGNITE-12712
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


An NPE occurred in the checkpoint thread (rarely reproduced):
{noformat}
[2019-11-04 20:54:58,018][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] 
Received full message, will finish exchange 
[node=1784645d-3bef-44fe-8288-e0c16202f5e3, resVer=AffinityTopologyVersion 
[topVer=4, minorTopVer=9]]
[2019-11-04 20:54:58,023][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] 
Finish exchange future [startVer=AffinityTopologyVersion [topVer=4, 
minorTopVer=9], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], 
err=null]
[2019-11-04 20:54:58,029][INFO ][sys-#50][GridCacheProcessor] Finish proxy 
initialization, cacheName=SQL_PUBLIC_T8, 
localNodeId=5b153e14-70f2-4408-a125-584752532ebd
[2019-11-04 20:54:58,030][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] 
Completed partition exchange [localNode=5b153e14-70f2-4408-a125-584752532ebd, 
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
[topVer=4, minorTopVer=9], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode 
[id=1784645d-3bef-44fe-8288-e0c16202f5e3, consistentId=1, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1572890071469, loc=false, 
ver=8.7.8#20191101-sha1:e344ed04, isClient=false], done=true, newCrdFut=null], 
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=9]]
[2019-11-04 20:54:58,030][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] 
Exchange timings [startVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], 
resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], stage="Waiting in 
exchange queue" (0 ms), stage="Exchange parameters initialization" (0 ms), 
stage="Update caches registry" (0 ms), stage="Start caches" (52 ms), 
stage="Affinity initialization on cache group start" (1 ms), stage="Determine 
exchange type" (0 ms), stage="Preloading notification" (0 ms), stage="WAL 
history reservation" (0 ms), stage="Wait partitions release" (1 ms), 
stage="Wait partitions release latch" (5 ms), stage="Wait partitions release" 
(0 ms), stage="Restore partition states" (7 ms), stage="After states restored 
callback" (10 ms), stage="Waiting for Full message" (59 ms), stage="Affinity 
recalculation" (0 ms), stage="Full map updating" (4 ms), stage="Exchange done" 
(7 ms), stage="Total time" (146 ms)]
[2019-11-04 20:54:58,030][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] 
Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=4, 
minorTopVer=9], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], 
stage="Affinity initialization on cache group start [grp=SQL_PUBLIC_T8]" (1 ms) 
(parent=Affinity initialization on cache group start), stage="Restore partition 
states [grp=SQL_PUBLIC_T8]" (6 ms) (parent=Restore partition states), 
stage="Restore partition states [grp=ignite-sys-cache]" (3 ms) (parent=Restore 
partition states), stage="Restore partition states [grp=cache_group_3]" (0 ms) 
(parent=Restore partition states)]
[2019-11-04 20:54:58,037][INFO 
][exchange-worker-#45][GridCachePartitionExchangeManager] Skipping rebalancing 
(nothing scheduled) [top=AffinityTopologyVersion [topVer=4, minorTopVer=9], 
force=false, evt=DISCOVERY_CUSTOM_EVT, 
node=1784645d-3bef-44fe-8288-e0c16202f5e3]
[2019-11-04 20:54:58,713][INFO 
][db-checkpoint-thread-#53][GridCacheDatabaseSharedManager] Checkpoint started 
[checkpointId=82969270-b1a5-4480-9513-3af65bab0e17, startPtr=FileWALPointer 
[idx=0, fileOff=3550077, len=12350], checkpointBeforeLockTime=8ms, 
checkpointLockWait=4ms, checkpointListenersExecuteTime=56ms, 
checkpointLockHoldTime=61ms, walCpRecordFsyncDuration=4ms, 
writeCheckpointEntryDuration=8ms, splitAndSortCpPagesDuration=1ms,  pages=178, 
reason='timeout']
[2019-11-04 20:54:58,914][INFO ][exchange-worker-#45][time] Started exchange 
init [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=10], crd=false, 
evt=DISCOVERY_CUSTOM_EVT, evtNode=1784645d-3bef-44fe-8288-e0c16202f5e3, 
customEvt=DynamicCacheChangeBatch 
[id=8b06d873e61-af9e27a6-8fe9-4da1-bc0a-d19cd0eabd36, reqs=ArrayList 
[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_T9, hasCfg=true, 
nodeId=1784645d-3bef-44fe-8288-e0c16202f5e3, clientStartOnly=false, stop=false, 
destroy=false, disabledAfterStartfalse]], exchangeActions=ExchangeActions 
[startCaches=[SQL_PUBLIC_T9], stopCaches=null, startGrps=[cache_group_4], 
stopGrps=[], resetParts=null, stateChangeRequest=null], startC

[jira] [Created] (IGNITE-12709) Server latch initialized after client latch in Zookeeper discovery

2020-02-20 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12709:
--

 Summary: Server latch initialized after client latch in Zookeeper 
discovery
 Key: IGNITE-12709
 URL: https://issues.apache.org/jira/browse/IGNITE-12709
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


The coordinator node misses the latch message from the client because it has not 
yet received the message that triggers the exchange. This leads to an infinite 
wait for the coordinator's answer.

{noformat}

[2019-10-23 
12:49:42,110]\[ERROR]\[sys-#39470%continuous.GridEventConsumeSelfTest0%]\[GridIoManager]
 An error occurred processing the message \[msg=GridIoMessage \[plc=2, 
topic=TOPIC_EXCHANGE, topicOrd=31, ordered=fa
lse, timeout=0, skipOnTimeout=false, 
msg=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.LatchAckMessage@7699f4f2],
 nodeId=857a40a8-f384-4740-816c-dd54d3a1].
class org.apache.ignite.IgniteException: Topology AffinityTopologyVersion 
\[topVer=54, minorTopVer=0] not found in discovery history ; consider 
increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is
-1
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.aliveNodesForTopologyVer(ExchangeLatchManager.java:292)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.getLatchCoordinator(ExchangeLatchManager.java:334)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:379)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:119)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1632)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1252)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:143)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1143)
at 
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

[2019-10-23 12:50:02,106]\[WARN 
]\[exchange-worker-#39517%continuous.GridEventConsumeSelfTest1%]\[GridDhtPartitionsExchangeFuture]
 Unable to await partitions release latch within timeout: ClientLatch 
\[coordinator=ZookeeperClusterNode \[id=760ca6b5-f30b-4c40-81b1-5b602c20, 
addrs=\[127.0.0.1], order=1, loc=false, client=false], ackSent=true, 
super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, 
topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0

[2019-10-23 12:50:02,192]\[WARN 
]\[exchange-worker-#39469%continuous.GridEventConsumeSelfTest0%]\[GridDhtPartitionsExchangeFuture]
 Unable to await partitions release latch within timeout: ServerLatch 
\[permits=1, pendingAcks=HashSet \[06c3094b-c1f3-4fe8-81e8-22cb6602], 
super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, 
topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0

{noformat}

Reproduced by 
org.apache.ignite.internal.processors.continuous.GridEventConsumeSelfTest#testMultithreadedWithNodeRestart





[jira] [Created] (IGNITE-12653) Add example of baseline auto-adjust feature

2020-02-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12653:
--

 Summary: Add example of baseline auto-adjust feature
 Key: IGNITE-12653
 URL: https://issues.apache.org/jira/browse/IGNITE-12653
 Project: Ignite
  Issue Type: Task
  Components: examples
Reporter: Anton Kalashnikov


Work on Phase II of IEP-4 (Baseline topology) [1] has finished, so it makes 
sense to implement some examples of the "Baseline auto-adjust" feature [2]. 

The "Baseline auto-adjust" feature automatically adjusts the baseline to match 
the current topology after a node join/left event. It is needed because when a 
node leaves the grid and nobody changes the baseline manually, data can be lost 
(if enough additional nodes leave, depending on the backup factor), yet 
permanently monitoring the grid is not always possible or desirable. In many 
cases, auto-adjusting the baseline after some timeout is very helpful. 

Distributed metastore [3] (already done): 

First of all, the ability to store configuration data consistently and 
cluster-wide is required. Ignite doesn't have any specific API for such 
configurations, and we don't want many similar implementations of the same 
feature in our code. After some thought, it was proposed to implement it as a 
kind of distributed metastorage that can store arbitrary data. 
The first implementation is based on the existing local metastorage API for 
persistent clusters (in-memory clusters store the data in memory). Write/remove 
operations use the Discovery SPI to send updates to the cluster, which 
guarantees the order of updates and that all existing (alive) nodes have 
handled the update message. 
To find out which node has the latest data, there is a "version" value of the 
distributed metastorage. The update history up to some point in the past is 
stored along with the data, so when an outdated node connects to the cluster it 
receives all the missing data and applies it locally. If there's not enough 
history stored, or the joining node is clean, it receives a snapshot of the 
distributed metastorage, so there won't be inconsistencies. 

Baseline auto-adjust: 

Main scenario: 
- There is a grid whose baseline equals the current topology 
- A new node joins the grid, or some node leaves (fails) 
- The new mechanism detects this event and queues a baseline-change 
task with the configured timeout 
- If a new event happens before the baseline is changed, the queued task 
is removed and a new one is added 
- When the timeout expires, the task tries to set a new baseline 
corresponding to the current topology 
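The queue-with-timeout logic above is essentially a debounce. It can be modeled with a plain ScheduledExecutorService (a sketch of the idea, not Ignite's internal timeout machinery):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Every topology event cancels the pending baseline-change task and
// schedules a fresh one with the configured timeout, so rapid
// join/left events collapse into a single baseline change.
public class BaselineAutoAdjust {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMs;
    private ScheduledFuture<?> pending;

    public BaselineAutoAdjust(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Called on every node join/left event: replaces the queued task. */
    public synchronized void onTopologyEvent(Runnable setBaseline) {
        if (pending != null)
            pending.cancel(false); // drop the previously queued change

        pending = scheduler.schedule(setBaseline, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public void shutdown() {
        scheduler.shutdown();
    }

    /** Demo: two quick events collapse into one baseline change. */
    static boolean demo() {
        BaselineAutoAdjust adjust = new BaselineAutoAdjust(50);

        try {
            CountDownLatch applied = new CountDownLatch(1);

            adjust.onTopologyEvent(applied::countDown); // superseded
            adjust.onTopologyEvent(applied::countDown); // this one fires

            return applied.await(2, TimeUnit.SECONDS);
        }
        catch (InterruptedException e) {
            return false;
        }
        finally {
            adjust.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("baseline change applied: " + demo());
    }
}
```

The real feature also has to survive node restarts and cluster-wide configuration changes, which is where the distributed metastore described above comes in.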

First of all, we need to add two parameters [4]: 
- baselineAutoAdjustEnabled - enables/disables the "Baseline 
auto-adjust" feature. 
- baselineAutoAdjustTimeout - the timeout after which the baseline 
should be changed. 

These parameters are cluster-wide and can be changed at runtime because they 
are based on the distributed metastore. 

Restrictions: 
- This mechanism handles events only on an active grid 
- For in-memory nodes it is enabled by default; for persistent nodes it 
is disabled 
- If lost partitions are detected, the feature is disabled 
- If the baseline is adjusted manually while baselineNodes != gridNodes, 
an exception is thrown

[1] 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches
[2] https://issues.apache.org/jira/browse/IGNITE-8571
[3] https://issues.apache.org/jira/browse/IGNITE-10640
[4] https://issues.apache.org/jira/browse/IGNITE-8573





[jira] [Created] (IGNITE-12652) Add example of failure handling

2020-02-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12652:
--

 Summary: Add example of failure handling
 Key: IGNITE-12652
 URL: https://issues.apache.org/jira/browse/IGNITE-12652
 Project: Ignite
  Issue Type: Task
  Components: examples
Reporter: Anton Kalashnikov


Ignite has a critical failures handling feature 
(https://apacheignite.readme.io/docs/critical-failures-handling), but there is 
no example of how to use it correctly, so it would be good to add some.

Also, Ignite has a DiagnosticProcessor which is invoked when a failure handler 
is triggered. It may be a good idea to add some samples of its diagnostic 
output to this example.





Re: [DISCUSSION] Complete Discontinuation of IGFS and Hadoop Accelerator

2020-02-10 Thread Anton Kalashnikov
I found the correct ticket for such activity - 
https://issues.apache.org/jira/browse/IGNITE-11942

-- 
Best regards,
Anton Kalashnikov


10.02.2020, 12:16, "Anton Kalashnikov" :
> Hello.
>
> I created a ticket for this activity - 
> https://issues.apache.org/jira/browse/IGNITE-12647. If we still have 
> consensus, I'll do it soon (I've already prepared the code).
>
> --
> Best regards,
> Anton Kalashnikov
>
> 10.02.2020, 12:07, "Alexey Goncharuk" :
>>  Folks,
>>
>>  I think there is a consensus here, but we did not remove IGFS neither in
>>  2.7 nor in 2.8, did we? Should we schedule a corresponding ticket for 2.9?


Re: [VOTE] Allow or prohibit a joint use of @deprecated and @IgniteExperimental

2020-02-10 Thread Anton Kalashnikov
-1 Prohibit 

because otherwise, instead of one stable API, we'll have an old one (not 
recommended for use) and an unstable one, which is not user-friendly.

-- 
Best regards,
Anton Kalashnikov


10.02.2020, 12:28, "Ivan Rakov" :
> -1 Prohibit
>
> From my point of view, deprecation of the existing API will confuse users
> in case API suggested as a replacement is marked with @IgniteExperimental.
>
> On Mon, Feb 10, 2020 at 12:20 PM Nikolay Izhikov 
> wrote:
>
>>  +1
>>
>>  > 10 февр. 2020 г., в 11:57, Andrey Mashenkov 
>>  написал(а):
>>  >
>>  > -1 Prohibit.
>>  >
>>  > We must not deprecate an old API without having a new stable,
>>  > well-documented alternative and a way to migrate to it.
>>  >
>>  >
>>  > On Mon, Feb 10, 2020 at 11:02 AM Alexey Goncharuk >  >
>>  > wrote:
>>  >
>>  >> Dear Apache Ignite community,
>>  >>
>>  >> We would like to conduct a formal vote on the subject of whether to
>>  allow
>>  >> or prohibit a joint existence of @deprecated annotation for an old API
>>  >> and @IgniteExperimental [1] for a new (replacement) API. The result of
>>  this
>>  >> vote will be formalized as an Apache Ignite development rule to be used
>>  in
>>  >> future.
>>  >>
>>  >> The discussion thread where you can address all non-vote messages is
>>  [2].
>>  >>
>>  >> The votes are:
>>  >> *[+1 Allow]* Allow to deprecate the old APIs even when new APIs are
>>  marked
>>  >> with @IgniteExperimental to explicitly notify users that an old APIs
>>  will
>>  >> be removed in the next major release AND new APIs are available.
>>  >> *[-1 Prohibit]* Never deprecate the old APIs unless the new APIs are
>>  stable
>>  >> and released without @IgniteExperimental. The old APIs javadoc may be
>>  >> updated with a reference to new APIs to encourage users to evaluate new
>>  >> APIs. The deprecation and new API release may happen simultaneously if
>>  the
>>  >> new API is not marked with @IgniteExperimental or the annotation is
>>  removed
>>  >> in the same release.
>>  >>
>>  >> Neither of the choices prohibits deprecation of an API without a
>>  >> replacement if community decides so.
>>  >>
>>  >> The vote will hold for 72 hours and will end on February 13th 2020 08:00
>>  >> UTC:
>>  >>
>>  >>
>>  
>> https://www.timeanddate.com/countdown/to?year=2020&month=2&day=13&hour=8&min=0&sec=0&p0=utc-1
>>  >>
>>  >> All votes count, there is no binding/non-binding status for this.
>>  >>
>>  >> [1]
>>  >>
>>  >>
>>  
>> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/lang/IgniteExperimental.java
>>  >> [2]
>>  >>
>>  >>
>>  
>> http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Public-API-deprecation-rules-td45647.html
>>  >>
>>  >> Thanks,
>>  >> --AG
>>  >>
>>  >
>>  >
>>  > --
>>  > Best regards,
>>  > Andrey V. Mashenkov


Re: [DISCUSSION] Complete Discontinuation of IGFS and Hadoop Accelerator

2020-02-10 Thread Anton Kalashnikov
Hello.

I created a ticket for this activity - 
https://issues.apache.org/jira/browse/IGNITE-12647. If we still have 
consensus, I'll do it soon (I've already prepared the code). 

-- 
Best regards,
Anton Kalashnikov


10.02.2020, 12:07, "Alexey Goncharuk" :
> Folks,
>
> I think there is a consensus here, but we did not remove IGFS neither in
> 2.7 nor in 2.8, did we? Should we schedule a corresponding ticket for 2.9?


[jira] [Created] (IGNITE-12647) Get rid of IGFS and Hadoop Accelerator

2020-02-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12647:
--

 Summary: Get rid of IGFS and Hadoop Accelerator
 Key: IGNITE-12647
 URL: https://issues.apache.org/jira/browse/IGNITE-12647
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Not a single committer maintains these
integrations; they are no longer tested and, moreover, the community
stopped providing the binaries as of the Ignite 2.6.0 release (see the
In-Memory Hadoop Accelerator table).

So it makes sense to get rid of IGFS and the Hadoop Accelerator.





[jira] [Created] (IGNITE-12631) Incorrect rewriting wal record type in marshalled mode during iteration

2020-02-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12631:
--

 Summary: Incorrect rewriting of WAL record type in marshalled mode 
during iteration 
 Key: IGNITE-12631
 URL: https://issues.apache.org/jira/browse/IGNITE-12631
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


The failure happens when iterating over a WAL record that was written in 
marshalled mode, in the case when RecordType#ordinal != RecordType#index.
{noformat}
[16:46:58,800][SEVERE][pitr-ctx-exec-#399][GridRecoveryProcessor] Fail scan wal 
log for recovery localNodeConstId=node_1_1
 class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at 
position: 45905 size: -1
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRecordsIterator.java:292)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.handleRecordException(FileWriteAheadLogManager.java:3302)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:258)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:154)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:123)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:52)
at 
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:41)
at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35)
... 7 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL 
record at position: 45905 size: -1
at 
org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:394)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer.readRecord(RecordV2Serializer.java:235)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:243)
... 12 more
Caused by: java.io.IOException: Unknown record type: null, expected pointer 
[idx=2, offset=45905]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer$2.readWithHeaders(RecordV2Serializer.java:122)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:373)
... 14 more
Suppressed: class 
org.apache.ignite.internal.processors.cache.persistence.wal.crc.IgniteDataIntegrityViolationException:
 val: 1445348818 writtenCrc: 374280888
at 
org.apache.ignite.internal.processors.cache.persistence.wal.io.FileInput$Crc32CheckingFileInput.close(FileInput.java:106)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:380)
... 14 more
{noformat}





[jira] [Created] (IGNITE-12594) Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry

2020-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12594:
--

 Summary: Deadlock between GridCacheDataStore#purgeExpiredInternal 
and GridNearTxLocal#enlistWriteEntry
 Key: IGNITE-12594
 URL: https://issues.apache.org/jira/browse/IGNITE-12594
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


The deadlock occasionally reproduces in the PDS3 suite and can be seen in the 
thread dump below.
One thread attempts to unwind evicts, acquires checkpoint read lock and then 
locks {{GridCacheMapEntry}}. Another thread does {{GridCacheMapEntry#unswap}}, 
determines that the entry is expired and acquires checkpoint read lock to 
remove the entry from the store. 
We should not acquire checkpoint read lock inside of a locked 
{{GridCacheMapEntry}}.
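A consistent lock order is the usual cure here. Below is a minimal sketch in plain java.util.concurrent (illustrative only, not the actual Ignite classes): if every thread takes the checkpoint read lock strictly before the entry lock, the cyclic wait from the thread dump below cannot form.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch, not Ignite code: both locks are always acquired in
// the same order (checkpoint read lock first, entry lock second), so two
// threads can never wait on each other in a cycle.
public class LockOrderDemo {
    static final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    static final ReentrantLock entryLock = new ReentrantLock();

    static void safeOperation(Runnable work) {
        checkpointLock.readLock().lock();   // always first
        try {
            entryLock.lock();               // always second, never the reverse
            try {
                work.run();
            } finally {
                entryLock.unlock();
            }
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> safeOperation(() -> {}));
        Thread t2 = new Thread(() -> safeOperation(() -> {}));
        t1.start(); t2.start();
        t1.join(); t2.join();               // both finish: no deadlock
        System.out.println("done");
    }
}
```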

{code:java}Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, 
waitCnt=4450]
Lock 
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685, 
ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at 
o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632)
   <- CP read lock
at 
o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081)
at 
o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559)
at 
o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519)   
   <- locked entry
at 
o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437)
at 
o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303)
at 
o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957)
at 
o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491)
at 
o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526)
at 
o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727)
at 
o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740)
at 
o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524)
at 
o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513)
at 
o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264)
at 
o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863)
at 
o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291)
at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83)

Locked synchronizers:
java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7


Thread 
[name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%",
 id=29617, state=WAITING, blockCnt=2, waitCnt=65381]
Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, 
ownerName=updater-1, ownerId=29900]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at 
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)   
<- lock entry
at 
o.a.i.i.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:5017)
at 
o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteVersion(GridCacheMapEntry.java:2799)
at 
o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition.removeVersionedEntry(GridDhtLocalP

[jira] [Created] (IGNITE-12593) Corruption of B+Tree caused by byte array values and TTL

2020-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12593:
--

 Summary: Corruption of B+Tree caused by byte array values and TTL
 Key: IGNITE-12593
 URL: https://issues.apache.org/jira/browse/IGNITE-12593
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


It seems that the following set of parameters may lead to corruption of the 
B+Tree:
 - persistence is enabled
 - TTL is enabled 
 - Expiry policy - AccessedExpiryPolicy 1 sec.
 - cache value type is byte[]
 - all caches belong to the same cache group
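For reference, the parameter set above maps to roughly this configuration (an illustrative fragment against the public Ignite 2.x API; the cache and group names are made up):

```java
import java.util.concurrent.TimeUnit;

import javax.cache.expiry.AccessedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Persistence enabled for the default data region.
IgniteConfiguration cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(new DataStorageConfiguration()
        .setDefaultDataRegionConfiguration(
            new DataRegionConfiguration().setPersistenceEnabled(true)));

// byte[] values, 1-second AccessedExpiryPolicy, all caches share one group.
CacheConfiguration<Integer, byte[]> cache = new CacheConfiguration<>("cache-1");
cache.setGroupName("sharedGroup");
cache.setExpiryPolicyFactory(AccessedExpiryPolicy.factoryOf(
    new Duration(TimeUnit.SECONDS, 1)));
```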

Example of the stack trace:
{code:java}
[2019-07-16 
21:13:19,288][ERROR][sys-stripe-2-#46%db.IgnitePdsWithTtlDeactivateOnHighloadTest1%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=CRITICAL_ERROR, err=class 
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-1237460590, 
val2=281586645860358]], msg=Runtime failure on search row: SearchRow 
[key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], hash=378, 
cacheId=-1806498247
class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-1237460590, 
val2=281586645860358]], msg=Runtime failure on search row: SearchRow 
[key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], hash=378, 
cacheId=-1806498247]]
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5910)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1859)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2410)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:445)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2570)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2030)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1848)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3235)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1558)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1186)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.run

Re: Apache Ignite 2.8 RELEASE [Time, Scope, Manager]

2019-12-27 Thread Anton Kalashnikov
Hello.

Ivan is right that "baseline auto-adjust" is disabled by default if you start 
your node on
existing PDS. But "baseline auto-adjust" is enabled by default for an in-memory 
cluster, because in-memory nodes have also been bound to the baseline since version 2.8.

Also, I want to note that after this 
ticket (https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12227), 
"baseline auto-adjust" will be disabled by default for any persistent 
cluster (both an empty and an existing one), because the current logic is somewhat 
confusing and can lead to the problems described in the ticket.

-- 
Best regards,
Anton Kalashnikov


27.12.2019, 17:58, "Ivan Bessonov" :
> Hello,
>
> "baseline auto-adjust" is disabled by default if you start your node on
> existing PDS.
> It's enabled on new clusters only.
>
> Existing installations should not be affected by the update. Is that ok?
>
> пт, 27 дек. 2019 г. в 14:46, Maxim Muzafarov :
>
>>  Ilya,
>>
>>  +1 from my side.
>>
>>  On Fri, 27 Dec 2019 at 14:36, Ilya Kasnacheev 
>>  wrote:
>>  >
>>  > Hello!
>>  >
>>  > I have also noticed that we have baseline auto-adjust enabled by default
>>  in
>>  > 2.8 builds, and it breaks existing code in runtime:
>>  > https://issues.apache.org/jira/browse/IGNITE-12504
>>  >
>>  > I propose to turn auto-adjust off by default in 2.8 release. What do you
>>  > think?
>>  >
>>  > Regards,
>>  > --
>>  > Ilya Kasnacheev
>>  >
>>  >
>>  > пт, 27 дек. 2019 г. в 12:40, Sergei Ryzhov :
>>  >
>>  > > Hello!
>>  > > Task IGNITE-12470 is ready.
>>  > > https://issues.apache.org/jira/browse/IGNITE-12470
>>  > > Please check this API.
>>  > >
>>  > >
>>  > > Regards,
>>  > > Ryzhov Sergei.
>>  > >
>>  > > чт, 26 дек. 2019 г. в 18:50, Maxim Muzafarov :
>>  > >
>>  > > > Ilya,
>>  > > >
>>  > > >
>>  > > > I agree with you that there is no risk and spring-data-2.2 can be
>>  > > > safely cherry-picked to the ignite-2.8 branch. I'm OK with it. Will
>>  > > > you do such merge or I should do it by myself?
>>  > > >
>>  > > >
>>  > > > As for the second part of your email, you are proposing to bump up a
>>  > > > minor dependencies version (no API changes) for the whole components
>>  > > > mentioned in the parent/pom.xml file, right? From a point of the
>>  > > > release view, it seems not a good thing since a scope test of the
>>  > > > release becomes too wider. I don't think we will simplify thus the
>>  > > > year-long release test scope, so as for me, this sounds not good but
>>  > > > I'd like to hear thoughts of other community members on this point.
>>  > > >
>>  > > > As an alternative, for instance, we can bump minor versions only for
>>  > > > those components which have security vulnerabilities. To find such
>>  > > > dependencies, I've run some local test with a maven
>>  > > > dependency-check-maven [1] an open-source dependency check tool. Here
>>  > > > is a brief report (only a few modules tested):
>>  > > >
>>  > > > spring-core-4.3.18.RELEASE.jar : CVE-2018-15756 [2]
>>  > > > h2-1.4.197.jar : CVE-2018-10054, CVE-2018-14335 (discussed also [3])
>>  > > > ignite-shmem-1.0.0.jar : CVE-2017-14614
>>  > > >
>>  > > >
>>  > > > [1] https://jeremylong.github.io/DependencyCheck/index.html
>>  > > > [2] https://nvd.nist.gov/vuln/detail/CVE-2018-15756
>>  > > > [3] https://issues.apache.org/jira/browse/IGNITE-10801
>>  > > >
>>  > > >
>>  > > >
>>  > > > On Thu, 26 Dec 2019 at 15:52, Ilya Kasnacheev <
>>  ilya.kasnach...@gmail.com
>>  > > >
>>  > > > wrote:
>>  > > > >
>>  > > > > Hello!
>>  > > > >
>>  > > > > I propose to add the following ticket to the scope:
>>  > > > > https://issues.apache.org/jira/browse/IGNITE-12259 (3 commits, be
>>  > > > careful
>>  > > > > with release version)
>>  > > > >
>>  > > > > Adding tickets to scope surely seems crazy now, but I will provide
>>  the
>>  > > > > followin

[jira] [Created] (IGNITE-12463) Inconsistancy of checkpoint progress future with its state

2019-12-17 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12463:
--

 Summary: Inconsistancy of checkpoint progress future with its 
state 
 Key: IGNITE-12463
 URL: https://issues.apache.org/jira/browse/IGNITE-12463
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


The checkpoint futures (start, finish) need to be reorganized so that they 
match the checkpoint states.





[jira] [Created] (IGNITE-12460) Cluster fails to find the node by consistent ID

2019-12-17 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12460:
--

 Summary: Cluster fails to find the node by consistent ID
 Key: IGNITE-12460
 URL: https://issues.apache.org/jira/browse/IGNITE-12460
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Steps to reproduce 1:

* Start cluster of three nodes
* Navigate to Baseline screen
* Start one more node
* Include it into baseline
* Hit 'Save' btn

Expected:

* Success alert, node enters baseline

Actual:

* Exception is thrown and is displayed

Steps to reproduce 2:

# Start topology with 2 nodes.
# Activate cluster.
# Start third node.
# Stop second node.
# Try to add third node to baseline in Web console.

Also reproduced with *control.sh --baseline set* command.





[jira] [Created] (IGNITE-12459) Searching checkpoint record in WAL doesn't work with segment compaction

2019-12-17 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12459:
--

 Summary: Searching checkpoint record in WAL doesn't work with 
segment compaction
 Key: IGNITE-12459
 URL: https://issues.apache.org/jira/browse/IGNITE-12459
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


During iteration over the WAL we have two invariants about the result tuple 
(WALPointer, WALRecord):
* WALPointer is equal to WALRecord.position() when the segment is uncompacted
* WALPointer is not equal to WALRecord.position() when the segment is compacted
Unfortunately, the second invariant is broken in 
FileWriteAheadLogManager#read(WALPointer ptr).
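The pair of invariants can be stated as one predicate (a hypothetical helper for illustration, not Ignite code):

```java
// Hypothetical helper, not Ignite code: the expected relation between the
// WALPointer returned by the iterator and WALRecord.position(), depending
// on whether the segment was compacted.
public final class WalInvariant {
    private WalInvariant() {}

    public static boolean holds(boolean segmentCompacted, long pointerOffset, long recordOffset) {
        boolean equal = pointerOffset == recordOffset;
        // Uncompacted segment: positions must match.
        // Compacted segment: positions must differ (records were rewritten).
        return segmentCompacted ? !equal : equal;
    }
}
```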





Re: Apache Ignite 2.8 RELEASE [Time, Scope, Manager]

2019-09-24 Thread Anton Kalashnikov
Hello, Igniters.

I want to point out one more blocker for the release [1]. This bug can lead to an 
incorrect calculation of the default baseline auto-adjust enabled flag (more 
details in the ticket).

[1] https://issues.apache.org/jira/browse/IGNITE-12227

-- 
Best regards,
Anton Kalashnikov


24.09.2019, 17:01, "Andrey Gura" :
> Sergey,
>
> As I know, scope freeze is not announced yet.
>
> On Tue, Sep 24, 2019 at 4:41 PM Sergey Antonov
>  wrote:
>>  Hi, I would add to release scope my ticket [1].
>>
>>  Any objections?
>>
>>  [1] https://issues.apache.org/jira/browse/IGNITE-12225
>>
>>  вт, 24 сент. 2019 г. в 09:21, Nikolay Izhikov :
>>
>>  > > merge to master only fully finished features
>>  >
>>  > It's already true for Ignite master branch.
>>  >
>>  >
>>  > В Вт, 24/09/2019 в 09:03 +0300, Alexey Zinoviev пишет:
>>  > > The planned before 2_3 months release dates are good defender from
>>  > > partially merged features, In my opinion
>>  > >
>>  > > Or we should have Master and dev branch separetely, and merge to master
>>  > > only fully finished features
>>  > >
>>  > > пн, 23 сент. 2019 г., 20:27 Maxim Muzafarov :
>>  > >
>>  > > > Andrey,
>>  > > >
>>  > > > Agree with you. It can affect the user impression.
>>  > > >
>>  > > > Can you advise, how can we guarantee in our case when we complete with
>>  > > > current partially merged features that someone will not partially
>>  > > > merge the new one? Should we monitor the master branch commits for
>>  > > > such purpose?
>>  > > >
>>  > > > On Mon, 23 Sep 2019 at 20:18, Andrey Gura  wrote:
>>  > > > >
>>  > > > > Maxim,
>>  > > > >
>>  > > > > > > From my point, if some components will not be ready by
>>  > > > > > > previously discussed `scope freeze` date it is absolutely OK to
>>  > > > > > > perform the next (e.g. 2.8.1, 2.8.2) releases.
>>  > > > >
>>  > > > > It is good approach if partial implemented features aren't merged to
>>  > > > > master branch. Unfortunately this is not our case.
>>  > > > >
>>  > > > > I don't see any reasons to force new Apache Ignite release. Time is
>>  > > > > not driver for release. If we want release Ignite periodically we
>>  > must
>>  > > > > significantly review the process. And most valuable change in this
>>  > > > > process is feature branches that will not block new release by
>>  > design.
>>  > > > >
>>  > > > > On Mon, Sep 23, 2019 at 8:12 PM Andrey Gura 
>>  > wrote:
>>  > > > > >
>>  > > > > > > > From my point of view monitoring isn't ready for release.
>>  > > > > > > Can you clarify, what exactly is not ready?
>>  > > > > > > Can we track planned changes somehow?
>>  > > > > >
>>  > > > > > We have too many not resolved tickets under IEP-35 label [1]. Also
>>  > it
>>  > > > > > makes sense to do some usability testing: JMX beans interfaces,
>>  > system
>>  > > > > > views, etc.
>>  > > > > >
>>  > > > > >
>>  > > > > > [1]
>>  > https://issues.apache.org/jira/issues/?jql=labels%20%3D%20IEP-35
>>  > > > > >
>>  > > > > > On Mon, Sep 23, 2019 at 6:04 PM Nikolay Izhikov <
>>  > nizhi...@apache.org>
>>  > > >
>>  > > > wrote:
>>  > > > > > >
>>  > > > > > > Hello, Andrey.
>>  > > > > > >
>>  > > > > > > > From my point of view monitoring isn't ready for release.
>>  > > > > > >
>>  > > > > > > Can you clarify, what exactly is not ready?
>>  > > > > > > Can we track planned changes somehow?
>>  > > > > > >
>>  > > > > > >
>>  > > > > > > В Пн, 23/09/2019 в 17:59 +0300, Andrey Gura пишет:
>>  > > > > > > > Igniters,
>>  > > > > > > >
>>  > > > > > > > From my point of view monitoring isn't ready for release. So 
>> it

[jira] [Created] (IGNITE-12227) Default auto-adjust baseline enabled flag calculated incorrectly in some cases

2019-09-24 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12227:
--

 Summary: Default auto-adjust baseline enabled flag calculated 
incorrectly in some cases
 Key: IGNITE-12227
 URL: https://issues.apache.org/jira/browse/IGNITE-12227
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


baselineAutoAdjustEnabled can end up different on different nodes, because the 
default value is calculated locally on each node and takes only the local 
configuration into account. This issue can happen for the following reasons:
* If the IGNITE_BASELINE_AUTO_ADJUST_ENABLED flag is set to different values on 
different nodes, the cluster hangs because the baseline calculation finishes in 
an unpredictable state on each node.
* If the cluster is in mixed mode (both in-memory and persistent nodes), the 
flag is sometimes set to different values because the calculation doesn't take 
the configuration of remote nodes into account.

Possible solution (both points required):
* Get rid of IGNITE_BASELINE_AUTO_ADJUST_ENABLED and replace it with an explicit 
call of IgniteCluster#baselineAutoAdjustEnabled where it is required (tests only).
* Calculate the default value on the first started node as early as 
possible (instead of on activation), and always store this value in the 
distributed metastorage (unlike now). In other words, instead of waiting for 
activation, the default value would be calculated by the first started node.
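The difference between the two behaviours can be shown with a toy model (invented names, not Ignite code): each node computing the default from local state disagrees in a mixed cluster, while publishing the first node's value to a shared metastorage-like store keeps every node consistent.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Toy model, not Ignite code.
public class AutoAdjustDefaultDemo {
    /** Hypothetical local rule: in-memory (non-persistent) nodes default to 'enabled'. */
    static boolean localDefault(boolean persistentNode) {
        return !persistentNode;
    }

    public static void main(String[] args) {
        boolean[] nodesPersistent = {false, true};      // mixed cluster

        // Current behaviour: each node computes its own default -> disagreement.
        Set<Boolean> perNode = new HashSet<>();
        for (boolean p : nodesPersistent)
            perNode.add(localDefault(p));
        System.out.println("diverged: " + (perNode.size() > 1));   // diverged: true

        // Proposed behaviour: the first started node writes the value once;
        // every other node reads the stored value instead of recomputing it.
        AtomicReference<Boolean> metastorage = new AtomicReference<>();
        for (boolean p : nodesPersistent)
            metastorage.compareAndSet(null, localDefault(p));
        System.out.println("cluster-wide: " + metastorage.get());  // cluster-wide: true
    }
}
```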






[jira] [Created] (IGNITE-12179) Test and javadoc fixes

2019-09-17 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12179:
--

 Summary: Test and javadoc fixes
 Key: IGNITE-12179
 URL: https://issues.apache.org/jira/browse/IGNITE-12179
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Some javadoc package descriptions are missing:
* org.apache.ignite.spi.communication.tcp.internal
* org.apache.ignite.spi.discovery.zk
* org.apache.ignite.spi.discovery.zk.internal
* org.apache.ignite.ml.structures.partition
* org.gridgain.grid.persistentstore.snapshot.file.copy

Other fixes:
* Unclear CLEANUP_RESTARTING_CACHES command in the snapshot utility
* Unclear error when connecting to a secure cluster (SSL + Auth)
* Update a log message to avoid confusing the user

Test fixes:
* *.testTtlNoTx flaky failed on TC
* TcpCommunicationSpiFreezingClientTest failed
* TcpCommunicationSpiFaultyClientSslTest.testNotAcceptedConnection failed
* testCacheIdleVerifyPrintLostPartitions failed





[jira] [Created] (IGNITE-12154) Test testCheckpointFailBeforeMarkEntityWrite fail in compression suit

2019-09-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12154:
--

 Summary: Test testCheckpointFailBeforeMarkEntityWrite fail in 
compression suit
 Key: IGNITE-12154
 URL: https://issues.apache.org/jira/browse/IGNITE-12154
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


CheckpointFailBeforeWriteMarkTest.testCheckpointFailBeforeMarkEntityWrite


https://ci.ignite.apache.org/viewLog.html?buildId=4584051&buildTypeId=IgniteTests24Java8_DiskPageCompressions&tab=buildResultsDiv&branch_IgniteTests24Java8=%3Cdefault%3E





[jira] [Created] (IGNITE-12121) Double checkpoint triggering due to incorrect place of update current checkpoint

2019-08-29 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-12121:
--

 Summary: Double checkpoint triggering due to incorrect place of 
update current checkpoint
 Key: IGNITE-12121
 URL: https://issues.apache.org/jira/browse/IGNITE-12121
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Double checkpoint triggering due to updating the current checkpoint in the 
wrong place. This can lead to two checkpoints running back to back if the 
checkpoint trigger was 'too many dirty pages'.





Re: Control.sh usability & possible bugs

2019-08-26 Thread Anton Kalashnikov
Hello, Igniters.

+1 for Script help usability - issue 3

It would be great if we avoided repeating the common options for every command, 
e.g.:

control.sh [--host HOST_OR_IP] [--port PORT] [--user USER] [--password 
PASSWORD]  [--ping-interval PING_INTERVAL] [--ping-timeout PING_TIMEOUT] 
[] [--yes]

Allowable :
--activate - ...
--deactivate - ...
...

-- 
Best regards,
Anton Kalashnikov


26.08.2019, 15:00, "Dmitriy Pavlov" :
> Hi Igniters,
>
> During voting on 2.7.6-rc1, I saw how experienced Ignite contributor
> committer and PMC member were trying to activate cluster using control.sh
> command.
>
> We usually just call cluster().active(true), but end users have to use the
> script provided in the distribution.
>
> Related to control.sh there are 3 concerns:
>
> Issue 1: On Mac OS if there is an empty (unset) JAVA_HOME variable, script
> outputs noting and probably does not execute its comment.
>
> Petr Ivanov, Alexey Goncharuck, could you please double-check if it could
> be fixed?
>
> Issue 2: Control.sh was not able to connect to cluster (local). AFAIK
> multicast is still our defaults. so it should be possible to connect to
> cluster without any options.
>
> Ivan Rakov, could you please chime in? Is it a local issue or bug?
>
> Issue 3: Script help usability
>
> Example of output:
>
>  Activate cluster:
>
> control.sh [--host HOST_OR_IP] [--port PORT] [--user USER] [--password
> PASSWORD] [--ping-interval PING_INTERVAL] [--ping-timeout PING_TIMEOUT]
> --activate
>
>   Deactivate cluster:
>
> control.sh [--host HOST_OR_IP] [--port PORT] [--user USER] [--password
> PASSWORD] [--ping-interval PING_INTERVAL] [--ping-timeout PING_TIMEOUT]
> --deactivate [--yes]
>
>  ...
>
> Why do we repeat tons of parameters each time? Is it better for users to
> enlist options and commands separately?
>
>  control.sh [options] command
>
> and then enlist options
>
> [--host HOST_OR_IP]
>
> [--port PORT]
>
> [--user USER]
>
> [--password PASSWORD]
>
> [--ping-interval PING_INTERVAL]
>
> [--ping-timeout PING_TIMEOUT]
>
> and describe several commands we have?
>
> In coding WET is not the best solution. So maybe we could DRY in our help,
> should we?
>
> Artem Boudnikov, could you evaluate this idea?
>
> Sincerely,
> Dmitriy Pavlov


Re: Replacing default work dir from tmp to current dir

2019-08-26 Thread Anton Kalashnikov
Hello, Igniters.

A lot of variants have already been proposed, so let's vote for one of them. I 
made a list of the possible paths mentioned earlier. I also included variants 
outside of the home directory ('user.dir') in this list, but I want to note 
that we have already discussed this and decided to choose a path in the home 
directory rather than outside of it. If you have any other variants, feel free 
to add them.

1) ~/.ignite/work
2) ~/ignite/work
3) ~/.config/ignite/work

4) /var/lib/ignite
5) /usr/local/ignite
6) /var/ignite
7) /opt/ignite/

+1 for '~/.ignite/work'

-- 
Best regards,
Anton Kalashnikov


26.08.2019, 12:39, "Nikolay Izhikov" :
> Ilya,
>
>>  In development environment one can just run Java from /var/lib/ignite
>
> Actually, I don't understand you.
> Are you talking about development of some application that uses Ignite or 
> contribution to Ignite code base?
>
> If we are talking about some application that uses Ignite then we should 
> decide, which scenario is primary.
> (One more time, we are talking about PDS enabled caches):
>
> 1. Ignite server node started as separate java process.
> 2. Ignite server node embedded in application as a library.
>
> I think, for PDS-enabled caches the first case is primary.
> In that case, the user should install Ignite via some package (deb, rpm, docker, 
> etc).
> This package should do all the required configuration.
> Including directory permissions.
>
> This should be done like other DBMS do.
>
> If we are talking about embedded Ignite then we can ask the user to provide 
> sufficient permission for default dir or change dir to some other.
>
> So, I still think we should use /var/lib/ignite for PDS data.
>
> How it sounds?
>
> В Пн, 26/08/2019 в 12:23 +0300, Ilya Kasnacheev пишет:
>>  Hello!
>>
>>  In development environment one can just run Java from /var/lib/ignite
>>  (makes total sense) and will immediately get almost correct behavior (well,
>>  data will be stored to /var/lib/ignite/ignite/work)
>>
>>  However, I still think that we should write to user.dir/ignite and not just
>>  user.dir since current directory is often crowded.
>>
>>  Fellows, anyone who is against using user.dir? Please share your concerns.
>>
>>  Regards,



Re: [VOTE] Release Apache Ignite 2.7.6-rc1

2019-08-23 Thread Anton Kalashnikov
+1
Downloaded sources, successfully assembled and started.


-- 
Best regards,
Anton Kalashnikov


23.08.2019, 19:31, "Ivan Rakov" :
> +1
> Downloaded binaries, successfully assembled cluster.
>
> Best Regards,
> Ivan Rakov
>
> On 23.08.2019 19:07, Dmitriy Pavlov wrote:
>>  +1
>>
>>  Checked: build from sources, startup node on Windows, simple topology,
>>  version and copyright year output,
>>  2.7.6-rc0 is used in the Apache Ignite Teamcity Bot since Sunday, Aug 18
>>  2.7.6-rc1 (ver. 2.7.6#20190821-sha1:6b3acf40) installed as DB for the TC
>>  Bot just now and the bot works well.
>>
>>  пт, 23 авг. 2019 г. в 18:58, Alexey Kuznetsov :
>>
>>>  +1
>>>  Compiled from sources on Windows, started ignite.bat.
>>>
>>>  On Fri, Aug 23, 2019 at 10:52 PM Pavel Tupitsyn 
>>>  wrote:
>>>
>>>>  +1, checked .NET node start and examples
>>>>
>>>>  On Fri, Aug 23, 2019 at 6:49 PM Alexei Scherbakov <
>>>>  alexey.scherbak...@gmail.com> wrote:
>>>>
>>>>>  +1
>>>>>
>>>>>  пт, 23 авг. 2019 г. в 18:33, Alexey Goncharuk <
>>>>  alexey.goncha...@gmail.com
>>>>>>  :
>>>>>>  +1
>>>>>>  Checked the source compilation and release package build, node start,
>>>>>  and a
>>>>>>  few examples. Left a comment on the failed TC task in the discussion
>>>>>>  thread.
>>>>>>
>>>>>>  пт, 23 авг. 2019 г. в 18:15, Andrey Gura :
>>>>>>
>>>>>>>  +1
>>>>>>>
>>>>>>>  On Fri, Aug 23, 2019 at 3:32 PM Anton Vinogradov 
>>>>>  wrote:
>>>>>>>>  -1 (binding)
>>>>>>>>  Explained at discussion thread.
>>>>>>>>
>>>>>>>>  On Fri, Aug 23, 2019 at 11:17 AM Anton Vinogradov >>>>>  wrote:
>>>>>>>>>  Dmitriy,
>>>>>>>>>
>>>>>>>>>  Did you check RC using automated TeamCity task?
>>>>>>>>>
>>>>>>>>>  On Fri, Aug 23, 2019 at 11:09 AM Zhenya Stanilovsky
>>>>>>>>>   wrote:
>>>>>>>>>
>>>>>>>>>>  Build from sources, run yardstick test.
>>>>>>>>>>  +1
>>>>>>>>>>>  --- Forwarded message ---
>>>>>>>>>>>  From: "Dmitriy Pavlov" < dpav...@apache.org >
>>>>>>>>>>>  To: dev < dev@ignite.apache.org >
>>>>>>>>>>>  Cc:
>>>>>>>>>>>  Subject: [VOTE] Release Apache Ignite 2.7.6-rc1
>>>>>>>>>>>  Date: Thu, 22 Aug 2019 20:11:58 +0300
>>>>>>>>>>>
>>>>>>>>>>>  Dear Community,
>>>>>>>>>>>
>>>>>>>>>>>  I have uploaded release candidate to
>>>>>>>>>>>  https://dist.apache.org/repos/dist/dev/ignite/2.7.6-rc1/
>>>>>  https://dist.apache.org/repos/dist/dev/ignite/packages_2.7.6-rc1/
>>>>>>>>>>>  The following staging can be used for any dependent project
>>>  for
>>>>>>>  testing:
>>>  https://repository.apache.org/content/repositories/orgapacheignite-1466/
>>>>>>>>>>>  This is the second maintenance release for 2.7.x with a
>>>  number
>>>>  of
>>>>>>>  fixes.
>>>>>>>>>>>  Tag name is 2.7.6-rc1:
>>>  
>>> https://gitbox.apache.org/repos/asf?p=ignite.git;a=tag;h=refs/tags/2.7.6-rc1
>>>>>>>>>>>  2.7.6 changes:
>>>>>>>>>>> * Ignite work directory is now set to the current user's
>>>>  home
>>>>>>>>>>  directory
>>>>>>>>>>>  by
>>>>>>>>>>>  default, native persistence files will not be stored in the
>>>>  Temp
>>>>>>>>>>  directory
>>>>>>>>>>>  anymore
>>>>>>>>>>> * Fixed a bug that caused a SELECT query with an equality
>>>>>>>  predicate
>>>>>>>>>>  on a
>>>>>>>>>>>  p

Replacing default work dir from tmp to current dir

2019-08-12 Thread Anton Kalashnikov
Hello, Igniters.

Currently, when the work directory isn't set by the user, Ignite can resolve it 
to the tmp directory, which leads to a problem: the tmp directory can be cleared 
at an unexpected moment by the operating system, and various kinds of critical 
data would be lost (e.g. binary_meta, persistence data).

This doesn't look like the expected behaviour. Maybe it would be better to use 
the current working directory ("user.dir") instead of the tmp directory? Or is 
there any other idea?
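The proposed fallback can be sketched as a small helper (hypothetical, not the actual Ignite resolution code): prefer an explicit setting, otherwise resolve under "user.dir" rather than "java.io.tmpdir".

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not the actual Ignite code: the current working
// directory is never wiped by the OS, unlike java.io.tmpdir, so persistence
// files stay safe when no work directory is configured.
public final class WorkDirs {
    private WorkDirs() {}

    /** Resolves the Ignite work directory, preferring an explicit setting. */
    public static Path resolveWorkDir(String configured) {
        if (configured != null && !configured.trim().isEmpty())
            return Paths.get(configured);
        return Paths.get(System.getProperty("user.dir"), "ignite", "work");
    }
}
```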

A little more details you can find in the ticket - 
https://issues.apache.org/jira/browse/IGNITE-12057
-- 
Best regards,
Anton Kalashnikov



[jira] [Created] (IGNITE-11982) Fix bugs of pds

2019-07-15 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11982:
--

 Summary: Fix bugs of pds
 Key: IGNITE-11982
 URL: https://issues.apache.org/jira/browse/IGNITE-11982
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Fixed PDS crashes:
* Fail during logical recovery
* JVM crash in all compatibility LFS tests
* WAL segments serialization problem
* Unable to read last WAL record after crash during checkpoint
* Node failed on detecting storage block size if page compression enabled on 
many caches
* Can not change baseline for in-memory cluster
* SqlFieldsQuery DELETE FROM causes JVM crash
* Fixed IgniteCheckedException: Compound exception for CountDownFuture.

Fixed tests:
* WalCompactionAndPageCompressionTest
* IgnitePdsRestartAfterFailedToWriteMetaPageTest.test
* GridPointInTimeRecoveryRebalanceTest.testRecoveryNotFailsIfWalSomewhereEnab
* IgniteClusterActivateDeactivateTest.testDeactivateSimple_5_Servers_5_Clients_Fro
* IgniteCacheReplicatedQuerySelfTest.testNodeLeft
* .NET tests

Optimizations:
* Replace TcpDiscoveryNode with nodeId in TcpDiscoveryMessages
* Failures to deserialize discovery data should be handled by a failure handler
* Optimize GridToStringBuilder



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-11969) Incorrect DefaultConcurrencyLevel value in .net test

2019-07-09 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11969:
--

 Summary: Incorrect DefaultConcurrencyLevel value in .net test
 Key: IGNITE-11969
 URL: https://issues.apache.org/jira/browse/IGNITE-11969
 Project: Ignite
  Issue Type: Test
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Incorrect DefaultConcurrencyLevel value in a .NET test after the default 
configuration in Java was changed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11892) Incorrect assert in wal scanner test

2019-06-04 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11892:
--

 Summary: Incorrect assert in WAL scanner test
 Key: IGNITE-11892
 URL: https://issues.apache.org/jira/browse/IGNITE-11892
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


https://ci.ignite.apache.org/viewLog.html?buildId=4038516&buildTypeId=IgniteTests24Java8_Pds2

{noformat}
junit.framework.AssertionFailedError: Next WAL record :: Record : PAGE_RECORD - 
Unable to convert to string representation.
at 
org.apache.ignite.internal.processors.cache.persistence.wal.scanner.WalScannerTest.shouldDumpToFileFoundRecord(WalScannerTest.java:254)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11818) Support JMX/control.sh for debug page info

2019-04-26 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11818:
--

 Summary: Support JMX/control.sh for debug page info
 Key: IGNITE-11818
 URL: https://issues.apache.org/jira/browse/IGNITE-11818
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Support JMX/control.sh for debug page info



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11816) Debug processor for dump page history info

2019-04-26 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11816:
--

 Summary: Debug processor for dump page history info
 Key: IGNITE-11816
 URL: https://issues.apache.org/jira/browse/IGNITE-11816
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Debug processor for dump page history info



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11782) WAL iterator for collect per-pageId info

2019-04-18 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11782:
--

 Summary: WAL iterator for collect per-pageId info
 Key: IGNITE-11782
 URL: https://issues.apache.org/jira/browse/IGNITE-11782
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Implement a WAL iterator to collect per-pageId info (page is root)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11678) Forbidding joining persistence node to in-memory cluster

2019-04-03 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11678:
--

 Summary: Forbidding joining persistence node to in-memory cluster
 Key: IGNITE-11678
 URL: https://issues.apache.org/jira/browse/IGNITE-11678
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Forbid a persistent node from joining an in-memory cluster when baseline 
auto-adjust is enabled and the timeout is equal to 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11650) Communication worker doesn't kick client node after expired idleConnTimeout

2019-03-28 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11650:
--

 Summary: Communication worker doesn't kick client node after 
expired idleConnTimeout
 Key: IGNITE-11650
 URL: https://issues.apache.org/jira/browse/IGNITE-11650
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Reproduced by TcpCommunicationSpiFreezingClientTest.testFreezingClient
{noformat}
java.lang.AssertionError: Client node must be kicked from topology
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.ignite.testframework.junits.JUnitAssertAware.fail(JUnitAssertAware.java:49)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpiFreezingClientTest.testFreezingClient(TcpCommunicationSpiFreezingClientTest.java:122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2102)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11627) Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop always fails in DiskCompression suite

2019-03-26 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11627:
--

 Summary: Test 
CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop always fails 
in DiskCompression suite
 Key: IGNITE-11627
 URL: https://issues.apache.org/jira/browse/IGNITE-11627
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5828425958400232265&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11605) Incorrect check condition in BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage

2019-03-22 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11605:
--

 Summary: Incorrect check condition in 
BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage
 Key: IGNITE-11605
 URL: https://issues.apache.org/jira/browse/IGNITE-11605
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage is flaky.
{noformat}
java.lang.AssertionError: 
Expected :1
Actual :2


at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:94)
at 
org.apache.ignite.internal.processors.cache.BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage(BinaryTypeRegistrationTest.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2102)
at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11590) NPE during onKernalStop in mvcc processor

2019-03-21 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11590:
--

 Summary: NPE during onKernalStop in mvcc processor 
 Key: IGNITE-11590
 URL: https://issues.apache.org/jira/browse/IGNITE-11590
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


IgniteProjectionStartStopRestartSelfTest#testStopNodesByIds
{noformat}
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
at 
java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
at 
org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorFailed(MvccProcessorImpl.java:527)
at 
org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onKernalStop(MvccProcessorImpl.java:459)
at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2335)
at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2283)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2570)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2533)
at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:330)
at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:297)
at org.apache.ignite.Ignition.stop(Ignition.java:200)
at 
org.apache.ignite.internal.IgniteProjectionStartStopRestartSelfTest.afterTest(IgniteProjectionStartStopRestartSelfTest.java:190)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.tearDown(GridAbstractTest.java:1804)
at 
org.apache.ignite.testframework.junits.JUnit3TestLegacySupport.runTestCase(JUnit3TestLegacySupport.java:70)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$2.evaluate(GridAbstractTest.java:185)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.evaluateInsideFixture(GridAbstractTest.java:2579)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$500(GridAbstractTest.java:152)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$BeforeFirstAndAfterLastTestRule$1.evaluate(GridAbstractTest.java:2559)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11569) Enable baseline auto-adjust by default only for empty cluster

2019-03-19 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11569:
--

 Summary: Enable baseline auto-adjust by default only for empty 
cluster
 Key: IGNITE-11569
 URL: https://issues.apache.org/jira/browse/IGNITE-11569
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


It is required to enable baseline auto-adjust by default only for an empty cluster



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11545) Logging baseline auto-adjust

2019-03-14 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11545:
--

 Summary: Logging baseline auto-adjust
 Key: IGNITE-11545
 URL: https://issues.apache.org/jira/browse/IGNITE-11545
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


We need to add some extra logging to the baseline auto-adjust process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11391) Test on free list freezes sometimes

2019-02-22 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11391:
--

 Summary: Test on free list freezes sometimes
 Key: IGNITE-11391
 URL: https://issues.apache.org/jira/browse/IGNITE-11391
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


CheckpointFreeListTest#testRestoreFreeListCorrectlyAfterRandomStop freezes sometimes.
CheckpointFreeListTest.testFreeListRestoredCorrectly is flaky.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11382) Stop managers from all caches before caches stop

2019-02-21 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11382:
--

 Summary: Stop managers from all caches before caches stop
 Key: IGNITE-11382
 URL: https://issues.apache.org/jira/browse/IGNITE-11382
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


It is required to stop all cache managers before stopping these caches



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11377) Display time to baseline auto-adjust event in console.sh

2019-02-20 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11377:
--

 Summary: Display time to baseline auto-adjust event in console.sh
 Key: IGNITE-11377
 URL: https://issues.apache.org/jira/browse/IGNITE-11377
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


It is required to add information about the next auto-adjust event.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11297) Improving read of hot variables in WAL

2019-02-12 Thread Anton Kalashnikov (JIRA)
Anton Kalashnikov created IGNITE-11297:
--

 Summary: Improving read of hot variables in WAL
 Key: IGNITE-11297
 URL: https://issues.apache.org/jira/browse/IGNITE-11297
 Project: Ignite
  Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


It looks like it is not necessary to mark some variables as volatile in 
FileWriteAheadLogManager, because they are initialized only once at start but 
are read many times.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Baseline auto-adjust`s discuss

2019-01-29 Thread Anton Kalashnikov
Ivan, I'm glad you are interested in this feature. Some answers are below.

Yes, it is correct about the properties: they are consistent across the cluster 
because they are based on the distributed metastore (we have another topic to 
discuss it).

Some implementation details: when an event happens, we add a task to 
GridTimeoutProcessor (an old mechanism for executing tasks with a delay). The 
task is added only on the coordinator. It is not required on non-coordinator 
nodes, because if the coordinator fails, a new event will appear and we will 
generate a new task on the new coordinator.
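The coordinator-side scheduling described above is essentially a debounce: each topology event (re)schedules the baseline-adjust task, and a newer event cancels the pending one. A minimal sketch of that idea, with hypothetical names (this is not the GridTimeoutProcessor API):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class BaselineDebouncer {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    private ScheduledFuture<?> pending;

    /** Called on every node join/left event; only the coordinator schedules. */
    synchronized void onTopologyEvent(Runnable adjustBaseline, long timeoutMs) {
        if (pending != null)
            pending.cancel(false); // a newer event supersedes the queued task

        pending = timer.schedule(adjustBaseline, timeoutMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

With this shape, two events arriving within the timeout produce a single baseline adjustment, which matches the "remove the old task, add a new one" behaviour from the main scenario.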

-- 
Best regards,
Anton Kalashnikov


28.01.2019, 13:17, "Павлухин Иван" :
> Anton,
>
> Great feature!
>
> Could you please clarify a bit about implementation details? As I
> understood auto-adjust properties is meant to be consistent across the
> cluster. And baseline adjustment is put into some delay queue. Do we
> put event into a queue on each node? Or is there some dedicated node
> driving baseline adjustment?
>
> пт, 25 янв. 2019 г. в 16:31, Anton Kalashnikov :
>>  Initially, hard timeout should protect grid from constantly changing 
>> topology(constantly blinking node). But in fact if we have constantly 
>> changing topology, baseline adjust operation is failed in most cases. As 
>> result hard timeout only added complexity but it don't give any new 
>> guarantee. So I think we can skip it in first implementation.
>>
>>  First of all timeout protect us from unnecessary adjust of baseline . If 
>> node left the grid and immediately(or after some time less than us timeout) 
>> it join back to grid. Also timeout is helpful in other cases when some 
>> events happened one after another.
>>
>>  This feature doesn't have any complex heuristic to react, except of 
>> described in restrictions section.
>>
>>  Also I want to notes that this feature isn't protect us from constantly 
>> blinking node. We need one more heuristic mechanism for detect this 
>> situation and doing some actions like removing this node from grid.
>>
>>  --
>>  Best regards,
>>  Anton Kalashnikov
>>
>>  25.01.2019, 15:43, "Sergey Chugunov" :
>>  > Anton,
>>  >
>>  > As I understand from the IEP document policy was supposed to support two
>>  > timeouts: soft and hard, so here you're proposing a bit simpler
>>  > functionality.
>>  >
>>  > Just to clarify, do I understand correctly that this feature when enabled
>>  > will auto-adjust blt on each node join/node left event, and timeout is
>>  > necessary to protect us from blinking nodes?
>>  > So no complexities with taking into account number of alive backups or
>>  > something like that?
>>  >
>>  > On Fri, Jan 25, 2019 at 1:11 PM Vladimir Ozerov 
>>  > wrote:
>>  >
>>  >> Got it, makes sense.
>>  >>
>>  >> On Fri, Jan 25, 2019 at 11:06 AM Anton Kalashnikov 
>>  >> wrote:
>>  >>
>>  >> > Vladimir, thanks for your notes, both of them looks good enough but I
>>  >> > have two different thoughts about it.
>>  >> >
>>  >> > I think I agree about enabling only one of manual/auto adjustment. It 
>> is
>>  >> > easier than current solution and in fact as extra feature we can allow
>>  >> > user to force task to execute(if they doesn't want to wait until 
>> timeout
>>  >> > expired).
>>  >> > But about second one I don't sure that one parameters instead of two
>>  >> would
>>  >> > be more convenient. For example: in case when user changed timeout and
>>  >> then
>>  >> > disable auto-adjust after then when someone will want to enable it they
>>  >> > should know what value of timeout was before auto-adjust was disabled. 
>> I
>>  >> > think "negative value" pattern good choice for always usable parameters
>>  >> > like timeout of connection (ex. -1 equal to endless waiting) and so on,
>>  >> but
>>  >> > in our case we want to disable whole functionality rather than change
>>  >> > parameter value.
>>  >> >
>>  >> > --
>>  >> > Best regards,
>>  >> > Anton Kalashnikov
>>  >> >
>>  >> >
>>  >> > 24.01.2019, 22:03, "Vladimir Ozerov" :
>>  >> > > Hi Anton,
>>  >> > >
>>  >> > > This is great feature, but I am a bit confused about automatic
>>  >> disabling

Re: Baseline auto-adjust`s discuss

2019-01-25 Thread Anton Kalashnikov
Initially, the hard timeout was supposed to protect the grid from a constantly 
changing topology (a constantly blinking node). But in fact, if we have a 
constantly changing topology, the baseline adjust operation fails in most cases. 
As a result, the hard timeout only adds complexity but doesn't give any new 
guarantee. So I think we can skip it in the first implementation.

First of all, the timeout protects us from an unnecessary baseline adjustment, 
e.g. if a node left the grid and immediately (or after some time less than the 
timeout) joined back. The timeout is also helpful in other cases when several 
events happen one after another.

This feature doesn't have any complex heuristics to react to, except those 
described in the restrictions section.

I also want to note that this feature doesn't protect us from a constantly 
blinking node. We need one more heuristic mechanism to detect that situation 
and take some action, like removing the node from the grid.

-- 
Best regards,
Anton Kalashnikov


25.01.2019, 15:43, "Sergey Chugunov" :
> Anton,
>
> As I understand from the IEP document policy was supposed to support two
> timeouts: soft and hard, so here you're proposing a bit simpler
> functionality.
>
> Just to clarify, do I understand correctly that this feature when enabled
> will auto-adjust blt on each node join/node left event, and timeout is
> necessary to protect us from blinking nodes?
> So no complexities with taking into account number of alive backups or
> something like that?
>
> On Fri, Jan 25, 2019 at 1:11 PM Vladimir Ozerov 
> wrote:
>
>>  Got it, makes sense.
>>
>>  On Fri, Jan 25, 2019 at 11:06 AM Anton Kalashnikov 
>>  wrote:
>>
>>  > Vladimir, thanks for your notes, both of them looks good enough but I
>>  > have two different thoughts about it.
>>  >
>>  > I think I agree about enabling only one of manual/auto adjustment. It is
>>  > easier than current solution and in fact as extra feature we can allow
>>  > user to force task to execute(if they doesn't want to wait until timeout
>>  > expired).
>>  > But about second one I don't sure that one parameters instead of two
>>  would
>>  > be more convenient. For example: in case when user changed timeout and
>>  then
>>  > disable auto-adjust after then when someone will want to enable it they
>>  > should know what value of timeout was before auto-adjust was disabled. I
>>  > think "negative value" pattern good choice for always usable parameters
>>  > like timeout of connection (ex. -1 equal to endless waiting) and so on,
>>  but
>>  > in our case we want to disable whole functionality rather than change
>>  > parameter value.
>>  >
>>  > --
>>  > Best regards,
>>  > Anton Kalashnikov
>>  >
>>  >
>>  > 24.01.2019, 22:03, "Vladimir Ozerov" :
>>  > > Hi Anton,
>>  > >
>>  > > This is great feature, but I am a bit confused about automatic
>>  disabling
>>  > of
>>  > > a feature during manual baseline adjustment. This may lead to
>>  unpleasant
>>  > > situations when a user enabled auto-adjustment, then re-adjusted it
>>  > > manually somehow (e.g. from some previously created script) so that
>>  > > auto-adjustment disabling went unnoticed, then added more nodes hoping
>>  > that
>>  > > auto-baseline is still active, etc.
>>  > >
>>  > > Instead, I would rather make manual and auto adjustment mutually
>>  > exclusive
>>  > > - baseline cannot be adjusted manually when auto mode is set, and vice
>>  > > versa. If exception is thrown in that cases, administrators will always
>>  > > know current behavior of the system.
>>  > >
>>  > > As far as configuration, wouldn’t it be enough to have a single long
>>  > value
>>  > > as opposed to Boolean + long? Say, 0 - immediate auto adjustment,
>>  > negative
>>  > > - disabled, positive - auto adjustment after timeout.
>>  > >
>>  > > Thoughts?
>>  > >
>>  > > чт, 24 янв. 2019 г. в 18:33, Anton Kalashnikov :
>>  > >
>>  > >> Hello, Igniters!
>>  > >>
>>  > >> Work on the Phase II of IEP-4 (Baseline topology) [1] has started. I
>>  > want
>>  > >> to start to discuss of implementation of "Baseline auto-adjust" [2].
>>  > >>
>>  > >> "Baseline auto-adjust" feature implements mechanism of auto-adjust
>>  > >> baseline corresponding to current topology aft

Re: Baseline auto-adjust`s discuss

2019-01-25 Thread Anton Kalashnikov
Vladimir, thanks for your notes. Both of them look good enough, but I have two 
different thoughts about them.

I think I agree about enabling only one of manual/auto adjustment. It is easier 
than the current solution, and in fact, as an extra feature, we can allow the 
user to force the task to execute (if they don't want to wait until the timeout 
expires).

But about the second one, I am not sure that one parameter instead of two would 
be more convenient. For example, if the user changed the timeout and then 
disabled auto-adjust, whoever wants to enable it later would have to know what 
the timeout value was before auto-adjust was disabled. I think the "negative 
value" pattern is a good choice for always-usable parameters like a connection 
timeout (e.g. -1 equals endless waiting) and so on, but in our case we want to 
disable the whole functionality rather than change a parameter value.
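The trade-off above can be made concrete with a small sketch. Names are hypothetical and this is not Ignite's actual configuration API; it only contrasts the two encodings being discussed:

```java
public class AutoAdjustConfig {
    // Variant A (single long, as proposed): negative = disabled,
    // 0 = immediate adjustment, positive = adjust after the timeout.
    // Disabling overwrites the value, so the previous timeout is lost.
    static long disableSingleValue(long currentTimeoutMs) {
        return -1; // currentTimeoutMs is discarded
    }

    // Variant B (flag + timeout, as argued above): disabling keeps the
    // timeout, so re-enabling restores the previous behaviour.
    boolean enabled = true;
    long timeoutMs = 30_000;

    void disable() {
        enabled = false; // timeoutMs survives for a later re-enable
    }
}
```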

-- 
Best regards,
Anton Kalashnikov


24.01.2019, 22:03, "Vladimir Ozerov" :
> Hi Anton,
>
> This is great feature, but I am a bit confused about automatic disabling of
> a feature during manual baseline adjustment. This may lead to unpleasant
> situations when a user enabled auto-adjustment, then re-adjusted it
> manually somehow (e.g. from some previously created script) so that
> auto-adjustment disabling went unnoticed, then added more nodes hoping that
> auto-baseline is still active, etc.
>
> Instead, I would rather make manual and auto adjustment mutually exclusive
> - baseline cannot be adjusted manually when auto mode is set, and vice
> versa. If exception is thrown in that cases, administrators will always
> know current behavior of the system.
>
> As far as configuration, wouldn’t it be enough to have a single long value
> as opposed to Boolean + long? Say, 0 - immediate auto adjustment, negative
> - disabled, positive - auto adjustment after timeout.
>
> Thoughts?
>
> чт, 24 янв. 2019 г. в 18:33, Anton Kalashnikov :
>
>>  Hello, Igniters!
>>
>>  Work on the Phase II of IEP-4 (Baseline topology) [1] has started. I want
>>  to start to discuss of implementation of "Baseline auto-adjust" [2].
>>
>>  "Baseline auto-adjust" feature implements mechanism of auto-adjust
>>  baseline corresponding to current topology after event join/left was
>>  appeared. It is required because when a node left the grid and nobody would
>>  change baseline manually it can lead to lost data(when some more nodes left
>>  the grid on depends in backup factor) but permanent tracking of grid is not
>>  always possible/desirible. Looks like in many cases auto-adjust baseline
>>  after some timeout is very helpfull.
>>
>>  Distributed metastore[3](it is already done):
>>
>>  First of all it is required the ability to store configuration data
>>  consistently and cluster-wide. Ignite doesn't have any specific API for
>>  such configurations and we don't want to have many similar implementations
>>  of the same feature in our code. After some thoughts is was proposed to
>>  implement it as some kind of distributed metastorage that gives the ability
>>  to store any data in it.
>>  First implementation is based on existing local metastorage API for
>>  persistent clusters (in-memory clusters will store data in memory).
>>  Write/remove operation use Discovery SPI to send updates to the cluster, it
>>  guarantees updates order and the fact that all existing (alive) nodes have
>>  handled the update message. As a way to find out which node has the latest
>>  data there is a "version" value of distributed metastorage, which is
>>  basically . All updates history
>>  until some point in the past is stored along with the data, so when an
>>  outdated node connects to the cluster it will receive all the missing data
>>  and apply it locally. If there's not enough history stored or joining node
>>  is clear then it'll receive shapshot of distributed metastorage so there
>>  won't be inconsistencies.
>>
>>  Baseline auto-adjust:
>>
>>  Main scenario:
>>  - There is grid with the baseline is equal to the current topology
>>  - New node joins to grid or some node left(failed) the grid
>>  - New mechanism detects this event and it add task for changing
>>  baseline to queue with configured timeout
>>  - If new event are happened before baseline would be changed task
>>  would be removed from queue and new task will be added
>>  - When timeout are expired the task would try to set new baseline
>>  corresponded to current topology
>>
>>  First of all we need to add two parameters[4]:
>>  - baseline
