Re: IgniteInterruptedException on cache.get in a transaction inside runnable (ignite 2.6)

2019-01-15 Thread bintisepaha
The next time it happens, I will gather the logs.

Thanks,
Binti



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: IgniteInterruptedException on cache.get in a transaction inside runnable (ignite 2.6)

2019-01-15 Thread bintisepaha
Sorry, we are still on 2.3 :)
The timeout is 5 minutes on this transaction and it fails pretty quickly. At
the same time, many threads fail at the exact same operation across nodes.

Was there a similar bug in 2.3? We will be upgrading to 2.7 soon.
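For reference, the timeout is set roughly like this (a simplified sketch, not our
actual code; only the 5-minute value is real):

import java.util.concurrent.TimeUnit;
import org.apache.ignite.Ignite;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxTimeoutSketch {
    public static void runWithTimeout(Ignite ignite) {
        long timeoutMs = TimeUnit.MINUTES.toMillis(5); // the 5-minute timeout mentioned above
        try (Transaction tx = ignite.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC,
                TransactionIsolation.REPEATABLE_READ,
                timeoutMs,
                0 /* tx size hint, 0 = unknown */)) {
            // ... cache reads/writes ...
            tx.commit();
        }
    }
}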




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


IgniteInterruptedException on cache.get in a transaction inside runnable (ignite 2.6)

2019-01-14 Thread bintisepaha


Hi folks, we are getting this error in existing code on Ignite 2.6.0.
The cache.get is on a replicated/transactional cache that holds only a
single key/value pair. It has been used like this for a while in production.
The code is executed in a runnable and wrapped in a
pessimistic/repeatable_read transaction.

The line below throws the exception. Any idea what could be causing this?

Date positionStartDate = (Date) posStartDateCache.get("positionStartDate"); 
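For reference, here is a simplified sketch of the pattern in question (class and
cache names are illustrative, not our actual code):

import java.util.Date;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.transactions.Transaction;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class PositionStartDateRunnable implements IgniteRunnable {
    /** Injected on the server node that executes the job. */
    @IgniteInstanceResource
    private transient Ignite ignite;

    @Override public void run() {
        IgniteCache<String, Date> posStartDateCache = ignite.cache("posStartDateCache");
        try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
            // This get is where the IgniteInterruptedException surfaces when the
            // executing thread is interrupted while waiting on the lock future.
            Date positionStartDate = posStartDateCache.get("positionStartDate");
            // ... business logic ...
            tx.commit();
        }
    }
}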

[14 Jan 2019 14:55:49.690 EST] [pub-#12352%DataGridServer-Staging%] ERROR
11223 (TradeOrdersLoaderForMatching.java:69) Exception received while
loading tradeOrders for key: TraderTidSettlementKey [traderId=6671,
instrumentId=60083, settlement=null]
javax.cache.CacheException: class
org.apache.ignite.IgniteInterruptedException: Got interrupted while waiting
for future to complete.
at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1287)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1648)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:831)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:662)
~[ignite-core-2.3.0.jar:2.3.0]
at com.tudor.datagridI.utils.Util.getPositionStartDate(Util.java:27)
~[data-grid-server-ignite.jar:?]
at
com.tudor.datagridI.server.cachestore.springjdbc.TradeOrderTradeCacheLoader.loadingFromSingleTrimDb1(TradeOrderTradeCacheLoader.java:34)
~[data-grid-server-ignite.jar:?]
at
com.tudor.datagridI.server.matching.LoadTradeOrdersForMatchingLoader.loadCache(LoadTradeOrdersForMatchingLoader.java:46)
~[data-grid-server-ignite.jar:?]
at
com.tudor.datagridI.server.matching.LoadTradeOrdersForMatchingLoader.loadCache(LoadTradeOrdersForMatchingLoader.java:55)
~[data-grid-server-ignite.jar:?]
at
com.tudor.datagridI.server.matching.TradeOrdersLoaderForMatching.addTradeOrdersForTrader(TradeOrdersLoaderForMatching.java:79)
~[data-grid-server-ignite.jar:?]
at
com.tudor.datagridI.server.matching.TradeOrdersLoaderForMatching.loadTradeOrders(TradeOrdersLoaderForMatching.java:65)
~[data-grid-server-ignite.jar:?]
at
com.tudor.datagridI.server.matching.TradeOrdersLoaderForMatching.run(TradeOrdersLoaderForMatching.java:87)
~[data-grid-server-ignite.jar:?]
at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:566)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6631)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:560)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:489)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1181)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1913)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
~[ignite-core-2.3.0.jar:2.3.0]
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
~[ignite-core-2.3.0.jar:2.3.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[?:1.8.0_112]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.apache.ignite.IgniteInterruptedException: Got interrupted
while waiting for future to complete.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Thread got interrupted while trying to acquire table lock & Got interrupted while waiting for future to complete

2019-01-14 Thread bintisepaha
Was there any resolution to this?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite SQL Queries not getting all data back in ignite 2.4 and 2.6

2018-08-15 Thread bintisepaha
Thanks for getting back, but we do not use Ignite's native persistence.
Anything else changed from 2.3 to 2.4 to cause this around SQL Queries? 




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite SQL Queries not getting all data back in ignite 2.4 and 2.6

2018-08-15 Thread bintisepaha
Hi, we have used this query on Ignite 2.3 for a while now. But we had some
data streamer exceptions that seemed to have been resolved in 2.4, so we
decided to upgrade. However, on 2.6 (and also after downgrading to 2.4) we have
been seeing an issue where a SQL query does not return data that is in the
cache. When we go back to 2.3, it works as expected.

Here is the cache config for that cache (the XML was stripped by the mail
archive; only the index definitions are recoverable):

- SORTED index on (traderId, orderId)
- SORTED index on (traderId, insIid, clearAgent, strategy)
- SORTED index on (parentId)

and here is the query

public List getTradeOrdersForPSGroup(Integer traderId, Short psRuleId, Integer tid,
        String clearAgent, String strategy, Integer pvId, Date settlementDate, Date psTime) {
    logger.info(String.format("Getting TradeOrders from the cache for traderId: %s, tid: %s, "
            + "clearAgent: %s, strategy: %s, pvId: %s, settlement: %s, psTime: %s",
            traderId, tid, clearAgent, strategy, pvId, settlementDate,

Re: Storing Time Series data efficiently on Ignite

2018-07-26 Thread bintisepaha
Welly, would you mind sharing how this worked out for you? What was the
time-series size, and how was the performance?

Thanks,
Binti



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Storing Time Series data efficiently on Ignite

2018-04-06 Thread bintisepaha
Welly,

Hi, wondering how this turned out for you? We have a similar use case now.
Did you end up using Ignite for this?

Thanks,
Binti



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite 2.x upgrade guidelines

2018-01-24 Thread bintisepaha
Thanks Evgenii. We will let you know how it goes.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite 2.x upgrade guidelines

2018-01-22 Thread bintisepaha
Hi, we are upgrading Ignite 1.7.0 to 2.3.0 soon. On 1.7 we were on-heap and
used G1GC with 16 nodes, each with a 30 GB heap, although we never used more
than 40% of the heap on any node at a given time.

With 2.3.0 it would be all off-heap. Are there any guidelines to follow or
things to check performance-wise before we upgrade in prod?

If you have a document or an older post about this, that would also be helpful.
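For example, is something like the following the right starting point? (A sketch;
the sizes below are placeholders, not what we plan to use.)

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class OffHeapSizingSketch {
    public static void main(String[] args) {
        // Default off-heap data region: 2.3 keeps cache data off-heap in page memory.
        DataRegionConfiguration defaultRegion = new DataRegionConfiguration()
            .setName("default")
            .setInitialSize(4L * 1024 * 1024 * 1024)   // 4 GB initial
            .setMaxSize(30L * 1024 * 1024 * 1024);     // 30 GB max per node

        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(defaultRegion);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}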

Thanks,
Binti



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


POJO get from any ignite console? visor or rest?

2017-09-21 Thread bintisepaha
Hi, I see that the Ignite REST API does not yet have support for looking up
JSON/custom objects by key.
But is this something I can do via JMX, Visor, or a management console?

I would only like an easy way to see how the object looks in the cache.
The key is usually a composite key of 2 integers.

Let me know if there is a quick web-based way to access this.
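For now, the only way we know is programmatic, roughly like the sketch below (the
key class and cache name are made up to match the description; they are not our
actual classes):

import java.io.Serializable;
import java.util.Objects;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class CacheEntryLookup {
    /** Hypothetical composite key of two integers. */
    public static class OrderKey implements Serializable {
        private final int traderId;
        private final int orderId;

        public OrderKey(int traderId, int orderId) {
            this.traderId = traderId;
            this.orderId = orderId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof OrderKey))
                return false;
            OrderKey k = (OrderKey) o;
            return traderId == k.traderId && orderId == k.orderId;
        }

        @Override public int hashCode() {
            return Objects.hash(traderId, orderId);
        }
    }

    public static void main(String[] args) {
        Ignition.setClientMode(true);
        // "client-config.xml" is a placeholder for the client Spring config.
        try (Ignite ignite = Ignition.start("client-config.xml")) {
            Object value = ignite.cache("tradeOrderCache").get(new OrderKey(6671, 12382604));
            System.out.println(value); // relies on the value class having a readable toString()
        }
    }
}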

Thanks,
Binti



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: SQL query on client stalling the grid when server node dies

2017-05-25 Thread bintisepaha
Anton, thanks for the response. Will reproduce in dev again and get you the
logs and thread dumps next week.

A question around rebalanceDelay and the number of rebalance threads: what are
the optimal settings for these?

On the client nodes, is it OK to call an IgniteCallable from within a txn?
Should I use timeouts for the callable?
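For reference, this is roughly where we understand those settings live (a sketch
with placeholder values, not our actual config):

import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RebalanceAndTimeoutSketch {
    public static IgniteConfiguration configure() {
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("tradeOrderCache");
        cacheCfg.setRebalanceDelay(10_000L); // wait 10s after a topology change before rebalancing

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setRebalanceThreadPoolSize(4);   // rebalance threads per node
        cfg.setCacheConfiguration(cacheCfg);
        return cfg;
    }

    public static void callWithTimeout(Ignite ignite) {
        // Fail the closure if it does not complete within 30 seconds.
        Integer result = ignite.compute().withTimeout(30_000L).call(() -> 42);
        System.out.println(result);
    }
}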

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/SQL-query-on-client-stalling-the-grid-when-server-node-dies-tp13107p13157.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


SQL query on client stalling the grid when server node dies

2017-05-23 Thread bintisepaha
Hi Igniters,

We have been testing with Ignite 1.9.0 and have a client that runs a simple
(no-join) SQL query on a single distributed cache. But if we kill a server node
for testing while the client is running this query, it stalls the whole
cluster.

All we have to do for the grid to resume functioning is restart the client.
This may have something to do with data rebalancing when a server node dies.
Would setting a rebalanceDelay help? We are using the default of 0 now.

How does a client affect the whole cluster like this, and why does restarting
it fix the stall? The server nodes' exchange worker threads are stuck on the
partition exchange.

Client thread stuck below (thread dump)

Name: main
State: TIMED_WAITING
Total blocked: 40  Total waited: 102,828

Stack trace: 
java.lang.Thread.sleep(Native Method)
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:494)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.iterator(IgniteH2Indexing.java:1315)
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1355)
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
com.tudor.server.grid.matching.GridMatcher.getTradeOrdersForPSGroup(GridMatcher.java:322)
com.tudor.server.grid.matching.MatcherDelegate.unmatchRematch(MatcherDelegate.java:101)
com.tudor.server.grid.matching.GridMatcher.processPendingOrder(GridMatcher.java:275)
com.tudor.server.grid.matching.GridMatcher.run(GridMatcher.java:201)
com.tudor.server.grid.matching.GridMatcher.main(GridMatcher.java:99)


server node exchange worker thread dump


"exchange-worker-#34%DataGridServer-Development%" Id=68 in TIMED_WAITING on
lock=org.apache.ignite.internal.util.future.GridCompoundFuture@7e9c149b
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
  at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
  at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
  at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
  at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:779)
  at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:732)
  at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:489)
  at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1674)
  at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
  at java.lang.Thread.run(Thread.java:745)

Any help is appreciated.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/SQL-query-on-client-stalling-the-grid-when-server-node-dies-tp13107.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-05-10 Thread bintisepaha
Hey guys, we had a key lock issue again on 1.7.0. Here is a suspicious thread
dump. Is this helpful for tracking down our issue further?
We did not see any topology changes or any other exceptions.

Attaching the entire thread dump too: tdump.zip
  

"pub-#7%DataGridServer-Production%" Id=47 in WAITING on
lock=org.apache.ignite.internal.util.future.GridFutureAdapter$ChainFuture@2094df59
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
  at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
  at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
  at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:159)
  at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:117)
  at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4800)
  at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4783)
  at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1395)
  at
org.apache.ignite.internal.processors.cache.IgniteCacheProxy.get(IgniteCacheProxy.java:956)
  at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.updatePosition(OrderHolderSaveRunnable.java:790)
  at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.cancelPosition(OrderHolderSaveRunnable.java:805)
  at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.cancelExistingTradeOrderForPositionUpdate(OrderHolderSaveRunnable.java:756)
  at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.processOrderHolders(OrderHolderSaveRunnable.java:356)
  at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.run(OrderHolderSaveRunnable.java:109)
  at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4V2.execute(GridClosureProcessor.java:2184)
  at
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:509)
  at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6521)
  at
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:503)
  at
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:456)
  at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
  at
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1161)
  at
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1766)
  at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
  at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
  at
org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
  at
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

  Locked synchronizers: count = 1
 <1237e0be>  - java.util.concurrent.ThreadPoolExecutor$Worker@1237e0be





--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p12611.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-21 Thread bintisepaha
Andrey, we never start a txn on the client side. The key that gets locked on
our end and stays locked even after a successful txn is never read or
updated in a txn from the client side. Are you also able to reproduce the
key remaining locked issue?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p12164.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-19 Thread bintisepaha
This is positive news Andrey. Thanks a lot.

Please keep us posted about reproducing this. We are definitely not using
node filters, and we suspect topology changes to be causing issues, but
irrespective of that, we are not able to reproduce it. We also do not see
deadlock issues reported anywhere. The last time we got a key lock, last week,
we did not see the NPE, only a topology change for a client.

Thanks,
Binti




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p12094.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-17 Thread bintisepaha
Looking further, I see this in the failed exception stack trace. The topology
did change, but it was only a client that joined. Do you think that has any
correlation to the key being locked?

[INFO ] 2017-04-13 14:15:44.360 [pub-#44%DataGridServer-Production%]
OrderHolderSaveRunnable - Updating PositionKey: PositionId [fundAbbrev=BVI,
clearBrokerId=12718, insIid=679675, strategy=AFI, traderId=6531,
valueDate=19000101]
[14:15:46] Topology snapshot [ver=1980, servers=16, clients=82, CPUs=273,
heap=850.0GB]
[ERROR] 2017-04-13 14:15:54.348 [pub-#44%DataGridServer-Production%]
OrderHolderSaveRunnable - Received Exception - printing on Entry
javax.cache.CacheException: class
org.apache.ignite.transactions.TransactionTimeoutException: Failed to
acquire lock within provided timeout for transaction [timeout=1,
tx=GridNearTxLocal [mappings=IgniteTxMappingsImpl [],
nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null,
hasRemoteLocks=true, thread=pub-#44%DataGridServer-Production%,
mappings=IgniteTxMappingsImpl [], super=GridDhtTxLocalAdapter
[nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false,
super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=GridLongList
[idx=2, arr=[2062286236,812449097]], txMap={IgniteTxKey
[key=KeyCacheObjectImpl [val=OrderKey [traderId=6531, orderId=12382604],
hasValBytes=true], cacheId=2062286236]=IgniteTxEntry [key=KeyCacheObjectImpl
[val=OrderKey [traderId=6531, orderId=12382604], hasValBytes=true],
cacheId=2062286236, partId=-1, txKey=IgniteTxKey [key=KeyCacheObjectImpl
[val=OrderKey [traderId=6531, orderId=12382604], hasValBytes=true],
cacheId=2062286236], val=[op=READ, val=CacheObjectImpl [val=TradeOrder
[orderKey=OrderKey [traderId=6531, orderId=12382604], insIid=679675,
clearBrokerId=12718, strategy=AFI, time=2017-04-13 13:30:00.0,
settlement=2017-04-19 00:00:00.0, quantity=-6800.0, insType=STK, version=1,
userId=3081, created=2017-04-13 13:29:47.831, status=open, allocFund=STD,
isAlloc=Y, clearAgent=MSCOEPB, execBroker=DBKSE, initiate=L,
notes=ClOrdId[20170413-Y47D580RHH99], allocRule=H2L, comType=T, comTurn=N,
comImplied=N, trdCur=USD, trdFreeze=N, kindFlag=, lastRepo=, exCpn=,
generatedTime=Thu Apr 13 14:15:02 EDT 2017, batchMatchFlag=N,
commission=0.003, trdRate=1.0, gross=, delivInstruct=null, startflys=3,
parentId=null, linkId=null, repo=N, repoRate=null, repoCalendar=null,
repoStartDate=null, repoEndDate=null, xiid=null, quantityCurr=null,
masterOrderId=null, unfilledQty=800.0, avgFillPrice=18.0021324, psRuleId=6,
origDate=2017-04-13 00:00:00.0, postingId=2, executingUserId=5647,
repoCloseDate=1900-01-01 00:00:00.0, repoPrice=0.0, directFxFlag=N, tax=0.0,
fixStatusId=58, txnTypeId=0, yield=null, valueDate=null,
interestOnlyRepoFlag=null, orderGroupId=0, fundingDate=2017-04-19
00:00:00.0, execBrokerId=12038, branchBrokerId=7511, fillOrigUserId=3081,
initialMargin=null, cmmsnChgUserId=0, cmmsnChgReasonId=0, fixingSourceId=0,
orderDesignationId=0, riskRewardId=0, placementTime=2017-04-13 13:29:47.657,
initialInvestment=0.0, equityFxBrokerTypeId=0, execBranchBrokerId=0,
createUserId=3081, targetAllocFlag=N, pvDate=null, pvFactor=null, pvId=0,
executionTypeId=0, borrowScheduleId=0, borrowScheduleTypeId=0,
marketPrice=null, interestAccrualDate=null, sourceAppId=103,
initiatingUserId=6531, isDiscretionary=Y, traderBsssc=S, clearingBsssc=S,
executingBsssc=S, shortsellBanApproverUserId=null, intendedQuantity=-7600.0,
lastUpdated=2017-04-13 14:15:02.147, traderStrategyId=24686,
businessDate=2017-04-13 00:00:00.0, firstExecutionTime=2017-04-13
13:29:47.657, doNotBulkFlag=null, trimDb=trim_grn, trades=[Trade
[tradeKey=TradeKey [tradeId=263603637, tradeId64=789971421, traderId=6531],
orderId=12382604, ftbId=2023850, quantity=-985.0, fundAbbrev=TRCP,
subfundAbbrev=TRCP_EDAB, date=Thu Apr 13 00:00:00 EDT 2017,
commission=0.003, fillId=1, flyallocNumber=1, pnlTime=Tue Jan 01 00:00:00
EST 2036, price=18.0021324, psId=0, psId64=0, psLiquid=I, psSettle=Tue Jan
01 00:00:00 EST 2036, psTime=Tue Jan 01 00:00:00 EST 2036, splitTradeId=0,
splitTradeId64=0, trimDb=], Trade [tradeKey=TradeKey [tradeId=888175445,
tradeId64=182624390, traderId=6531], orderId=12382604, ftbId=2022525,
quantity=-4141.0, fundAbbrev=BVI, subfundAbbrev=BVI_EDAB, date=Thu Apr 13
00:00:00 EDT 2017, commission=0.003, fillId=1, flyallocNumber=2, pnlTime=Tue
Jan 01 00:00:00 EST 2036, price=18.0021324, psId=0, psId64=0, psLiquid=I,
psSettle=Tue Jan 01 00:00:00 EST 2036, psTime=Tue Jan 01 00:00:00 EST 2036,
splitTradeId=0, splitTradeId64=0, trimDb=], Trade [tradeKey=TradeKey
[tradeId=938093803, tradeId64=953318988, traderId=6531], orderId=12382604,
ftbId=2022524, quantity=-1674.0, fundAbbrev=TGF, subfundAbbrev=TGF_EDAB,
date=Thu Apr 13 00:00:00 EDT 2017, commission=0.003, fillId=1,
flyallocNumber=3, pnlTime=Tue Jan 01 00:00:00 EST 2036, price=18.0021324,
psId=0, psId64=0, psLiquid=I, psSettle=Tue Jan 01 

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-13 Thread bintisepaha
Andrey, in the exception above, we see this

 pendingLocks=[KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI,
clearBrokerId=12718, insIid=679675, strategy=AFI, traderId=6531,
valueDate=19000101], hasValBytes=true]], super=GridCompoundIdentityFuture
[super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=0, lsnrCalls=0,
done=true, cancelled=false, err=null, futs=[]]]
class
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
Failed to acquire lock within provided timeout for transaction
[timeout=1,
tx=org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter$1@7597f862]

What does this mean? Clearly it's a pending lock, and we know that is the
behavior, because every subsequent update to this key fails with a txn timeout.
But up until this point, all updates were fine.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11968.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-13 Thread bintisepaha
We got into the key lock situation again today.
Does the below error help in identifying anything?

Apr 13, 2017 2:16:09 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE:  Failed to acquire lock for request: GridNearLockRequest
[topVer=AffinityTopologyVersion [topVer=1980, minorTopVer=34],
miniId=0628e8f4b51-2d0a76c5-854f-4909-90ed-b12db61a1632, implicitTx=false,
implicitSingleTx=false, onePhaseCommit=false, dhtVers=[null],
subjId=bdd5e4ed-aac9-4769-b241-a2e6f21f7e18, taskNameHash=0,
hasTransforms=false, syncCommit=true, accessTtl=-1, retVal=true,
firstClientReq=false, filter=null, super=GridDistributedLockRequest
[nodeId=bdd5e4ed-aac9-4769-b241-a2e6f21f7e18, nearXidVer=GridCacheVersion
[topVer=103141927, time=1492107359249, order=1492027414045, nodeOrder=13],
threadId=83, futId=e528e8f4b51-2d0a76c5-854f-4909-90ed-b12db61a1632,
timeout=1, isInTx=true, isInvalidate=false, isRead=true,
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1,
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=103141927,
time=1492107359249, order=1492027414045, nodeOrder=13], committedVers=null,
rolledbackVers=null, cnt=0, super=GridCacheMessage [msgId=4632574,
depInfo=null, err=null, skipPrepare=false, cacheId=812449097,
cacheId=812449097
class
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
Failed to acquire lock within provided timeout for transaction
[timeout=1,
tx=org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter$1@7597f862]
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
at
org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
at
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
at
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
at
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)

Apr 13, 2017 2:16:09 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE:  Future execution resulted in error: GridDhtEmbeddedFuture
[super=GridEmbeddedFuture [embedded=GridEmbeddedFuture
[embedded=GridDhtLockFuture
[nearNodeId=bdd5e4ed-aac9-4769-b241-a2e6f21f7e18,
nearLockVer=GridCacheVersion [topVer=103141927, time=1492107359249,
order=1492027414045, nodeOrder=13], topVer=AffinityTopologyVersion
[topVer=1980, minorTopVer=34], threadId=83,
futId=ee848215b51-be4c49fc-3ecf-43d8-a667-8bbe34fa045d,
lockVer=GridCacheVersion [topVer=103141927, time=1492107359742,
order=1492027414057, nodeOrder=17], read=true, err=null, timedOut=true,
timeout=1, tx=GridDhtTxLocal
[nearNodeId=bdd5e4ed-aac9-4769-b241-a2e6f21f7e18,
nearFutId=e528e8f4b51-2d0a76c5-854f-4909-90ed-b12db61a1632,
nearMiniId=0628e8f4b51-2d0a76c5-854f-4909-90ed-b12db61a1632,
nearFinFutId=d088e8f4b51-2d0a76c5-854f-4909-90ed-b12db61a1632,
nearFinMiniId=e088e8f4b51-2d0a76c5-854f-4909-90ed-b12db61a1632,
nearXidVer=GridCacheVersion [topVer=103141927, time=1492107359249,
order=1492027414045, nodeOrder=13], super=GridDhtTxLocalAdapter
[nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false,
super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=GridLongList
[idx=1, arr=[812449097]], txMap={IgniteTxKey [key=KeyCacheObjectImpl
[val=PositionId [fundAbbrev=BVI, clearBrokerId=12718, insIid=679675,
strategy=AFI, traderId=6531, valueDate=19000101], hasValBytes=true],
cacheId=812449097]=IgniteTxEntry [key=KeyCacheObjectImpl [val=PositionId
[fundAbbrev=BVI, clearBrokerId=12718, insIid=679675, strategy=AFI,
traderId=6531, valueDate=19000101], hasValBytes=true], 

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-12 Thread bintisepaha
Thanks for trying. Is the node filter an issue someone else is seeing?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11905.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-06 Thread bintisepaha
Andrey, 

We start the caches only at server startup time. Two of our caches are
replicated; all other caches are partitioned with 1 backup. Do you think
replicated caches might be causing this issue when clients leave and join
the cluster?

All server nodes start with the same cache config. All client nodes start
with the same config but have no knowledge of cache creation. 

We don't use any node filters. For IgniteCompute we start the Runnable on
server nodes, but for cache creation there are no filters.

Any luck reproducing? Has anyone else had an issue where a txn seems like it
is finished but the key lock is not released?

Thanks,
Binti






--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11784.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-04-03 Thread bintisepaha
Sorry for the late response. We do not close/destroy caches dynamically.
Could you please explain this NPE?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11676.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-24 Thread bintisepaha
Hey Andrey, Thanks a lot for getting back.

These errors were a result of a bad client connected to the grid.

We have been running clients that leave and join the cluster constantly in
order to see if we can reproduce this issue. Last night we saw this issue
again. Here is one of the errors that a sys thread has on a client node that
initiates a transaction. The client node was not restarted or disconnected.
It kept working fine.
We do not restart these clients, but there are some other clients that leave
and join the cluster.

Do you think this is helpful in locating the cause?

Exception in thread "sys-#41%DataGridServer-Production%"
java.lang.NullPointerException
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxKey.finishUnmarshal(IgniteTxKey.java:92)
at
org.apache.ignite.internal.processors.cache.transactions.TxLocksResponse.finishUnmarshal(TxLocksResponse.java:190)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$DeadlockDetectionListener.unmarshall(IgniteTxManager.java:2427)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$DeadlockDetectionListener.onMessage(IgniteTxManager.java:2317)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
at
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[20:02:18] Topology snapshot [ver=7551, servers=16, clients=53, CPUs=217,
heap=740.0GB]
[20:02:22] Topology snapshot [ver=7552, servers=16, clients=52, CPUs=213,
heap=740.0GB]
[20:02:28] Topology snapshot [ver=7553, servers=16, clients=53, CPUs=217,
heap=740.0GB]
[20:02:36] Topology snapshot [ver=7554, servers=16, clients=54, CPUs=217,
heap=740.0GB]
[20:02:40] Topology snapshot [ver=7555, servers=16, clients=53, CPUs=217,
heap=740.0GB]
[20:02:41] Topology snapshot [ver=7556, servers=16, clients=54, CPUs=217,
heap=740.0GB]
[20:02:48] Topology snapshot [ver=7557, servers=16, clients=53, CPUs=217,
heap=740.0GB]




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11433.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-21 Thread bintisepaha
Sorry to keep following up on this, but this is becoming a major issue for
us and we need to understand how Ignite treats cluster topology when
transactions are running on server nodes. How does clients joining and leaving
the cluster affect txns?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11346.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-20 Thread bintisepaha
Andrey, would you be able to look at the attached errors and advise, please?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11322.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-16 Thread bintisepaha
Evans, we don't have a grid hanging issue. Also, we do not have server nodes
leaving the cluster; the topology changes are only for clients.

I am attaching some errors we saw when a key got locked.
Could someone please look at them and help us find out why a client topology
change is causing key locks?

Errors.txt
  



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11254.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-15 Thread bintisepaha
Andrey, do you think topology changes with client nodes (not server nodes)
leaving and joining the cluster may have any impact on transactions?

Twice out of the 4 times, we saw topology changes while the last successful
update was being made to the key that eventually got locked.

Yesterday we saw this same issue on another cache. We are due for the 1.8
upgrade soon, but given that no one else has seen this issue before, we are
not sure that 1.8 would fix it either.

Let us know what you think.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11203.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-09 Thread bintisepaha
This is helpful, and we have seen it earlier in our code too. So you are right
that the reason for the issue we are seeing is not a deadlock.

It feels like a strange bug where a pessimistic txn is committing but not
releasing the key. Let's see if 1.8 helps. We will upgrade to it in 2 weeks.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11100.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: getOrCreateCache hang

2017-03-07 Thread bintisepaha
Matt, I have not tried your code (not a committer to the community), but what
happens when you use Ignite.cache(), given that the cache was already created
on server startup?
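In other words, something like this sketch (the cache name is a placeholder):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;

public class CacheLookupSketch {
    public static IgniteCache<Object, Object> lookup(Ignite ignite) {
        // ignite.cache(...) only looks up an existing cache and returns null if it
        // does not exist; unlike getOrCreateCache(...), it never triggers creation.
        IgniteCache<Object, Object> cache = ignite.cache("myCache");
        if (cache == null)
            throw new IllegalStateException("Cache 'myCache' was not created at server startup");
        return cache;
    }
}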



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/getOrCreateCache-hang-tp10737p11065.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-07 Thread bintisepaha
Could you please tell us what the logs would say if there was a deadlock?
We roll back the txn explicitly on any exception.

We are running 1.8 in the UAT environment this week. It is not a simple upgrade
to production; we need to let it burn in. But in UAT we never saw this issue
with 1.7 either. It has already happened in production, so far 3 times.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11064.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-03-06 Thread bintisepaha
Andrey, Could you please tell us a little bit more about the deadlock
detection feature in 1.8?

https://issues.apache.org/jira/browse/IGNITE-2969

How would we be able to know that there is a deadlock? Do you think that is
what is happening for us now on 1.7, but we can't be sure because we have no
such feature?
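From the ticket, our understanding is that a deadlock would surface roughly like
this (a sketch based on our reading of IGNITE-2969; please correct us if the API
is different):

import javax.cache.CacheException;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionDeadlockException;
import org.apache.ignite.transactions.TransactionTimeoutException;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class DeadlockDetectionSketch {
    public static void update(Ignite ignite, IgniteCache<Object, Object> cache, Object key, Object val) {
        // Detection only kicks in for transactions that have a timeout set.
        try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 10_000L, 0)) {
            cache.put(key, val);
            tx.commit();
        }
        catch (CacheException e) {
            // A timed-out pessimistic tx that was part of a deadlock should carry a
            // TransactionDeadlockException (listing the involved keys) in its cause chain.
            if (e.getCause() instanceof TransactionTimeoutException
                && e.getCause().getCause() instanceof TransactionDeadlockException)
                System.err.println("Deadlock detected: " + e.getCause().getCause().getMessage());
            else
                throw e;
        }
    }
}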

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11038.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-27 Thread bintisepaha
Andrey, thanks for getting back.
The long pause is between 2 different threads, so isn't that normal?

Also, regarding the 9990 ms and the 10 ms used earlier for some previous step
in the txn: is this how Ignite breaks up the time? We have always seen timeouts
with the value in ms that we set in our code. Also, the stack trace does not
come from our code, which it usually does on genuine timeouts that do not leave
the key locked.

We are trying 1.8 in UAT environment and will release to production soon.
Unfortunately this issue does not happen in UAT and we have no way of
reproducing it.

Is there any way we can force release a locked key without restarting the
whole cluster?

Thanks,
Binti





--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10910.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-23 Thread bintisepaha
This is the actual error, which looks like it is not coming from our code:

109714 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger
error
109715 SEVERE:  Failed to acquire lock for request:
GridNearLockRequest [topVer=AffinityTopologyVersion [topVer=3153,
minorTopVer=32], miniId=acdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859,
implicitTx=false, implicitSingleTx=false, onePhaseCommit=false,
dhtVers=[null], subjId=f6663b00-24fc-4515-91ac-20c3b47d90ec, taskNameHash=0,
hasTransforms=false, syncCommit=true, accessTtl=-1, retVal=true,
firstClientReq=false, filter=null, super=GridDistributedLockRequest
[nodeId=f6663b00-24fc-4515-91ac-20c3b47d90ec, nearXidVer=GridCacheVersion
[topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7],
threadId=57, futId=8cdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859,
timeout=9990, isInTx=true, isInvalidate=false, isRead=true,
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1,
super=GridDistributedBaseMessage [ver=GridCacheVersion
[topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7],
committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheMessage
[msgId=2688455, depInfo=null, err=null, skipPrepare=false,
cacheId=812449097, cacheId=812449097
109716 class
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
Failed to acquire lock within provided timeout for transaction
[timeout=9990, tx=org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
109717 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
109718 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
109719 at
org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
109720 at
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
109721 at
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
109722 at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
109723 at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
109724 at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
109725 at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
109726 at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
109727 at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
109728 at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
109729 at
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
109730 at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
109731 at java.lang.Thread.run(Thread.java:745)
109732 
109733 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger
error

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-22 Thread bintisepaha
Andrey,

I finally have an error that might help. This happened again in production
for us today.
Ignite-Console-3.zip

  

This is the last update that threw an error; after this error, every update
just times out.

The timeout=9990 in this error: none of our transactions have this timeout.
Do you think this is an Ignite bug? If you look at the stack trace, this is
not happening due to our code.

There is also an error on marshalling. How can we narrow it down further?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10828.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Monitoring Cache - Data counters, Cache Data Size

2017-02-21 Thread bintisepaha
Hi Val, 

I saw that MBean, but it reports the same number as the local MBean, and if
I go on each node, the CacheCluster and CacheLocal cache sizes match. I do not
see a sum total across all nodes.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Monitoring-Cache-Data-counters-Cache-Data-Size-tp3203p10780.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-21 Thread bintisepaha
Attaching the console log file too; it has the above error.
Unfortunately we have now lost the files for older updates to this key; they
rolled over.
rolled over.
Ignite-Console-1.zip

  

but the "Transaction has been already completed" error could also happen
because maybe we call explicit rollback in case of any Runtime Exception.

tx.rollback();
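In other words, the pattern is roughly this (a simplified sketch, not our actual
helper code):

import org.apache.ignite.Ignite;
import org.apache.ignite.transactions.Transaction;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class ExplicitRollbackSketch {
    public static void doInTransaction(Ignite ignite, Runnable work) {
        Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ);
        try {
            work.run();
            tx.commit();
        }
        catch (RuntimeException e) {
            // Explicit rollback on any runtime exception. If Ignite has already
            // completed the tx (e.g. after a timeout), this rollback itself can throw
            // "Transaction has been already completed".
            tx.rollback();
            throw e;
        }
        finally {
            tx.close();
        }
    }
}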

We do get genuine timeouts in the code, but at that time, we never get this
exception.

Hope this helps.

Thanks,
Binti





--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10779.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-21 Thread bintisepaha
Hi, I will try this code the next time this issue happens.

Attaching one node's full logs; they have a lot of info.
Ignite-11221.gz
  

However, I found this in the console logs for the first exception that
occurred. Is this usual?

 14764 Feb 17, 2017 1:16:04 PM org.apache.ignite.logger.java.JavaLogger
error
 14765 SEVERE: Failed to execute job
[jobId=fcedb9c4a51-9aa7d7c5-f6fa-4bdd-9473-0439b889d46f,
ses=GridJobSessionImpl [ses=GridTaskSessionImpl
[taskName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable,
dep=LocalDeployment [super=GridDeployment[ts=1486832074045,
depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2,
clsLdrId=678f81e2a51-b602d584-5565-434b-9727-94e218108073, userVer=0,
loc=true, sampleClsName=java.lang.String, pendingUndeploy=false,
undeployed=false, usage=0]],   
taskClsName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable,
sesId=ecedb9c4a51-9aa7d7c5-f6fa-4bdd-9473-0439b889d46f,
startTime=1487355352478, endTime=9223372036854775807,
taskNodeId=9aa7d7c5-f6fa-4bdd-9473-0439b889d46f, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null,
loadSpi=null, usage=1, fullSup=false,
subjId=9aa7d7c5-f6fa-4bdd-9473-0439b889d46f, mapFut=IgniteFuture
[orig=GridFutureAdapter [resFlag=0, res=null, startTime=1487355352483,   
endTime=0, ignoreInterrupts=false, state=INIT]]],
jobId=fcedb9c4a51-9aa7d7c5-f6fa-4bdd-9473-0439b889d46f]]
 14766 class org.apache.ignite.IgniteException: Transaction has been already
completed.
 14767 at
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:908)
 14768 at
org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.rollback(TransactionProxyImpl.java:299)
 14769 at
com.tudor.datagridI.server.cache.transaction.IgniteCacheTransaction.rollback(IgniteCacheTransaction.java:19)
 14770 at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.processOrderHolders(OrderHolderSaveRunnable.java:509)
 
 14771 at
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.run(OrderHolderSaveRunnable.java:105)
 14772 at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4V2.execute(GridClosureProcessor.java:2184)
 14773 at
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:509)
 14774 at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6521)
 14775 at
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:503)
 14776 at
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:456)
 14777 at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 14778 at
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1161)
 
 14779 at
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1766)
 14780 at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
 14781 at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
 14782 at
org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
 14783 at
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
 14784 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 14785 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 14786 at java.lang.Thread.run(Thread.java:745)
 14787 Caused by: class org.apache.ignite.IgniteCheckedException:
Transaction has been already completed.
 14788 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:776)
 14789 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:718)
 
 14790 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:681)
 
 14791 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:156)
 14792 at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:154)
 14793 at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:748)
 14794 at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:353)
 14795 at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:277)
 14796  

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-21 Thread bintisepaha
Andrey, thanks for getting back.
I am attaching the stack trace. I don't think the cause is a deadlock, but the
trace is long, so maybe I am missing something; let me know if you find
anything useful.

We cannot reproduce this issue ourselves, as there are no errors on the prior
successful update. It feels like the txn was marked successful, but on one of
the keys the lock was not released, and later when we try to access the key,
it is locked, hence the exceptions.

No messages in the logs for long-running txns or futures.
Killing the node that holds the key does not release the lock.

Is there a way to query Ignite to see whether a lock is being held on a
particular key? Any code we can run to salvage such locks?
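The closest thing we could find is something like the sketch below
(IgniteCache.isLocalLocked is node-local, so it has to be broadcast to every
node; names are placeholders), but would this even reflect the stuck lock?

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class LockProbeSketch {
    /** Prints, on every server node, whether the given key is locked locally. */
    public static void probe(Ignite ignite, String cacheName, Object key) {
        // The key must be serializable so it can travel with the closure.
        ignite.compute(ignite.cluster().forServers()).broadcast(() -> {
            IgniteCache<Object, Object> cache = Ignition.localIgnite().cache(cacheName);
            boolean locked = cache.isLocalLocked(key, false); // false = locked by any thread
            System.out.println("Key " + key + " locked locally: " + locked);
        });
    }
}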

Any other suggestions?

Thanks,
Binti






--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10764.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-17 Thread bintisepaha
Thanks Andrew,

The same thing happened again today. Clearly the key is locked; we get the
timeout exceptions. But the prior update to the same key did not throw any
exceptions. Suddenly one update fails with timeout exceptions, and those
exceptions are how we find out that the key is locked.

We will upgrade to 1.8, but in the meantime is there a way to free up this
locked key using some code?

We tried killing nodes, but we have one backup and it looks like the lock is
carried over too, which would be the right thing to do.

Outside the transaction we can read this key (dirty read). This is becoming
an issue for us, since it is a production system and the only way to free it
up is to restart the cluster.

Please point us in a direction where we can avoid this or free it up.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10713.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Monitoring Cache - Data counters, Cache Data Size

2017-02-16 Thread bintisepaha
Is there any plan to expose cluster-wide metrics over JMX?
For example, for a distributed cache, I would like to know the size across all
nodes combined via JMX.
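In code we can get the combined number, roughly like this sketch (the cache name
is a placeholder), but we would like the same figure over JMX:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;

public class ClusterCacheSizeSketch {
    public static void printSizes(Ignite ignite, String cacheName) {
        IgniteCache<Object, Object> cache = ignite.cache(cacheName);
        // size() is a distributed call: primary entries across the whole cluster.
        System.out.println("Cluster-wide size: " + cache.size(CachePeekMode.PRIMARY));
        // localSize() is the per-node number that the local MBean reports.
        System.out.println("Local size: " + cache.localSize(CachePeekMode.PRIMARY));
    }
}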

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Monitoring-Cache-Data-counters-Cache-Data-Size-tp3203p10684.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-16 Thread bintisepaha
Hello, any help on this would be very much appreciated.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10683.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

2017-02-13 Thread bintisepaha
Hi Andrey,

We are using Ignite 1.7.0.
I am attaching the thread dumps. As far as we can tell the grid was working
fine; no threads were hanging. Only subsequent updates to this key were hung.

The topology was stable. The client sends an object to be saved to the server,
and the transaction is actually only on the server side.
prodThreadDump20170208.zip

  

Please let us know if you see something useful.

Thanks,
Binti




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10601.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Performance with increase in node

2017-01-05 Thread bintisepaha
Sam, could you post your cache configuration? How many backups do you have?
What is the marshaller you are using? is there garbage collection happening?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Performance-with-increase-in-node-tp9378p9921.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Streaming Exception on client - nothing on server side

2017-01-05 Thread bintisepaha
Val, we resolved this by eliminating some Spring injection. The error was not
clear, but on the client side we did not have the right bean declared. Thank
you.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Streaming-Exception-on-client-nothing-on-server-side-tp9807p9920.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Streaming Exception on client - nothing on server side

2016-12-30 Thread bintisepaha
Hi, can anyone help me with this exception? What does it mean?

It happens in the call below, invoked from the client via serverCompute.run().
Although we set requireSerializable = false in our configuration, I have made
these classes serializable too, but I still get the error. Why is there an
ArrayIndexOutOfBoundsException? I see no error on the server nodes.

@Override
public void loadTradeOrdersForMatching(Integer traderId, Integer tid, Date settlementDate) throws Exception {
    ClusterGroup serverGroup = ignite.cluster().forServers();
    IgniteCompute serverCompute = ignite.compute(serverGroup);
    try {
        serverCompute.run(new TradeOrdersLoaderForMatching(traderId, tid, settlementDate, tradeOrderForMatchingLoader));
    } catch (Exception e) {
        logger.error(e, e);
        throw e;
    }
}


2016-12-30 15:54:05,196 ERROR com.tudor.datagridI.TradingDataAccessImpl
(TradingDataAccessImpl.java:423) - class org.apache.ignite.IgniteException:
521338879
class org.apache.ignite.IgniteException: 521338879
at
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:908)
at
org.apache.ignite.internal.IgniteComputeImpl.run(IgniteComputeImpl.java:304)
at
com.tudor.datagridI.TradingDataAccessImpl.loadTradeOrdersForMatching(TradingDataAccessImpl.java:421)
at
orderserver.client.GridClient.loadPSGroupForOrder(GridClient.java:325)
at orderserver.OrderFactory.saveOrders(OrderFactory.java:6011)
at
orderserver.XML.XMLUpdateCcyOrderWriter.saveOrders(XMLUpdateCcyOrderWriter.java:367)
at
orderserver.XML.XMLUpdateOrderWriter.(XMLUpdateOrderWriter.java:38)
at
orderserver.XML.XMLUpdateCcyOrderWriter.(XMLUpdateCcyOrderWriter.java:32)
at
orderserver.XML.Dispatcher.processUpdateOMCcyOrdersRootNodeName(Dispatcher.java:657)
at orderserver.XML.Dispatcher.dispatchRequest(Dispatcher.java:317)
at
orderserver.corba.OrderFactoryImpl.xmlRequest(OrderFactoryImpl.java:1467)
at idl2java.OrderFactoryPOA._invoke(Unknown Source)
at idl2java.OrderFactoryPOA._invoke(Unknown Source)
at com.inprise.vbroker.poa.POAImpl.invoke(POAImpl.java:2693)
at
com.inprise.vbroker.poa.ActivationRecord.invoke(ActivationRecord.java:109)
at
com.inprise.vbroker.GIOP.GiopProtocolAdapter.doRequest(GiopProtocolAdapter.java:824)
at
com.inprise.vbroker.IIOP.ServerProtocolAdapter.doRequest(ServerProtocolAdapter.java:68)
at
com.inprise.vbroker.GIOP.GiopProtocolAdapter.dispatchMessage(GiopProtocolAdapter.java:1106)
at
com.inprise.vbroker.orb.TPDispatcherImpl$TPDispatcher.run(TPDispatcherImpl.java:100)
at
com.inprise.vbroker.orb.ThreadPool$PoolWorker.run(ThreadPool.java:76)
Caused by: class org.apache.ignite.IgniteCheckedException: 521338879
at
org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7142)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:168)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:117)
at
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112)
at
org.apache.ignite.internal.IgniteComputeImpl.run(IgniteComputeImpl.java:301)
... 18 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 521338879
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream$HandleTable.lookup(OptimizedObjectInputStream.java:1065)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:204)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:491)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:579)
at
org.apache.ignite.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:841)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:324)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:491)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:579)
at
org.apache.ignite.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:841)
at
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:324)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
  

Re: Killing a node under load stalls the grid with ignite 1.7

2016-11-03 Thread bintisepaha
The problem is that when I am in the write-behind for Order, how do I access
the Trade object? It is only present in the cache; at that point I need to
access the Trade cache, and that is what is causing issues.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130p8695.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Killing a node under load stalls the grid with ignite 1.7

2016-10-27 Thread bintisepaha
Yes, I think you are right. Is there any setting we can use in write-behind
that will not lock the entries?
Our use case is like this:

Parent table - Order (Order Cache)
Child Table - Trade (Trade Cache)

We only have write-behind on the Order cache, and when writing it we write both
the order and trade tables, so we query the Trade cache from the Order cache
store's writeAll(), which is causing the issue above. We need to do this
because we cannot write a trade to the database without writing the order:
foreign key constraints and data integrity.

Do you have any recommendations to solve this problem? We cannot use
write-through. How do we make sure the two tables are written in order if they
are in separate caches?
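For what it's worth, one way to keep the parent/child write order without
calling back into the Trade cache from the store is to carry the trades on the
Order object itself and write both tables from the Order store in a single JDBC
transaction. This is only a rough sketch under that assumption; OrderKey,
Order.getTrades() and the insert helpers are hypothetical names, not your
actual API:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.Collection;
import javax.cache.Cache;
import javax.cache.integration.CacheWriterException;
import javax.sql.DataSource;
import org.apache.ignite.cache.store.CacheStoreAdapter;

public abstract class OrderCacheStore extends CacheStoreAdapter<OrderKey, Order> {
    protected DataSource dataSource;   // wired up elsewhere, e.g. via Spring

    @Override public void writeAll(Collection<Cache.Entry<? extends OrderKey, ? extends Order>> entries) {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            for (Cache.Entry<? extends OrderKey, ? extends Order> e : entries) {
                Order order = e.getValue();
                insertOrUpdateOrder(conn, order);          // parent row first
                for (Trade trade : order.getTrades())      // children come from the Order itself,
                    insertOrUpdateTrade(conn, trade);      // no call back into the Trade cache
            }
            conn.commit();                                 // both tables commit together
        }
        catch (SQLException ex) {
            throw new CacheWriterException("Failed to write orders and trades", ex);
        }
    }

    // Hypothetical JDBC helpers; the real inserts/updates (digest comparison, etc.) go here.
    protected abstract void insertOrUpdateOrder(Connection conn, Order order) throws SQLException;
    protected abstract void insertOrUpdateTrade(Connection conn, Trade trade) throws SQLException;
}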

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130p8557.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Killing a node under load stalls the grid with ignite 1.7

2016-10-24 Thread bintisepaha
Hi, actually we use a lot of caches from the cache store's writeAll().
To confirm whether that is the cause of the grid stall, we would have to
completely change our design.

Can someone confirm that this is what causes the grid to stall: calling
cache.get() from a cache store and then killing or bringing up nodes leads to
a stall?

We see one node blocked on a flusher thread doing a cache.get() when the grid
is stalled; if we kill that node, the grid starts functioning again. But we
would like to understand whether we are using write-behind incorrectly, or
whether there are rebalance or write-behind settings that might save us from
something like this.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130p8449.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Killing a node under load stalls the grid with ignite 1.7

2016-10-21 Thread bintisepaha
This was done to optimize our writes to the DB: on every save we do not want
to delete and insert records, so we do a digest comparison. Do you think this
causes an issue? How does the cache store handle transactions or locks? When a
node dies while a flusher thread is doing write-behind, how does that affect
data rebalancing?

If you could answer the questions above, it would give us more clarity.

We are removing it now, but killing a node still stalls the cluster.
Will send the latest thread dumps to you today.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130p8405.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Killing a node under load stalls the grid with ignite 1.7

2016-10-14 Thread bintisepaha
Hi, I don't have a simple working example, but even under moderate load it is a
very reproducible problem. We had the same issue with Ignite 1.5.0-final as
well; we never used 1.6 as much, and now we have the same issue with 1.7.0.

If you are able to reproduce it on your end, that will be really helpful.

Where do you see the lock between GridCacheWriteBehindStore and
GridCachePartitionExchangeManager?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130p8302.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Killing a node under load stalls the grid with ignite 1.7

2016-10-08 Thread bintisepaha
Hi, could someone please look at this and respond?

Thanks, 
Binti 



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130p8158.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Killing a node under load stalls the grid with ignite 1.7

2016-10-06 Thread bintisepaha
Hi, we are using Ignite 1.7. Under some load, when caches are being updated and
write-behind is in progress, just killing a node stalls the entire grid.
Attaching thread dumps taken when the partitioned caches were in FULL_SYNC mode
and also when they were all in FULL_ASYNC mode. It looks like something to do
with the exchange worker. We have a failureDetectionTimeout of 30 seconds on
the server nodes; this is to keep the grid from stalling when we have long
major GC pauses. Even with G1GC settings we are unable to avoid major GCs, so
we had to work around it with a longer failure detection time.

(Attachment: DevDump06Oct2016.zip)

When there is no load, killing a node does not stall the grid.

On the client node when the grid stalls, we see this being logged
continuously.

U.warn(log, "Failed to wait for partition map exchange [" +
    "topVer=" + exchFut.topologyVersion() +
    ", node=" + cctx.localNodeId() + "]. " +
    "Dumping pending objects that might be the cause: ");

Thanks,
Binti




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: failureDetectionTimeout tuning

2016-09-01 Thread bintisepaha
We will try this and get back to you. Does it mean that the node usually
recovers from it, if it was due to GC? Would we also have to remove
joinTimeout?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/failureDetectionTimeout-tuning-tp7374p7473.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-09-01 Thread bintisepaha
Also, it was stuck like this for hours.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7472.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-09-01 Thread bintisepaha
Val, this dump was from the client node, which I sent in the original email.
The zipped-up dumps were from all the server nodes that participate in the
distributed cache.

Anyway, changing it to run() fixed the issue. But we never understood the root
cause of the hanging; it is always the alternative suggestion that works, and
we move on to it without understanding why what we tried first did not work.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7471.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


failureDetectionTimeout tuning

2016-08-29 Thread bintisepaha
We see this message logged in our logs from time to time. Is this a potential
issue with one of the nodes? 


[tcp-disco-sock-reader-#41%DataGridServer-Staging%] ERROR
(Log4JLogger.java:495) Failed to initialize connection
[sock=Socket[addr=/10.22.50.193,port=35612,localport=47501]]
class org.apache.ignite.spi.IgniteSpiOperationTimeoutException: Network
operation timed out. Increase 'failureDetectionTimeout' configuration
property [failureDetectionTimeout=1]
at
org.apache.ignite.spi.IgniteSpiOperationTimeoutHelper.nextTimeoutChunk(IgniteSpiOperationTimeoutHelper.java:81)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:5035)
at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

How do you recommend fixing it? We are not setting a networkTimeout or
failureDetectionTimeout in our configs yet.

This is what we have today, although the XML snippet did not survive the
archive.
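For reference, a minimal sketch of raising the failure detection timeout
programmatically; the 60-second value is only an example, not a recommendation
for this cluster:

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setFailureDetectionTimeout(60_000);   // milliseconds; the default is 10 seconds
Ignite ignite = Ignition.start(cfg);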
Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/failureDetectionTimeout-tuning-tp7374.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-08-26 Thread bintisepaha
It was hanging because all of our clients were stuck at the below

Thread dump 

Name: main 
State: WAITING on
org.apache.ignite.internal.ComputeTaskInternalFuture@3fd1be52 
Total blocked: 5  Total waited: 5,975 

Stack trace: 
sun.misc.Unsafe.park(Native Method) 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
 
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112)
 
org.apache.ignite.internal.IgniteComputeImpl.broadcast(IgniteComputeImpl.java:250)
 
com.tudor.datagridI.client.TradeOrderStoreHelper.processOrderHolders(TradeOrderStoreHelper.java:37)
 
com.tudor.datagridI.TradingDataAccessImpl.saveOrders(TradingDataAccessImpl.java:399)
 
orderserver.client.GridClient.updateOrderHoldersInGrid(GridClient.java:138) 
orderserver.Order.save(Order.java:3619) 
   - locked orderserver.Order@732871ce 
orderserver.Order.save(Order.java:3563) 
   - locked orderserver.Order@732871ce 
izi.izi_data_grid_ignite_test.OrderBooker.bookRegularOrder(OrderBooker.java:111)
 
izi.izi_data_grid_ignite_test.OrderBooker.bookOrder(OrderBooker.java:33) 
izi.izi_data_grid_ignite_test.Main.bookOrders(Main.java:47) 
izi.izi_data_grid_ignite_test.Main.runExc(Main.java:83) 
izi.izi_data_grid_ignite_test.Main.run(Main.java:35) 
izi.izi_data_grid_ignite_test.Runner.run(Runner.java:37) 
izi.izi_data_grid_ignite_test.Runner.main(Runner.java:17) 



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7349.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-08-25 Thread bintisepaha
Also, it looks like a lot of the other nodes did not have these TCP threads.
Could that be the reason for this issue, i.e. server nodes not accepting
connections?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7314.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-08-25 Thread bintisepaha
Vlad, 

Look at this one below in file 2511995.txt

"RMI TCP Connection(8)-10.10.11.100" #202 daemon prio=5 os_prio=0
tid=0x7ff050009000 nid=0x2a7a23 in Object.wait() [0x7fef5aef5000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at
com.sun.jmx.remote.internal.ArrayNotificationBuffer.fetchNotifications(ArrayNotificationBuffer.java:449)
- locked <0x00063066a628> (a
com.sun.jmx.remote.internal.ArrayNotificationBuffer)
at
com.sun.jmx.remote.internal.ArrayNotificationBuffer$ShareBuffer.fetchNotifications(ArrayNotificationBuffer.java:227)
at
com.sun.jmx.remote.internal.ServerNotifForwarder.fetchNotifs(ServerNotifForwarder.java:274)
at
javax.management.remote.rmi.RMIConnectionImpl$4.run(RMIConnectionImpl.java:1273)
at
javax.management.remote.rmi.RMIConnectionImpl$4.run(RMIConnectionImpl.java:1271)
at
javax.management.remote.rmi.RMIConnectionImpl.fetchNotifications(RMIConnectionImpl.java:1277)
at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$79(TCPTransport.java:683)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$5/1005089754.run(Unknown
Source)
at java.security.AccessController.doPrivileged(Native Method)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7313.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-08-24 Thread bintisepaha
Did you see the dumps for the RMI threads? We are seeing some RMI TCP
connection threads locked.
Computation is stuck because the clients are hanging in broadcast(); could this
be related to the RMI TCP threads being locked?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7286.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: IgniteCompute.broadcast() stuck

2016-08-24 Thread bintisepaha
Val, if I am selecting only one random node like below:

ClusterGroup serverGroup = ignite.cluster().forServers().forRandom();
IgniteCompute serverCompute = ignite.compute(serverGroup);

should broadcast() only send the task to the pre-selected random node from the
cluster group, or am I misunderstanding this behavior?

It looks like in the code above I can easily switch to run(IgniteRunnable
runnable), and that will run the job on one node. I am not sure how broadcast
is different here, but I do not have to use broadcast. So if you confirm run()
is a better choice here, I will switch to run() and give it a try.
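For reference, a minimal sketch of the run()-based variant; run() submits the
closure to one node of the group, and the group here already contains just one
random server:

ClusterGroup oneServer = ignite.cluster().forServers().forRandom();
ignite.compute(oneServer).run(new OrderHolderSaveRunnable(ignite, orderHolderList));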

The volume testing we are doing is running 20 clients in parallel, calling the
above code on a cluster of 16 nodes; not sure if that is causing it and we need
to scale up.

Attached are the thread dumps (threadDumps.zip) from all 16 server nodes.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255p7278.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


IgniteCompute.broadcast() stuck

2016-08-23 Thread bintisepaha
Hi, we are on Ignite 1.5.0-final and have recently been facing this issue on
the client side when broadcasting a job to a random remote server node. How can
we avoid this, and what is causing it?

We ran 20 parallel clients for load testing; 15 completed with no issues and 5
got stuck here.

The code below is called from the client side and hangs at this line:
serverCompute.broadcast(new OrderHolderSaveRunnable(ignite, orderHolderList));


public Boolean processOrderHolders(List<OrderHolder> orderHolderList) throws Exception {
    ClusterGroup serverGroup = ignite.cluster().forServers().forRandom();
    IgniteCompute serverCompute = ignite.compute(serverGroup);
    try {
        serverCompute.broadcast(new OrderHolderSaveRunnable(ignite, orderHolderList));
    } catch (Exception e) {
        logger.error(e, e);
        throw e;
    }

    return true;
}

Thread dump

Name: main
State: WAITING on
org.apache.ignite.internal.ComputeTaskInternalFuture@3fd1be52
Total blocked: 5  Total waited: 5,975

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112)
org.apache.ignite.internal.IgniteComputeImpl.broadcast(IgniteComputeImpl.java:250)
com.tudor.datagridI.client.TradeOrderStoreHelper.processOrderHolders(TradeOrderStoreHelper.java:37)
com.tudor.datagridI.TradingDataAccessImpl.saveOrders(TradingDataAccessImpl.java:399)
orderserver.client.GridClient.updateOrderHoldersInGrid(GridClient.java:138)
orderserver.Order.save(Order.java:3619)
   - locked orderserver.Order@732871ce
orderserver.Order.save(Order.java:3563)
   - locked orderserver.Order@732871ce
izi.izi_data_grid_ignite_test.OrderBooker.bookRegularOrder(OrderBooker.java:111)
izi.izi_data_grid_ignite_test.OrderBooker.bookOrder(OrderBooker.java:33)
izi.izi_data_grid_ignite_test.Main.bookOrders(Main.java:47)
izi.izi_data_grid_ignite_test.Main.runExc(Main.java:83)
izi.izi_data_grid_ignite_test.Main.run(Main.java:35)
izi.izi_data_grid_ignite_test.Runner.run(Runner.java:37)
izi.izi_data_grid_ignite_test.Runner.main(Runner.java:17)

Any help is greatly appreciated.




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/IgniteCompute-broadcast-stuck-tp7255.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: One failing node stalling the whole cluster

2016-08-21 Thread bintisepaha
Hi Denis,
We see this exception too from a client when the cluster is restarted:
"IllegalStateException: Cache has been closed or destroyed: cache". We
reconnect the client to the cluster by calling Ignition.stop and then
Ignition.start/ignite again, and we are able to avoid this.
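A minimal sketch of that reconnect sequence, assuming the default (unnamed)
grid instance and that cfg is the client's IgniteConfiguration; the old cache
proxies must not be reused after the restart:

Ignition.stop(true);                     // stop the local client node, cancelling its jobs
Ignite ignite = Ignition.start(cfg);     // start again so the client rejoins the restarted cluster
IgniteCache<Object, Object> cache = ignite.cache("orderCache");   // re-obtain cache proxies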

As far as the cluster hanging goes, we are seeing many issues: optimistic txns
hanging in commit(), and if we try to kill nodes with hanging txns, the cluster
hangs afterwards. Unfortunately we cannot stop using Ignite at this point; we
are already in production with some functionality.

What can we send you to help us solve this issue?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/One-failing-node-stalling-the-whole-cluster-tp5372p7199.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-08-19 Thread bintisepaha
Val, we are still seeing this issue from time to time.
Transactions hang during commit(). We do not use write-through, so the DB will
not cause slowness for us. I understand that prior to commit, locks are
acquired in the prepare phase, so if prepare is successful, why would commit
hang? I am assuming that within the commit() call the prepare phase won't even
complete if the locks were not acquired; in that case we have seen the
transaction optimistic exception various times, and on retry the txn completes
successfully.

In the case where it hangs on commit(), we are unable to figure out which
resource it is blocked on.

Also, to free up the resource and see how the cluster behaves, if we kill any
of the offending nodes then the entire cluster hangs.

We use FULL_SYNC mode for the caches; is there any setting we need to use with
that? Most of our distributed caches have a config similar to the one below.

Any help is very much appreciated.

[The cache configuration XML was stripped by the archive; of it, only the query
fields traderId and orderId and a SORTED index setting survive.]
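For readers hitting the same problem, a rough Java equivalent of that kind of
cache configuration (illustrative values and cache name, not the poster's exact
settings):

CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("orderCache");
ccfg.setCacheMode(CacheMode.PARTITIONED);
ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
ccfg.setBackups(1);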
--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p7187.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: One failing node stalling the whole cluster

2016-08-19 Thread bintisepaha
Hi, we are seeing similar issues with Ignite 1.5.0. Were you able to resolve
it? We use distributed caches in FULL_SYNC mode and also a few replicated
caches.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/One-failing-node-stalling-the-whole-cluster-tp5372p7183.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-07-29 Thread bintisepaha
Does prepare happen as part of the commit() call?
As a user we only call commit(), so the txn should get to commit only if
prepare succeeded, right?
Once in the prepare phase, locks are acquired and no other txn should be able
to update those entries; the txn then goes to commit() once the locks are
acquired, so what is holding it back from updating the entries?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p6638.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-07-27 Thread bintisepaha
What I mean is, why would commit() hang? We use write-behind, so cache store
updates are asynchronous (I doubt that would be an issue in our case).

During the commit() phase, if locks were not acquired on the txn entries we
would receive a transaction optimistic exception, and we do receive that
sometimes. But in the other cases, what could be the possible reason for
commit() to hang?

Thread dump below for the hanging thread:

Name: pub-#2%DataGridServer-Staging%
State: WAITING on
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture@6120e8e8
Total blocked: 11  Total waited: 584

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:155)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:261)
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.run(OrderHolderSaveRunnable.java:268)
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1879)
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:509)
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6397)
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:503)
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:456)
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1166)
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1770)
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:821)
org.apache.ignite.internal.managers.communication.GridIoManager.access$1600(GridIoManager.java:103)
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:784)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
java.lang.Thread.run(Thread.java)




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p6576.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-07-26 Thread bintisepaha
Hi, 

Trying to find an answer to the above. I do not understand why commit() is
unable to finish. In an optimistic/serializable transaction, under the
two-phase protocol it should not even reach the commit phase unless prepare
succeeded. If the txn reached the commit() phase, then it was successful in
acquiring all the locks it needs and it should not stall.

Another related question: is it not recommended to start an Ignite transaction
from the client? What happens if the client crashes in the middle of the
transaction? With a pessimistic txn on the client side, if I kill the client in
the middle of the txn, the keys it had locks on are never released; now that
same entry in the cache cannot be updated by any process. How do I get out of
such a situation?

However, with optimistic locking this does not happen until commit, so it
behaves differently. But what if the client crashes during the commit phase? I
also cannot use optimistic in this scenario.

Thanks,
Binti




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p6557.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


index/exposing SqlField on a composition object?

2016-07-12 Thread bintisepaha
Hi,

If we have a cache that contains Order objects, each of which has a set of
Trades, is it possible to query or search the cache on some fields from the
Trade object?

The cache is an IgniteCache holding Order values.

class Order {
    /* class variables */
    // some key fields
    private Set<Trade> trades;
}

Here the tables are separate in the database, orders and trades, but for our
use cases, we always query them together, so we thought of merging them.
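For context, a sketch of how top-level fields are normally exposed to SQL with
@QuerySqlField, assuming annotation-based query configuration (for example via
CacheConfiguration.setIndexedTypes); the field names and types here are only
illustrative. Fields inside the nested Set of Trades are not exposed as columns
this way; to query trade fields directly they generally need to live in their
own cache/table:

import java.util.Set;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

class Order {
    @QuerySqlField(index = true)
    private int orderId;        // top-level fields become SQL columns

    @QuerySqlField
    private int traderId;

    private Set<Trade> trades;  // a nested collection is not exposed as columns
}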

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/index-exposing-SqlField-on-a-composition-object-tp6243.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-07-12 Thread bintisepaha
If for some reason commit() is unable to finish, how can we catch that? In an
optimistic transaction, under the two-phase protocol it should not even reach
the commit phase unless prepare succeeded.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p6241.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-29 Thread bintisepaha
Thanks for your response. We are disabling shmem this weekend in production
as per your advice.
Will let you know how that helps next week.
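For anyone following along, a sketch of the change being referred to, i.e.
disabling shared-memory communication so all local traffic goes over TCP
sockets instead:

TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setSharedMemoryPort(-1);          // -1 disables the shared-memory endpoint
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setCommunicationSpi(commSpi);
Ignite ignite = Ignition.start(cfg);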

The issue started happening after 3 months of smooth running, so we were
curious to understand how it became a problem suddenly. 



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5996.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-28 Thread bintisepaha
Denis, is it OK to delete this file, or this folder, while the cluster is up?
Servers and clients might be connected to it.

I posted another question for the community trying to understand the use of
the work directory. Could you please respond to that?

This lock.file has not been updated since Apr 13, the first time we used
Ignite in production. The size of the file is 0 KB.

Also, it is hard for us to understand why this started happening after 3
months; we were running smoothly until now. Now System.exit() and
Ignition.stop() both fail randomly. For 2 days we were fine by passing
-DIGNITE_NO_SHUTDOWN_HOOK=true, but now even that is not a reliable solution;
it does not work every time.

Is System.exit() or Ignition.stop(grid, false) not a good solution? Should we
be using Ignite.close(), or Ignition.stop(grid, true) to kill all the currently
running jobs?

Thanks,
Binti







--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5967.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Creating cache with CacheLoaderFactory on client node brings exception org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling (make sure same ver

2016-06-27 Thread bintisepaha
Are your nodes running the same JDK version?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Creating-cache-with-CacheLoaderFactory-on-client-node-brings-exception-org-apache-ignite-IgniteCheck-tp5915p5938.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Ignite work directory usage?

2016-06-27 Thread bintisepaha
Hi, 

Could someone explain the use of the work directory? How does it work for
client and server?
Do they need to have access to the same directory?

There is not much documentation on it.

Thanks,
Binti




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Ignite-work-directory-usage-tp5936.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-27 Thread bintisepaha
bearrito, in 1.6 we saw the streamer hang on flush() or close() and never
return. We do not see that issue in 1.5.0-final.

Igniters, could you please look at the log files and thread dumps attached
earlier to help with the original issue?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5924.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-26 Thread bintisepaha
Attached is the zipped log file (RGP.zip, containing RGP.log) from the same
client node that hangs. I see the shmem thread gets into a GC iteration at the
same time. Do you think that might be causing it not to shut down?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5905.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-24 Thread bintisepaha
We are seeing multiple issues with 1.6.0: with the data streamer, and when
server nodes are killed the grid becomes unstable and unresponsive. So at this
point we cannot really use 1.6.

Here is the thread dump for the thread you asked for, plus the ones I thought
were relevant. I will try to get all the thread dumps for you.

We can try disabling shmem, but we need to understand what changed suddenly.

Name: shmem-communication-acceptor-#72%DataGridServer-Production%
State: WAITING on java.lang.Object@7377b101
Total blocked: 1  Total waited: 5

Stack trace: 
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
org.apache.ignite.internal.util.worker.GridWorker.join(GridWorker.java:233)
org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:7295)
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryServerEndpoint.close(IpcSharedMemoryServerEndpoint.java:467)
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ShmemAcceptWorker.body(TcpCommunicationSpi.java:2905)
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
java.lang.Thread.run(Unknown Source)


Name: ipc-shmem-gc-#66%DataGridServer-Production%
State: RUNNABLE
Total blocked: 0  Total waited: 1

Stack trace: 
sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
sun.nio.ch.FileDispatcherImpl.lock(Unknown Source)
sun.nio.ch.FileChannelImpl.lock(Unknown Source)
java.nio.channels.FileChannel.lock(Unknown Source)
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryServerEndpoint$GcWorker.cleanupResources(IpcSharedMemoryServerEndpoint.java:608)
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryServerEndpoint$GcWorker.body(IpcSharedMemoryServerEndpoint.java:563)
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
java.lang.Thread.run(Unknown Source)

Name: Thread-11
State: BLOCKED on
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance@7a95f8da owned by:
node-stop-thread
Total blocked: 1  Total waited: 0

Stack trace: 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2178)
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2170)
org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
org.apache.ignite.Ignition.stop(Ignition.java:224)
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$8.run(GridDiscoveryManager.java:1794)
java.lang.Thread.run(Unknown Source)

Name: node-stop-thread
State: WAITING on java.lang.Object@7377b101
Total blocked: 19  Total waited: 22

Stack trace: 
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
org.apache.ignite.internal.util.worker.GridWorker.join(GridWorker.java:233)
org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:7295)
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryServerEndpoint.close(IpcSharedMemoryServerEndpoint.java:467)
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ShmemAcceptWorker.cancel(TcpCommunicationSpi.java:2913)
org.apache.ignite.internal.util.IgniteUtils.cancel(IgniteUtils.java:4446)
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.spiStop(TcpCommunicationSpi.java:1775)
org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:279)
org.apache.ignite.internal.managers.communication.GridIoManager.stop(GridIoManager.java:546)
org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1928)
org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1794)
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2207)
   - locked
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance@7a95f8da
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2170)
org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker$1.run(ServerImpl.java:2174)
java.lang.Thread.run(Unknown Source)

Name: main
State: WAITING on java.lang.Object@4b38e7df
Total blocked: 2  Total waited: 2,524

Stack trace: 
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
com.tudor.datagridI.server.ServerCommon.waitForever(ServerCommon.java:24)
com.tudor.datagridI.server.DataGridServerJoiner.main(DataGridServerJoiner.java:38)




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5893.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-24 Thread bintisepaha
JDK 1.7, 16-node cluster, each JVM with a 10 GB heap and 4 GB off-heap. Four
nodes run on each of four Linux boxes; each box has 64 GB of memory. The heap
is under-utilized when this happens, so we doubt GC is causing it.
Multiple clients connect to the cluster; the clients cannot exit properly
either, with System.exit() or Ignition.stop().

We did not have the issue on our end until this week, but something could have
changed with ports in our environment. We cannot reproduce it ourselves in UAT;
it only happens for us in production.





--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5876.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-22 Thread bintisepaha
I also see this when the process is stuck at System.exit(): many SIGINT calls.

Name: SIGINT handler
State: BLOCKED on java.lang.Class@4debc642 owned by: main
Total blocked: 1  Total waited: 0

Stack trace: 
java.lang.Shutdown.exit(Shutdown.java:212)
java.lang.Terminator$1.handle(Terminator.java:52)
sun.misc.Signal$1.run(Signal.java:212)
java.lang.Thread.run(Thread.java:745)



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814p5815.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


System.exit() not exiting cleanly (locked on IgnitionEx$IgniteNamedInstance)

2016-06-22 Thread bintisepaha
Hi Igniters,

We are on 1.5.0-final and have been on it for a while, but we started seeing
this issue in production recently. We want a JVM to call System.exit() when it
has finished its work, but we see the locking below. This affects us badly; we
cannot kill the processes on Linux with a Ctrl+C either, and we have to involve
support to run a kill -9.

We are unable to reproduce this issue on Windows, and it now happens on Linux
most of the time.

Can you please look at the thread dump and suggest some solution?

Name: Thread-3
State: WAITING on java.lang.Object@43ba110c
Total blocked: 9  Total waited: 12

Stack trace: 
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
org.apache.ignite.internal.util.worker.GridWorker.join(GridWorker.java:233)
org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:7295)
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryServerEndpoint.close(IpcSharedMemoryServerEndpoint.java:467)
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ShmemAcceptWorker.cancel(TcpCommunicationSpi.java:2913)
org.apache.ignite.internal.util.IgniteUtils.cancel(IgniteUtils.java:4446)
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.spiStop(TcpCommunicationSpi.java:1775)
org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:279)
org.apache.ignite.internal.managers.communication.GridIoManager.stop(GridIoManager.java:546)
org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1928)
org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1794)
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2207)
   - locked
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance@5dadab2f
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2170)
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.run(IgnitionEx.java:1731)




Name: main
State: WAITING on
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2@6eecdcff
Total blocked: 82  Total waited: 105

Stack trace: 
java.lang.Object.wait(Native Method)
java.lang.Thread.join(Thread.java:1258)
java.lang.Thread.join(Thread.java:1332)
java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
java.lang.Shutdown.runHooks(Shutdown.java:123)
java.lang.Shutdown.sequence(Shutdown.java:167)
java.lang.Shutdown.exit(Shutdown.java:212)
   - locked java.lang.Class@34c1cab5
java.lang.Runtime.exit(Runtime.java:107)
java.lang.System.exit(System.java:960)
com.tudor.reconcile.ReconcilePositions.reconcile(ReconcilePositions.java:353)
com.tudor.reconcile.ReconcilePositions.main(ReconcilePositions.java:113)




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/System-exit-not-exiting-cleanly-locked-on-IgnitionEx-IgniteNamedInstance-tp5814.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-06-22 Thread bintisepaha
Denis, did you get a chance to look at this?
any help will be greatly appreciated.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p5805.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-06-17 Thread bintisepaha
Denis, the txn timeout works in the code as long as it fires before commit is
called. I am thinking that if the txn is in the committing state, it does not
honor the timeout.

When the framework calls the checkValid() method in IgniteTxLocalAdapter, it
makes a call to the following:

if (remainingTime() == -1 && setRollbackOnly())
    throw new IgniteTxTimeoutCheckedException("Cache transaction timed out " +
        "(was rolled back automatically): " + this);

setRollbackOnly() calls state(MARKED_ROLLBACK), and here I see that it will
only return true if the transaction was in one of the states below:


case MARKED_ROLLBACK: {
valid = prev == ACTIVE || prev == PREPARING || prev ==
PREPARED;

break;
}

So if it was in COMMITTING, we would not throw a Txn Timeout exception. Is
this correct?

Here is the sample code from our project. We maintain an
orderUpdateLossPrevention cache to guard against node failures, for the case
where the primary node fails before it write-behinds the entry to the DB store;
if the write-behind finishes, we remove the entry.

First txn, which updates a transactional cache
(orderUpdateLossPrevention.put(order.getOrderKey(), order)) along with other
caches (all transactional):

Transaction tx = null;
try {
    if (txns == null)
        txns = ignite.transactions();

    tx = ignite.transactions().txStart(TransactionConcurrency.OPTIMISTIC,
        TransactionIsolation.SERIALIZABLE);
    tx.timeout(10 * 1000);

    for (OrderHolder oh : orderHolderList) {
        Order order = oh.getOrder();
        if (order != null) {
            OrderHelper.saveOrder(order);
            orderUpdateLossPrevention.put(order.getOrderKey(), order);
        }
    }

    // more code

    tx.commit();  // the thread hangs here forever, because some other thread on
                  // some other node is doing the below... stuck on write-behind.

Second txn, where the same entry is removed once the write-behind has
finished; it is part of the method call from the CacheStore writeAll():

IgniteTransactions txns = null;
Transaction tx = null;
Ignite ignite = Ignition.ignite(Global.DATA_GRID_NAME);
try {
    if (txns == null)
        txns = ignite.transactions();

    tx = ignite.transactions().txStart(TransactionConcurrency.OPTIMISTIC,
        TransactionIsolation.SERIALIZABLE);
    tx.timeout(2 * 1000);
    orderUpdateLossPreventionCache.remove(entry.getKey());
    tx.commit();
} catch (IgniteException e) {
    logger.error("Received IgniteException - rolling back transaction", e);
    tx.rollback();
    throw e;
}





--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p5725.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-06-16 Thread bintisepaha
Thanks Denis, I did not realize I was creating a new cache with that name with
default settings. However, that was a test case; I fixed it and I can see it
timing out. The real issue we see in our UAT environment is as follows.

All caches are defined as TRANSACTIONAL in the config file, but we see one
thread on the client side, which invokes a compute on the server node, stuck
on the dump below. It is at commit and it never times out. In my test case I
cannot reproduce it, because I see exactly what you are describing, a
deadlock-free scenario.

When this thread hangs, it hangs forever, bringing the client to a complete
halt. I cannot reproduce this with test code to share with you, but it is
easily reproducible in our environment.

It seems that if the txn is in its commit phase, the timeout is not honored.

Any help or pointers are greatly appreciated.

Thanks,
Binti

Name: pub-#8%DataGridServer-Staging%
State: WAITING on
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture@199a76e0
Total blocked: 2  Total waited: 77

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown
Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown
Source)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:117)
org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:261)
com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.run(OrderHolderSaveRunnable.java:135)
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4V2.execute(GridClosureProcessor.java:2206)
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:509)
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6459)
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:503)
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:456)
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1161)
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1766)
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1219)
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:847)
org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:105)
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:810)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p5681.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-06-15 Thread bintisepaha
TestTxnTimout.java (attached)

I have attached the test case. Instead of my cache-local file, you can use the
default cache config file, since I am declaring the cache in Java code.

Thread A never times out. I have even tried putting Thread.sleep() in it; why
does it not time out?
The issue we saw in our UAT environment was the thread not timing out at
commit(), but here I can see that no matter where the thread is, it never
times out.

Appreciate your help,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p5656.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-06-13 Thread bintisepaha
I will get this out to you shortly. In the meantime could you please explain
how transaction timeout works? Why does commit block forever?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p5617.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: transaction not timing out

2016-06-10 Thread bintisepaha
Hi Ignite Community,

We can easily reproduce this by locking resources, but what we do not
understand is why the txn does not time out if it is in the commit state. Why
does it block forever?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/transaction-not-timing-out-tp5540p5583.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Ignite Write Behind performance

2016-06-08 Thread bintisepaha
Thanks for that. We were setting the flush size to 0. How about the flush
thread count?
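For reference, the write-behind knobs being discussed, in a minimal sketch;
the values and cache name are examples only, not recommendations:

CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("orderCache");
ccfg.setWriteBehindEnabled(true);
ccfg.setWriteBehindFlushSize(10_240);      // entries buffered before a size-based flush
ccfg.setWriteBehindFlushThreadCount(4);    // flusher threads per node
ccfg.setWriteBehindFlushFrequency(5_000);  // milliseconds between time-based flushes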



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Ignite-Write-Behind-performance-tp5385p5541.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Cache.put delays

2016-06-08 Thread bintisepaha
We still saw issues with 1.6.0, but using FULL_SYNC mode resolves it.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Cache-put-delays-tp5386p5539.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Inserts stalled by write-behind process

2016-06-07 Thread bintisepaha
Where do you see this property?
WriteBehindTotalCriticalOverflowCount 




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Inserts-stalled-by-write-behind-process-tp3390p5500.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Ignite Write Behind performance

2016-06-06 Thread bintisepaha
amitpa, we would be interested in learning how this performed for you.
We have implemented a Spring txn to insert into the database for write-behind.
However, we see that sometimes write-behind is not even called for some
objects that we are certain were just updated in the cache. Have you noticed
that?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Ignite-Write-Behind-performance-tp5385p5470.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Cache.put delays

2016-06-02 Thread bintisepaha
We are using Ignite 1.5.0-final. While testing on a single server node we see
this behavior with cache.put(): we know we have just loaded an object into the
cache from the DB store, and as soon as it is put in the cache the write-behind
is also kicked off. But right after loading it, when we do a cache.get() for
the same key, the object returned is null. I was debugging this code with a
breakpoint after the cache.get() line, and a few seconds later, when I
inspected the same cache.get() call, I saw the object being returned correctly
in the Eclipse variables window.

As soon as I wrap the loading of this object from the DB store in a
transaction, I don't see this issue at all. We know that we invoke the load
sequentially for each object and that the object is not loaded twice, so we
don't see a locking issue.
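A minimal sketch of the workaround described above, wrapping the load-and-read
in a transaction; the cache name, key and valueLoadedFromStore are placeholder
names, not our actual code:

IgniteCache<Object, Object> cache = ignite.cache("orderCache");
Object key = "someKey";                        // placeholder key
Object valueLoadedFromStore = readFromDbStore(key);   // hypothetical DB-store read
try (Transaction tx = ignite.transactions().txStart(
        TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
    cache.put(key, valueLoadedFromStore);      // value just read from the DB store
    Object v = cache.get(key);                 // inside the txn, the get sees the put above
    tx.commit();
}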

Is this the same as the issue described here?
https://issues.apache.org/jira/browse/IGNITE-2407?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%201.6%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC

However, it is important to note that we see this issue with a single server
node, so we are not sure why partitioned caches would be the problem here.

We are seeing more issues with cache.put and get; the behavior is random, but
the results are as if the puts are not seen by the gets. Any help is
appreciated.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Cache-put-delays-tp5386.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Client fails to connect - joinTimeout vs networkTimeout

2016-04-29 Thread bintisepaha
Val, thanks a lot. Will this also work if the caches do not use affinity?
We are trying not to use affinity because our data is very skewed.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Client-fails-to-connect-joinTimeout-vs-networkTimeout-tp4419p4706.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Client fails to connect - joinTimeout vs networkTimeout

2016-04-28 Thread bintisepaha
Is there a way to configure the backup node on a different physical host in
such a scenario? I do not want the primary and backup on the same host, in
case that host crashes.
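One way to express that requirement, sketched here under the assumption that
the default rendezvous affinity function is in use; excludeNeighbors keeps
primary and backup copies off the same physical host:

RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
aff.setExcludeNeighbors(true);             // do not place backups on the host that holds the primary
CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("orderCache");
ccfg.setAffinity(aff);
ccfg.setBackups(1);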

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Client-fails-to-connect-joinTimeout-vs-networkTimeout-tp4419p4661.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Ignite cache data size problem.

2016-04-27 Thread bintisepaha
Kevin, we are facing similar issues. The reason is not Ignite using up
the heap; it's the data you load from the database that takes up so much
memory until it makes it into the cache. We see a lot of heap usage after
start-up, and if we then perform a GC, the memory on each node drops by at
least 4 GB.

The way we load the data from the database is to run loadCache only on the
final node (we call it the initiator); the others are called joiners. The
joiners do not go to the database; only the initiator does. But the initiator
calls ignite.compute on the server nodes so that data is loaded in parallel,
as we want. The joiners are started first and have no data-loading code; the
initiator is started later and distributes work to the joiner nodes and to
itself. Hence the DB queries are performed only once, but in parallel.

Let me know if this makes sense.
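To illustrate, a minimal sketch of the initiator/joiner loading described above (cache name, config path, and class names are placeholders, not our actual code):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class InitiatorDrivenLoad {
    /** Job that every joiner (and the initiator itself) runs locally. */
    public static class LocalLoadJob implements IgniteRunnable {
        @IgniteInstanceResource
        private transient Ignite ignite;

        @Override public void run() {
            // Each node loads its own share of data through the cache store's loadCache().
            ignite.cache("tradeOrderCache").localLoadCache(null); // hypothetical cache name
        }
    }

    public static void main(String[] args) {
        // The joiners are already up and hold no data; the initiator is started last.
        Ignite initiator = Ignition.start("config/initiator.xml"); // placeholder config path

        // Fan the load out to every server node, including the initiator itself.
        initiator.compute(initiator.cluster().forServers()).broadcast(new LocalLoadJob());
    }
}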

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Ignite-cache-data-size-problem-tp4449p4625.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Client fails to connect - joinTimeout vs networkTimeout

2016-04-27 Thread bintisepaha
Thanks Val, we will look at the JVM tuning and see what fits our environment.
That seems really helpful.

Another question you asked earlier was why we have more than one JVM
node on one physical machine, and whether this is a bad design. We are just
trying to start the grid by loading records from the database in parallel;
with many nodes we can load the cluster caches faster. It also lets us keep
smaller, more reasonable heap sizes instead of one large heap.

Do you have any different recommendations?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Client-fails-to-connect-joinTimeout-vs-networkTimeout-tp4419p4623.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Client fails to connect - joinTimeout vs networkTimeout

2016-04-25 Thread bintisepaha
Val, we are going to try 10 GB of off-heap storage with 4 GB heap sizes this
week and next, and will report back on how that looks. A follow-up question: what
happens when data does not fit off-heap and we have defined affinity? Where
does it go - on heap, or do we start seeing inconsistent results? We do not
have swap space configured yet, nor any eviction policy.
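For reference, this is roughly the cache configuration we plan to test, using the Ignite 1.x off-heap tiered mode (cache name is a placeholder; what happens past the limit is exactly the open question above):

import org.apache.ignite.cache.CacheMemoryMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class OffHeapTieredConfig {
    public static CacheConfiguration<Integer, Object> cacheCfg() {
        CacheConfiguration<Integer, Object> cfg = new CacheConfiguration<>("tradeOrderCache"); // hypothetical name

        // Entries are stored directly off-heap, so the Java heap can stay small (e.g. 4 GB -Xmx).
        cfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);

        // Cap off-heap usage at 10 GB per node. Behavior beyond this limit depends on
        // whether swap space or an eviction policy is configured.
        cfg.setOffHeapMaxMemory(10L * 1024 * 1024 * 1024);

        return cfg;
    }
}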

Another question I asked earlier was whether you have any specific GC
tuning recommendations.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Client-fails-to-connect-joinTimeout-vs-networkTimeout-tp4419p4516.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: ignite logging not captured in log file (log4j)

2016-04-25 Thread bintisepaha
I will give it one more shot on our end, with several clients connected to the
grid when we attempt to restart, and let you know if we see the same effect.

Does ignite support a rolling restart methodology?



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/ignite-logging-not-captured-in-log-file-log4j-tp4334p4515.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: ignite logging not captured in log file (log4j)

2016-04-25 Thread bintisepaha
After enabling the dependency, this is what I used. It does not seem to log
anything beyond my own Java code in the log file.

log4j.logger.org.apache.ignite=ERROR
log4j.logger.org.apache.activemq=INFO
log4j.logger.org.apache.commons=INFO
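For completeness, the grid logger can apparently also be wired up explicitly in code, assuming the ignite-log4j module is on the classpath (the config path below is a placeholder):

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.logger.log4j.Log4JLogger;

public class ExplicitLog4jSetup {
    public static void main(String[] args) throws IgniteCheckedException {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Point Ignite at the same log4j configuration the application uses,
        // so Ignite's own messages land in the application's log file.
        cfg.setGridLogger(new Log4JLogger("config/ignite-log4j.xml")); // placeholder path

        Ignition.start(cfg);
    }
}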



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/ignite-logging-not-captured-in-log-file-log4j-tp4334p4505.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: ignite logging not captured in log file (log4j)

2016-04-22 Thread bintisepaha
I had set it on ERROR level. 



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/ignite-logging-not-captured-in-log-file-log4j-tp4334p4467.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: ignite logging not captured in log file (log4j)

2016-04-22 Thread bintisepaha
Val, adding the dependency worked. The clients also started logging at root
level, so we will have to modify the client log4j files to log only at ERROR
level. Thanks, but we now see another issue. When we tried to bring up the
grid (server nodes at ERROR level) while old clients were still connected,
the grid was very slow to come up. I had to undo the change in our UAT
environment and remove the logging dependency.

Have you noticed this issue before?

Is my understanding correct that if the ignite-log4j dependency is found on
the classpath, any node will start emitting Ignite logs?

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/ignite-logging-not-captured-in-log-file-log4j-tp4334p4460.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

