[orientdb] Re: Observed issues with running a cluster in Windows Azure

Amir Khawaja Tue, 24 Mar 2015 10:40:48 -0700

The cluster is now online in US East2 and US West. I did the following:

- Changed the default-distributed-db-config.json to:


{
    "replication": true,
    "autoDeploy": true,
    "hotAlignment": false,
    "resyncEvery": 15,
    "clusters": {
        "internal": {
            "replication": false
        },
        "index": {
            "replication": false
        },
        "*": {
            "replication": true,
            "readQuorum": 1,
            "writeQuorum": 1,
            "failureAvailableNodesLessQuorum": false,
            "readYourWrites": true,
            "partitioning": {
                "strategy": "round-robin",
                "default": 0,
                "partitions": [
                    [ "<NEW_NODE>" ]
                ]
            }
        }
    }
}

- Deleted the distributed-config.json file from each database folder and 
restarted each node in the cluster.
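
As a sanity check on the settings above, here is a minimal Python sketch that prints the quorum and replication values each cluster entry would resolve to. The lookup order (cluster entry, then "*", then top level), the helper name effective_setting, and the cluster name "v" are my assumptions for illustration only, not confirmed OrientDB behavior.

```python
import json

# Copy of the configuration posted above (partitioning section omitted).
CONFIG_TEXT = """
{
    "replication": true,
    "autoDeploy": true,
    "hotAlignment": false,
    "resyncEvery": 15,
    "clusters": {
        "internal": {"replication": false},
        "index": {"replication": false},
        "*": {
            "replication": true,
            "readQuorum": 1,
            "writeQuorum": 1,
            "failureAvailableNodesLessQuorum": false,
            "readYourWrites": true
        }
    }
}
"""


def effective_setting(config, cluster, key):
    """Hypothetical lookup: cluster entry, then "*", then top level (assumed order)."""
    clusters = config.get("clusters", {})
    for scope in (clusters.get(cluster, {}), clusters.get("*", {}), config):
        if key in scope:
            return scope[key]
    return None


config = json.loads(CONFIG_TEXT)
# "v" is a made-up cluster name that only matches the "*" entry.
for name in ("internal", "index", "v"):
    print(name,
          "writeQuorum =", effective_setting(config, name, "writeQuorum"),
          "replication =", effective_setting(config, name, "replication"))
```

Under that assumed lookup order, every cluster resolves to a writeQuorum of 1, and only "internal" and "index" have replication turned off.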

Now, when I connect to one of the nodes and try to delete a vertex, I 
receive the following error:

com.orientechnologies.orient.server.distributed.ODistributedException:
Error on executing distributed request (id=141 from=odb02uw
task=command_sql(delete vertex #42:2) userName=) against database
'vis.[]' to nodes [odb02ue2, odb02uw, odb01uw, odb01ue2] -->
com.orientechnologies.orient.server.distributed.ODistributedException:
Quorum 4 not reached for request (id=141 from=odb02uw
task=command_sql(delete vertex #42:2) userName=). Timeout=407ms
Servers in timeout/conflict are:
- odb02ue2: com.orientechnologies.orient.core.exception.OCommandExecutionException:
  Error on execution of command: sql.delete vertex #42:2
- odb01ue2: com.orientechnologies.orient.core.exception.OCommandExecutionException:
  Error on execution of command: sql.delete vertex #42:2
- odb01uw: com.orientechnologies.orient.core.exception.OCommandExecutionException:
  Error on execution of command: sql.delete vertex #42:2
Received: {
  odb02uw=com.orientechnologies.orient.core.exception.OCommandExecutionException:
    Error on execution of command: sql.delete vertex #42:2,
  odb01uw=com.orientechnologies.orient.core.exception.OCommandExecutionException:
    Error on execution of command: sql.delete vertex #42:2,
  odb02ue2=com.orientechnologies.orient.core.exception.OCommandExecutionException:
    Error on execution of command: sql.delete vertex #42:2,
  odb01ue2=com.orientechnologies.orient.core.exception.OCommandExecutionException:
    Error on execution of command: sql.delete vertex #42:2
}
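
To make the message above easier to read, here is a rough Python sketch that pulls out the quorum, the timeout, and the nodes listed as failing. The regular expressions are guesses based only on this one message (with whitespace normalized), not on any documented format, and summarize is a hypothetical helper.

```python
import re

# Shortened, whitespace-normalized excerpt of the message above
# (the trailing "Received: {...}" map is left out).
EXC = "com.orientechnologies.orient.core.exception.OCommandExecutionException"
MESSAGE = (
    "Quorum 4 not reached for request (id=141 from=odb02uw "
    "task=command_sql(delete vertex #42:2) userName=). Timeout=407ms "
    "Servers in timeout/conflict are: "
    f"- odb02ue2: {EXC}: Error on execution of command: sql.delete vertex #42:2 "
    f"- odb01ue2: {EXC}: Error on execution of command: sql.delete vertex #42:2 "
    f"- odb01uw: {EXC}: Error on execution of command: sql.delete vertex #42:2 "
    "Received:"
)


def summarize(message):
    """Extract quorum, timeout and failing nodes from one error message."""
    quorum = re.search(r"Quorum (\d+) not reached", message)
    timeout = re.search(r"Timeout=(\d+)ms", message)
    # The part between "conflict are:" and "Received:" lists
    # "- <node>: <exception class>: <text>" entries.
    section = message.split("conflict are:", 1)[-1].split("Received:", 1)[0]
    nodes = re.findall(r"- (\w+): ([\w.]+):", section)
    return {
        "quorum": int(quorum.group(1)) if quorum else None,
        "timeout_ms": int(timeout.group(1)) if timeout else None,
        # Keep only the short class name of each exception.
        "failed": {node: cls.rsplit(".", 1)[-1] for node, cls in nodes},
    }


summary = summarize(MESSAGE)
print(summary["quorum"], summary["timeout_ms"], sorted(summary["failed"]))
```

For this message the summary shows a requested quorum of 4, a 407 ms timeout, and the same OCommandExecutionException from odb02ue2, odb01ue2 and odb01uw.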

Why am I not able to delete a vertex?

Amir.


On Tuesday, March 24, 2015 at 12:20:37 PM UTC-5, Colin wrote:
>
> That latency should be fine so long as it's consistent.
>
> -Colin
>
> On Tuesday, March 24, 2015 at 11:52:58 AM UTC-5, Amir Khawaja wrote:
>>
>> Hi Colin,
>>
>> I checked the latency prior to posting and between regions it is about 
>> 65ms on average. What should I set the latency to for Hazelcast?
>>
>> Amir.
>>
>> On Tuesday, March 24, 2015 at 11:49:25 AM UTC-5, Colin wrote:
>>>
>>> Hi Amir,
>>>
>>> You might also do a ping and a traceroute between the machines and see 
>>> what kind of latency you're getting, just in case it's a timeout issue with 
>>> Hazelcast.
>>>
>>> -Colin
>>>
>>> On Tuesday, March 24, 2015 at 11:32:21 AM UTC-5, Amir Khawaja wrote:
>>>>
>>>> Hi Colin,
>>>>
>>>> Thank you for the prompt response.
>>>>
>>>>> I'm a little confused as you say "the US West node will not come
>>>>> online telling me that the database is not yet online.  At that point, I
>>>>> kill the process and then eventually the database comes online."
>>>>
>>>>> Do you mean you kill the database process and then restart it and then
>>>>> it starts communicating?
>>>>
>>>>
>>>> Yes. I kill the database process on the cluster node where OrientDB
>>>> is not coming online.
>>>>
>>>>> Can you see on each machine when Hazelcast 'sees' all the members?
>>>>> Are all the members showing up?
>>>>
>>>>
>>>> Yes. I see the databases are talking to each other, as the IP addresses of
>>>> the nodes show up in the log of each database server.
>>>>
>>>> I will try setting hotAlignment to false and report my results on this 
>>>> thread.
>>>>
>>>> Amir.
>>>>
>>>>
>>>> On Tuesday, March 24, 2015 at 11:25:16 AM UTC-5, Colin wrote:
>>>>>
>>>>> Hi Amir,
>>>>>
>>>>> Is it consistently a problem between the same machines not seeing each 
>>>>> other?
>>>>>
>>>>> I'm a little confused as you say "the US West node will not come 
>>>>> online telling me that the database is not yet online.  At that point, I 
>>>>> kill the process and then eventually the database comes online."
>>>>>
>>>>> Do you mean you kill the database process and then restart it and then 
>>>>> it starts communicating?
>>>>>
>>>>> In your distributed json file, try setting "hotAlignment" to false.
>>>>>
>>>>> Can you see on each machine when Hazelcast 'sees' all the members? 
>>>>>  Are all the members showing up?
>>>>>
>>>>> -Colin
>>>>>
>>>>> Orient Technologies
>>>>>
>>>>> The Company behind OrientDB
>>>>>
>>>>> On Tuesday, March 24, 2015 at 11:19:05 AM UTC-5, Amir Khawaja wrote:
>>>>>>
>>>>>> Greetings, everyone. Has anyone had much success running an OrientDB
>>>>>> 2.0.5 cluster in Azure? I created a cluster in Windows Azure with 4
>>>>>> nodes using CentOS 7 and OrientDB Community 2.0.4 -- 2 nodes in US East2
>>>>>> and 2 nodes in US West. There is a Site-to-Site VPN connection between
>>>>>> the two regions in Azure and data is flowing between machines across the
>>>>>> network. I have three databases that I have currently deployed and am
>>>>>> testing. I find that many times the synchronization between databases
>>>>>> does not occur. For instance, if I start up the first node in US East2
>>>>>> and, once that comes online, fire up the second node in US West, the US
>>>>>> West node will not come online telling me that the database is not yet
>>>>>> online. At that point, I kill the process and then eventually the
>>>>>> database comes online. I even have to go so far as to delete the
>>>>>> databases in the database path folder. I do this a few times and
>>>>>> eventually the server may start up. Sometimes, I will have three of the
>>>>>> four nodes working and the fourth just refuses to come online.
>>>>>>
>>>>>> The VM size selected for each node in the cluster is a D4 (4 cores, 
>>>>>> 28GB RAM). This should be more than sufficient to handle most loads. 
>>>>>> Surely, I must be missing something as this is not acceptable production 
>>>>>> behavior. For reference, I am pasting the hazelcast.xml and 
>>>>>> default-distributed-db-config.json files here in hopes that someone has 
>>>>>> some pointers for me.
>>>>>>
>>>>>> *** hazelcast.xml ***
>>>>>>
>>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>>> <!-- ~ Copyright (c) 2008-2012, Hazel Bilisim Ltd. All Rights 
>>>>>> Reserved. ~
>>>>>> ~ Licensed under the Apache License, Version 2.0 (the "License"); ~ 
>>>>>> you may
>>>>>> not use this file except in compliance with the License. ~ You may 
>>>>>> obtain
>>>>>> a copy of the License at ~ ~ 
>>>>>> http://www.apache.org/licenses/LICENSE-2.0 ~
>>>>>> ~ Unless required by applicable law or agreed to in writing, software 
>>>>>> ~ distributed
>>>>>> under the License is distributed on an "AS IS" BASIS, ~ WITHOUT 
>>>>>> WARRANTIES
>>>>>> OR CONDITIONS OF ANY KIND, either express or implied. ~ See the 
>>>>>> License for
>>>>>> the specific language governing permissions and ~ limitations under 
>>>>>> the License. -->
>>>>>>
>>>>>> <hazelcast
>>>>>> xsi:schemaLocation="http://www.hazelcast.com/schema/config
>>>>>> hazelcast-config-3.0.xsd"
>>>>>> xmlns="http://www.hazelcast.com/schema/config"
>>>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>>>> <group>
>>>>>> <name>[name]</name>
>>>>>> <password>[password]</password>
>>>>>> </group>
>>>>>> <network>
>>>>>> <port auto-increment="true">2434</port>
>>>>>> <join>
>>>>>> <multicast enabled="false">
>>>>>> <multicast-group>235.1.1.1</multicast-group>
>>>>>> <multicast-port>2434</multicast-port>
>>>>>> </multicast>
>>>>>> <tcp-ip enabled="true">
>>>>>> <member>10.0.0.4</member>
>>>>>> <member>10.0.0.5</member>
>>>>>> <member>10.1.0.4</member>
>>>>>> <member>10.1.0.5</member>
>>>>>> </tcp-ip>
>>>>>> </join>
>>>>>> </network>
>>>>>> <executor-service>
>>>>>> <pool-size>16</pool-size>
>>>>>> </executor-service>
>>>>>> </hazelcast>
>>>>>>
>>>>>>
>>>>>> *** default-distributed-db-config.json ***
>>>>>>
>>>>>> {
>>>>>>     "autoDeploy": true,
>>>>>>     "hotAlignment": true,
>>>>>>     "executionMode": "synchronous",
>>>>>>     "readQuorum": 1,
>>>>>>     "writeQuorum": 3,
>>>>>>     "failureAvailableNodesLessQuorum": false,
>>>>>>     "readYourWrites": true,
>>>>>>     "clusters": {
>>>>>>         "internal": {
>>>>>>         },
>>>>>>         "index": {
>>>>>>         },
>>>>>>         "*": {
>>>>>>             "servers" : [ "<NEW_NODE>" ]
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Thank you for any assistance you can offer.
>>>>>>
>>>>>> Amir.
>>>>>>
>>>>>
