Response inline.
On Thursday, May 9, 2019 at 3:21:47 PM UTC, Jérôme Mainaud wrote:
>
> OK, I'm not surprised by the SB-Tree insert cost increase, as the complexity
> of adding a key to such a tree is O(log(n)).
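A quick back-of-envelope check of the O(log(n)) point. This sketch counts only tree depth (comparisons per insert) and ignores disk I/O, page splits, and caching. The 1-million-entry starting size is a hypothetical early point; 177.4 million is the size reported in the progress logs further down. The depth term alone predicts a much smaller slowdown than the roughly 20x drop reported in the thread. That suggests other factors, such as the index outgrowing the disk cache, may dominate, though this is an assumption and has not been verified here.

```python
import math


def depth_ratio(n_small: int, n_large: int) -> float:
    """Ratio of log2 depths: extra work per insert for an O(log n)
    structure holding n_large entries versus n_small entries."""
    return math.log2(n_large) / math.log2(n_small)


# Hypothetical early index size vs. the size reported in the logs.
ratio = depth_ratio(1_000_000, 177_400_000)
print(f"tree depth grows by a factor of {ratio:.2f}")

# Observed throughput drop: ~20,000 items/sec down to ~1,000 items/sec.
observed_slowdown = 20_000 / 1_000
print(f"observed slowdown: {observed_slowdown:.0f}x")
```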
>
> For your first case, I see no other solution than building an index, but
> you can do it with a UNIQUE_HASH_INDEX. If the implementation is good,
> adding a key should take amortized constant time (some keys are
> occasionally more expensive, when the index storage has to grow).
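The toy in-memory model below illustrates what a unique hash index on the edge's (out, in) pair enforces for use case 1. It is not OrientDB's implementation. In OrientDB itself, the index would be declared with a `CREATE INDEX` statement of type `UNIQUE_HASH_INDEX` on the `out` and `in` properties. The class name, method name, and record IDs below are hypothetical. A hash set keyed on the pair rejects a second edge between the same two vertices, and its lookup takes constant time on average regardless of how many edges are already stored.

```python
class UniqueEdgeIndex:
    """Toy model of a unique hash index on an edge's (out, in) pair.

    Hypothetical names; it only illustrates the constraint, not how
    OrientDB stores a UNIQUE_HASH_INDEX on disk.
    """

    def __init__(self) -> None:
        self._pairs: set[tuple[str, str]] = set()

    def add_edge(self, out_rid: str, in_rid: str) -> bool:
        """Register an edge; return False if the pair already exists."""
        key = (out_rid, in_rid)
        if key in self._pairs:  # average O(1) hash lookup
            return False
        self._pairs.add(key)
        return True


index = UniqueEdgeIndex()
print(index.add_edge("#12:0", "#13:0"))  # True: first edge between the pair
print(index.add_edge("#12:0", "#13:0"))  # False: duplicate rejected
```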
>
Tried it. There is no difference. Indexing started at about 20,000
items/sec and, after about a day and a half, dropped to 500 items/sec.

> For other cases, have you tried querying directly from the vertex?
>
> Suppose we have this data:
> create class Person extends V;
> create property Person.name string;
>
> create class Company extends V;
> create property Company.name string;
>
> create class WorkedAt extends E;
>
> /* Add constraints on the edge. */
> create property WorkedAt.out link Person;
> create property WorkedAt.in link Company;
>
> insert into Person (name) values ('jerome');
> insert into Person (name) values ('john doe');
>
> insert into Company (name) values ('Zeenea');
> insert into Company (name) values ('Ippon Technologies');
> insert into Company (name) values ('Klee Group');
> insert into Company (name) values ('World Big Company');
>
> create edge WorkedAt from (select from Person where name = 'jerome') to
> (select from Company where name = 'Zeenea');
> create edge WorkedAt from (select from Person where name = 'jerome') to
> (select from Company where name = 'Ippon Technologies');
> create edge WorkedAt from (select from Person where name = 'jerome') to
> (select from Company where name = 'Klee Group');
> create edge WorkedAt from (select from Person where name = 'john doe') to
> (select from Company where name = 'World Big Company');
>
> *Use case 2*
> I can count outgoing links from Person with this query:
>
> orientdb {db=tdb}> select name, out('WorkedAt').size() from Person
>
> +----+--------+----------------------+
> |# |name |out('WorkedAt').size()|
> +----+--------+----------------------+
> |0 |jerome |3 |
> |1 |john doe|1 |
> +----+--------+----------------------+
>
> This can be further optimized as follows (if the optimizer doesn't already
> do it):
>
> orientdb {db=tdb}> select name, out_WorkedAt.size() from Person
>
> +----+--------+-------------------+
> |# |name |out_WorkedAt.size()|
> +----+--------+-------------------+
> |0 |jerome |3 |
> |1 |john doe|1 |
> +----+--------+-------------------+
>
> Those queries use direct links and don't need an index; the last one
> doesn't even need to load the edge records.
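The toy model below sketches why no index is needed here. It is an assumption about the general idea, not OrientDB's actual record format. Each vertex keeps its own list of outgoing edge references, so counting them is just the length of that list, and the cost does not depend on the total number of edges in the database. The `Vertex` class and the record IDs are hypothetical; the field name `out_WorkedAt` mirrors the property used in the query above.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Toy vertex holding references to its outgoing WorkedAt edges."""
    name: str
    out_WorkedAt: list[str] = field(default_factory=list)  # edge record IDs


jerome = Vertex("jerome", ["#20:0", "#20:1", "#20:2"])
john = Vertex("john doe", ["#20:3"])

# Rough equivalent of: select name, out_WorkedAt.size() from Person
for person in (jerome, john):
    print(person.name, len(person.out_WorkedAt))
```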
>
> *Use case 3*
> I can test whether a person worked at a company with this query:
>
> orientdb {db=tdb}> select count() from Person where name = 'jerome' and
> out('WorkedAt') contains (name = 'Zeenea')
>
> +----+-------+
> |# |count()|
> +----+-------+
> |0 |1 |
> +----+-------+
>
> If the count is one or more, the items are linked.
> This query uses direct links and doesn't need an index.
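Use case 3 can be pictured the same way. The toy model below uses hypothetical names and is not OrientDB's implementation. Checking whether a person is linked to a company only scans that person's own outgoing edges. The work is therefore proportional to the person's number of edges, not to the total edge count.

```python
from dataclasses import dataclass, field


@dataclass
class Company:
    name: str


@dataclass
class Person:
    name: str
    out_worked_at: list[Company] = field(default_factory=list)  # out('WorkedAt')


def has_worked_at(person: Person, company_name: str) -> bool:
    """Mirror of: out('WorkedAt') contains (name = <company_name>)."""
    return any(c.name == company_name for c in person.out_worked_at)


zeenea = Company("Zeenea")
ippon = Company("Ippon Technologies")
klee = Company("Klee Group")
jerome = Person("jerome", [zeenea, ippon, klee])

print(has_worked_at(jerome, "Zeenea"))             # True
print(has_worked_at(jerome, "World Big Company"))  # False
```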
>
> Of course, that's just a way to give you the idea; you'll have to adapt it
> to your use case.
>
> Last but not least, don't take my word for it. Test!
> I don't have billions of edges.
> Give me some feedback if I'm wrong or if I missed something. (I'm learning
> while I respond to you.)
>
> my 2 cents,
>
> --
> Jérôme Mainaud
> [email protected]
>
>
> On Wed, May 8, 2019 at 23:37, Suhas <[email protected]> wrote:
>
>> Hey Jerome,
>>
>> Here are a few reasons why I needed an index:
>>
>> 1. Apply a unique constraint on the edge (no more than a single edge
>> between a pair of vertices).
>> 2. Compute incoming and outgoing edge counts faster.
>> 3. Check whether two vertices are connected.
>>
>> Meanwhile, I'm using an SB-Tree index.
>>
>>
>> On Wednesday, May 8, 2019 at 7:15:25 PM UTC, Jérôme Mainaud wrote:
>>>
>>> Hello,
>>>
>>> I don't know the exact implementation used by OrientDB, and it depends
>>> on the type of index you choose.
>>> But it's not a big surprise that the time to insert a key increases with
>>> the number of entries in the index.
>>> Hash indexes should be less sensitive to this cost increase.
>>>
>>> What is the purpose of indexing the in and out keys of your edge?
>>> Queries won't benefit from them, as they traverse the graph using the
>>> links from vertex to edge, which is far more efficient.
>>> Tell me if I'm wrong about that.
>>>
>>> --
>>> Jérôme Mainaud
>>> [email protected]
>>>
>>>
>>> On Wed, May 8, 2019 at 16:04, Suhas <[email protected]> wrote:
>>>
>>>> I'm creating indexes on the keys (in, out) of an edge class containing
>>>> about 500 million records. Index creation progressed well at first, at
>>>> about 20,000 items/sec, but after some time it dropped to <1000
>>>> items/sec.
>>>>
>>>>
>>>> 2019-05-08 08:43:25:885 INFO {db=cgraph} --> 37.00% progress, 177,405,476
>>>> indexed so far (855 items/sec) [OIndexRebuildOutputListener]
>>>> 2019-05-08 08:43:35:899 INFO {db=cgraph} --> 37.00% progress, 177,415,347
>>>> indexed so far (987 items/sec) [OIndexRebuildOutputListener]
>>>> 2019-05-08 08:43:45:902 INFO {db=cgraph} --> 37.00% progress, 177,427,464
>>>> indexed so far (1,211 items/sec) [OIndexRebuildOutputListener]
>>>>
>>>>
>>>> At this speed, it’ll take like 3-4 days!!
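As a sanity check on the estimate, the per-interval rate can be recomputed from the timestamps and counters in the log lines above and then used to project the remaining time. This sketch assumes a total of exactly 500 million records ("about 500 million" in the post) and a constant rate from here on.

```python
from datetime import datetime

# (timestamp, records indexed so far) copied from the progress log above.
samples = [
    ("2019-05-08 08:43:25:885", 177_405_476),
    ("2019-05-08 08:43:35:899", 177_415_347),
    ("2019-05-08 08:43:45:902", 177_427_464),
]
TOTAL = 500_000_000  # assumption: "about 500 million" edge records

# The log separates milliseconds with ':'; %f accepts the 3-digit value.
points = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S:%f"), n) for ts, n in samples]

# Per-interval rate: records added / seconds elapsed.
for (t0, n0), (t1, n1) in zip(points, points[1:]):
    rate = (n1 - n0) / (t1 - t0).total_seconds()
    print(f"{rate:,.0f} items/sec")

# Average rate over the whole window, then time left at that rate.
(t_first, n_first), (t_last, n_last) = points[0], points[-1]
avg_rate = (n_last - n_first) / (t_last - t_first).total_seconds()
eta_days = (TOTAL - n_last) / avg_rate / 86_400
print(f"average {avg_rate:,.0f} items/sec, about {eta_days:.1f} days left")
```

This should print about 986 and 1,211 items/sec, close to the 987 and 1,211 the log reports. At the average rate of roughly 1,098 items/sec, it estimates about 3.4 days remaining, which is consistent with the 3-4 day figure.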
>>>> Settings used on 16GB RAM and 300GB SSD
>>>> java -server -Xms2G -Xmx7G -Dstorage.diskCache.bufferSize=7200
>>>>
>>>>
>>>> [image: Screenshot from 2019-05-08 09-06-47.png]
>>>>
>>>> Any idea why the speed of indexing decreased so drastically? And how
>>>> can I increase the speed of indexing?
>>>>
>>>> OrientDB 3.0.15
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/orient-database/95597c3e-632b-4570-af51-f07227dc1965%40googlegroups.com
>>>>
>>>> <https://groups.google.com/d/msgid/orient-database/95597c3e-632b-4570-af51-f07227dc1965%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> --
>>
>