Re: [TinkerPop] gremlin-x

pieter-gmail Fri, 02 Dec 2016 07:30:55 -0800

Hi,

Let me disagree with your disagreement ;-)


Regarding Neo4j

I am talking about Neo4j embedded. There the node/vertex is pretty much
the database itself: a direct pointer to the node on disk, with its
properties stored right next to it. I would be surprised if the
properties were not already in its hot cache too. I am speculating about
the internals, but when coding against embedded Neo4j you don't care
about pre-loading all or some properties for performance reasons; you
just load the node and all is well. That is the beauty of embedded
Neo4j: latency is simply not a concern, and the node represents an
instance of a label.

It would be interesting to run the TinkerPop Neo4j structure and process
test suites via Gremlin Server and compare the performance to embedded.
I don't really have a clue what to expect, but if every property access
becomes a call via Gremlin Server I reckon things will slow down
significantly. The suite is written with the implicit assumption that
property access is not something you need to think about.

Regarding Hibernate. I have not worked with Hibernate for some time, so
I ran a test to make sure.

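        // Test fragment using the JPA EntityManager API and JUnit assertions.
        // Persist 100 Person entities in a single transaction.
        EntityManager entityManager = entityManagerFactory.createEntityManager();
        entityManager.getTransaction().begin();
        int count = 100;
        for (int i = 1; i <= count; i++) {
            Person person = new Person("person_" + i);
            entityManager.persist(person);
        }
        entityManager.getTransaction().commit();
        entityManager.close();

        // Load one entity by id in a fresh EntityManager, so nothing is
        // cached from the inserts; this assumes the generated ids start at 1.
        entityManager = entityManagerFactory.createEntityManager();
        Person person = entityManager.find(Person.class, 1L);
        assertNotNull(person);
        assertEquals("person_1", person.getName());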

The entityManager.find(Person.class, 1L) call resulted in the following SQL:

"select person0_.id as id1_5_0_, person0_.name as name2_5_0_ from Person
person0_ where person0_.id=?"

I did not ask for the name property, but it was returned anyway, as it
should be. If the user had to fetch every property separately, latency
would kill the app and part of the point of Hibernate would disappear.
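
To make the latency point concrete, here is a minimal sketch. It is not
Hibernate, Sqlg or TinkerPop code: FakeRemoteStore and RoundTripSketch
are made-up names, and the "store" is an in-memory map that only counts
how often it is called. It compares loading an element together with its
properties against loading a reference and then fetching each property
separately.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a remote graph store; it only counts round trips.
class FakeRemoteStore {
    private final Map<Long, Map<String, Object>> rows = new HashMap<>();
    int roundTrips = 0;

    void put(long id, Map<String, Object> properties) {
        rows.put(id, new HashMap<>(properties));
    }

    // One round trip returns the element together with all of its properties.
    Map<String, Object> loadAll(long id) {
        roundTrips++;
        return new HashMap<>(rows.get(id));
    }

    // One round trip confirms the element exists and returns only its id.
    long loadReference(long id) {
        roundTrips++;
        if (!rows.containsKey(id)) {
            throw new IllegalArgumentException("no such element: " + id);
        }
        return id;
    }

    // One round trip per property requested on a reference-only element.
    Object loadProperty(long id, String key) {
        roundTrips++;
        return rows.get(id).get(key);
    }
}

class RoundTripSketch {
    // Builds a store holding a single element with three illustrative properties.
    static FakeRemoteStore newStore() {
        FakeRemoteStore store = new FakeRemoteStore();
        Map<String, Object> person = new HashMap<>();
        person.put("name", "person_1");
        person.put("age", 30);
        person.put("city", "somewhere");
        store.put(1L, person);
        return store;
    }

    // Eager: properties arrive with the element, so later reads are local.
    static int eagerRoundTrips(String... keys) {
        FakeRemoteStore store = newStore();
        Map<String, Object> vertex = store.loadAll(1L);
        for (String key : keys) {
            vertex.get(key);
        }
        return store.roundTrips;
    }

    // Reference-only: each property read goes back to the store.
    static int referenceRoundTrips(String... keys) {
        FakeRemoteStore store = newStore();
        long ref = store.loadReference(1L);
        for (String key : keys) {
            store.loadProperty(ref, key);
        }
        return store.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + eagerRoundTrips("name", "age", "city"));
        System.out.println("reference: " + referenceRoundTrips("name", "age", "city"));
    }
}
```

With three properties the eager path makes one call and the reference
path makes four. Asking for the properties in the same query would also
be a single call; the sketch only covers the case where property access
happens after the element has come back.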

RE: "Vertex is just a map wrapper"
But it's not just any map; it's a Vertex, a core notion of the property
graph model.

RE: "I don't know anyone who wants to deal with Vertex/Edges"
We probably live in our own bubbles, but I don't know anyone who would
rather deal with Maps than with the core abstractions of the property
graph model, except perhaps JSON/JavaScript folks :-)

The property graph model and graph traversals are all about vertices and
edges; having them right there as first-class citizens in code is great.

RE: in hibernate "If I set a property, it does not automatically persist
it to the database."
True, but it's also the cause of much pain with Hibernate: it bypasses
the database's concurrency model altogether with its optimistic locking.
And voilà, you are stuck with "let's just ignore the exception, retry,
and hope we get lucky this time round" logic. For what it's worth,
setting a property in Sqlg runs an update statement. That said, there is
a very good reason why Hibernate does what it does: it reduces latency
by running batch updates on commit or flush. Sqlg supports batch updates
too, but it's not the default.
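
The trade-off between the two write styles can be sketched as follows.
This is an illustration only, not how Sqlg or Hibernate are implemented:
CountingConnection, WriteThroughSession and BatchingSession are
hypothetical names, the "connection" just counts round trips, and the
update strings are never sent to a real database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database connection; it only counts round trips.
class CountingConnection {
    int roundTrips = 0;
    final List<String> executed = new ArrayList<>();

    // One statement, one round trip.
    void execute(String statement) {
        roundTrips++;
        executed.add(statement);
    }

    // Any number of statements sent together in one round trip.
    void executeBatch(List<String> statements) {
        if (statements.isEmpty()) {
            return;
        }
        roundTrips++;
        executed.addAll(statements);
    }
}

// Write-through: every property change is sent immediately.
class WriteThroughSession {
    private final CountingConnection connection;

    WriteThroughSession(CountingConnection connection) {
        this.connection = connection;
    }

    void setProperty(long id, String key, Object value) {
        connection.execute(update(id, key, value));
    }

    // Illustrative statement text only; real code would use bound parameters.
    static String update(long id, String key, Object value) {
        return "UPDATE person SET " + key + " = '" + value + "' WHERE id = " + id;
    }
}

// Batching: changes are queued in memory and sent together on commit.
class BatchingSession {
    private final CountingConnection connection;
    private final List<String> pending = new ArrayList<>();

    BatchingSession(CountingConnection connection) {
        this.connection = connection;
    }

    void setProperty(long id, String key, Object value) {
        pending.add(WriteThroughSession.update(id, key, value));
    }

    void commit() {
        connection.executeBatch(new ArrayList<>(pending));
        pending.clear();
    }
}

class BatchUpdateSketch {
    static int writeThroughRoundTrips(int updates) {
        CountingConnection connection = new CountingConnection();
        WriteThroughSession session = new WriteThroughSession(connection);
        for (int i = 0; i < updates; i++) {
            session.setProperty(1L, "name", "person_" + i);
        }
        return connection.roundTrips;
    }

    static int batchedRoundTrips(int updates) {
        CountingConnection connection = new CountingConnection();
        BatchingSession session = new BatchingSession(connection);
        for (int i = 0; i < updates; i++) {
            session.setProperty(1L, "name", "person_" + i);
        }
        session.commit();
        return connection.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("write-through: " + writeThroughRoundTrips(3));
        System.out.println("batched: " + batchedRoundTrips(3));
    }
}
```

Write-through costs one round trip per change but the database sees each
change at once; batching costs one round trip per commit, at the price
of in-memory state that differs from the database until then.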

RE: "In your model, there is no difference between transient, in-memory
state (e.g. workflow) and database state."
Not sure what you mean here. If you mean application writers keeping
their own cache of persistent data, then you are right. Rule #1 of
caching is don't cache. Rule #2 is don't cache the cache. Caching is a
solution to a weakness elsewhere. I am not saying never cache, but avoid
it if you can. Writing transactional caches is also a rather specialized
and difficult exercise, and precisely what databases are all about.

Lastly, I want to make sure we are talking about the same change. As I
understand it, the proposal is that every Gremlin traversal like

GraphTraversal<Vertex, Vertex> vertices =
        this.sqlgGraph.traversal().V().out();

should become

GraphTraversal<Vertex, Map<String, Object>> vertexProperties =
        this.sqlgGraph.traversal().V().out().valueMap();

or worse

GraphTraversal<Vertex, Object> propertyValues =
        this.sqlgGraph.traversal().V().out()
                .values("property1", "property2", "property3");

Is that right?

Cheers
Pieter

 
 


On 02/12/2016 14:57, Robert Dale wrote:
> Pieter, while I think Marko may be onto something, I just want to
> completely disagree with you as a Java dev. ;-)
>
> First, in Neo4j's impl, from what I can tell the elements are not
> fully loaded. Every get (getProperty, edges, etc) does a query to the
> database. This is more round trips to the database. So this is why I
> made the statement that implementations are different.  In your sqlg
> case, you are basically arguing that the default behavior is the sql
> equivalent of SELECT *.  This is not a good practice. Then you go on
> to say that if the dev is aware that this is a 'fat' element, they
> should ask for exact properties.  I think what we're arguing is that
> the default behavior should be 'always ask for exact properties'. This
> is the most accepted practice in querying any database, sql, nosql,
> mongodb, cassandra, etc.
>
> That leads us to your Hibernate comment.  In the abstract sense,
> Vertex is just a map wrapper. I think you're just splitting hairs
> trying to distinguish a Dog Vertex and a Dog Map. In either case, you
> would have to query the label.  In any case, I don't know anyone who
> wants to deal with Vertex/Edges.  What most devs deal with, in my
> experience, is a domain-specific model.  So whether I get back a
> Vertex or a Map, either way, I'm going to translate that to my domain
> model.  Also, in hibernate, when I get a property I didn't query for,
> I will get a null.  If I set a property, it does not automatically
> persist it to the database. In your model, there is no difference
> between transient, in-memory state (e.g. workflow) and database state.
> BTW, this would also be lots of round trips to the database in your
> case. Finally, believe it or not, Hibernate attempts to do smart
> querying where it will actually retrieve only the IDs, then look for
> them in its second-level cache, if not found, go back to the database
> to get them.  This is a very common pattern across sql/nosql datastores.
>
> So it's not just about becoming more like jdbc but more about a
> low-level paradigm. To that I agree with you on one thing, the first
> thing you should do is create a 'baby hibernate' because I don't think
> gremlin should be an ORM (OGM?).
>
>
>
> Robert Dale
>
> On Thu, Dec 1, 2016 at 2:28 PM, pieter-gmail wrote:
>
>     Hi,
>
>     "So with ReferenceElements, latency will be less too because it takes
>     less time to construct the ReferenceVertex than it does to construct a
>     DetachedVertex. Imagine a vertex with 100 properties and meta
>     properties. ?!"
>
>     But a ReferenceElement does not have the properties, so more round
>     trips are needed, increasing latency. One of the first things needed
>     to make Sqlg at all usable was to make sure that a Vertex contains all
>     of its properties. Otherwise at least one more call is needed per
>     Vertex. It's a latency killer. For those few cases where the Vertex is
>     so fat that it is slow to load and only a few properties are needed,
>     g.V().hasLabel("label").values("property1", "property2") is used. So
>     to my mind ReferenceElement increases latency rather than decreasing
>     it.
>
>     With ReferenceElement, which is hardly an Element at all (after all,
>     it throws exceptions on almost all of its own interface), the user has
>     to get the properties manually and is then back in a world of Maps and
>     Lists of Maps.
>
>     A refactor of much existing code would need to toss the Vertex notion
>     altogether and replace it with Maps and Lists of Maps. Almost like
>     writing an application in pure JDBC, with thousands of lines iterating
>     through ResultSets and mapping things back and forth. Unless I am
>     missing something, this change seems huge.
>
>     I get that all this is important for non-Java devs, but it would be a
>     pity if their problems became Java devs' problems.
>
>     Cheers
>     Pieter
>
>
>     On 01/12/2016 20:38, Marko Rodriguez wrote:
>     > Hi,
>     >
>     > *PIETER REPLIES:*
>     >
>     >> One of the first reasons I came to graphs, Neo4j and then TinkerPop
>     >> way back was precisely because of the direct access to Node/Vertex.
>     >> The user treats it like any other object, not a remote connection.
>     >> It is the embedded nature that makes life so easy. In a way it was
>     >> like having a simplistic Hibernate as the core API. 99% of the
>     >> queries we write retrieve vertices, not Maps and Lists of something.
>     >> TinkerPop's own test suite applies this type of thinking:
>     >> querying/modifying Elements and asserting on them. Vertex and Edge
>     >> abound as first-class citizens.
>     >
>     > So Graph/Vertex/Edge/VertexProperty/Property will still exist for
>     > users as objects in the respective GLV language, it is just they are
>     > not “attached” and “rich.”
>     >
>     > For instance, in Gremlin-Python, you have:
>     >
>     >     v = g.V().next()
>     >     v.id
>     >
>     > A ReferenceVertex contains the id of the vertex so you can always
>     > “re-attach” it to the source.
>     >
>     >     g.V(v).out()
>     >
>     >
>     >> Graph, Vertex and Edge are the primary abstractions that users deal
>     >> with. Having a direct representation of them is very, very nice. It
>     >> makes user code easy and readable. You know you are dealing with
>     >> the "Person/Address/Dog/This/That" entity/label as opposed to just a
>     >> decontextualized bunch of data, Maps and Lists. If
>     >> Vertex/Edge/Property were to disappear, I'd say the first call of
>     >> duty would be to write a baby Hibernate to bring the property model
>     >> back into userspace.
>     >
>     > Again, the abstraction is still there, but just ALWAYS in a detached
>     > form.
>     >
>     >>
>     >> Regarding JDBC, this kind of makes the point. Sqlg, Hibernate and
>     >> many other tools exist precisely so that users do not need to use
>     >> JDBC with endless hardcoded strings guiding the application. Making
>     >> TinkerPop more like JDBC is not an obvious plus point.
>     >
>     > So, RemoteConnection differs from JDBC in that it's not a fat string,
>     > but RemoteConnection.submit(Bytecode). Thus, you still work at the
>     > GraphTraversal level in every GLV.
>     >
>     >> A ReferenceElement is also no good, as the problem I experience is
>     >> latency, not bandwidth.
>     >
>     > So with ReferenceElements, latency will be less too because it takes
>     > less time to construct the ReferenceVertex than it does to construct
>     > a DetachedVertex. Imagine a vertex with 100 properties and meta
>     > properties. ?!
>     >
>     >> I reckon the experience and usage of TinkerPop is rather different
>     >> for Java and non-Java people, and perhaps even among Java folks.
>     >> Hopefully I am not the only one who has made such heavy, happy use
>     >> of the TinkerPop property meta model and would be sad to see it go.
>     >>
>     >> Cheers
>     >> Pieter
>     >>
>     >
>     >
>     > *ROBERT REPLIES:*
>     >
>     >> I agree the focus should be on the Connection (being separate from
>     >> Graph) and Traversal. I wouldn't constrain it to "RemoteConnection",
>     >> just Connection or GraphConnection. Perhaps there's an
>     >> EmbeddedConnection and a RemoteConnection, or maybe it's
>     >> URI-oriented, similar to how JDBC does it. In either case, the
>     >> behavior of Remote and Embedded is the same, which is what I think
>     >> we're striving for.
>     >
>     > Yes. Good point. Just Connection.
>     >
>     >> I would also like to see Transactions be Connection-oriented. With
>     >> the right API, it could hook into JTA and be able to take advantage
>     >> of various annotations for marking transaction boundaries.
>     >
>     >     g = g.openTx()
>     >     g.V().out().out()
>     >     g.addV()
>     >     g.V(1).addE().to(2)
>     >     g.closeTx();
>     >
>     >
>     > ??? This way, it's all about GraphTraversalSource/GraphTraversal.
>     > That is truly the “connection”, where the Connection implementation
>     > is just provider/machine-specific shuffling of Bytecode in and
>     > Traversers out.
>     >
>     >> Are there features of a lambda that couldn't be replaced by a more
>     >> feature-rich gremlin?
>     >> g.V().out('knows').map{it.get().value('name') + ' is the friend name'}
>     >> g.V().out('knows').map(lambda(concat(__.it.get().value('name'), ' is the friend name')))
>     >
>     > So we currently have the concept of g:Lambda and this allows for
>     > lambdas to be used remotely.
>     >
>     >     // Gremlin-Java traversal with a Gremlin-Groovy lambda.
>     >     g.V().map(function("it.get().label()"))
>     >
>     >
>     > The crappy thing is that the lambda is always a String.
>     >
>     >> Reference-only makes total sense. This works really well especially
>     >> with a local cache or for use cases where most of the data is stored
>     >> in a separate database. I think it would lend itself nicely to lazy
>     >> loading. When you need values there are options for that as well
>     >> (properties/values/valueMap).  One of the problems with 'attached'
>     >> elements is you don't know what the implementation does. So
>     >> potentially every get or set property call is going to the database
>     >> and you don't realize it. That can hurt performance and have
>     >> unintended consequences.
>     >
>     > Dude, I’ve been saying this forever. DetachedXXX is a bad idea for
>     > the reasons you have stipulated. Just imagine:
>     >
>     >     g.V(1).out('knows')
>     >
>     >
>     > The GraphSON return is every vertex 1 knows and all its properties
>     > and meta properties?!?! If you wanted that data too you would have
>     > queried for it.
>     >
>     > Marko.
>     > --
>     > You received this message because you are subscribed to the Google
>     > Groups "Gremlin-users" group.
>     > To unsubscribe from this group and stop receiving emails from it,
>     > send an email to [email protected].
>     > To view this discussion on the web visit
>     > https://groups.google.com/d/msgid/gremlin-users/7CBD403D-4EC3-4B4B-AFF9-9A54B4D3C4EF%40gmail.com.
>     > For more options, visit https://groups.google.com/d/optout.
>
>
>
