Clearly we have different use cases. You prefer your model to be that of the underlying graph (following that logic, you would use Hibernate to map to Table objects?) and I prefer using application domain models.
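Robert's layered approach, where the query returns plain data and the application maps it into its own domain model, can be sketched in plain Java. This is a minimal illustration under assumptions. `DomainMapping`, `Person`, and `toPerson` are hypothetical names, and no TinkerPop classes are used. The input map only mimics the shape of a `valueMap()` result, in which each property value is wrapped in a list.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class DomainMapping {

    /** Hypothetical application domain object; not part of TinkerPop. */
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = Objects.requireNonNull(name, "name");
            this.age = age;
        }
    }

    /**
     * Maps one property map (e.g. a row from a valueMap()-style projection) into a Person.
     * valueMap() wraps each value in a list, so single-element lists are unwrapped;
     * bare values are accepted as-is. A missing age defaults to 0 (an arbitrary choice).
     */
    static Person toPerson(Map<String, ?> props) {
        String name = (String) single(props.get("name"));
        Object age = single(props.get("age"));
        return new Person(name, age == null ? 0 : ((Number) age).intValue());
    }

    // Returns the first element of a list (null if empty), or the value itself if it is not a list.
    private static Object single(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            return list.isEmpty() ? null : list.get(0);
        }
        return value;
    }

    public static void main(String[] args) {
        // Shape of a valueMap()-style result for one vertex (values wrapped in lists).
        Map<String, Object> row = Map.of("name", List.of("marko"), "age", List.of(29));
        Person p = toPerson(row);
        System.out.println(p.name + " " + p.age); // marko 29
    }
}
```

Keeping the mapper outside the query layer is the point Robert makes: the same `toPerson` works whether the map came from an embedded graph, a remote connection, or some other store.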
You prefer your query to return the underlying graph model and I prefer it to return any data. You prefer your query to always return all properties and I prefer it to always return only selected properties. You prefer your objects to be proxies to the underlying datastore (I think this blurs the line between being a graph provider and a Gremlin client) and I prefer my objects to be detached, with load/store being explicit. In the end, it sounds like you want Gremlin to be an object-graph mapper for the graph model, and I prefer a layered approach where Gremlin is a simple query language on top of which an object-graph mapper, for any domain model, could be built (like so many other query languages). So I guess we'll just have to agree to disagree.

Robert Dale

On Fri, Dec 2, 2016 at 10:30 AM, pieter-gmail <[email protected]> wrote:
> Hi,
>
> Let me disagree with your disagreement ;-)
>
> Regarding Neo4j
>
> I am talking about Neo4j embedded. The node/vertex is pretty much the database already, being a direct pointer to the node on disk with its properties right next to it on disk. I would be surprised if all the properties are not also already in its hot cache. I am speculating about the internals, but when coding in Neo4j embedded you don't care about pre-loading all or some properties for performance reasons; you just load the node and all is well. It's the beauty of embedded Neo4j: latency is just not a concern, and the node represents an instance of a label.
>
> It would be interesting to execute TinkerPop's structure and process test suites for Neo4j via Gremlin Server and compare the performance to embedded. I don't really have a clue what to expect. If every property access is to be a call via Gremlin Server, I reckon things will slow down significantly. The suite is written with the implicit assumption that property access is not something to think about.
>
> Regarding Hibernate.
> I have not worked with Hibernate for some time, so I ran a test to make sure.
>
>     EntityManager entityManager = entityManagerFactory.createEntityManager();
>     entityManager.getTransaction().begin();
>     int count = 100;
>     for (int i = 1; i < count + 1; i++) {
>         Person person = new Person("person_" + i);
>         entityManager.persist(person);
>     }
>     entityManager.getTransaction().commit();
>     entityManager.close();
>
>     entityManager = entityManagerFactory.createEntityManager();
>     Person person = entityManager.find(Person.class, 1L);
>     assertNotNull(person);
>     assertEquals("person_1", person.getName());
>
> The entityManager.find(Person.class, 1L) call resulted in the following SQL:
>
>     select person0_.id as id1_5_0_, person0_.name as name2_5_0_ from Person person0_ where person0_.id=?
>
> I did not ask for the name property; it returned it anyway, as well it should. If every property needs to be fetched separately, latency will kill the app. If the user has to ask for every property individually, then part of the point of Hibernate disappears.
>
> RE: "Vertex is just a map wrapper"
> But it's not just any map, it's a Vertex, a core notion of the property graph model.
>
> RE: "I don't know anyone who wants to deal with Vertex/Edges"
> We probably live in our own bubbles, but I don't know anyone who would not want to deal with the core abstractions of the property graph model and would rather deal with Maps, except perhaps JSON/JavaScript folks :-)
>
> The property graph model and graph traversals are all about vertices and edge traversals; having that right there as a first-class citizen in code is great.
>
> RE: in Hibernate "If I set a property, it does not automatically persist it to the database."
> True, but that is also a source of pain with Hibernate: its optimistic locking bypasses the database's concurrency model altogether. And voilà, you are stuck with "let's just ignore the exception, retry, and hope we get lucky this time round" logic.
> For what it's worth, setting a property on Sqlg runs an UPDATE statement. Alas, a very good reason why Hibernate does what it does is that its approach reduces latency, since it can run batch updates on commit or flush. Sqlg supports batch updates, but they are not the default.
>
> RE: "In your model, there is no difference between transient, in-memory state (e.g. workflow) and database state."
> Not sure what you mean here. If you mean application writers keeping their own cache of persistent data, then you are right. Rule #1 of caching is don't cache. Rule #2 is don't cache the cache. Caching is a solution to a weakness elsewhere. I am not saying don't ever cache, but that if you can avoid it, do so. Writing transactional caches is also a rather specialized and difficult exercise, and precisely what databases are all about.
>
> Lastly, to make sure we are talking about the same change, are you proposing that all traversals like
>
>     GraphTraversal<Vertex, Vertex> vertices = this.sqlgGraph.traversal().V().out();
>
> should become
>
>     GraphTraversal<Vertex, Map<String, Object>> vertexProperties = this.sqlgGraph.traversal().V().out().valueMap();
>
> or worse
>
>     GraphTraversal<Vertex, Object> propertyValues = this.sqlgGraph.traversal().V().out().values("property1", "property2", "property3");
>
> Cheers
> Pieter
>
> On 02/12/2016 14:57, Robert Dale wrote:
> > Pieter, while I think Marko may be onto something, I just want to completely disagree with you as a Java dev. ;-)
> >
> > First, in Neo4j's impl, from what I can tell the elements are not fully loaded. Every get (getProperty, edges, etc.) does a query to the database, which means more round trips to the database. This is why I made the statement that implementations are different. In your Sqlg case, you are basically arguing that the default behavior is the SQL equivalent of SELECT *. This is not a good practice.
> > Then you go on to say that if the dev is aware that this is a 'fat' element, they should ask for exact properties. I think what we're arguing is that the default behavior should be 'always ask for exact properties'. This is the most accepted practice in querying any database: SQL, NoSQL, MongoDB, Cassandra, etc.
> >
> > That leads us to your Hibernate comment. In the abstract sense, Vertex is just a map wrapper. I think you're just splitting hairs trying to distinguish a Dog Vertex and a Dog Map. In either case, you would have to query the label. In any case, I don't know anyone who wants to deal with Vertex/Edges. What most devs deal with, in my experience, is a domain-specific model. So whether I get back a Vertex or a Map, either way, I'm going to translate that to my domain model. Also, in Hibernate, when I get a property I didn't query for, I will get a null. If I set a property, it does not automatically persist it to the database. In your model, there is no difference between transient, in-memory state (e.g. workflow) and database state. BTW, this would also be lots of round trips to the database in your case. Finally, believe it or not, Hibernate attempts to do smart querying where it will actually retrieve only the IDs, then look for them in its second-level cache and, if they are not found, go back to the database to get them. This is a very common pattern across SQL/NoSQL datastores.
> >
> > So it's not just about becoming more like JDBC but more about a low-level paradigm. On that note, I agree with you on one thing: the first thing you should do is create a 'baby hibernate', because I don't think Gremlin should be an ORM (OGM?).
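Robert's description of Hibernate's "smart querying" (retrieve only the IDs, check the second-level cache, and go back to the database only for misses) can be made concrete with a toy loader. This is a sketch, not Hibernate's implementation. `IdFirstLoading`, `Loader`, and the `Function`-based batch loader are hypothetical, a `HashMap` stands in for the second-level cache, and the initial id-only query is assumed to have already happened, so it is not counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class IdFirstLoading {

    /** Toy ID-first loader: cached entities are served locally, all misses are fetched in one batch. */
    static final class Loader<K, V> {
        private final Map<K, V> cache = new HashMap<>();        // stand-in for a second-level cache
        private final Function<List<K>, Map<K, V>> batchLoad;   // each invocation counts as one round trip
        private int roundTrips = 0;

        Loader(Function<List<K>, Map<K, V>> batchLoad) {
            this.batchLoad = batchLoad;
        }

        /** Returns values in the order of the requested ids. */
        List<V> load(List<K> ids) {
            Set<K> misses = new LinkedHashSet<>();               // de-duplicated, order preserved
            for (K id : ids) {
                if (!cache.containsKey(id)) misses.add(id);
            }
            if (!misses.isEmpty()) {
                roundTrips++;
                cache.putAll(batchLoad.apply(new ArrayList<>(misses)));
            }
            List<V> result = new ArrayList<>(ids.size());
            for (K id : ids) result.add(cache.get(id));           // null if the store had no row
            return result;
        }

        int roundTrips() { return roundTrips; }
    }

    public static void main(String[] args) {
        Loader<Long, String> loader = new Loader<>(ids -> {
            Map<Long, String> rows = new HashMap<>();
            for (Long id : ids) rows.put(id, "person_" + id);
            return rows;
        });
        System.out.println(loader.load(List.of(1L, 2L, 3L))); // [person_1, person_2, person_3]
        System.out.println(loader.load(List.of(2L, 3L, 4L))); // [person_2, person_3, person_4]
        System.out.println(loader.roundTrips());              // 2
    }
}
```

The second `load` call reuses the cached entries for ids 2 and 3 and sends only id 4, which is why the counter ends at 2.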
> >
> > Robert Dale
> >
> > On Thu, Dec 1, 2016 at 2:28 PM, pieter-gmail <[email protected]> wrote:
> > > Hi,
> > >
> > > "So with ReferenceElements, latency will be less too because it takes less time to construct the ReferenceVertex than it does to construct a DetachedVertex. Imagine a vertex with 100 properties and meta properties. ?!"
> > >
> > > But a ReferenceElement does not have the properties, so more round trips are needed, increasing latency. One of the first steps in making Sqlg usable at all was to make sure that a Vertex contains all of its properties; otherwise at least one more call is needed per Vertex. It's a latency killer. For the relatively few cases where the Vertex is so fat that it is slow to load and only a few properties are needed, g.V().hasLabel("label").values("property1", "property2") is used. So to my mind ReferenceElement increases latency rather than decreasing it.
> > >
> > > With ReferenceElement, which is hardly an Element at all (it throws exceptions on almost all of its own interface), the user has to get the properties manually and is then back in a world of Maps and Lists of Maps.
> > >
> > > Refactoring much existing code would mean tossing the Vertex notion altogether and replacing it with Maps and Lists of Maps. It is almost like writing an application in pure JDBC code, with thousands of lines iterating through ResultSets and mapping things back and forth. Unless I am missing something, this change seems huge.
> > >
> > > I get that all this is important for non-Java devs, but it would be a pity if their problems became Java devs' problems.
> > >
> > > Cheers
> > > Pieter
> > >
> > > On 01/12/2016 20:38, Marko Rodriguez wrote:
> > > > Hi,
> > > >
> > > > *PIETER REPLIES:*
> > > >
> > > > > One of the first reasons I came to graphs, Neo4j and then TinkerPop, way back, was precisely the direct access to Node/Vertex.
> > > > > The user treats it like any other object, not a remote connection. It is the embedded nature that makes life so easy. In a way it was like having a simplistic Hibernate as the core API. 99% of the queries we write are to retrieve vertices, not Maps and Lists of something. TinkerPop's own test suite applies this type of thinking: querying/modifying Elements and asserting on them. Vertex and Edge abound as first-class citizens.
> > > >
> > > > So Graph/Vertex/Edge/VertexProperty/Property will still exist for users as objects in the respective GLV; it is just that they are not "attached" and "rich."
> > > >
> > > > For instance, in Gremlin-Python, you have:
> > > >
> > > >     v = g.V().next()
> > > >     v.id
> > > >
> > > > A ReferenceVertex contains the id of the vertex, so you can always "re-attach" it to the source.
> > > >
> > > >     g.V(v).out()
> > > >
> > > > > Graph, Vertex and Edge are the primary abstractions that users deal with. Having a direct representation of them is very, very nice. It makes user code easy and readable. You know you are dealing with the "Person/Address/Dog/This/That" entity/label as opposed to just a decontextualized bunch of data, Maps and Lists. If Vertex/Edge/Property were to disappear, I'd say the first call of duty would be to write a baby hibernate to bring the property model back into userspace.
> > > >
> > > > Again, the abstraction is still there, but just ALWAYS in a detached form.
> > > >
> > > > > Regarding JDBC, this kinda makes the point. Sqlg and Hibernate and many, many other tools exist precisely so that users do not need to use JDBC with endless hardcoded strings guiding the application. Making TinkerPop more like JDBC is not an obvious plus point.
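Marko's Gremlin-Python example above relies on a reference carrying nothing but the vertex id, which is enough to re-attach it to the source with `g.V(v)`. A toy in-memory model shows that idea, and also Pieter's counterpoint that every property read then becomes an explicit request. `ReattachById`, `Ref`, `Source`, `out`, and `value` are hypothetical names standing in for `g.V(v).out()` and `g.V(v).values(key)`; this is not TinkerPop's `ReferenceVertex`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReattachById {

    /** Hypothetical reference element: the client keeps only the vertex id. */
    static final class Ref {
        final Object id;
        Ref(Object id) { this.id = id; }
    }

    /** Hypothetical in-memory source that owns all properties and edges. */
    static final class Source {
        private final Map<Object, Map<String, Object>> props = new HashMap<>();
        private final Map<Object, List<Object>> adjacency = new HashMap<>();
        private int requests = 0; // each call to out() or value() counts as one request

        void addVertex(Object id, Map<String, ?> properties) {
            Map<String, Object> copy = new HashMap<>(properties);
            props.put(id, copy);
        }

        void addEdge(Object fromId, Object toId) {
            adjacency.computeIfAbsent(fromId, k -> new ArrayList<>()).add(toId);
        }

        /** Plays the role of g.V(v).out(): re-attaches by id and returns references only. */
        List<Ref> out(Ref v) {
            requests++;
            List<Ref> result = new ArrayList<>();
            for (Object id : adjacency.getOrDefault(v.id, List.of())) {
                result.add(new Ref(id));
            }
            return result;
        }

        /** Plays the role of g.V(v).values(key): a property is read only when asked for. */
        Object value(Ref v, String key) {
            requests++;
            Map<String, Object> p = props.get(v.id);
            return p == null ? null : p.get(key);
        }

        int requests() { return requests; }
    }

    public static void main(String[] args) {
        Source g = new Source();
        g.addVertex(1, Map.of("name", "marko"));
        g.addVertex(2, Map.of("name", "vadas"));
        g.addEdge(1, 2);

        Ref v = new Ref(1);                                  // all the client holds is the id
        List<Ref> friends = g.out(v);                        // re-attached to the source by id
        System.out.println(g.value(friends.get(0), "name")); // vadas
        System.out.println(g.requests());                    // 2
    }
}
```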
> > > >
> > > > So, RemoteConnection differs from JDBC in that it's not a fat string, but RemoteConnection.submit(Bytecode). Thus, you still work at the GraphTraversal level in every GLV.
> > > >
> > > > > A ReferenceElement is also no good, as the problem I experience is latency, not bandwidth.
> > > >
> > > > So with ReferenceElements, latency will be less too, because it takes less time to construct the ReferenceVertex than it does to construct a DetachedVertex. Imagine a vertex with 100 properties and meta properties. ?!
> > > >
> > > > > I reckon the experience and usage of TinkerPop is rather different for Java and non-Java people, and perhaps even among Java folks. Hopefully I am not the only one who has made such heavy, happy use of the TinkerPop property meta model and would be sad to see it go.
> > > > >
> > > > > Cheers
> > > > > Pieter
> > > >
> > > > *ROBERT REPLIES:*
> > > >
> > > > > I agree the focus should be on the Connection (being separate from Graph) and Traversal. I wouldn't constrain it to "RemoteConnection", just Connection or GraphConnection. Perhaps there's an EmbeddedConnection and a RemoteConnection, or maybe it's URI-oriented, similar to how JDBC does it. In either case, the behavior of Remote and Embedded is the same, which is what I think we're striving for.
> > > >
> > > > Yes. Good point. Just Connection.
> > > >
> > > > > I would also like to see Transactions be Connection-oriented. With the right API, it could hook into JTA and be able to take advantage of various annotations for marking transaction boundaries.
> > > >
> > > >     g = g.openTx()
> > > >     g.V().out().out()
> > > >     g.addV()
> > > >     g.V(1).addE().to(2)
> > > >     g.closeTx()
> > > >
> > > > ??? This way, it's all about GraphTraversalSource/GraphTraversal.
> > > > That is truly the "connection", where the Connection implementation is just provider/machine-specific shuffling of Bytecode in and Traversers out.
> > > >
> > > > > Are there features of a lambda that couldn't be replaced by a more feature-rich Gremlin?
> > > > >
> > > > >     g.V().out('knows').map{it.get().value('name') + ' is the friend name'}
> > > > >     g.V().out('knows').map(lambda(concat(__.it.get().value('name'), ' is the friend name')))
> > > >
> > > > So we currently have the concept of g:Lambda, and this allows lambdas to be used remotely.
> > > >
> > > >     g.V().map(function("it.get().label()")) // Gremlin-Java traversal with a Gremlin-Groovy lambda.
> > > >
> > > > The crappy thing is that the lambda is always a String.
> > > >
> > > > > Reference-only makes total sense. This works really well, especially with a local cache or for use cases where most of the data is stored in a separate database. I think it would lend itself nicely to lazy loading. When you need values, there are options for that as well (properties/values/valueMap). One of the problems with 'attached' elements is that you don't know what the implementation does. So potentially every get or set property call is going to the database and you don't realize it. That can hurt performance and have unintended consequences.
> > > >
> > > > Dude, I've been saying this forever. DetachedXXX is a bad idea for the reasons you have stipulated. Just imagine:
> > > >
> > > >     g.V(1).out('knows')
> > > >
> > > > The GraphSON return is every vertex 1 knows and all its properties and meta properties?!?! If you wanted that data too, you would have queried for it.
> > > >
> > > > Marko.
> > > > --
> > > > You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > > > To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/7CBD403D-4EC3-4B4B-AFF9-9A54B4D3C4EF%40gmail.com.
> > > > For more options, visit https://groups.google.com/d/optout.
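The latency disagreement in this thread (Pieter: references need "at least one more call per Vertex"; Robert: attached elements may hit the database on every property get) is mostly about counting round trips. The toy model below assumes a fixed cost per call and ignores payload size and server-side work. The strategy names, the 100-vertex/5-property workload, and the 2 ms figure are illustrative assumptions, not measurements of Sqlg, Neo4j, or Gremlin Server.

```java
final class RoundTripModel {

    /** Calls needed to reach n vertices and read p properties from each (toy model). */
    enum Strategy {
        /** Elements come back fully loaded, as Pieter describes for Sqlg. */
        FAT_ELEMENTS { long calls(long n, long p) { return 1; } },
        /** References come back; then one property fetch per vertex. */
        REFERENCES_THEN_FETCH { long calls(long n, long p) { return 1 + n; } },
        /** Attached elements where every property get goes to the store. */
        LAZY_ATTACHED { long calls(long n, long p) { return 1 + n * p; } },
        /** Properties selected inside the traversal, e.g. values(...). */
        PROJECTED_IN_QUERY { long calls(long n, long p) { return 1; } };

        abstract long calls(long n, long p);
    }

    /** Total latency assuming every call costs the same fixed time. */
    static double millis(Strategy s, long n, long p, double perCallMs) {
        return s.calls(n, p) * perCallMs;
    }

    public static void main(String[] args) {
        long n = 100, p = 5;    // hypothetical: 100 vertices, 5 properties read from each
        double perCallMs = 2.0; // hypothetical network round trip
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": calls=" + s.calls(n, p) + ", ~" + millis(s, n, p, perCallMs) + " ms");
        }
        // FAT_ELEMENTS: calls=1, ~2.0 ms
        // REFERENCES_THEN_FETCH: calls=101, ~202.0 ms
        // LAZY_ATTACHED: calls=501, ~1002.0 ms
        // PROJECTED_IN_QUERY: calls=1, ~2.0 ms
    }
}
```

Under these assumptions the fat and projected strategies tie on call count and differ only in how much data each call carries, which is where Marko's 100-property example and Pieter's "latency, not bandwidth" remark part ways.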
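Pieter contrasts Sqlg's default, where setting a property immediately runs an UPDATE, with Hibernate deferring writes until flush or commit so they can be sent as a batch. A toy store that only counts round trips shows the difference. `WritePolicies`, `WriteThrough`, `WriteBehind`, and the assumption that a whole batch costs one round trip are hypothetical simplifications; neither class models real SQL, dirty checking, or locking.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WritePolicies {

    /** Hypothetical store that only counts the round trips it would make. */
    abstract static class Store {
        protected int roundTrips = 0;
        abstract void setProperty(long id, String key, Object value);
        abstract void commit();
        int roundTrips() { return roundTrips; }
    }

    /** Write-through: each setProperty is sent immediately as its own UPDATE. */
    static final class WriteThrough extends Store {
        @Override void setProperty(long id, String key, Object value) { roundTrips++; }
        @Override void commit() { /* nothing buffered */ }
    }

    /** Write-behind: changes are buffered per element and sent as one batch on commit. */
    static final class WriteBehind extends Store {
        private final Map<Long, Map<String, Object>> dirty = new LinkedHashMap<>();

        @Override void setProperty(long id, String key, Object value) {
            // Repeated sets of the same key on the same element are coalesced.
            dirty.computeIfAbsent(id, k -> new LinkedHashMap<>()).put(key, value);
        }

        @Override void commit() {
            if (!dirty.isEmpty()) {
                roundTrips++; // assumes the whole batch goes over in one round trip
                dirty.clear();
            }
        }

        int pendingElements() { return dirty.size(); }
    }

    public static void main(String[] args) {
        for (Store store : new Store[] {new WriteThrough(), new WriteBehind()}) {
            for (long id = 1; id <= 100; id++) {
                store.setProperty(id, "name", "person_" + id);
                store.setProperty(id, "age", 30);
            }
            store.commit();
            System.out.println(store.getClass().getSimpleName() + ": " + store.roundTrips());
        }
        // WriteThrough: 200
        // WriteBehind: 1
    }
}
```

What the model leaves out is the cost Pieter complains about: with writes deferred, conflicts with concurrent transactions only surface at flush or commit time, which is where optimistic-locking retries come in.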
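Robert's wish for connection-oriented transactions and Marko's `g.openTx()` / `g.closeTx()` sketch map naturally onto Java's try-with-resources. This is a minimal sketch under assumptions: `ConnectionTx`, `Connection`, `Tx`, `openTx`, `submit`, and the commit-or-roll-back-on-close rule are hypothetical and are not TinkerPop's transaction API, and plain strings stand in for traversal Bytecode.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectionTx {

    /** Hypothetical connection that owns its transactions. */
    static final class Connection {
        private final List<String> committed = new ArrayList<>();

        Tx openTx() { return new Tx(); }

        List<String> committed() { return List.copyOf(committed); }

        /** Transaction bound to this connection; rolls back on close unless committed. */
        final class Tx implements AutoCloseable {
            private final List<String> pending = new ArrayList<>();
            private boolean closed = false;

            void submit(String traversal) {
                ensureOpen();
                pending.add(traversal);
            }

            void commit() {
                ensureOpen();
                committed.addAll(pending);
                pending.clear();
                closed = true;
            }

            @Override
            public void close() {
                if (!closed) {
                    pending.clear(); // implicit rollback
                    closed = true;
                }
            }

            private void ensureOpen() {
                if (closed) throw new IllegalStateException("transaction is closed");
            }
        }
    }

    public static void main(String[] args) {
        Connection conn = new Connection();
        try (Connection.Tx tx = conn.openTx()) {
            tx.submit("g.addV('person')");
            tx.submit("g.V(1).addE('knows').to(__.V(2))");
            tx.commit();
        }
        try (Connection.Tx tx = conn.openTx()) {
            tx.submit("g.addV('dog')"); // never committed, so close() rolls it back
        }
        System.out.println(conn.committed().size()); // 2
    }
}
```

Tying the transaction to the connection object rather than to the graph is what would let a JTA-style boundary or annotation drive `commit` and `close`, as Robert suggests; that integration is not shown here.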
