Re: [DISCUSS] Limiting the use of Lambdas in TinkerPop 4.

2024-07-06 Thread Joshua Shinavier
Hi Valentyn,

I agree that lambdas in their current form will need to be retired, for the
reason you mention: they do not currently have a common representation
which can be shared among language variants. Since we are talking about
TinkerPop 4, though, I think it is worth talking about whether lambdas, in
some *other* form, can be carried forward.

In the context of lambda calculus, a lambda has the structure (variable,
term), where any occurrences of the variable in the term are bound to the
same type and value. In TinkerPop 3, the body of the lambda is opaque,
whereas in TP 4 it would need to have a native representation in
gremlin-language. The problem is that gremlin-language right now isn't
really expressive enough; it covers the syntax of Gremlin queries
perfectly, but it is not a very complete data representation language. GL
contains a simple production for "variable", but for terms, it only
contains productions for various literals. Since literals cannot contain
variables, there is no possibility of representing a lambda body. Variables
only occur in the various xxxArgument productions which are provided to
traversal methods.

The difference between what gremlin-language contains now, and what we
would need in order to carry lambdas forward, is a "term" (or "expression",
etc.) production in GL. At a minimum, the "term" production must be a
disjunction between variables, functions (including lambdas), literals, and
applications. I would also add records (which can be seen as a
generalization of genericLiteralMap) and variants to that list.
Applications are also absent from gremlin-language; while lambdas allow you
to introduce functions, applications eliminate them. E.g. the lambda \x.x+x
introduces a function which doubles its argument, while the term
(\x.x+x)(21) applies the function to an argument of 21, producing 42.
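
To make the shape of such a "term" production concrete, here is a minimal sketch in Python. It is only an illustration, not gremlin-language syntax: the class names (Var, Lit, Lam, App, Add) and the evaluate function are hypothetical, and Add stands in for a built-in "+" primitive, which a real grammar might well model differently.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """A variable reference, e.g. x."""
    name: str


@dataclass(frozen=True)
class Lit:
    """An integer literal (the only literal kind in this sketch)."""
    value: int


@dataclass(frozen=True)
class Lam:
    """A lambda: introduces a function of one parameter."""
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    """An application: eliminates a function by supplying an argument."""
    fn: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Add:
    """Hypothetical primitive standing in for '+'."""
    left: "Term"
    right: "Term"


# The "term" production as a disjunction of the alternatives above.
Term = Union[Var, Lit, Lam, App, Add]


def evaluate(term, env=None):
    """Evaluate a term in an environment mapping variable names to values."""
    env = env or {}
    if isinstance(term, Lit):
        return term.value
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Add):
        return evaluate(term.left, env) + evaluate(term.right, env)
    if isinstance(term, Lam):
        # A closure: the body is evaluated later with the parameter bound.
        return lambda arg: evaluate(term.body, {**env, term.param: arg})
    if isinstance(term, App):
        return evaluate(term.fn, env)(evaluate(term.arg, env))
    raise TypeError(f"not a term: {term!r}")


# \x.x+x, and its application to 21
times_two = Lam("x", Add(Var("x"), Var("x")))
print(evaluate(App(times_two, Lit(21))))  # 42
```

Because the lambda body here is ordinary term data rather than opaque host-language code, it could in principle be serialized and interpreted identically by every language variant.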

In my opinion, two other major missing pieces in gremlin-language, which
would fit in well with native lambdas, are a lexical environment (e.g.
allowing the user to bind the function \x.x+x to a name like "timesTwo")
and a sublanguage for datatypes. Happy to discuss further if the dev group
has interest in exploring this direction.

Josh

On Fri, Jul 5, 2024 at 8:15 AM Valentin Kagamlyk <
valentin.kagam...@gmail.com> wrote:

> Hi all,
>
> Lambdas are not supported in the grammar because they are themselves an
> entire language, based on Groovy syntax. The current implementation in
> Gremlin Server allows lambdas to execute arbitrary code with access to the
> entire JVM, which providers cannot optimize. This also leads to security
> problems: numerous attempts to secure the Groovy sandbox have not been
> successful.
>
> Since we decided gremlin-language will be the default in Gremlin Server,
> the choice to remove lambda syntax for non-JVM languages and in scripts
> sent over HTTP makes sense because this will allow us to have the same
> capabilities for different script engines and increase the security of
> Gremlin Server. Existing scripts that use lambdas won't work in TinkerPop 4
> anyway, because the default will be gremlin-language.
>
> Lambdas can remain for embedded use cases as a form of added/sugar syntax
> for steps where they have made sense since their inception.
>
> Happy to hear any thoughts on the matter.
>
> Regards,
> Valentyn
>


Re: [Discuss] Type system in TinkerPop

2024-01-28 Thread Joshua Shinavier
Hi Valentin,

I agree with the sentiment, and I have a solution you might be interested
in. You might be able to grok the property graph validation test cases here.
If you have ever heard me talking about algebraic property graphs (paper),
this is a special case of that type system, implemented for the JVM using
Hydra's LambdaGraph data model. Also check out the typed property graph
model here; I think you can see past the unfamiliar syntax to understand the
notion of property graph data / type conformance which is enforced:

   - A *property* is a string-valued key together with a property value.
   Properties are validated against *property types*, which are
   string-valued keys together with some primitive data type. Primitive data
   types and values are parameterized so that different applications can use
   their own. Property types also have a built-in optionality or requiredness
   parameter.
   - A *vertex* has a string-valued label, an id and a key/value map of
   property keys to values. Vertices are validated against *vertex types*,
   which mirror the structure of a vertex: a vertex type has a label, an id
   type, and a list of property types.
   - An *edge* has a string-valued label, an id, an out-vertex label, an
   in-vertex label, and a map of property keys to values. Edges are validated
   against *edge types*, which mirror the structure of an edge: an edge
   type has a label, an id type, an out-vertex label, an in-vertex label, and
   a list of property types.
   - A *graph* is a map of ids to vertices, together with a map of ids to
   edges (i.e. vertex ids and edge ids are unique in the graph. The latter
   constraint is relaxed for graphs which do not care about edge ids). Graphs
   are validated against *graph schemas*, which map vertex labels to vertex
   types, and edge labels to edge types.
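
The conformance rules above can be sketched roughly as follows. This is not Hydra's API: the class and function names are hypothetical, Python classes such as str and int stand in for the pluggable primitive types, and only vertices are covered (edges and graphs would follow the same pattern).

```python
from dataclasses import dataclass


@dataclass
class PropertyType:
    key: str
    value_type: type       # stand-in for an application-defined primitive type
    required: bool = True  # the built-in optionality parameter


@dataclass
class VertexType:
    label: str
    id_type: type
    properties: list       # list of PropertyType


@dataclass
class Vertex:
    label: str
    id: object
    properties: dict       # property key -> value


def validate_vertex(vertex, vertex_type):
    """Return a list of violations; an empty list means the vertex conforms."""
    errors = []
    if vertex.label != vertex_type.label:
        errors.append(f"label '{vertex.label}' != '{vertex_type.label}'")
    if not isinstance(vertex.id, vertex_type.id_type):
        errors.append(f"id {vertex.id!r} is not a {vertex_type.id_type.__name__}")
    for ptype in vertex_type.properties:
        if ptype.key not in vertex.properties:
            if ptype.required:
                errors.append(f"missing required property '{ptype.key}'")
        elif not isinstance(vertex.properties[ptype.key], ptype.value_type):
            errors.append(f"property '{ptype.key}' has the wrong type")
    return errors


person = VertexType(
    label="Person",
    id_type=int,
    properties=[PropertyType("name", str), PropertyType("age", int, required=False)],
)

print(validate_vertex(Vertex("Person", 1, {"name": "Ada"}), person))
print(validate_vertex(Vertex("Person", "v1", {"age": "forty"}), person))
```

The first vertex conforms; the second violates the id type, omits the required "name" property, and supplies an "age" of the wrong type. How undeclared property keys should be treated is not stated above, so this sketch ignores them.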

As I mentioned, this approach is agnostic to the actual set of primitive
values and types which an application uses. In my team's work at LinkedIn
and Microsoft, we have been using Hydra's built-in Literal and LiteralType.
No integration with Gremlin yet. For Gremlin, it would be worthwhile to
standardize on a set of primitive types which is well aligned with the JVM
types we use in practice. I believe there was a thread about this, with an
associated proposal, a year or three ago -- I can't find it at the moment.
If anyone remembers, please post a link. A couple of other recent threads
on types for TinkerPop are this one and this one.

Best,

Josh



On Tue, Jan 23, 2024 at 3:26 PM Valentin Kagamlyk <
valentin.kagam...@gmail.com> wrote:

> Hi all,
>
> In an embedded graph it is technically possible to use any JVM type, but
> for network transmission the set is more limited. There are also some
> differences among GLVs, for example number handling in Python and JavaScript.
>
> There are 2 main categories of Gremlin types:
> - types needed to transfer data over the wire, for example Graph Elements,
> lots of enums, and other utility types like ByteCode and Bindings.
> - types designed to store data as values in a graph, like labels, property
> values, etc. This includes most simple types like numbers, strings, dates,
> and some composite types like collections (List, Set, Map).
> Some types, like Integer, are used everywhere.
>
> Restricting values to a limited set of allowed types helps to decouple
> gremlin from its JVM foundation, and allows us to ensure that value types
> are handled consistently across all gremlin language variants.
>
> I would like to discuss whether it makes sense to put such restrictions on
> element property values.
> If yes, which types should be allowed as element property values?
>


Re: [DISCUSS] Is null equal to null

2023-08-15 Thread Joshua Shinavier
Ah yes, this is a good starting point. Per the doc, null == null is true in
a fairly limited way, as null is considered to be an instance of the
trivial nulltype, and nothing else. I would look to expand/revise this in a
few ways:

   - Decouple the type system from Java while maintaining interoperability
   - Add explicit support for optional types. Remove nulltype unless there
   are well-defined use cases for it. The null / unit type can easily be seen
   as a special case of other complex types.
   - Clarify the List, Set, and Map types as being parameterized by other
   types (e.g. map)
   - Generalize Map.Entry to Pair, or even Product.
   - Think carefully about Path. A path can be considered as a relation
   with an in-type and an out-type which can be composed with other paths such
   that the out-type of one is the in-type of the other.
   - Consider distinguishing between signed and unsigned integer types
   - Possibly move Date and UUID out of the core type system, and support
   them instead as domain-specific types
   - Define vertex, edge, and graph types (schemas) as I have done in the
   typed PG model I linked: parameterized by a type of types, and with a
   particular type system such as the one just described (which might be
   called TinkerPop Core or such) as an implementation detail. Property types
   would just be the pairing of a property key with a type, possibly with
   special support for optionals.
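
As a small illustration of the Path item in the list above, the composition rule can be sketched like this. The names PathType and compose are hypothetical, and types are represented as plain strings purely for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathType:
    """A path viewed as a relation from an in-type to an out-type."""
    in_type: str
    out_type: str


def compose(first, second):
    """Compose two paths; valid only if first's out-type is second's in-type."""
    if first.out_type != second.in_type:
        raise TypeError(
            f"cannot compose: out-type '{first.out_type}' "
            f"does not match in-type '{second.in_type}'"
        )
    return PathType(first.in_type, second.out_type)


works_at = PathType("Person", "Company")
located_in = PathType("Company", "City")
print(compose(works_at, located_in))  # PathType(in_type='Person', out_type='City')
```

Composing in the other order (located_in, then works_at) raises a TypeError, since "City" is not "Person".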


Josh



On Tue, Aug 15, 2023 at 11:49 AM Cole Greer wrote:

> Hi everyone,
>
> I just remembered we do have a well-documented set of rules regarding
> comparability and equality of nulls and mismatched types which is relevant
> to this discussion. It is part of the provider docs here:
> https://tinkerpop.apache.org/docs/current/dev/provider/#gremlin-semantics-equality-comparability.
> We may still have some work to do to consistently follow our own rules,
> although the consensus here seems to agree with the docs that null==null is
> true. I agree that this would be worth revisiting if a type system is being
> introduced to TinkerPop.
>
> Josh, I look forward to hearing more on your proposed type system once it
> is ready.
>
> Regards,
>
> Cole
>

Re: [DISCUSS] Is null equal to null

2023-08-08 Thread Joshua Shinavier
Hi Dave,

I declined to add my name to that paper. I worked closely with most of the
authors for about 7 months in 2021 when I led a PGSWG subgroup on property
types. We met weekly (minutes
<https://docs.google.com/document/d/1-YcfzgCJ5zXzDq_lL0EfMzx-9M_DM5pGJ66hzfYLR_A/edit>)
to define a type system for properties and, by extension, vertices and
edges. This was a pretty interesting time, with vigorous debate about
nominative vs. structural types, schema on write vs. schema on read, but
there were still a lot of unresolved questions when we paused the working
group. The paper was written a year later in the space of a few weeks -- I
was invited to be involved, but didn't have time, and didn't approve of a
couple of aspects of the paper draft -- above all, that the proposal
ignored the still-open, still-relevant questions about the formalism in
favor of just "getting something out there". Some of my influences made it
in -- e.g. the graph data model feature matrix, which I introduced at the
2019 Dagstuhl seminar. So, it's a paper by my friends and colleagues which
captures a lot of the major concerns of the working group and of schemas
for property graphs -- I am just not satisfied with the actual formalism,
and don't see it as definitive. My preference these days is to map graph
schemas into some other, well-studied formalism like typed lambda calculi
(in the case of Lambda Graph) rather than creating a type system
specifically for property graphs as in that paper, or even as in APG. That
gives you a lot more freedom to introduce variants of the data model (e.g.
if an application needs constraints on property values like min/max, regex,
etc., it is a short step from a property graph model based on System F to
one with dependent types). I am also cautious of making any concessions to
SQL which would weaken the type system, e.g. by including nulls in
primitive types.

Josh



On Mon, Aug 7, 2023 at 3:09 PM David Bechberger wrote:

> Hello Ken,
>
> I don't know that I have a strong opinion on what NULL==NULL should
> evaluate to, but I agree we should come up with a set of rules here for
> consistency, both within Gremlin but also with other database language
> standards (e.g. GQL and SQL) so that Gremlin best matches customer
> expectations.  Gremlin's divergence from user expectations when it comes to
> null handling has been a constant headache for new users.  While I agree
> with Josh that a type system would make this easier, we still need to be
> consistent until we cross that bridge.
>
> For example, suppose you have a list, A, which is [1,2,null] and a list,
> B, which is [1,null]. Should the result of an INTERSECT be [1,null] or
> [1]?
>
> In Postgres, this would be [1,null], so that is probably what I would
> recommend unless someone has a strong opinion in favor of something else.
>
> Josh, I am familiar with the work you did on Dragon, and I am curious how
> you see your work aligning with the recent SIGMOD paper [1] from the LDBC
> working group on PG Schema?
>
> Dave
>
> [1] https://arxiv.org/abs/2211.10962
>
>
Re: [DISCUSS] Is null equal to null

2023-08-05 Thread Joshua Shinavier
Hi Ken,

Yes indeed, there is that push. I am not saying that Gremlin shouldn't have
a type system -- just that certain questions will have better answers once
it does. While I am not drawing a lot of attention to it yet in connection
with TinkerPop, there is a type system I am going to propose for TinkerPop.
The formalism is called Lambda Graph, and it is closely related to the
Algebraic Property Graphs [1] model which was implemented by Dragon [2]. I
made a big deal about Dragon three years ago and then was unable to release
it, so I'm waiting until Hydra [3] is completely ready before promoting it
here. That said, it's not far from being ready. We are building property
graph (not yet TinkerPop) applications with it at LinkedIn. I recently gave
a presentation [4] on the data model which has excerpts from the Lambda
Graph paper draft. In terms of property types, probably the first thing I
will explore is integrating Hydra's "TinkerPop" model [5] with TinkerPop
proper. In that model, property types are parameterized and unspecified, as
are vertex and edge id types; different type systems for properties and ids
can be plugged in here. For Hydra's core type system, see hydra/core.Type
[6]. This type system behaves as I described above: there are no "nulls",
but there are optionals, which are comparable to the extent that the base
type is comparable.

Josh

[1] https://arxiv.org/abs/1909.04881
[2] https://www.uber.com/blog/dragon-schema-integration-at-uber-scale/
[3] https://github.com/CategoricalData/hydra
[4]
https://docs.google.com/presentation/d/1PF0K3KtopV0tMVa0sGBW2hDA7nw-cSwQm6h1AED1VSA
[5]
https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/langs/tinkerpop/propertyGraph/package-summary.html
[6]
https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/core/Type.html


On Fri, Aug 4, 2023 at 6:23 PM Ken Hu wrote:

> Hi Josh,
>
> Thanks for your input. There seems to be a push in the graph database world
> towards having a schema. It's likely something like this would be
> introduced in TinkerPop in the future. Let's assume that TinkerPop does
> support schemas, and therefore would have a type system, would this change
> your opinion on the matter?
>
> Thanks again,
> Ken
>


Re: [DISCUSS] Is null equal to null

2023-08-02 Thread Joshua Shinavier
For what it is worth, I think the question of whether null == null is only
meaningful in the context of a specific type system, which Gremlin so far
does not provide. My personal preference is to avoid SQL-style nulls and
achieve optionality through union types (e.g. Java's Optional or Haskell's
Maybe). In the case of two lists, if you can assume that the type of the
list is list<optional<int>>, then you can safely treat null like
Optional.empty(), and compare it with another null of the same logical type
(int). If that is the interpretation of your two lists, then the
intersection is [1, null].
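
A minimal sketch of the two interpretations, in Python for illustration only. This is not Gremlin's implementation: the intersect function and its null_equals_null flag are hypothetical, Python's None plays the role of null / Optional.empty(), and dropping duplicates from the result is an assumption the thread does not address.

```python
def intersect(a, b, null_equals_null=True):
    """Order-preserving intersection of two lists.

    With null_equals_null=True, None behaves like an empty optional that is
    equal to another empty optional. With False, None never matches anything,
    mimicking an equality test in which NULL == NULL is not true.
    """
    result = []
    for item in a:
        if item is None:
            matched = null_equals_null and any(other is None for other in b)
        else:
            matched = any(other is not None and other == item for other in b)
        if matched and item not in result:
            result.append(item)
    return result


print(intersect([1, 2, None], [1, None]))                          # [1, None]
print(intersect([1, 2, None], [1, None], null_equals_null=False))  # [1]
```

Under the optional-typed reading, the first call is the relevant one: both nulls are empty optionals of the same base type, so they compare equal.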

Josh



On Tue, Aug 1, 2023 at 5:47 PM Ken Hu wrote:

> Hi All,
>
> As Gremlin evolves and gains more functionality, it is important that we
> establish some fundamental rules to provide consistency in results. One
> such question that we should come to agreement on is how null values are
> compared. Currently, Gremlin seems to mostly follow the comparison that is
> used in Java where NULL == NULL returns TRUE. However, in many other
> database systems, NULL == NULL would return FALSE (or NULL).
>
> This question comes about as I'm starting to look a little deeper into the
> proposed list functions. An example of where this is applicable is the
> INTERSECT list function. For example, if you have a list, A, which is
> [1,2,null] and a list, B, which is [1,null]. Should the result of an
> INTERSECT be [1,null] or [1]?
>
> I think it makes sense in Gremlin for us to follow the rule that most
> programming languages follow which is the former (NULL == NULL returns
> TRUE) because it feels more in line with how Gremlin was meant to be used
> (together with your code rather than as a string query). In this case the
> return value would be [1,null].
>
> What are your thoughts on this subject?
>
> Thanks,
> Ken
>


Re: [DISCUSS] Future of Neo4J-Gremlin

2023-07-22 Thread Joshua Shinavier
Thanks, Cole. I think that plan (deprecating Neo4j-Gremlin, but not
removing it until it becomes a nuisance) is a nice middle ground. Again
with the disclaimer that I am just one developer talking about his
applications, no: my projects which use Neo4j are old, and are not using
newer features of TinkerPop (though this is likely to change once
Hydra-Java [1] is ready for Gremlin interop). E.g. my personal knowledge
graph, Synchrony [2], has been in daily use since 2011, but the code tends
to be dormant for long periods. This is an application in maintenance mode
for which deleting Neo4j-Gremlin would mean the difference between not
needing to upgrade to the latest release of TinkerPop very often, and not
being able to (with the risk of eventually not being able to build). It may
or may not be representative of other legacy applications out there -- but
since Neo4j was a staple of TinkerPop for so long, I'd be more surprised if
there *aren't* a few other applications like it. Hard to say unless others
chime in. FWIW, I don't have any enterprise code which still uses
Neo4j-Gremlin.

Best regards,

Josh

[1] https://bit.ly/hydra-source
[2] https://github.com/synchrony


On Fri, Jul 21, 2023 at 5:10 PM Cole Greer wrote:

> Hi Josh,
>
> Thanks for responding. I think it makes sense to keep neo4j-gremlin around
> for that sort of use case. One question I have (mostly out of curiosity) is
> if you often find yourself using modern additions to TinkerPop (such as
> MergeV) with a legacy neo4j backend? I wonder if there are many users who
> would want all the latest features to TinkerPop and simply don’t mind if
> the backend graph is outdated.
>
> I still feel as if the right choice here is to mark neo4j-gremlin as
> deprecated though. Simply looking at the commit history shows that it has
> been on bare bones life support for years. Given the lack of investment in
> it and the fact that the primary dependency has dropped support, I don’t
> think we should be positioning neo4j-gremlin as something which is actively
> supported and that we recommend to users. I think that putting deprecation
> notices on it will give new users a more accurate representation of the
> current level of support for the plugin.
>
> That said I do agree that there is value in keeping neo4j-gremlin
> functioning if the maintenance burden is not too high. For that reason, I
> will withdraw my suggestion to eventually remove neo4j-gremlin from the
> repo. My suggestion is now that we mark it as deprecated but keep it in the
> repo including all the testing. We should maintain the current status quo
> of maintenance where, on occasion, moderate effort will be made to ensure
> compatibility with new features. I think we could maintain neo4j-gremlin in
> this state for the foreseeable future, until such a time that a new feature
> in TinkerPop requires an unreasonable amount of effort to work with
> neo4j-gremlin. In such a situation we can reach out to the community to see
> if anyone is interested in taking on the work of updating neo4j-gremlin,
> and if not at that time it would need to be removed.
>
> Let me know your thoughts on this suggestion.
>
> Thanks,
>
> Cole
>

Re: [DISCUSS] Future of Neo4J-Gremlin

2023-07-20 Thread Joshua Shinavier
Deprecating Neo4j support in Gremlin would definitely mark the end of an
era. I would be inclined to retain the support so long as it does not
represent a significant development burden, but this is only the
perspective of a *user* of legacy applications built with Neo4j-Gremlin.
There may be others who use Neo4j primarily as a back-end for TinkerPop,
not the other way around, and who are content with Neo4j 3.4 indefinitely,
or until legacy Neo4j does become unsupportable. If there aren't too many
of these dinosaurs, however, then the support will not be missed.

Josh



On Thu, Jul 20, 2023 at 9:35 AM Cole Greer wrote:

> Hi everyone,
>
> Just a quick update on this, I reached out to Michael Hunger (original
> author of neo4j-tinkerpop-api-impl) and he confirmed that Neo4j has shifted
> their focus away from gremlin support and they have no plans at this time
> to make further contributions to the project. Due to a lack of interest
> from both our community and Neo4j, I propose we deprecate neo4j-gremlin as
> of TinkerPop 3.7.0 with the intention of removing the module from the repo
> at a later date.
>
> Regards,
>
> Cole Greer
>
> From: Cole Greer 
> Date: Friday, July 14, 2023 at 2:38 PM
> To: Dev Tinkerpop 
> Subject: [DISCUSS] Future of Neo4J-Gremlin
> Hi everyone,
>
> With the addition of transaction support in TinkerGraph in 3.7, I think
> now is a good time to reassess the Neo4J-Gremlin module. To the best of my
> knowledge, Neo4J-Gremlin historically served 2 primary purposes. First it
> acted as a de facto reference implementation of gremlin transactions, and
> second it provided a convenient package for users to integrate Neo4J into
> the TinkerPop ecosystem (use of gremlin-console, server…). I don’t see a
> future for Neo4J-Gremlin in its current state in either of these roles.
> TinkerTransactionGraph will be filling the reference implementation role
> from now on, and the module has fallen so far out of support that I
> don’t see it offering much value for users either.
>
> Neo4J support is dependent on neo4j-tinkerpop-api-impl to interface with
> Neo4J. This library has not been updated in over 5 years now and is stuck
> on Neo4J 3.4 which dropped out of support in 2020. As it stands, the
> neo4j-gremlin plugin cannot operate on any modern version of Neo4J and thus
> in my opinion, is no longer a viable product for users.
>
> Without a concrete plan for a significant investment in upgrading
> neo4j-tinkerpop-api-impl, I believe we should move to drop support for
> Neo4J-Gremlin.
>
> I would like to ask any users or stakeholders of Neo4J-Gremlin to reply
> with their thoughts on the future of the module. I would like to know if
> the plugin still provides any value for anyone beyond what I already
> captured. If anyone has objections to the deprecation of the module, I ask
> that they be raised in this thread. Further I would love to hear if anyone
> is interested in driving efforts to modernize and support Neo4J-Gremlin.
>
> I plan to leave this discussion open for a while to solicit feedback from
> as many stakeholders as possible. If there are no objections raised, I will
> assume a lazy consensus in favor of the deprecation and removal of
> Neo4J-Gremlin from TinkerPop.
>
> Regards,
>
> Cole
>


Re: Video channels for Apache TinkerPop related content

2022-11-10 Thread Joshua Shinavier
A "TinkerPop" playlist might make sense in that case, though I only see
event-specific playlists in the channel, rather than project-specific ones,
currently.

Josh


On Thu, Nov 10, 2022 at 1:07 AM Divij Vaidya wrote:

> It's a great idea.
>
> We had a recent discussion in the Apache Kafka community (see:
> https://lists.apache.org/thread/v4m41pr1kl5fyhkxd5dx32th4kb14hpr) where we
> asked the ASF about adding videos to the Apache YouTube channel itself.
>
> Perhaps, we should reach out to press@ASF and do the same?
>
> Also note that we need to publish some guidelines about the videos on the
> website such as they should be vendor neutral with no branding/product
> placement etc.
>
> --
> Divij Vaidya
>
>
>
> On Wed, Nov 9, 2022 at 3:15 PM Joshua Shinavier wrote:
>
> > Sounds like a good idea (a YouTube channel; I haven't used Twitch so
> > can't comment). If the content is of high quality, we can promote it on
> > the Graph Show.
> >
> > Josh
> >
> >
> > On Wed, Nov 9, 2022 at 2:13 AM Florian Hockmann 
> > wrote:
> >
> > > Hi Taylor,
> > >
> > > great idea! I'd definitely welcome a TinkerPop channel on YouTube or
> > > Twitch. I think YouTube makes more sense for us at first given Twitch's
> > > focus on live streaming.
> > > Recorded Discord sessions would of course also be great to publish. I
> > > think more people are interested in them in general than people who
> > > actually join live. I personally would at least like to watch recordings of
> > > some sessions that I couldn't join when they were held.
> > > Do we already have recordings of past sessions available? But even if not,
> > > we can at least try to record sessions in the future.
> > >
> > > Creating such playlists of conference talks and so on also sounds great 😊
> > >
> > > I'd say we wait a few more days to give more people the chance to provide
> > > their feedback and then we can create a channel (assuming that the
> > > discussion here doesn't indicate something else).
> > >
> > > Regards,
> > > Florian
> > >
> > > -Original Message-
> > > From: Taylor Riggan 
> > > Sent: Friday, November 4, 2022 20:25
> > > To: dev@tinkerpop.apache.org
> > > Subject: Video channels for Apache TinkerPop related content
> > >
> > > Hi,
> > >
> > > Over the last few months we have started having Discord sessions related
> > > to new projects and features within the TinkerPop community.  I'm curious
> > > what folks think of creating an Apache TinkerPop YouTube/Twitch channel for
> > > hosting recordings of this content (and any future sessions).  It would
> > > also offer a platform for walk-through videos to help new users as they are
> > > getting started with TinkerPop/Gremlin.  On YouTube specifically, we could
> > > include Playlists of existing content related to TinkerPop that is hosted
> > > on other channels (conference and user group talks from the past).
> > >
> > > Looking forward to your feedback and comments.
> > >
> > > Cheers,
> > >
> > > Taylor Riggan
> > > @triggan
> > >
> > >
> >
>


Re: Video channels for Apache TinkerPop related content

2022-11-09 Thread Joshua Shinavier
Sounds like a good idea (a YouTube channel; I haven't used Twitch so can't
comment). If the content is of high quality, we can promote it on the Graph
Show.

Josh


On Wed, Nov 9, 2022 at 2:13 AM Florian Hockmann 
wrote:

> Hi Taylor,
>
> great idea! I'd definitely welcome a TinkerPop channel on YouTube or
> Twitch. I think YouTube makes more sense for us at first given Twitch's
> focus on live streaming.
> Recorded Discord sessions would of course also be great to publish. I
> think more people are interested in them in general than people who
> actually join live. I personally would at least like to watch recordings of
> some sessions that I couldn't join when they were held.
> Do we already have recordings of past sessions available? But even if not,
> we can at least try to record sessions in the future.
>
>  Creating such playlists of conference talks and so on also sounds great 😊
>
> I'd say we wait a few more days to give more people the chance to provide
> their feedback and then we can create a channel (assuming that the
> discussion here doesn't indicate something else).
>
> Regards,
> Florian
>
> -Original Message-
> From: Taylor Riggan 
> Sent: Friday, November 4, 2022 20:25
> To: dev@tinkerpop.apache.org
> Subject: Video channels for Apache TinkerPop related content
>
> Hi,
>
> Over the last few months we have started having Discord sessions related
> to new projects and features within the TinkerPop community.  I'm curious
> what folks think of creating an Apache TinkerPop YouTube/Twitch channel for
> hosting recordings of this content (and any future sessions).  It would
> also offer a platform for walk-through videos to help new users as they are
> getting started with TinkerPop/Gremlin.  On YouTube specifically, we could
> include Playlists of existing content related to TinkerPop that is hosted
> on other channels (conference and user group talks from the past).
>
> Looking forward to your feedback and comments.
>
> Cheers,
>
> Taylor Riggan
> @triggan
>
>


Re: Async capabilities to TinkerPop

2022-07-30 Thread Joshua Shinavier
e I believe we already have async
>> execution implemented in TinkerPop Java client. Let me try to clarify and
>> please let me know if I missed something.
>>
>> Java client uses a small number of websocket connections to multiplex
>> multiple queries to the server. You can think of it as a pipe established
>> to the server on which we could send messages belonging to different
>> queries. On the server, these messages are queued until one of the
>> execution threads can pick them up. Once a request is picked for execution,
>> the results are returned in a pipelined/streaming manner, i.e. the server
>> calculates a batch of results (size of batch is configurable per query),
>> and sends the results as messages on the same WebSocket channel. On the
>> client side, these results are stored in a queue until the application
>> thread consumes them using an iterator. This model of execution *does not
>> block the application thread* and hence, provides async capabilities.
>>
>> A sample code to achieve this would be as follows:
>>
>> ```
>> final Cluster cluster = Cluster.build("localhost")
>>     .port(8182)
>>     .maxInProcessPerConnection(32)
>>     .maxSimultaneousUsagePerConnection(32)
>>     .serializer(Serializers.GRAPHBINARY_V1D0)
>>     .create();
>>
>> try {
>>   final GraphTraversalSource g =
>>       traversal().withRemote(DriverRemoteConnection.using(cluster));
>>   // promise() submits the traversal and returns without waiting for results.
>>   final CompletableFuture<List<Object>> result = g.V().has("name", "pumba")
>>       .out("friendOf").id().promise(Traversal::toList);
>>
>>   // do some application layer stuff
>>   // ...
>>
>>   // join() is the only call here that blocks the calling thread.
>>   final List<Object> verticesWithNamePumba = result.join();
>>   System.out.println(verticesWithNamePumba);
>> } finally {
>>   cluster.close();
>> }
>> ```
>>
>> Note that, in the above example, the thread executing the above code is
>> not blocked until we call "result.join()".
>>
>> Does this address the use that Oleksandr brought up at the beginning of
>> this thread?
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Fri, Jul 29, 2022 at 4:05 AM Oleksandr Porunov <
>> alexandr.poru...@gmail.com> wrote:
>>
>> Hmm, that's interesting! Thank you Joshua for the idea!
>> So, I guess the general idea here could be:
>> we can start small and start implementing async functionality for some
>> parts instead of implementing async functionality for everything
>> straightaway.
>>
>> Oleksandr
>>
>> On Fri, Jul 29, 2022, 00:38 Joshua Shinavier  wrote:
>>
>> > Well, the wrapper I mentioned before did not require a full rewrite of
>> > TinkerPop :-) Rather, it provided async interfaces for vertices and edges,
>> > on which operations like subgraph and shortest paths queries were evaluated
>> > in an asynchronous fashion (using a special language, as it happened, but
>> > limited Gremlin queries would have been an option). So I think a basic
>> > async API might be a useful starting point even if it doesn't go very deep.
>> >
>> > Josh
>> >
>> >
>> > On Thu, Jul 28, 2022 at 4:21 PM Oleksandr Porunov <
>> > alexandr.poru...@gmail.com> wrote:
>> >
>> >> Hi Joshua and Pieter,
>> >>
>> >> Thank you for joining the conversation!
>> >>
>> >> I didn't actually look into the implementation details yet but quickly
>> >> checking Traversal.java code I think Pieter is right here.
>> >> For some reason I thought we could simply wrap synchronous method in
>> >> asynchronous, basically something like:
>> >>
>> >> // the method which should be implemented by a graph provider
>> >>
>> >> <T> Future<T> executeAsync(Callable<T> func);
>> >>
>> >> // E is the traversal's end type, as in Traversal<S, E>
>> >> default Future<E> asyncNext() {
>> >>     return executeAsync(this::next);
>> >> }
>> >>
>> >> but checking that code I think I was wrong about it. Different steps may
>> >> execute different logic (i.e. different underlying storage queries) for
>> >> different graph providers.
>> >> Thus, wrapping only terminal steps into async functions won't solve the
>> >> problem most likely.
>> >>
>> >> I guess it will requ

Re: Async capabilities to TinkerPop

2022-07-28 Thread Joshua Shinavier
Well, the wrapper I mentioned before did not require a full rewrite of
TinkerPop :-) Rather, it provided async interfaces for vertices and edges,
on which operations like subgraph and shortest paths queries were evaluated
in an asynchronous fashion (using a special language, as it happened, but
limited Gremlin queries would have been an option). So I think a basic
async API might be a useful starting point even if it doesn't go very deep.
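
A minimal sketch of what such a shallow async wrapper could look like, using only the JDK. All names here (SyncVertex, AsyncVertex, AsyncVertexWrapper) are hypothetical stand-ins for illustration; they are not TinkerPop interfaces and not the wrapper referred to above. The blocking call is simply moved onto an executor, so the caller gets a future back instead of waiting.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical synchronous vertex API (a stand-in, not TinkerPop's Vertex).
interface SyncVertex {
    String id();
    List<String> neighborIds();
}

// Hypothetical async view of the same vertex: lookups return futures.
interface AsyncVertex {
    String id();
    CompletableFuture<List<String>> neighborIds();
}

final class AsyncVertexWrapper implements AsyncVertex {
    private final SyncVertex delegate;
    private final ExecutorService executor;

    AsyncVertexWrapper(SyncVertex delegate, ExecutorService executor) {
        this.delegate = delegate;
        this.executor = executor;
    }

    @Override
    public String id() {
        return delegate.id();
    }

    @Override
    public CompletableFuture<List<String>> neighborIds() {
        // The blocking lookup runs on the executor, not on the caller's thread.
        return CompletableFuture.supplyAsync(delegate::neighborIds, executor);
    }
}

final class AsyncVertexDemo {
    static List<String> run() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            SyncVertex marko = new SyncVertex() {
                @Override public String id() { return "marko"; }
                @Override public List<String> neighborIds() { return List.of("josh", "vadas"); }
            };
            AsyncVertex async = new AsyncVertexWrapper(marko, executor);
            CompletableFuture<List<String>> pending = async.neighborIds();
            // The caller is free to do other work here before joining.
            return pending.join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

A wrapper like this only relocates the blocking; it does not make the underlying storage calls asynchronous.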

Josh


On Thu, Jul 28, 2022 at 4:21 PM Oleksandr Porunov <
alexandr.poru...@gmail.com> wrote:

> Hi Joshua and Pieter,
>
> Thank you for joining the conversation!
>
> I didn't actually look into the implementation details yet but quickly
> checking Traversal.java code I think Pieter is right here.
> For some reason I thought we could simply wrap synchronous method in
> asynchronous, basically something like:
>
> // the method which should be implemented by a graph provider
>
> <T> Future<T> executeAsync(Callable<T> func);
>
> // E is the traversal's end type, as in Traversal<S, E>
> default Future<E> asyncNext() {
>     return executeAsync(this::next);
> }
>
> but checking that code I think I was wrong about it. Different steps may
> execute different logic (i.e. different underlying storage queries) for
> different graph providers.
> Thus, wrapping only terminal steps into async functions won't solve the
> problem most likely.
>
> I guess it will require re-writing or extending all steps to be able to
> pass an async state instead of a sync state.
>
> I'm not familiar enough with the TinkerPop code yet to claim that, so
> probably I could be wrong.
> I will need to research it a bit more to find out but I think that Pieter
> is most likely right about a massive re-write.
>
> Nevertheless, even if that requires massive re-write, I would be eager to
> start the ball rolling.
> I think we either need to try to implement async execution in TinkerPop 3
> or start making some concrete decisions regarding TinkerPop 4.
>
> I see Marko A. Rodriguez started to work on RxJava back in 2019 here
> https://github.com/apache/tinkerpop/tree/4.0-dev/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava
>
> but the process didn't go as far as I understand. I guess it would be good
> to know if we want to completely rewrite TinkerPop in version 4 or not.
>
> If we want to completely rewrite TinkerPop in version 4 then I assume it
> may take quite some time to do so. In this case I would be more likely to
> say that it's better to implement async functionality in TinkerPop 3 even
> if it requires rewriting all steps.
>
> In case TinkerPop 4 is a redevelopment with breaking changes but without
> starting to rewrite the whole functionality then I guess we could try to
> work on TinkerPop 4 by introducing async functionality and maybe applying
> more breaking changes in places where it's better to re-work some parts.
>
> Best regards,
> Oleksandr
>
>
> On Thu, Jul 28, 2022 at 7:47 PM pieter gmail 
> wrote:
>
>> Hi,
>>
>> Does this not imply a massive rewrite of TinkerPop? In particular the
>> iterator chaining pattern of steps should follow a reactive style of
>> coding?
>>
>> Cheers
>> Pieter
>>
>>
>> On Thu, 2022-07-28 at 15:18 +0100, Oleksandr Porunov wrote:
>> > I'm interested in adding async capabilities to TinkerPop.
>> >
>> > There were many discussions about async capabilities for TinkerPop but
>> > there was no clear consensus on how and when it should be developed.
>> >
>> > The benefit for async capabilities is that the user calling a query
>> > shouldn't need its thread to be blocked to simply wait for the result of
>> > the query execution. Instead of that a graph provider should take care
>> > about implementation of async queries execution.
>> > If that's the case then many graph providers will be able to optimize their
>> > execution of async queries by handling less resources for the query
>> > execution.
>> > As a real example of potential benefit we could get I would like to point
>> > on how JanusGraph executes CQL queries to process Gremlin queries.
>> > CQL result retrieval:
>> >
>> > https://github.com/JanusGraph/janusgraph/blob/15a00b7938052274fe15cf26025168299a311224/janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/function/slice/CQLSimpleSliceFunction.java#L45
>> >
>> > As seen from the code above, JanusGraph already leverages async
>> > functionality for CQL queries under the hood but JanusGraph is required to
>> > process those queries in synced manner, so what JanusGraph does - it simply
>> > blocks the whole executing thread until result is returned instead of using
>> > async execution.
>> >
>> > Of course, that's just a case when we can benefit from async execution
>> > because the underneath storage backend can process async queries. If a
>> > storage backend can't process async queries then we won't get any benefit
>> > from implementing a fake async executor.
>> >
>> > That said, I believe quite a few graph providers may be

Re: Async capabilities to TinkerPop

2022-07-28 Thread Joshua Shinavier
Hi Oleksandr,

I agree about the long-standing need for async queries. A "fake" async API
for TinkerPop was one of the first things we had to build when I first started
at Uber in 2017 (using JanusGraph on Cassandra, and later an in-house
Cassandra-based graph DB). Feel free to share an early version of your
proposal here, or post a link to a design doc; I would be happy to be in
the loop from an interoperability point of view -- e.g. making sure that
async APIs in different languages are analogous. Callbacks / promise-based
RPC would have been my first thought, as well.

Josh


On Thu, Jul 28, 2022 at 7:18 AM Oleksandr Porunov <
alexandr.poru...@gmail.com> wrote:

> I'm interested in adding async capabilities to TinkerPop.
>
> There were many discussions about async capabilities for TinkerPop but
> there was no clear consensus on how and when it should be developed.
>
> The benefit for async capabilities is that the user calling a query
> shouldn't need its thread to be blocked to simply wait for the result of
> the query execution. Instead of that a graph provider should take care
> about implementation of async queries execution.
> If that's the case then many graph providers will be able to optimize their
> execution of async queries by handling less resources for the query
> execution.
> As a real example of potential benefit we could get I would like to point
> on how JanusGraph executes CQL queries to process Gremlin queries.
> CQL result retrieval:
>
> https://github.com/JanusGraph/janusgraph/blob/15a00b7938052274fe15cf26025168299a311224/janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/function/slice/CQLSimpleSliceFunction.java#L45
>
> As seen from the code above, JanusGraph already leverages async
> functionality for CQL queries under the hood but JanusGraph is required to
> process those queries in synced manner, so what JanusGraph does - it simply
> blocks the whole executing thread until result is returned instead of using
> async execution.
>
> Of course, that's just a case when we can benefit from async execution
> because the underneath storage backend can process async queries. If a
> storage backend can't process async queries then we won't get any benefit
> from implementing a fake async executor.
>
> That said, I believe quite a few graph providers may benefit from having a
> possibility to execute queries in async fashion because they can optimize
> their resource utilization.
> I believe that we could have a feature flag for storage providers which
> want to implement async execution. Those who can't implement it or don't
> want to implement it may simply disable async capabilities which will
> result in throwing an exception anytime an async function is called. I
> think it should be fine because we already have some feature flags like
> that for graph providers. For example "Null Semantics" was added in
> TinkerPop 3.5.0 but `null` is not supported for all graph providers. Thus,
> a feature flag for Null Semantics exists like
> "g.getGraph().features().vertex().supportsNullPropertyValues()".
> I believe we can enable async in TinkerPop 3 by providing async as a
> feature flag and letting graph providers implement it at their will.
> Moreover if a graph provider wants to have async capabilities but their
> storage backends don't support async capabilities then it should be easy to
> hide async execution under an ExecutorService which mimics async execution.
> I believe we could do that for TinkerGraph so that users could experiment
> with async API at least. I believe we could simply have a default "async"
> function implementation for TinkerGraph which wraps all sync executions in
> a function and sends it to that ExecutorService (we can discuss which one).
> In such a case TinkerGraph will support async execution even without real
> async functionality. We could also potentially provide some configuration
> options to TinkerGraph to configure thread pool size, executor service
> implementation, etc.
>
> I didn't think about how it is better to implement those async capabilities
> for TinkerPop yet but I think reusing a similar approach like in Node.js
> which returns Promise when calling Terminal steps could be good. For
> example, we could have a method called `async` which accepts a termination
> step and returns a necessary Future object.
> I.e.:
> g.V(123).async(Traversal::next)
> g.V().async(Traversal::toList)
> g.E().async(Traversal::toSet)
> g.E().async(Traversal::iterate)
>
> I know that there were discussions about adding async functionality to
> TinkerPop 4 eventually, but I don't see strong reasons why we couldn't add
> async functionality to TinkerPop 3 with a feature flag.
> It would be really great to hear some thoughts and concerns about it.
>
> If there are no concerns, I'd like to develop a proposal for further
> discussion.
>
> Best regards,
> Oleksandr Porunov
>
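
The two ideas in the quoted proposal, a provider feature flag and an ExecutorService that mimics async execution, could be sketched roughly as follows. This is a JDK-only illustration under stated assumptions: AsyncSupport, its supportsAsync flag, and the async(...) method are hypothetical names, not existing TinkerPop API, and the constant 6 merely stands in for the result of a terminal step.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical provider-side hook; none of these names are TinkerPop API.
final class AsyncSupport {
    private final boolean supportsAsync;   // stand-in for a provider feature flag
    private final ExecutorService executor;

    AsyncSupport(boolean supportsAsync, ExecutorService executor) {
        this.supportsAsync = supportsAsync;
        this.executor = executor;
    }

    <T> CompletableFuture<T> async(Callable<T> terminalStep) {
        if (!supportsAsync) {
            // Providers that opt out fail fast, as the proposal suggests.
            throw new UnsupportedOperationException("async execution is disabled for this provider");
        }
        // "Fake" async: the synchronous terminal step runs on a pool thread.
        return CompletableFuture.supplyAsync(() -> {
            try {
                return terminalStep.call();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, executor);
    }
}

final class AsyncSupportDemo {
    static int run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            AsyncSupport support = new AsyncSupport(true, pool);
            // The lambda stands in for a synchronous terminal step such as a count.
            CompletableFuture<Integer> count = support.async(() -> 6);
            return count.join();
        } finally {
            pool.shutdown();
        }
    }

    static boolean disabledThrows() {
        AsyncSupport support = new AsyncSupport(false, null);
        try {
            support.async(() -> 1);
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(disabledThrows());
    }
}
```

As the proposal itself notes, this kind of executor-backed fallback only helps users experiment with an async API; it brings no resource benefit when the storage backend is synchronous.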


Re: Design proposal to use Arrow Flight as transport for Gremlin Server

2022-06-30 Thread Joshua Shinavier
Hi Valentyn,

Thank you for the proposal/summary. Leo Meyerovich and others have
previously suggested adding Arrow support to TinkerPop; it just hasn't been
prioritized. I like everything about your description apart from this
phrase: "should replace the network layer with Arrow Flight". You are not
suggesting that the WebSocket-based solution be removed, are you? If the
two could exist in parallel, it definitely would be nice to have an Arrow
option. WebSocket could perhaps be dropped later if it isn't being used
much and/or the maintenance burden is too high. Just my $0.02.

Josh



On Thu, Jun 30, 2022 at 4:36 PM Valentyn Kahamlyk
 wrote:

> Hello Everyone,
>
> I would like to propose exploring options to use Arrow Flight as a transport 
> for Gremlin Server. Currently Gremlin Server and Clients are based on 
> WebSockets with a custom sub-protocol and serialization to GraphSON and 
> GraphBinary.  Developers for each driver must implement those protocols from 
> scratch and there is a limited amount of code which is being reused (only 3rd 
> party WebSocket libraries are currently reused in the client variants). The 
> protocol implementation is a complicated and error-prone process, so most 
> drivers only support some subset of Gremlin Server features. The maintenance 
> cost is also constantly increasing with the number of new client variants 
> being added to TinkerPop.
>
> ** Motivation **
> We would like to propose a solution to reduce maintenance and simplify the 
> development of the client drivers by using a standard protocol based on the 
> Apache Arrow Flight. As Arrow Flight is implemented in the most common 
> languages like C++, C#, Java and Python we anticipate a larger amount of 
> existing codebase can be reused which would help to reduce maintenance costs 
> in the future. Also, we can reuse some other Arrow Flight features like 
> authentication and error handling.
>
> ** Assumptions **
> Proof of Concept Development will be done with Java 8.
> Need to reuse existing code as much as possible.
> It is desirable, but not necessary, to maintain compatibility with existing 
> drivers.
> To simplify development at the initial stage, we will reuse existing 
> serialization mechanisms.
>
> ** Requirements **
> Gremlin Server and drivers should replace the network layer with Arrow Flight.
> No significant drop in performance.
> Gremlin Arrow must pass the Gherkin test suite.
>
> ** Prototype Design Overview **
> We would like to explore the solution below and create a prototype to prove
> that the approach is feasible.
> The main idea is to replace the transport layer with FlightServer and
> FlightClient. They support asynchronous data transfer, splitting data into
> chunks, and authorization. While Arrow Flight typically requires a schema, in
> the short term we can proceed with an implementation using the existing
> serializers and the GraphBinary format. By using GraphBinary we will not have
> all the capabilities that Arrow Flight provides out of the box, like efficient
> compression. However, in the future, we see the value of adding capabilities
> to generate a schema on the server side, which can enable additional use cases.
>
> First stage: replace transport layer, but keep serializers
> Pros:
> Reduction of the code base to be developed and maintained
> A relatively low number of modifications
>
> Cons:
> We may observe reduced performance due to schema transfer and other overhead. 
> As part of the PoC we will assess performance overhead for small and large 
> responses and identify options to mitigate it.
> Still need to support GraphBinary serialization.
>
> Second stage: replace transport layer, make dynamic schema generation and use 
> native Arrow structures for data transmission
> Pros:
> Greater reduction of the codebase to be developed and maintained
> In addition, need to rework the serialization and add schema generation
> Performance can be improved for large data sets due to Arrow Flight 
> optimizations and the ability to transfer data in parallel
> No need to support GraphBinary and GraphSON serialization protocols
>
> Cons:
> Reduced performance for small result sets
> Can be complicated and expensive to generate a schema for each request
>
> Please find a few more diagrams in the attached PDF file, and please
> share your thoughts.
>
> Regards, Valentyn
>
>


Re: A meta model for gremlin's property graph

2022-01-16 Thread Joshua Shinavier
e is constructed using two parameters: signedness and
precision. This allows an unlimited number of integer types) because it's
just simpler, and simplifies the supporting code you have to write.



> Same critique as above. Letting in another language means gremlin does not
> bootstrap itself.
>

Similar response as above. You're defining a language whether you like it
or not. The terms in your language are "Graph", "EdgeProperty", etc. You're
using Gremlin as the medium for expressing the language, but you're still
creating something new. The "something new" is the language I am talking
about, not the Gremlin syntax you're using to define it.


> I don't see your approach of embedding model definitions and constraints
> natively in Gremlin as being at odds with having a formal data model.
>
>
> Afraid I do see as being at odds with one another. Describing gremlin
> using another language, be it MOF/EMF/category theory is a very big
> difference to it being self describing. If we decide against gremlin self
> describing then we abort this attempt, no point in hacking it.
>

Not sure we fully understood each other, but it's your idea; I'm just
giving you the requested feedback.



> For what it's worth this is a bit of a proof of concept. To see if gremlin
> can meaningfully self describe. It has done so for the last 10 years.
>

I think it's a worthwhile thing to do, though when you say it like that, I
have to comment that making *Gremlin* self-describe is a much, much (much)
bigger problem than defining a schema language within Gremlin. I think both
problems are solvable, but the former is definitely a TinkerPop 4
proposition.



> Perhaps we should, however, before discussing the merits of this approach
> or another, first decide what we are trying to achieve in the first place.
>

+1



> Here goes my understanding of what we are trying to achieve.
>
> 1: A property graph meta model. To describe exactly what kind of data
> structure the gremlin language operates on.
>

+1



> 2: Gremlin grammar together with the documentation specifies gremlin the
> language fully.
>

The surface syntax of the language (enough for expressing your schema
constraints), yes.



> 3: Extend the gremlin grammar to specify schema create/edit/delete
> functionality.
>

Why is that necessary, if you're embedding schemas in the graph? Just embed
them in the graph. We don't have extra grammar for updating other types of
graphs.



> 4: Extend the grammar to query the schema. (This can be plain gremlin,
> just operating at the schema level)
>

Yeah, just plain Gremlin.



> 5: A language agnostic specification of how to interact with a remote
> gremlin enabled system. i.e. similar to the jdbc specification only without
> reference to any particular language.
>

Seems orthogonal to the language, and generation of constraints into
Gremlin syntax.


> As an aside, breaking user space should not even be considered. i.e. 99%
> backward compatibility should be guaranteed at all times.
>

I think you can do what you are proposing with no changes at all to the
Gremlin language.


Josh




>
>
> On Tue, 2022-01-11 at 10:47 -0800, Joshua Shinavier wrote:
>
> Hey Pieter,
>
> Good to see some more motion on this front. Responses inline.
>
>
> On Sun, Jan 9, 2022 at 4:28 AM pieter gmail 
> wrote:
>
> Hi,
>
> I have done some work on defining a meta model for Gremlin's property
> graph. I am using the approach used in the modelling world, in particular
> as done by the OMG <https://www.omg.org/> group when defining their
> various meta models and specifications.
>
>
> +1 to using or drawing upon standards where we can. For those of us
> (including me) who have not worked with OMG standards other than
> occasionally bumping into UML, which parts of the approach you describe
> below were influenced by OMG?
>
>
>
> However where OMG uses a subset of the UML to define their meta models I
> suggest we use Gremlin. After all Gremlin is the language we use to
> describe the world and the property graph meta model can also be described
> in Gremlin.
>
>
> I agree, as long as these descriptions do not admit "arbitrary Gremlin".
> The problem right now is that Gremlin's declarative semantics aren't very
> clear, and it is a relatively complex language. I totally agree that you
> could define a DSL for defining models which could be embedded in Gremlin;
> you could even define the DSL in terms of itself.
>
>
>
> I propose that we have 3 levels of modelling. Each of which can itself be
> specified in gremlin.
>
> 1: The property graph meta model.
>
>
> +1
>
>
>
> 2: The model.
>
>

Re: A meta model for gremlin's property graph

2022-01-11 Thread Joshua Shinavier
Hey Pieter,

Good to see some more motion on this front. Responses inline.


On Sun, Jan 9, 2022 at 4:28 AM pieter gmail  wrote:

> Hi,
>
> I have done some work on defining a meta model for Gremlin's property
> graph. I am using the approach used in the modelling world, in particular
> as done by the OMG <https://www.omg.org/> group when defining their
> various meta models and specifications.
>

+1 to using or drawing upon standards where we can. For those of us
(including me) who have not worked with OMG standards other than
occasionally bumping into UML, which parts of the approach you describe
below were influenced by OMG?



> However where OMG uses a subset of the UML to define their meta models I
> suggest we use Gremlin. After all Gremlin is the language we use to
> describe the world and the property graph meta model can also be described
> in Gremlin.
>

I agree, as long as these descriptions do not admit "arbitrary Gremlin".
The problem right now is that Gremlin's declarative semantics aren't very
clear, and it is a relatively complex language. I totally agree that you
could define a DSL for defining models which could be embedded in Gremlin;
you could even define the DSL in terms of itself.



> I propose that we have 3 levels of modelling. Each of which can itself be
> specified in gremlin.
>
> 1: The property graph meta model.
>

+1



> 2: The model.
>

I like the term "schema".



> 3: The graph representing the actual data.
>

+1. Not only is the graph a "model", but depending on how you define the
modeling DSL, you can also see the other two models as "graphs", with types
as elements.



> 1) The property graph meta model describes the nature of the property
> graph itself. i.e. that property graphs have vertices, edges and properties.
>

I agree, and I think there is value in going one step further to create a
general purpose data model for defining data models, with property graphs
as a special case.



> 2) The model is an instance of the meta model. It describes the schema of
> a particular graph. i.e. for TinkerPop's modern graph this would be
> 'person', 'software', 'created' and 'knows' and the various properties
> 'weight', 'age', 'name' and 'lang' properties.
>

+1


> 3) The final level is an instance of the model. It is the actual graph
> itself. i.e. for TinkerPop's modern graph it is 'Marko', 'Josh', 'java' ...
>

Yes. So to elaborate on what I said above about models and graphs, let's
say we add a schema to the TinkerPop classic graph. The classic graph is an
instance of the schema, and the schema is an instance of a property graph
schema. Your three models are three graphs:
1) the classic graph ("data graph") has elements "Marko", "Josh", "ripple"
etc. each of which is a value together with a type and a name (id). The
type of Marko is "Person" (a named type) and the type of ripple is
"Project" etc. The value of Marko is the record {"name": "marko", "age":
29} while the value of ripple is {"name": "ripple", "lang": "java"}.
2) the schema of the classic graph ("schema graph") has elements "Person",
"Project", "knows", and "created". These again are values together with
types and ids. E.g. the type of "Person" is something like {"name": string,
"age": int32}, i.e. a record type.
3) the schema of the schema of the classic graph -- i.e. the core model or
what you called the meta model -- is again a graph with elements like
"Type", "Element", etc. Type expressions in the schema of the classic graph
are values in the core model. The core model is its own schema.

Decide for yourself if the above makes sense to you, but this is how I
think of the TinkerPop modeling layer cake these days -- as chained models
in which the schema of one graph is the data of the next, usually arriving
at a fixpoint -- the core -- within two steps.
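
One simplified reading of this layer cake, sketched in plain Java. Everything here is a hypothetical illustration (Element, LayerCake, and the way a record type is stored as a map of field names to type names are assumptions, not part of any TinkerPop model): each element carries an id, the id of its type, and a value, and following "type of" links from a data element reaches a self-typed core element within two steps.

```java
import java.util.Map;

// Hypothetical illustration: every element has an id, a type (by id), and a value.
final class Element {
    final String id;
    final String typeId;
    final Map<String, Object> value;

    Element(String id, String typeId, Map<String, Object> value) {
        this.id = id;
        this.typeId = typeId;
        this.value = value;
    }
}

final class LayerCake {
    // Core model: "Type" is typed by itself, i.e. the core is its own schema.
    static final Map<String, Element> CORE = Map.of(
        "Type", new Element("Type", "Type", Map.of()));

    // Schema graph: named types whose values are record-type expressions.
    static final Map<String, Element> SCHEMA = Map.of(
        "Person", new Element("Person", "Type", Map.of("name", "string", "age", "int32")),
        "Project", new Element("Project", "Type", Map.of("name", "string", "lang", "string")));

    // Data graph: elements whose values are records and whose types are named types.
    static final Map<String, Element> DATA = Map.of(
        "marko", new Element("marko", "Person", Map.of("name", "marko", "age", 29)),
        "ripple", new Element("ripple", "Project", Map.of("name", "ripple", "lang", "java")));

    // Resolve an element's type in the next layer up (schema first, then core).
    static Element typeOf(Element e) {
        Element t = SCHEMA.get(e.typeId);
        return t != null ? t : CORE.get(e.typeId);
    }

    // Count "type of" hops until reaching an element that is its own type.
    static int stepsToFixpoint(Element e) {
        int steps = 0;
        while (true) {
            Element t = typeOf(e);
            if (t.id.equals(e.id)) {
                return steps;
            }
            e = t;
            steps++;
        }
    }

    public static void main(String[] args) {
        System.out.println(stepsToFixpoint(DATA.get("marko")));
        System.out.println(stepsToFixpoint(SCHEMA.get("Person")));
    }
}
```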



> 1: Property Graph Meta Model
>
> public static Graph gremlinMetaModel() {
>     enum GremlinDataType {
>         STRING,
>         INTEGER,
>         DOUBLE,
>         DATE,
>         TIME
>         //...
>     }
>
>
Cool, except that I would banish types like Date and Time from the core
model. Drawing the line between primitive types and derived types is more
art than science, but there is enough variation in what developers want out
of dates/times that I put them on the other side of the fence. It also
makes implementations easier if you have as few baked-in types as possible.
On the other hand, I suggest adding many more numeric types, e.g. for
integers:

- bigint
- int8
- int16
- int32
- int64
- uint8
- uint16
- uint32
- uint64

and for floating-point numbers:

- bigfloat
- float32
- float64
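
As a rough illustration of why the distinctions matter, the following Python sketch records the value range of each integer type and shows what a float32 cannot represent. The names INTEGER_RANGES, fits and to_float32 are hypothetical helpers for this example only; bigfloat is left out because I am not assuming any particular arbitrary-precision library.

```python
import struct

# Inclusive (min, max) ranges per integer type; None marks an unbounded type.
INTEGER_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
    "uint8": (0, 2**8 - 1),
    "uint16": (0, 2**16 - 1),
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
    "bigint": None,
}


def fits(type_name: str, value: int) -> bool:
    """Return True if value is representable in the named integer type."""
    bounds = INTEGER_RANGES[type_name]
    if bounds is None:
        return True
    low, high = bounds
    return low <= value <= high


def to_float32(value: float) -> float:
    """Round a Python float (a float64) to the nearest float32 and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(fits("int8", 128), fits("uint8", 255), fits("bigint", 2**100))
print(to_float32(0.5) == 0.5, to_float32(0.1) == 0.1)
```

The float32 round trip is exact for 0.5 but not for 0.1, which is the kind of difference a single DOUBLE type hides.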


[snip metamodel definition]
>
>
> This can be visualized as,
> ...
>


I'm not sure if I'm reading this correctly, and I can't see the figure yet,
but I understand that you are defining the metamodel as a graph. Cool.




> Notes:
> 1) GremlinDataType is an enumeration of named data types that G

Re: [DISCUSS] ASF Board Draft Report - October 2021

2021-10-03 Thread Joshua Shinavier
I don't care to continue such a nasty conversation on a public list.

Josh


On Sun, Oct 3, 2021 at 9:50 AM Marko Rodriguez  wrote:

> Josh — if you are going to come onto this project, put your name on it,
> then you need to do something. You promised me for 2 years that you would
> work on TinkerPop 4. You haven’t. You lied. I have no respect for liars.
> That is my problem with you. You are ineffectual at best, two-faced at
> worst. You want people to respect you with your accolades, but you don’t do
> anything to earn said accolades. People who seek the respect of others are
> not leaders, they are weak souls who will, when the time is right, do what
> is best for them.
>
> Marko.
>
>
>
> > On Oct 3, 2021, at 10:42 AM, Joshua Shinavier  wrote:
> >
> > Marko, why you are so concerned with what I am doing or not doing is
> beyond
> > me. Likewise, you make vague accusations against Stephen on this list,
> > before calling him a "shining star"? How "nepotism", exactly? I commented
> > earlier that you had "self-cancelled" because with all of this behavior,
> > you seemed to be daring the rest of the world to take offense / get
> annoyed
> > / shut you out. I saw your edgy Twitter posts as an exercise in free
> > speech, and I was against your removal from the PMC as you well know. Bad
> > things were bound to happen, though; you are making an ass of yourself in
> > every public forum, and obviously that does not reflect well on you. If
> you
> > want others to respect you and follow you, quit trying to tear others
> down,
> > and get back to producing. I don't understand your rage against Amazon,
> but
> > your ideas around the mm-ADT "economic machine" seemed good -- why not
> > execute on them.
> >
> > Josh
> >
> >
> > On Sun, Oct 3, 2021 at 7:59 AM Marko Rodriguez wrote:
> >
> >> Hello,
> >>
> >> This looks good, though I think we should add some items regarding
> project
> >> leadership.
> >>
> >> First off, Tibco has been a TinkerPop-enabled graph database for over 5
> >> years now. So that is nothing new.
> >>
> >> Next, we should alert the Apache Board about the lack of contributions
> by
> >> recently elected PMC members. More generally, why is the project
> removing
> >> contributing members and replacing them with non-contributing members? I
> >> bring up Josh in particular. Of his performance of late, I’ve noted a
> >> single "VOTE +1” for a .toString() pull request by Stephen. Given the
> >> response time to the PR, there wasn’t even sufficient time for Josh to
> have
> >> compiled and tested the PR. This goes counter to what Stephen was
> arguing
> >> to me (Marko) earlier regarding why the PMC members were elected — they
> are
> >> needed to test the code, not necessarily contribute
> code/documentation/blog
> >> posts/academic articles/etc. So… what is the truth here? What’s going on
> >> with the leadership of this project? I believe this project is losing
> the
> >> meritocracy that Apache so holds dear for "nepotism" (not genetic
> nepotism,
> >> but through corporate affiliation). However, if “nepotism" is the
> direction
> >> Apache is going, then I think this should be made clear as it’s
> fraudulent
> >> to be underhanded about the reasoning behind the decisions being made
> for
> >> the project. Finally, this might also be the reasoning why I was removed
> >> from the project given my lack of support for Amazon in the OSS
> community
> >> [1].
> >>
> >> Thank you Stephen for your efforts on TinkerPop. You are a shining star.
> >>
> >> Marko.
> >>
> >> [1]
> >> https://www.slideshare.net/slidarko/mmadt-a-virtual-machinean-economic-machine
> >> [slides 1-37]
> >>- Please note that these slides are no longer indexed by Google.
> >> All other project slides/articles etc. are.
> >>- Unfortunate that large companies would be threatened by such
> >> small individuals. Is this what is happening with TinkerPop?
> >>
> >>
> >>> On Oct 1, 2021, at 4:13 PM, Stephen Mallette 
> >> wrote:
> >>>
> >>> Here is the attached draft of o

Re: [DISCUSS] ASF Board Draft Report - October 2021

2021-10-03 Thread Joshua Shinavier
Marko, why you are so concerned with what I am doing or not doing is beyond
me. Likewise, you make vague accusations against Stephen on this list,
before calling him a "shining star"? How "nepotism", exactly? I commented
earlier that you had "self-cancelled" because with all of this behavior,
you seemed to be daring the rest of the world to take offense / get annoyed
/ shut you out. I saw your edgy Twitter posts as an exercise in free
speech, and I was against your removal from the PMC as you well know. Bad
things were bound to happen, though; you are making an ass of yourself in
every public forum, and obviously that does not reflect well on you. If you
want others to respect you and follow you, quit trying to tear others down,
and get back to producing. I don't understand your rage against Amazon, but
your ideas around the mm-ADT "economic machine" seemed good -- why not
execute on them.

Josh


On Sun, Oct 3, 2021 at 7:59 AM Marko Rodriguez  wrote:

> Hello,
>
> This looks good, though I think we should add some items regarding project
> leadership.
>
> First off, Tibco has been a TinkerPop-enabled graph database for over 5
> years now. So that is nothing new.
>
> Next, we should alert the Apache Board about the lack of contributions by
> recently elected PMC members. More generally, why is the project removing
> contributing members and replacing them with non-contributing members? I
> bring up Josh in particular. Of his performance of late, I’ve noted a
> single "VOTE +1” for a .toString() pull request by Stephen. Given the
> response time to the PR, there wasn’t even sufficient time for Josh to have
> compiled and tested the PR. This goes counter to what Stephen was arguing
> to me (Marko) earlier regarding why the PMC members were elected — they are
> needed to test the code, not necessarily contribute code/documentation/blog
> posts/academic articles/etc. So… what is the truth here? What’s going on
> with the leadership of this project? I believe this project is losing the
> meritocracy that Apache so holds dear for "nepotism" (not genetic nepotism,
> but through corporate affiliation). However, if “nepotism" is the direction
> Apache is going, then I think this should be made clear as it’s fraudulent
> to be underhanded about the reasoning behind the decisions being made for
> the project. Finally, this might also be the reasoning why I was removed
> from the project given my lack of support for Amazon in the OSS community
> [1].
>
> Thank you Stephen for your efforts on TinkerPop. You are a shining star.
>
> Marko.
>
> [1]
> https://www.slideshare.net/slidarko/mmadt-a-virtual-machinean-economic-machine
> [slides 1-37]
> - Please note that these slides are no longer indexed by Google.
> All other project slides/articles etc. are.
> - Unfortunate that large companies would be threatened by such
> small individuals. Is this what is happening with TinkerPop?
>
>
> > On Oct 1, 2021, at 4:13 PM, Stephen Mallette 
> wrote:
> >
> > Here is the attached draft of our board report for this quarter.
> >
> >
> --
> >
> > ## Description:
> > Apache TinkerPop is a graph computing framework for both graph databases
> > (OLTP) and graph analytic systems (OLAP).
> >
> > ## Activity:
> > TinkerPop released 3.4.12 and 3.5.1 on July 19, 2021. These releases
> came a
> > bit earlier than expected to address a bug implementers had encountered
> in
> > 3.5.0. While the bug had a relatively simple workaround and did not
> > particularly affect end users, there was consensus in the community to
> > release sooner than later. These changes did include some minor
> enhancements
> > as well. After 3.5.1 released, it was announced that JanusGraph became
> the
> > first graph provider to support the 3.5.x release line.
> >
> > Development on 3.4.13, 3.5.2 and 3.6.0 is all well underway and it would
> be
> > likely that we'd see releases of at least 3.4.13 and 3.5.2 this year. It
> is
> > also likely that we will be reaching the end of the 3.4.x line of
> > maintenance.
> >
> > We've recently become aware of two new TinkerPop implementations in the
> > Tibco Graph Database[1] and ArcadeDB[2]. That brings the total number of
> > graph systems supporting TinkerPop to thirty.
> >
> > We are aware that our committer growth has been slow and are considering
> > ideas to improve our ability to attract and retain folks.
> >
> > ## Issues:
> > There are no issues requiring board attention at this time.
> >
> > ## Releases:
> > - 3.4.12 (July 19, 2021)
> > - 3.5.1 (July 19, 2021)
> >
> > ## PMC/Committer:
> > - Last PMC addition was Kelvin Lawrence/Josh Shinavier - June 2021
> > - Last committer addition was Øyvind Sæbø - March 2021
> >
> > ## Links
> > [1] https://www.tibco.com/products/tibco-graph-database/
> > [2] https://arcadedb.com/
>
>


[jira] [Commented] (TINKERPOP-2596) datetime function

2021-09-18 Thread Joshua Shinavier (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417070#comment-17417070
 ] 

Joshua Shinavier commented on TINKERPOP-2596:
-

Option 2 kinda makes sense, if "bare" timestamps are allowed at all. Even with 
full date-times, there is some context dependence if the time zone is omitted – 
it defaults to a "local" time zone, which has to be predefined. We could assume 
that a reference timestamp is also predefined, and bound to a specific instant 
near the beginning of query evaluation. You could have even weirder things than 
hours and minutes without a date, e.g. seconds and milliseconds without 
minutes. Is it worth trying to support these oddball cases? Idk, but you could 
do it...
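
A minimal sketch of the reference-timestamp idea, assuming a bare "hh:mm" value takes its date from a reference instant expressed in a predefined local zone. The function resolve_bare_time and its parameters are hypothetical names for this example; nothing here is part of the proposed grammar.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def resolve_bare_time(text: str, reference: datetime, local_zone: tzinfo) -> datetime:
    """Fill in the missing date of an 'hh:mm' or 'hh:mm:ss' value.

    The date comes from the reference instant as seen in the local zone,
    and the result carries that zone.
    """
    bare = time.fromisoformat(text)
    local_reference = reference.astimezone(local_zone)
    return datetime.combine(local_reference.date(), bare, tzinfo=local_zone)


# Reference instant fixed near the start of query evaluation (assumed).
reference = datetime(2021, 9, 18, 23, 30, tzinfo=timezone.utc)
local_zone = timezone(timedelta(hours=5))

print(resolve_bare_time("01:12", reference, local_zone).isoformat())
```

Note that the resolved date depends on the local zone: the reference instant is still September 18 in UTC but already September 19 at +05:00, which is the context dependence mentioned above.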

> datetime function
> -
>
> Key: TINKERPOP-2596
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2596
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: language
>Affects Versions: 3.5.1
>Reporter: Stephen Mallette
>Priority: Major
>
> Include a {{datetime()}} function in the grammar that will parse a ISO-8601 
> formatted date:
> {code}
> datetime('2021-07-21')
> datetime('2021-07-21T01:12:59')
> datetime('2021-07-21T01:12:59+0500')
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Anything I could do to help?

2021-09-08 Thread Joshua Shinavier
Marko, I doubt anyone thinks you're actually a Nazi racist. Why you chose
to self-cancel like that, the world may never know, but you might have
shown more consideration toward those who wanted to support you in spite of
all the craziness. I don't see us working together so soon after these
weird rants on the dev list, but I won't speak for anyone else. You're
still a TinkerPop contributor. Go ahead and do something.

Josh





On Tue, Sep 7, 2021 at 3:34 PM Marko Rodriguez  wrote:

> Hi guys/gals,
>
> Looks like it’s just been Stephen nick-nacking away again as it’s been the
> last few years. Given the recent big turnover in management, I was hoping
> to eat my own words and see some performance out of Josh, but unfortunately
> as given the last 15+ years, 'talk and walk’ (which is even worse than
> ‘commit and split’). Given that Amazon Neptune is including openCypher in
> their distribution and with Neo4j just took in a whomping $300+ million in
> a Series , seems Apache TinkerPop will be falling to the
> wayside unless some real innovation happens.
>
> As such, perhaps I could offer a helping hand given my intimate knowledge
> of the codebase and my master of the theory and history of graph computing
> that I helped formulate over the last 15 years. With that said, I
> completely understand if y’all need to hold to the narrative that I’m a
> “Nazi racist” and thus, unworthy of contributing (after all, the "Nazi
> code" I wrote over a decade has proven how detrimental ‘racism’ has been to
> the integrity of the software). However, on the other hand, if y’all have
> moved past such trivial concepts of ‘good and evil’, perhaps we can get
> TinkerPop movin' again.
>
> Take care mein comrades,
> Marko.
>
> http://markorodriguez.com 
>
>
>


Re: [DISCUSS] datetime

2021-07-23 Thread Joshua Shinavier
Well, the string is in ISO 8601, so that's accurate. The serialized values
are not (unless we use strings instead of longs). Maybe support for ISO
8601 proper could be added to GraphBinary as an extended type?
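
For reference, here is a small Python sketch of the long-based encoding described in the GraphBinary excerpt quoted further down this message: an 8-byte two's complement signed integer counting milliseconds from the Unix epoch. The big-endian byte order and the helper names encode_millis and decode_millis are my assumptions for illustration; the two quoted byte patterns look the same in either byte order.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_millis(millis: int) -> bytes:
    """Pack a millisecond offset as an 8-byte signed integer (big-endian assumed)."""
    return struct.pack(">q", millis)


def decode_millis(payload: bytes) -> datetime:
    """Unpack an 8-byte signed millisecond offset into a UTC datetime."""
    (millis,) = struct.unpack(">q", payload)
    return EPOCH + timedelta(milliseconds=millis)


print(encode_millis(0).hex(" "))
print(encode_millis(-1).hex(" "))
print(decode_millis(encode_millis(-1)).isoformat(timespec="milliseconds"))
```

The payload has no room for a zone offset or for sub-millisecond digits, which is why the serialized value is Unix time rather than ISO 8601.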

On Fri, Jul 23, 2021 at 8:41 AM Stephen Mallette 
wrote:

> ok - perhaps i shouldn't have declared it as ISO8601 explicitly.  In the
> interest of support for existing serialization perhaps it's best to call it
> unix time and simply have the datetime() accept a string in the
> YYYY-MM-DDThh:mm:ss format.
>
> On Fri, Jul 23, 2021 at 11:20 AM Joshua Shinavier 
> wrote:
>
> > That's fine -- it's just not ISO 8601 in that case; it's Unix time in
> > milliseconds. ISO 8601 preserves the time zone, and allows
> sub-millisecond
> > precision. If we want a Date in Java to map to a .NET DateTime and back
> > without information loss, then the serialized representation needs to be
> a
> > string or a structure rather than a number.
> >
> > Josh
> >
> > On Fri, Jul 23, 2021 at 3:36 AM Stephen Mallette 
> > wrote:
> >
> > > Current support for dates looks like:
> > >
> > > Java = java.util.Date()
> > > Python = datetime()
> > > Javascript = Date()
> > > .NET = DateTime()
> > >
> > > and given the current serialization model is represented as (taking the
> > > description from GraphBinary docs) an 8-byte two’s complement signed
> > > integer representing a millisecond-precision offset from the unix
> epoch:
> > >
> > > 00 00 00 00 00 00 00 00: The moment in time 1970-01-01T00:00:00.000Z.
> > > ff ff ff ff ff ff ff ff: The moment in time 1969-12-31T23:59:59.999Z.
> > >
> > >
> > >
> > >
> > > On Thu, Jul 22, 2021 at 7:16 PM Joshua Shinavier 
> > > wrote:
> > >
> > > > I think that works so long as there is a common type which would
> exist
> > in
> > > > each GLV and which datetime() would parse *to*. This type could then
> be
> > > > mapped into GLV-native types as desired. An easy choice for the
> common
> > > type
> > > > would be 64-bit Unix timestamps in milliseconds, but this does not
> > > capture
> > > > arbitrary precision (as ISO 8601 does). If all we want is
> milliseconds,
> > > > then maybe call the function dateTimeToMillis() or such. If we want a
> > > > structured representation of the dateTime, then we need a way of
> > > providing
> > > > the type in each language in an equivalent way (doable, as I have
> > > > illustrated, but needs doing).
> > > >
> > > > Josh
> > > >
> > > >
> > > > On Thu, Jul 22, 2021 at 10:45 AM David Bechberger <d...@bechberger.com>
> > > > wrote:
> > > >
> > > > > +1 from me as well.
> > > > >
> > > > > On Thu, Jul 22, 2021 at 8:28 AM Kelvin Lawrence wrote:
> > > > >
> > > > > > A big +1 from me for this. As much as possible making Gremlin a
> > > > language
> > > > > > that does not depend on closures for things like dates and string
> > > > > > manipulation will help with parity when compared to other query
> > > > > languages.
> > > > > > Kelvin
> > > > > >
> > > > > >
> > > > > > On Wednesday, July 21, 2021, 07:49:17 AM CDT, Stephen Mallette <
> > > > > > spmalle...@gmail.com> wrote:
> > > > > >
> > > > > >  One of the things precluding a move toward a more pure usage of
> > > > > > gremlin-language in place of groovy scripts is a way to
> > instantiate a
> > > > > > date/time. It seems simple enough to just include a datetime() function in
> > > > > > the grammar that will parse ISO-8601 formatted dates:
> > > > > >
> > > > > > datetime('2021-07-21')
> > > > > > datetime('2021-07-21T01:12:59')
> > > > > > datetime('2021-07-21T01:12:59+0500')
> > > > > >
> > > > > > Each language can retain its own method for producing datetime
> that
> > > it
> > > > > > already has.
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Geo-Spatial support

2021-07-23 Thread Joshua Shinavier
Hi Dave,

I think something like this is a very good idea, and these look like useful
primitives. IMO when it comes to geospatial queries, the devil is in the
details. For example, at some point we'll have someone asking for
double-precision lat/lon points (GPS is not that accurate, but some
applications use computed/simulated points, or combine GPS data with local
position). Polygons are sometimes defined as having "holes", etc. It may be
worthwhile to take some direction from OGC standards like GeoSPARQL.
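
As one example of such a detail, here is a minimal Python sketch of a polygon with holes and a containment test using double-precision coordinates. The Polygon class and its field names are hypothetical and are not taken from the gist; the test is a plain planar ray-casting check, so it ignores geodesic effects and points that lie exactly on an edge.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat) as double-precision floats


def _in_ring(point: Point, ring: List[Point]) -> bool:
    """Planar ray-casting test: is the point inside the closed ring?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Polygon:
    exterior: List[Point]
    holes: List[List[Point]] = field(default_factory=list)

    def contains(self, point: Point) -> bool:
        # Inside the outer ring and not inside any hole.
        return _in_ring(point, self.exterior) and not any(
            _in_ring(point, hole) for hole in self.holes
        )


square_with_hole = Polygon(
    exterior=[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    holes=[[(4.0, 4.0), (6.0, 4.0), (6.0, 6.0), (4.0, 6.0)]],
)
print(square_with_hole.contains((1.0, 1.0)), square_with_hole.contains((5.0, 5.0)))
```

A predicate defined only over a single ring would answer differently for the point in the hole, so whether holes are supported needs to be decided up front.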

Just an initial $0.02. Ideally, the extension would be simple for
developers to use and understand (as this is), while also being somewhat
future-proof and playing well with standards.

Josh



On Thu, Jul 22, 2021 at 2:44 PM David Bechberger 
wrote:

> One of the common requests from customers and users of TinkerPop is to add
> support for geographic based searches (TINKERPOP-2558). In fact many
> TinkerPop enabled database vendors such as DataStax Graph and JanusGraph
> have added custom predicates and libraries to handle this request. As a
> query language framework it would make sense for TinkerPop to adopt a
> common geo-predicate framework to provide standardization across providers
> and to support this as part of the TinkerPop ecosystem.
>
> In consultation with some others on the project we have put together a
> proposed scheme for supporting this in TinkerPop which I have documented in
> a gist here:
> https://gist.github.com/bechbd/70f4ce5a537d331929ea01634b1fbaa2
>
> Interested in hearing others' thoughts?
>
> Dave
>


Re: [DISCUSS] datetime

2021-07-23 Thread Joshua Shinavier
That's fine -- it's just not ISO 8601 in that case; it's Unix time in
milliseconds. ISO 8601 preserves the time zone, and allows sub-millisecond
precision. If we want a Date in Java to map to a .NET DateTime and back
without information loss, then the serialized representation needs to be a
string or a structure rather than a number.
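
To illustrate the information loss, here is a short Python sketch that sends a zoned, sub-millisecond date-time through Unix milliseconds and back. The helper names to_unix_millis and from_unix_millis are hypothetical, and Python's datetime stands in for the Java and .NET types; it only carries microseconds, whereas ISO 8601 allows more digits.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_millis(value: datetime) -> int:
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    return (value - EPOCH) // timedelta(milliseconds=1)


def from_unix_millis(millis: int) -> datetime:
    """Rebuild a datetime from Unix milliseconds; only UTC can be recovered."""
    return EPOCH + timedelta(milliseconds=millis)


original = datetime(2021, 7, 21, 1, 12, 59, 123456,
                    tzinfo=timezone(timedelta(hours=5)))
round_tripped = from_unix_millis(to_unix_millis(original))

print(original.isoformat())
print(round_tripped.isoformat())
```

The round-tripped value denotes nearly the same instant, but the +05:00 offset and the last three fractional digits are gone.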

Josh

On Fri, Jul 23, 2021 at 3:36 AM Stephen Mallette 
wrote:

> Current support for dates looks like:
>
> Java = java.util.Date()
> Python = datetime()
> Javascript = Date()
> .NET = DateTime()
>
> and given the current serialization model is represented as (taking the
> description from GraphBinary docs) an 8-byte two’s complement signed
> integer representing a millisecond-precision offset from the unix epoch:
>
> 00 00 00 00 00 00 00 00: The moment in time 1970-01-01T00:00:00.000Z.
> ff ff ff ff ff ff ff ff: The moment in time 1969-12-31T23:59:59.999Z.
>
>
>
>
> On Thu, Jul 22, 2021 at 7:16 PM Joshua Shinavier 
> wrote:
>
> > I think that works so long as there is a common type which would exist in
> > each GLV and which datetime() would parse *to*. This type could then be
> > mapped into GLV-native types as desired. An easy choice for the common
> type
> > would be 64-bit Unix timestamps in milliseconds, but this does not
> capture
> > arbitrary precision (as ISO 8601 does). If all we want is milliseconds,
> > then maybe call the function dateTimeToMillis() or such. If we want a
> > structured representation of the dateTime, then we need a way of
> providing
> > the type in each language in an equivalent way (doable, as I have
> > illustrated, but needs doing).
> >
> > Josh
> >
> >
> > On Thu, Jul 22, 2021 at 10:45 AM David Bechberger 
> > wrote:
> >
> > > +1 from me as well.
> > >
> > > On Thu, Jul 22, 2021 at 8:28 AM Kelvin Lawrence wrote:
> > >
> > > > A big +1 from me for this. As much as possible making Gremlin a
> > language
> > > > that does not depend on closures for things like dates and string
> > > > manipulation will help with parity when compared to other query
> > > languages.
> > > > Kelvin
> > > >
> > > >
> > > > On Wednesday, July 21, 2021, 07:49:17 AM CDT, Stephen Mallette <
> > > > spmalle...@gmail.com> wrote:
> > > >
> > > >  One of the things precluding a move toward a more pure usage of
> > > > gremlin-language in place of groovy scripts is a way to instantiate a
> > > > date/time. It seems simple enough to just include a datetime() function in
> > > > the grammar that will parse ISO-8601 formatted dates:
> > > >
> > > > datetime('2021-07-21')
> > > > datetime('2021-07-21T01:12:59')
> > > > datetime('2021-07-21T01:12:59+0500')
> > > >
> > > > Each language can retain its own method for producing datetime that
> it
> > > > already has.
> > > >
> > >
> >
>


Re: [DISCUSS] datetime

2021-07-22 Thread Joshua Shinavier
I think that works so long as there is a common type which would exist in
each GLV and which datetime() would parse *to*. This type could then be
mapped into GLV-native types as desired. An easy choice for the common type
would be 64-bit Unix timestamps in milliseconds, but this does not capture
arbitrary precision (as ISO 8601 does). If all we want is milliseconds,
then maybe call the function dateTimeToMillis() or such. If we want a
structured representation of the dateTime, then we need a way of providing
the type in each language in an equivalent way (doable, as I have
illustrated, but needs doing).
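
A rough sketch of what a structured common type could look like, written in Python for brevity. CommonDateTime, parse_datetime and to_native are hypothetical names, and the three accepted formats are just the three examples from the proposal; this is not the type illustrated in the branch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class CommonDateTime:
    year: int
    month: int
    day: int
    hour: int = 0
    minute: int = 0
    second: int = 0
    offset_minutes: Optional[int] = None  # None means no zone was given

    def to_native(self) -> datetime:
        """Map the common type onto this language's native type."""
        zone = None if self.offset_minutes is None else timezone(
            timedelta(minutes=self.offset_minutes))
        return datetime(self.year, self.month, self.day,
                        self.hour, self.minute, self.second, tzinfo=zone)


# Most specific format first.
_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parse_datetime(text: str) -> CommonDateTime:
    """Parse one of the three example forms into the common type."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        offset = parsed.utcoffset()
        return CommonDateTime(
            parsed.year, parsed.month, parsed.day,
            parsed.hour, parsed.minute, parsed.second,
            None if offset is None else int(offset.total_seconds() // 60),
        )
    raise ValueError(f"unsupported datetime literal: {text!r}")


print(parse_datetime("2021-07-21T01:12:59+0500"))
```

Each GLV would supply its own equivalent of to_native, while the fields of the common type stay the same everywhere.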

Josh


On Thu, Jul 22, 2021 at 10:45 AM David Bechberger 
wrote:

> +1 from me as well.
>
> On Thu, Jul 22, 2021 at 8:28 AM Kelvin Lawrence 
> wrote:
>
> > A big +1 from me for this. As much as possible making Gremlin a language
> > that does not depend on closures for things like dates and string
> > manipulation will help with parity when compared to other query
> languages.
> > Kelvin
> >
> >
> > On Wednesday, July 21, 2021, 07:49:17 AM CDT, Stephen Mallette <
> > spmalle...@gmail.com> wrote:
> >
> >  One of the things precluding a move toward a more pure usage of
> > gremlin-language in place of groovy scripts is a way to instantiate a
> > date/time. It seems simple enough to just include a datetime() function in
> > the grammar that will parse ISO-8601 formatted dates:
> >
> > datetime('2021-07-21')
> > datetime('2021-07-21T01:12:59')
> > datetime('2021-07-21T01:12:59+0500')
> >
> > Each language can retain its own method for producing datetime that it
> > already has.
> >
>


Re: [VOTE] TinkerPop 3.4.12 Release

2021-07-22 Thread Joshua Shinavier
+1


On Thu, Jul 22, 2021 at 11:06 AM David Bechberger 
wrote:

> +1
>
> On Tue, Jul 20, 2021 at 11:02 AM Stephen Mallette 
> wrote:
>
> > Docs look good and validate-distribution.sh passed
> >
> > VOTE +1
> >
> > On Tue, Jul 20, 2021 at 5:01 AM Florian Hockmann wrote:
> >
> > > Hello,
> > >
> > > We are happy to announce that TinkerPop 3.4.12 is ready for release.
> > >
> > > The release artifacts can be found at this location:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/3.4.12/
> > >
> > > The source distribution is provided by:
> > > apache-tinkerpop-3.4.12-src.zip
> > >
> > > Two binary distributions are provided for user convenience:
> > > apache-tinkerpop-gremlin-console-3.4.12-bin.zip
> > > apache-tinkerpop-gremlin-server-3.4.12-bin.zip
> > >
> > > The GPG key used to sign the release artifacts is available at:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > >
> > > The online docs can be found here:
> > > https://tinkerpop.apache.org/docs/3.4.12/ (user docs)
> > > https://tinkerpop.apache.org/docs/3.4.12/upgrade/ (upgrade
> docs)
> > > https://tinkerpop.apache.org/javadocs/3.4.12/core/ (core
> > javadoc)
> > > https://tinkerpop.apache.org/javadocs/3.4.12/full/ (full
> > javadoc)
> > > https://tinkerpop.apache.org/dotnetdocs/3.4.12/ (.NET API
> docs)
> > > https://tinkerpop.apache.org/jsdocs/3.4.12/ (Javascript API
> > docs)
> > >
> > > The tag in Apache Git can be found here:
> > > https://github.com/apache/tinkerpop/tree/3.4.12
> > >
> > > The release notes are available here:
> > >
> > https://github.com/apache/tinkerpop/blob/3.4.12/CHANGELOG.asciidoc
> > >
> > > The [VOTE] will be open for the next 72 hours --- closing Friday (July
> > 23,
> > > 2021) at 10am UTC.
> > >
> > > My vote is +1.
> > >
> > > Thank you very much,
> > > Florian Hockmann
> > >
> > >
> >
>


Re: [VOTE] TinkerPop 3.5.1 Release

2021-07-22 Thread Joshua Shinavier
+1


On Thu, Jul 22, 2021 at 10:41 AM David Bechberger 
wrote:

> Agreed, thanks for all the hard work.  +1
>
> We should probably include something about Discord in any sort of email
> communication as well.
>
> Thanks,
> Dave
>
> On Thu, Jul 22, 2021 at 8:27 AM Kelvin Lawrence 
> wrote:
>
> > Thanks for all the hard work to get the release out. VOTE +1
> > Kelvin
> > On Wednesday, July 21, 2021, 01:00:34 PM CDT, Stephen Mallette <
> > spmalle...@gmail.com> wrote:
> >
> >  It was an unfortunate omission but we probably should have had the
> > gremlint
> > library in the 3.5.1 upgrade docs. oh well, not something to retrigger
> all
> > that work over. Florian, please plan to call attention to it in the
> release
> > announcement in some way. I'll write something to add to the upgrade docs
> > for 3.5.1 so it will be there in the future.
> >
> > Other than that, bin/validate-distribution.sh worked fine so VOTE +1
> >
> > On Wed, Jul 21, 2021 at 10:16 AM Florian Hockmann <
> f...@florian-hockmann.de>
> > wrote:
> >
> > > Hello,
> > >
> > > We are happy to announce that TinkerPop 3.5.1 is ready for release.
> > >
> > > The release artifacts can be found at this location:
> > >https://dist.apache.org/repos/dist/dev/tinkerpop/3.5.1/
> > >
> > > The source distribution is provided by:
> > >apache-tinkerpop-3.5.1-src.zip
> > >
> > > Two binary distributions are provided for user convenience:
> > >apache-tinkerpop-gremlin-console-3.5.1-bin.zip
> > >apache-tinkerpop-gremlin-server-3.5.1-bin.zip
> > >
> > > The GPG key used to sign the release artifacts is available at:
> > >https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > >
> > > The online docs can be found here:
> > >https://tinkerpop.apache.org/docs/3.5.1/ (user docs)
> > >https://tinkerpop.apache.org/docs/3.5.1/upgrade/ (upgrade docs)
> > >https://tinkerpop.apache.org/javadocs/3.5.1/core/ (core
> javadoc)
> > >https://tinkerpop.apache.org/javadocs/3.5.1/full/ (full
> javadoc)
> > >https://tinkerpop.apache.org/dotnetdocs/3.5.1/ (.NET API docs)
> > >https://tinkerpop.apache.org/jsdocs/3.5.1/ (Javascript API
> docs)
> > >
> > > The tag in Apache Git can be found here:
> > >https://github.com/apache/tinkerpop/tree/3.5.1
> > >
> > > The release notes are available here:
> > >
> https://github.com/apache/tinkerpop/blob/3.5.1/CHANGELOG.asciidoc
> > >
> > > The [VOTE] will be open for the next 72 hours --- closing Saturday
> (July
> > > 24,
> > > 2021) at 3pm UTC.
> > >
> > > My vote is +1.
> > >
> > > Thank you very much,
> > > Florian Hockmann
> > >
> > >
> >
>


Re: [DISCUSS] Graphs over Thrift

2021-07-21 Thread Joshua Shinavier
FYI, this Friday at 11am PDT, a few of us will be following up on the above
with a Zoom meeting. We will touch on Thrift, Protobuf, Avro, and also
Apache Arrow as suggested by Leo Meyerovich. I think in the future, it
might be worthwhile to create a meetup.com group to organize events like
this, but for the time being, just reply to me directly with your email
address if you would like to attend, and I will add you to the invite.

Josh



On Fri, Jul 16, 2021 at 8:45 AM Joshua Shinavier  wrote:

> Following up on the proof of concept I created in TINKERPOP-2563-language,
> and the thread with Stephen, here is a demo video showing a graph with
> domain-specific types being sent over the wire between a Java-based client
> and a Java-based server:
>
> https://www.youtube.com/watch?v=wFCrJOXXs5Y
>
>
> Any thoughts are welcome. I will probably demonstrate the same thing for
> Protobuf and Avro in separate examples in the branch.
>
> Again, the flow of the demo is:
>
>- The client creates and populates a TinkerGraph instance. One of the
>properties has the key "livesIn" and a value which is an instance of a
>domain-specific BoundingBox class.
>- The client encodes the graph to an instance of the Thrift-generated
>Graph class. The BoundingBox is serialized using a JSON-based encoder which
>has been added to an encoder registry that is shared between client and
>server.
>- The Thrift-generated code sends the encoded graph across the wire to
>the server, which receives it again as an instance of the Thrift-generated
>Graph class
>- The server decodes the graph to a new instance of TinkerGraph. The
>serialized bounding box is deserialized to an instance of the
>domain-specific BoundingBox class, and becomes a property value in the
>server's graph.
>- The server prints out some info and writes the received graph to
>disk as a GraphSON file so  we can see that it is true to the client's
>original graph
>
> Josh
>


[DISCUSS] Graphs over Thrift

2021-07-16 Thread Joshua Shinavier
Following up on the proof of concept I created in TINKERPOP-2563-language,
and the thread with Stephen, here is a demo video showing a graph with
domain-specific types being sent over the wire between a Java-based client
and a Java-based server:

https://www.youtube.com/watch?v=wFCrJOXXs5Y


Any thoughts are welcome. I will probably demonstrate the same thing for
Protobuf and Avro in separate examples in the branch.

Again, the flow of the demo is:

   - The client creates and populates a TinkerGraph instance. One of the
   properties has the key "livesIn" and a value which is an instance of a
   domain-specific BoundingBox class.
   - The client encodes the graph to an instance of the Thrift-generated
   Graph class. The BoundingBox is serialized using a JSON-based encoder which
   has been added to an encoder registry that is shared between client and
   server.
   - The Thrift-generated code sends the encoded graph across the wire to
   the server, which receives it again as an instance of the Thrift-generated
   Graph class
   - The server decodes the graph to a new instance of TinkerGraph. The
   serialized bounding box is deserialized to an instance of the
   domain-specific BoundingBox class, and becomes a property value in the
   server's graph.
   - The server prints out some info and writes the received graph to disk
   as a GraphSON file so  we can see that it is true to the client's original
   graph
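
For anyone who would rather read code than watch the video, the encoder-registry part of the flow can be sketched roughly as follows. This is a Python stand-in, not the Java code in the branch: EncoderRegistry, its methods, and the BoundingBox fields are hypothetical, and the Thrift transport is replaced by simply passing the encoded pair along.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class BoundingBox:
    # Field names are assumed for this example.
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


class EncoderRegistry:
    """Maps a type name to an encoder/decoder pair shared by client and server."""

    def __init__(self) -> None:
        self._codecs: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {}
        self._name_by_class: Dict[type, str] = {}

    def register(self, name: str, cls: type,
                 encode: Callable[[Any], bytes],
                 decode: Callable[[bytes], Any]) -> None:
        self._codecs[name] = (encode, decode)
        self._name_by_class[cls] = name

    def encode(self, value: Any) -> Tuple[str, bytes]:
        name = self._name_by_class[type(value)]
        return name, self._codecs[name][0](value)

    def decode(self, name: str, payload: bytes) -> Any:
        return self._codecs[name][1](payload)


registry = EncoderRegistry()
registry.register(
    "BoundingBox", BoundingBox,
    lambda box: json.dumps(asdict(box)).encode("utf-8"),               # JSON-based encoder
    lambda payload: BoundingBox(**json.loads(payload.decode("utf-8"))),
)

# Client side: encode the "livesIn" property value before it goes on the wire.
client_value = BoundingBox(35.0, -120.0, 38.0, -117.0)
type_name, payload = registry.encode(client_value)

# Server side: decode back into the domain-specific class.
server_value = registry.decode(type_name, payload)
print(server_value == client_value)
```

The point of the shared registry is that the wire format only needs to carry a type name and an opaque payload, so new domain-specific types do not require changes to the Thrift schema.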

Josh


Re: [DISCUSS] go driver

2021-07-15 Thread Joshua Shinavier
+1 to official support for Go. In terms of popularity, it seems to land more
often in the top 10 than the top 5 of developer lists, but anecdotally,
it is increasingly used in companies that take after Google. My company has
two officially supported languages, FWIW: Go (#1) and Java (#2).

Josh


On Wed, Jul 14, 2021 at 1:05 PM Stephen Mallette 
wrote:

> If there were a candidate for another official Gremlin driver, I'd say that
> it would have to be Go. I've heard more requests for official support there
> than anything else. As far as I know there are no less than five different
> third-party Go drivers out there right now which further leads me to
> believe that folks want this functionality.
>
> I do have concerns about the additional overhead of yet another programming
> language to support, but if it helps unify a programming language space
> where we have a large number of users who would benefit, it's probably
> worth considering.
>
> Happy to hear from anyone with thoughts about adding support for Go,
> experience with any of the third-party Go drivers out there, etc.
>


Re: [DISCUSS] Discord Server?

2021-07-12 Thread Joshua Shinavier
That's the one I keep using! Shrug. If no-one else is having the same
problem, then maybe it isn't much of a problem.

Josh

On Mon, Jul 12, 2021 at 4:03 PM David Bechberger 
wrote:

> I thought that had been fixed with the new link (
> https://discord.gg/ndMpKZcBEE) as the first link I sent allows you to
> "Preview" which causes the behavior you mentioned.
>
> Josh, is the link above the one you are using when you see the issue?
>
> Dave

Re: [DISCUSS] Discord Server?

2021-07-12 Thread Joshua Shinavier
Maybe I'm the only one, but every time I open up Discord after a few days,
I no longer see the Gremlin Icon for the Apache TinkerPop server, and have
to find and click your invite link all over again. Discord then "welcomes"
me into the #general channel. Might just be my inexperience with Discord; I
haven't used it yet for anything else.

Josh


On Mon, Jul 12, 2021 at 1:39 PM David Bechberger 
wrote:

> Now that we have a month+ of using Discord I was wondering what people's
> thoughts were on opening membership up to the Gremlin-users list and
> including it on the website?
>
> On Thu, May 27, 2021 at 9:22 AM David Bechberger 
> wrote:
>
> > So one of the things I have seen so far is that the original link I sent
> > allowed users to "Preview" the server which allows you to see the
> > channels and post but as soon as you log off you are removed from the
> > server. Several people seem to have run into this so I have created a new
> > link below which does not allow this type of "Preview".
> >
> > https://discord.gg/ndMpKZcBEE
> >
> > Dave
> >
> > On Mon, May 24, 2021 at 12:05 PM David Bechberger 
> > wrote:
> >
> >> I agree, I think fewer channels is better to start with.  If we get to a
> >> level where the activity level is high we can always add additional
> >> channels.
> >>
> >>
> >> On Fri, May 21, 2021 at 9:54 AM Stephen Mallette 
> >> wrote:
> >>
> >>> thanks dave. I imagine the available channels are up for discussion. to
> >>> start, i'd think the fewer channels the better. should we already be
> >>> looking to peel off into different graph provider channels?
> >>>
> >>> On Fri, May 21, 2021 at 1:14 PM David Bechberger 
> >>> wrote:
> >>>
> >>> > I've gone ahead and set up a Discord instance for us to try.
> >>> >
> >>> > My thinking is that we can try this out internally for a week or two
> >>> > and then, if we like it, we promote this to the larger community.
> >>> > Thoughts?
> >>> >
> >>> > If you click on the link below it should allow anyone here to join:
> >>> > https://discord.gg/5a72PZgmdq
> >>> >
> >>> >
> >>> > On Fri, May 14, 2021 at 2:17 AM Stephen Mallette <spmalle...@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > We've not promoted Slack as a place for users (purposefully). I'd say
> >>> > > it's worth adding Discord and giving it a shot for the user community
> >>> > > and keeping slack for those few moments where we need to ping each
> >>> > > other for dev related items that might need some quick back/forth
> >>> > > interaction.
> >>> > >
> >>> > > On Thu, May 13, 2021 at 9:14 PM Joshua Shinavier <j...@fortytwo.net>
> >>> > > wrote:
> >>> > >
> >>> > > > I haven't used Discord much myself, but I don't see the down side of
> >>> > > > trying it out amongst ourselves, then inviting a few community
> >>> > > > members. If the response is positive, announce it on gremlin-users.
> >>> > > > We already have a Slack workspace which is not much used, so in the
> >>> > > > worst case, now there are two.
> >>> > > >
> >>> > > > ^^ $0.02
> >>> > > >
> >>> > > > On Wed, May 12, 2021 at 4:53 PM David Bechberger <d...@bechberger.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > With the recent uptick in both StackOverflow posts and mailing list
> >>> > > > > posts, it has got me thinking about ways to get more users engaged
> >>> > > > > with the TinkerPop community. I was curious what people's thoughts
> >>> > > > > were on starting a Discord server?
> >>> > > > >
> >>> > > > > The pros I see here are:
> >>> > > > >
> >>> > > > > * The ability to interact with users directly
> >>> > > > > * The ability (through bots) to create a single location to monitor
> >>> > > > >   for questions
> >>> > > > > * The ability to have statistics on user engagement
> >>> > > > > * The ability to promote TinkerPop on internal/external discord
> >>> > > > >   server lists
> >>> > > > >
> >>> > > > > The biggest con I can think of is that if there is not much usage of
> >>> > > > > it, then it would look like the project was not an active community.
> >>> > > > >
> >>> > > > > Thoughts?
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
>


Re: [tinkerpop] 03/04: Add a runnable example of a Thrift client sending graphs to a server over TChannel

2021-07-07 Thread Joshua Shinavier
FYI, I have extended the example to use the Protobuf-like solution for
domain-specific objects which I mentioned earlier in this thread. The overall
flow of the example is this:

   - The client creates and populates a TinkerGraph instance. One of the
   properties has the key "livesIn" and a value which is an instance of a
   domain-specific BoundingBox class.
   - The client encodes the graph to an instance of the Thrift-generated
   Graph class. The BoundingBox is serialized using a JSON-based encoder which
   has been added to an encoder registry that is shared between client and
   server.
   - The Thrift-generated code sends the encoded graph across the wire to
   the server, which receives it again as an instance of the Thrift-generated
   Graph class.
   - The server decodes the graph to a new instance of TinkerGraph. The
   serialized bounding box is deserialized to an instance of the
   domain-specific BoundingBox class, and becomes a property value in the
   server's graph.
   - The server prints out some info and writes the received graph to disk
   as a GraphSON file so we can see that it is true to the client's original
   graph.

Note: I'm stretching the notion of "serialized" values somewhat in that, in
these graphs, a serialized value is a record with two fields (or an object
with two member variables): the encoded value itself (in this case, a JSON
blob), and a type identifier.
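
To make the "record with two fields" idea concrete, here is a minimal Python
sketch of a shared encoder registry. It is not the code from the branch, and
it leaves out Thrift entirely. The BoundingBox fields, the EncoderRegistry
class, and the "example.org/BoundingBox" type identifier are all assumptions
made up for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical fields; the thread does not say what BoundingBox contains.
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


@dataclass(frozen=True)
class Serialized:
    """A serialized value: a type identifier plus the encoded payload."""
    type_id: str
    encoded: str  # in this sketch, a JSON blob


class EncoderRegistry:
    """Maps type identifiers to encode/decode functions (hypothetical helper).

    The same registrations would be made on both client and server.
    """

    def __init__(self):
        self._codecs = {}    # type_id -> (encode, decode)
        self._type_ids = {}  # Python class -> type_id

    def register(self, type_id, cls, encode, decode):
        self._codecs[type_id] = (encode, decode)
        self._type_ids[cls] = type_id

    def encode(self, value):
        type_id = self._type_ids[type(value)]
        encode, _ = self._codecs[type_id]
        return Serialized(type_id, encode(value))

    def decode(self, serialized):
        _, decode = self._codecs[serialized.type_id]
        return decode(serialized.encoded)


registry = EncoderRegistry()
registry.register(
    "example.org/BoundingBox",  # hypothetical type identifier
    BoundingBox,
    lambda box: json.dumps(asdict(box), sort_keys=True),
    lambda blob: BoundingBox(**json.loads(blob)),
)

box = BoundingBox(37.0, -123.0, 38.0, -122.0)
on_the_wire = registry.encode(box)       # would travel inside the encoded graph
received = registry.decode(on_the_wire)  # would become the server-side property value
```

The receiver only needs the type identifier to pick a decoder, so the graph
schema itself never has to know about BoundingBox.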

Josh




Re: [tinkerpop] 03/04: Add a runnable example of a Thrift client sending graphs to a server over TChannel

2021-07-07 Thread Joshua Shinavier
Hi Stephen,

Good questions. Let's elevate this discussion (about the specifics of
graphs and traversal results over Thrift) to the dev list. See inline.


On Wed, Jul 7, 2021 at 5:08 AM Stephen Mallette 
wrote:

> So, what happens if a returned Vertex contained a ByteBuffer or
> InetAddress as a property value? I assume the thrift definition has to be
> adjusted to include those types if you expect them in the results?
>


What you see in the diff, currently, captures the types specifically
mentioned in Graph.Features (see graph_features.yaml). In order to support
other types natively, we should update Graph.Features in parallel. Byte
arrays can be captured using Thrift's binary type. Domain-specific types
like InetAddress probably should not be built in, just as specific element
labels and property keys are not built in at this level. However, that is
not the only possible answer. Certain very common types like IP addresses,
dates and intervals, units of measurement, etc. *could* be built into the
type system, but IMO probably shouldn't. Instead, we should give users a
way of encoding and decoding domain-specific objects using a handful of
atomic types. InetAddress in this case is encoded either as a string or a
struct.



> How would provider specific types (like a Point or special instances of P
> in JanusGraph) fit into something like this - how would providers (or
> users) extend on our thrift definitions?
>

Point is definitely a domain-specific type which you would not see at this
level of schema. Maybe I can illustrate encoding and decoding
domain-specific types in the branch; using the current simple type system,
you could turn the Point into a map with three keys, like "latitude",
"longitude" and "type". When receiving a map with "type" equal to "Point",
you turn it back into a native Point object. We could also use a strategy
similar to Protobuf's Any type, where we send a struct with two fields over
the wire: one field provides the data of the Point, and the other field
provides a URL which specifies the type, i.e. how the object should be
decoded. It is probably worthwhile to add a "record" type variant to
Graph.Features in any case.
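
As a rough illustration of the first option (a map with a "type" key), here
is a small Python sketch. The Point class and the helper names are
hypothetical; only the three keys come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Hypothetical native type with the two coordinates named above.
    latitude: float
    longitude: float


def point_to_map(point):
    """Encode a Point using only simple types: a map with three keys."""
    return {
        "type": "Point",
        "latitude": point.latitude,
        "longitude": point.longitude,
    }


def from_map(value):
    """Turn a tagged map back into a native Point; pass anything else through."""
    if isinstance(value, dict) and value.get("type") == "Point":
        return Point(value["latitude"], value["longitude"])
    return value


encoded = point_to_map(Point(45.5, -122.7))
decoded = from_map(encoded)
```

The weakness of this approach is that "type" becomes a reserved key, which is
one reason the two-field, Any-like struct may be the cleaner choice.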



> I think that the idea of having a more strict definition on the types
> Gremlin supports is starting to materialize given the constraints on
> serializable types of GraphSON and then further restricted in GraphBinary.
> We actually have a list of types that haven't changed much in years at this
> point:
>
> https://tinkerpop.apache.org/docs/3.5.0/dev/io/
>


We might want to go through this list with a fine-toothed comb (e.g. we
probably don't want both a Date atomic type and a Timestamp type unless
they have different precision/granularity, in which case I would make that
explicit in the name of the type, e.g. UnixTimeSeconds vs. UnixTimeMillis).


> I think we could actually even limit them further and then the dream would
> be to prevent them from being so JVM specific.
>


Yes, I would argue for limiting them to very domain-independent atomic
types, probably excluding the timestamp type(s) as well as UUID and Class.
However, as I said, it's possible to include a few specialized types if the
user demand is really high. It's just more stuff which needs to be
implemented in each Gremlin language variant.



> It would be nice to elevate the discussion of supported types out of
> serialization and into the Gremlin language layer itself, which would then
> in turn drive serialization discussions.
>


That's where I see this going. The specification of Gremlin traversal
structure in YAML (already illustrated in the branch) translates neatly
into traversals over the wire using Thrift. To that and the basic graph
structure specification, we need a specification for other kinds of objects
which appear in traversal results, such as paths.


Josh




Re: [DISCUSS] Property types

2021-06-09 Thread Joshua Shinavier
Btw. NumericPrecision:

- name: NumericPrecision
  description: "Integer or floating-point precision in bits"
  type:
    union:
      - name: arbitrary
        description: "Arbitrary precision"

      - name: bits
        description: "Precision limited to a given number of bits"
        type: integer
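
For readers less used to union (sum) types, the schema above could be
mirrored in Python roughly as follows. This is only a sketch: the class and
function names are made up, and max_signed_value assumes a two's-complement
integer representation, which the schema does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Arbitrary:
    """The 'arbitrary' alternative: arbitrary precision, carrying no data."""


@dataclass(frozen=True)
class Bits:
    """The 'bits' alternative: precision limited to a given number of bits."""
    bits: int


# A value of the union is exactly one of the alternatives.
NumericPrecision = Union[Arbitrary, Bits]


def describe(precision: NumericPrecision) -> str:
    """Render a precision for display by branching on the alternative."""
    if isinstance(precision, Bits):
        return f"{precision.bits}-bit"
    return "arbitrary-precision"


def max_signed_value(precision: NumericPrecision) -> Optional[int]:
    """Largest signed integer, assuming two's complement; None if unbounded."""
    if isinstance(precision, Bits):
        return 2 ** (precision.bits - 1) - 1
    return None
```

Each consumer of the type has to handle both alternatives, which is the
property that makes unions easy to reason about.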



Re: [DISCUSS] Property types

2021-06-09 Thread Joshua Shinavier
Hi Stephen,

Responses inline.

On Wed, Jun 9, 2021 at 4:04 AM Stephen Mallette 
wrote:

> Thanks for the update Josh
>
> [...]
> >    reasonably clear how to make that transition. In the beginning, you
> >    either have a schema or you don't.
> >
>
> Could you clarify who is making that choice? Is it the provider saying
> their graph supports schema or not? or did you mean the user is making that
> choice somehow and TinkerPop would thus enforce the schema?
>


At first, I don't think we need native schema support in graph providers.
There will definitely be advantages to such support (e.g. better indexing,
better query planning) where available, but there is a lot you can do with
a schema at the application level, like validation, object-graph mapping
(like Frames, but with no code other than the schema), and Gremlin
traversal optimizations. tl;dr yes, it's the user who determines the
schema, although every provider will come with a set of constraints
(explicit or implicit) on what kinds of schemas can be supported. E.g. most
providers do not support record-valued properties, so a schema with a
record type for a property would be an illegal schema w.r.t. that provider
(or at least, you'd need a mapping to turn the schema into one which is
supported, e.g. by encoding records as strings).
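
Here is a minimal Python sketch of that last point, under heavy assumptions:
a schema is reduced to a dict from property key to a type name, and the
provider constraint to a single boolean. None of these names correspond to an
existing TinkerPop API.

```python
import json


def check_against_provider(schema, supports_records):
    """Return the property keys whose types the provider cannot store natively."""
    return [
        key
        for key, type_name in schema.items()
        if type_name == "record" and not supports_records
    ]


def encode_records_as_strings(schema, properties):
    """Map a schema and its data onto a form which uses strings in place of records."""
    new_schema = {
        key: ("string" if type_name == "record" else type_name)
        for key, type_name in schema.items()
    }
    new_properties = {
        key: (json.dumps(value, sort_keys=True) if schema.get(key) == "record" else value)
        for key, value in properties.items()
    }
    return new_schema, new_properties


# Hypothetical schema and property values for a single vertex.
schema = {"name": "string", "address": "record"}
properties = {"name": "marko", "address": {"city": "Santa Fe", "zip": "87501"}}

illegal = check_against_provider(schema, supports_records=False)
mapped_schema, mapped_properties = encode_records_as_strings(schema, properties)
```

After the mapping, the same check against the same provider comes back empty,
at the cost of the provider no longer being able to see inside the record.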


> >    - *Atomic types*. As part of the basic type system for property graphs,
> [...]
> >    However, all of this is to be discussed in detail on the dev list.
> >
>
> I'm pretty interested in the direction this goes as numbers have always
> been troublesome to our various language variants and it often doesn't make
> Gremlin look smart to those users of languages off the JVM.
>


Below is the schema for Dragon's primitive types. Booleans and binary
strings have no parameters, while integer and floating-point types do have
some parameters. The string type happens to have a maximum-length parameter
(other commonly asked-for features being minimum length, regex, etc.). This
is not necessarily the schema we will use for TP4, but it might be close.
Algebraic Property Graphs does not prescribe any particular set of
primitive types; Dragon's types represent a pragmatic choice which has been
appropriate for applications in a particular company. The question to be
answered for TinkerPop is where we should draw the line between features
which are built in to the framework and extensions/ornamentation which are
best left to individual graph providers. The PGSWG approach, at the moment,
is more like APG in that there are no prescribed type parameters, and we're
still deciding whether there should be built-in atomic types at all
(leaning toward "yes").

It might be worthwhile if you can summarize the problems we have had with
numeric types, here or in a separate thread, and then we can talk about how
we might be able to address them with schemas and a data model
specification.

Josh


- name: PrimitiveType
  description: "A primitive data type, such as a string or boolean type"
  type:
    union:
      - name: binary
        description: "The type of a binary value, consisting of a sequence of bytes"
        type: BinaryType

      - name: boolean
        description: "The type of a boolean value, consisting of true or false"
        type: BooleanType

      - name: float
        description: "The type of a floating-point value"
        type: FloatType

      - name: integer
        description: "The type of an integer value"
        type: IntegerType

      - name: string
        description: "The type of a string value"
        type: StringType

- name: BinaryType
  description: "The type of a binary value, consisting of a sequence of bytes"

- name: BooleanType
  description: "The type of a boolean value (either true or false)"

- name: FloatType
  description: "A floating-point data type with a given bit precision"
  type:
    record:
      - name: precision
        description: "The floating-point precision of the type, in bits. Common precision values are 32 and 64."
        type: NumericPrecision
  default:
    precision:
      bits: 32

- name: IntegerType
  description: "An integer data type with a given bit precision, signedness, and optional width encoding"
  type:
    record:
      - name: precision
        description: "The integer precision of the type, in bits. Common precision values are 32 and 64."
        type: NumericPrecision

      - name: signed
        description: "Whether the type represents signed or unsigned integers"
        type: boolean

      - name: fixedWidth
        description: "Whether a fixed-width integer or varint encoding is preferred"
        type:
          optional: boolean
  default:
    precision:
      bits: 32
    signed: true

- name: StringType
  description: "A string data type with an optional maximum length. The encoding scheme is unspecified."
  type:
    record:
      - name: maximumLength
        description: >
          If provided, an upper bound (inclusive) on the lengt

[DISCUSS] Property types

2021-06-07 Thread Joshua Shinavier
Hi all,

As we move ahead toward schema support in TinkerPop, one of the concerns we
need to keep in mind is future interoperability with GQL, the ISO standard
property graph query language which is currently under development. GQL will
encompass a formally-defined data model and schema language for property
graphs. One of the most important inputs to GQL, in terms of schema, is the
Property Graph Schema Working Group (PGSWG), which is concerned with coming
up with that formal data model. While PGSWG is not open to the public, it is
easy enough to become a member of LDBC if you would like to get involved
(feel free to ping me if so). You can also request access to this doc if you
would like some insight into the discussion around property types, which is
the topic of the subgroup I lead. What is an appropriate type
system for property values, and by extension, graph elements? In this email
thread, I would like to expose some of the issues we have been discussing
to the TinkerPop community, and get your feedback. While TinkerPop's schema
language will not be based directly on GQL (for one thing, the
specification probably will not be ready for at least another year), it is
important to aim for compatibility, as this is what the major graph vendors
will be expected to implement in the future, if all goes well. There is
also an opportunity for TinkerPop to lead the way and demonstrate
applications of such a schema language in advance of a standard, which in
turn will inform the standard.

Without trying to give an exhaustive overview at this time, here are some
things I would like to mention:

   - *Prescriptive vs. descriptive schemas*. This is very important for
   property graphs. Whereas TinkerPop and most other PG solutions are
   relatively schema-less, there is strong motivation for a real,
   vendor-neutral schema language, which is what is driving the PGSWG. At the
   same time, a solution which requires all graph data to strictly conform to
   a schema in all contexts would not be supportive of typical applications,
   so there needs to be a spectrum. We are still discussing ways in which to
   provide the flexibility of schemas which describe parts of a graph and help
   with validation and inference over those parts, but which are tolerant of
   "other stuff" which goes beyond the schema. For TinkerPop, I think it will
   be simplest to start out with a binary world: either your graph is
   schemaless, i.e. you just don't have or just don't care about a schema, or
   it is expected to strictly conform to a predefined schema. Long-term, we
   are likely to move toward more of a true spectrum, and I think it's
   reasonably clear how to make that transition. In the beginning, you either
   have a schema or you don't.


   - *Algebraic types*. After all of the discussions I have had in the
   working group, at my company, and in the graph community, I still find
   algebraic data types to be the most promising basis for a TinkerPop schema
   language, and this is essentially what I am recommending for GQL as well.
   The initial proposal to GQL will include a kind of "everything but the
   kitchen sink" type grammar which has algebraic as well as non-algebraic
   type constructors (allowing implementations to pick and choose based on
   what works for them), but in TinkerPop I think we can be more focused, and
   I would welcome any discussion around this topic. By algebraic data types,
   I mean products (records) and sums (unions), together with primitives and
   named types (labels) along the lines of Algebraic Property Graphs
   (APG). An algebraic type system of this
   kind has the advantage of being very straightforward to reason about, while
   also being well aligned with enterprise data languages. There will be
   other, more detailed posts from me to this list on the type system I
   propose to use in TinkerPop.


   - *Atomic types*. As part of the basic type system for property graphs,
   there is broad agreement that there should be a collection of predefined
   atomic types (called "primitive types" in the APG paper and in Dragon) like
   integers, floating-point numbers, character strings, and booleans. There is
   also agreement within the property types group that this collection of
   atomic types may be infinite, and that parameterization of atomic types
   should not be part of the standard schema language. For example, in
   addition to a 32-bit integer type, you might also have a 16-bit integer
   type, a 64-bit integer type, and so on... potentially any number of
   "integer" data types related to each other by a parameter. In GQL, these
   will likely just be given names like int32, int16, etc. and it will be up
   to implementations to determine the internal structure of th

Re: [TinkerPop] AW: Welcome Josh Shinavier as a TinkerPop PMC member

2021-06-07 Thread Joshua Shinavier
Thank you, Florian! Glad to be on board in an official capacity. Oh yes,
there will be schemas.

Josh



On Mon, Jun 7, 2021 at 6:34 AM Florian Hockmann 
wrote:

> Congrats and welcome, Josh!
>
> -Original Message-
> From: Stephen Mallette 
> Sent: Friday, June 4, 2021 22:12
> To: dev@tinkerpop.apache.org; gremlin-us...@googlegroups.com
> Subject: Welcome Josh Shinavier as a TinkerPop PMC member
>
> The TinkerPop PMC is pleased to announce that Josh Shinavier has accepted
> the invitation to become a PMC member. Thanks, Josh, for your continued
> support of the project and we are happy to have you here.
>
> Best regards,
>
> The TinkerPop PMC
>


Re: code generation and RDF support in TinkerPop 4

2021-06-03 Thread Joshua Shinavier
Hi Pieter,

You give some good motivation for a formal schema language. My proposal for
an abstract data model for TinkerPop was, and is, Algebraic Property Graphs (
paper <https://arxiv.org/abs/1909.04881>), of which Dragon's data model is
an extension. APG is broader than typical property graphs (e.g. allowing
hyperelements, nested data, and other features which are uncommon or
unknown in connection with TinkerPop), so the best answer to your question
is probably "a variant of APG with restrictions".

Given a formal specification of TinkerPop's data model, we can be very
flexible with respect to concrete syntaxes. Dragon has its YAML syntax, and
the new framework will probably support a slightly different YAML syntax,
but you can specify graph schemas in a variety of languages (the current
tooling will read schemas expressed in YAML, JSON, Thrift, or Protobuf),
and you can express graph data in a variety of languages. What the formal
specification of the data model, and the mappings, give you is the ability
to map schemas and data transparently between the formats, so you can use
whatever is most appropriate to your application.

Btw. at some point, you'll see a schema for property graph features appear
in the branch -- a kind of TP4 successor to Graph.Features
<https://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/structure/Graph.Features.html>.
This will be a small language for declaring the specific refinement of APG
/ the TinkerPop data model which is supported by a given property graph
implementation. That will help you understand not only a single graph, but
also the characteristic class of graphs for a given vendor, adapter, etc.
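
To make the idea of a declared "refinement" a little more concrete, here is a
minimal sketch in Python. It is not the schema that will appear in the branch:
the GraphFeatures class, its three fields, and the satisfies function are all
hypothetical, with the two boolean flags only loosely modeled on the
multi-property and meta-property flags of the existing Graph.Features.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GraphFeatures:
    """Hypothetical declaration of what a property graph implementation supports."""
    multi_properties: bool
    meta_properties: bool
    id_types: FrozenSet[str]


def satisfies(offered: GraphFeatures, required: GraphFeatures) -> bool:
    """Return True if every capability in `required` is also in `offered`."""
    return (
        (offered.multi_properties or not required.multi_properties)
        and (offered.meta_properties or not required.meta_properties)
        and required.id_types <= offered.id_types
    )


# A vendor declares its features once; an application states what it needs.
vendor = GraphFeatures(True, False, frozenset({"long", "string"}))
app = GraphFeatures(False, False, frozenset({"long"}))
strict = GraphFeatures(True, True, frozenset({"uuid"}))

print(satisfies(vendor, app))
print(satisfies(vendor, strict))
```

Because the declaration is plain data, the same check could in principle be
run against a vendor, an adapter, or a single graph without querying it.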

Josh




On Thu, Jun 3, 2021 at 11:43 AM pieter gmail 
wrote:

> Hi,
>
> I kinda lost track of what we discussed previously.
> Did we come to a decision regarding what language we are going to use to
> describe the structure of the graph?
>
> YAML, XSD, UML, YANG or some category theory based language?
>
> From my understanding this would be the biggest change in tp4. A TinkerPop
> graph will no longer be a tangle of endless vertices and edges but instead
> can, optionally, be well defined and constrained. This way an engineer can,
> long after the original creators of a graph have left, immediately
> understand the graph, without needing to write a single query.
>
> Thanks
> Pieter
>
>
>
>
> On Thu, 2021-06-03 at 09:59 -0700, Joshua Shinavier wrote:
>
> Hi Pieter,
>
>
> On Thu, Jun 3, 2021 at 9:40 AM pieter gmail 
> wrote:
>
> Hi,
>
> Just to understand a bit better what's going on.
>
> Did you hand-write the Dragon YAML with the ANTLR grammar as input?
>
>
>
> Yes, the YAML was written by hand, and based pretty closely on Gremlin.g4.
> You can see Stephen's ANTLR definitions inline with the YAML as comments. I
> also took some direction from the Java API.
>
>
>
>
> Did you generate the java classes from the yaml using dragon or
> something else?
>
>
>
> Yes, the Java classes are currently generated using Dragon. I'm limiting
> the generated code to Java for now (other possible targets being Scala and
> Haskell) just to keep diffs to a reasonable size, and because a new,
> open-source solution is needed to replace Dragon. My current thinking is
> that the new transformation framework will be separate from TinkerPop, as
> it will serve non-graph as well as graph use cases. For now, you can think
> of the code generation as a bootstrapping strategy.
>
> Josh
>
>
>
>
>
> Thanks
> Pieter
>
> On Thu, 2021-06-03 at 07:48 -0700, Joshua Shinavier wrote:
> > Hello all,
> >
> > I would like to take some concrete steps toward the TinkerPop 4
> > interoperability goals I've stated a few times (e.g. see TinkerPop 2020
> > <https://www.slideshare.net/joshsh/tinkerpop-2020> from last year). At a
> > meetup <https://www.meetup.com/Category-Theory/events/277331504/> a couple
> > of months ago, I demonstrated an approach for generating TinkerPop APIs
> > consistently into different languages. I have started to check in some of
> > that generated code in a branch (see my commits here
> > <https://github.com/apache/tinkerpop/commits/TINKERPOP-2563-language/gremlin-language>)
> > and add bits and pieces for RDF support, as well.
> >
> > The Apache Software Foundation asks us to discuss any significant changes
> > to the code base on the dev list. Since these steps toward TP4 will be
> > major changes if and when they are merged into the master branch, I will
> > start discussing them here. Expect occasional emails fro

Re: code generation and RDF support in TinkerPop 4

2021-06-03 Thread Joshua Shinavier
Hi Pieter,


On Thu, Jun 3, 2021 at 9:40 AM pieter gmail  wrote:

> Hi,
>
> Just to understand a bit better what's going on.
>
> Did you hand-write the Dragon YAML with the ANTLR grammar as input?
>


Yes, the YAML was written by hand, and based pretty closely on Gremlin.g4.
You can see Stephen's ANTLR definitions inline with the YAML as comments. I
also took some direction from the Java API.




> Did you generate the java classes from the yaml using dragon or
> something else?
>


Yes, the Java classes are currently generated using Dragon. I'm limiting
the generated code to Java for now (other possible targets being Scala and
Haskell) just to keep diffs to a reasonable size, and because a new,
open-source solution is needed to replace Dragon. My current thinking is
that the new transformation framework will be separate from TinkerPop, as
it will serve non-graph as well as graph use cases. For now, you can think
of the code generation as a bootstrapping strategy.

Josh




>
> Thanks
> Pieter
>
> On Thu, 2021-06-03 at 07:48 -0700, Joshua Shinavier wrote:
> > Hello all,
> >
> > I would like to take some concrete steps toward the TinkerPop 4
> > interoperability goals I've stated a few times (e.g. see TinkerPop 2020
> > <https://www.slideshare.net/joshsh/tinkerpop-2020> from last year). At a
> > meetup <https://www.meetup.com/Category-Theory/events/277331504/> a couple
> > of months ago, I demonstrated an approach for generating TinkerPop APIs
> > consistently into different languages. I have started to check in some of
> > that generated code in a branch (see my commits here
> > <https://github.com/apache/tinkerpop/commits/TINKERPOP-2563-language/gremlin-language>)
> > and add bits and pieces for RDF support, as well.
> >
> > The Apache Software Foundation asks us to discuss any significant changes
> > to the code base on the dev list. Since these steps toward TP4 will be
> > major changes if and when they are merged into the master branch, I will
> > start discussing them here. Expect occasional emails from me about the
> > various things I will be doing in the branch. I absolutely invite comments,
> > feedback, and actual discussion on these design proposals, but even if it's
> > just me issuing self-affirming statements into the void like the King of
> > Pointland, I will just carry on, because that's how this process works.
> >
> > A brief summary of the changes so far:
> >
> >
> >- *Abstract specification of Gremlin traversals*. I have turned
> >  Stephen's Gremlin.g4
> >  <https://github.com/apache/tinkerpop/blob/TINKERPOP-2563-language/gremlin-language/src/main/antlr4/Gremlin.g4>
> >  ANTLR grammar into an abstract specification of Gremlin traversal syntax
> >  using the Dragon (YAML-based) format. Unfortunately, it is looking very
> >  unlikely that Dragon will become available as open-source software, so you
> >  can expect this YAML format to change just slightly once we have a new
> >  Dragon-like tool for schema and data transformations. More on that later.
> >  Right now, the syntax specification can be found here
> >  <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/main/yaml/org/apache/tinkerpop/gremlin/language/model>,
> >  although the file path might change in the future.
> >
> >
> >- *Traversal DTOs*. Based on the abstract specification, I have
> >  generated Java classes for building and working with traversals. The
> >  generated files can currently be found here
> >  <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/gen/java/org/apache/tinkerpop/gremlin/language/model>.
> >  These are essentially POJOs or DTO classes, with special boilerplate
> >  methods for equality, pattern matching over alternative constructors, and
> >  modification by copying (since the instances are immutable). These classes
> >  allow you to build traversals in a declarative way, while all of the logic
> >  for evaluating traversals goes elsewhere. Support for serialization and
> >  deserialization for traversals is to be added in the future -- and the same
> >  goes for all other classes generated in this way.
> >
> >
> >- *RDF 1.1 concepts mode

code generation and RDF support in TinkerPop 4

2021-06-03 Thread Joshua Shinavier
Hello all,

I would like to take some concrete steps toward the TinkerPop 4
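interoperability goals I've stated a few times (e.g. see TinkerPop 2020
<https://www.slideshare.net/joshsh/tinkerpop-2020> from last year). At a
meetup <https://www.meetup.com/Category-Theory/events/277331504/> a couple
of months ago, I demonstrated an approach for generating TinkerPop APIs
consistently into different languages. I have started to check in some of
that generated code in a branch (see my commits here
<https://github.com/apache/tinkerpop/commits/TINKERPOP-2563-language/gremlin-language>)
and add bits and pieces for RDF support, as well.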

The Apache Software Foundation asks us to discuss any significant changes
to the code base on the dev list. Since these steps toward TP4 will be
major changes if and when they are merged into the master branch, I will
start discussing them here. Expect occasional emails from me about the
various things I will be doing in the branch. I absolutely invite comments,
feedback, and actual discussion on these design proposals, but even if it's
just me issuing self-affirming statements into the void like the King of
Pointland, I will just carry on, because that's how this process works.

A brief summary of the changes so far:


   - *Abstract specification of Gremlin traversals*. I have turned
   Stephen's Gremlin.g4
   <https://github.com/apache/tinkerpop/blob/TINKERPOP-2563-language/gremlin-language/src/main/antlr4/Gremlin.g4>
   ANTLR grammar into an abstract specification of Gremlin traversal syntax
   using the Dragon (YAML-based) format. Unfortunately, it is looking very
   unlikely that Dragon will become available as open-source software, so you
   can expect this YAML format to change just slightly once we have a new
   Dragon-like tool for schema and data transformations. More on that later.
   Right now, the syntax specification can be found here
   <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/main/yaml/org/apache/tinkerpop/gremlin/language/model>,
   although the file path might change in the future.


   - *Traversal DTOs*. Based on the abstract specification, I have
   generated Java classes for building and working with traversals. The
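   generated files can currently be found here
   <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/gen/java/org/apache/tinkerpop/gremlin/language/model>.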
   These are essentially POJOs or DTO classes, with special boilerplate
   methods for equality, pattern matching over alternative constructors, and
   modification by copying (since the instances are immutable). These classes
   allow you to build traversals in a declarative way, while all of the logic
   for evaluating traversals goes elsewhere. Support for serialization and
   deserialization for traversals is to be added in the future -- and the same
   goes for all other classes generated in this way.


   - *RDF 1.1 concepts model*. RDF support was part of TinkerPop from the
   beginning, but it was de-emphasized for TinkerPop 3 due to other priorities
   such as OLAP. For years, developers have been asking us for better
   interoperability with RDF. While we do have some query-level support for
   RDF these days in sparql-gremlin, we no longer have any data-level support,
   e.g. supporting loading RDF data into a property graph and getting it back
   out, evaluating Gremlin traversals over RDF datasets, etc. These things are
   not especially hard to do, in certain limited ways, but our old approach of
   writing adapters like GraphSail, SailGraph, and PropertyGraphSail
   in Java, with no support for other languages, does not seem appropriate for
   TinkerPop 4. Also, those early mappings were extremely underspecified in a
   formal sense -- good enough for some practical applications, but not good
   enough for anything requiring inference, optimization, or composition with
   other mappings. To that end, I am starting to add abstract specifications
   for RDF along the lines of the Gremlin specifications I described above.
   The first of these, a specification of RDF 1.1 Concepts, can currently be
   found here, with generated Java classes here.
   This gives us a way of working with RDF data in a language-neutral and
   framework-neutral way (whereas we were previously tied to Java and to the
   RDF4J, née Sesame, API). Mappings into and out of RDF will be defined with
   respect to these abst
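
The "Traversal DTOs" item above describes immutable classes with equality,
case analysis over alternative constructors, and modification by copying. A
minimal sketch of that shape follows, written in Python rather than Java to
keep it short. The class and function names (OutStep, HasStep, Traversal,
render, to_gremlin) are hypothetical and are not the classes generated in the
branch.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


# Two alternative "constructors" for a step. frozen=True makes instances
# immutable and generates value-based equality and hashing.
@dataclass(frozen=True)
class OutStep:
    labels: Tuple[str, ...] = ()


@dataclass(frozen=True)
class HasStep:
    key: str
    value: object


Step = Union[OutStep, HasStep]


@dataclass(frozen=True)
class Traversal:
    steps: Tuple[Step, ...] = ()

    def with_step(self, step: Step) -> "Traversal":
        # Modification by copying: build a new instance, leave self untouched.
        return replace(self, steps=self.steps + (step,))


def render(step: Step) -> str:
    # Case analysis over the alternatives of Step.
    if isinstance(step, OutStep):
        args = ", ".join(repr(label) for label in step.labels)
        return f"out({args})"
    if isinstance(step, HasStep):
        return f"has({step.key!r}, {step.value!r})"
    raise TypeError(f"unknown step: {step!r}")


def to_gremlin(traversal: Traversal) -> str:
    # Evaluation logic lives outside the data classes; this only prints.
    return ".".join(["g.V()"] + [render(s) for s in traversal.steps])


empty = Traversal()
t = empty.with_step(OutStep(("knows",))).with_step(HasStep("name", "josh"))
print(to_gremlin(t))
```

The data classes only describe a traversal; anything that interprets one,
like to_gremlin here, is kept separate, which is the split the item describes.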

Re: [DISCUSS] Discord Server?

2021-05-13 Thread Joshua Shinavier
I haven't used Discord much myself, but I don't see the downside of trying
it out amongst ourselves, then inviting a few community members. If the
response is positive, announce it on gremlin-users. We already have a Slack
workspace which is not much used, so in the worst case, now there are two.

^^ $0.02

On Wed, May 12, 2021 at 4:53 PM David Bechberger 
wrote:

> With the recent uptick in both StackOverflow posts and mailing list posts,
> it has got me thinking about ways to get more users engaged with the
> TinkerPop community. I was curious what people's thoughts were on starting
> a Discord server?
>
> The pros I see here are:
>
> * The ability to interact with users directly
> * The ability (through bots) to create a single location to monitor for
> questions
> * The ability to have statistics on user engagement
> * The ability to promote TinkerPop on internal/external discord server
> lists
>
> The biggest con I can think of is that if there is not much usage of it,
> then it would look like the project was not an active community.
>
> Thoughts?
>


[jira] [Assigned] (TINKERPOP-2563) Unify Gremlin grammar and structure/process APIs across GLVs

2021-05-11 Thread Joshua Shinavier (Jira)


 [ 
https://issues.apache.org/jira/browse/TINKERPOP-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Shinavier reassigned TINKERPOP-2563:
---

Assignee: Joshua Shinavier

> Unify Gremlin grammar and structure/process APIs across GLVs
> 
>
>              Key: TINKERPOP-2563
>              URL: https://issues.apache.org/jira/browse/TINKERPOP-2563
>          Project: TinkerPop
>       Issue Type: New Feature
>       Components: language
> Affects Versions: 3.6.0
>         Reporter: Joshua Shinavier
>         Assignee: Joshua Shinavier
>         Priority: Major
>
> This is a set of exploratory features in which the ANTLR grammar Gremlin.g4 
> will be supplemented by, and possibly generated from, a higher-level 
> specification in YAML. Generalizing the grammar in this way will potentially 
> allow grammars for additional Gremlin language variants to be generated, and 
> the common specification (for traversals as well as the core property graph 
> data model) can also be used for generating structure and process APIs in 
> multiple languages in parallel. See [TinkerPop 
> 2020|https://www.slideshare.net/joshsh/tinkerpop-2020] for a discussion of 
> related open problems, and How to Build a Dragon ([Part 
> 3|https://www.meetup.com/Category-Theory/events/277331504/]) for a 
> demonstration of some of the anticipated features.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (TINKERPOP-2563) Unify Gremlin grammar and structure/process APIs across GLVs

2021-05-11 Thread Joshua Shinavier (Jira)


 [ 
https://issues.apache.org/jira/browse/TINKERPOP-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Shinavier updated TINKERPOP-2563:

Component/s: (was: process)
 language

> Unify Gremlin grammar and structure/process APIs across GLVs
> 
>
>              Key: TINKERPOP-2563
>              URL: https://issues.apache.org/jira/browse/TINKERPOP-2563
>          Project: TinkerPop
>       Issue Type: New Feature
>       Components: language
> Affects Versions: 3.6.0
>         Reporter: Joshua Shinavier
>         Priority: Major
>
> This is a set of exploratory features in which the ANTLR grammar Gremlin.g4 
> will be supplemented by, and possibly generated from, a higher-level 
> specification in YAML. Generalizing the grammar in this way will potentially 
> allow grammars for additional Gremlin language variants to be generated, and 
> the common specification (for traversals as well as the core property graph 
> data model) can also be used for generating structure and process APIs in 
> multiple languages in parallel. See [TinkerPop 
> 2020|https://www.slideshare.net/joshsh/tinkerpop-2020] for a discussion of 
> related open problems, and How to Build a Dragon ([Part 
> 3|https://www.meetup.com/Category-Theory/events/277331504/]) for a 
> demonstration of some of the anticipated features.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (TINKERPOP-2563) Unify Gremlin grammar and structure/process APIs across GLVs

2021-05-11 Thread Joshua Shinavier (Jira)
Joshua Shinavier created TINKERPOP-2563:
---

         Summary: Unify Gremlin grammar and structure/process APIs across GLVs
             Key: TINKERPOP-2563
             URL: https://issues.apache.org/jira/browse/TINKERPOP-2563
         Project: TinkerPop
      Issue Type: New Feature
      Components: process
Affects Versions: 3.6.0
        Reporter: Joshua Shinavier


This is a set of exploratory features in which the ANTLR grammar Gremlin.g4 
will be supplemented by, and possibly generated from, a higher-level 
specification in YAML. Generalizing the grammar in this way will potentially 
allow grammars for additional Gremlin language variants to be generated, and 
the common specification (for traversals as well as the core property graph 
data model) can also be used for generating structure and process APIs in 
multiple languages in parallel. See [TinkerPop 
2020|https://www.slideshare.net/joshsh/tinkerpop-2020] for a discussion of 
related open problems, and How to Build a Dragon ([Part 
3|https://www.meetup.com/Category-Theory/events/277331504/]) for a 
demonstration of some of the anticipated features.
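
As a rough sketch of what "generated from a higher-level specification" could
mean, the Python below turns one spec entry into one ANTLR rule. Everything
here is an assumption made for illustration: the entry is a plain dict of the
kind a YAML parser would return, and the field names, the LIST_NONTERMINALS
mapping, and the to_antlr_rule helper are hypothetical rather than part of
any existing tooling.

```python
# One hypothetical spec entry, shaped like data loaded from a YAML file.
SPEC = [
    {
        "name": "out",
        "category": "traversalMethod",
        "parameters": [{"type": {"list": "label"}}],
    },
]

# Assumed mapping from abstract list element types to grammar nonterminals.
LIST_NONTERMINALS = {"label": "stringLiteralList"}


def nonterminal(parameter: dict) -> str:
    """Pick the grammar nonterminal for one abstract parameter."""
    type_ = parameter["type"]
    if isinstance(type_, dict) and "list" in type_:
        return LIST_NONTERMINALS[type_["list"]]
    raise ValueError(f"unsupported parameter type: {type_!r}")


def to_antlr_rule(entry: dict) -> str:
    """Render a spec entry as the text of a single ANTLR parser rule."""
    args = [nonterminal(p) for p in entry["parameters"]]
    body = " ".join([f"'{entry['name']}'", "LPAREN", *args, "RPAREN"])
    return f"{entry['category']}_{entry['name']}:\n  {body}\n  ;"


for entry in SPEC:
    print(to_antlr_rule(entry))
```

A generator for another language variant would reuse SPEC and swap only the
rendering function, which is where the savings across GLVs would come from.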



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] TinkerPop 3.4.11 Release

2021-05-06 Thread Joshua Shinavier
+1

On Thu, May 6, 2021 at 6:07 AM Øyvind Sæbø  wrote:

> VOTE +1
>
> On Thu, May 6, 2021 at 14:51, Jorge Bay Gondra <
> jorgebaygon...@gmail.com
> > wrote:
>
> > VOTE +1
> >
> > On Wed, May 5, 2021 at 11:10 AM  wrote:
> >
> > > I mostly reviewed the docs and all looks good.
> > >
> > > VOTE +1
> > >
> > > -Original Message-
> > > From: Stephen Mallette
> > > Sent: Tuesday, May 4, 2021 18:00
> > > To: dev@tinkerpop.apache.org
> > > Subject: [VOTE] TinkerPop 3.4.11 Release
> > >
> > > Hello,
> > >
> > > We are happy to announce that TinkerPop 3.4.11 is ready for release.
> > >
> > > The release artifacts can be found at this location:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/3.4.11/
> > >
> > > The source distribution is provided by:
> > > apache-tinkerpop-3.4.11-src.zip
> > >
> > > Two binary distributions are provided for user convenience:
> > > apache-tinkerpop-gremlin-console-3.4.11-bin.zip
> > > apache-tinkerpop-gremlin-server-3.4.11-bin.zip
> > >
> > > The GPG key used to sign the release artifacts is available at:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > >
> > > The online docs can be found here:
> > > https://tinkerpop.apache.org/docs/3.4.11/ (user docs)
> > > https://tinkerpop.apache.org/docs/3.4.11/upgrade/ (upgrade docs)
> > > https://tinkerpop.apache.org/javadocs/3.4.11/core/ (core javadoc)
> > > https://tinkerpop.apache.org/javadocs/3.4.11/full/ (full javadoc)
> > > https://tinkerpop.apache.org/dotnetdocs/3.4.11/ (.NET API docs)
> > > https://tinkerpop.apache.org/jsdocs/3.4.11/ (Javascript API docs)
> > >
> > > The tag in Apache Git can be found here:
> > > https://github.com/apache/tinkerpop/tree/3.4.11
> > >
> > > The release notes are available here:
> > > https://github.com/apache/tinkerpop/blob/3.4.11/CHANGELOG.asciidoc
> > >
> > > The [VOTE] will be open for the next 72 hours --- closing Friday (May 7,
> > > 2021) at 12pm ET.
> > >
> > > My vote is +1.
> > >
> > >
> >
>


Re: [VOTE] TinkerPop 3.5.0 Release

2021-05-06 Thread Joshua Shinavier
Awesome. +1

On Thu, May 6, 2021 at 6:42 AM Øyvind Sæbø  wrote:

> 🎉
>
> VOTE +1
>
> On Thu, May 6, 2021 at 14:51, Jorge Bay Gondra <
> jorgebaygon...@gmail.com
> > wrote:
>
> > Yay! 3.5.0!
> >
> > VOTE +1
> >
> > On Thu, May 6, 2021 at 10:08 AM  wrote:
> >
> > > VOTE +1
> > >
> > > One small issue I found while reviewing the upgrade docs though:
> > > TINKERPOP-2317 removed support for Python lambdas in Gremlin Server, but
> > > Gremlin.Net still supports sending them. This is also documented in the
> > > reference docs:
> > > https://tinkerpop.apache.org/docs/3.5.0/reference/#gremlin-dotnet-lambda
> > > I unfortunately completely missed that until now, but I don't think it's
> > > a big issue, and we can fix it for 3.5.1. This should definitely not
> > > stop the release in my opinion.
> > >
> > > -Original Message-
> > > From: Stephen Mallette
> > > Sent: Wednesday, May 5, 2021 22:50
> > > To: dev@tinkerpop.apache.org
> > > Subject: [VOTE] TinkerPop 3.5.0 Release
> > >
> > > Hello,
> > >
> > > We are happy to announce that TinkerPop 3.5.0 is ready for release.
> > >
> > > The release artifacts can be found at this location:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/3.5.0/
> > >
> > > The source distribution is provided by:
> > > apache-tinkerpop-3.5.0-src.zip
> > >
> > > Two binary distributions are provided for user convenience:
> > > apache-tinkerpop-gremlin-console-3.5.0-bin.zip
> > > apache-tinkerpop-gremlin-server-3.5.0-bin.zip
> > >
> > > The GPG key used to sign the release artifacts is available at:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > >
> > > The online docs can be found here:
> > > https://tinkerpop.apache.org/docs/3.5.0/ (user docs)
> > > https://tinkerpop.apache.org/docs/3.5.0/upgrade/ (upgrade docs)
> > > https://tinkerpop.apache.org/javadocs/3.5.0/core/ (core javadoc)
> > > https://tinkerpop.apache.org/javadocs/3.5.0/full/ (full javadoc)
> > > https://tinkerpop.apache.org/dotnetdocs/3.5.0/ (.NET API docs)
> > > https://tinkerpop.apache.org/jsdocs/3.5.0/ (Javascript API docs)
> > >
> > > The tag in Apache Git can be found here:
> > > https://github.com/apache/tinkerpop/tree/3.5.0
> > >
> > > The release notes are available here:
> > > https://github.com/apache/tinkerpop/blob/3.5.0/CHANGELOG.asciidoc
> > >
> > > The [VOTE] will be open for the next 72 hours --- closing Saturday (May 8,
> > > 2021) at 5pm ET.
> > >
> > > My vote is +1.
> > >
> > >
> >
>


Re: 3.5.0 Announcement Volunteers

2021-05-01 Thread Joshua Shinavier
Okiedokie. Lots to unpack there, but let me just say this one thing:
Haskell rocks.

Anyway, I think there are good technical reasons for optimism about the
current state of TinkerPop. If you are suggesting some alternative to
incremental improvements, plus prototyping the more substantial changes,
the specifics of it are not clear from your emails. Maybe Stephen has more
context on a previous line of conversation than I do, but I don't see how
the crazy talk is adding anything of value.

Josh



On Sat, May 1, 2021 at 6:55 PM Marko Rodriguez  wrote:

> Josh — You have been talking for over 2 years now about what you will
> accomplish. 2 years ago you asked to be a committer. Do you remember what I
> said? "You have to do something to be a committer." However, I felt for you
> because you were looking for a job and I fool-heartedly vouched for you
> thinking you wouldn’t dare cross me once more with your empty promises.
> However, once you got your name on the TinkerPop webpage, what have you
> done since except parade it around on resumes and the like? And some
> internal Uber code in Haskell is not accomplishing anything for TinkerPop.
> You fooled us with your promises and now you act (once again) as if you
> will do something in the future. I’ve worked with you for 15 years now —
> think about it 15 years as your CTO in one company, CEO in another, and
> your advisor at LANL — and it all comes to naught. You know it. I would love
> for you to finally prove me wrong and finally grab the bull by the horns
> and accomplish something of value instead of relegating Gremlin to “the
> bastard child of Ripple” and living off the successes of others with your
> name all proud front-and-center on the work created by the hands of other
> men.
>
> This is the point people. You all have learned how to talk and act, but
> what have you done in the last 3 years that keeps this project burning
> beyond the whims of your dying organizations and fading careers? To claim
> we are now in ‘enterprise world’ or ‘I promise to do’ all the while
> allowing those who did do stuff to be butchered like pigs in front of your
> own eyes. Cowards.
>
> Stephen — you dilly-dally. Kuppitz left. I left. Your great collaborators
> faded away … laying in wait for truth once more. You have only so many
> decisions left to make before you will not come back from the void you are
> staring into. Nut up — as leader of this project, create the thriving
> environment we once enjoyed. Don’t let your social and political fears trap
> you in mediocrity. You are a hero. You will only come to this point again
> and again and again in your lives to come. Why waste time? Slay the dragon
> and let us feast on the magical meat of creation once more — as in the time
> when our dining halls were not filled with lost bards and delirious jesters.
>
> Marko.
>
>
>
>
>
>
> > On May 1, 2021, at 10:30 AM, Joshua Shinavier  wrote:
> >
> > I think a great way to lose developers, and not gain new ones, is to make
> > negative comments on the dev and/or users list, even if they are only half
> > serious. Or more than half serious? I can't tell. In any case, I think
> > TinkerPop is in a good place, and would be surprised if you truly don't
> > agree. There are Gremlin implementations almost everywhere there are graph
> > databases. To my mind, the scaffolding stage of the project -- building the
> > structure and filling the space -- is done. Now we have a chance to go back
> > and make things truly robust. Formalizing the data model, formalizing the
> > semantics of traversals in a way which adds power without subtracting
> > versatility. Building better bridges between TinkerPop-compatible graphs
> > and the rest of the world's data. Other, OLAPy and distributed-systems-y
> > things I haven't thought as much about, but which others have. I think some
> > of the changes will require a clean break from the existing code base,
> > hence a new major version, but others can follow more of a
> > replace-and-deprecate pattern.
> >
> > Josh
> >
> >
> >
> > On Sat, May 1, 2021 at 8:54 AM Marko Rodriguez wrote:
> >
> >> Hello,
> >>
> >>> not quite the topic for this thread but...
> >>
> >> Oh but it is. Over the last 3 years there has been little done to advance
> >> the 50% area of the codebase that I wrote — the virtual machine, OLAP, and
> >> language.
> >>
> >>    1. Talking with Databricks about gremlin-spark, it’s odd that
> >>       DataFrames hasn’t been adopted.
> >>    2. Why can’t OLAP do bulk writes/updates?
> >&

Re: 3.5.0 Announcement Volunteers

2021-05-01 Thread Joshua Shinavier
> >>>> enhancements to the Python client. I could write something around those
> >>>> features.
> >>>>
> >>>> Cheers, Kelvin
> >>>>
> >>>>> On Apr 30, 2021, at 04:30, f...@florian-hockmann.de wrote:
> >>>>>
> >>>>> I could write something for .NET. Added GraphBinary support and
> >>>>> switching the JSON library could be interesting for some Gremlin.Net
> >>>>> users.
> >>>>>
> >>>>> -Original Message-
> >>>>> From: Stephen Mallette
> >>>>> Sent: Thursday, April 29, 2021 21:32
> >>>>> To: dev@tinkerpop.apache.org
> >>>>> Subject: Re: 3.5.0 Announcement Volunteers
> >>>>>
> >>>>> Right now, I think it's fine for these to just have each person's
> >>>>> individual style - might make the posts more interesting assuming we
> >>>>> get a few more volunteers. If you can come up with a neat image that
> >>>>> could go with a tweet to promote the announcement (that we will push
> >>>>> through the TinkerPop account), that would be cool. We've not really
> >>>>> come up with anything that sort of iconifies the gremlin-language
> >>>>> module, so if you feel like thinking about that, that would be neat.
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Thu, Apr 29, 2021 at 2:45 PM Joshua Shinavier wrote:
> >>>>>>
> >>>>>> Sounds good. I'll write the announcement. If you have thoughts on the
> >>>>>> format, please feel free to share.
> >>>>>>
> >>>>>> Josh
> >>>>>>
> >>>>>> On Thu, Apr 29, 2021 at 10:56 AM Stephen Mallette wrote:
> >>>>>>
> >>>>>>> On Thu, Apr 29, 2021 at 1:38 PM Joshua Shinavier <j...@fortytwo.net>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I would be happy to collaborate on gremlin-language if there is
> >>>>>>>> something which needs doing.
> >>>>>>>>
> >>>>>>>> Josh
> >>>>>>>>
> >>>>>>>>
> >>>>>>> great josh - thanks! The upgrade docs sorta tuck that feature away
> >>>>>>> in the provider section
> >>>>>>>
> >>>>>>> https://tinkerpop.apache.org/docs/3.5.0-SNAPSHOT/upgrade/#_gremlin_language
> >>>>>>>
> >>>>>>> because at this point it doesn't have direct user impact, but i
> >>>>>>> think it might be useful to the community to write something in an
> >>>>>>> announcement that helps describe what this module lays the
> >>>>>>> foundation for. you've had some interesting ideas in this area that
> >>>>>>> i'm not sure have gotten outside of the dev list as of yet.
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>


Re: 3.5.0 Announcement Volunteers

2021-04-29 Thread Joshua Shinavier
Sounds good. I'll write the announcement. If you have thoughts on the
format, please feel free to share.

Josh

On Thu, Apr 29, 2021 at 10:56 AM Stephen Mallette 
wrote:

> On Thu, Apr 29, 2021 at 1:38 PM Joshua Shinavier 
> wrote:
>
> > I would be happy to collaborate on gremlin-language if there is something
> > which needs doing.
> >
> > Josh
> >
> >
> great josh - thanks! The upgrade docs sorta tuck that feature away in the
> provider section
>
> https://tinkerpop.apache.org/docs/3.5.0-SNAPSHOT/upgrade/#_gremlin_language
>
> because at this point it doesn't have direct user impact, but i think it
> might be useful to the community to write something in an announcement that
> helps describe what this module lays the foundation for. you've had some
> interesting ideas in this area that i'm not sure have gotten outside of the
> dev list as of yet.
>


Re: 3.5.0 Announcement Volunteers

2021-04-29 Thread Joshua Shinavier
I would be happy to collaborate on gremlin-language if there is something
which needs doing.

Josh


On Thu, Apr 29, 2021 at 10:24 AM Stephen Mallette 
wrote:

> 3.5.0 is a fairly big release. There's too much in there to summarize in a
> single announcement email. I was thinking that we might produce multiple
> tweets/posts to follow the main announcement that would call attention to
> important changes. I think that each announcement should hopefully do a bit
> more than re-hash the upgrade documentation - hopefully each announcement
> can either expand on a batch of changes or focus in on a single item from
> the upgrade docs.
>
> We'd look to start sending these out immediately after our general release
> announcement which should occur Monday, May 10. I'd like to try to have a
> schedule of these posts ready by the middle of next week and I'd think
> drafts in review by next Friday, May 7.  If you have a 3.5.0 feature that
> you'd like to author/collaborate on and have a bit of time in your schedule
> to work on this, then please reply to this thread and we can get organized.
>
> Thanks,
>
> Stephen
>


Re: [DISCUSS] ANTLR and gremlin-script

2021-03-25 Thread Joshua Shinavier
Hi Stephen,

Replies inline.

On Wed, Mar 24, 2021 at 5:18 AM Stephen Mallette 
wrote:

> [...]
>  I'd thought that it made sense that Groovy and the grammar stayed
> fairly close to one another as a nice point of Groovy is that you can make
> it do neat tricks when building DSLs but I don't think it necessarily needs
> to.
>

I think those neat tricks ought to be possible with the generalized grammar
as well, but it's all a little hand-wavy right now. If this is of interest,
which it seems to be, I'll see if I can do a small proof of concept some
time soon.



>1. Specify an abstract Gremlin grammar in a neutral language like YAML
> >2. Write some helper code for generating ANTLR grammars from the YAML
> >3. For each Gremlin language variant, write a smaller amount of code
> >based on (2) to generate a language-specific ANTLR grammar
> >
>
> I like how you present this idea because one of our big problems and big
> assets are language variants and generating our way to ANTLR support in
> each makes sense to me. I'd be curious what a YAML representation and the
> related work might look like and how you think we might structure it.
>


I'm curious too, but it would probably look a lot like the other kinds of
transformations we perform on schemas and data with YAML or JSON as the
source. E.g. instead of an expression like this:

traversalMethod_out:
  'out' LPAREN stringLiteralList RPAREN
  ;

you'd have something more like this:

- name: out
  category: traversalMethod
  parameters:
    - type:
        list: label

and we would transform the latter to the former in the case of
Gremlin-Groovy, and to something slightly different in the case of other
variants.
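
As a rough illustration of that transformation, here is a minimal Python sketch that renders the YAML entry above as the Gremlin-Groovy production. The spec is written as the dict a YAML parser would produce, and the type-to-rule table, the COMMA token, and the function names are all assumptions made for the example, not part of any existing module.

```python
# Hypothetical mapping from abstract parameter types to ANTLR rule names.
# Only stringLiteralList appears in the thread; the table itself is assumed.
GROOVY_TYPE_RULES = {("list", "label"): "stringLiteralList"}


def parameter_rule(param):
    """Translate one parameter spec into the name of an ANTLR rule."""
    (kind, element), = param["type"].items()
    return GROOVY_TYPE_RULES[(kind, element)]


def to_antlr_rule(spec):
    """Render a step specification as a Gremlin-Groovy style production."""
    # COMMA is an assumed token name for separating several parameters.
    args = " COMMA ".join(parameter_rule(p) for p in spec["parameters"])
    body = f"'{spec['name']}' LPAREN {args} RPAREN"
    return f"{spec['category']}_{spec['name']}:\n  {body}\n  ;"


# The YAML entry from the message, as a parsed dict.
out_step = {
    "name": "out",
    "category": "traversalMethod",
    "parameters": [{"type": {"list": "label"}}],
}

print(to_antlr_rule(out_step))
```

Another variant would swap in its own type table and naming rule while reusing the same spec.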




> Are you suggesting I change gremlin-grammar to gremlin-language with that
> work in mind?



That's my suggestion, because gremlin-language sounds a little more
inclusive of other abstract Gremlin language specifications in addition to
the grammar. Not to get too far ahead of ourselves, but I think it should
be possible to use the YAML-based specification I'm hinting at above for
other things like serializing/deserializing Gremlin traversals, and
traversal results in non-text formats like JSON, Thrift, Protobuf, Avro,
YAML of course, etc. A cleaner solution to the "format zoo" was another
pain area we discussed last year, where I think mappings can help.



> I don't think it's a problem to do so. I guess
> gremlin-language would eventually house (1), (2), and (3)? Right now,
> though, gremlin-grammar also generates a Java parser from the one ANTLR
> file I have there. Would the idea be that we'd organize this module to
> generate parsers for each language we supported as well? or would that live
> somewhere else?
>

Yeah, I think once we can generate the parser for Gremlin-Groovy (which
should look almost identical to what you already have), it ought to be
straightforward to generate parsers for other languages. Btw. if I give this
a go, and the solution looks promising, maybe I will make it the topic of a
future Category Theory and Applications session. The first session on
Dragon is tonight at 6pm PDT.

Josh




>
>
> >
> > Btw. I will be giving a Category Theory and Applications presentation
> > <https://www.meetup.com/Category-Theory/events/nnrhgsyccfbhc/> next week
> > which will illustrate how something like the above might work.
> >
> > Josh
> >
> >
> >
> > On Tue, Mar 16, 2021 at 12:48 PM Stephen Mallette 
> > wrote:
> >
> > > Here is the PR: https://github.com/apache/tinkerpop/pull/1408
> > >
> > > On Tue, Mar 16, 2021 at 6:14 AM Stephen Mallette  >
> > > wrote:
> > >
> > > > No branch yet, but I think I will be sending the PR today.
> > > >
> > > > On Mon, Mar 15, 2021 at 9:33 PM Joshua Shinavier 
> > > > wrote:
> > > >
> > > >> Is there a branch we can take a look at before the PR is ready?
> > > >>
> > > >> Josh
> > > >>
> > > >> On Fri, Mar 12, 2021 at 5:42 AM Stephen Mallette <
> > spmalle...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > I've been working on forming a pull request for this task. I don't
> > > >> think IP
> > > >> > Clearance is necessary as I originally did because the
> contribution
> > is
> > > >> > really just an ANTLR4 grammar file with some tests to validate
> > things.
> > > >> > Therefore, it's not a big body of independent code as I'd perhaps
> > > >> initially
> > > > >>

Re: [DISCUSS] ANTLR and gremlin-script

2021-03-22 Thread Joshua Shinavier
Btw. to your other questions:

I think Stephen's attitude toward TP4 is that there should be a smooth
transition, so ideally this would be for TP3, migrating to TP4 when the
time comes.

Native Java traversals are not accessible to a parser, so I do not see how
they would be impacted by gremlin-grammar or gremlin-language.

Josh



On Mon, Mar 22, 2021 at 2:00 PM Joshua Shinavier  wrote:

> I am not yet entirely sure what it means either, but I am thinking that it
> would be nice to be able to
> a) validate Gremlin expressions written in languages other than Groovy or
> Java, and
> b) parse Gremlin expressions in Gremlin language variants, producing
> JVM-based traversals, or even
> c) parse Gremlin expressions in Gremlin language variants, producing
> native traversals in the host language
> These options range from easy (a) to hard (c).
>
> Another way to look at this is: let's use Stephen's grammar as a template
> for a more generic grammar which is more flexible w.r.t. the input language.
>
> Josh
>
>
>
> On Mon, Mar 22, 2021 at 11:37 AM pieter gmail 
> wrote:
>
>> Hi,
>>
>> Exciting as this is I am not quite sure what it means.
>>
>> Naively, perhaps, is the idea:
>> Arbitrary gremlin string -> antlr parser -> some AST walker -> gremlin
>> byte code -> java in memory steps ... -> voila
>>
>> Is the grammar going to be the primary and only
>> interface/specification, or will the native java implementation bypass
>> the grammar going straight to the steps instead?
>>
>> Is this aimed at the gremlin 3 or 4?
>>
>> Cheers
>> Pieter
>>
>> On Tue, 2021-03-16 at 15:47 -0400, Stephen Mallette wrote:
>> > Here is the PR: https://github.com/apache/tinkerpop/pull/1408
>> >
>> > On Tue, Mar 16, 2021 at 6:14 AM Stephen Mallette
>> > 
>> > wrote:
>> >
>> > > No branch yet, but I think I will be sending the PR today.
>> > >
>> > > On Mon, Mar 15, 2021 at 9:33 PM Joshua Shinavier
>> > > 
>> > > wrote:
>> > >
>> > > > Is there a branch we can take a look at before the PR is ready?
>> > > >
>> > > > Josh
>> > > >
>> > > > On Fri, Mar 12, 2021 at 5:42 AM Stephen Mallette
>> > > > 
>> > > > wrote:
>> > > >
>> > > > > I've been working on forming a pull request for this task. I
>> > > > > don't
>> > > > think IP
>> > > > > Clearance is necessary as I originally did because the
>> > > > > contribution is
>> > > > > really just an ANTLR4 grammar file with some tests to validate
>> > > > > things.
>> > > > > Therefore, it's not a big body of independent code as I'd
>> > > > > perhaps
>> > > > initially
>> > > > > envisioned. Compared to gremlint, this addition is pretty
>> > > > > simple and
>> > > > > straightforward. I've created this issue in JIRA with some
>> > > > > additional
>> > > > notes
>> > > > > on what to expect in this initial body of work:
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/TINKERPOP-2533
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Feb 8, 2021 at 10:06 AM Stephen Mallette
>> > > > > 
>> > > > > wrote:
>> > > > >
>> > > > > > Just wanted to leave an update on this thread. It was nice to
>> > > > > > see some
>> > > > > > support for it. I've not had time to focus on the task itself
>> > > > > > so sorry
>> > > > > > there hasn't been much movement, but I hope to see it on
>> > > > > > track soon. I
>> > > > > > thought to update the thread after I came across yet another
>> > > > > > nice
>> > > > usage
>> > > > > for
>> > > > > > it. I've long wanted to unify our test framework (i.e.
>> > > > > > deprecate the
>> > > > JVM
>> > > > > > process suite in favor of the GLV test suite). I was
>> > > > > > experimenting
>> > > > with
>> > > > > > what that might look like on Friday and hit a circular
>> > > > > > dependency

Re: [DISCUSS] ANTLR and gremlin-script

2021-03-22 Thread Joshua Shinavier
I am not yet entirely sure what it means either, but I am thinking that it
would be nice to be able to
a) validate Gremlin expressions written in languages other than Groovy or
Java, and
b) parse Gremlin expressions in Gremlin language variants, producing
JVM-based traversals, or even
c) parse Gremlin expressions in Gremlin language variants, producing native
traversals in the host language
These options range from easy (a) to hard (c).

Another way to look at this is: let's use Stephen's grammar as a template
for a more generic grammar which is more flexible w.r.t. the input language.
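
To make option (a) concrete, here is a toy Python sketch of validating a Gremlin string without Groovy or the JVM. It is not an ANTLR parser: it uses a regular expression and a hand-picked step list, both of which are assumptions for illustration, and it only handles flat traversals with no nested parentheses.

```python
import re

# Assumed, tiny subset of steps; a real validator would be driven by the grammar.
KNOWN_STEPS = {"V", "E", "out", "in", "has", "hasLabel", "values", "count"}

# One step: .name( ... ) where the arguments contain no nested parentheses.
STEP = re.compile(r"\.([A-Za-z_]\w*)\(([^()]*)\)")


def validate(expression):
    """Return a list of problems found in a simple, unnested traversal string."""
    if not expression.startswith("g"):
        return ["traversal must start with 'g'"]
    problems = []
    pos = 1  # skip the leading 'g'
    while pos < len(expression):
        match = STEP.match(expression, pos)
        if match is None:
            problems.append(f"unexpected text at offset {pos}")
            break
        if match.group(1) not in KNOWN_STEPS:
            problems.append(f"unknown step '{match.group(1)}'")
        pos = match.end()
    return problems


print(validate("g.V().out('knows').count()"))
print(validate("g.V().outt('knows')"))
```

Options (b) and (c) would replace the list of problems with a constructed traversal, which is where a real parser and AST walker become necessary.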

Josh



On Mon, Mar 22, 2021 at 11:37 AM pieter gmail 
wrote:

> Hi,
>
> Exciting as this is I am not quite sure what it means.
>
> Naively, perhaps, is the idea:
> Arbitrary gremlin string -> antlr parser -> some AST walker -> gremlin
> byte code -> java in memory steps ... -> voila
>
> Is the grammar going to be the primary and only
> interface/specification, or will the native java implementation bypass
> the grammar going straight to the steps instead?
>
> Is this aimed at the gremlin 3 or 4?
>
> Cheers
> Pieter
>
> On Tue, 2021-03-16 at 15:47 -0400, Stephen Mallette wrote:
> > Here is the PR: https://github.com/apache/tinkerpop/pull/1408
> >
> > On Tue, Mar 16, 2021 at 6:14 AM Stephen Mallette
> > 
> > wrote:
> >
> > > No branch yet, but I think I will be sending the PR today.
> > >
> > > On Mon, Mar 15, 2021 at 9:33 PM Joshua Shinavier
> > > 
> > > wrote:
> > >
> > > > Is there a branch we can take a look at before the PR is ready?
> > > >
> > > > Josh
> > > >
> > > > On Fri, Mar 12, 2021 at 5:42 AM Stephen Mallette
> > > > 
> > > > wrote:
> > > >
> > > > > I've been working on forming a pull request for this task. I
> > > > > don't
> > > > think IP
> > > > > Clearance is necessary as I originally did because the
> > > > > contribution is
> > > > > really just an ANTLR4 grammar file with some tests to validate
> > > > > things.
> > > > > Therefore, it's not a big body of independent code as I'd
> > > > > perhaps
> > > > initially
> > > > > envisioned. Compared to gremlint, this addition is pretty
> > > > > simple and
> > > > > straightforward. I've created this issue in JIRA with some
> > > > > additional
> > > > notes
> > > > > on what to expect in this initial body of work:
> > > > >
> > > > > https://issues.apache.org/jira/browse/TINKERPOP-2533
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Feb 8, 2021 at 10:06 AM Stephen Mallette
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Just wanted to leave an update on this thread. It was nice to
> > > > > > see some
> > > > > > support for it. I've not had time to focus on the task itself
> > > > > > so sorry
> > > > > > there hasn't been much movement, but I hope to see it on
> > > > > > track soon. I
> > > > > > thought to update the thread after I came across yet another
> > > > > > nice
> > > > usage
> > > > > for
> > > > > > it. I've long wanted to unify our test framework (i.e.
> > > > > > deprecate the
> > > > JVM
> > > > > > process suite in favor of the GLV test suite). I was
> > > > > > experimenting
> > > > with
> > > > > > what that might look like on Friday and hit a circular
> > > > > > dependency
> > > > which
> > > > > > constantly trips things up where gremlin-test wants to depend
> > > > > > on
> > > > > > gremlin-groovy (for ScriptEngine support) but gremlin-groovy
> > > > > > depends
> > > > on
> > > > > > > gremlin-test and tinkergraph with test scope already. I
> > > > > > think the
> > > > > > introduction of gremlin-script would let gremlin-test build
> > > > > > the
> > > > Traversal
> > > > > > object from a Gremlin string and thus avoid that circular
> > > > relationship.
> > > > > >
> > > > > > On Fri, Jan 8, 2021 at 2:43 AM pieter gmail

Re: [DISCUSS] ANTLR and gremlin-script

2021-03-22 Thread Joshua Shinavier
Awesome. This definitely fills a major gap, and as Florian said, it is
great that it is already improving the documentation.

High-level question: do you have any thoughts about Gremlin syntax
validation *in general*, i.e. for language variants other than Groovy? It
would be interesting to explore the following:

   1. Specify an abstract Gremlin grammar in a neutral language like YAML
   2. Write some helper code for generating ANTLR grammars from the YAML
   3. For each Gremlin language variant, write a smaller amount of code
   based on (2) to generate a language-specific ANTLR grammar

Based on my experience with Dragon, I have a pretty good handle on what the
YAML (1) would need to look like, and how to write language-neutral helper
code (2). What would take a little more investigation is to what extent we
could do (3) on the basis of an understanding of the target language alone.
E.g. if a new step or signature is added to Gremlin, will it be enough to
add a specification of the step to the abstract grammar, or would we need
to special-case the step for each language variant? I suspect we wouldn't
have to do *too* much special-casing, but that's to be determined.
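
A small Python sketch of what step (3) might involve: deriving the variant-specific spelling of a step from one abstract name. The three naming rules are simplified assumptions (gremlin-python appends an underscore to reserved words such as `in`, and Gremlin.Net capitalizes step names; real variants have further exceptions), and the function names are hypothetical.

```python
import keyword


def groovy_name(step):
    """Gremlin-Groovy uses the abstract step name unchanged."""
    return step


def python_name(step):
    """Append an underscore when the step collides with a Python keyword."""
    return step + "_" if keyword.iskeyword(step) else step


def dotnet_name(step):
    """Capitalize the first letter, as .NET method names conventionally do."""
    return step[0].upper() + step[1:]


VARIANTS = {"groovy": groovy_name, "python": python_name, "dotnet": dotnet_name}

for step in ("out", "in", "hasLabel"):
    print(step, {name: rule(step) for name, rule in VARIANTS.items()})
```

If rules like these cover most steps, adding a step to the abstract grammar would be enough, and only the exceptions would need per-variant special-casing.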

I might suggest calling the module "gremlin-language" if we were to
undertake the above. That would also allow other schemas to be provided
which would help us generate the structure API into different
implementation languages in a consistent way, as we discussed last year
<https://www.slideshare.net/joshsh/tinkerpop-2020>.

Btw. I will be giving a Category Theory and Applications presentation
<https://www.meetup.com/Category-Theory/events/nnrhgsyccfbhc/> next week
which will illustrate how something like the above might work.

Josh



On Tue, Mar 16, 2021 at 12:48 PM Stephen Mallette 
wrote:

> Here is the PR: https://github.com/apache/tinkerpop/pull/1408
>
> On Tue, Mar 16, 2021 at 6:14 AM Stephen Mallette 
> wrote:
>
> > No branch yet, but I think I will be sending the PR today.
> >
> > On Mon, Mar 15, 2021 at 9:33 PM Joshua Shinavier 
> > wrote:
> >
> >> Is there a branch we can take a look at before the PR is ready?
> >>
> >> Josh
> >>
> >> On Fri, Mar 12, 2021 at 5:42 AM Stephen Mallette 
> >> wrote:
> >>
> >> > I've been working on forming a pull request for this task. I don't
> >> think IP
> >> > Clearance is necessary as I originally did because the contribution is
> >> > really just an ANTLR4 grammar file with some tests to validate things.
> >> > Therefore, it's not a big body of independent code as I'd perhaps
> >> initially
> >> > envisioned. Compared to gremlint, this addition is pretty simple and
> >> > straightforward. I've created this issue in JIRA with some additional
> >> notes
> >> > on what to expect in this initial body of work:
> >> >
> >> > https://issues.apache.org/jira/browse/TINKERPOP-2533
> >> >
> >> >
> >> >
> >> > On Mon, Feb 8, 2021 at 10:06 AM Stephen Mallette <
> spmalle...@gmail.com>
> >> > wrote:
> >> >
> >> > > Just wanted to leave an update on this thread. It was nice to see
> some
> >> > > support for it. I've not had time to focus on the task itself so
> sorry
> >> > > there hasn't been much movement, but I hope to see it on track
> soon. I
> >> > > thought to update the thread after I came across yet another nice
> >> usage
> >> > for
> >> > > it. I've long wanted to unify our test framework (i.e. deprecate the
> >> JVM
> >> > > process suite in favor of the GLV test suite). I was experimenting
> >> with
> >> > > what that might look like on Friday and hit a circular dependency
> >> which
> >> > > constantly trips things up where gremlin-test wants to depend on
> >> > > gremlin-groovy (for ScriptEngine support) but gremlin-groovy depends
> >> on
> >> > > gremlin-test and tinkergraph with test scope already. I think the
> >> > > introduction of gremlin-script would let gremlin-test build the
> >> Traversal
> >> > > object from a Gremlin string and thus avoid that circular
> >> relationship.
> >> > >
> >> > > On Fri, Jan 8, 2021 at 2:43 AM pieter gmail <
> pieter.mar...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> +1
> >> > >>
> >> > >> I have often thought the language specification should be a project
> >> > >> 

Re: [DISCUSS] ANTLR and gremlin-script

2021-03-15 Thread Joshua Shinavier
Is there a branch we can take a look at before the PR is ready?

Josh

On Fri, Mar 12, 2021 at 5:42 AM Stephen Mallette 
wrote:

> I've been working on forming a pull request for this task. I don't think IP
> Clearance is necessary as I originally did because the contribution is
> really just an ANTLR4 grammar file with some tests to validate things.
> Therefore, it's not a big body of independent code as I'd perhaps initially
> envisioned. Compared to gremlint, this addition is pretty simple and
> straightforward. I've created this issue in JIRA with some additional notes
> on what to expect in this initial body of work:
>
> https://issues.apache.org/jira/browse/TINKERPOP-2533
>
>
>
> On Mon, Feb 8, 2021 at 10:06 AM Stephen Mallette 
> wrote:
>
> > Just wanted to leave an update on this thread. It was nice to see some
> > support for it. I've not had time to focus on the task itself so sorry
> > there hasn't been much movement, but I hope to see it on track soon. I
> > thought to update the thread after I came across yet another nice usage
> for
> > it. I've long wanted to unify our test framework (i.e. deprecate the JVM
> > process suite in favor of the GLV test suite). I was experimenting with
> > what that might look like on Friday and hit a circular dependency which
> > constantly trips things up where gremlin-test wants to depend on
> > gremlin-groovy (for ScriptEngine support) but gremlin-groovy depends on
> > gremlin-test and tinkergraph with test scope already. I think the
> > introduction of gremlin-script would let gremlin-test build the Traversal
> > object from a Gremlin string and thus avoid that circular relationship.
> >
> > On Fri, Jan 8, 2021 at 2:43 AM pieter gmail 
> > wrote:
> >
> >> +1
> >>
> >> I have often thought the language specification should be a project
> >> separate from the implementations, and done in a formal but plain
> >> English format similar to OMG or IETF specifications.
> >>
> >> I suspect Sqlg's code base would have been vastly different if it had
> >> evolved from a grammar instead of an API.
> >>
> >> Cheers
> >> Pieter
> >>
> >> On Thu, 2020-12-24 at 14:41 -0500, Stephen Mallette wrote:
> >> > As a project, over the years, we've often been asked the question as
> >> > to why
> >> > Gremlin doesn't have an ANTLR style grammar. There have been varying
> >> > answers over the years to explain the reasoning but in recent years
> >> > I've
> >> > started to see where our dependence on Java for driving Gremlin
> >> > design has
> >> > not translated well as we have expanded Gremlin into other
> >> > programming
> >> > ecosystems. Using Java has often allowed idioms of that language to
> >> > leak
> >> > into Gremlin itself which introduces friction when implemented
> >> > outside of
> >> > the JVM. I think that there is some advantage to designing Gremlin
> >> > more
> >> > with just graphs/usage in mind and then determining how that design
> >> > choice
> >> > looks in each programming language.
> >> >
> >> > I think that using an ANTLR grammar to drive that design work for
> >> > Gremlin
> >> > makes a lot of sense in this context. We would effectively have
> >> > something
> >> > like a gremlin-script which would become the new language archetype.
> >> > New
> >> > steps, language changes, etc. would be discussed in its context and
> >> > then
> >> > implemented in the grammar and later in each programming language we
> >> > support in the style a developer would expect. An interesting upside
> >> > of
> >> > this approach is that we can implement gremlin-script in the
> >> > ScriptEngine
> >> > and replace GremlinGroovyScriptEngine which would help us strengthen
> >> > our
> >> > security story in Gremlin Server. Groovy processing would just be a
> >> > fallback to Gremlin scripts that could not be processed by the AST.
> >> > In fact
> >> > users who didn't need Groovy could simply not install it at all and
> >> > thus
> >> > boast a much more secure system.
> >> >
> >> > I think that inclusion of a grammar in our project is an exciting new
> >> > direction for us to take and will help in a variety of areas beyond
> >> > those
> >> > I've already related.
> >> >
> >> > If we like this direction, Amazon Neptune already maintains such a
> >> > grammar
> >> > and would be willing to contribute it to the project to live in open
> >> > source. The contribution would go through the same IP Clearance
> >> > process
> >> > gremlint is going through since it was developed outside of
> >> > TinkerPop. I'd
> >> > be happy to guide that process through if we draw to consensus here.
> >>
> >>
> >>
>


Re: [DISCUSS] Adding motif support to match()

2021-02-04 Thread Joshua Shinavier
Initial thought: if the ASCII art syntax is Cypher-like, why not make it
openCypher proper? I.e. keep match() as it is, but generalize the cypher()
step out of Neo4jGraph, with native Neo4j evaluation of Cypher as an
optimization.

Josh


On Thu, Feb 4, 2021 at 2:17 PM David Bechberger  wrote:

> Over the years of working with Gremlin I have found the match() step is
> difficult to create traversals with and even more difficult to make work
> efficiently.  While the imperative style of programming in Gremlin provides
> a powerful path finding mechanism it really lacks an easy way to perform
> pattern matching queries.  It would be great if we could simplify the
> match() step to enable users to easily generate these pattern matching
> traversals.
>
> To accomplish this I was wondering what adding support for a subset of
> motif/ascii art patterns to the match step might look like.  These types of
> patterns are very easy for people to understand and I think the ability to
> combine this pattern matching syntax with the powerful path finding and
> formatting features of Gremlin would make a powerful combination.
>
> To accomplish this I am suggesting supporting a subset of potential
> patterns.  The two most common examples of this sort of pattern out there
> are the openCypher type style and the style used by GraphX.  I have
> provided a few examples below of what this syntax might look like:
>
> e.g. openCypher style
>
> Find me everything within one hop
> g.V().match("()-[]->()")
>
> Find me everything within one hop of a Person vertex
> g.V().match("(p:Person)-[]->()")
>
> Find me all Companies within one hop of a Person vertex
> g.V().match("(p:Person)-[]->(c:Company)")
>
> Find me all Companies within one hop of a Person vertex with an Employed_at
> edge
> g.V().match("(p:Person)-[e:employed_at]->(c:Company)")
>
>
> The other option would be to take more of a hybrid approach and use only
> the basic art/motifs like GraphX and apply the additional filtering in a
> hybrid type of mode like this:
>
> Find me all Companies within one hop of a Person vertex with an Employed_at
> edge
> g.V().match("(p)-[e]->(c)",
>     __.as('p').hasLabel('Person'),
>     __.as('e').hasLabel('employed_at'),
>     __.as('c').hasLabel('Company')
> )
>
> This also has the potential to enable some significantly more complex
> patterns like "Find me all Companies within one hop of a Person vertex with
> an Employed_at edge who also worked at Foo"
> g.V().match("(p)-[e]->(c)",
>     __.as('p').hasLabel('Person').out('employed_at').has('Company', 'name',
>         'Foo'),
>     __.as('e').hasLabel('employed_at'),
>     __.as('c').hasLabel('Company')
> )
>
> Thoughts?
>
> Dave
>


Re: [DISCUSS] Add gremlify.com to Powered By

2020-01-30 Thread Joshua Shinavier
I don't know if I would count a tool that does not (yet) implement basic
steps like E() as "powered by Gremlin", but it seems to me that the hard
part (building a nice GUI for constructing graphs and running traversals,
that does already construct graphs and let you run certain traversals) is
already done. After the missing pieces are added, this will be a great
asset to new graph users, and even a handy tool for experienced ones. I
imagine the usual toy graphs could be added with little trouble.

So... conditional +1.


On Wed, Jan 29, 2020 at 2:28 PM Stephen Mallette 
wrote:

> The http://gremlify.com/ website was recently announced on gremlin-users:
>
> https://groups.google.com/d/msg/gremlin-users/BDmWtfInTl4/e5F84qNkDgAJ
>
> It's young but I think it satisfies our requirements for listing in the
> Powered By section of our homepage. Unless there are objections in the next
> 72 hours, I will assume lazy consensus and get it added.
>


Re: [DISCUSS] string formatting

2020-01-23 Thread Joshua Shinavier
Just a quick note, but I don't think we would go far wrong using Formatter
for now. It is after all an "interpreter for printf-style format strings",
where printf has a POSIX
specification that is
implemented in many programming languages. The Formatter docs go on to say
that while Formatter is "inspired by" printf, it departs from printf in
ways that are idiosyncratic to Java. My googling did not turn up a handy
list of features for which Formatter differs from printf, nor did it turn
up a POSIXly-correct printf library for Java. Likely that both of these
things exist somewhere. Otherwise if we really wanted to be picky about
portability, we might have to write that custom printf in each
target language, including Java.
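
To illustrate the portability point, here is how the two template styles from the quoted message behave in one other host language. Python's `%` operator is itself only "printf-style", so this sketch shows a second dialect rather than POSIX-conformant behaviour; the function names are made up for the example.

```python
def printf_format(template, *values):
    """Apply a printf-style template using Python's % operator."""
    return template % values


def brace_format(template, *values):
    """Apply a {}-style template, the more language-neutral form."""
    return template.format(*values)


print(printf_format("%s is %s years old", "marko", 29))
print(brace_format("{} is {} years old", "marko", 29))
# A precision on %s truncates the string, as in the "%1.1s" example.
print(printf_format("%1.1s", "marko"))
```

Simple conversions such as `%s` and `%1.1s` carry over; the open question is how many of the less common conversions and flags behave the same way across implementations.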

W.r.t. outputting elements as JSON, IMO this is another area where a formal
data model is going to help, and the output need not be limited to JSON.
Thrift, Protobuf, Avro, JSON, GraphQL... any data model we have an
appropriate API for and can describe in terms of primitive types, sums, and
products, we can map schemas and data into. The target of format() would
not be "JSON" but (for example) "JSON conforming to the JSON Schema
equivalent of your graph schema". In the default graph schema, I think
there will be one kind of vertex, and one kind of edge, with labels more
like properties than types. The generated JSON for a graph with this flat
schema could be made to look something like GraphSON, though I wouldn't
expect the default representation of a vertex to contain "inE" or "outE"
because a vertex doesn't own/contain its incident edges. You could output
something that *contains* a vertex and also contains the incident edges.
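
A minimal Python sketch of that last point, assuming a flat schema with one kind of vertex and one kind of edge. The field names (`outV`, `inV`, `properties`) borrow GraphSON conventions for familiarity and the helper names are hypothetical; the sample data is illustrative.

```python
import json


def vertex_json(vertex):
    """Serialize a vertex on its own, without inE/outE."""
    return {"id": vertex["id"], "label": vertex["label"],
            "properties": vertex["properties"]}


def edge_json(edge):
    """Serialize an edge, which refers to its endpoints by id."""
    return {"id": edge["id"], "label": edge["label"],
            "outV": edge["outV"], "inV": edge["inV"],
            "properties": edge["properties"]}


def neighborhood_json(vertex, edges):
    """Wrap a vertex together with the edges that touch it."""
    incident = [e for e in edges if vertex["id"] in (e["outV"], e["inV"])]
    return {"vertex": vertex_json(vertex),
            "edges": [edge_json(e) for e in incident]}


marko = {"id": 1, "label": "person", "properties": {"name": "marko", "age": 29}}
edges = [
    {"id": 7, "label": "knows", "outV": 1, "inV": 2, "properties": {"weight": 0.5}},
    {"id": 11, "label": "created", "outV": 4, "inV": 3, "properties": {"weight": 0.4}},
]

print(json.dumps(neighborhood_json(marko, edges), indent=2))
```

The vertex object stays free of edge data; the containing object is what groups the two.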

Josh






On Wed, Jan 22, 2020 at 12:11 PM Stephen Mallette 
wrote:

> We've long had the issue of how to deal with strings better. Typically the
> concern lies with concatenation, but there are other use cases that have
> come up along the way as well. I started playing around with a format()
> step to try to capture all the odds and ends I have notes on in relation to
> this:
>
> Quickly hacked together I have something that allows:
>
> gremlin> g.V().hasLabel('person').format("%s is %s years
> old").by('name').by('age')
> ==>marko is 29 years old
> ==>vadas is 27 years old
> ==>josh is 32 years old
> ==>peter is 35 years old
>
> The engine behind the string formatting is the standard Java Formatter. I
> just wanted to see what it could look like so Formatter was an easy choice.
> Of course, Formatter might not be best - part of me would prefer a more
> non-JVM centric sort of templating, perhaps something like:
>
> g.V().hasLabel('person').format("{} is {} years old").by('name').by('age')
>
> which is fairly commonplace across languages (even used in Java in
> libraries like slf4j). That of course made me realize that it wouldn't be
> hard to overload format() to take a formatting engine as an argument so
> that it's extensible:
>
> g.V().hasLabel('person').format("{} is {} years old").by('name').by('age')
>
> g.V().hasLabel('person').format(JAVA, "%s is %s years
> old").by('name').by('age')
>
> The notion of a formatting engine argument made me think about another
> thing folks tend to want in relation to strings - clean output to JSON (not
> GraphSON exactly with all the embedded types - like think back to GraphSON
> 1 format) and other string formats:
>
> g.V().hasLabel('person').format(JSON)
>
> or perhaps it is just GraphSON??
>
> g.V().hasLabel('person').format(GraphSON_1)
>
> Providers who require special serializers could easily just override the
> FormatStep to configure the engines as necessary.
>
> I think format() helps solve a lot of the common issues with strings and
> Gremlin. Even with the basic Formatter you can do a poor man's sort of
> substring:
>
> gremlin> g.V().hasLabel('person').format("%1.1s").by('name')
> ==>m
> ==>v
> ==>j
> ==>p
>
> I'd imagine that with a more advanced engine we could get something more
> full featured if we wanted to cover even wider general function use cases.
> Not sure if things like substring should be more like first class citizens
> in Gremlin or not though. Anyway, happy to hear any thoughts on the idea of
> format() and what it might mean to Gremlin.
>


Re: [DISCUSS] Process API for TP4 [Was: structure API for TP4]

2020-01-13 Thread Joshua Shinavier
the other thread.
>
>
>
> On Mon, Jan 13, 2020 at 7:53 AM Stephen Mallette 
> wrote:
>
> > Thanks for trying out Idris. I had a feeling it would work the way that
> > you found it to but without actually trying it out there would be no way
> to
> > know for sure.
> >
> > Interesting idea to use thrift to generate process classes like steps.
> > Having some foundational code could be helpful in starting up and
> > maintaining a GLV. With Idris I'd hoped to get more than just some
> > interfaces and some core code that could supply some working logic to
> every
> > language ecosystem we supported but perhaps that was asking too much.
> >
> > I've looked at Thrift before as a possible serialization format to
> > use with Gremlin Server but given the adherence to schema that it
> required
> > I opted away from it. Given that we now look to have the notion of a
> schema
> > in TP4 I suppose Thrift, protocolbuffers and other such formats and
> > protocols are back on the table for consideration. There is a whole
> > separate discussion to be had about "Gremlin Server" and the methods by
> > which users "connect" to a graph for TP4, but perhaps I will save that
> for
> > a separate thread so as not to redirect this one too much.
> >
> > On Fri, Jan 10, 2020 at 1:43 PM Joshua Shinavier 
> [...]


Re: structure API for TP4

2020-01-13 Thread Joshua Shinavier
Hi Stephen,


On Mon, Jan 13, 2020 at 4:54 AM Stephen Mallette 
wrote:

> [...]
> Interesting idea to use thrift to generate process classes like steps.
> Having some foundational code could be helpful in starting up and
> maintaining a GLV.


Thrift IDL is currently the RPC language with the most robust support by
Dragon, the schema toolkit I am preparing to release. Protobuf and Avro
support are also pretty good, as we use generated Proto and Avro schemas in
production at Uber.



> With Idris I'd hoped to get more than just some
> interfaces and some core code that could supply some working logic to every
> language ecosystem we supported but perhaps that was asking too much.
>

W.r.t. using Idris to generate TinkerGraph implementations, I am not
entirely sure it's feasible, not to say it definitely isn't. A graph
implementation involves a lot of stateful operations I haven't given much
thought to from a representational point of view. But maybe. I am more
confident that we could come up with a schema for Gremlin steps and Gremlin
traversals, and generate code to evaluate traversals in multiple
programming languages. Maybe it isn't even necessary to generate the Idris
code; it might be enough to hand-code implementations in Idris and let
Idris code generation carry them into the target languages. What is a
little less clear to me is how we can get both:
a) working executable code, and
b) clean and extensible interface definitions
for each target language using a single approach to code generation.
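
As a rough illustration of what "a schema for steps plus code to evaluate
traversals" might amount to in one target language, here is a small Python
sketch. The step types (V, Out, Has), the Graph container and evaluate are
all hypothetical names, not TinkerPop APIs. In the scenario above, the
dataclasses would play the role of the generated interface definitions, and
the evaluator would be the part carried over from a shared implementation.

```python
# Hypothetical sketch: traversal steps as plain data, plus one evaluator.
# V, Out, Has, Graph and evaluate are illustrative names, not TinkerPop APIs.
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class V:
    """Start from every vertex in the graph."""


@dataclass(frozen=True)
class Out:
    """Follow outgoing edges with the given label."""
    label: str


@dataclass(frozen=True)
class Has:
    """Keep vertices whose property `key` equals `value`."""
    key: str
    value: object


Step = Union[V, Out, Has]


@dataclass
class Graph:
    vertices: Dict[int, Dict[str, object]]   # vertex id -> properties
    edges: List[Tuple[int, str, int]]        # (out vertex, label, in vertex)


def evaluate(graph: Graph, steps: List[Step]) -> List[int]:
    """Interpret the steps in order over a list of current vertex ids."""
    current: List[int] = []
    for step in steps:
        if isinstance(step, V):
            current = sorted(graph.vertices)
        elif isinstance(step, Out):
            current = [dst for v in current
                       for (src, label, dst) in graph.edges
                       if src == v and label == step.label]
        elif isinstance(step, Has):
            current = [v for v in current
                       if graph.vertices[v].get(step.key) == step.value]
        else:
            raise TypeError(f"unknown step: {step!r}")
    return current


graph = Graph(
    vertices={1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "josh"}},
    edges=[(1, "knows", 2), (1, "knows", 3)],
)
result = evaluate(graph, [V(), Has("name", "marko"), Out("knows")])
```

Keeping the steps as inert data and putting all behavior in evaluate is one
way to separate the two concerns listed above: the step definitions stay
clean and extensible, while the executable part lives in a single function.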



> I've looked at Thrift before as a possible serialization format to
> use with Gremlin Server but given the adherence to schema that it required
> I opted away from it. Given that we now look to have the notion of a schema
> in TP4 I suppose Thrift, protocolbuffers and other such formats and
> protocols are back on the table for consideration. There is a whole
> separate discussion to be had about "Gremlin Server" and the methods by
> which users "connect" to a graph for TP4, but perhaps I will save that for
> a separate thread so as not to redirect this one too much.
>

I definitely think there is value in exploring this. If we can write an APG
schema for traversals and results, there will be straightforward paths to
remote execution using Thrift, gRPC etc., or Avro for a start.

Josh



>
> On Fri, Jan 10, 2020 at 1:43 PM Joshua Shinavier 
> wrote:
> [...]


Re: structure API for TP4

2020-01-10 Thread Joshua Shinavier
As an illustration of what we can get out of Thrift (or other code gen
frameworks, but Thrift is one I use frequently), here is the
FieldDefinition type in Java:

public class FieldDefinition implements org.apache.thrift.TBase<FieldDefinition, FieldDefinition._Fields>,
    java.io.Serializable, Cloneable, Comparable<FieldDefinition> {

  public String name;
  public CommonMetadata meta;
  public DataType type;
  public String referenceBy;
  public int index;
  public String defaultValue;
  public boolean primaryKey;

...


and Python

class FieldDefinition(object):

    def __init__(self, name=None, meta=None, type=None,
                 referenceBy=None, index=None, defaultValue=None, primaryKey=None,):
        self.name = name
        self.meta = meta
        self.type = type
        self.referenceBy = referenceBy
        self.index = index
        self.defaultValue = defaultValue
        self.primaryKey = primaryKey

...


and Go:

type FieldDefinition struct {
  Name *FieldName `thrift:"name,1" db:"name" json:"name,omitempty"`
  Meta *CommonMetadata `thrift:"meta,2" db:"meta" json:"meta,omitempty"`
  Type *DataType `thrift:"type,3" db:"type" json:"type,omitempty"`
  ReferenceBy *FieldName `thrift:"referenceBy,4" db:"referenceBy" json:"referenceBy,omitempty"`
  Index *FieldIndex `thrift:"index,5" db:"index" json:"index,omitempty"`
  DefaultValue *string `thrift:"defaultValue,6" db:"defaultValue" json:"defaultValue,omitempty"`
  PrimaryKey *bool `thrift:"primaryKey,7" db:"primaryKey" json:"primaryKey,omitempty"`
}


etc. We can generate similar skeleton APIs for anything we can define with
a schema. This will ensure that new GLVs start out with the same basic
structural assumptions about elements, types, etc. The same approach can
also be used for classes on the process side, such as for steps --
constraining inputs and outputs. The YAML for the examples above looks like
this:

- name: FieldDefinition
  type:
    record:
      - name: name
        type: FieldName
      - name: meta
        type:
          optional: CommonMetadata
      - name: type
        type: DataType
      - name: referenceBy
        type:
          optional: FieldName
      - name: index
        type:
          optional: FieldIndex
      - name: defaultValue
        type:
          optional: string
      - name: primaryKey
        type: boolean
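
To make the generation step concrete, here is a minimal Python sketch of
going from a record definition like the one above to a skeleton class. This
is not how Thrift or Dragon does it. The schema is inlined as a plain dict
so that no YAML parser is needed, and the names FIELD_DEFINITION, describe
and render_class are made up for illustration.

```python
# Hypothetical sketch: render a Python skeleton class from a record schema.
# The dict mirrors the YAML above; render_class is an illustrative name,
# not part of Thrift or any existing tool.

FIELD_DEFINITION = {
    "name": "FieldDefinition",
    "type": {
        "record": [
            {"name": "name", "type": "FieldName"},
            {"name": "meta", "type": {"optional": "CommonMetadata"}},
            {"name": "type", "type": "DataType"},
            {"name": "referenceBy", "type": {"optional": "FieldName"}},
            {"name": "index", "type": {"optional": "FieldIndex"}},
            {"name": "defaultValue", "type": {"optional": "string"}},
            {"name": "primaryKey", "type": "boolean"},
        ]
    },
}


def describe(field_type):
    """Return a readable type name, marking optional fields."""
    if isinstance(field_type, dict) and "optional" in field_type:
        return "optional " + field_type["optional"]
    return field_type


def render_class(schema):
    """Render Python source for a skeleton class with one attribute per field."""
    fields = schema["type"]["record"]
    params = ", ".join(f"{f['name']}=None" for f in fields)
    lines = [
        f"class {schema['name']}(object):",
        "",
        f"    def __init__(self, {params}):",
    ]
    for f in fields:
        lines.append(
            f"        self.{f['name']} = {f['name']}  # {describe(f['type'])}"
        )
    return "\n".join(lines) + "\n"


source = render_class(FIELD_DEFINITION)
```

Printing source gives a class of the same shape as the Python example
earlier in this message, with the schema type of each field recorded as a
trailing comment. A renderer per target language over the same dict is the
general idea.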


Josh



On Thu, Jan 9, 2020 at 3:08 PM Joshua Shinavier  wrote:

> So, w.r.t. "empty promises", I wanted to give the Idris code generation
> idea a shot before we continue talking about it as a possibility for TP4. I
> enhanced my Haskell transformer to generate either Haskell or Idris data
> type definitions. This works well enough for product types and simple union
> types (enums). Haskell-style records with multiple constructors (including
> the example I provided to Pieter above) are not supported in Idris, and I
> haven't gotten deep enough into the Idris type system to be sure of the
> best construction for a sum of product types.
>
> However.
>
> The code I'm able to generate using idris --codegen (I have tried
> JavaScript and C) is not encouraging. I probably could have found this out
> earlier without going to the trouble of writing a transformer first.
> Whereas I was talking about using Thrift for generating *interface*
> definitions, Idris code generation does not seem to result in a friendly
> interface. It is optimized executable code that is generated, and this does
> not necessarily resemble the data type / record definitions that were used
> to generate it. For example, I can generate (starting from YAML) an Idris
> record definition like this:
>
> record FieldDefinition where
>   constructor MkFieldDefinition
>   fieldDefinitionName : FieldName
>   fieldDefinitionMeta : Maybe CommonMetadata
>   fieldDefinitionType : DataType
>   fieldDefinitionReferenceBy : Maybe FieldName
>   fieldDefinitionIndex : Maybe FieldIndex
>   fieldDefinitionDefaultValue : Maybe String
>   fieldDefinitionPrimaryKey : Bool
>
> ...and then I can use idris --codegen javascript to generate a *.js file.
> However, unless I use FieldDefinition in the main of my Idris program, it
> is omitted entirely from the *.js file. If I do use the type definition
> from main, I get some code which exactly implements whatever operation I
> performed on it in my main, but I do not get any kind of stand-alone
> FieldDefinition definition. This makes the generated code not-so-useful
> for the sake of creating a skeleton API. The situation is even worse for
> the C target, as executable code rather than C source code is generated. In
> the case of the (external) Java codegen implementation, it appears that
> bytecode, not Java source code, is generated.
>
> tl;dr Idris 

Re: structure API for TP4

2020-01-09 Thread Joshua Shinavier
otion of a schema aligning with Gremlin and the current
> methods by which Gremlin is supported by graph providers? Like, how will
> graphs that don't natively have schema, like Neo4j, work with TinkerPop's
> schema API? For graphs that have a native schema language like JanusGraph
> and DS Graph, what will they need to do to support TinkerPop's schema APIs?
> If we're true to the design goal of native language support, I'd be curious
> as to how we will achieve schema API interop in the same way that we have
> Gremlin capable of being written with python but then processed on the JVM
> when executed. How will we make it so that we can write our schema in
> python and have it apply to a JVM based graph (that might be remotely
> hosted like CosmosDB with no native schema or DS Graph with a native
> schema)?
>
> On Tue, Jan 7, 2020 at 1:00 PM Joshua Shinavier  wrote:
>
> > That might be an even better option. I don't have any experience with
> > Idris, but the syntax for data type definitions is pretty similar to
> > Haskell's. I have a mapping already written (in Haskell) that takes
> schemas
> > defined in YAML to Haskell data type definitions; I imagine I could tweak
> > it slightly to generate Idris definitions instead, and from there we
> could
> > take advantage of Idris code generation. Come to think of it, there are
> > also quite a few codegen projects in Haskell that could be used. With
> > Idris, however, it seems that code generation was a design consideration
> > for the language itself.
> >
> > Josh
> >
> >
> >
> > On Tue, Jan 7, 2020 at 4:05 AM Stephen Mallette 
> > wrote:
> >
> > > Regarding code generation...
> > >
> > > A while ago, James Thornton put me onto Idris which is sorta what sent
> me
> > > trying to learn Haskell:
> > >
> > > http://docs.idris-lang.org/en/latest/reference/codegen.html
> > >
> > > I don't really have a sense of whether or not we could use that to our
> > > advantage. Perhaps you do Josh?
> > >
> > > On Mon, Jan 6, 2020 at 1:08 PM Joshua Shinavier 
> > wrote:
> > >
> > > > Hi Pieter, Stephen,
> > > >
> > > > Pieter: Can it be specified in `formal` English rather than in
> Category
> > > > Theory?
> > > > Josh: Sure. CT is a mathematical framework that makes our definition
> of
> > > the
> > > > data model rigorous, but the data model can also be described in
> plain
> > > > English. We tried to do both in the paper, and naturally the
> reference
> > > > documentation for TinkerPop will be extended for any new APIs. You
> will
> > > be
> > > > able to get pretty far in understanding the data model just by
> looking
> > at
> > > > the code. For example, even if you don't know Haskell, you might be
> > able
> > > to
> > > > tell what is going on here:
> > > >
> > > > data DataType
> > > >   = PrimitiveType PrimitiveType
> > > >   | NamedType TypeReference
> > > >   | ProductType
> > > >       { productFields  :: [Field] }
> > > >   | SumType
> > > >       { sumCases       :: [Field] }
> > > >   | EnumType
> > > >       { enumValues     :: [Field] }
> > > >   | OptionalType
> > > >       { optionalType   :: DataType }
> > > >   | ListType
> > > >       { elementType    :: DataType }
> > > >   | SetType
> > > >       { setElementType :: DataType }
> > > >   | MapType
> > > >       { keyType        :: DataType
> > > >       , valueType      :: DataType }
> > > >
> > > >
> > > > A data type is either a primitive type:
> > > >
> > > > data PrimitiveType
> > > >   = BinaryType
> > > >   | BooleanType
> > > >   | FloatType
> > > >       { floatTypePrecision   :: BitPrecision }
> > > >   | IntegerType
> > > >       { integerTypePrecision :: BitPrecision
> > > >       , integerTypeSigned    :: Bool }
> > > >   | StringType
> > > >
> > > >
> > > > ...or it's a named ("labeled") type like "Person" or "knows", or a
> sum
> > or
> > > > product type, or one of a few other things depending on what we
> choose
> > to
> > > > support in TinkerPop. To this, we will probably add VertexType,
> > EdgeType,
> >

Re: structure API for TP4

2020-01-07 Thread Joshua Shinavier
That might be an even better option. I don't have any experience with
Idris, but the syntax for data type definitions is pretty similar to
Haskell's. I have a mapping already written (in Haskell) that takes schemas
defined in YAML to Haskell data type definitions; I imagine I could tweak
it slightly to generate Idris definitions instead, and from there we could
take advantage of Idris code generation. Come to think of it, there are
also quite a few codegen projects in Haskell that could be used. With
Idris, however, it seems that code generation was a design consideration
for the language itself.

Josh



On Tue, Jan 7, 2020 at 4:05 AM Stephen Mallette 
wrote:

> Regarding code generation...
>
> A while ago, James Thornton put me onto Idris which is sorta what sent me
> trying to learn Haskell:
>
> http://docs.idris-lang.org/en/latest/reference/codegen.html
>
> I don't really have a sense of whether or not we could use that to our
> advantage. Perhaps you do Josh?
>
> On Mon, Jan 6, 2020 at 1:08 PM Joshua Shinavier  wrote:
>
> > Hi Pieter, Stephen,
> >
> > Pieter: Can it be specified in `formal` English rather than in Category
> > Theory?
> > Josh: Sure. CT is a mathematical framework that makes our definition of
> the
> > data model rigorous, but the data model can also be described in plain
> > English. We tried to do both in the paper, and naturally the reference
> > documentation for TinkerPop will be extended for any new APIs. You will
> be
> > able to get pretty far in understanding the data model just by looking at
> > the code. For example, even if you don't know Haskell, you might be able
> to
> > tell what is going on here:
> >
> > data DataType
> >   = PrimitiveType PrimitiveType
> >   | NamedType TypeReference
> >   | ProductType
> >       { productFields  :: [Field] }
> >   | SumType
> >       { sumCases       :: [Field] }
> >   | EnumType
> >       { enumValues     :: [Field] }
> >   | OptionalType
> >       { optionalType   :: DataType }
> >   | ListType
> >       { elementType    :: DataType }
> >   | SetType
> >       { setElementType :: DataType }
> >   | MapType
> >       { keyType        :: DataType
> >       , valueType      :: DataType }
> >
> >
> > A data type is either a primitive type:
> >
> > data PrimitiveType
> >   = BinaryType
> >   | BooleanType
> >   | FloatType
> >       { floatTypePrecision   :: BitPrecision }
> >   | IntegerType
> >       { integerTypePrecision :: BitPrecision
> >       , integerTypeSigned    :: Bool }
> >   | StringType
> >
> >
> > ...or it's a named ("labeled") type like "Person" or "knows", or a sum or
> > product type, or one of a few other things depending on what we choose to
> > support in TinkerPop. To this, we will probably add VertexType, EdgeType,
> > and PropertyType. Yes, logically they are product types, but they are
> > fairly special in TinkerPop, and deserve their own constructors, like the
> > OptionalType and EnumType constructors you see above (optionals and enums
> > being special sum types). When we get down into the actual code and
> > documentation, I don't think users are going to need to worry about
> > category theory.
> >
> >
> > Pieter: "I'd prefer if the reference implementation is in fact far less
> > important than the specification itself"
> > Josh: I think the reason we have never had a real specification is that
> > neither the property graph data model nor the operational semantics of
> > Gremlin had been formalized. We're halfway there now with the formal PG
> > data model. The extent to which Gremlin can be formalized for TP4 is TBD,
> > though I would like to see things move in the direction of a monadic
> > formalism, as I say. The further we go in that direction, the easier
> > it will be to write a spec.
> >
> > W.r.t. making implementations more efficient, that's somewhat orthogonal
> to
> > what I'm trying to do, but at least in Scala (and Haskell if we decide to
> > pursue a full implementation there) I do see a lot of the nested iterator
> > messiness and other intermediate abstractions going away.
> >
> > Stephen: "I think the idea is more about the notion that the Structure
> API
> > which is a provider API is something that can go away as a concept."
> > Josh: OK, yes, I can see edge and vertex implementations going away, as
> > well, if the basic data access operations for outV, inV, 

Re: structure API for TP4

2020-01-06 Thread Joshua Shinavier
t CT is
> bringing here, but if TinkerPop becomes harder and more abstract to use as
> a result I don't think we're doing anything helpful. It seems important
> that we have some higher level language above the mathematical rigor so
> that the average user has a shot at using this stuff.
>
>
> > In TinkerPop 3 the specification was pretty much the reference
> > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > implementation is in fact far less important than the specification
> > itself. I.e. the specification must be in English and not refer to api
> > calls in the reference implementation.
> >
>
> The Structure Test Suite is the worst offender there, though there are
> aspects of the Process Test Suite that are equally bad. I'm not sure what a
> test suite will look like offhand, but I think we'll need to think harder
> about the types of test we write to take care that they are not bound too
> closely to the "TinkerGraph" way of doing things.
>
>
> > Regarding the implementation.
> >
> > Something that has always concerned me about TinkerPop's implementation
> > is that it (embedded java db's being the exception) is generally too
> > far away from the data. Massive latency and endless copying of the data
> > occurs.
>
>
> I guess Remote Graph Providers (DSG, Neptune, etc) have mitigated that by
> putting their implementations close to the data, thus executing the
> traversal on the server near the data and then just returning the result. I
> think that we need to keep that model in mind for TP4 as it was really only
> emergent in TP3 and our designs supporting that model basically were
> shoehorned in.
>
>
> > Further it has no real understanding of memory. Any step might for
> > whatever reason have a ReducingBarrierStep and load the full traversal
> > data set into the JVM's memory.
> >
>
> I'm not sure that I follow what you're looking for TP to do here. If you
> want to outline that further, perhaps start a different thread as it
> doesn't sound quite related to this thread on the Schema API.
>
>
> > Perhaps a reference implementation written in C/C++/Go/Rust... might be
> > more useful to database vendors.
> >
>
> All languages I don't know ;) Short of some major new contributions from
> someone, I'd expect us to be heading down the road of the JVM again as our
> starting point.
>
>
> > All that said, thanks for all the work you are putting into this.
>
>
> Appreciate your thoughts. Take care.
>
>
> On Sun, Jan 5, 2020 at 2:14 PM pieter martin 
> wrote:
>
> > Hi,
> >
> > Here are some thoughts/concerns that I have.
> >
> > Regarding the structure api and query specification.
> >
> > Can it be specified in `formal` English rather than in Category Theory?
> > I think having the specification in Category Theory simply makes the
> > barrier to entry too high for many of us to partake in the conversation.
> >
> > I get that having a formal mathematical spec is useful and interesting
> > but perhaps it can remain just below the surface rather than being the
> > primary source.
> >
> > In TinkerPop 3 the specification was pretty much the reference
> > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > implementation is in fact far less important than the specification
> > itself. I.e. the specification must be in English and not refer to api
> > calls in the reference implementation.
> >
> > Regarding the implementation.
> >
> > Something that has always concerned me about TinkerPop's implementation
> > is that it (embedded java db's being the exception) is generally too
> > far away from the data. Massive latency and endless copying of the data
> > occurs.
> > Further it has no real understanding of memory. Any step might for
> > whatever reason have a ReducingBarrierStep and load the full traversal
> > data set into the JVM's memory.
> > Perhaps a reference implementation written in C/C++/Go/Rust... might be
> > more useful to database vendors.
> >
> > All that said, thanks for all the work you are putting into this.
> >
> > Cheers
> > Pieter
> >
> >
> >
> >
> > On Sat, 2020-01-04 at 10:51 -0800, Joshua Shinavier wrote:
> > > Thanks for the detailed response, Stephen. Good points made. Let's
> > > dig a
> > > little deeper to get to a common understanding of a "structure API"
> > > for
> > > TP4. I agree that Graph is a relic of t

Re: structure API for TP4

2020-01-04 Thread Joshua Shinavier
alization that TinkerPop, specifically Gremlin, would be available
> natively in other language ecosystems besides the JVM came way too late in
> TP3. As a result, we have an extraordinarily mixed set of messages with
> Gremlin usage. Things work one way in Java, but another way in Python. And
> while 3.4.x unified connection options across languages, there are still too
> many ways to connect to a graph and too much discrepancy in behavior. We
> need to think about how every single feature that we create for TP4 behaves
> in each language and what parity of capability we can achieve there. And if
> some reasonable level of parity can't be achieved for whatever reason, we
> should seriously consider either not implementing the feature or the story
> for the language ecosystems that don't have the functionality better be
> crystal clear and consistent with TinkerPop as a whole. We should very much
> consider how Graph.Features (in whatever form it takes) is accessible via
> Java, Python, Javascript, etc. before going too far in any particular
> development direction.
> 2. What is the general structure for this project with respect to the
> different language environments that we have? Personally, I still like the
> idea of a single repo, but without a single build system ruling it all. In
> this way each language ecosystem can take advantage of the best parts of
> its particular build tool chain without having to shoehorn into a different
> system's approach. That said, I think each ecosystem should stick to a
> single build tool chain, e.g. Maven for the JVM.
>
> As a big picture point, I think the JVM ecosystem will be the model for all
> other language ecosystems. I would think that we would want to take care
> that we not turn TinkerPop into a Scala-only system - I assume this work
> isn't laying the foundation for that, but figured I'd voice the concern. I
> think we'd largely still rely on Java for development outside of this
> feature that has some specific demands not addressed well by it. I'd
> further assume that we would have some nice clean interop back to Java for
> this stuff so as to keep our core users well engaged.
>
> > to keep TinkerPop aligned with upcoming standards like RDF* and GQL.
> > Interoperability with mm-ADT should be straightforward
>
> Thank you for keeping up with the developing standards. That's a nice
> service to TinkerPop.
>
> Ultimately my vision for TP4 seems to have less to do with specific major
> new features (thus glad to see that you're thinking in that manner) and
> more to do with creating consistent, coherent and easy graph usage patterns
> across language ecosystems for users while making it even simpler for
> providers to build their TinkerPop-enabled systems. Having seen so much
> success with GLVs for TP3, despite their drawbacks, I can't help but sense
> that focusing on this notion as a foundational element of design for TP4
> will further expand TinkerPop's appeal and reach.
>
>
>
>
>
> On Thu, Dec 26, 2019 at 11:00 AM Joshua Shinavier 
> wrote:
>
> > Hi everyone,
> >
> > I would like to reboot the conversation around TinkerPop 4, specifically
> as
> > it concerns the structure API. You will have seen my posts, ever since my
> > presentation [1] last January, about an algebraic approach to property
> > graph schemas and transformations, which Ryan and I formalized in the APG
> > paper [2]. I am now very close to releasing the Haskell implementation of
> > this framework as open source software (to be accompanied by an Uber
> > Engineering Blog post, in the next few weeks if all goes well).
> >
> > At various times and places, I have suggested that we develop a
> Scala-based
> > structure API for TP4 which implements APG in an extensible way. I think
> it
> > is time to proceed and start committing code, or discuss alternative
> plans
> > for the structure API. There seems to be plenty of community interest,
> and
> > I now have an official OK to put some engineering hours towards it at
> work.
> > I would like to align with you -- the TP PMC and other TinkerPop
> committers
> > and developers -- on how to proceed, who will contribute, and what the
> > development timeline will look like.
> >
> > Some specifics from my side:
> >
> >- Graph.Features will carry over into TP4; it will just be a bit more
> >sophisticated than the current TP3 Graph.Features. Btw. I also
> proposed
> >this idea of a graph feature vector at the recent Dagstuhl Seminar
> [3],
> >where it caught on and will be the basis of a "dragon data model" that
> >might

structure API for TP4

2019-12-26 Thread Joshua Shinavier
Hi everyone,

I would like to reboot the conversation around TinkerPop 4, specifically as
it concerns the structure API. You will have seen my posts, ever since my
presentation [1] last January, about an algebraic approach to property
graph schemas and transformations, which Ryan and I formalized in the APG
paper [2]. I am now very close to releasing the Haskell implementation of
this framework as open source software (to be accompanied by an Uber
Engineering Blog post, in the next few weeks if all goes well).

At various times and places, I have suggested that we develop a Scala-based
structure API for TP4 which implements APG in an extensible way. I think it
is time to proceed and start committing code, or discuss alternative plans
for the structure API. There seems to be plenty of community interest, and
I now have an official OK to put some engineering hours towards it at work.
I would like to align with you -- the TP PMC and other TinkerPop committers
and developers -- on how to proceed, who will contribute, and what the
development timeline will look like.

Some specifics from my side:

   - Graph.Features will carry over into TP4; it will just be a bit more
   sophisticated than the current TP3 Graph.Features. Btw. I also proposed
   this idea of a graph feature vector at the recent Dagstuhl Seminar [3],
   where it caught on and will be the basis of a "dragon data model" that
   might help to keep TinkerPop aligned with upcoming standards like RDF* and
   GQL.
   - I feel we should use Scala for the API. This opinion is informed by my
   experiences writing tools of this kind in both Java and Haskell at Uber.
   While I am a huge fan of Haskell, practical considerations rule it out as
   an option. We need the API to be JVM-compatible. The best Haskell-JVM
   bridge is Eta [4], but IMO it is not ready to be put in the critical
   path on a project such as TinkerPop; we used it at Uber for a while and
   found it to be a time sink, despite the generated bytecode working great.
   Likewise, I would strongly advise against continuing with a pure Java-based
   API if we want to do intelligent things with graph schemas. The language is
   just not appropriate as a basis for the type system in question. Scala, on
   the other hand, has all of the advantages of Haskell in terms of type
   safety and functional pattern matching, although it requires some extra
   discipline to keep your code pure.
   - Interoperability with Ryan's CQL (categorical query language [5]) is
   of interest.
   - Interoperability with mm-ADT should be straightforward now that mm-ADT
   has support for union types. Hopefully, mm-ADT's type system will end up as
   a proper superset of TP4's.

Thoughts?

Josh


[1]
https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012
[2] https://arxiv.org/abs/1909.04881
[3] https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19491
[4] https://eta-lang.org
[5] https://www.categoricaldata.net


Re: [VOTE] TinkerPop 3.4.3 Release

2019-08-07 Thread Joshua Shinavier
Thanks. No luck yet, but did also have to change this:

COMMITTERS=$(curl -Ls https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS |
  tee ${TMP_DIR}/KEYS | grep -Po '(?<=<)[^<]*(?=@apache.org>)' | uniq)


to this:

COMMITTERS=$(curl -Ls https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS |
  tee ${TMP_DIR}/KEYS |
  perl -nle 'print $& while m{(?<=<)[^<]*(?=@apache.org>)}g' | uniq)

as grep -P is not supported in my environment.
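
For anyone else without grep -P, the extraction step can also be sketched
in Python, whose re module supports the same lookbehind and lookahead
syntax. This is only an illustration of the pattern, not a patch to
validate-distribution.sh. The sample KEYS text and the committers function
are made up, and the dot in apache.org is escaped here, which the original
pattern does not do.

```python
# Hypothetical sketch: pull committer ids out of a KEYS file without grep -P.
# The sample text is made up; only the regular expression comes from the script.
import re

SAMPLE_KEYS = """\
pub   rsa4096 2015-01-01 [SC]
uid           Alice Example <alice@apache.org>
uid           Alice Example <alice@example.com>
pub   rsa4096 2016-01-01 [SC]
uid           Bob Example <bob@apache.org>
uid           Bob Example (CODE SIGNING KEY) <bob@apache.org>
"""

# Same lookbehind/lookahead as the grep -P pattern, with the dot escaped.
PATTERN = re.compile(r"(?<=<)[^<]*(?=@apache\.org>)")


def committers(keys_text):
    """Return apache.org ids in order, dropping adjacent repeats like `uniq`."""
    ids = []
    for line in keys_text.splitlines():      # grep matches line by line
        for match in PATTERN.findall(line):
            if not ids or ids[-1] != match:  # uniq only removes adjacent repeats
                ids.append(match)
    return ids


found = committers(SAMPLE_KEYS)
```

Matching line by line mirrors what grep does, and comparing only against
the previous id mirrors uniq, which leaves non-adjacent repeats in place.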

Josh


On Wed, Aug 7, 2019 at 10:49 AM Stephen Mallette 
wrote:

> I think you just need to import our public keys. just download this:
>
> https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
>
> and then:
>
>  $ gpg --import KEYS
>
> i think that's it anyway.
>
> On Wed, Aug 7, 2019 at 1:43 PM Joshua Shinavier  wrote:
>
> > Thanks, Stephen. I tried the script from the 3.4.3 tag, and ran into the
> > following. What am I doing wrong?
> >
> >
> > $ bin/validate-distribution.sh 3.4.3
> >
> >
> > Validating binary distributions
> >
> >
> > usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
> >
> > [-e pattern] [-f file] [--binary-files=value] [--color=when]
> >
> > [--context[=num]] [--directories=action] [--label] [--line-buffered]
> >
> > [--null] [pattern] [file ...]
> >
> > (23) Failed writing body
> >
> > * downloading Apache TinkerPop Gremlin
> > (apache-tinkerpop-gremlin-console-3.4.3-bin.zip)... OK
> >
> > * validating signatures and checksums ...
> >
> >   * PGP signature ... failed
> >
> >
> >
> > On Wed, Aug 7, 2019 at 10:25 AM Stephen Mallette 
> > wrote:
> >
> > > You can use:
> > >
> > >
> > >
> >
> https://github.com/apache/tinkerpop/blob/master/bin/validate-distribution.sh
> > >
> > >
> > > it does "everything" - just make sure to run it from the branch on
> which
> > > the release was done.
> > >
> > > On Wed, Aug 7, 2019 at 1:19 PM Joshua Shinavier 
> > wrote:
> > >
> > > > Humble +1 from me. Cool logo. Daniel, what script are you using for
> > > > validating the distribution?
> > > >
> > > > Josh
> > > >
> > > > On Wed, Aug 7, 2019 at 7:35 AM Robert Dale 
> wrote:
> > > >
> > > > > Best durn lookin' docs ever!
> > > > > VOTE +1
> > > > >
> > > > > Robert Dale
> > > > >
> > > > >
> > > > > On Wed, Aug 7, 2019 at 7:39 AM Daniel Kuppitz 
> > wrote:
> > > > >
> > > > > > Distribution validation is looking good.
> > > > > >
> > > > > > Validating binary distributions
> > > > > >
> > > > > > * downloading Apache TinkerPop Gremlin
> > > > > > (apache-tinkerpop-gremlin-console-3.4.3-bin.zip)... OK
> > > > > > * validating signatures and checksums ...
> > > > > >   * PGP signature ... OK
> > > > > >   * SHA512 checksum ... OK
> > > > > > * unzipping Apache TinkerPop Gremlin ... OK
> > > > > > * validating Apache TinkerPop Gremlin's docs ... OK
> > > > > > * validating Apache TinkerPop Gremlin's binaries ... OK
> > > > > > * validating Apache TinkerPop Gremlin's legal files ...
> > > > > >   * LICENSE ... OK
> > > > > >   * NOTICE ... OK
> > > > > > * validating Apache TinkerPop Gremlin's plugin directory ... OK
> > > > > > * validating Apache TinkerPop Gremlin's lib directory ... OK
> > > > > > * testing script evaluation ... OK
> > > > > >
> > > > > > * downloading Apache TinkerPop Gremlin
> > > > > > (apache-tinkerpop-gremlin-server-3.4.3-bin.zip)... OK
> > > > > > * validating signatures and checksums ...
> > > > > >   * PGP signature ... OK
> > > > > >   * SHA512 checksum ... OK
> > > > > > * unzipping Apache TinkerPop Gremlin ... OK
> > > > > > * validating Apache TinkerPop Gremlin's docs ... OK
> > > > > > * validating Apache TinkerPop Gremlin's binaries ... OK
> > > > > > * validating Apache TinkerPop Gremlin's legal files ...
> > > > > >   * LICENSE ... OK
> > > > > >   * NOTICE ... OK
> > > > > > * validating Apache TinkerPop Gremlin'

Re: [VOTE] TinkerPop 3.4.3 Release

2019-08-07 Thread Joshua Shinavier
Thanks, Stephen. I tried the script from the 3.4.3 tag, and ran into the
following. What am I doing wrong?


$ bin/validate-distribution.sh 3.4.3


Validating binary distributions


usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]

[-e pattern] [-f file] [--binary-files=value] [--color=when]

[--context[=num]] [--directories=action] [--label] [--line-buffered]

[--null] [pattern] [file ...]

(23) Failed writing body

* downloading Apache TinkerPop Gremlin
(apache-tinkerpop-gremlin-console-3.4.3-bin.zip)... OK

* validating signatures and checksums ...

  * PGP signature ... failed



On Wed, Aug 7, 2019 at 10:25 AM Stephen Mallette 
wrote:

> You can use:
>
>
> https://github.com/apache/tinkerpop/blob/master/bin/validate-distribution.sh
>
>
> it does "everything" - just make sure to run it from the branch on which
> the release was done.
>
> On Wed, Aug 7, 2019 at 1:19 PM Joshua Shinavier  wrote:
>
> > Humble +1 from me. Cool logo. Daniel, what script are you using for
> > validating the distribution?
> >
> > Josh
> >
> > On Wed, Aug 7, 2019 at 7:35 AM Robert Dale  wrote:
> >
> > > Best durn lookin' docs ever!
> > > VOTE +1
> > >
> > > Robert Dale
> > >
> > >
> > > On Wed, Aug 7, 2019 at 7:39 AM Daniel Kuppitz  wrote:
> > >
> > > > Distribution validation is looking good.
> > > >
> > > > Validating binary distributions
> > > >
> > > > * downloading Apache TinkerPop Gremlin
> > > > (apache-tinkerpop-gremlin-console-3.4.3-bin.zip)... OK
> > > > * validating signatures and checksums ...
> > > >   * PGP signature ... OK
> > > >   * SHA512 checksum ... OK
> > > > * unzipping Apache TinkerPop Gremlin ... OK
> > > > * validating Apache TinkerPop Gremlin's docs ... OK
> > > > * validating Apache TinkerPop Gremlin's binaries ... OK
> > > > * validating Apache TinkerPop Gremlin's legal files ...
> > > >   * LICENSE ... OK
> > > >   * NOTICE ... OK
> > > > * validating Apache TinkerPop Gremlin's plugin directory ... OK
> > > > * validating Apache TinkerPop Gremlin's lib directory ... OK
> > > > * testing script evaluation ... OK
> > > >
> > > > * downloading Apache TinkerPop Gremlin
> > > > (apache-tinkerpop-gremlin-server-3.4.3-bin.zip)... OK
> > > > * validating signatures and checksums ...
> > > >   * PGP signature ... OK
> > > >   * SHA512 checksum ... OK
> > > > * unzipping Apache TinkerPop Gremlin ... OK
> > > > * validating Apache TinkerPop Gremlin's docs ... OK
> > > > * validating Apache TinkerPop Gremlin's binaries ... OK
> > > > * validating Apache TinkerPop Gremlin's legal files ...
> > > >   * LICENSE ... OK
> > > >   * NOTICE ... OK
> > > > * validating Apache TinkerPop Gremlin's plugin directory ... OK
> > > > * validating Apache TinkerPop Gremlin's lib directory ... OK
> > > >
> > > > Validating source distribution
> > > >
> > > > * downloading Apache TinkerPop 3.4.3 (apache-tinkerpop-3.4.3-src.zip)... OK
> > > > * validating signatures and checksums ...
> > > >   * PGP signature ... OK
> > > >   * SHA512 checksum ... OK
> > > > * unzipping Apache TinkerPop 3.4.3 ... OK
> > > > * checking source files ... OK
> > > > * building project ... OK
> > > >
> > > >
> > > > Skimmed over the docs, with focus on common problem areas, no issues found.
> > > >
> > > > VOTE +1
> > > >
> > > > Cheers,
> > > > Daniel
> > > >
> > > >
> > > >
> > > > On Tue, Aug 6, 2019 at 4:11 AM Stephen Mallette <
> spmalle...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We are happy to announce that TinkerPop 3.4.3 is ready for release.
> > > > >
> > > > > The release artifacts can be found at this location:
> > > > > https://dist.apache.org/repos/dist/dev/tinkerpop/3.4.3/
> > > > >
> > > > > The source distribution is provided by:
> > > > > apache-tinkerpop-3.4.3-src.zip
> > > > >
> > > > > Two binary distributions are provided for user convenience:
> > > > >

Re: [VOTE] TinkerPop 3.4.3 Release

2019-08-07 Thread Joshua Shinavier
Humble +1 from me. Cool logo. Daniel, what script are you using for
validating the distribution?

Josh

On Wed, Aug 7, 2019 at 7:35 AM Robert Dale  wrote:

> Best durn lookin' docs ever!
> VOTE +1
>
> Robert Dale
>
>
> On Wed, Aug 7, 2019 at 7:39 AM Daniel Kuppitz  wrote:
>
> > Distribution validation is looking good.
> >
> > Validating binary distributions
> >
> > * downloading Apache TinkerPop Gremlin
> > (apache-tinkerpop-gremlin-console-3.4.3-bin.zip)... OK
> > * validating signatures and checksums ...
> >   * PGP signature ... OK
> >   * SHA512 checksum ... OK
> > * unzipping Apache TinkerPop Gremlin ... OK
> > * validating Apache TinkerPop Gremlin's docs ... OK
> > * validating Apache TinkerPop Gremlin's binaries ... OK
> > * validating Apache TinkerPop Gremlin's legal files ...
> >   * LICENSE ... OK
> >   * NOTICE ... OK
> > * validating Apache TinkerPop Gremlin's plugin directory ... OK
> > * validating Apache TinkerPop Gremlin's lib directory ... OK
> > * testing script evaluation ... OK
> >
> > * downloading Apache TinkerPop Gremlin
> > (apache-tinkerpop-gremlin-server-3.4.3-bin.zip)... OK
> > * validating signatures and checksums ...
> >   * PGP signature ... OK
> >   * SHA512 checksum ... OK
> > * unzipping Apache TinkerPop Gremlin ... OK
> > * validating Apache TinkerPop Gremlin's docs ... OK
> > * validating Apache TinkerPop Gremlin's binaries ... OK
> > * validating Apache TinkerPop Gremlin's legal files ...
> >   * LICENSE ... OK
> >   * NOTICE ... OK
> > * validating Apache TinkerPop Gremlin's plugin directory ... OK
> > * validating Apache TinkerPop Gremlin's lib directory ... OK
> >
> > Validating source distribution
> >
> > * downloading Apache TinkerPop 3.4.3 (apache-tinkerpop-3.4.3-src.zip)... OK
> > * validating signatures and checksums ...
> >   * PGP signature ... OK
> >   * SHA512 checksum ... OK
> > * unzipping Apache TinkerPop 3.4.3 ... OK
> > * checking source files ... OK
> > * building project ... OK
> >
> >
> > Skimmed over the docs, with focus on common problem areas, no issues found.
> >
> > VOTE +1
> >
> > Cheers,
> > Daniel
> >
> >
> >
> > On Tue, Aug 6, 2019 at 4:11 AM Stephen Mallette 
> > wrote:
> >
> > > Hello,
> > >
> > > We are happy to announce that TinkerPop 3.4.3 is ready for release.
> > >
> > > The release artifacts can be found at this location:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/3.4.3/
> > >
> > > The source distribution is provided by:
> > > apache-tinkerpop-3.4.3-src.zip
> > >
> > > Two binary distributions are provided for user convenience:
> > > apache-tinkerpop-gremlin-console-3.4.3-bin.zip
> > > apache-tinkerpop-gremlin-server-3.4.3-bin.zip
> > >
> > > The GPG key used to sign the release artifacts is available at:
> > > https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > >
> > > The online docs can be found here:
> > > http://tinkerpop.apache.org/docs/3.4.3/ (user docs)
> > > http://tinkerpop.apache.org/docs/3.4.3/upgrade/ (upgrade docs)
> > > http://tinkerpop.apache.org/javadocs/3.4.3/core/ (core javadoc)
> > > http://tinkerpop.apache.org/javadocs/3.4.3/full/ (full javadoc)
> > > http://tinkerpop.apache.org/dotnetdocs/3.4.3/ (.NET API docs)
> > > http://tinkerpop.apache.org/jsdocs/3.4.3/ (Javascript API docs)
> > >
> > > The tag in Apache Git can be found here:
> > > https://github.com/apache/tinkerpop/tree/3.4.3
> > >
> > > The release notes are available here:
> > > https://github.com/apache/tinkerpop/blob/3.4.3/CHANGELOG.asciidoc
> > >
> > > The [VOTE] will be open for the next 72 hours --- closing Friday (August 9,
> > > 2019) at 7am EST.
> > >
> > > My vote is +1.
> > >
> >
>


Re: [DISCUSS] Deprecate support for Multi/Metaproperties in Neo4j

2019-07-23 Thread Joshua Shinavier
Ah, got it. Even less of an objection to dropping these features, in that
case. If Neo4j natively supported multi- and meta-properties, that would be
different.

On Tue, Jul 23, 2019 at 11:30 AM Stephen Mallette 
wrote:

> Thanks Josh...Just to be clear, this issue isn't about removal of
> meta/multiproperties in the TinkerPop structure. It's only removing them as
> a configuration option for neo4j where they are sort of shoehorned in (as
> neo4j doesn't support them natively). I don't think this is a precursor to
> completely removing them in TinkerPop structure APIs in 3.x either right
> now.
>
> On Tue, Jul 23, 2019 at 2:27 PM Joshua Shinavier 
> wrote:
>
> > Hi Stephen,
> >
> > I do not see meta-properties much used in practice, either. No objection to
> > removing them for TP 3.5, but I would like to make them reappear in a
> > different form in TP 4.x, along with meta-edges (edges to or from
> > properties or other edges). Exotic as these features might seem, I think
> > the extra degree of freedom is valuable for defining mappings, e.g. to RDF
> > with named graphs, or to hypergraph data models. Also, the API complexity
> > is not too bad if we just promote properties to elements, along with
> > vertices and edges. Then a property is an element with an associated data
> > value. An edge is a binary relationship between two elements. Ordinary
> > edges are those in which out- and in- elements are both vertices, etc. --
> > again invoking the element taxonomy
> > <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>
> > from my Graph Day talk (and a paper which we should be able to post very
> > soon now).
> >
> > Josh
> >
> >
> > On Tue, Jul 23, 2019 at 10:43 AM Stephen Mallette 
> > wrote:
> >
> > > Here's the issue for this item:
> > >
> > > https://issues.apache.org/jira/browse/TINKERPOP-2270
> > >
> > > Once that's done a new issue will get created for removal.
> > >
> > > On Tue, Jul 16, 2019 at 5:57 PM Stephen Mallette
> > > wrote:
> > >
> > > > A long time back we built out multi and meta properties into neo4j by
> > > > encoding their values into the graph itself. While that was a neat
> > > > experiment, I don't think the concept really took hold and it introduced a
> > > > layer of code that we probably don't need to keep, especially since I don't
> > > > think we ever even bothered to remove the "experimental" label we've
> > > > associated with it since its inception.
> > > >
> > > > Seems like we can drop "experimental" things from our code at this point
> > > > of TinkerPop's life. Tests will run faster without the extra test mode. I
> > > > also doubt that this feature has really caught on for anyone. Any
> > > > objections to deprecating in 3.3.x/3.4.x and then removing this feature for
> > > > 3.5.0?
> > > >
> > >
> >
>


Re: [DISCUSS] Deprecate support for Multi/Metaproperties in Neo4j

2019-07-23 Thread Joshua Shinavier
Hi Stephen,

I do not see meta-properties much used in practice, either. No objection to
removing them for TP 3.5, but I would like to make them reappear in a
different form in TP 4.x, along with meta-edges (edges to or from
properties or other edges). Exotic as these features might seem, I think
the extra degree of freedom is valuable for defining mappings, e.g. to RDF
with named graphs, or to hypergraph data models. Also, the API complexity
is not too bad if we just promote properties to elements, along with
vertices and edges. Then a property is an element with an associated data
value. An edge is a binary relationship between two elements. Ordinary
edges are those in which out- and in- elements are both vertices, etc. --
again invoking the element taxonomy
<https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>
from my Graph Day talk (and a paper which we should be able to post very
soon now).

Josh


On Tue, Jul 23, 2019 at 10:43 AM Stephen Mallette 
wrote:

> Here's the issue for this item:
>
> https://issues.apache.org/jira/browse/TINKERPOP-2270
>
> Once that's done a new issue will get created for removal.
>
> On Tue, Jul 16, 2019 at 5:57 PM Stephen Mallette 
> wrote:
>
> > A long time back we built out multi and meta properties into neo4j by
> > encoding their values into the graph itself. While that was a neat
> > experiment, I don't think the concept really took hold and it introduced a
> > layer of code that we probably don't need to keep, especially since I don't
> > think we ever even bothered to remove the "experimental" label we've
> > associated with it since its inception.
> >
> > Seems like we can drop "experimental" things from our code at this point
> > of TinkerPop's life. Tests will run faster without the extra test mode. I
> > also doubt that this feature has really caught on for anyone. Any
> > objections to deprecating in 3.3.x/3.4.x and then removing this feature for
> > 3.5.0?
> >
>


[jira] [Comment Edited] (TINKERPOP-2267) a very easily solved documentation problem

2019-07-14 Thread Joshua Shinavier (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884731#comment-16884731
 ] 

Joshua Shinavier edited comment on TINKERPOP-2267 at 7/14/19 5:58 PM:
--

Hi Jeff. Indeed, the most current documentation can always be found 
[here|http://tinkerpop.apache.org/docs/current/reference/]. This has:

{code:bash}
bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin 3.4.2
{code}

Please close the ticket if this resolves the issue.


was (Author: joshsh):
Hi Jeff. Indeed, the most current documentation can always be found 
[here|[http://tinkerpop.apache.org/docs/current/reference/]]. This has:

```

bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin 3.4.2

```

Please close the ticket if this resolves the issue.

> a very easily solved documentation problem
> --
>
> Key: TINKERPOP-2267
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2267
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: documentation
> Environment: Ubuntu, from within Docker
>Reporter: Jeffrey B Brown
>Priority: Minor
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> This page[1] of the documentation indicates that in order to install the 
> neo4j plugin for gremlin server, one should run `bin/gremlin-server.sh -i 
> org.apache.tinkerpop neo4j-gremlin 3.1.0-incubating`. I was unable to run 
> that until I substituted `install` for `-i`.
> [1] https://tinkerpop.apache.org/docs/3.1.0-incubating/#neo4j-gremlin



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (TINKERPOP-2267) a very easily solved documentation problem

2019-07-14 Thread Joshua Shinavier (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884731#comment-16884731
 ] 

Joshua Shinavier commented on TINKERPOP-2267:
-

Hi Jeff. Indeed, the most current documentation can always be found 
[here|[http://tinkerpop.apache.org/docs/current/reference/]]. This has:

```

bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin 3.4.2

```

Please close the ticket if this resolves the issue.

> a very easily solved documentation problem
> --
>
> Key: TINKERPOP-2267
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2267
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: documentation
> Environment: Ubuntu, from within Docker
>Reporter: Jeffrey B Brown
>Priority: Minor
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> This page[1] of the documentation indicates that in order to install the 
> neo4j plugin for gremlin server, one should run `bin/gremlin-server.sh -i 
> org.apache.tinkerpop neo4j-gremlin 3.1.0-incubating`. I was unable to run 
> that until I substituted `install` for `-i`.
> [1] https://tinkerpop.apache.org/docs/3.1.0-incubating/#neo4j-gremlin



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: A Type System Core to mm-ADT

2019-05-26 Thread Joshua Shinavier
OK. I see the "referent" concept is broader than I had thought. They are
not just pointers, but (paraphrasing) expressions awaiting evaluation. The
"referent pattern" is more or less the type of the expression, i.e. the
type of whatever the expression evaluates to. For example, in Haskell
notation:

  sum [1,2,3] :: Int

Here, "sum [1,2,3]" is the reference. The referent is something which has
yet to be determined (the number 6). We know that the referent's type is
Int, and we can type-check the expression to verify that it will produce
an Int. Another example:

  fmap (\n -> "number " ++ show n) $ filter (> 1) [1,2,3] :: [String]

Here, "fmap ... [1,2,3]" is the reference, and the referent is a list of
strings: ["number 2","number 3"].

Instruction patterns seem like additional "referents" to me, with the
difference that they are applied to objects, and that they are composed of
concrete instructions. If a referent is nullary (has some type "a"), an
instruction pattern seems unary (has some type "a->b", consuming an "a" and
producing a "b"). But I need to grok more.

Josh






On Sun, May 26, 2019 at 6:51 PM Marko Rodriguez 
wrote:

> Hello,
>
> >> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
> >>
> >> We have lost information about the “schema.” This is not good as
> >> compile-time write validation is not possible.
> >>
> >
> > So far, I am thinking: yeah, dealing with schema changes can be tricky.
>
> This is not a “schema change” but a greater specification of what the
> referents are. Again, a reference is defined by its referent pattern (and
> instruction patterns). The referent pattern is a description of the current
> instances (referents) while the “schema” is a description of what is legal
> for all instances (referents). Without “schema,” I lose compile-time
> validation.
>
> > I then create a people-key on the db map that maintains a person-reference.
> >>
> >
> > OK. I think by people-key you mean the primary key for the person type,
> > i.e. the vertex id. Correct me if I am wrong.
>
> No. db.get(‘people’) is the "people table.” RDBMSs are modeled as a map
> with the keys being the table names and the values being *{:} references to
> maps (i.e. rows).
>
> > I see this as a type plus a constraint. And... you won't be surprised to
> > hear me say this... you express it with a select statement:
> >
> >youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)
>
> Well, type definitions like this won’t happen at runtime. The VM will just
> be able to tell you if the range has been restricted. It won’t create new
> types. But yea, referent patterns are (now) a type plus a constraint.
> Before, they were just constraints and that is why I lost schema
> information at runtime.
>
> Take care,
> Marko.
>
> http://rredux.com
>
>


Re: A Type System Core to mm-ADT

2019-05-26 Thread Joshua Shinavier
Hi Marko,

Ryan Wisnesky, Jan Hidders, and I have been jamming on the Algebraic
Property Graphs paper, which formalizes that type system. It's at 17 pages;
I hope to have it in a sharable state tomorrow after I add an intro
section. Looking forward to your feedback. We have not delved into the TP4
virtual machine specifically, as it is in flux until the spec comes out,
but Ryan has laid out a strategy for bridging the gap between the type
system and programming environments like Java, Python, CQL. I believe the
same approach can be used for mm-ADT.

Responses inline.



On Sun, May 26, 2019 at 4:34 PM Marko Rodriguez 
wrote:

> Hi,
>
> *** This email is primarily for Josh (and Kuppitz).
>
> I think we need Josh's type system core to mm-ADT. I’m facing the problem
> of having to model both a “range” (legal schema) and a “codomain” (current
> schema) of the referents of a reference. Let me explain with an example.
>
> Suppose that there is an SQL table called ‘people’ and the table is empty.
> When mm-ADT serves up this table, it looks like this in mm-ADT:
>
>
> [db][get,’people’] // *{name:@string, age:@int}
>
> This says that ‘people’ is a pointer to maps containing a name-key with a
> string value and an age-key with an integer value.
>
> Now let's say I insert some rows into this table. Now, according to the
> mm-ADT spec, every reference must have as much information as possible
> about the referents. Thus, the people-reference pattern can change. Let's
> assume it does and it now is:
>
> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
>
> We have lost information about the “schema.” This is not good as
> compile-time write validation is not possible.
>

So far, I am thinking: yeah, dealing with schema changes can be tricky.


> Thus, I want to make a distinction between “range” and “codomain”. Here is
> some bytecode:
>
> [db][define,’person’,{name:@string,age:@int}]
> [db][create,’people’,*person]
>
> I define a type called person, where all such instances must match the
> respective map-pattern.

> I then create a people-key on the db map that maintains a person-reference.
>

OK. I think by people-key you mean the primary key for the person type,
i.e. the vertex id. Correct me if I am wrong.


> Now:
>
> [db][add,’people’,{name:marko,age:29}]
> [db][add,’people’,{name:josh,age:32}]
> ...
> [db][get,’people’]    // *person{name:@string, age:!gt(20)&!lt(33)}
> [db][type,’person’]   // {name:@string, age:@int}
>
> Thus, when I get the reference at people, I see the “codomain” of current
> person referents, but when I get the person-type, I get the “range” of
> legal person referents.
>

I see this as a type plus a constraint. And... you won't be surprised to
hear me say this... you express it with a select statement:

youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)



> In this way, “types” become central to mm-ADT, where schema is crucial in
> specifying a referent range.
>

As you know, I would like to see types become central. With types come the
possibility of type inference, as well as easier integration with the type
systems of other frameworks. Anything which conforms to the simply typed
lambda calculus, including extensions to product and coproduct types, will
be in the native language of the TP4 VM.



> —I have more to say on the necessity of multi-types (union of types) and
> their role in pattern definitions.
>

I think you agree that with both products and coproducts in the type
system, we can support functional pattern matching in the style of Haskell
or Scala, which is a powerful feature that we just couldn't take advantage
of in TP1..3, for lack of a strong type system.

Josh



>
> Thoughts?,
> Marko.
>
> http://rredux.com 
>
>
>
>
>


Re: N-Tuple Transactions?

2019-05-15 Thread Joshua Shinavier
Tough question, since I have not used Akka or the actor model, but here are
some first thoughts. From what I am reading, the trick would be to
implement the transaction log as a CRDT
<https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type>.
Operation-based CRDTs -- which propagate individual mutations as opposed to
local state -- appear to be preferable if mutations are commutative. So are
they commutative? In the "imperative" scenario I described to Stephen, no.
In the "functional" scenario, yes, they have to be. Suppose you insert a
vertex and also delete that vertex. The eventually consistent result of the
transaction must be a no-op; if the vertex already exists, leave it alone.
If it does not exist, do not create it. However, it does not matter in what
order you perform the insert and delete -- once all operations are
accounted for, you arrive at the correct state.
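To make the commutativity point concrete, here is a minimal Java sketch of an operation-based log in which insert and delete commute. The class name CommutativeLog and the "net count per vertex id" rule are assumptions made for illustration only; they are not TinkerPop or Akka APIs. A net count of zero leaves the snapshot untouched, which is the no-op behaviour described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a transaction log whose operations commute.
final class CommutativeLog {
    // vertex id -> (number of inserts - number of deletes) recorded so far
    private final Map<String, Integer> net = new HashMap<>();

    void insert(String vertexId) { net.merge(vertexId, 1, Integer::sum); }

    void delete(String vertexId) { net.merge(vertexId, -1, Integer::sum); }

    // Apply the log to a snapshot of vertex ids. Only the net count matters,
    // so the order in which operations arrived is irrelevant.
    Set<String> applyTo(Set<String> snapshot) {
        Set<String> result = new HashSet<>(snapshot);
        for (Map.Entry<String, Integer> entry : net.entrySet()) {
            if (entry.getValue() > 0) {
                result.add(entry.getKey());      // net insert: vertex must exist
            } else if (entry.getValue() < 0) {
                result.remove(entry.getKey());   // net delete: vertex must not exist
            }
            // net zero: leave the snapshot as it was
        }
        return result;
    }
}
```

With this rule, insert-then-delete and delete-then-insert of the same vertex both leave an existing vertex in place and both leave a missing vertex missing.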

Just from what I glean from Wikipedia, there appear to be a handful of
well-known strategies for operation-based and state-based CRDTs. I do not
know how hard it would be to support multiple strategies in the same VM,
but in the Akka world, that seems to be the way in which you would choose
your operational semantics.

Josh




On Wed, May 15, 2019 at 8:00 AM Marko Rodriguez 
wrote:

> Wow. I totally understood what you wrote.
>
> Question: What is the TransactionLog in a distributed environment?
> e.g. Akka-driven traversers spawned from the same
> query migrating around the cluster mutating stuff.
>
> Thanks for the lesson,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On May 15, 2019, at 8:58 AM, Joshua Shinavier  wrote:
> >
> > Hi Stephen,
> >
> > More the latter. TinkerPop transactions would be layered on top of the
> > native transactions of the database (if any), which gives the VM more
> > control over the operational semantics of a computation in between database
> > commits. For example, in many scenarios it would be desirable not to mutate
> > the graph at all until a traversal has completed, so that the result does
> > not depend on the order of evaluation. Consider a traversal which adds or
> > deletes elements as it goes. In some cases, you want writes and reads to
> > build on each other, so that what you wrote in one step is accessible for
> > reading in the next step. This is a very imperative style of traversal for
> > which you need to understand how the VM builds a query plan in order to
> > predict the result. In many other cases, you might prefer a more functional
> > approach, for which you can forget about the query plan. Without VM-level
> > transactions, you don't have this choice; you are at the mercy of the
> > underlying database. The extra level of control will be useful for
> > concurrency and parallelism, as well -- without it, the same programs may
> > have different results when executed on different databases.
> >
> > Josh
> >
> >
> >
> >
> > On Wed, May 15, 2019 at 6:47 AM Stephen Mallette 
> > wrote:
> >
> >> Hi Josh, interesting... we have graphs with everything from no transactions
> >> like TinkerGraph to more acid transactional systems and everything in
> >> between - will transaction support as you describe it cover all the
> >> different transactional semantics of the underlying graphs which we might
> >> encounter? or is this an approach that helps unify those different
> >> transactional semantics under TinkerPop's definition of a transaction?
> >>
> >> On Wed, May 15, 2019 at 9:23 AM Joshua Shinavier 
> >> wrote:
> >> [...]
>
>


Re: N-Tuple Transactions?

2019-05-15 Thread Joshua Shinavier
Hi Stephen,

More the latter. TinkerPop transactions would be layered on top of the
native transactions of the database (if any), which gives the VM more
control over the operational semantics of a computation in between database
commits. For example, in many scenarios it would be desirable not to mutate
the graph at all until a traversal has completed, so that the result does
not depend on the order of evaluation. Consider a traversal which adds or
deletes elements as it goes. In some cases, you want writes and reads to
build on each other, so that what you wrote in one step is accessible for
reading in the next step. This is a very imperative style of traversal for
which you need to understand how the VM builds a query plan in order to
predict the result. In many other cases, you might prefer a more functional
approach, for which you can forget about the query plan. Without VM-level
transactions, you don't have this choice; you are at the mercy of the
underlying database. The extra level of control will be useful for
concurrency and parallelism, as well -- without it, the same programs may
have different results when executed on different databases.

Josh




On Wed, May 15, 2019 at 6:47 AM Stephen Mallette 
wrote:

> Hi Josh, interesting... we have graphs with everything from no transactions
> like TinkerGraph to more acid transactional systems and everything in
> between - will transaction support as you describe it cover all the
> different transactional semantics of the underlying graphs which we might
> encounter? or is this an approach that helps unify those different
> transactional semantics under TinkerPop's definition of a transaction?
>
> On Wed, May 15, 2019 at 9:23 AM Joshua Shinavier 
> wrote:
> [...]


Re: N-Tuple Transactions?

2019-05-15 Thread Joshua Shinavier
Hi Marko,

Get ready for monads. I mentioned them in my post on algebraic property
graphs. In functional programming, monads are a typical way of composing
chains of stateful operations together in such a way that they do not
violate functional purity. For example, an operation which adds a
vertex to a graph can be thought of as a function f : Graph -> Graph that
takes a graph as its input, adds a vertex, and returns the resulting graph
as its output. The function f doesn't actually mutate the graph on disk,
but it gives you an in-memory representation of the mutated graph, which
can then be persisted to disk. Some things you need in order to make this
work:

1) a snapshot of the state of the graph / database as it existed when the
transaction was started
2) a transaction log, within the TinkerPop VM, containing all atomic
changes that were made to the graph since the transaction was started
3) a view of the graph overlaid with the contents of the transaction log
4) the ability to persist the transaction log to the database

Items (1) and (4) are pretty trivial if the underlying database itself
supports transactions. Item (2) is easy if we use a basic state monad. More
on that below. Item (3) requires some insight into how graphs and other
data structures are represented in TinkerPop4, and this is where the
interaction between the basic data model and the VM comes in. In terms of
what I called the APG data model, there are three basic changes of state:

1) add an element of a given type. E.g. the edge with label knows and id 42
didn't exist before, and now it does.
2) remove an element of a given type. E.g. the edge with label knows and id
42 existed before, but now it doesn't.
3) mutate an existing element of a given type. E.g. the element with label
knows and id 42 used to have Person vertex 1 as its out-element and Person
vertex 4 as its in-element, but now it has Person vertex 6 as its
in-element.

In other words, we support *create*, *update*, and *delete* operations for
typed elements. *Read* operations do not require appending to the
transaction log. Now, given that we have mutated the graph in our
transaction, but the graph on disk has not changed, how do we deliver a
consistent view of the mutated graph to subsequent read operations in the
same transaction? If we think of the graph as a set of relations (tables,
indexes), then we just need to wrap each read operation, from each table,
in such a way that the read operation respects the transaction log.

For example, if we have a relation like V() that represents all vertices in
the graph, and we have added a vertex, then the iterator for V() should be
the raw V() iterator for the unmodified graph -- filtered to exclude all
*delete* elements in the transaction log which are vertices -- concatenated
with a filtered iterator over all *create* elements which are vertices.
Once you have committed your transaction, the transaction log is empty, so
these wrapped iterators provide exactly the same elements as the raw
iterators.
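As a rough illustration of the wrapped V() iterator, here is a minimal Java sketch using streams. The names Elem and VertexOverlay, and the representation of the log as two plain lists, are hypothetical and chosen only for this example; nothing here is an existing TinkerPop API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical element: a kind ("vertex" or "edge") plus an id.
final class Elem {
    final String kind;
    final String id;

    Elem(String kind, String id) {
        this.kind = kind;
        this.id = id;
    }

    boolean isVertex() { return "vertex".equals(kind); }
}

final class VertexOverlay {
    // rawVertices: V() over the unmodified graph.
    // created / deleted: the *create* and *delete* entries of the transaction log.
    static Stream<Elem> vertices(List<Elem> rawVertices, List<Elem> created, List<Elem> deleted) {
        // Ids of all vertices the log marks as deleted
        Set<String> deletedVertexIds = deleted.stream()
                .filter(Elem::isVertex)
                .map(e -> e.id)
                .collect(Collectors.toSet());
        // Raw iterator, filtered to exclude deleted vertices
        Stream<Elem> surviving = rawVertices.stream()
                .filter(v -> !deletedVertexIds.contains(v.id));
        // Created elements, filtered down to vertices
        Stream<Elem> added = created.stream().filter(Elem::isVertex);
        return Stream.concat(surviving, added);
    }
}
```

When both log lists are empty, the result is exactly the raw vertex stream, matching the behaviour expected after a commit.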

How do you build a transaction log within a traversal? With a state monad.
A state monad will allow you to execute any basic VM instruction and carry
the transaction log along with the computation. Most instructions are a
no-op with respect to state, but those few instructions which do affect
state must append to the transaction log. For example, a V() operation
doesn't just give you an iterator over vertices; it gives you a pair of
objects: the iterator over vertices, and also the transaction log. A
create-vertex operation also gives you a pair of objects: the newly-created
vertex, and also the state, in which we have appended a *create* element to
the transaction log.
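The "pair of objects" idea can be sketched in a few lines of Java. This is only an illustration under assumptions: the names Logged and Instructions are made up, the log is a plain list of strings, and the structure is closer to what is usually called a writer monad (a value plus an accumulated log) than to a full state monad. The flatMap method plays the role of monadic bind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical: a value paired with the log entries accumulated while computing it.
final class Logged<A> {
    final A value;
    final List<String> log;

    Logged(A value, List<String> log) {
        this.value = value;
        this.log = Collections.unmodifiableList(new ArrayList<>(log));
    }

    // Wrap a value without touching the log (what a read-only instruction does).
    static <A> Logged<A> pure(A value) {
        return new Logged<>(value, Collections.emptyList());
    }

    // Bind: run the next instruction on our value and append its log entries to ours.
    <B> Logged<B> flatMap(Function<A, Logged<B>> next) {
        Logged<B> result = next.apply(value);
        List<String> combined = new ArrayList<>(log);
        combined.addAll(result.log);
        return new Logged<>(result.value, combined);
    }
}

final class Instructions {
    // A create-vertex instruction returns the new vertex id plus one *create* entry.
    static Logged<String> createVertex(String id) {
        return new Logged<>(id, Collections.singletonList("create vertex " + id));
    }

    // A read-style instruction is a no-op with respect to the log.
    static Logged<Integer> idLength(String id) {
        return Logged.pure(id.length());
    }
}
```

Chaining createVertex("v1"), then createVertex("v22"), then idLength with flatMap yields the final value together with both create entries in order, while the read step adds nothing to the log.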

In terms of language support, Java 8+ supports some things that happen to be
monads, where the flatMap method is equivalent to the monadic bind
operator. Of course, you can also implement your own monads in Java. Scala
does not have a built-in monad concept or syntactic sugar either, although
it does have better support for higher-kinded types in general. In any
case, we only really need to implement one monad for the sake of
transactions: call it State or Transaction. In plain old Java, this would
look something like the following (ignoring applicative functors, which are
awkward in Java):

public class TransactionLog {
    private Iterable created;
    private Iterable updated;
    private Iterable deleted;

    public TransactionLog append(TransactionLog other) {
        ...
    }
}

public class State<A> {
    private TransactionLog log;
    private A object;

    public State(A object, TransactionLog log) {
        this.object = object;
        this.log = log;
    }

    public TransactionLog getLog() {
 

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-12 Thread Joshua Shinavier
Hi Marko,

Just a quick reply for now, but this sounds like a good game plan, and
there is certainly a ton of research on query optimization, adaptive query
processing, etc. that could help to guide the development, and more
opportunities for static analysis (i.e. a type system) will help.

Btw. my initial implementation of MatchStep had some stats-driven
optimization built in. It was a little "off the top of my head", and also
not OLAP-friendly, but it would be worthwhile to revisit this in a bigger
and more formal way in TP4.

Josh


On Sun, May 12, 2019 at 10:37 AM Marko Rodriguez 
wrote:

> Hi,
>
> Thank you for your reply Josh. I think we are ultimately converging on:
>
> TinkerPop4 as an open source, programmable query planner.
>
> We are going in this direction because of our three requirements:
> 1. Easy for language designers to have their language work with
> various data processing/storage systems.
> 2. Easy for processing engines to execute language expressions
> over data.
> 3. Easy for data storage systems to expose their data for
> computation.
> All the “easy”-talk implies “hard”-work for us. :)
>
> Thus far, with TinkerPop as a graph processing framework, query planning has
> been relatively straightforward as we have relied on
> (1) users knowing the shape of their data and writing queries that
> are ‘smart’.
> (2) compiler strategies rewriting common expressions into cheaper
> expressions.
> (3) our match()-step for dynamically resorting patterns based on
> runtime cost analyses.
> However, I think moving forward (to capture more complex data scenarios)
> we will need data storage system providers to expose statistics about their
> data. What does this entail? I believe it entails allowing data storage
> systems to expose:
> (1) data paths (i.e. supported references through the data)
> (2) data statistics (i.e. the time and space costs associated with
> particular data paths)
>
> For SQL query planning, it will take a few years for our framework to
> become top-notch. However, for Gremlin/Cypher, RQL/SPARQL,
> MongoQuery/XPath, CQL, etc. I believe we can pull it off with the resources
> we have on our first release as these “NoSQL”-systems tend to have simple
> 'data paths’ with, arguably, graph and RDF being the most difficult to
> reason on.
>
> ———
>
> What does the TP4 VM need to know and how will the various system
> components (language, processor, structure) provide that information?
>
> I believe we have been talking about this the whole time except now I am
> introducing costs.
> * What are the types of tuples and how do they relate?
> * How much does it cost to move through these tuple-relations?
>
> pg.graph
>   [data access instructions]
>   V()
>   V(id)
>   V(key,value)
>   [data costs instructions]
>   cost(V())
>   cost(V(id))
> pg.graph.vertex
>   [data access instructions]
>   out()
>   out(string)
>   [data costs instructions]
>   cost(out())
>   cost(out(string))
>   …
>
> In other words, for every type of tuple, we need to know what instructions
> it supports and we need to know the time/space costs of said
> instructions.
>
> TP4 VM uses the “data cost”-instructions to construct the query
> plan.
> cost(out(‘knows’)) = { space:10345, time:O(1) } //
> in-memory graphdb
> cost(out(‘knows’)) = { space:10345, time:O(log(n)) } //
> RDBMS-encoded graphdb
> TP4 processors use the “data access”-instructions to process the
> data.
> out(‘knows’) -> Iterator
>
> What I showed above was PropertyGraphs as of TP2. What about when labels
> and schemas are involved? This is where Josh’s concept of “typing” comes
> into play.
>
> pg.graph
> pg.graph.vertex.person
>   out(‘knows’)
>   out(‘created’)
>   cost(out(‘knows’))
>   cost(out(‘created’))
> pg.graph.vertex.project
> pg.graph.edge.knows
> pg.graph.edge.created
> ...
>
> With schema, we of course get more refined statistics….
>
> Thus, I think that we are ultimately trying to create a Multi-Model ADT
> that exposes data paths (the explicit structure in the data including
> auxiliary indices) and data costs (the time/space-statistics of such
> paths). TP4 VM uses that information to:
>
> 1. Take unoptimized bytecode from the language provider (easy for
> them, only operational semantic query planning required).
> 2. Convert that bytecode into an optimized bytecode for the data
> storage system (easy for them, they only need to say what instructions they
> support and costs).
> 3. Submit that bytec

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-11 Thread Joshua Shinavier
Oops, looked back at my email and noticed some nonsense. This:

knows_weight_gt_85(v, e) = knows(v, e) ∧ weight(e, w) ∧ w>0.85


should be more like this:

knows_weight_gt_85(_, v, e) = knows(e, v, _) ∧ weight(_, e, w) ∧ w>0.85


because we need the identity of edges (and by extension, properties). So we
treat edge relations like triples, where the first element of the triple is
the identity of the edge. The _ symbol represents "don't care" variables.
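
To make the corrected definition concrete, here is a minimal Java sketch of
the same conjunction evaluated as a join over triples. All class, field, and
method names are hypothetical (this is not TinkerPop API), and the two edges
and weights are illustrative data only. Each triple carries the identity of
the edge or property as its first component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not TinkerPop API.
class EdgeTripleSketch {
    // knows(e, out, in): the first component is the identity of the edge.
    static final class Knows {
        final int id; final int out; final int in;
        Knows(int id, int out, int in) { this.id = id; this.out = out; this.in = in; }
    }

    // weight(p, e, w): a property is itself a triple with its own identity p.
    static final class Weight {
        final int id; final int edge; final double value;
        Weight(int id, int edge, double value) { this.id = id; this.edge = edge; this.value = value; }
    }

    // knows_weight_gt_85(_, v, e) = knows(e, v, _) AND weight(_, e, w) AND w > 0.85
    // Returns the identities of the matching edges that leave vertex v.
    static List<Integer> strongKnowsEdges(int v, List<Knows> knows, List<Weight> weights) {
        List<Integer> result = new ArrayList<>();
        for (Knows k : knows) {
            if (k.out != v) continue;              // bind v to the out-vertex of the edge
            for (Weight w : weights) {
                // join on the edge identity, then apply the filter on w
                if (w.edge == k.id && w.value > 0.85) result.add(k.id);
            }
        }
        return result;
    }

    // Illustrative data: two knows-edges leaving vertex 1, one weak and one strong.
    static List<Integer> demo() {
        List<Knows> knows = List.of(new Knows(7, 1, 2), new Knows(8, 1, 4));
        List<Weight> weights = List.of(new Weight(100, 7, 0.5), new Weight(101, 8, 1.0));
        return strongKnowsEdges(1, knows, weights);
    }
}
```

Without the edge identity in the first position, the weight relation would
have nothing to join on, which is the point of the correction.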

Josh


On Sat, May 11, 2019 at 8:01 AM Joshua Shinavier  wrote:

> Hi Marko,
>
> Responses inline.
>
>
> On Tue, May 7, 2019 at 6:26 AM Marko Rodriguez 
> wrote:
>
>> Whoa.
>>
>> Check out this trippy trick.
>>
>> First, here is how you define a pointer to a map-tuple.
>>
>> *{k1?v1, k2?v2, …, kn?vn}
>> * says “this is a pointer to a map" { }
>> ? is some comparator like =, >, <, !=, contains(), etc.
>>
>
> OK.
>
>
>
>
>> Assume the vertex map tuple v[1]:
>>
>> {#id:1, #label:person, name:marko, age:29}
>>
>> Now, we can add the following fields:
>>
>> 1. #outE:*{#outV=*{#id=1}}  // references all tuples that have an outV
>> field that is a pointer to the v[1] vertex tuple.
>>
>
> Yes, I agree. You won't be surprised to hear me say that this boils down
> to good old select() and project(), e.g.
>
> g.V(1).select("*", "out")
>
>
> Here, I am using "*" as shorthand for "any matching relation", i.e.
> anything with an "out" (your "outV") field.
>
>
>
>> 2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing
>> knows-edges.
>>
>
> g.V(1).select("knows", "out")
>
>
>
>> 3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} //
>> references all strong outgoing knows-edges
>>
>
> g.V(1).select("knows", "out").as("e").select("weight",
> "out").project("in").is(P.gt(0.85)).back("e")
>
>
> I like how you are giving a name to a new relation built up of other
> relations. In relational calculus, this looks something like:
>
> knows_weight_gt_85(v, e) = knows(v, e) ∧ weight(e, w) ∧ w>0.85
>
>
> And I do think we should make defining new relations as straightforward as
> possible in TP4. Compositionality is life.
>
>
>
>> By using different types of pointers, a graph database provider can make
>> explicit their internal structure. Assume all three fields above are in the
>> v[1] vertex tuple. This means that:
>>
>> 1. all of v[1]’s outgoing edges are grouped together. <-- linear scan
>>
>
> By convention, a relation could be indexed left to right, so
> knows_weight_gt_85(v, e) would express exactly that.
>
>
>
>> 2. all of v[1]’s outgoing knows-edges are grouped together. <--
>> indexed by label
>>
>
> Same.
>
>
>
>> 3. all of v[1]’s strong outgoing knows-edges are grouped together
>> <-- indexed by label and weight
>>
>
> Yep.
>
>
>
>> Thus, a graph database provider can describe the way in which it
>> internally organizes adjacent edges — i.e. vertex-centric indices!
>
>
> This looks like convergence.
>
>
>
>> This means then that TP4 can do vertex-centric index optimizations
>> automatically for providers!
>>
>
> Ex-actly.
>
>
>
>> 1. values(“#outE”).hasLabel(‘knows’).has(‘weight’,gt(0.85)) //
>> grab all edges, then filter on label, then filter on weight.
>> 2. values(“#outE.knows”).has(‘weight’,gt(0.85)) // grab all
>> knows-edges, then filter on weight.
>> 3. values(“#outE.knows.weight_gt_85”) // grab all strong
>> knows-edges.
>>
>> *** Realize that Gremlin outE() will just compile to bytecode
>> values(“#outE”).
>>
>> Freakin’ crazy! … Josh was interested in using the n-tuple structure to
>> describe indices. I was against it. I believe I still am. However, this is
>> pretty neat. As Josh was saying though, without a rich enough n-tuple
>> description of the underlying database, there should be no reason for
>> providers to have to write custom strategies and instructions ?!?!?!?!?
>> crazy!?
>>
>
> I think we might not mean the same thing by "indices", so maybe we just
> don't get hung up on that term, but we are on the same page w.r.t. what you
> wrote in this email. What's more, these indices... ok, maybe we do need to
> call them indi

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-11 Thread Joshua Shinavier
Hi Marko,

Responses inline.


On Tue, May 7, 2019 at 6:26 AM Marko Rodriguez  wrote:

> Whoa.
>
> Check out this trippy trick.
>
> First, here is how you define a pointer to a map-tuple.
>
> *{k1?v1, k2?v2, …, kn?vn}
> * says “this is a pointer to a map" { }
> ? is some comparator like =, >, <, !=, contains(), etc.
>

OK.




> Assume the vertex map tuple v[1]:
>
> {#id:1, #label:person, name:marko, age:29}
>
> Now, we can add the following fields:
>
> 1. #outE:*{#outV=*{#id=1}}  // references all tuples that have an outV
> field that is a pointer to the v[1] vertex tuple.
>

Yes, I agree. You won't be surprised to hear me say that this boils down to
good old select() and project(), e.g.

g.V(1).select("*", "out")


Here, I am using "*" as shorthand for "any matching relation", i.e.
anything with an "out" (your "outV") field.



> 2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing
> knows-edges.
>

g.V(1).select("knows", "out")



> 3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} //
> references all strong outgoing knows-edges
>

g.V(1).select("knows", "out").as("e").select("weight",
"out").project("in").is(P.gt(0.85)).back("e")


I like how you are giving a name to a new relation built up of other
relations. In relational calculus, this looks something like:

knows_weight_gt_85(v, e) = knows(v, e) ∧ weight(e, w) ∧ w>0.85


And I do think we should make defining new relations as straightforward as
possible in TP4. Compositionality is life.



> By using different types of pointers, a graph database provider can make
> explicit their internal structure. Assume all three fields above are in the
> v[1] vertex tuple. This means that:
>
> 1. all of v[1]’s outgoing edges are grouped together. <-- linear scan
>

By convention, a relation could be indexed left to right, so
knows_weight_gt_85(v, e) would express exactly that.



> 2. all of v[1]’s outgoing knows-edges are grouped together. <--
> indexed by label
>

Same.



> 3. all of v[1]’s strong outgoing knows-edges are grouped together <--
> indexed by label and weight
>

Yep.



> Thus, a graph database provider can describe the way in which it
> internally organizes adjacent edges — i.e. vertex-centric indices!


This looks like convergence.



> This means then that TP4 can do vertex-centric index optimizations
> automatically for providers!
>

Ex-actly.



> 1. values(“#outE”).hasLabel(‘knows’).has(‘weight’,gt(0.85)) //
> grab all edges, then filter on label, then filter on weight.
> 2. values(“#outE.knows”).has(‘weight’,gt(0.85)) // grab all
> knows-edges, then filter on weight.
> 3. values(“#outE.knows.weight_gt_85”) // grab all strong
> knows-edges.
>
> *** Realize that Gremlin outE() will just compile to bytecode
> values(“#outE”).
>
> Freakin’ crazy! … Josh was interested in using the n-tuple structure to
> describe indices. I was against it. I believe I still am. However, this is
> pretty neat. As Josh was saying though, without a rich enough n-tuple
> description of the underlying database, there should be no reason for
> providers to have to write custom strategies and instructions ?!?!?!?!?
> crazy!?
>

I think we might not mean the same thing by "indices", so maybe we just
don't get hung up on that term, but we are on the same page w.r.t. what you
wrote in this email. What's more, these indices... ok, maybe we do need to
call them indices... can be relations of more than two variables. See the
geospatial index example from my previous email. A vertex-centric index is
to an edge what a generic index is to a hyperedge.


Josh



>
> Marko.
>
> http://rredux.com 
>
>
>
>
> > On May 7, 2019, at 4:44 AM, Marko Rodriguez 
> wrote:
> >
> > Hey Josh,
> >
> >> I think of your Pointer as a reference to an entity. It does not
> contain
> >> the entity it refers to, but it contains the primary key of that entity.
> >
> > Exactly! I was just thinking that last night. Tuples don’t need a
> separate ID system. No -- pointers reference the primary key of a tuple!
> Better yet perhaps, they can reference one-to-many. For instance:
> >
> > { id:1, label:person, name:marko, age:29, outE:*(outV=id) }
> >
> > Thus, a pointer is defined by a pattern match. Haven’t thought through
> the consequences, but … :)
> >
> >> Here, I have invented an Entity class to indicate that the pointer
> resolves
> >> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
> >> unit element).
> >
> > Ah — the 0-tuple. Neat thought.
> >
> > I look forward to your slides from the Knowledge Graph Conference. If I
> wasn’t such a reclusive hermit, I would have loved to have joined you there.
> >
> > Take care,
> > Marko.
> >
> > http://rredux.com 
> >
> >
> >> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez  > wrote:
> >>
> >>> Hey Josh,
> >>>
>  I am feeling the tuples... as long

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-05-11 Thread Joshua Shinavier
OK, beginning at the beginning.


On Mon, May 6, 2019 at 3:58 AM Marko Rodriguez  wrote:

> Hey Josh,
>
>
> > One more thing is needed: disjoint unions. I described these in my email
> on
> > algebraic property graphs. They are the "plus" operator to complement the
> > "times" operator in our type algebra. A disjoint union type is just like
> a
> > tuple type, but instead of having values for field a AND field b AND
> field
> > c, an instance of a union type has a value for field a XOR field b XOR
> > field c. Let me know if you are not completely sold on union types, and I
> > will provide additional motivation.
>
> Huh. That is an interesting concept. Can you please provide examples?
>

Yes. If you think back to your elementary school algebra, you will recall
four basic arithmetic operations: addition, multiplication, subtraction,
and division. Simple stuff, but let's make things even simpler by throwing
out inverses. So we have: addition and multiplication. You also need unit
elements 0 and 1 which have the usual properties. This structure is called
a semiring, and with it, you can build up a rich type system and reason on
equations of types. Multiplication represents the concatenation of tuples
-- a × b × c is a type that has a AND b AND c -- whereas addition
represents a choice -- a + b + c is a type that has a XOR b XOR c.

Examples of multiplication are edges (e.g. a knows edge type is the product
of Person and Person; the out-vertex is a person, and the in-vertex is a
person) and properties (e.g. age is a product of Person and the primitive
integer type). For example, you could express the knows type as Person
× Person or as prod{out=Person, in=Person} if you want to give names to the
components of tuples (fields).

Examples of addition are in- or out-types which are a disjunction of other
types. For example, in the TinkerPop classic graph, the name property can
attach to either a Person or a Project, so the type is (Person +
Project) × string, or prod{out=sum{person=Person, project=Project},
in=string} if you want field names.

Just as the teacher made you do at the blackboard, you can distribute
multiplication over a sum, so

(Person + Project) × string = (Person × string) + (Project × string)


In other words, a name property which can attach either to a person or
project is equivalent to two distinct properties, maybe call them personName
and projectName, which each attach to only one type of vertex.
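
As a quick sanity check of the distributive law, the following Java sketch
counts the values on both sides of the equation for small finite sets. The
helper names are hypothetical, and values are encoded as strings only to keep
the sketch short. Equal counts do not by themselves prove the two types are
equivalent, but they are what the equation predicts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper names; finite lists stand in for types.
class DistributeSketch {
    // Disjoint union a + b: every element is tagged with the side it came from.
    static List<String> sum(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String x : a) out.add("left:" + x);
        for (String x : b) out.add("right:" + x);
        return out;
    }

    // Product a × b: every pairing of an element of a with an element of b.
    static List<String> product(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String x : a) {
            for (String y : b) out.add("(" + x + ", " + y + ")");
        }
        return out;
    }

    // Number of values of (Person + Project) × string.
    static int leftSide(List<String> people, List<String> projects, List<String> strings) {
        return product(sum(people, projects), strings).size();
    }

    // Number of values of (Person × string) + (Project × string).
    static int rightSide(List<String> people, List<String> projects, List<String> strings) {
        return sum(product(people, strings), product(projects, strings)).size();
    }
}
```

With two people, one project, and two candidate strings, both sides come to
(2 + 1) × 2 = 2 × 2 + 1 × 2 = 6.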

Other fun things you can build with unions include lists, trees, and other
recursive data structures. How do you formalize a "list of people" as a
type? Well, you can think of it in this way:

ListOfPeople = () + (Person) + (Person × Person) + (Person × Person ×
Person) + ...


In other words, a list of people can be either the multiplicative unit (the 0-tuple),
a single person, a pair of people, a triplet of people... an n-tuple of
people for any n >= 0. You could also write:

ListOfPeople = () + (Person × ListOfPeople)


Products let you concatenate types and tuples to build larger types and
tuples; sums enable choices and pattern matching.
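
The recursive equation ListOfPeople = () + (Person × ListOfPeople) maps
directly onto a class hierarchy. Here is a rough Java sketch, with
hypothetical class names and a person reduced to a plain string: the sum
becomes two subclasses, and the product becomes a class with two fields.

```java
// Hypothetical names; a person is reduced to a String for brevity.
class PeopleListSketch {
    // ListOfPeople = () + (Person × ListOfPeople)
    static abstract class PeopleList {
        abstract int length();
    }

    // Left alternative of the sum: the 0-tuple, i.e. the empty list.
    static final class Empty extends PeopleList {
        int length() { return 0; }
    }

    // Right alternative of the sum: the product of a person and the rest of the list.
    static final class Cons extends PeopleList {
        final String person;
        final PeopleList rest;
        Cons(String person, PeopleList rest) { this.person = person; this.rest = rest; }
        int length() { return 1 + rest.length(); }
    }

    // Builds the nested structure from the last person back to the first.
    static PeopleList of(String... people) {
        PeopleList list = new Empty();
        for (int i = people.length - 1; i >= 0; i--) {
            list = new Cons(people[i], list);
        }
        return list;
    }
}
```

Calling length() dispatches on which alternative of the sum is present,
which is a simple form of the pattern matching that sums make possible.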



> One thing I want to stress. The “universal bytecode” is just standard
> [op,arg*]* bytecode save that data access is via the “universal model's"
> db() instruction. Thus, AND/OR/pattern matching/etc. is all available.
> Likewise union(), repeat(), coalesce(), choose(), etc. are all available.
>
> db().and(as('a').values('knows').as('b'),
>  or(as('a').has('name','marko'),
> as('a').values(‘created').count().is(gt(1))),
>  as('b').values(’created').as('c')).
>  path(‘c')
>

No disagreement. This is essentially functional pattern matching as
motivated above, though it includes a condition we wouldn't include in the
type system itself: the "created" count.



> As you can see, and()/or() pattern matching is possible and can be nested.
>   *** SIDENOTE: In TP3, such nested and()/or() pattern matching is
> expressed using match() where the root grouping is assumed to be and()’d
> together.
>

Yep.



>   *** SIDENOTE: In TP4, I want to get rid of an explicit match() bytecode
> instruction and replace it with and()/or() instructions with prefix/suffix
> as()s.
>

Hmm. I think the match() syntax is useful, even if you can build match()
expressions out of and() and or(). Or maybe we just point users to
OpenCypher if they want conjunctive query patterns. Jeremy Hanna and I
chatted about this at the conference earlier this week... it is really just
a matter of providing the best syntactic sugar. You CAN do everything that
match() or OpenCypher can do in Gremlin, but this is not to say you always
SHOULD.



>[...]

> Or other tuples, or tagged values. E.g. any edge projects to two vertices,
> > which are (trivial) tuples as opposed to primitive values.
>
> Good point. I started to do some modeling and I’ve been getting some good
> mileage from a new “pointer” primitive. Assume every

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-06 Thread Joshua Shinavier
Let me get back to you in a couple of days (post- Knowledge Graph
Conference) with a more detailed reply, but I think we are more or less on
the same page w.r.t. "pointers". With respect to sequences, in terms of
type theory you can think of them as recursive types which make use of the
disjoint union I have been pushing for. You can also think of them as a
union of tuple types. E.g. the type of a list of strings is

() + (String) + (String, String) + (String, String, String) etc.

You need that + operator to unify tuple types and sequences in this way.

I think of your Pointer as a reference to an entity. It does not contain
the entity it refers to, but it contains the primary key of that entity. If
you assume that primary keys are unique for a given type of entity (another
way of saying that there is only one relation for each entity type), then
the primary key uniquely resolves to a tuple. Otherwise, Pointer is similar
to any other value. Say we use Value for data-typed values. Using your
Pair, the type of an "age" property would look like:

Pair, Value>

Here, I have invented an Entity class to indicate that the pointer resolves
to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
unit element).

Josh




On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez  wrote:

> Hey Josh,
>
> > I am feeling the tuples... as long as they can be typed, e.g.
> >
> >  myTuple.get(Integer) -- int-indexed tuples
> >  myTuple.get(String) -- string-indexed tuples
> > In most programming languages, "tuples" are not lists, though they are
> typed by a list of element types. E.g. in Haskell you might have a tuple
> with the type
> > (Double, Double, Bool)
>
>
> Yes, we have Pair, Triple, Quadruple, etc. However
> for base Tuple of unknown length, the best I can do in Java is . :|
> You can see my stubs in the gist:
> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 (LINES
> #21-42)
>
> > If this is in line with your proposal, then we agree that tuples should
> be the atomic unit of data in TP4.
>
> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
> to get your feedback on:
>
> 1. pointers (tuple entries referencing tuples).
> 2. sequences (multi-value tuple entries).
> 3. # hidden map keys :|
> - sorta ghetto.
>
> Also, I’m still not happy with db().has().has().as(‘x’).db().where()… it's
> an intense syntax and it's hard to strategize.
>
> I really want to nail down this “universal model” (tuple structure and
> tuple-oriented instructions) as then I can get back on the codebase and
> start to flesh this stuff out with confidence.
>
> See ya,
> Marko.
>
> http://rredux.com 
>
>
> >
> > Josh
> >
> >
> > On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez  > wrote:
> > Hi,
> >
> > I spent this afternoon playing with n-tuples, pointers, data model
> interfaces, and bytecode instructions.
> >
> > https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
> >
> > *** Kuppitz: They are tuples :). A Map extends Tuple>.
> Tada!
> >
> > What I like about this is that it combines the best of both worlds
> (Josh+Marko).
> > * just flat tuples of arbitrary length.
> > * pattern matching for arbitrary joins. (k1=k2 AND k3=k4
> …)
> > * pointers chasing for direct links. (edges, foreign
> keys, document _id references, URI resolutions, …)
> > * sequences are a special type of tuple used for multi-valued
> entries.
> > * has()/values()/etc. work on all tuple types! (maps, lists,
> tuples, vertices, edges, rows, statements, documents, etc.)
> >
> > Thoughts?,
> > Marko.
> >
> > http://rredux.com
> >
> >
>
>


Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-05-03 Thread Joshua Shinavier
Hi Marko,

Thanks for the detailed emails. Responses inline.


On Thu, May 2, 2019 at 6:40 AM Marko Rodriguez  wrote:

> [...]
> Thus, there exists a data model that can describe these database
> structures in a database agnostic manner.
> - not in terms of tables, vertices, JSON, column families, …
>

100% with you on this.



> While we call this a “universal model” it is NOT more “general”
> (theoretically powerful) than any other database structure.
>

I agree. We should be trying harder to find equivalences, as opposed to
introducing a "bigger, better, brand-new shiny" data model.



> Reasons for creating a “universal model”.
>
> 1. To have a reduced set of objects for the TP4 VM to consider.
> - edges are just vertices with one incoming and outgoing
> “edge.”
>

Kinda. Let's say edges are elements with two fields. Vertices are elements
with no fields.



> - a column family is just a “map” of rows which are just
> “maps.”
>

Kinda. Let's say a table / column family is a data type with a number of
fields. Equivalently, it is a relation with a number of columns. You
brought up a good point in your previous email w.r.t. "person" vs.
"people", but that's why mappings are needed. A trivial schema mapping
gives you an element type "person" from a relation/table "people" and vice
versa. The table and the type are equivalent.



> - tables are just groupings of schema-equivalent rows.
>

Agreed. The "universal model" just makes an element out of each row.



> 2. To have a limited set of instructions in the TP4 bytecode
> specification.
> - outE/inE/outV/inV are just following direct “links”
> between objects.
>

inV and outV, yes, because they are fields of an edge element. outE and inE
are different, because they are not fields of the vertex. However, they are
functions. You can put them in the same namespace as inV and outV if you
want to; just keep in mind that in terms of relational algebra, they are a
fundamentally different operation.
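
A small Java sketch of that distinction, with hypothetical names and
illustrative data (not TinkerPop API): outV and inV only read a field of the
edge they are given, while outE and inE have to consult the whole edge
relation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical names; vertices are represented by their integer ids.
class EdgeFieldsSketch {
    static final class Edge {
        final int id; final String label; final int outV; final int inV;
        Edge(int id, String label, int outV, int inV) {
            this.id = id; this.label = label; this.outV = outV; this.inV = inV;
        }
    }

    // Projections: each needs nothing but the edge element itself.
    static int outV(Edge e) { return e.outV; }
    static int inV(Edge e) { return e.inV; }

    // Selections: the vertex has no such field, so the edge relation is filtered.
    static List<Edge> outE(List<Edge> edges, int vertexId) {
        return edges.stream().filter(e -> e.outV == vertexId).collect(Collectors.toList());
    }

    static List<Edge> inE(List<Edge> edges, int vertexId) {
        return edges.stream().filter(e -> e.inV == vertexId).collect(Collectors.toList());
    }

    // Illustrative edge relation.
    static List<Edge> sampleEdges() {
        return List.of(
            new Edge(7, "knows", 1, 2),
            new Edge(8, "knows", 1, 4),
            new Edge(9, "created", 1, 3),
            new Edge(11, "created", 4, 3));
    }
}
```

A provider with a vertex-centric index could answer outE without the scan,
but the operation would still be a lookup into the edge relation rather than
a field access.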



> - has(), values(), keys(), valueMap(), etc. need not just
> apply to vertices and edges.
>

Agreed.



> 3. To have a simple serialization format.
> - we do not want to ship around
> rows/vertices/edges/documents/columns/etc.
> - we want to make it easy for other languages to integrate
> with the TP4 VM.
> - we want to make it easy to create TP4 VMs in other
> languages.
>

What is easier than a table? Any finite graph in this model is just a
collection of tables which can be shipped around as CSVs, among other
formats.



> 4. To have a theoretical understanding of the relationship between
> the various data structures.
> - “this is just a that” is useful to limit the
> complexities of our codebase and explain to the public how different
> databases relate.
>

Yes.



> [...]
> The objects:
> 1. primitives: floats, doubles, Strings, ints, etc.
>

Yes.



> 2. tuples: key’d collections of primitives. (instances)
> 3. relations: groupings of tuples with ?equivalent? schemas.
> (types)
>

These are the same thing. A tuple is a row, is an element. A relation is a
set of elements/tuples/rows of the same type.

One more thing is needed: disjoint unions. I described these in my email on
algebraic property graphs. They are the "plus" operator to complement the
"times" operator in our type algebra. A disjoint union type is just like a
tuple type, but instead of having values for field a AND field b AND field
c, an instance of a union type has a value for field a XOR field b XOR
field c. Let me know if you are not completely sold on union types, and I
will provide additional motivation.



> The instructions:
> 1. relations can be “queried” for matching tuples.
>

Yes.



> 2. tuple values can be projected out to yield primitives.
>

Or other tuples, or tagged values. E.g. any edge projects to two vertices,
which are (trivial) tuples as opposed to primitive values.


> Let's do a “traversal” from marko to the people he knows.
>
> // g.V().has(‘name’,’marko’).outE(‘knows’).inV().values(‘name’)
>
> db(‘person’).has(‘name’,’marko’).as(‘x’).
> db(‘knows’).has(‘#outV’, path(‘x’).by(‘#id’)).as(‘y’).
> db(‘person’).has(‘#id’, path(‘y’).by(‘#inV’)).
>   values(‘name’)
>

I still don't think we need the "db" step, but I think that syntax works --
you are distinguishing between fields and higher-order things like
properties by using hash characters for the field names.



> While the above is a single stream of processing, I will state what each
> line above has at that point in the stream.
> - [#label:person,name:marko,age:29]
>

Keeping in mind that "name" and "age" are property keys as opposed to
fields, yes.



> - [#label:knows,#outV:1,#inV:2,weight:0.5], ...
> - [#label:person,name:vadas,age:27], ...
> - vadas, ...
>

OK.



> Dat

Re: TP4 + Cypher

2019-04-30 Thread Joshua Shinavier
+1

For another data point, Alastair Green, also of Neo4j, was here at Uber
last week to align on property graph standardization. This was just before
the thread on universal structure / algebraic property graphs. He was open
to using APG as a high-level data model from which the property graph
standard (which will be implemented by Neo4j) can be derived by adding
constraints. Jan Hidders, Ryan Wisnesky, and I have been working on a
set-based formulation of the model which should enable this. If Neo4j is to
be the reference implementation for TP4 property graphs, then it will be
good to have Dmitry in the loop w.r.t. the data modeling discussions, as
well.

Josh


On Tue, Apr 30, 2019 at 10:11 AM Marko Rodriguez 
wrote:

> Hello,
>
> I had the most interesting meeting this morning with Dmitry Novikov (the
> author of Cypher-for-Gremlin). The fellow is sharp and has a thorough
> understanding of Gremlin (language + mechanics). Here are two points to
> consider:
>
> 1.
> https://github.com/opencypher/cypher-for-gremlin/tree/master/tinkerpop/cypher-gremlin-extensions
> - This page presents the issues that he is running into
> trying to get Cypher-for-Gremlin to be 100% openCypher compliant.
> - When he went through each problem one-by-one, I was able
> to say that most of his issues are known and have respective solutions in
> TP4.
> - However, there are some concepts he presented that I was
> completely unaware of. (e.g. generators!)
>
> 2. Neo4j is interested in working closely with TP4.
> - They want Cypher to be the reference implementation
> language for TP4 property graphs.
> - I think this is a great idea.
> - I see SPARQL being the reference implementation language
> for TP4 RDF stores.
> - I see SQL being the reference implementation language
> for TP4 RDBMs.
> - Finally, I see Gremlin as the multi-model assembly
> language for the TP4 VM.
> - graphs, triples, tables, documents, .. Gremlin
> can do it all.
>
> I really like Dmitry and believe collaborating with him will benefit the
> project. When tp4/ stabilizes, I offered that he start working on a
> org.apache.tinkerpop.language.cypher . With both of us working
> side-by-side, we should be able to rectify all the points he identifies in
> (1) above and at the same time, riff on each others’ knowledge to gain a
> deeper understanding of what all of this is all about!
>
> Any thoughts?,
> Marko.
>
> http://rredux.com 
>
>


Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-30 Thread Joshua Shinavier
Hi Marko,

I like it. But I still have some constructive criticism. I think a little
more simplicity in the right places will make things like index support,
query optimization, and integration with SEDMs (someone else's data model)
that much easier in the future.

First, the "root". While we do need context for traversals, I don't think
there should be a distinct kind of root for each kind of structure. Once
again, select(), or operations derived from select() will work just fine.
Want the "person" table? db.select("person"). Want a sequence of vertices
with the label "person"? db.select("person"). What we are saying in either
case is "give me the 'person' relation. Don't project any specific fields;
just give me all the data". A relational DB and a property graph DB will
have different ways of supplying the relation, but in either case, it can
hide behind the same interface (TRelation?).

But wait, you say, what if, under the hood, you have a TTable in one
case, and a TSequence in the other? They are so different! That's why
the Dataflow Model is so great; to an extent, you can think of the two as
interchangeable. I think we would get a lot of mileage out of treating
them as interchangeable within TP4.

So instead of a data-model-specific "root", I argue for a universal root
together with a set of relations and what we might call "indexes". An
index is an arrow from a type to a relation which says "give me a
column/value pair, and I will give you all matching tuples from this
relation". The result is another relation. Where data sources differentiate
themselves is by having different relations and indexes.

For example, if the underlying data structure is nothing but a stream of
Trip tuples, you will have a single relation "Trip", and no indexes. Sorry;
you just have to wait for tuples to go by, and filter on them. So if you
say d.select("Trip", "driver") -- where d is a traversal that gets you to a
User -- the machine knows that it can't use "driver" to look up a specific
set of trips; it has to use a filter over all future "Trip" tuples. If, on
the other hand, we have a relational database, we have the option of
indexing on "driver". In this case, d.select("Trip", "driver") may take you
to a specific table like "Trip_by_driver" which has "driver" as a primary
key. The machine recognizes that this index exists, and uses it to answer
the query more efficiently. The alternative is to do a full scan over any
table which contains the "Trip" relation. Since TinkerPop3, we have been
without a vendor-neutral API for indexes, but this is where such an API
would really start to shine. Consider Neo4j's single property indexes,
JanusGraph's composite indexes, and even RDF triple indices (spo, ops,
etc.) as in AllegroGraph in addition to primary keys in relational
databases.
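
Here is a rough Java sketch of the idea, assuming a hypothetical TripRelation
class (none of these names exist in TinkerPop): the same selectByDriver call
is answered from an index when the data source declares one, and by a full
scan otherwise. The streaming case, where one filters over future tuples, is
not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; a sketch of index-or-scan behind one interface.
class IndexLookupSketch {
    static final class Trip {
        final String id; final String driver;
        Trip(String id, String driver) { this.id = id; this.driver = driver; }
    }

    static final class TripRelation {
        private final List<Trip> rows = new ArrayList<>();
        // Index from driver to trips; null when the data source offers no index.
        private final Map<String, List<Trip>> byDriver;
        // Records how the most recent selection was answered.
        boolean usedIndex;

        TripRelation(boolean indexed) {
            this.byDriver = indexed ? new HashMap<>() : null;
        }

        void add(Trip t) {
            rows.add(t);
            if (byDriver != null) {
                byDriver.computeIfAbsent(t.driver, k -> new ArrayList<>()).add(t);
            }
        }

        // Roughly d.select("Trip", "driver") for one driver value.
        List<Trip> selectByDriver(String driver) {
            if (byDriver != null) {
                usedIndex = true;
                return byDriver.getOrDefault(driver, Collections.emptyList());
            }
            usedIndex = false;
            List<Trip> out = new ArrayList<>();
            for (Trip t : rows) {
                if (t.driver.equals(driver)) out.add(t);   // full scan with a filter
            }
            return out;
        }
    }

    // Illustrative data: three trips, two of them driven by "alice".
    static TripRelation demo(boolean indexed) {
        TripRelation r = new TripRelation(indexed);
        r.add(new Trip("t1", "alice"));
        r.add(new Trip("t2", "bob"));
        r.add(new Trip("t3", "alice"));
        return r;
    }
}
```

Both variants return the same tuples; only the cost differs, which is the
kind of information a vendor-neutral index API would expose to the machine.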

TTuple -- cool. +1

"Enums" -- I agree that enums are necessary, but we need even more: tagged
unions. They are part of the
system of algebraic data types which I described on Friday. An enum is a
special case of a tagged union in which there is no value, just a type tag.
May I suggest something like TValue, which contains a value (possibly
trivial) together with a type tag. This enables ORs and pattern matching.
For example, suppose "created" edges are allowed to point to either
"Project" or "Document" vertices. The in-type of "created" is
union{project:Project, document:Document}. Now the in value of a specific
edge can be TValue("project", [some project vertex]) or TValue("document",
[some document vertex]) and you have the freedom to switch on the type tag
if you want to, e.g. the next step in the traversal can give you the "name"
of the project or the "title" of the document as appropriate.
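
As a rough sketch of what switching on the type tag could look like in Java: TValue is the name suggested above, but the class layout, the CreatedExample helper, and the use of a Map to stand in for a vertex are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch; only the name TValue comes from the discussion above.
final class TValue {
    final String tag;   // the type tag, e.g. "project" or "document"
    final Object value; // the tagged value; null stands in for a trivial value

    TValue(String tag, Object value) {
        this.tag = tag;
        this.value = value;
    }

    // An enum constant is the special case with a tag and no value.
    static TValue symbol(String tag) {
        return new TValue(tag, null);
    }
}

class CreatedExample {
    // Switch on the type tag of the in-value of a "created" edge.
    static Object describe(TValue in) {
        Map<?, ?> vertex = (Map<?, ?>) in.value; // a Map stands in for a vertex here
        switch (in.tag) {
            case "project":
                return vertex.get("name");
            case "document":
                return vertex.get("title");
            default:
                throw new IllegalArgumentException("unexpected tag: " + in.tag);
        }
    }
}
```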

Multi-properties -- agreed; has() is good enough.

Meta-properties -- again, this is where I think we should have a
lower-level select() operation. Then has() builds on that operation.
Whereas select() matches on fields of a relation, has() matches on property
values and other higher-order things. If you want properties of properties,
don't use has(); use select()/from(). Most of the time, you will just want
to use has().

Agreed that every *entity* should have an id(), and also a label() (though
it should always be possible to infer label() from the context). I would
suggest TEntity (or TElement), which has id(), label(), and value(), where
value() provides the raw value (usually a TTuple) of the entity.

Josh



On Mon, Apr 29, 2019 at 10:35 AM Marko Rodriguez 
wrote:

> Hello Josh,
>
> > A has("age",29), for example, operates at a different level of
> abstraction than a
> > has("city","Santa Fe") if "city" is a column in an "addresses" table.
>
> So hasXXX() operators work on TTuples. Thus:
>
> g.V().hasLabel('person').has('age',29)
> g.V().hasLabel('address').has('city','Santa Fe')
>
> ..both work as a person-vertex an

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Joshua Shinavier
Hi Marko,

I will respond in more detail tomorrow (I'm a late-night-thinking,
early-morning-writing kind of guy) but yes I think this is cool, so long as
we are not overloading the steps with different levels of abstraction.
A has("age", 29), for example, operates at a different level of abstraction
than a has("city", "Santa Fe") if "city" is a column in an "addresses"
table. At least, they
are different if the data model allows for multi-properties,
meta-properties, and hyper-edges. A property is something that can either
be there, attached to an element, or not be there. There may also be more
than one such property, and it may have other properties attached to it. A
column of a table, on the other hand, is always there (even if its value is
allowed to be null), always has a single value, and cannot have further
properties attached. The same goes for values().
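
To make the distinction concrete, here is a small Java sketch with hypothetical Row, Property, and Element classes (they are not TinkerPop's structure API): a column read yields exactly one value, possibly null, whereas a property read yields zero or more property objects, each of which can carry further properties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes for illustration; not TinkerPop's structure API.
class Row {
    private final Map<String, Object> columns = new HashMap<>();

    void set(String column, Object value) { columns.put(column, value); }

    // A column always has exactly one value, though that value may be null.
    Object value(String column) { return columns.get(column); }
}

class Property {
    final String key;
    final Object value;
    // Meta-properties: a property may have further properties attached.
    final List<Property> properties = new ArrayList<>();

    Property(String key, Object value) {
        this.key = key;
        this.value = value;
    }
}

class Element {
    private final List<Property> properties = new ArrayList<>();

    Property add(String key, Object value) {
        Property p = new Property(key, value);
        properties.add(p);
        return p;
    }

    // A property may be absent, present once, or present many times.
    List<Property> properties(String key) {
        List<Property> result = new ArrayList<>();
        for (Property p : properties) {
            if (p.key.equals(key)) {
                result.add(p);
            }
        }
        return result;
    }
}
```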

In order to simplify user queries, you can let has() and values() do double
duty, but I still feel that there are lower-level operations at play, at a
logical level even if not at a bytecode level. However, expressing a
traversal in terms of its lowest-level relational operations may also be
useful for query optimization.

Josh



On Mon, Apr 29, 2019 at 7:34 AM Marko Rodriguez 
wrote:

> Hi,
>
> *** This email is primarily for Josh (and Kuppitz). However, if others are
> interested… ***
>
> So I did a lot of thinking this weekend about structure/ and this morning,
> I prototyped both graph/ and rdbms/.
>
> This is the way I’m currently thinking of things:
>
> 1. There are 4 base types in structure/.
> - Primitive: string, long, float, int, … (will constrain
> these at some point).
> - TTuple: key/value map.
> - TSequence: an iterable of v objects.
> - TSymbol: like Ruby, I think we need “enum-like” symbols
> (e.g., #id, #label).
>
> 2. Every structure has a “root.”
> - for graph its TGraph implements TSequence
> - for rdbms its a TDatabase implements
> TTuple
>
> 3. Roots implement Structure and thus, are what is generated by
> StructureFactory.mint().
> - defined using withStructure().
> - For graph, its accessible via V().
> - For rdbms, its accessible via db().
>
> 4. There is a list of core instructions for dealing with these
> base objects.
> - value(K key): gets the TTuple value for the provided key.
> - values(K key): gets an iterator of the value for the
> provided key.
> - entries(): gets an iterator of T2Tuple objects for the
> incoming TTuple.
> - hasXXX(A,B): various has()-based filters for looking
> into a TTuple and a TSequence
> - db()/V()/etc.: jump to the “root” of the withStructure()
> structure.
> - drop()/add(): behave as one would expect and thus.
>
> 
>
> For RDBMS, we have three interfaces in rdbms/.
> (machine/machine-core/structure/rdbms)
>
> 1. TDatabase implements TTuple // the root
> structure that indexes the tables.
> 2. TTable implements TSequence> // a table is a sequence
> of rows
> 3. TRow implements TTuple> // a row has string column
> names
>
> I then created a new project at machine/structure/jdbc. The classes in
> here implement the above rdbms/ interfaces.
>
> Here is an RDBMS session:
>
> final Machine machine = LocalMachine.open();
> final TraversalSource jdbc =
> Gremlin.traversal(machine).
> withProcessor(PipesProcessor.class).
> withStructure(JDBCStructure.class,
> Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));
>
> System.out.println(jdbc.db().toList());
> System.out.println(jdbc.db().entries().toList());
> System.out.println(jdbc.db().value("people").toList());
> System.out.println(jdbc.db().values("people").toList());
> System.out.println(jdbc.db().values("people").value("name").toList());
> System.out.println(jdbc.db().values("people").entries().toList());
>
> This yields:
>
> []
> [PEOPLE:]
> []
> [, ]
> [marko, josh]
> [NAME:marko, AGE:29, NAME:josh, AGE:32]
>
> The bytecode of the last query is:
>
> [db(), values(people),
> entries]
>
> JDBCDatabase implements TDatabase, Structure.
> *** JDBCDatabase is the root structure and is referenced by db()
> *** (CRUCIAL POINT)
>
> Assume another table called ADDRESSES with two columns: name and city.
>
>
> jdbc.db().values("people").as("x").db().values("addresses").has("name",eq(path("x").by("name"))).value("city")
>
> The above is equivalent to:
>
> SELECT city FROM people,addresses WHERE people.name=addresses.name
>
> If you want to do an inner join (a product), you do this:
>
>
> jdbc.db().values("people").as("x").db().values("addresses").has("name",eq(path("x").by("name"))).as("y").path("x","y")
>
> The above is equivalent to:
>
> SELECT * FROM addresses INNER JOIN people ON 

Re: [TinkerPop] Re: A TP4 Structure Agnostic Bytecode Specification (The Universal Structure)

2019-04-26 Thread Joshua Shinavier
These past few days, I have had some requests for a more detailed write-up
of the data model, so here goes. See also my Global Graph Summit
presentation.

*Algebraic data types*

The basic idea of this data model, which I have nicknamed *algebraic
property graphs*, is that an "ordinary" property graph schema is just a
special case of a broader class of relational schemas with primary and
foreign keys. What are edges if not relations between vertices? What are
properties if not relations between elements (edges or vertices) and
primitive values? In this model, each edge label and property key
identifies a distinct relation in a graph. Vertex types identify unary
relations, i.e. sets.

For example, in the schema of the TinkerPop classic graph, below, Person
and Project are distinct vertex types, knows and created are distinct edge
types, etc. The primitive types are drawn in blue/purple, the vertex types
are salmon-colored, the edge types are yellow, and the property types are
green. The "o" and "i" ports on the boxes represent the "out" (tail) and
"in" (head) component of each edge type or property type.


[image: image.png]


Some details which should stand out visually:
1) In a typical property graph like this one, each type must be placed at
one of three levels: primitive or vertex, vertex property or edge, or edge
property. Vertex meta-properties would be at the third level as well, if
this graph had any. All projections (arrows between types) run from higher
levels to lower levels.
2) Primitive types and vertex types have no projections; all other types
have two. Element ids (i.e. primary keys) are not depicted.
3) Some ports have more than one outgoing arrow. This represents
*disjunction*, e.g. a weight property can be applied *either* to a knows
edge *or* a created edge.

Although disjoint unions may not be common in relational schemas (because
they introduce complexity), they are necessary for supporting
general-purpose algebraic data types, which are fundamental to a broad
swath of data models which we support at Uber, and which we would like to
support in TinkerPop4.

As we expand beyond vanilla property graphs, we quickly get into
greater-than-binary relations such as this hyper-edge type:

[image: image.png]


The type is drawn in a different color to indicate that it is neither a
vertex type (an element with no projections), a property type (an element
with projections to another element and a primitive value) or an edge (an
element with projections to two other elements). It is simply an element.
The guiding principle here is similar to that of TinkerPop3's
Graph.Features: start with a maximally expressive data model, then refine
the data model for a particular context by adding constraints. Some
examples of schema-level constraints:

*) May a type have more than two projections? I.e. are hyper-edges / n-ary
relations supported?
*) Can edge types depend on other edge types? I.e. are meta-edges
(sometimes confusingly called hyperedges) supported?
*) Can property types depend on other property types? I.e. are
meta-properties supported?
*) Does every relation type of arity >= 2 need to have a primary key? I.e.
are compound data types supported (e.g. lat/lon pairs, records like
addresses with multiple fields)?
*) Are recursive / self-referential types (e.g. lists or trees of elements
or primitives) allowed?
etc.

There are also constraints which apply at the instance level, e.g.

*) May a graph contain two edges which differ only by id? I.e. are
non-simple edges supported?
*) May a graph contain two properties which differ only by id? I.e. are
multi-properties supported?
*) May a generalized property or edge instance reference itself?
etc.

With the right set of constraints, we obtain a basic property graph data
model. Relaxing the constraints, we can define and manipulate datasets
which are definitely not basic property graphs, but which are nonetheless
graph-like, and for which it makes sense to perform graph traversals. Enter
TinkerPop4.
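
A sketch of how the classification by projections and one schema-level constraint might be expressed in Java. The TypeDef class and its method names are assumptions made for illustration, and disjunctions (ports with more than one outgoing arrow) are left out for brevity.

```java
import java.util.List;

// Hypothetical schema sketch; every name here is an assumption for illustration.
class TypeDef {
    final String name;
    final boolean primitive;
    final List<TypeDef> projections; // target types of this type's projections

    TypeDef(String name, boolean primitive, List<TypeDef> projections) {
        this.name = name;
        this.primitive = primitive;
        this.projections = projections;
    }

    // Classify a type by the shape of its projections.
    String kind() {
        if (primitive) return "primitive";
        if (projections.isEmpty()) return "vertex";
        if (projections.size() == 2) {
            TypeDef out = projections.get(0);
            TypeDef in = projections.get(1);
            if (!out.primitive && in.primitive) return "property";
            if (!out.primitive && !in.primitive) return "edge";
        }
        return "element"; // a generalized element, e.g. a hyper-edge
    }

    // One schema-level constraint: no type may have more than two projections.
    static boolean withoutHyperEdges(List<TypeDef> schema) {
        for (TypeDef type : schema) {
            if (type.projections.size() > 2) return false;
        }
        return true;
    }
}
```

A basic property graph schema would pass withoutHyperEdges(); relaxing that check is what admits n-ary relations.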


[image: image.png]


*Graph traversal as relational algebra*

What are the fundamental operations of

Re: [TinkerPop] A TP4 Structure Agnostic Bytecode Specification (The Universal Structure)

2019-04-25 Thread Joshua Shinavier
+10. Great to see structure and process coming together in this way. I
think the algebraic and relational idiom will serve TP4 well. ADTs +
constraints are all we need in order to expose a wide variety of structured
data models to TinkerPop: traditional property graphs, hypergraphs,
relational DBs, and anything expressed in typical data interchange
languages like Protobuf, Thrift, Avro, etc. Graph I/O and graph stream
processing become easier, because a graph is just a set of tuples. There
are fewer barriers to formal analysis of schemas, queries, and mappings,
and new forms of optimization become possible. 40+ years of database
research is now more immediately applicable, and so is the wide world of
category theory. Agreed that there are individual details to nail down, but
the broad strokes of this are looking pretty good.

Josh


On Thu, Apr 25, 2019 at 10:46 AM Marko Rodriguez 
wrote:

> Hello,
>
> This email proposes a TP4 bytecode specification that is agnostic to the
> underlying data structure and thus, is both:
>
> 1. *Turing Complete*: the instruction set has process-oriented
> instructions capable of expressing any algorithm (universal processing).
> 2. *Pointer-Based*: the instruction set has data-oriented instructions
> for moving through referential links in memory (universal structuring).
>
> Turing Completeness has already been demonstrated for TinkerPop using the
> following sub-instruction set.
> union(), repeat(), choose(), etc. // i.e. the standard program flow
> instructions
>
> We will focus on the universal structuring aspect of this proposed
> bytecode spec. This work is founded on Josh Shinavier’s Category Theoretic
> approach to data structures. My contribution has been to reformulate his
> ideas according to the idioms and requirements of TP4 and then deduce a set
> of TP4-specific implementation details.
>
> *TP4 REQUIREMENTS*:
> 1. The TP4 VM should be able to process any data structure (not just
> property graphs).
> 2. The TP4 VM should respect the lexicon of the data structure (not just
> embed the data structure into a property graph).
> 3. The TP4 VM should allow query languages to naturally process their
> respective data structures  (standards compliant language compilation).
>
> Here is a set of axioms defining the structures and processes of a
> universal data structure.
>
> *THE UNIVERSAL STRUCTURE:*
> 1. There are 2 data read instructions — select() and project().
> 2. There are 2 data write instructions — insert() and delete().
> 3. There are 3 sorts of data  — tuples, primitives, and sequences.
> - Tuples can be thought of as “key/value maps.”
> - Primitives are doubles, floats, integers, booleans, Strings, etc.
> - Sequences are contiguous streams of tuples and/or primitives.
> 4. Tuple data is accessed via keys.
> - A key is a primitive used for referencing a value in the tuple. (not
> just String keys)
> - A tuple can not have duplicate keys.
> - Tuple values can be tuples, primitives, or sequences.
>
> Popular data structures can be defined as specializations of this
> universal structure. In other words, the data structures used by relational
> databases, graphdbs, triplestores, document databases, column stores,
> key/value stores, etc. all demand a particular set of constraints on the
> aforementioned axioms.
>
> 
> /// A Schema-Oriented Multi-Relational Structure (RDBMS) ///
> 
>
> *RDBMS CONSTRAINTS ON THE UNIVERSAL STRUCTURE:*
> 1. There are an arbitrary number of global tuple sequences (tables)
> 2. All tuple keys are Strings. (column names)
> 3. All tuple values are primitives. (row values)
> 4. All tuples in the same sequence have the same keys. (tables have
> predefined columns)
> 5. All tuples in the same sequence have the same primitive value type for
> the same key. (tables have predefined row value types)
> Assume the following tables in a relational database.
>
> *vertices*
> id  label   name   age
> 1   person  marko  29
> 2   person  josh   35
>
> *edges*
> id  outV  label  inV
> 0   1     knows  2
>
> An SQL query is presented and then the respective TP4 bytecode is provided
> (using fluent notation vs. [op,arg*]*).
>
> // SELECT * FROM vertices WHERE id=1
> select('vertices').has('id',1)
>   => v[1]
> // SELECT name FROM vertices WHERE id=1
> select('vertices').has('id',1).project('name')
>   => "marko"
> // SELECT * FROM edges WHERE outV=1
> select('edges').has('outV',1)
>   => e[0][v[1]-knows->v[2]]
> // SELECT * FROM edges WHERE outV=(SELECT 'id' FROM vertices WHERE name='marko')
> select('edges').has('outV',within(select('vertices').has('name','marko').project('id')))
>   => e[0][v[1]-knows->v[2]]
> // SELECT vertices.* FROM edges,vertices WHERE outV=1 AND id=inV
> select('vertices').has('id',1)
> .select('edges').by('outV',eq('id')).select('vertices').by('id',eq('inV'))
>   => v[2]
>
> *VARIATIONS:*
> 1. Relational databa

Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-25 Thread Joshua Shinavier
) from v[1] gives you an iterator
of "knows" edges for which v[1] is the out-vertex. Btw. adjacents() here is
the same as from() / comeFrom() in previous emails.



> Take this basic Gremlin traversal:
>
> g.V(1).out('knows').values('name')
>
> I now believe this should compile to the following:
>
> [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
>

Polymorphism is cool. Your two-argument goto() appears to be my to(),
whereas your three-argument goto() appears to be my from(). The minimal
tweaks I would make to your syntax are:

[goto,V,1] [goto,out,knows] [goto,in] [goto,out,name][goto,in]

I might go a step further and say:

[const,1] [goto,id,person] [goto,out,knows] [goto,in]
[goto,out,name][goto,in]



> Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no
> local caching of data on these respective objects, then the bytecode isn’t
> rewritten and the following cascade of events occurs:
>
> [...]
>

Looks pretty good.



> Lets review the ComplexType adjacents()-method:
>
> complexType.adjacents(label,identifiers...)
>
> complexType must have sufficient information to represent the tail of the
> relation.
>

Yes; it needs to know what relation type you are matching, and on what field
(e.g. "out"/"in") in that relation. Note that the table-per-relation
approach is most appropriate when traversals are always strongly typed.
E.g. when your step is v.out("knows") as opposed to v.out(). For v.out() to
be supported efficiently, the monolithic table, or an element-to-type
table, makes sense.



> label specifies the relation type (we will always assume that a single
> String is sufficient)
>

Exactly. And yeah, I think it is safe to assume that types can be
identified by strings. Want namespaces? Use a qualified name syntax
appropriate for your application.



> identifiers... must contain sufficient information to identify the head of
> the relation.
>

Yes.


> The return of the method adjacents() is then the object(s) on the other
> side of the relation(s).
>

Yeah. We're taking an element and then iterating through all of the
incoming projections, of a given label, to that element. The label is the
name of the relation together with the name of the field/column.



> Now, given the way I have my data structure organized, I could beef up the
> MyXXX implementation such that MyStrategy rewrites the base bytecode to:
>
> [...]
> Now, I could really beef up MyStrategy when I realize that no path
> information is used in the traversal. Thus, the base bytecode compiles to:
>
> [my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE …
> lots of join equalities]
>

Something of the kind.



> [...]
> To recap.
>
> 1. There are primitives.
>

+1


> 2. There are Maps and Lists.
>

Sure. Lists of primitives, and maps of primitives to primitives.



> 3. There are ComplexTypes.
>

I like the fancy term "algebraic data types". They are just tuples in which
each field is either:
1) a primitive value (possibly tagged with a type), or
2) an element reference (possibly tagged with a type)

You also need a special "unit" type for optionals.
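
A rough Java sketch of that description, with all class names (Field, Primitive, ElementRef, Unit, Tuple) invented for illustration: a tuple's fields are either tagged primitives or tagged element references, and an optional field that has no value holds the single unit value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the field kinds listed above; all names are assumptions.
interface Field {}

// 1) a primitive value, optionally tagged with a type name
final class Primitive implements Field {
    final String typeTag; // may be null when untagged
    final Object value;

    Primitive(String typeTag, Object value) {
        this.typeTag = typeTag;
        this.value = value;
    }
}

// 2) a reference to another element, optionally tagged with a type name
final class ElementRef implements Field {
    final String typeTag; // may be null when untagged
    final Object id;      // only the reference travels, not the element itself

    ElementRef(String typeTag, Object id) {
        this.typeTag = typeTag;
        this.id = id;
    }
}

// The unit type has exactly one value; an empty optional field holds UNIT.
final class Unit implements Field {
    static final Unit UNIT = new Unit();

    private Unit() {}
}

final class Tuple {
    final String label;
    final Map<String, Field> fields = new LinkedHashMap<>();

    Tuple(String label) { this.label = label; }

    Tuple with(String name, Field field) {
        fields.put(name, field);
        return this;
    }

    // A field counts as present when it exists and is not the unit value.
    boolean isPresent(String name) {
        Field f = fields.get(name);
        return f != null && f != Unit.UNIT;
    }
}
```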



> 4. ComplexTypes are adjacent to other objects via relations.
> - These adjacent objects may be cached locally with the
> ComplexType instance.
> - These adjacent objects may require some database lookup.
> - Regardless, TP4 doesn’t care — its up to the provider’s
> ComplexType instance to decide how to resolve the adjacency.
>

+1


> 5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with
> appropriately provided toString() is all that leaves the TP4 VM.
>

As tuples, ComplexTypes / ADTs go over the wire. The values of their
primitive fields should probably go with them. However, the values of their
element / entity fields are just references; the attached element doesn't
go with them.



> Finally, to solve the asMap()/asList() problem, we simply have:
>
> asMap('name','age') => complexType.adjacents('asMap','name','age')
> asList() => complexType.adjacents('asList')
>

I think I need an example of asList(), but I agree that we can make
properties into key/value maps. If we want to access metaproperties, then
we don't use asMap().



It is up to the complexType to manifest a Map or List accordingly.
>
> I see this as basically a big flatmap system. ComplexTypes just map from
> self to any number of logical neighbors as specified by the relation.
>
> Am I getting it?,
>

Yeah, and I think I am getting how you break down traversals into basic
instructions. Go go GMachine.

Josh




> Marko.
>
> http://rredux.com <http://rredux.com/>

Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-24 Thread Joshua Shinavier
re not tied to the
JVM, and should be straightforward to implement in other frameworks.



>
> Thus, what is crucial to all this is the “shape of the data.” Using your
> pointers wisely so instructions produce useful results.
>

+1



> Does any of what I wrote update your comeFrom().goto() stuff?


Sadly, no, though I appreciate that you are coming from a slightly
different place w.r.t. properties, hypergraphs, and most importantly, the
role of a type system.



> If not, can you please explain to me why comeFrom() is cool — sorry for
> being dense (aka “being Kuppitz" — thats right, I said it. boom!).
>

Let's keep iterating until we reach a fixed point. Maybe Daniel's already
there.

Josh



>
> Thanks,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 23, 2019, at 10:25 AM, Joshua Shinavier 
> wrote:
> >
> > On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez 
> > wrote:
> >
> >> Hey Josh,
> >>
> >> This gets to the notion I presented in “The Fabled GMachine.”
> >>http://rredux.com/the-fabled-gmachine.html <
> >> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
> >> “Structures, Processes, and Languages” section)
> >>
> >> All that exists are memory addresses that contain either:
> >>
> >>1. A primitive
> >>2. A set of labeled references to other references or primitives.
> >>
> >> Using your work and the above, here is a super low-level ‘bytecode' for
> >> property graphs.
> >>
> >> v.goto("id") => 1
> >>
> >
> > LGTM. An id is special because it is uniquely identifying / is a primary
> > key for the element. However, it is also just a field of the element,
> like
> > "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id
> would
> > only really need to be unique among other elements of the same type. To
> the
> > above, I would add:
> >
> > v.type() => Person
> >
> > ...a special operation which takes you from an element to its type. This
> is
> > important if unions are supported; e.g. "name" in my example can apply
> > either to a Person or a Project.
> >
> >
> > v.goto("label") => person
> >>
> >
> > Or that. Like "id", "type"/"label" is special. You can think of it as a
> > field; it's just a different sort of field which will have the same value
> > for all elements of any given type.
> >
> >
> >
> >> v.goto("properties").goto("name") => "marko"
> >>
> >
> > OK, properties. Are properties built-in as a separate kind of thing from
> > edges, or can we treat them the same as vertices and edges here? I think
> we
> > can treat them the same. A property, in the algebraic model I described
> > above, is just an element with two fields, the second of which is a
> > primitive value. As I said, I think we need two distinct traversal
> > operations -- projection and selection -- and here is where we can use
> the
> > latter. Here, I will call it "comeFrom".
> >
> > v.comeFrom("name", "out").goto("in") => {"marko"}
> >
> > You can think of this comeFrom as a special case of a select() function
> > which takes a type -- "name" -- and a set of key/value pairs {("out",
> v)}.
> > It returns all matching elements of the given type. You then project to
> the
> > "in" value using your goto. I wrote {"marko"} as a set, because comeFrom
> > can give you multiple properties, depending on whether multi-properties
> are
> > supported.
> >
> > Note how similar this is to an edge traversal:
> >
> > v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
> >
> > Of course, you could define "properties" in such a way that a
> > goto("properties") does exactly this under the hood, but in terms of low
> > level instructions, you need something like comeFrom.
> >
> >
> > v.goto("properties").goto("name").goto(0) => "m"
> >>
> >
> > This is where the notion of optionals becomes handy. You can make
> > array/list indices into fields like this, but IMO you should also make
> them
> > safe. E.g. borrowing Haskell syntax for a moment:
> >
> > v.goto("properties").goto(&quo

Re: [Article] Pull vs. Push-Based Loop Fusion in Query Engines

2019-04-24 Thread Joshua Shinavier
Cool to see push vs. pull studied in depth. Often, we simply pick one style
and hope for the best. The cited paper on stream fusion is also
interesting, and delves into the nuance of lazy vs. strict evaluation over
streams.

Although I have not used Scala, it has previously occurred to me that it
might be a better fit than Java for a core TP4 reference API. I understand
that Scala supports both lazy and strict evaluation, and the more advanced
support for higher-kinded types and functional pattern matching would be an
advantage for encapsulating side-effects and working with algebraic data
types (see parallel thread).
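
For what it is worth, lazy evaluation with a limit can be illustrated in plain Java using only the standard java.util.stream API. This is not TP4 code, and the LazyLimit class is a made-up name: the source is unbounded, yet the pipeline stops as soon as the limit is satisfied.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: standard Java streams, not a TP4 processor.
class LazyLimit {
    // Counts how many source elements were inspected during the last run.
    static final AtomicInteger inspected = new AtomicInteger();

    static List<Integer> firstThreeEvens() {
        inspected.set(0);
        return Stream.iterate(1, i -> i + 1)            // unbounded source
                .peek(i -> inspected.incrementAndGet()) // observe each element consumed
                .filter(i -> i % 2 == 0)
                .limit(3)                               // short-circuits the pipeline
                .collect(Collectors.toList());
    }
}
```

A strictly evaluated pipeline over the same unbounded source would never terminate, which is the practical difference at stake.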

Josh


On Tue, Apr 23, 2019 at 9:31 AM Marko Rodriguez 
wrote:

> Hello,
>
> I just read this article:
>
> Push vs. Pull-Based Loop Fusion in Query Engines
> https://arxiv.org/abs/1610.09166
>
> It is a really good read if you are interested in TP4. Here are some notes
> I jotted down:
>
> 1. Pull-based engines are inefficient when there are lots of
> filters().
> - they require a while(predicate.test(next())) which
> introduces branch flow control and subsequent JVM performance issues.
> - push-based engines simply don’t emit() if the
> predicate.test() is false. Thus, no branching.
> 2. Pull-based engines are better at limit() based queries.
> - they only process what is necessary to satisfy the limit.
> - push-based engines will provide more results than needed
> given their eager evaluation strategy (backpressure comes into play).
> 3. We should introduce a "collection()" operator in TP4 for better
> expressivity with list and map manipulation and so we don’t have to use
> unfold()…fold().
> - [9,11,13].collection(incr().is(gt(10))) => [12,14]
> - the ability to chain functions in a collection
> manipulation sequence is crucial for performance as you don’t create
> intermediate collections.
> 4. Given that some bytecode is best on a push-based vs. a
> pull-based (and vice versa), we can strategize for this accordingly.
> - We have Pipes for pull-based.
> - We have RxJava for push-based.
> - We can even isolate sub-sections of a flow. For instance:
> g.V().has('age',gt(10)).out('knows').limit(10)
> ==>becomes
>
> g.V().has('age',gt(10)).local(out('knows').limit(10))
> - where the local(bytecode) (TP3-style) is
> executed by Pipes and the root bytecode by rxJava.
> 5. They have lots of good tips for writing JVM performant
> operators/steps/functions.
> - All their work is done in Scala.
>
> Enjoy!,
> Marko.
>
> http://rredux.com 
>
>
>
>
>


Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-23 Thread Joshua Shinavier
(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>


In the alternate universe:

g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
"marko").goto("out").comeFrom("knows", "out").goto("in").free()

I have wimped out on free() and just left it as you had it, but I think it
would be worthwhile to explore a monadic syntax for traversals with
side-effects. Different topic.

Now, all of this "out", "in" business is getting pretty repetitive, right?
Well, the field names become more diverse if we allow hyper-edges and
generalized ADTs. E.g. in my Trip example, say I want to know all drop-off
locations for a given rider:

u.comeFrom("Trip", "rider").goto("dropoff").goto("place")

Done.



> If we can get things that “low-level” and still efficient to compile, then
> we can model every data structure. All you are doing is pointer chasing
> through a withStructure() data structure.
>

Agreed.


No one would ever want to write strategies for goto()-based Bytecode.


Also agreed.



> Thus, perhaps there could be a PropertyGraphDecorationStrategy that does:
>
> [...]


No argument here, though the alternate-universe "bytecode" would look
slightly different. And the high-level syntax should also be able to deal
with generalized relations / data types gracefully. As a thought
experiment, suppose we were to define the steps to() as your goto(), and
from() as my comeFrom(). Then traversals like:

u.from("Trip", "rider").to("dropoff").to("time")

...look pretty good as-is, and are not too low-level. However, ordinary
edge traversals like:

v.from("knows", "out").to("in")

...do look a little Assembly-like. So in/out/both etc. remain as they are,
but are shorthand for from() and to() steps using "out" or "in":

v.out("knows") === v.outE("knows").inV() === v.from("knows", "out").to("in")


[I AM NOW GOING OFF THE RAILS]
> [snip]
>

Sure. Again, I like the idea of wrapping side-effects in monads. What would
that look like in a Gremlinesque fluent syntax? I don't quite know, but if
we think of the dot as a monadic bind operation like Haskell's >>=, then
perhaps the monadic expressions look pretty similar to what you have just
sketched out. Might have to be careful about what it means to nest
operations as in your addEdge examples.



[I AM NOW BACK ON THE RAILS]
>
> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
> something to property graph providers and they can do more intelligent
> stuff than what MongoDB would do with such information. However, someone,
> of course, can create a MongoDBPropertyGraphStrategy that would make
> documents look like vertices and edges and then use O(log(n)) lookups on
> ids to walk the graph. However, if that didn’t exist, it would still do
> something that works even if its horribly inefficient as every database can
> make primitives with references between them!
>

I'm on the same pair of rails.



> Anywho @Josh, I believe goto() is what you are doing with multi-references
> off an object. How do we make it all clean, easy, and universal?
>

Let me know what you think of the above.

Josh



>
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 22, 2019, at 6:42 PM, Joshua Shinavier  wrote:
> >
> > Ah, glad you asked. It's all in the pictures. I have nowhere to put them
> online at the moment... maybe this attachment will go through to the list?
> >
> > Btw. David Spivak gave his talk today at Uber; it was great. Juan
> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
> remotely. Really interesting discussion about databases vs. graphs, and
> what category theory brings to the table.
> >
> >
> > On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez  <mailto:okramma...@gmail.com>> wrote:
> > Hey Josh,
> >
> > I’m digging what you are saying, but the pictures didn’t come through
> for me ? … Can you provide them again (or if dev@ is filtering them, can
> you give me URLs to them)?
> >
> > Thanks,
> > Marko.
> >
> >
> > > On Apr 21, 2019, at 12:58 PM, Joshua Shinavier  <mailto:j...@fortytwo.net>> wrote:
> > >
> > > On the subject of "reified joins", maybe be a picture will be worth a
> few words. As I said in the thread <
> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ <
> https://groups.google.com/d/msg/grem

Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-21 Thread Joshua Shinavier
On the subject of "reified joins", maybe a picture will be worth a few
words. As I said in the thread on property graph standardization, if you
think of vertex labels, edge labels,
and property keys as types, each with projections to two other types, there
is a nice analogy with relations of two columns, and this analogy can be
easily extended to hyper-edges. Here is what the schema of the TinkerPop
classic graph looks like if you make each type (e.g. Person, Project,
knows, name) into a relation:

[image: image.png]


I have made the vertex types salmon-colored, the edge types yellow, the
property types green, and the data types blue. The "o" and "I" columns
represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
property value type of String) of each relation. More than one arrow from
a column represents a coproduct, e.g. the out-type of "name" is Person OR
Project. Now you can think of out() and in() as joins of two tables on a
primary and foreign key.
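
To make the join reading concrete, here is a minimal sketch in Python. The
rows, the ids, and the helper names join/out/in_ are assumptions made up for
illustration; they are not TinkerPop APIs. The "o" and "i" columns play the
roles described above.

```python
# Hypothetical rows; ids loosely follow the TinkerPop "classic" toy graph.
person = [
    {"id": 1, "name": "marko"},
    {"id": 2, "name": "vadas"},
    {"id": 4, "name": "josh"},
]
# The "knows" relation: "o" and "i" are foreign keys into person["id"].
knows = [
    {"id": 7, "o": 1, "i": 2},
    {"id": 8, "o": 1, "i": 4},
]


def join(left, right, left_key, right_key):
    """Nested-loop equi-join; returns (left_row, right_row) pairs."""
    return [(l, r) for l in left for r in right if l[left_key] == r[right_key]]


def out(vertices, edges, targets):
    """out(): vertices.id = edges.o, then edges.i = targets.id."""
    matched_edges = [e for _, e in join(vertices, edges, "id", "o")]
    return [t for _, t in join(matched_edges, targets, "i", "id")]


def in_(vertices, edges, sources):
    """in(): vertices.id = edges.i, then edges.o = sources.id."""
    matched_edges = [e for _, e in join(vertices, edges, "id", "i")]
    return [s for _, s in join(matched_edges, sources, "o", "id")]
```

Starting from the single row for marko, out() walks two equi-joins and lands
on the rows for vadas and josh; in_() runs the same joins with the key
columns swapped.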

We are not limited to "out" and "in", however. Here is the ternary
relationship (hyper-edge) from the hyper-edge slide of my Graph Day preso,
which has three columns/roles/projections:

[image: image.png]


I have drawn Says in light blue to indicate that it is a generalized
element; it has projections other than "out" and "in". Now the line between
relations and edges begins to blur. E.g. in the following, is PlaceEvent a
vertex or a property?

[image: image.png]


With the right type system, we can just speak of graph elements, and use
"vertex", "edge", "property" when it is convenient. In the relational
model, they are relations. If you materialize them in a relational
database, they are rows. In any case, you need two basic graph traversal
operations:

   - project() -- forward traversal of the arrows in the above diagrams.
   Takes you from an element to a component like in-vertex.
   - select() -- reverse traversal of the arrows. Allows you to answer
   questions like "in which Trips is John Doe the rider?"
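
As a rough sketch of these two operations, assuming elements are
materialized as rows: the Trip relation, its role names, and the data below
are hypothetical, and project()/select() here are plain Python functions,
not the existing Gremlin steps of the same name.

```python
# Hypothetical Trip relation with three roles (projections).
trips = [
    {"id": "t1", "driver": "alice", "rider": "john_doe", "vehicle": "car1"},
    {"id": "t2", "driver": "bob", "rider": "jane_roe", "vehicle": "car2"},
    {"id": "t3", "driver": "alice", "rider": "john_doe", "vehicle": "car3"},
]


def project(element, role):
    """Forward traversal: follow one arrow from an element to a component."""
    return element[role]


def select(relation, role, value):
    """Reverse traversal: find the elements whose given role points at value."""
    return [e for e in relation if e[role] == value]


# "In which Trips is John Doe the rider?"
john_trips = select(trips, "rider", "john_doe")
```

Without an index, select() is a scan over the relation; a provider would
presumably back it with an index on the role column.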


Josh


On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez 
wrote:

> Hello,
>
> I agree with everything you say. Here is my question:
>
> Relational database — join: Table x Table x equality function ->
> Table
> Graph database — traverser: Vertex x edge label -> Vertex
>
> I want a single function that does both. The only thing I could think of
> was to represent traverser() in terms of join():
>
> Graph database — traverser: Vertices x Vertex x equality function
> -> Vertices
>
> For example,
>
> V().out(‘address’)
>
> ==>
>
> g.join(V().hasLabel(‘person’).as(‘a’),
>        V().hasLabel(‘addresses’).as(‘b’)).
>   by(‘name’).select(?address vertex?)
>
> That is, join the vertices with themselves based on some predicate to go
> from vertices to vertices.
>
> However, I would like instead to transform the relational database join()
> concept into a traverser() concept. Kuppitz and I were talking the other
> day about a link() type operator that says: “try and link to this thing in
> some specified way.” .. ?? The problem we ran into is again, “link it to
> what?”
>
> - in graph, the ‘to what’ is hardcoded so you don’t need to
> specify anything.
> - in rdbms, the ’to what’ is some other specified table.
>
> So what does the link() operator look like?
>
> ——
>
> Some other random thoughts….
>
> Relational databases join on the table (the whole collection)
> Graph databases traverse on the vertex (an element of the whole
> collection)
>
> We can make a relational database join on a single row (by providing a
> filter to a particular primary key). This is the same as a table with one
> row. Likewise, for graph in the join() context above:
>
> V(1).out(‘address’)
>
> ==>
>
> g.join(V(1).as(‘a’),
>        V().hasLabel(‘addresses’).as(‘b’)).
>   by(‘name’).select(?address vertex?)
>
> More thoughts please….
>
> Marko.
>
> http://rredux.com 
>
>
>
>
> > On Apr 19, 2019, at 4:20 AM, pieter martin 
> wrote:
> >
> > Hi,
> > The way I saw it is that the big difference is that graphs have
> > reified joins. This is both a blessing and a curse.
> > A blessing because it's much easier (less text to type, fewer mistakes,
> > clearer semantics...) to traverse an edge than to construct a manual
> > join. A curse because there are almost always far more ways to traverse
> > a data set than just by the edges some architect might have considered
> > when creating the data set. Often the architect is not the domain
> > expert and the edges are a hardcoded layout of the dataset, which
> > almost certainly won't survive the real world's demands. In graphs, if
> > there are no edges then the data is not reachable, except via indexed
> > lookups. This is the standard engineer

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

2019-04-16 Thread Joshua Shinavier
On Mon, Apr 15, 2019 at 12:07 PM Marko Rodriguez 
wrote:

> [...]
> TinkerPop4 will have VM implementations on various language-platforms. For
> sure, Apache’s distribution will have a JVM and .NET implementation. The
> purpose of TinkerPop-specific types (and not JVM, Mono, Python, etc. types)
> is that we know it’s the same type across all VMs.
>

I agree it is important to define a standard set of scalar types. They can
probably be counted on one hand, or at most two -- at Uber, we use bytes
and byte arrays, character strings, floats (varying precision), and
integers (varying precision and signedness) as basic
types. My point is that you may not need special, TinkerPop-specific
wrapper classes for the scalar types; it is enough to define a mapping.
E.g. Integer is a suitable implementation, on the JVM (dunno what the .NET
equivalent is), for a standard 32-bit signed integer type, but a TInteger
wouldn't hurt.
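
A minimal sketch of "it is enough to define a mapping", written in Python
rather than for the JVM. The logical type names, the mapping table, and the
conforms() helper are all assumptions for illustration; nothing here is an
existing TinkerPop interface.

```python
# Hypothetical mapping from logical scalar types to native runtime types.
SCALAR_TYPES = {
    "boolean": bool,
    "int32": int,
    "int64": int,
    "float64": float,
    "string": str,
    "bytes": bytes,
}

# Python ints are unbounded, so fixed-width integers need a range constraint.
INT_RANGES = {
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def conforms(value, type_name):
    """Check a native value against a logical scalar type, with no wrapper class."""
    native = SCALAR_TYPES[type_name]
    # bool is a subclass of int in Python, so exclude it from integer types.
    if native is int and isinstance(value, bool):
        return False
    if not isinstance(value, native):
        return False
    if type_name in INT_RANGES:
        low, high = INT_RANGES[type_name]
        return low <= value <= high
    return True
```

The native value is used as-is; the logical type lives only in the mapping
and in the check, which is the alternative to a wrapper like TInteger.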



> > To my mind, your approach is headed in the direction of a
> > TinkerPop-specific notion of a *type*, in general, which captures the
> > structure and constraints of a logical data type
> > <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>,
> > and which can be used for query planning and optimization. These include
> > both scalar types as well as vertex, edge, and property types, as well as
> > more generic constructs such as optionals, lists, records.
>
> Yes — I’d like to be able to use some type of formal data type
> specification. You have those skills. I don’t. My rudimentary
> (non-categorical) representation is just “common useful data structures” —
> map, list, bool, string, etc.
>


I think we can formalize an appropriate general-purpose data model along
the lines I have motivated previously, with property graphs as a special
case. You are on the thread with Ryan, where we are trying to connect the
intuitive model with CQL. This would provide some nice guarantees of
tractability, and I think the relationship of the model with runtime types
ought to be straightforward; they are basically just tuples with references
-- pairs, lists, etc.



> A TList only supports primitives. However, a TRDFList could be a complex
> type for dealing with RDF lists and would be contained with the TP4-VM.
> Adding complex types is okay — it doesn’t break anything.
>

Agree, and don't care too much about the names of the runtime types.



> Hm. Yea, I’m not too strong with hypergraph thinking.
>
> g.V(1) // vertex
> g.V(1).outE(‘family’)  // hyperedges
> g.V(1).outE(‘family’).inV(‘father’) // ? perhaps inV/outV/bothV
> can take a String… label?
>
> We should talk to the GRAKN.AI guys and see what they think.
> https://grakn.ai/ 
> https://dev.grakn.ai/docs/general/quickstart <
> https://dev.grakn.ai/docs/general/quickstart>
>


Yes, I am a fan of GRAKN.AI's data model, and I think TinkerPop's structure
APIs ought to be expressive enough to interface with it. The "projections"
I have talked about here and elsewhere are "roles" in GRAKN, which relaxes
the property graph constraint from two projections/roles per relationship
to any number. GRAKN's relationships are hyper-edges in that sense, and
also in the colloquial sense of "edges to/from edges", i.e. allowing
projection between relationship types.



> Yes. I want to make sure we naturally/natively support property graphs, RDF
> graphs, hypergraphs, tables, documents, etc. Property graphs (as specified
> by Neo4j) are not “special” in TP4. Like Gremlin for languages, property
> graphs sit side-by-side w/ other data structures. If we do this right, we
> will be heroes!
>

+1


Josh


Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

2019-04-15 Thread Joshua Shinavier
Hi Marko,

I think this does satisfy your requirements, though I don't think I
understand all aspects of the approach, especially the need for
TinkerPop-specific types *for basic scalar values* like booleans, strings,
and numbers. Since we are committed to the native data types supported by
the JVM, I think it is OK to use a subset of them as the basis for a
TinkerPop type system. E.g. while a formal type system might define "long"
as a signed 64-bit integer, the Long class is an appropriate
implementation; while it doesn't hurt to wrap Long in a TinkerPop-specific
TLong class, I am not sure it is necessary. Maybe there is more to your
get(), or other methods you would like to attach to these types, than I see.

To my mind, your approach is headed in the direction of a
TinkerPop-specific notion of a *type*, in general, which captures the
structure and constraints of a logical data type
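<https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>,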
and which can be used for query planning and optimization. These include
both scalar types as well as vertex, edge, and property types, as well as
more generic constructs such as optionals, lists, records.

Miscellaneous thoughts:

Can a TList really only contain primitives? A list of vertices or edges
would definitely be unusual, and typical PG implementations may not choose
to support them, but a language-agnostic VM possibly should. They would
nicely capture RDF lists, in which list nodes typically do not have any
properties (edges) other than rdf:first and rdf:rest.

For hypergraphs, an inV and outV which may produce more than one vertex is
one way to go, but a labeled hypergraph should really have other projections
in addition to inV, outV. That suggests a more generic step than inV or
outV, which takes as an argument the name of the projection as well as the
in/out element. E.g. project("in", v1), project("out", v1),
project("subject", v1).

For undirected graphs, we might as well just allow both in() and out()
rather than throwing exceptions. You can think of an undirected edge as a
pair of directed edges.
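
A small sketch of that reading, assuming edges are plain (source, target)
pairs; the function names are hypothetical and self-loops are not handled
specially (a self-loop would simply appear twice).

```python
def expand_undirected(edges):
    """Turn each undirected edge {a, b} into the directed pair (a, b), (b, a)."""
    directed = []
    for a, b in edges:
        directed.append((a, b))
        directed.append((b, a))
    return directed


def out(vertex, directed):
    """Targets of directed edges leaving vertex."""
    return [t for s, t in directed if s == vertex]


def in_(vertex, directed):
    """Sources of directed edges arriving at vertex."""
    return [s for s, t in directed if t == vertex]


undirected = [(1, 2), (2, 3)]
directed = expand_undirected(undirected)
```

With this expansion, out() and in_() return the same neighbors for every
vertex, so neither needs to throw on an undirected graph.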

Agreed that provider-specific structures (types) are OK, and should not be
discouraged. Not only do different providers have their own data models,
but specific applications have their own schemas. A structure like a
metaproperty may be allowed in certain contexts and not others, and the
same goes for instances of conventional structures like edges of a certain
label.

For multi-properties, there is a distinction to be made between multiple
properties with the same key and element, and single collection-valued
properties. This is something the PG Working Group has been grappling with.
I think both should be allowed.
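
To illustrate the distinction, here is a sketch that treats properties as
rows of a relation. The row layout, the sample data, and the values() helper
are assumptions for illustration only.

```python
# Multi-property: two property rows share the same element and key.
multi = [
    {"element": "v1", "key": "email", "value": "a@example.org"},
    {"element": "v1", "key": "email", "value": "b@example.org"},
]

# Collection-valued property: a single row whose value is a list.
collection = [
    {"element": "v1", "key": "email",
     "value": ["a@example.org", "b@example.org"]},
]


def values(properties, element, key):
    """Return the value of every property row matching element and key."""
    return [p["value"] for p in properties
            if p["element"] == element and p["key"] == key]
```

The first form yields two values, each of which could carry its own
metaproperties; the second yields one value that happens to be a list.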

IMO it's OK if URIs, in an RDF context, become Strings in a TP context. You
can think of URI as a constraint on String, which should be enforced at the
appropriate time, but does not require a vendor-specific class. Can you
concatenate two URIs? Sure... just concatenate the Strings, but also be
aware that the result is not necessarily a URI.

Josh



On Mon, Apr 15, 2019 at 5:06 AM Marko Rodriguez 
wrote:

> Hello,
>
> I have a consolidated approach to handling data structures in TP4. I would
> appreciate any feedback you may have.
>
> 1. Every object processed by TinkerPop has a TinkerPop-specific type.
>    - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath, TList, …
>    - BENEFIT #1: A universal type system will protect us from language
>      platform peculiarities (e.g. Python long vs Java long).
>    - BENEFIT #2: The serialization format is constrained and consistent
>      across all language platforms (no more coming across a
>      MySpecialClass).
> 2. All primitive T-type data can be directly accessed via get().
>    - TBoolean.get() -> java.lang.Boolean | System.Boolean | ...
>    - TLong.get() -> java.lang.Long | System.Int64 | ...
>    - TString.get() -> java.lang.String | System.String | …
>    - TList.get() -> java.util.ArrayList | .. // can only contain primitives
>    - TMap.get() -> java.util.LinkedHashMap | .. // can only contain primitives
>    - ...
> 3. All complex T-types have no methods! (except those afforded by Object)
>    - TVertex: no accessible methods.
>    - TEdge: no accessible methods.
>    - TRow: no accessible methods.
>    - TDocument: no accessible methods.
>    - TDocumentArray: no accessible methods. // a document list field that
>      can contain complex objects
>    - ...
>
> REQUIREMENT #1: We need to be able to support multiple graphdbs in the
> same query.
> -

Re: [DISCUSS] Name of 3.4.x

2018-04-20 Thread Joshua Shinavier
Awesome :-)

On Fri, Apr 20, 2018 at 7:11 AM, Stephen Mallette 
wrote:

> Glad that is settled - this looks so much better now:
>
> https://github.com/apache/tinkerpop/blob/master/CHANGELOG.asciidoc#tinkerpop-340-avant-gremlin-construction-3-for-theremin-and-flowers
>
> thanks again josh!
>
> On Thu, Apr 19, 2018 at 1:03 PM, Joshua Shinavier 
> wrote:
>
> > Great! Stephen, I'll submit the PR per your suggestions later today.
> >
> > Josh
> >
> >
> > On Thu, Apr 19, 2018 at 5:36 AM, Robert Dale  wrote:
> >
> > > +1 to second that motion
> > >
> > > Robert Dale
> > >
> > > On Thu, Apr 19, 2018 at 8:03 AM, Stephen Mallette <
> spmalle...@gmail.com>
> > > wrote:
> > >
> > > > Hey Josh,
> > > >
> > > > I really like the other one with all that abstraction going on in the
> > > > background. In that case the antenna doesn't look too long. I think
> > that
> > > if
> > > > you could provide two avant-gremlin logos:
> > > >
> > > > 1. the original with background/flowers (avant-gremlin.png)
> > > > 2. one with just gremlin/theramin/headphones
> (avant-gremlin-noback.png)
> > > >
> > > > that would work nicely. no one seems to be objecting (or submitting
> > > > anything competing) so I think we've found our name and logo for the
> > > 3.4.x
> > > > line - thanks Josh!
> > > >
> > > > would you like to just submit this as a pull request against master?
> > The
> > > > images (and the graffle if you like) would be added here:
> > > >
> > > > https://github.com/apache/tinkerpop/tree/master/docs/static/images
> > > >
> > > > Then, just update these spots with the name/logo (use
> avant-gremlin.png
> > > > please):
> > > >
> > > > > https://github.com/apache/tinkerpop/blob/master/CHANGELOG.asciidoc#tinkerpop-340-not-named-yet
> > > > > https://github.com/apache/tinkerpop/blob/master/docs/src/upgrade/release-3.4.x.asciidoc
> > > >
> > > > Finally, i think that you should use full name in all its glory:
> > > > "Avant-Gremlin
> > > > Construction #3 for Theremin and Flowers" . If you don't feel like a
> > pull
> > > > request, you can just make the files available here and I can handle
> > > > getting them into the repo. Sound good?
> > > >
> > > >
> > > >
> > > > On Wed, Apr 18, 2018 at 10:52 AM, Joshua Shinavier <
> j...@fortytwo.net>
> > > > wrote:
> > > >
> > > > > Here's a version with only the flowers as the background:
> > > > >
> > > > > http://fortytwo.net/share/edRX5gqbVHT2ZbJx/avant-
> > > gremlin-noback.png
> > > > >
> > > > > I was actually wondering if the Gremlin's theremin antenna makes
> the
> > > logo
> > > > > too tall, but I'm sure he wouldn't mind pushing it in a little if
> > need
> > > > be.
> > > > >
> > > > > Josh
> > > > >
> > > > >
> > > > > On Wed, Apr 18, 2018 at 7:04 AM, Stephen Mallette <
> > > spmalle...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > haha - i seriously like all aspects of this.
> > > > > >
> > > > > > minor nit, but maybe not at issue - our gremlin logos have never
> > had
> > > a
> > > > > > background to them. i think the Gremlin with accompanying
> theramin
> > > and
> > > > > > headphones might fit the part for the logo on its own, but i
> really
> > > > like
> > > > > it
> > > > > > with the background. not sure if we need to stay consistent with
> > the
> > > > "no
> > > > > > background" thing or not.
> > > > > >
> > > > > > anyway +1 for this name/logo - thanks
> > > > > >
> > > > > > On Wed, Apr 18, 2018 at 9:56 AM, Joshua Shinavier <
> > j...@fortytwo.net
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Stephen,
> > > > > > >
> > > > > > > I was afraid of that. Here:
>

Re: [DISCUSS] Name of 3.4.x

2018-04-19 Thread Joshua Shinavier
Great! Stephen, I'll submit the PR per your suggestions later today.

Josh


On Thu, Apr 19, 2018 at 5:36 AM, Robert Dale  wrote:

> +1 to second that motion
>
> Robert Dale
>
> On Thu, Apr 19, 2018 at 8:03 AM, Stephen Mallette 
> wrote:
>
> > Hey Josh,
> >
> > I really like the other one with all that abstraction going on in the
> > background. In that case the antenna doesn't look too long. I think that
> if
> > you could provide two avant-gremlin logos:
> >
> > 1. the original with background/flowers (avant-gremlin.png)
> > 2. one with just gremlin/theramin/headphones (avant-gremlin-noback.png)
> >
> > that would work nicely. no one seems to be objecting (or submitting
> > anything competing) so I think we've found our name and logo for the
> 3.4.x
> > line - thanks Josh!
> >
> > would you like to just submit this as a pull request against master? The
> > images (and the graffle if you like) would be added here:
> >
> > https://github.com/apache/tinkerpop/tree/master/docs/static/images
> >
> > Then, just update these spots with the name/logo (use avant-gremlin.png
> > please):
> >
> > https://github.com/apache/tinkerpop/blob/master/CHANGELOG.asciidoc#tinkerpop-340-not-named-yet
> > https://github.com/apache/tinkerpop/blob/master/docs/src/upgrade/release-3.4.x.asciidoc
> >
> > Finally, i think that you should use full name in all its glory:
> > "Avant-Gremlin
> > Construction #3 for Theremin and Flowers" . If you don't feel like a pull
> > request, you can just make the files available here and I can handle
> > getting them into the repo. Sound good?
> >
> >
> >
> > On Wed, Apr 18, 2018 at 10:52 AM, Joshua Shinavier 
> > wrote:
> >
> > > Here's a version with only the flowers as the background:
> > >
> > > http://fortytwo.net/share/edRX5gqbVHT2ZbJx/avant-
> gremlin-noback.png
> > >
> > > I was actually wondering if the Gremlin's theremin antenna makes the
> logo
> > > too tall, but I'm sure he wouldn't mind pushing it in a little if need
> > be.
> > >
> > > Josh
> > >
> > >
> > > On Wed, Apr 18, 2018 at 7:04 AM, Stephen Mallette <
> spmalle...@gmail.com>
> > > wrote:
> > >
> > > > haha - i seriously like all aspects of this.
> > > >
> > > > minor nit, but maybe not at issue - our gremlin logos have never had
> a
> > > > background to them. i think the Gremlin with accompanying theramin
> and
> > > > headphones might fit the part for the logo on its own, but i really
> > like
> > > it
> > > > with the background. not sure if we need to stay consistent with the
> > "no
> > > > background" thing or not.
> > > >
> > > > anyway +1 for this name/logo - thanks
> > > >
> > > > On Wed, Apr 18, 2018 at 9:56 AM, Joshua Shinavier  >
> > > > wrote:
> > > >
> > > > > Hi Stephen,
> > > > >
> > > > > I was afraid of that. Here:
> > > > >
> > > > > http://fortytwo.net/share/edRX5gqbVHT2ZbJx/avant-gremlin.png
> > > > >
> > > > > For a number, maybe:
> > > > >
> > > > > "Avant-Gremlin Construction #3"
> > > > >
> > > > > aka
> > > > >
> > > > > "Avant-Gremlin Construction #3 for Theremin and Flowers"
> > > > >
> > > > >
> > > > > On Wed, Apr 18, 2018 at 4:40 AM, Stephen Mallette <
> > > spmalle...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi, josh - I think the dev list doesn't do a great job with
> > > attachments
> > > > > so
> > > > > > I think it got stripped. Note that the naming should include a
> > > "number"
> > > > > but
> > > > > > also be related to music, hence our naming thus far:
> > > > > >
> > > > > > * Gremlin Symphony #40 in G Minor (3.3.x)
> > > > > > * Nine Inch Gremlins (3.2.x)
> > > > > > * A 187 On The Undercover Gremlinz (3.1.x)
> > > > > > * A Gremlin Rāga in 7/16 Time (3.0.x)
> > > > > >
> > > > > > > Maybe your avant-gremlin could still fit that naming pattern?
> > > > > >
>

Re: [DISCUSS] Name of 3.4.x

2018-04-18 Thread Joshua Shinavier
Here's a version with only the flowers as the background:

http://fortytwo.net/share/edRX5gqbVHT2ZbJx/avant-gremlin-noback.png

I was actually wondering if the Gremlin's theremin antenna makes the logo
too tall, but I'm sure he wouldn't mind pushing it in a little if need be.

Josh


On Wed, Apr 18, 2018 at 7:04 AM, Stephen Mallette 
wrote:

> haha - i seriously like all aspects of this.
>
> minor nit, but maybe not at issue - our gremlin logos have never had a
> background to them. i think the Gremlin with accompanying theramin and
> headphones might fit the part for the logo on its own, but i really like it
> with the background. not sure if we need to stay consistent with the "no
> background" thing or not.
>
> anyway +1 for this name/logo - thanks
>
> On Wed, Apr 18, 2018 at 9:56 AM, Joshua Shinavier 
> wrote:
>
> > Hi Stephen,
> >
> > I was afraid of that. Here:
> >
> > http://fortytwo.net/share/edRX5gqbVHT2ZbJx/avant-gremlin.png
> >
> > For a number, maybe:
> >
> > "Avant-Gremlin Construction #3"
> >
> > aka
> >
> > "Avant-Gremlin Construction #3 for Theremin and Flowers"
> >
> >
> > On Wed, Apr 18, 2018 at 4:40 AM, Stephen Mallette 
> > wrote:
> >
> > > Hi, josh - I think the dev list doesn't do a great job with attachments
> > so
> > > I think it got stripped. Note that the naming should include a "number"
> > but
> > > also be related to music, hence our naming thus far:
> > >
> > > * Gremlin Symphony #40 in G Minor (3.3.x)
> > > * Nine Inch Gremlins (3.2.x)
> > > * A 187 On The Undercover Gremlinz (3.1.x)
> > > * A Gremlin Rāga in 7/16 Time (3.0.x)
> > >
> > > Maybe your avant-gremlin could still fit that naming pattern?
> > >
> > >
> > >
> > > On Tue, Apr 17, 2018 at 9:30 PM, Joshua Shinavier 
> > > wrote:
> > >
> > > > May I suggest:
> > > >
> > > > Avant-gremlin (image attached).
> > > >
> > > > Keepin' the OmniGraffle tradition alive. Didn't really think about a
> > > > number, but "forty two" comes to mind.
> > > >
> > > > Josh
> > > >
> > > >
> > > >
> > > > On Tue, Apr 17, 2018 at 5:35 AM, Stephen Mallette <
> > spmalle...@gmail.com>
> > > > wrote:
> > > >
> > > >> We still need a name and logo for 3.4.x - we have plenty of decent
> > > names,
> > > >> but no one threw up logo ideas. Can't really go with a name without
> a
> > > logo
> > > >> - we need both. Is anyone planning on following up their suggestion
> > > with a
> > > >> logo? If not, I'll just do mine and we can get this settled up.
> > Thanks.
> > > >>
> > > >> On Mon, Mar 5, 2018 at 4:25 PM, Stephen Mallette <
> > spmalle...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > it just needs to be "music related" and "have something numeric"
> > about
> > > >> it.
> > > >> >
> > > >> > On Mon, Mar 5, 2018 at 4:20 PM, David Brown 
> > > >> wrote:
> > > >> >
> > > >> >> Does it need to have a 3 or 4 in it?
> > > >> >>
> > > >> >> - 3 Little Gremlins (Marley)
> > > >> >> - 4 Gremlins of the Apocalypse (Kind of like Clash)
> > > >> >>
> > > >> >> Idk I do like the Gambler...
> > > >> >>
> > > >> >>
> > > >> >> On Mon, Mar 5, 2018 at 12:53 PM Stephen Mallette <
> > > spmalle...@gmail.com
> > > >> >
> > > >> >> wrote:
> > > >> >>
> > > >> >> > So...we have a few good suggestions in hand (and I added one):
> > > >> >> >
> > > >> >> > 1. 4 Seasons of Gremlin - unclear if we're talking Boyz II Men
> or
> > > >> >> Vivaldi
> > > >> >> > here...
> > > >> >> > 2. Four Rusted Gremlins - Marilyn Manson
> > > >> >> > 3. Three Times a Gremlin - from the Commodores, though I think
> > > >> Gremlin
> > > >> >> > would prefer the Kenny Rogers styling of it - I mean, look at
&g

Re: [DISCUSS] Name of 3.4.x

2018-04-18 Thread Joshua Shinavier
Hi Stephen,

I was afraid of that. Here:

http://fortytwo.net/share/edRX5gqbVHT2ZbJx/avant-gremlin.png

For a number, maybe:

"Avant-Gremlin Construction #3"

aka

"Avant-Gremlin Construction #3 for Theremin and Flowers"


On Wed, Apr 18, 2018 at 4:40 AM, Stephen Mallette 
wrote:

> Hi, josh - I think the dev list doesn't do a great job with attachments so
> I think it got stripped. Note that the naming should include a "number" but
> also be related to music, hence our naming thus far:
>
> * Gremlin Symphony #40 in G Minor (3.3.x)
> * Nine Inch Gremlins (3.2.x)
> * A 187 On The Undercover Gremlinz (3.1.x)
> * A Gremlin Rāga in 7/16 Time (3.0.x)
>
> Maybe your avant-gremlin could still fit that naming pattern?
>
>
>
> On Tue, Apr 17, 2018 at 9:30 PM, Joshua Shinavier 
> wrote:
>
> > May I suggest:
> >
> > Avant-gremlin (image attached).
> >
> > Keepin' the OmniGraffle tradition alive. Didn't really think about a
> > number, but "forty two" comes to mind.
> >
> > Josh
> >
> >
> >
> > On Tue, Apr 17, 2018 at 5:35 AM, Stephen Mallette 
> > wrote:
> >
> >> We still need a name and logo for 3.4.x - we have plenty of decent
> names,
> >> but no one threw up logo ideas. Can't really go with a name without a
> logo
> >> - we need both. Is anyone planning on following up their suggestion
> with a
> >> logo? If not, I'll just do mine and we can get this settled up. Thanks.
> >>
> >> On Mon, Mar 5, 2018 at 4:25 PM, Stephen Mallette 
> >> wrote:
> >>
> >> > it just needs to be "music related" and "have something numeric" about
> >> it.
> >> >
> >> > On Mon, Mar 5, 2018 at 4:20 PM, David Brown 
> >> wrote:
> >> >
> >> >> Does it need to have a 3 or 4 in it?
> >> >>
> >> >> - 3 Little Gremlins (Marley)
> >> >> - 4 Gremlins of the Apocalypse (Kind of like Clash)
> >> >>
> >> >> Idk I do like the Gambler...
> >> >>
> >> >>
> >> >> On Mon, Mar 5, 2018 at 12:53 PM Stephen Mallette <
> spmalle...@gmail.com
> >> >
> >> >> wrote:
> >> >>
> >> >> > So...we have a few good suggestions in hand (and I added one):
> >> >> >
> >> >> > 1. 4 Seasons of Gremlin - unclear if we're talking Boyz II Men or
> >> >> Vivaldi
> >> >> > here...
> >> >> > 2. Four Rusted Gremlins - Marilyn Manson
> >> >> > 3. Three Times a Gremlin - from the Commodores, though I think
> >> Gremlin
> >> >> > would prefer the Kenny Rogers styling of it - I mean, look at this
> >> >> > https://www.youtube.com/watch?v=Ok7becfXoJE :D
> >> >> >
> >> >> > Sorry Jason, gotta disqualify your suggestion as it doesn't have a
> >> >> "number
> >> >> > reference" in it.
> >> >> >
> >> >> > Anyone have any other ideas?
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Mar 1, 2018 at 8:33 AM, Jason Plurad 
> >> wrote:
> >> >> >
> >> >> > > Boyz II Men - ha, I hadn't considered that angle. I assumed the 4
> >> >> Seasons
> >> >> > > was a Vivaldi reference.
> >> >> > >
> >> >> > > How about David Bowie? The Gremlin Who Sold the World
> >> >> > > On Thu, Mar 1, 2018 at 8:06 AM Stephen Mallette <
> >> spmalle...@gmail.com
> >> >> >
> >> >> > > wrote:
> >> >> > >
> >> >> > > > I'm guessing at the reference for "4 Seasons of Gremlin" - are
> we
> >> >> > talking
> >> >> > > > Boyz || Men and "4 Seasons of Loneliness"?
> >> >> > > >
> >> >> > > > >  Yeah, I'd like to see what Gremlyn Manson would look like!
> >> >> > > >
> >> >> > > > would he look that much different than our "Nine Inch Nails
> >> >> Gremlin"?
> >> >> > > >
> >> >> > > >
>
