Re: [VOTE] TinkerPop 3.4.9 Release

2020-12-09 Thread Divij Vaidya
I found this issue during sanity testing:
https://issues.apache.org/jira/browse/TINKERPOP-2489
It also exists in older releases (tested with 3.4.8), so I don't consider
it a release blocker.

Other than this, I ran some basic sanity checks using console -> server
interaction, which looked good.

VOTE +1

--
Divij Vaidya



On Wed, Dec 9, 2020 at 9:02 AM Kelvin wrote:

> VOTE +1
>
> Sent from my iPhone
>
> Cheers,
> Kelvin
>
>
> > On Dec 9, 2020, at 10:06 AM, f...@florian-hockmann.de wrote:
> >
> > VOTE +1
> >
> > -----Original Message-----
> > From: Jorge Bay Gondra
> > Sent: Wednesday, December 9, 2020 15:23
> > To: dev@tinkerpop.apache.org
> > Subject: Re: [VOTE] TinkerPop 3.4.9 Release
> >
> > VOTE +1
> >
> >> On Mon, Dec 7, 2020 at 8:05 PM Stephen Mallette 
> >> wrote:
> >>
> >> Hello,
> >>
> >> We are happy to announce that TinkerPop 3.4.9 is ready for release.
> >>
> >> The release artifacts can be found at this location:
> >> https://dist.apache.org/repos/dist/dev/tinkerpop/3.4.9/
> >>
> >> The source distribution is provided by:
> >> apache-tinkerpop-3.4.9-src.zip
> >>
> >> Two binary distributions are provided for user convenience:
> >> apache-tinkerpop-gremlin-console-3.4.9-bin.zip
> >> apache-tinkerpop-gremlin-server-3.4.9-bin.zip
> >>
> >> The GPG key used to sign the release artifacts is available at:
> >> https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> >>
> >> The online docs can be found here:
> >> https://tinkerpop.apache.org/docs/3.4.9/ (user docs)
> >> https://tinkerpop.apache.org/docs/3.4.9/upgrade/ (upgrade docs)
> >> https://tinkerpop.apache.org/javadocs/3.4.9/core/ (core javadoc)
> >> https://tinkerpop.apache.org/javadocs/3.4.9/full/ (full javadoc)
> >> https://tinkerpop.apache.org/dotnetdocs/3.4.9/ (.NET API docs)
> >> https://tinkerpop.apache.org/jsdocs/3.4.9/ (Javascript API docs)
> >>
> >> The tag in Apache Git can be found here:
> >> https://github.com/apache/tinkerpop/tree/3.4.9
> >>
> >> The release notes are available here:
> >> https://github.com/apache/tinkerpop/blob/3.4.9/CHANGELOG.asciidoc
> >>
> >> The [VOTE] will be open for the next 72 hours --- closing Thursday
> >> (December 10, 2020) at 2pm EST.
> >>
> >> My vote is +1.
> >>
> >> Thank you very much,
> >>
> >> Stephen
> >>
> >
>
>


[jira] [Created] (TINKERPOP-2489) Server doesn't start if folder has spaces

2020-12-09 Thread Divij Vaidya (Jira)
Divij Vaidya created TINKERPOP-2489:
---

 Summary: Server doesn't start if folder has spaces
 Key: TINKERPOP-2489
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2489
 Project: TinkerPop
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.8, 3.4.9
Reporter: Divij Vaidya


Repro steps:

1. Download the server zip.
2. Unzip the binary.
3. Rename the unzipped folder and add a space, e.g. 
{code:java}
apache-tinkerpop-gremlin-server-3.4.9 my{code}
4. Start the server 
{code:java}
./bin/gremlin-server.sh start{code}
5. The server will fail to start (check status) with the error "Error: Could 
not find or load main class my.conf.log4j-server.properties"

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2487) Add steps to support basic analysis like standard deviation and percentile

2020-12-09 Thread Kelvin R. Lawrence (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246736#comment-17246736
 ] 

Kelvin R. Lawrence commented on TINKERPOP-2487:
---

Two different users have recently asked me whether we have also considered a 
`product` step that would multiply all the values in the stream together. 
There is no easy workaround today other than using lambdas/closures.

> Add steps to support basic analysis like standard deviation and percentile
> --
>
> Key: TINKERPOP-2487
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2487
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Minor
>
> When using tinkerpop Gremlin for real use cases, we found that some general 
> analytical steps are very useful, yet not supported now. Some analytical 
> steps are general enough to be part of the official gremlin package, e.g. 
> steps to calculate standard deviation and percentile. The example usage might 
> be:
>  
> {code:java}
> gremlin> g.V().values('ages')
> ==>1
> ==>2
> ==>3
> gremlin> g.V().values('ages').stdev()
> ==>0.816
> gremlin> g.V().values('ages').fold().stdev(Scope.local)
> ==>0.816
> gremlin> g.V().values('ages').percentile(50)
> ==>2
> // one percentile, return single value
> gremlin> g.V().values('ages').percentile(0, 100)
> ==>[0: 1, 100: 3]
> // multiple percentiles, return a map{code}
> These steps are frequently used in our cases, and we think it would be great 
> to support them in official versions. 
>  
>  





[jira] [Commented] (TINKERPOP-2389) Authorization support in TinkerPop

2020-12-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246702#comment-17246702
 ] 

ASF GitHub Bot commented on TINKERPOP-2389:
---

spmallette commented on pull request #1308:
URL: https://github.com/apache/tinkerpop/pull/1308#issuecomment-741922455


   Thanks for all the changes on this. I will give it another review in greater 
detail after 3.4.9 is officially released. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Authorization support in TinkerPop
> --
>
> Key: TINKERPOP-2389
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2389
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.7
>Reporter: Shekhar Bansal
>Priority: Major
> Attachments: Screenshot 2020-06-25 at 15.15.04.png
>
>
> Use case:
>  # Tinkerpop supports multiple graphs using a single API and admin might want 
> to restrict access to some of the graphs.
>  # Admin might want to restrict read/write access to certain users.
>  
> Proposal
> Add read/write access restrictions at graph level. We can extend it to 
> executing scripts by adding execute privileges.
>  
> Changes required
> Add an `authorization` block, similar to the `authentication` block, in the yaml file
>  
> {code:java}
> authorization: {
>   authorizer: 
> org.apache.tinkerpop.gremlin.server.authorization.AllowAllAuthorizer,
>   authorizationHandler: 
> org.apache.tinkerpop.gremlin.server.handler.SaslAuthorizationHandler,
>   config: {
>}
> }{code}
>  
> Authorization will be done only if authentication is enabled. Authentication 
> is done on a per-session basis, while authorization will be done for each 
> request.
> In `SaslAuthorizationHandler` or `HttpAuthorizationHandler`, the query will be 
> parsed and, depending on its step instructions, marked as a read or a write; 
> privilege evaluation is then done by calling the `isAccessAllowed` method of 
> `Authorizer`:
> {code:java}
> public interface Authorizer {
> /**
>  * Whether or not the authorization requires check.
>  * If false, will not authorize the user.
>  */
> public boolean requireAuthorization();
> /**
>  * Setup is called once upon system startup to initialize the {@code 
> Authorizer}.
>  */
> public void setup(final Map config);
> /**
>  * A "standard" authorization implementation
>  */
> public boolean isAccessAllowed(AuthorizationRequest authorizationRequest) 
> throws AuthorizationException;
> }
> {code}
> Access policies can be defined in tools like `Apache Ranger`, sample policy:
> !Screenshot 2020-06-25 at 15.15.04.png|width=1017,height=548!
>  
>  
>  





[jira] [Commented] (TINKERPOP-2389) Authorization support in TinkerPop

2020-12-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246700#comment-17246700
 ] 

ASF GitHub Bot commented on TINKERPOP-2389:
---

spmallette commented on a change in pull request #1308:
URL: https://github.com/apache/tinkerpop/pull/1308#discussion_r539499923



##
File path: 
gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/handler/HttpBasicAuthorizationHandler.java
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.server.handler;
+
+import io.netty.channel.ChannelFutureListener;
+import io.netty.channel.ChannelHandler;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelInboundHandlerAdapter;
+import io.netty.handler.codec.http.DefaultFullHttpResponse;
+import io.netty.handler.codec.http.FullHttpMessage;
+import io.netty.handler.codec.http.FullHttpRequest;
+import io.netty.handler.codec.http.HttpResponseStatus;
+import io.netty.util.ReferenceCountUtil;
+import org.apache.tinkerpop.gremlin.driver.Tokens;
+import org.apache.tinkerpop.gremlin.driver.message.RequestMessage;
+import org.apache.tinkerpop.gremlin.server.GremlinServer;
+import org.apache.tinkerpop.gremlin.server.auth.AuthenticatedUser;
+import org.apache.tinkerpop.gremlin.server.authz.AuthorizationException;
+import org.apache.tinkerpop.gremlin.server.authz.Authorizer;
+import org.javatuples.Quartet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Map;
+
+import static io.netty.handler.codec.http.HttpResponseStatus.BAD_REQUEST;
+import static 
io.netty.handler.codec.http.HttpResponseStatus.INTERNAL_SERVER_ERROR;
+import static io.netty.handler.codec.http.HttpResponseStatus.UNAUTHORIZED;
+import static io.netty.handler.codec.http.HttpVersion.HTTP_1_1;
+
+
+/**
+ *  An authorization handler for the http channel that allows the {@link 
Authorizer} to be plugged into it.
+ *
+ * @author Marc de Lignie
+ */
+@ChannelHandler.Sharable
+public class HttpBasicAuthorizationHandler extends 
ChannelInboundHandlerAdapter {
+private static final Logger logger = 
LoggerFactory.getLogger(HttpBasicAuthorizationHandler.class);
+private static final Logger auditLogger = 
LoggerFactory.getLogger(GremlinServer.AUDIT_LOGGER_NAME);
+
+private AuthenticatedUser user;
+private final Authorizer authorizer;
+
+public HttpBasicAuthorizationHandler(Authorizer authorizer) {
+this.authorizer = authorizer;
+}
+
+@Override
+public void channelRead(final ChannelHandlerContext ctx, final Object msg) 
{
+if (msg instanceof FullHttpMessage){
+final FullHttpMessage request = (FullHttpMessage) msg;
+try {
+user = ctx.channel().attr(StateKey.AUTHENTICATED_USER).get();
+if (null == user) {// This is expected when using the 
AllowAllAuthenticator
+user = AuthenticatedUser.ANONYMOUS_USER;
+}
+// ToDo: move getRequestArguments to a new preceding pipeline 
step in the Channelizer, but @Stephen,
+//   how about the sendAndCleanupConnection logic in 
HttpGremlinEndpointHandler?

Review comment:
   As they are all static methods I think you could refactor to create a 
small final utility class to house them - `HttpUtil` or something like that?








Re: [VOTE] TinkerPop 3.4.9 Release

2020-12-09 Thread Kelvin
VOTE +1

Sent from my iPhone

Cheers,
Kelvin


> On Dec 9, 2020, at 10:06 AM, f...@florian-hockmann.de wrote:
> 
> VOTE +1
> 
> -----Original Message-----
> From: Jorge Bay Gondra
> Sent: Wednesday, December 9, 2020 15:23
> To: dev@tinkerpop.apache.org
> Subject: Re: [VOTE] TinkerPop 3.4.9 Release
> 
> VOTE +1
> 



Re: [VOTE] TinkerPop 3.4.9 Release

2020-12-09 Thread fh
VOTE +1

-----Original Message-----
From: Jorge Bay Gondra
Sent: Wednesday, December 9, 2020 15:23
To: dev@tinkerpop.apache.org
Subject: Re: [VOTE] TinkerPop 3.4.9 Release

VOTE +1




Re: [VOTE] TinkerPop 3.4.9 Release

2020-12-09 Thread Jorge Bay Gondra
VOTE +1



Re: [DISCUSS] Creating pattern steps to codify best practices

2020-12-09 Thread Stephen Mallette
Josh, thanks for your thoughts - some responses inline:

On Tue, Dec 8, 2020 at 10:16 PM Josh Perryman 
wrote:

> I'll offer some thoughts. I'm seeing upsertV() as an idempotent getOrCreate
> call which always returns a vertex with the label/property values specified
> within the step. It's sort of a declarative pattern: "return this vertex to
> me, find it if you can, create it if you must."
>

I like this description - I've added it to the gist, though it's a bit at
odds with Dave's previous post, so we'll consider it a temporary addition
until he responds.


> On that account, I do like the simplification in 1. Repetition shouldn't be
> necessary. In an ideal world, the engine should know the primary
> identifiers (name or id) and find/create the vertex based on them. Any
> other included values will be "trued up" as well. But this may be a bridge
> too far for TinkerPop since knowing identifiers may require a specified
> schema. I'd prefer to omit the third input, but it might be necessary to
> keep it so that the second input can be for the matching use case.
>

In my most recent post on gremlin-users I think I came up with a nice way
to get rid of the second Map. One Map that forms the full list of
properties for upserting is easier than partitioning two Maps that
essentially merge together. I imagine it's unlikely that application code
will have that separation naturally so users will have the added step of
trying to separate their data into searchable vs "just data". Getting us to
one Map argument will simplify APIs for us and reduce complexity to users.
Here is what I'd proposed for those not following over there:

// match on name and age (or perhaps whatever the underlying graph system
// thinks is best?)
g.upsertV('person', [name:'marko',age:29])

// match on name only
g.upsertV('person', [name:'marko',age:29]).by('name')

// explicitly match on name and age
g.upsertV('person', [name:'marko',age:29]).
  by('name').by('age')

// match on id only
g.upsertV('person', [(T.id): 100, name:'marko',age:29]).by(T.id)

// match on whatever the by(Traversal) predicate defines
g.upsertV('person', [name:'marko',age:29]).
  by(has('name', 'marko'))

// match on id, then update age
g.upsertV('person', [(T.id): 100, name:'marko']).by(T.id).
  property('age',29)

With this model, we get one Map argument that represents the complete
property set to be added/updated to the graph and the user can hint on what
key they wish to match on using by() where that sort of step modulation
should be a well understood and familiar concept in Gremlin at this point.
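
To make the intended semantics concrete, here is a minimal in-memory sketch in plain Java of "one Map plus match keys". It is not TinkerPop code: the class `UpsertSketch`, its `upsertV` method, and the rule that an empty key list means "match on every supplied property" are all assumptions made for illustration, standing in for the proposed upsertV() step and its by() modulators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the proposed semantics; none of this is TinkerPop API.
class UpsertSketch {
    // Each "vertex" is a mutable property map that also stores its label.
    private final List<Map<String, Object>> vertices = new ArrayList<>();

    // Find a vertex with the given label whose matchKeys properties equal those
    // in props; update it with all of props, or create it if nothing matches.
    // Assumption: an empty matchKeys list means "match on every property".
    Map<String, Object> upsertV(String label, Map<String, Object> props, List<String> matchKeys) {
        List<String> keys = matchKeys.isEmpty() ? new ArrayList<String>(props.keySet()) : matchKeys;
        for (Map<String, Object> v : vertices) {
            if (label.equals(v.get("label")) && matchesOn(v, props, keys)) {
                v.putAll(props); // "true up" the remaining properties
                return v;
            }
        }
        Map<String, Object> created = new HashMap<>(props);
        created.put("label", label);
        vertices.add(created);
        return created;
    }

    private static boolean matchesOn(Map<String, Object> v, Map<String, Object> props, List<String> keys) {
        for (String k : keys) {
            if (!Objects.equals(v.get(k), props.get(k))) return false;
        }
        return true;
    }

    int size() { return vertices.size(); }

    private static Map<String, Object> person(String name, int age) {
        Map<String, Object> m = new HashMap<>();
        m.put("name", name);
        m.put("age", age);
        return m;
    }

    public static void main(String[] args) {
        UpsertSketch g = new UpsertSketch();
        // Roughly: g.upsertV('person', [name:'marko',age:29]).by('name')
        g.upsertV("person", person("marko", 29), Arrays.asList("name"));
        // Same name, new age: the existing vertex is found and updated.
        Map<String, Object> updated = g.upsertV("person", person("marko", 30), Arrays.asList("name"));
        System.out.println(g.size() + " vertex, age=" + updated.get("age")); // prints "1 vertex, age=30"
    }
}
```

The point of the sketch is only that the caller passes one complete property map and the match keys pick out which part of it is used for lookup, so application code never has to split its data into two maps.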

> So that means I think 2 should always match or update the additional
> values. Again, we're specifying the expected result and letting the engine
> figure out how best to return that result and appropriately maintain
> state.
>

I again like this description, but we'll see what Dave's thoughts are since
he's a bit behind on the threads at this point I think.


> I'm also presuming that anything not included as inputs to the upsertV()
> step are then to be handled by following steps. I'm hoping that is a
> sufficient approach for addressing the multi/meta property use cases
> brought up in 3.
>

yeah, it needs more thought. I spent more time thinking on
this issue yesterday than I have for all the previous posts combined and I
think it yielded something good in that revised syntax. It's going to take
more of that kind of elbow grease to dig into these lesser use cases to
make sure we aren't coding ourselves into corners.


> I do like the idea of using modulators (with(), by()) for more
> sophisticated usage and advanced use cases. Also, the streaming examples
> are quite elegant allowing for a helpful separation of data and logic.
>

cool - hope you like the revised syntax I posted then. :)


> That's my humble take. This is a very welcome addition to the language and
> I appreciate the thoughtful & collaborative approach to the design
> considerations.
>

Thanks again and please keep the thoughts coming. Lots of other interesting
design discussions seem to be brewing.


>
> Josh
>
> On Tue, Dec 8, 2020 at 8:57 AM Stephen Mallette 
> wrote:
>
> > I expanded this discussion to gremlin-users for a wider audience
> > and the thread is starting to grow:
> >
> > https://groups.google.com/g/gremlin-users/c/QBmiOUkA0iI/m/pj5Ukiq6AAAJ
> >
> > I guess we'll need to summarize that discussion back here now
> >
> > I did have some more thoughts to hang out there and figured that I wouldn't
> > convolute the discussion on gremlin-users with them, so I will continue the
> > discussion here.
> >
> > 1. The very first couple of examples seem wrong (or at least do not best
> > demonstrate the usage):
> >
> > g.upsertV('person', [name: 'marko'],
> > [name: 'marko', age: 29])
> > g.upsertV('person', [(T.id): 1],
> > [(T.id): 1, name: 'Marko'])
> >
> > should instead be:
> >
> > g.upsertV('person', [name: 'marko'],
> 

[jira] [Updated] (TINKERPOP-2487) Add steps to support basic analysis like standard deviation and percentile

2020-12-09 Thread Stephen Mallette (Jira)


 [ 
https://issues.apache.org/jira/browse/TINKERPOP-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Mallette updated TINKERPOP-2487:

Affects Version/s: 3.4.8



Re: [New Step Discussion] Add Steps to Support Basic Distribution Analysis (e.g. Standard Deviation and Percentile)

2020-12-09 Thread Stephen Mallette
Thanks for posting. In the math department, these two steps are commonly
requested, and I think we have reached a point where the things folks are
doing with Gremlin require steps of greater specificity, so this
conversation is definitely expected. We currently have two sorts of steps
for operating on numbers: reducing steps like sum() and the math() step
for expressions. It's interesting what you can accomplish with those two
steps - note here how Kelvin manages standard deviation without lambdas:

g.V().hasLabel('airport').
  values('runways').fold().as('runways').
  mean(local).as('mean').
  select('runways').unfold().
  math('(_-mean)^2').mean().math('sqrt(_)')

https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#stddevone
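
For reference, the arithmetic that traversal encodes is the population standard deviation: take the mean, average the squared deviations from it, then take the square root. A minimal plain-Java sketch of the same calculation follows; the class and method names are hypothetical and nothing here is TinkerPop API.

```java
import java.util.Locale;

// Hypothetical helper mirroring the traversal: mean -> (x - mean)^2 -> mean -> sqrt.
class StdevSketch {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Population standard deviation (divides by n, like the mean() step above).
    static double stdev(double[] xs) {
        double m = mean(xs);
        double[] squaredDeviations = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            squaredDeviations[i] = (xs[i] - m) * (xs[i] - m);
        }
        return Math.sqrt(mean(squaredDeviations));
    }

    public static void main(String[] args) {
        // Same data as the 'ages' example in the thread: 1, 2, 3
        System.out.println(String.format(Locale.ROOT, "%.3f", stdev(new double[] {1, 2, 3}))); // prints 0.816
    }
}
```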

In any case, we can see that there is a fair bit of indirection there to do
the work of a simple stdev() step. I've often wondered if math() could
behave both in the way it does now and as a form of reducing step. In that
way we could quietly add new math functions without forming new steps, as I
can't help imagining that the addition of stdev() and percentile() will then
be followed by variance(), covariance(), confidence() and so on. Kelvin
recently asked me about mult() for use cases that he sees from time to time.

As it stands our math expression library exp4j:

https://www.objecthunter.net/exp4j/

is good at extensibility but isn't really well suited out of the box to
handle reducing operations because its architecture forces you to specify
the number of arguments a function will take up front and those arguments
must be double:

https://www.objecthunter.net/exp4j/#Custom_functions

So, that would be an issue to contend with, but technical issues aside and
focusing instead on the user angle, would math() that worked as follows be
a good path?

gremlin> g.V().values('ages').fold().math(local, "stdev(_)")
==>0.816
gremlin> g.inject([1,2,3]).math(local, "product(_)")
==>6
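
As a rough illustration of what a reducing math() would need underneath, the sketch below looks up a function by name and applies it to a whole list of values rather than to a fixed number of arguments. It is plain Java and does not use exp4j or any TinkerPop class; `ReducingMathSketch`, `REDUCERS` and `reduce` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical registry of named reducers that accept any number of values.
class ReducingMathSketch {
    static final Map<String, ToDoubleFunction<double[]>> REDUCERS = new HashMap<>();

    static {
        REDUCERS.put("product", xs -> {
            double p = 1.0;
            for (double x : xs) p *= x;
            return p;
        });
        REDUCERS.put("mean", xs -> {
            double sum = 0.0;
            for (double x : xs) sum += x;
            return sum / xs.length;
        });
    }

    // Apply the named reducer to all values at once.
    static double reduce(String name, double[] xs) {
        ToDoubleFunction<double[]> f = REDUCERS.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown reducer: " + name);
        }
        return f.applyAsDouble(xs);
    }

    public static void main(String[] args) {
        // Roughly the g.inject([1,2,3]).math(local, "product(_)") example above
        System.out.println(reduce("product", new double[] {1, 2, 3})); // prints 6.0
    }
}
```

New functions would then be registered by name instead of each becoming a new step, which is the trade-off the questions below are about.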

And then, what distinction would there be between a math() step and first
class "math steps" like sum(), min(), max(), and mean()? In other words,
why would those exist if math() could already do it all? What makes a math
operation "common" enough to beget its own first class representation?

Just to be clear, I'm not saying we shouldn't add stdev()/percentile() - I
just want to consider all the design possibilities and talk them through.
Thanks again for bringing up this conversation. I will link this thread to
your JIRA for reference.


On Wed, Dec 9, 2020 at 6:40 AM js guo  wrote:

> Hi team,
>
> We are using tinkerpop Gremlin in our risk detection cases. Some analytical
> calculations are used frequently, yet there are no corresponding steps at
> hand.
>
> I am thinking that some general analytical steps can be added in Gremlin.
> e.g. steps to calculate standard deviation and percentile. The example
> usage might be as follows.
> 
> gremlin> g.V().values('ages')
> ==>1
> ==>2
> ==>3
> gremlin> g.V().values('ages').stdev()
> ==>0.816
> gremlin> g.V().values('ages').fold().stdev(Scope.local)
> ==>0.816
>
> gremlin> g.V().values('ages').percentile(50)
> ==>2
> // one percentile, return single value
> gremlin> g.V().values('ages').percentile(0, 100)
> ==>[0: 1, 100: 3]
> // multiple percentiles, return a map
> 
>
> Sorry for not emailing earlier, I have created a JIRA ticket for this
> https://issues.apache.org/jira/browse/TINKERPOP-2487.
>
> As new steps are already used in our cases, we are glad to offer the
> implementation for review, if you think it good to add the two steps.
>
> Regards,
> Junshi
>


[New Step Discussion] Add Steps to Support Basic Distribution Analysis (e.g. Standard Deviation and Percentile)

2020-12-09 Thread js guo
Hi team,

We are using tinkerpop Gremlin in our risk detection cases. Some analytical
calculations are used frequently, yet there are no corresponding steps at
hand.

I am thinking that some general analytical steps can be added in Gremlin.
e.g. steps to calculate standard deviation and percentile. The example
usage might be as follows.

gremlin> g.V().values('ages')
==>1
==>2
==>3
gremlin> g.V().values('ages').stdev()
==>0.816
gremlin> g.V().values('ages').fold().stdev(Scope.local)
==>0.816

gremlin> g.V().values('ages').percentile(50)
==>2
// one percentile, return single value
gremlin> g.V().values('ages').percentile(0, 100)
==>[0: 1, 100: 3]
// multiple percentiles, return a map
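
The example does not say which percentile definition is intended. The sketch below assumes the nearest-rank method, which reproduces the values shown above for ages 1, 2, 3; the class and method names are hypothetical and this is plain Java, not a proposed step implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical nearest-rank percentile; other definitions (e.g. interpolation) differ.
class PercentileSketch {
    // Single percentile p in [0, 100], returns one value.
    static double percentile(double[] values, double p) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        if (p < 0 || p > 100) throw new IllegalArgumentException("p must be in [0, 100]");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Nearest rank: ceil(p/100 * n), clamped so that p = 0 yields the minimum.
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Multiple percentiles, returns a map keyed by percentile in argument order.
    static Map<Double, Double> percentiles(double[] values, double... ps) {
        Map<Double, Double> result = new LinkedHashMap<>();
        for (double p : ps) {
            result.put(p, percentile(values, p));
        }
        return result;
    }

    public static void main(String[] args) {
        double[] ages = {1, 2, 3};
        System.out.println(percentile(ages, 50));      // prints 2.0
        System.out.println(percentiles(ages, 0, 100)); // prints {0.0=1.0, 100.0=3.0}
    }
}
```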


Sorry for not emailing earlier, I have created a JIRA ticket for this
https://issues.apache.org/jira/browse/TINKERPOP-2487.

As new steps are already used in our cases, we are glad to offer the
implementation for review, if you think it good to add the two steps.

Regards,
Junshi