[RESULT][VOTE] TinkerPop 3.4.9 Release

2020-12-10 Thread Stephen Mallette
This vote is now closed with a total of 5 +1s, no +0s and no -1s. The
results are:

BINDING VOTES:

+1  (3 -- Stephen Mallette, Jorge Bay Gondra, Florian Hockmann)
0   (0)
-1  (0)

NON-BINDING VOTES:

+1 (2 -- Kelvin Lawrence, Divij Vaidya)
0  (0)
-1 (0)

Thank you very much,

Stephen

On Wed, Dec 9, 2020 at 6:18 PM Divij Vaidya  wrote:

> I found this issue while sanity testing:
> https://issues.apache.org/jira/browse/TINKERPOP-2489 but it exists in
> older releases (tested with 3.4.8) as well, so I don't consider it
> a release blocker.
>
> Other than that, I ran some basic sanity checks on console -> server
> interaction, which looked good.
>
> VOTE +1
>
> --
> Divij Vaidya
>
>
>
> On Wed, Dec 9, 2020 at 9:02 AM (null) (null) 
> wrote:
>
> > VOTE +1
> >
> > Cheers,
> > Kelvin
> >
> >
> > > On Dec 9, 2020, at 10:06 AM, f...@florian-hockmann.de wrote:
> > >
> > > VOTE +1
> > >
> > > -Original Message-
> > > From: Jorge Bay Gondra 
> > > Sent: Wednesday, December 9, 2020 15:23
> > > To: dev@tinkerpop.apache.org
> > > Subject: Re: [VOTE] TinkerPop 3.4.9 Release
> > >
> > > VOTE +1
> > >
> > >> On Mon, Dec 7, 2020 at 8:05 PM Stephen Mallette wrote:
> > >>
> > >> Hello,
> > >>
> > >> We are happy to announce that TinkerPop 3.4.9 is ready for release.
> > >>
> > >> The release artifacts can be found at this location:
> > >> https://dist.apache.org/repos/dist/dev/tinkerpop/3.4.9/
> > >>
> > >> The source distribution is provided by:
> > >> apache-tinkerpop-3.4.9-src.zip
> > >>
> > >> Two binary distributions are provided for user convenience:
> > >> apache-tinkerpop-gremlin-console-3.4.9-bin.zip
> > >> apache-tinkerpop-gremlin-server-3.4.9-bin.zip
> > >>
> > >> The GPG key used to sign the release artifacts is available at:
> > >> https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > >>
> > >> The online docs can be found here:
> > >> https://tinkerpop.apache.org/docs/3.4.9/ (user docs)
> > >> https://tinkerpop.apache.org/docs/3.4.9/upgrade/ (upgrade docs)
> > >> https://tinkerpop.apache.org/javadocs/3.4.9/core/ (core javadoc)
> > >> https://tinkerpop.apache.org/javadocs/3.4.9/full/ (full javadoc)
> > >> https://tinkerpop.apache.org/dotnetdocs/3.4.9/ (.NET API docs)
> > >> https://tinkerpop.apache.org/jsdocs/3.4.9/ (JavaScript API docs)
> > >>
> > >> The tag in Apache Git can be found here:
> > >> https://github.com/apache/tinkerpop/tree/3.4.9
> > >>
> > >> The release notes are available here:
> > >> https://github.com/apache/tinkerpop/blob/3.4.9/CHANGELOG.asciidoc
> > >>
> > >> The [VOTE] will be open for the next 72 hours --- closing Thursday
> > >> (December 10, 2020) at 2pm EST.
> > >>
> > >> My vote is +1.
> > >>
> > >> Thank you very much,
> > >>
> > >> Stephen
> > >>
> > >
> >
> >
>


Re: [New Step Discussion] Add Steps to Support Basic Distribution Analysis (e.g. Standard Deviation and Percentile)

2020-12-10 Thread js guo
Thanks for the reply. It is a good idea to provide reducing operations through
the math() step. But as I understand it, we would still need different reducing
steps, or at least different seed suppliers and reducing operators, in the
back-end.

gremlin> g.V().values('age').fold().math(local, "stdev(_)")
==>0.816
gremlin> g.inject([1,2,3]).math(local, "product(_)")
==>6

One of the advantages of a reducing step is that we do not need to hold the
whole collection of numbers. Take standard deviation as an example: Kelvin's
solution requires manipulating arrays of numbers. With a reducing step, we can
accumulate the sum, the sum of squares and the count during the traversal and
compute the final result as sqrt(E(X^2) - E(X)^2). This performs better and
potentially requires less memory.
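For illustration, here is a minimal Java sketch contrasting the two approaches, assuming plain double values and the population (not sample) standard deviation. StdevSketch, twoPass() and RunningStdev are hypothetical names, not TinkerPop APIs: twoPass() mirrors the fold-then-unfold shape of Kelvin's query, which keeps the whole list, while RunningStdev keeps only the count, sum and sum of squares.

```java
import java.util.Arrays;

// Hypothetical sketch, not TinkerPop code: two ways to compute a
// population standard deviation.
final class StdevSketch {

    /** Two passes over a fully materialized list, like fold()...unfold(). */
    static double twoPass(double[] values) {
        // Pass 1: the mean of the whole list.
        double mean = Arrays.stream(values).average().getAsDouble();
        // Pass 2: the mean of the squared deviations, then its square root.
        double variance = Arrays.stream(values)
                .map(x -> (x - mean) * (x - mean))
                .average()
                .getAsDouble();
        return Math.sqrt(variance);
    }

    /** Single-pass reducer: only three running totals are kept. */
    static final class RunningStdev {
        private long count;
        private double sum;
        private double sumOfSquares;

        void add(double x) {
            count++;
            sum += x;
            sumOfSquares += x * x;
        }

        /** sqrt(E(X^2) - E(X)^2); throws if no values were added. */
        double stdev() {
            if (count == 0) {
                throw new IllegalStateException("no values added");
            }
            double mean = sum / count;
            double variance = sumOfSquares / count - mean * mean;
            // Rounding can make a near-zero variance slightly negative.
            return Math.sqrt(Math.max(variance, 0.0));
        }
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3};
        RunningStdev running = new RunningStdev();
        for (double x : values) {
            running.add(x);
        }
        // Both print approximately 0.8165, i.e. sqrt(2/3).
        System.out.println(twoPass(values));
        System.out.println(running.stdev());
    }
}
```

One caveat with the single-pass formula: when the values are large relative to their spread, E(X^2) - E(X)^2 can lose precision through cancellation; Welford's online algorithm is the usual numerically stabler alternative that is still single-pass.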

Maybe for the math() step, when its scope is global, we could replace it with a
reducing step internally. The main change would be in how users write queries,
with little change to the underlying implementation. This way, we can
consolidate math functions into a single step, which I think is the right way
to go, considering that more and more analytical functions may need to be
supported. That said, users would still need to remember which functions
math() supports.

gremlin> g.V().values('age').fold().math(local, "mean(_)")   // default local scope, accepts array
==>30.75
gremlin> g.V().values('age').math(global, "mean(_)")   // internally executed with MeanGlobalStep
==>30.75
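To make the "one math() step, many functions" idea a bit more concrete, here is a minimal Java sketch of a name-to-reducer registry. It assumes the values are already doubles and uses standard java.util.stream.Collectors as stand-in reducers; MathReducers and reduce() are hypothetical names, not existing TinkerPop classes, and a real global-scope math() would still need seed suppliers and reducing operators inside the traversal machinery.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical sketch: one registry of named reducers that a global-scope
// math() could dispatch to, instead of one first-class step per function.
final class MathReducers {

    private static final Map<String, Collector<Double, ?, Double>> REDUCERS = new HashMap<>();

    static {
        REDUCERS.put("sum", Collectors.summingDouble(Double::doubleValue));
        REDUCERS.put("mean", Collectors.averagingDouble(Double::doubleValue));
        REDUCERS.put("product", Collectors.reducing(1.0, (a, b) -> a * b));
    }

    /** Reduces all values with the reducer registered under the given name. */
    static double reduce(String function, List<Double> values) {
        Collector<Double, ?, Double> reducer = REDUCERS.get(function);
        if (reducer == null) {
            throw new IllegalArgumentException("unsupported function: " + function);
        }
        return values.stream().collect(reducer);
    }

    public static void main(String[] args) {
        System.out.println(reduce("mean", List.of(29.0, 27.0, 32.0, 35.0))); // 30.75
        System.out.println(reduce("product", List.of(1.0, 2.0, 3.0)));       // 6.0
    }
}
```

With this shape, supporting another function means registering one more reducer rather than adding a new step, although users still have to know which names are registered.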

A further thought, performance-wise: in the ReducingBarrierStep implementation,
"projectTraverser" is used to project the current traverser into a single value,
and a "BinaryOperator" is used to reduce multiple single values into one. For
number manipulation, this process involves a lot of boxing and unboxing, as
well as object creation (e.g. creating a MeanNumber for MeanGlobalStep:
https://github.com/apache/tinkerpop/blob/3.4-dev/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MeanGlobalStep.java#L59).

Could we optimize the reducing framework for number-related steps, for example
by storing intermediate values (for MeanGlobalStep, the "count" and the value
"sum") as step instance variables, so that the reduction happens directly
during traverser projection?
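A minimal sketch of that idea, assuming a step could keep its running state in primitive fields. PrimitiveMeanReducer is hypothetical (it is not MeanGlobalStep), and its bulk argument only stands in for a traverser's bulk. Each projected value is folded straight into a long count and a double sum, so no intermediate MeanNumber-style object is created per traverser.

```java
// Hypothetical sketch (not TinkerPop's MeanGlobalStep): the running count and
// sum live in primitive fields, so folding in a value allocates no objects.
final class PrimitiveMeanReducer {
    private long count;
    private double sum;

    /** Folds one projected value, weighted by its bulk, into the running totals. */
    void project(Number value, long bulk) {
        count += bulk;
        sum += value.doubleValue() * bulk;
    }

    /** Combines partial results, e.g. if the work were split across workers. */
    void merge(PrimitiveMeanReducer other) {
        count += other.count;
        sum += other.sum;
    }

    /** Returns the mean, or NaN when nothing was folded in. */
    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        PrimitiveMeanReducer reducer = new PrimitiveMeanReducer();
        for (int age : new int[] {29, 27, 32, 35}) {
            reducer.project(age, 1);
        }
        System.out.println(reducer.mean()); // 30.75
    }
}
```

The merge() method points at one design cost of keeping state in the step: partial results computed separately still need an explicit way to be combined.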

On 2020/12/09 12:24:04, Stephen Mallette  wrote: 
> Thanks for posting. In the math department, I think these two steps are
> commonly requested, and we have reached a point where the things folks are
> doing with Gremlin require steps of greater specificity, so this
> conversation is definitely expected. We currently have two sorts of
> steps for operating on numbers: reducing steps like sum(), and the math()
> step for expressions. It's interesting what you can accomplish with those
> two steps. Note here how Kelvin manages standard deviation without lambdas:
> 
> g.V().hasLabel('airport').
>   values('runways').fold().as('runways').
>   mean(local).as('mean').
>   select('runways').unfold().
>   math('(_-mean)^2').mean().math('sqrt(_)')
> 
> https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#stddevone
> 
> In any case, we can see that there is a fair bit of indirection there to do
> the work of a simple stdev() step. I've often wondered if math() could
> behave both in the way it does now and as a form of reducing step. In that
> way we could quietly add new math functions without forming new steps, as I
> can't help imagining that the addition of stdev() and percentile() will then
> follow with: variance(), covariance(), confidence() and so on.  Kelvin
> recently asked me about mult() for use cases that he sees from time to time.
> 
> As it stands our math expression library exp4j:
> 
> https://www.objecthunter.net/exp4j/
> 
> is good at extensibility but isn't really well suited out of the box to
> handle reducing operations because its architecture forces you to specify
> the number of arguments it will take up front and those arguments must be
> double:
> 
> https://www.objecthunter.net/exp4j/#Custom_functions
> 
> So, that would be an issue to contend with, but technical issues aside and
> focusing instead on the user angle, would math() that worked as follows be
> a good path?
> 
> gremlin> g.V().values('age').fold().math(local, "stdev(_)")
> ==>0.816
> gremlin> g.inject([1,2,3]).math(local, "product(_)")
> ==>6
> 
> And then, what distinction would there be between a math() step and
> first-class "math steps" like sum(), min(), max(), and mean()? In other words,
> why would those exist if math() could already do it all? What makes a math
> operation "common" enough to beget its own first-class representation?
> 
> Just to be clear, I'm not saying we shouldn't add stdev()/percentile() - I
> just want to consider all the design possibilities and talk them through.
> Thanks again for bringing up this conversation. I will link this thread to
> your JIRA for reference.
> 
> 
> On Wed, Dec 9, 2020 at 6:40 AM

[jira] [Created] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-10 Thread Guo Junshi (Jira)
Guo Junshi created TINKERPOP-2490:
-

 Summary: RangeGlobalStep touches next traverser when high limit is already hit
 Key: TINKERPOP-2490
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
 Project: TinkerPop
  Issue Type: Bug
  Components: process
Affects Versions: 3.4.8
Reporter: Guo Junshi


In FilterStep, the processNextStart() method first retrieves the next traverser
and then applies the filtering logic. For RangeGlobalStep, however, once the high
limit has been reached there is no need to fetch the next traverser.
{code:java}
@Override
protected Traverser.Admin<S> processNextStart() {
    while (true) {
        // the next traverser is pulled before the filter is consulted
        final Traverser.Admin<S> traverser = this.starts.next();
        if (this.filter(traverser))
            return traverser;
    }
}
{code}
An example is the limit() step: g.V().limit(1) touches 2 vertices even though
only 1 vertex is returned.

This extra data loading can hurt performance when fetching the next element
involves loading data from a database. It is not a functional bug, but for
better performance we should check the high range limit before touching the
next traverser.
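The difference can be illustrated with a simplified Java model. Everything in it is hypothetical: RangeSketch, pulledForLimit() and the counting iterator are illustrative names, and the code is neither TinkerPop's RangeGlobalStep nor the change in the pull request. It only counts how many elements are pulled from a source to satisfy a limit, with and without checking the limit before pulling.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of a limit filter; not TinkerPop's
// RangeGlobalStep and not the code from the pull request.
final class RangeSketch {

    /** Counts how many elements are pulled from the underlying source. */
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private int pulled;

        CountingIterator(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public T next() {
            pulled++;
            return source.next();
        }

        int pulled() {
            return pulled;
        }
    }

    /** Emits up to `limit` elements and returns how many were pulled to do so. */
    static <T> int pulledForLimit(List<T> source, long limit, boolean checkLimitFirst) {
        CountingIterator<T> starts = new CountingIterator<>(source.iterator());
        long emitted = 0;
        while (starts.hasNext()) {
            if (checkLimitFirst && emitted >= limit) {
                break; // proposed: stop before touching the next element
            }
            starts.next(); // the "touch" that may load data from a database
            if (emitted >= limit) {
                break; // current: the limit is noticed only after pulling
            }
            emitted++;
        }
        return starts.pulled();
    }

    public static void main(String[] args) {
        List<String> vertices = List.of("v1", "v2", "v3");
        System.out.println(pulledForLimit(vertices, 1, false)); // 2
        System.out.println(pulledForLimit(vertices, 1, true));  // 1
    }
}
```

Both variants emit the same single element for a limit of 1; only the number of elements pulled from the source differs.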



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2020-12-10 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247711#comment-17247711 ]

ASF GitHub Bot commented on TINKERPOP-2490:
---

junshiguo opened a new pull request #1370:
URL: https://github.com/apache/tinkerpop/pull/1370


   https://issues.apache.org/jira/browse/TINKERPOP-2490
   
   In FilterStep, the processNextStart() method first retrieves the next
traverser and then applies the filtering logic. For RangeGlobalStep, however,
once the high limit has been reached there is no need to fetch the next traverser.
   
   e.g. g.V().limit(1) touches 2 vertices even though only 1 vertex is returned.
   
   This PR adds a high-limit check before retrieving the next traverser for
filtering. Functionality is not affected, but performance should improve when
fetching the next traverser is expensive.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method first retrieves the next
> traverser and then applies the filtering logic. For RangeGlobalStep, however,
> once the high limit has been reached there is no need to fetch the next traverser.
> {code:java}
> @Override
> protected Traverser.Admin<S> processNextStart() {
>     while (true) {
>         final Traverser.Admin<S> traverser = this.starts.next();
>         if (this.filter(traverser))
>             return traverser;
>     }
> }
> {code}
> An example is the limit() step: g.V().limit(1) touches 2 vertices even though
> only 1 vertex is returned.
> This extra data loading can hurt performance when fetching the next element
> involves loading data from a database. It is not a functional bug, but for
> better performance we should check the high range limit before touching the
> next traverser.


