Thanks for the reply. We definitely can have a separate DISCUSS thread for 
details of refactoring reducing of number streams. 

As for my PR, agree that new steps are better targeted in 3.5.0 versions 
instead of minor fix versions. I will later apply the changes to master branch 
and raise a new PR. 

Regards,
Junshi

On 2020/12/16 18:36:21, Stephen Mallette <spmalle...@gmail.com> wrote: 
> Some responses inline:
> 
> On Fri, Dec 11, 2020 at 12:53 AM js guo <jsguo8...@gmail.com> wrote:
> 
> > Thanks for the reply. It is a good idea to provide reducing operations
> > through math() step. But from my understanding, we still need different
> > reducing steps or at least different seed suppliers and reducing operators
> > in the back-end.
> >
> > gremlin> g.V().values('age').fold().math(local, "stdev(_)")
> > ==>0.816
> > gremlin> g.inject([1,2,3]).math(local, "product(_)")
> > ==>6
> >
> > One of the advantage of a reducing step is that we do not need to hold the
> > whole collection of numbers. Take standard deviation calculation for
> > example, Kelvin’s solution requires manipulation of number arrays. With a
> > reducing step, we can accumulate value sum, square sum and count during the
> > traversal and get a final result with sqrt((E(X)^2 - E(X^2)). The latter
> > has a better performance together with potentially lower memory requirement.
> >
> 
> true - that is a good argument.
> 
> 
> > Maybe for math() step, when passing its scope as global, we can replace it
> > with a reducing step internally. The main change is how users write queries
> > with little change in underlying implementation. This way, we can align
> > math functions into one single step, which I think is the right way to go
> > considering that there might be more and more analytical functions to be
> > supported. BTW, users still need to remember what steps are supported by
> > math().
> >
> 
> That could be what's nagging at me as we have a math() step but then we
> have steps that do math and i'm trying to understand why we sometimes sum()
> and sometimes math("x+y")...unifying these concepts in some way might make
> sense....i just want to be sure we consider the ramifications before we
> charge ahead with existing patterns.
> 
> 
> > gremlin> g.V().values('age').fold().math(local, “mean(_)”)   // default
> > local scope, accepts array
> > ==>30.75
> > gremlin> g.V().values('age').math(global, “mean(_)”)  // internal
> > execution with MeanGlobalStep
> > ==>30.75
> >
> > A further thinking performance wise. In ReducingBarrierStep
> > implementation, “projectTraverser” is used to project current traversal
> > into single value and a “BinaryOperator” is used to reduce multiple
> > single-values into one. For number manipulation, this process involves a
> > lot of boxing and unboxing, and also object creation (e.g. creating
> > MeanNumber for MeanGlobalStep
> > https://github.com/apache/tinkerpop/blob/3.4-dev/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MeanGlobalStep.java#L59).
> >
> >
> > I have wondered if we can optimize the reducing framework for number
> > related steps, like store intermediate values (for MeanGlobalStep, it is
> > the “count” and value “sum”) as step instance variables, and the reducing
> > operation happens directly in traverser projection?
> >
> 
> That was a convenient way to code that but I see what you're saying. Some
> refactoring there could be worthwhile. Happy to discuss that further if you
> feel you'd like to contribute something in that area. Perhaps start another
> DISCUSS thread with some more details?
> 
> As for your current PR, I still haven't had a chance to put the thought
> I've wanted into this change. I will look to do that before the end of the
> week. At this point, however, I don't see anyone with any specific
> objections to the idea. As a logistical thought, I do think that should it
> go forward as is, it should be a change targeted at 3.5.0 though rather
> than 3.4.10.
> 
> 
> >
> > On 2020/12/09 12:24:04, Stephen Mallette <spmalle...@gmail.com> wrote:
> > > Thanks for posting. In the math department, I think that these two steps
> > > are asked for commonly and I think we have reached a point where the
> > things
> > > folks are doing with Gremlin are requiring steps of greater specificity
> > so
> > > this conversation is definitely expected. We currently have two sorts of
> > > steps for operating on numbers: reducing steps like sum() and then math()
> > > step for expressions. It's interesting what you can accomplish with those
> > > two steps - note here how Kelvin manages standard deviation without
> > lambdas:
> > >
> > > g.V().hasLabel('airport').
> > >       values('runways').fold().as('runways').
> > >       mean(local).as('mean').
> > >       select('runways').unfold().
> > >       math('(_-mean)^2').mean().math('sqrt(_)')
> > >
> > > https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#stddevone
> > >
> > > In any case, we can see that there is a fair bit of indirection there to
> > do
> > > the work of a simple stdev() step. I've often wondered if math() could
> > > behave both in the way it does now and as a form of reducing step. In
> > that
> > > way we could quietly add new math functions without forming new steps,
> > as I
> > > can't help imaging that the addition of stdev() and percentile() will
> > then
> > > follow with: variance(), covariance(), confidence() and so on.  Kelvin
> > > recently asked me about mult() for use cases that he sees from time to
> > time.
> > >
> > > As it stands our math expression library exp4j:
> > >
> > > https://www.objecthunter.net/exp4j/
> > >
> > > is good at extensibility but isn't' really formed well out of the box to
> > > handle reducing operations because its architecture forces you to specify
> > > the number of arguments it will take up front and those arguments must be
> > > double:
> > >
> > > https://www.objecthunter.net/exp4j/#Custom_functions
> > >
> > > So, that would be an issue to contend with, but technical issues aside
> > and
> > > focusing instead on the user angle, would math() that worked as follows
> > be
> > > a good path?
> > >
> > > gremlin> g.V().values('ages').fold().math(local, "stdev(_)")
> > > ==>0.816
> > > gremlin> g.inject([1,2,3]).math(local, "product(_)")
> > > ==>6
> > >
> > > And then, what distinction would there be between a math() step and first
> > > class "math steps" like sum(), min(), max(), and mean()? in other words,
> > > why would those exist if math() could already do it all? What makes a
> > math
> > > operation "common" enough to beget its own first class representation?
> > >
> > > Just to be clear, I'm not saying we shouldn't add stdev()/percentile() -
> > I
> > > just want to consider all the design possibilities and talk them through.
> > > Thanks again for bringing up this conversation. I will link this thread
> > to
> > > your JIRA for reference.
> > >
> > >
> > > On Wed, Dec 9, 2020 at 6:40 AM js guo <jsguo8...@gmail.com> wrote:
> > >
> > > > Hi team,
> > > >
> > > > We are using tinkerpop Gremlin in our risk detection cases. Some
> > analytical
> > > > calculations are used frequently, yet there is no corresponding steps
> > in
> > > > hand.
> > > >
> > > > I am thinking that some general analytical steps can be added in
> > Gremlin.
> > > > e.g. steps to calculate standard deviation and percentile. The example
> > > > usage might be as follows.
> > > > --------------------------------
> > > > gremlin> g.V().values('ages')
> > > > ==>1
> > > > ==>2
> > > > ==>3
> > > > gremlin> g.V().values('ages').stdev()
> > > > ==>0.816
> > > > gremlin> g.V().values('ages').fold().stdev(Scope.local)
> > > > ==>0.816
> > > >
> > > > gremlin> g.V().values('ages').percentile(50)
> > > > ==>2
> > > > // one percentile, return single value
> > > > gremlin> g.V().values('ages').percentile(0, 100)
> > > > ==>[0: 1, 100: 3]
> > > > // multiple percentiles, return a map
> > > > --------------------------------
> > > >
> > > > Sorry for not emailing earlier, I have created a JIRA ticket for this
> > > > https://issues.apache.org/jira/browse/TINKERPOP-2487.
> > > >
> > > > As new steps are already used in our cases, we are glad to offer the
> > > > implementation for review, if you think it good to add the two steps.
> > > >
> > > > Regards,
> > > > Junshi
> > > >
> > >
> >
> 

Reply via email to