[jira] [Created] (TINKERPOP-2417) Add possibility to supply io step with InputStream instead of a file path

2020-09-07 Thread Oleksandr Porunov (Jira)
Oleksandr Porunov created TINKERPOP-2417:


 Summary: Add possibility to supply io step with InputStream 
instead of a file path
 Key: TINKERPOP-2417
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2417
 Project: TinkerPop
  Issue Type: New Feature
  Components: io
Reporter: Oleksandr Porunov


Currently the *io* step accepts only a file path, so it can read only from the 
file system.

It would be very convenient to be able to read from an input stream 
instead of a file path. 

For example, it could look something like:

*g.io(graphSonInputStream).with(IO.reader, IO.graphson).read().iterate()*

Of course, this cannot work with *write* in place of read, because an input 
stream cannot be written to. We can therefore fail fast if the user supplies 
an input stream to the *io* step and then calls *write*.
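
The fail-fast behavior proposed above could be sketched roughly as follows. This is a minimal illustration only: `IoSourceSketch`, `ofPath`, `ofStream`, `read` and `write` are hypothetical names invented for this example and are not part of the TinkerPop API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of an io source backed by either a file path or an InputStream. */
public class IoSourceSketch {
    private final String path;        // set when created from a file path
    private final InputStream stream; // set when created from a stream

    private IoSourceSketch(String path, InputStream stream) {
        this.path = path;
        this.stream = stream;
    }

    public static IoSourceSketch ofPath(String path) { return new IoSourceSketch(path, null); }

    public static IoSourceSketch ofStream(InputStream stream) { return new IoSourceSketch(null, stream); }

    /** Reads the whole stream as text; path-based reading is left out of this sketch. */
    public String read() {
        if (stream == null)
            throw new UnsupportedOperationException("path-based reading omitted in this sketch");
        try {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fails fast for stream sources, because an InputStream cannot be written to. */
    public void write(String data) {
        if (stream != null)
            throw new IllegalStateException("write() is not supported for an InputStream source");
        // Writing `data` to `path` would happen here.
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("{}".getBytes(StandardCharsets.UTF_8));
        IoSourceSketch source = ofStream(in);
        System.out.println(source.read());
        try {
            source.write("{}");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check happens as soon as `write` is called, before any data is touched, so a misuse is reported immediately rather than midway through an export.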



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TINKERPOP-2417) Add possibility to supply io step with InputStream instead of a file path

2020-09-10 Thread Oleksandr Porunov (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193539#comment-17193539
 ] 

Oleksandr Porunov commented on TINKERPOP-2417:
--

[~spmallette] graph.io(Builder) allows creating a custom reader / writer which 
can be supplied with an InputStream or OutputStream.

I.e. something like:

graph.io(IoCore.graphson()).reader().create().readGraph(graphSonInputStream, 
graph)

 

The deprecation notice says to use GraphTraversalSource.io(String), which is 
not equivalent: it expects a file system path, which can be very inconvenient.

Examples:
 * The user receives an InputStream of GraphSON but doesn't have write 
access to the underlying file system to write a temporary GraphSON file and 
remove it after it has been ingested into the graph.
 * Server disks don't have enough capacity to write a temporary file for 
ingestion.
 * Server disks are too slow, which limits ingestion performance.

 

Would it be better to undeprecate graph.io(Builder) until there is a solution 
which replaces it? Is it planned to remove graph.io(Builder) soon?
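
For illustration, the temporary-file workaround that the examples above refer to might look roughly like the sketch below. `TempFileSpool` and `withTempFile` are hypothetical names, and the `loader` callback is only a placeholder for whatever path-based call is used (for example one built on GraphTraversalSource.io(String)); the `.json` suffix is likewise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

public class TempFileSpool {
    /**
     * Copies the stream to a temporary file, passes the file path to the loader,
     * and deletes the file afterwards. Needs write access and enough disk space.
     */
    public static <T> T withTempFile(InputStream in, Function<String, T> loader) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("graph-", ".json");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return loader.apply(tmp.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup; nothing more can be done here.
                }
            }
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("{\"vertices\":[]}".getBytes(StandardCharsets.UTF_8));
        // The lambda stands in for a path-based load of the spooled file.
        long size = withTempFile(in, path -> new File(path).length());
        System.out.println("spooled bytes: " + size);
    }
}
```

Every byte is written to disk and read back once, which is exactly the overhead and the file system dependency that a stream-based reader avoids.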

> Add possibility to supply io step with InputStream instead of a file path
> -
>
> Key: TINKERPOP-2417
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2417
> Project: TinkerPop
>  Issue Type: Improvement
>  Components: io
>Reporter: Oleksandr Porunov
>Priority: Major
>
> Currently the *io* step accepts only a file path, so it can read only from 
> the file system.
> It would be very convenient to be able to read from an input stream 
> instead of a file path. 
> For example, it could look something like:
> *g.io(graphSonInputStream).with(IO.reader, IO.graphson).read().iterate()*
> Of course, this cannot work with *write* in place of read, because an input 
> stream cannot be written to. We can therefore fail fast if the user supplies 
> an input stream to the *io* step and then calls *write*.





[jira] [Created] (TINKERPOP-2633) Support Gremlin Console on Java 17

2021-10-28 Thread Oleksandr Porunov (Jira)
Oleksandr Porunov created TINKERPOP-2633:


 Summary: Support Gremlin Console on Java 17
 Key: TINKERPOP-2633
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2633
 Project: TinkerPop
  Issue Type: New Feature
  Components: console
 Environment: Java 17
Reporter: Oleksandr Porunov


Currently Gremlin Console fails to start on Java 17 with the following error:

java.lang.IllegalArgumentException: Unsupported class file major version 61

 

It would be great if Gremlin Console supported Java 17, as it is the newest LTS 
release after Java 11 and people will most likely start switching to it 
sooner or later.
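
As background on the error message: class file major version 61 corresponds to Java 17 (55 is Java 11), and a message of this kind typically comes from a bytecode-reading library that predates that class file version. The sketch below only shows how the number is read from a class file header; `ClassFileVersion` and its methods are hypothetical helper names, not part of Gremlin Console.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileVersion {
    /** Returns the major version from a class file header (magic, minor, major). */
    public static int majorVersion(InputStream classBytes) {
        try (DataInputStream in = new DataInputStream(classBytes)) {
            if (in.readInt() != 0xCAFEBABE)
                throw new IllegalArgumentException("not a class file");
            in.readUnsignedShort();        // minor version, not needed here
            return in.readUnsignedShort(); // major version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** For Java 5 and later: feature release = major version - 44 (55 -> 11, 61 -> 17). */
    public static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        // First eight bytes of a class file compiled for Java 17.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 61};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println("major " + major + " = Java " + javaRelease(major)); // prints "major 61 = Java 17"
    }
}
```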





[jira] [Updated] (TINKERPOP-2633) Support Gremlin Console on Java 17

2021-10-28 Thread Oleksandr Porunov (Jira)


 [ 
https://issues.apache.org/jira/browse/TINKERPOP-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Porunov updated TINKERPOP-2633:
-
Affects Version/s: 3.5.1

> Support Gremlin Console on Java 17
> --
>
> Key: TINKERPOP-2633
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2633
> Project: TinkerPop
>  Issue Type: New Feature
>  Components: console
>Affects Versions: 3.5.1
> Environment: Java 17
>Reporter: Oleksandr Porunov
>Priority: Major
>
> Currently Gremlin Console fails to start on Java 17 with the following 
> error:
> java.lang.IllegalArgumentException: Unsupported class file major version 61
>  
> It would be great if Gremlin Console supported Java 17, as it is the newest LTS 
> release after Java 11 and people will most likely start switching to it 
> sooner or later.





[jira] [Created] (TINKERPOP-2924) Refactor PropertyMapStep to be able to overwrite map method

2023-04-14 Thread Oleksandr Porunov (Jira)
Oleksandr Porunov created TINKERPOP-2924:


 Summary: Refactor PropertyMapStep to be able to overwrite map 
method
 Key: TINKERPOP-2924
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2924
 Project: TinkerPop
  Issue Type: Improvement
  Components: driver
Affects Versions: 3.6.2
Reporter: Oleksandr Porunov


We would like to extend `PropertyMapStep` and override some of its 
functionality (in the `map` method), so that we can leverage the multi-query 
optimization in JanusGraph for `PropertyMapStep`. 

Unfortunately, some of its utility methods and fields are `private`. Thus, we 
would have to duplicate the `includeToken` logic, and we have no access to 
`traversalRing` because it is private and is created inside the constructor.

I would suggest making all of those private fields / methods protected 
(similar to how it's done in `PropertiesStep`), so that they can be 
overridden without duplicating logic.
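
To illustrate the request, here is a minimal, self-contained sketch of how protected members let a subclass replace `map` while reusing the inherited helper. All class, field and method names below (`BaseMapStep`, `BatchedMapStep`, `addTokens`, `fetchAll`) are hypothetical stand-ins and do not mirror the real `PropertyMapStep`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProtectedMembersSketch {
    /** Stand-in for a library step whose helpers are protected rather than private. */
    public static class BaseMapStep {
        protected final boolean includeTokens;

        public BaseMapStep(boolean includeTokens) { this.includeTokens = includeTokens; }

        /** Helper that a subclass can reuse instead of copying it. */
        protected void addTokens(Map<String, Object> out, String id) {
            if (includeTokens) out.put("id", id);
        }

        protected Object fetch(String id, String key) { return key + "@" + id; }

        public Map<String, Object> map(String id, List<String> keys) {
            Map<String, Object> out = new LinkedHashMap<>();
            addTokens(out, id);
            for (String key : keys) out.put(key, fetch(id, key)); // one lookup per key
            return out;
        }
    }

    /** Provider subclass: one batched lookup instead of one lookup per key. */
    public static class BatchedMapStep extends BaseMapStep {
        public int batchCalls = 0;

        public BatchedMapStep(boolean includeTokens) { super(includeTokens); }

        @Override
        public Map<String, Object> map(String id, List<String> keys) {
            Map<String, Object> out = new LinkedHashMap<>();
            addTokens(out, id);             // inherited helper, no duplicated logic
            out.putAll(fetchAll(id, keys)); // single batched call
            return out;
        }

        private Map<String, Object> fetchAll(String id, List<String> keys) {
            batchCalls++;
            Map<String, Object> values = new LinkedHashMap<>();
            for (String key : keys) values.put(key, key + "@" + id);
            return values;
        }
    }

    public static void main(String[] args) {
        List<String> keys = List.of("name", "age");
        BatchedMapStep batched = new BatchedMapStep(true);
        System.out.println(new BaseMapStep(true).map("v1", keys));
        System.out.println(batched.map("v1", keys) + " batch calls: " + batched.batchCalls);
    }
}
```

If `addTokens` and `includeTokens` were private, `BatchedMapStep` would have to carry its own copy of that logic, which is the duplication described above.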



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (TINKERPOP-2927) Make all Steps extensible and overridable

2023-04-18 Thread Oleksandr Porunov (Jira)
Oleksandr Porunov created TINKERPOP-2927:


 Summary: Make all Steps extensible and overridable 
 Key: TINKERPOP-2927
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2927
 Project: TinkerPop
  Issue Type: Improvement
  Components: driver
Affects Versions: 3.6.2
Reporter: Oleksandr Porunov


Related issue (fixed): https://issues.apache.org/jira/browse/TINKERPOP-2924

 

Working on optimization strategies sometimes requires replacing steps with 
extended versions of those steps. At the moment, not all steps can be extended 
because some are `final` (like `ProjectStep`, `PropertyKeyStep`, 
`PropertyValueStep`, `RangeLocalStep`, `SumLocalStep`, and many more). Thus, one 
has to create a similar step and duplicate some logic there instead of 
simply extending the specific step.

 

For those steps which are non-final, there are sometimes private fields without 
any getter methods (for example, `private CallbackRegistry callbackRegistry` 
in `DropStep`; the caller needs to use the Reflection API to retrieve its 
value).
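
The reflection workaround mentioned above looks roughly like the sketch below. `SealedStep` and its `registry` field are hypothetical stand-ins for a step with a private field and no getter; only the `java.lang.reflect` calls are real API.

```java
import java.lang.reflect.Field;

public class PrivateFieldAccessSketch {
    /** Stand-in for a step that keeps a collaborator in a private field without a getter. */
    public static class SealedStep {
        private final String registry = "callbacks";
    }

    /** Reads a private field by name; a renamed field is detected only at run time. */
    public static Object readPrivate(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypasses the access check
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPrivate(new SealedStep(), "registry"));
    }
}
```

A protected field or a getter would give the same access with compile-time checking, which is why reflection is the less attractive option here.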

 

In JanusGraph we replace some steps with extended versions of those steps. 

For example, we completely override the `flatMap` method of `PropertiesStep`, 
which is an anti-pattern, but when it's hard to extend specific parts of the 
logic such an anti-pattern might be a good solution, I guess.

 

I think it would make sense to let graph developers extend any step and have 
access to its fields / utility methods.

In that case we could do something similar with `ProjectStep` and make it query 
data in parallel (see issue: [https://github.com/JanusGraph/janusgraph/issues/3559] ).

 

I'm also fine with not doing this if anyone can suggest other patterns to follow 
for those optimizations instead of overriding logic. 





[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2023-04-23 Thread Oleksandr Porunov (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715490#comment-17715490
 ] 

Oleksandr Porunov commented on TINKERPOP-2490:
--

This bug still exists in TinkerPop (checked version 3.6.2). The test below 
demonstrates the unexpected behavior:
{code:java}
@Test
public void testLimitedFilterNotChecksElementsOverTheLimit(){
    TinkerGraph tinkerGraph = TinkerGraph.open();
    List<Vertex> vertices = new ArrayList<>();
    // Create three vertices, each with one outgoing "connects" edge.
    for(int i=0; i<3; i++){
        Vertex v1 = tinkerGraph.addVertex();
        Vertex v2 = tinkerGraph.addVertex();
        v1.addEdge("connects", v2);
        vertices.add(v1);
    }
    tinkerGraph.traversal().V(vertices.get(0), vertices.get(1), vertices.get(2))
        .where(__.out("connects").count().is(P.gte(1))).limit(1).toList();

    TraversalMetrics traversalMetrics = tinkerGraph.traversal()
        .V(vertices.get(0), vertices.get(1), vertices.get(2))
        .where(__.out("connects").count().is(P.gte(1)))
        .limit(1)
        .profile().next();

    // Number of traversers that reached the where() filter step.
    Long filterTraversalCount = traversalMetrics.getMetrics().stream()
        .filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName()))
        .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID);

    Assertions.assertEquals(1, filterTraversalCount);
} {code}
In the above test, the filter first checks the first provided vertex (which runs 
some potentially expensive check); then, although the limit is already 
satisfied, the check is executed again for the second provided vertex (even 
though it's unnecessary). The 3rd vertex is not evaluated.

Thus, basically, we always execute this computation for 1 more vertex than 
necessary.

> RangeGlobalStep touches next traverser when high limit is already hit
> -
>
> Key: TINKERPOP-2490
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.4.8
>Reporter: Guo Junshi
>Priority: Major
>
> In FilterStep, the processNextStart() method first retrieves the next 
> traverser and then applies the filtering logic. But for RangeGlobalStep, if 
> the high limit is already hit, there is no need to get the next traverser.
> {code:java}
> @Override
> protected Traverser.Admin<S> processNextStart() {
>     while (true) {
>         final Traverser.Admin<S> traverser = this.starts.next();
>         if (this.filter(traverser))
>             return traverser;
>     }
> }
> {code}
> An example would be the limit step: g.V().limit(1). This query will touch 2 
> vertices although only 1 vertex will be returned.
> This extra data loading degrades performance if loading data from the DB is 
> involved. It is not a functional bug, but for better performance we should 
> check the high range limit first, before touching the next traverser.
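
The effect described in this issue can be reproduced with a simplified pull-based model. This is not TinkerPop's implementation: `LimitPullSketch`, `countingFilter` and `limit` are hypothetical names, and plain Java iterators stand in for steps. The `checkFirst` flag switches between pulling from upstream before testing the limit and testing the limit first.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class LimitPullSketch {
    /** Upstream filter stage that counts how many elements it evaluates. */
    public static <T> Iterator<T> countingFilter(Iterator<T> src, Predicate<T> p, AtomicInteger evaluated) {
        return new Iterator<T>() {
            private T next;
            private boolean ready;

            public boolean hasNext() {
                while (!ready && src.hasNext()) {
                    T candidate = src.next();
                    evaluated.incrementAndGet(); // the potentially expensive check
                    if (p.test(candidate)) { next = candidate; ready = true; }
                }
                return ready;
            }

            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return next;
            }
        };
    }

    /** Limit stage; with checkFirst == false it pulls from upstream before testing the limit. */
    public static <T> Iterator<T> limit(Iterator<T> upstream, long high, boolean checkFirst) {
        return new Iterator<T>() {
            private long seen;
            private T next;
            private boolean ready;

            public boolean hasNext() {
                if (ready) return true;
                if (checkFirst && seen >= high) return false; // stop without touching upstream
                if (!upstream.hasNext()) return false;        // forces an upstream evaluation
                T candidate = upstream.next();
                if (seen >= high) return false;               // limit noticed only after pulling
                seen++;
                next = candidate;
                ready = true;
                return true;
            }

            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return next;
            }
        };
    }

    /** Runs filter(always true) followed by limit(1) over three elements. */
    public static int evaluations(boolean checkFirst) {
        AtomicInteger evaluated = new AtomicInteger();
        Iterator<Integer> it = limit(
                countingFilter(Arrays.asList(1, 2, 3).iterator(), x -> true, evaluated), 1, checkFirst);
        while (it.hasNext()) it.next();
        return evaluated.get();
    }

    public static void main(String[] args) {
        System.out.println("pull then check: " + evaluations(false)); // prints "pull then check: 2"
        System.out.println("check then pull: " + evaluations(true));  // prints "check then pull: 1"
    }
}
```

In this model the extra evaluation disappears as soon as the limit is tested before the upstream pull, and the third element is never touched in either variant.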





[jira] [Comment Edited] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2023-04-23 Thread Oleksandr Porunov (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715490#comment-17715490
 ] 

Oleksandr Porunov edited comment on TINKERPOP-2490 at 4/23/23 4:25 PM:
---

This bug still exists in TinkerPop (checked version 3.6.2). The test below 
demonstrates the unexpected behavior:
{code:java}
@Test
public void testLimitedFilterNotChecksElementsOverTheLimit(){
    TinkerGraph tinkerGraph = TinkerGraph.open();
    List<Vertex> vertices = new ArrayList<>();
    // Create three vertices, each with one outgoing "connects" edge.
    for(int i=0; i<3; i++){
        Vertex v1 = tinkerGraph.addVertex();
        Vertex v2 = tinkerGraph.addVertex();
        v1.addEdge("connects", v2);
        vertices.add(v1);
    }
    tinkerGraph.traversal().V(vertices.get(0), vertices.get(1), vertices.get(2))
        .where(__.out("connects").count().is(P.gte(1))).limit(1).toList();

    TraversalMetrics traversalMetrics = tinkerGraph.traversal()
        .V(vertices.get(0), vertices.get(1), vertices.get(2))
        .where(__.out("connects").count().is(P.gte(1)))
        .limit(1)
        .profile().next();

    // Number of traversers that reached the where() filter step.
    Long filterTraversalCount = traversalMetrics.getMetrics().stream()
        .filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName()))
        .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID);

    Assertions.assertEquals(1, filterTraversalCount);
} {code}
In the above test, the filter first checks the first provided vertex (which runs 
some potentially expensive check); then, although the limit is already 
satisfied, the check is executed again for the second provided vertex (even 
though it's unnecessary). The 3rd vertex is not evaluated.

Thus, basically, we always execute this computation for 1 more vertex than 
necessary.

The above assertEquals will throw:

 
{code:java}
org.opentest4j.AssertionFailedError: 
Expected :1
Actual   :2 {code}
 

 








[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

2023-04-23 Thread Oleksandr Porunov (Jira)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715539#comment-17715539
 ] 

Oleksandr Porunov commented on TINKERPOP-2490:
--

Maybe I'm missing something, but I doubt that it's `CountStrategy`. Even without 
the `count` step at all (let's replace it with a `has` step, for example) the 
behavior will be the same: the `has` step will be executed for 2 vertices even 
though the `limit` step says that `1` is enough. I may be missing some logic 
behind those step usages, but in my understanding the filter query should be 
executed for a single vertex if that vertex matches the filter and the limit is 
reached.

I.e. for `V(v1, v2, v3).where(${some_filter_which_is_always_true}).limit(1)`, in 
my testing `some_filter_which_is_always_true` is executed 2 times, but I would 
expect it to be executed only once because the limit says that we don't need 
more than 1 element.

As for the count query above, notice that I use `gte` and not `gt`. Thus the 
count should be greater than or equal to 1, and `1` satisfies the requirement. 
I don't see the point of checking a second element to find out that the count 
is larger when a count equal to 1 satisfies the requirement as well. 
Nevertheless, my issue isn't with the count query, but with the filter step, 
which is executed more often than needed (from my point of view).
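
The `gte` versus `gt` argument above can be illustrated with a small short-circuiting counter. This is only a sketch of the reasoning, not of how TinkerPop evaluates `count().is(...)`; `AtLeastSketch` and `atLeast` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

public class AtLeastSketch {
    /** Returns true if the iterator yields at least n elements, pulling no more than n. */
    public static <T> boolean atLeast(Iterator<T> it, long n, AtomicInteger pulled) {
        long count = 0;
        while (count < n && it.hasNext()) {
            it.next();
            pulled.incrementAndGet();
            count++;
        }
        return count >= n;
    }

    public static void main(String[] args) {
        AtomicInteger pulled = new AtomicInteger();
        // "count >= 1" is decided after one element; "count > 1" would need n = 2.
        boolean ok = atLeast(Arrays.asList("e1", "e2", "e3").iterator(), 1, pulled);
        System.out.println(ok + " after pulling " + pulled.get() + " element(s)");
    }
}
```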

Here is a simplified version of the test:
{code:java}
@Test
public void testLimitedFilterNotChecksElementsOverTheLimit(){
    TinkerGraph tinkerGraph = TinkerGraph.open();
    List<Vertex> vertices = new ArrayList<>();
    // Create three vertices, each with one outgoing "connects" edge.
    for(int i=0; i<3; i++){
        Vertex v1 = tinkerGraph.addVertex();
        Vertex v2 = tinkerGraph.addVertex();
        v1.addEdge("connects", v2);
        vertices.add(v1);
    }

    TraversalMetrics traversalMetrics = tinkerGraph.traversal()
        .V(vertices.get(0), vertices.get(1), vertices.get(2))
        .where(__.inject(true))
        .limit(1)
        .profile().next();

    // Number of traversers that reached the where() filter step.
    Long filterTraversalCount = traversalMetrics.getMetrics().stream()
        .filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName()))
        .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID);

    Assertions.assertEquals(1, filterTraversalCount);
} {code}
 



