Hi Aru

Firstly, as a side note, the issue to which you link highlights a problem
with the deb/rpm packages for some versions of Neo4j. The tarballs are fine
and py2neo is definitely compatible with all versions from 1.8 upwards.

Now to your main question.

Neo4j is optimised for efficient graph traversal but, in your case, you are
not really using any of this capability. You are instead attempting to
fetch a set of unconnected nodes based on a single property so, whichever
way you spin it, it's not really a "graphy" query. It is in fact much more
the kind of query you'd put to an RDBMS, so I think this is more a problem
of data modelling than of the query language itself.

So, one option would be to rework your data model a little to add some
(more) relationships. As I don't know much about your data model or the
criteria for ETL selection, it's hard to say exactly what might be useful
in your case. But, if you can group or segment your entity nodes then you
can probably reduce the amount of data scanned by each run of your
extraction query. Your query might then look something like this:

MATCH (g:EntityGroup)-[:MEMBER]->(n:Entity)
WHERE n.uid IN ["uid001", "uid002" ... "uid500"]
AND g.group_name IN ["widgets", "things"]
RETURN n ORDER BY ID(n) ASC

This kind of approach would then only search a subset of your graph data
and should speed up the query. Of course, if your extraction criteria are
arbitrary or very variable then this option may be less viable.

Incidentally, you are ordering by ID(n) in the RETURN clause. I'd generally
recommend against using node IDs in any part of your domain logic, as they
are an internal artifact: for example, the IDs of deleted nodes can be
reused, so they are not stable identifiers.
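As a rough sketch of how an ETL script might apply both suggestions, the Python below builds the grouped query for batches of up to 500 uids and orders by n.uid instead of ID(n). The EntityGroup label, MEMBER relationship and group_name property are the same hypothetical names used in the example query above, and the function names are made up for illustration. It only produces query text, so you would still pass each string to py2neo (or whichever client you use) yourself. It also assumes that JSON-style double-quoted strings are acceptable Cypher string literals for your uid values.

```python
import json
from typing import Iterator, Sequence


def cypher_string_list(values: Sequence[str]) -> str:
    # Render a Cypher list literal such as ["uid001", "uid002"].
    # Assumption: json.dumps quoting is acceptable for these simple strings.
    return "[" + ", ".join(json.dumps(v) for v in values) + "]"


def grouped_entity_queries(
    uids: Sequence[str],
    group_names: Sequence[str],
    batch_size: int = 500,
) -> Iterator[str]:
    """Yield one grouped MATCH query per batch of at most batch_size uids.

    Hypothetical helper: EntityGroup, MEMBER and group_name are the
    illustrative names from the example query, not an existing schema.
    """
    groups = cypher_string_list(group_names)
    for start in range(0, len(uids), batch_size):
        batch = uids[start:start + batch_size]
        yield (
            "MATCH (g:EntityGroup)-[:MEMBER]->(n:Entity)\n"
            f"WHERE n.uid IN {cypher_string_list(batch)}\n"
            f"AND g.group_name IN {groups}\n"
            # Order by the domain key rather than the internal ID(n).
            "RETURN n ORDER BY n.uid ASC"
        )


if __name__ == "__main__":
    uids = [f"uid{i:03d}" for i in range(1, 1201)]
    for query in grouped_entity_queries(uids, ["widgets", "things"]):
        print(query.splitlines()[0], "...", len(query), "chars")
```

With 1,200 uids this yields three queries, the first covering "uid001" to "uid500".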

Nigel


On 13 October 2014 22:33, Aru Sahni <arusa...@gmail.com> wrote:

> Hi,
>
> Disclaimer: I'm using Neo4j 2.0.3. I know that this is far from the most
> recent version, but a bug affecting the latest stable py2neo with 2.1.x
> builds has me stuck on this release
> <https://groups.google.com/forum/#!topic/neo4j/-eqzLPxk0DI>.
>
> I'm writing an ETL script that needs to retrieve around 500 nodes per
> request. My nodes have a `uid` field that is indexed and has a
> uniqueness constraint.
>
> :Entity(uid)
>
> To get these nodes, I'm issuing the following query:
>
> MATCH (n:Entity)
> WHERE n.uid IN ["uid001", "uid002" ... "uid500"]
> RETURN n ORDER BY ID(n) ASC;
>
> This takes quite a bit of time. Running this with the profiler indicates
> that it's hitting the database for every filter. Attempting to unroll this
> (i.e. `WHERE n.uid = "uid001" OR n.uid = "uid002"`, etc.) hits the
> database just as heavily. If I try to specify the index with the USING
> statement, I get the following error:
>
> IndexHintException: Cannot use index hint in this context. Index hints
> require using a simple equality comparison in WHERE (either directly or as
> part of a top-level AND).
>
> What I find ends up working is:
>
> MATCH (n:Entity) WHERE n.uid = "uid001" RETURN n
> UNION ALL
> MATCH (n:Entity) WHERE n.uid = "uid002" RETURN n
> ...
> MATCH (n:Entity) WHERE n.uid = "uid500" RETURN n
>
> This is a little too verbose and hacky for my tastes. I was wondering if
> there's anything I can do to improve the performance and reduce the
> complexity of this query.
>
> Regards,
> ~Aru
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to neo4j+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>