Hello,
I thought I'd drop a note to this list about something I've been working on
for a bit. From time to time, I get new Neo4j databases from colleagues.
Because the notion of a schema is somewhat loose in Neo4j, they often
don't have documentation on the graph model for their ad-hoc
database. So I had taken to running a few "forensic queries" on a new
database to figure a few things out about it before I jumped in and started
querying.
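To give a flavour of what such forensic queries can look like, here is a minimal sketch. It is an illustration only and is not taken from neoprofiler: the helper name `forensic_queries` and the particular queries are assumptions. The function only builds Cypher strings and never runs them.

```python
# Hypothetical helper (not part of neoprofiler): builds a few exploratory
# Cypher queries as plain strings. Nothing here connects to a database;
# hand the strings to whatever Neo4j client or shell you normally use.

def forensic_queries(label=None):
    """Return a dict of named Cypher queries for a first look at a graph."""
    queries = {
        # Which label combinations occur on nodes?
        "labels": "MATCH (n) RETURN DISTINCT labels(n) AS node_labels",
        # Which relationship types occur?
        "relationship_types": "MATCH ()-[r]->() RETURN DISTINCT type(r) AS rel_type",
        # How big is the graph?
        "node_count": "MATCH (n) RETURN count(n) AS total",
    }
    if label is not None:
        if "`" in label:
            # A backtick would break the quoting used below.
            raise ValueError("label must not contain a backtick")
        # Backtick quoting allows labels with spaces or punctuation.
        queries["sample_nodes"] = "MATCH (n:`%s`) RETURN n LIMIT 10" % label
    return queries


if __name__ == "__main__":
    for name, query in sorted(forensic_queries("Provenance").items()):
        print(name + ": " + query)
```

Note that the first three queries scan the whole graph, so on a large database it may be worth adding a `LIMIT` before running them.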
In the last week, I wrapped a lot of those up in a simple profiling
application. You can run it against any Neo4j data directory (tested mostly
with 2.0 databases). It builds a short report in JSON, Markdown, or
HTML that describes the node labels, constraints, relationship types,
sample properties, and so on. It samples a bit of data and tries to
infer the data type of each property and whether or not it's required. In
a way, it does very basic data profiling on unfamiliar Neo4j
databases and packages the result into a file that can serve as
automatically generated schema documentation of sorts for the lazy Neo4j db
admin (which describes many of us, I think).
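The sampling idea can be sketched roughly as follows. This is not the tool's actual code: the function name `profile_properties` is hypothetical, the samples are assumed to arrive as plain dicts of property key to value, and types are reported using Python's type names rather than the Java names that appear in the report.

```python
from collections import defaultdict

# Hypothetical sketch (not neoprofiler's implementation): given property maps
# from a sample of nodes, guess each property's type and whether it looks
# required (present on every sampled node) or optional.

def profile_properties(samples):
    """Return {key: (type_name, "required" | "optional")} for sampled nodes."""
    seen_types = defaultdict(set)  # key -> type names observed for that key
    counts = defaultdict(int)      # key -> number of samples carrying the key

    for props in samples:
        for key, value in props.items():
            seen_types[key].add(type(value).__name__)
            counts[key] += 1

    profile = {}
    for key, types in seen_types.items():
        # Conflicting observations are reported as "mixed" rather than guessed.
        type_name = next(iter(types)) if len(types) == 1 else "mixed"
        presence = "required" if counts[key] == len(samples) else "optional"
        profile[key] = (type_name, presence)
    return profile


if __name__ == "__main__":
    sampled = [
        {"oid": "a1", "created": 1399540571000, "content": "x"},
        {"oid": "a2", "created": 1399540572000},
    ]
    for key, summary in sorted(profile_properties(sampled).items()):
        print(key, summary)
```

In a sketch like this, "required" only means the key was present on every node in the sample, so a small sample can mislabel a property that is in fact optional.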
The code is here: https://github.com/moxious/neoprofiler
Check the README on GitHub for instructions on how to run it. If you want
to see what the results look like, see the attached sample
report describing a Neo4j database I work with often.
Thanks,
David
Title: Profile of /home/x/provenance.db/ generated 2014/05/08 09:16:11
# Profile of /home/x/provenance.db/ generated 2014/05/08 09:16:11
/home/x/provenance.db/
## Schema
*Information about Neo4J's database schema*
### Indexes
* On nodes labeled Provenance. Property keys:
    * oid
* On nodes labeled Actor. Property keys:
    * aid
* On nodes labeled PrivilegeClass. Property keys:
    * pid
* On nodes labeled NonProvenance. Property keys:
    * npid
### Non-Index Constraints
* Type UNIQUENESS On nodes labeled Provenance. Property keys:
    * oid
* Type UNIQUENESS On nodes labeled Actor. Property keys:
    * aid
* Type UNIQUENESS On nodes labeled PrivilegeClass. Property keys:
    * pid
* Type UNIQUENESS On nodes labeled NonProvenance. Property keys:
    * npid
### Observations
* Run Time (ms): 3
## Nodes
*Summary statistics about nodes in the graph*
### Observations
* Node Labels:
    * [Provenance]
    * [Actor]
    * [PrivilegeClass]
* Run Time (ms): 1417
* Total Nodes: 13266
## Relationships
*Summary statistics about relationships in the graph*
### Observations
* Available Relationship Types:
    * dominates
    * owns
    * triggered
    * controlledBy
    * contributed
    * generated
    * input to
* Run Time (ms): 826
* Total Relationships: 11948
## Label 'Provenance'
*Profile of nodes labeled 'Provenance'*
### Observations
* Inbound relationship types:
    * triggered
    * contributed
    * generated
    * input to
    * owns
* Outbound relationship types:
    * triggered
    * controlledBy
    * contributed
    * input to
    * generated
* Run Time (ms): 636
* Sample properties:
    * when_end (String) optional
    * content (String) optional
    * type (String) required
    * SGFs (String[]) optional
    * certainty (String) required
    * created (Long) required
    * when_start (String) optional
    * oid (String) required
    * activity (String) optional
    * subtype (String) required
    * workflow (String) optional
    * name (String) required
    * ownerid (String) required
    * metadata:implus:motif (String) optional
* Total nodes: 13225
## Label 'Actor'
*Profile of nodes labeled 'Actor'*
### Observations
* Inbound relationship types:
    * N/A
* Outbound relationship types:
    * owns
* Run Time (ms): 85
* Sample properties:
    * displayName (String) optional
    * aid (String) optional
    * type (String) optional
    * name (String) optional
    * created (Long) optional
* Total nodes: 25
## Label 'PrivilegeClass'
*Profile of nodes labeled 'PrivilegeClass'*
### Observations
* Inbound relationship types:
    * dominates
    * controlledBy
* Outbound relationship types:
    * dominates
* Run Time (ms): 68
* Sample properties:
    * created (Long) optional
    * name (String) optional
    * description (String) optional
    * pid (String) optional
    * type (String) optional
* Total nodes: 16
## Relationships Typed 'dominates'
*Profile of relationships of type 'dominates'*
### Observations
* Run Time (ms): 292
* Sample properties:
* Total relationships: 16
* domain:
    * PrivilegeClass
* range:
    * PrivilegeClass
## Relationships Typed 'owns'
*Profile of relationships of type 'owns'*
### Observations
* Run Time (ms): 170
* Sample properties:
* Total relationships: 77
* domain:
    * Actor
* range:
    * Provenance
## Relationships Typed 'triggered'
*Profile of relationships of type 'triggered'*
### Observations
* Run Time (ms): 111
* Sample properties:
    * workflow (String) required
* Total relationships: 1257
* domain:
    * Provenance
* range:
    * Provenance
## Relationships Typed 'controlledBy'
*Profile of relationships of type 'controlledBy'*
### Observations
* Run Time (ms): 86
* Sample properties:
* Total relationships: 178
* domain:
    * Provenance
* range:
    * PrivilegeClass
## Relationships Typed 'contributed'
*Profile of relationships of type 'contributed'*
### Observations
* Run Time (ms): 92
* Sample properties:
    * workflow (String) required
* Total relationships: 483
* domain:
    * Provenance
* range:
    * Provenance
## Relationships Typed 'generated'
*Profile of relationships of type 'generated'*
### Observations
* Run Time (ms): 117
* Sample properties:
    * workflow (String) required
* Total relationships: 2911
* domain:
    * Provenance
* range:
    * Provenance
## Relationships Typed 'input to'
*Profile of relationships of type 'input to'*
### Observations
* Run Time (ms): 139
* Sample properties:
    * workflow (String) required
* Total relationships: 7026
* domain:
    * Provenance
* range:
    * Provenance