Hello,

I thought I'd drop a note to this list about something I've been working 
on for a bit.   From time to time, I get new neo4j databases from colleagues.  
Because the notion of a schema is somewhat loose in neo4j, they often 
don't have documentation per se on the graph model for their ad-hoc 
database.   So I had taken to running a few "forensic queries" on a new 
database to get a feel for its structure before I jumped in and started 
querying.

In the last week, I wrapped a lot of those up in a simple profiling 
application.  You can run it against any neo4j data directory (tested mostly 
with 2.0 databases).  It builds a short report in JSON, Markdown, or 
HTML that describes the node labels, constraints, relationship types, 
sample properties, and so on.  It samples a bit of data and tries to 
infer the data type of each property and whether the property is required.   
In a way, it's an attempt at very basic data profiling on unfamiliar neo4j 
databases, packaging the result into a file that can serve as 
automatically generated schema documentation of sorts for the lazy neo4j db 
admin (which describes many of us, I think).
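
To illustrate the kind of inference described above, here is a minimal Python sketch. It is not taken from neoprofiler; the function name `profile_properties` and the sample data are hypothetical. It assumes a property counts as "required" simply when it appears on every sampled node, and that the reported type is the most frequently observed one. The real tool may decide both differently.

```python
from collections import Counter, defaultdict


def profile_properties(samples):
    """Summarize property types and required/optional status.

    samples: list of dicts, each holding the properties of one sampled node.
    Returns {property_name: (type_name, "required" | "optional")}.
    """
    type_counts = defaultdict(Counter)  # property -> counts of observed type names
    presence = Counter()                # property -> number of samples that have it

    for props in samples:
        for key, value in props.items():
            presence[key] += 1
            type_counts[key][type(value).__name__] += 1

    profile = {}
    for key, counts in type_counts.items():
        # Report the most frequently observed type for this property.
        type_name = counts.most_common(1)[0][0]
        # Assumption: "required" only means "present in every sampled node".
        status = "required" if presence[key] == len(samples) else "optional"
        profile[key] = (type_name, status)
    return profile


# Hypothetical sample, loosely resembling nodes labeled 'Provenance'.
sample_nodes = [
    {"oid": "a1", "name": "report", "created": 1399540571000},
    {"oid": "b2", "name": "query", "created": 1399540572000, "workflow": "w1"},
]

for name, (type_name, status) in sorted(profile_properties(sample_nodes).items()):
    print(f"{name} ({type_name}) {status}")
```

Because the result depends entirely on which nodes were sampled, a property reported as "required" may still be missing on nodes outside the sample.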

The code is here:  https://github.com/moxious/neoprofiler

Check the README on GitHub for instructions on how to run it.   To see 
what the results look like, have a look at the attached sample 
report describing a neo4j database I work with often.

Thanks,
David

Title: Profile of /home/x/provenance.db/ generated 2014/05/08 09:16:11

# Profile of /home/x/provenance.db/ generated 2014/05/08 09:16:11

/home/x/provenance.db/

## Schema

*Information about Neo4J's database schema*

### Indexes

* On nodes labeled Provenance. Property keys:
    * oid
* On nodes labeled Actor. Property keys:
    * aid
* On nodes labeled PrivilegeClass. Property keys:
    * pid
* On nodes labeled NonProvenance. Property keys:
    * npid

### Non-Index Constraints

* Type UNIQUENESS On nodes labeled Provenance. Property keys:
    * oid
* Type UNIQUENESS On nodes labeled Actor. Property keys:
    * aid
* Type UNIQUENESS On nodes labeled PrivilegeClass. Property keys:
    * pid
* Type UNIQUENESS On nodes labeled NonProvenance. Property keys:
    * npid

### Observations

* Run Time (ms): 3

## Nodes

*Summary statistics about nodes in the graph*

### Observations

* Node Labels:
    * [Provenance]
    * [Actor]
    * [PrivilegeClass]
* Run Time (ms): 1417
* Total Nodes: 13266

## Relationships

*Summary statistics about relationships in the graph*

### Observations

* Available Relationship Types:
    * dominates
    * owns
    * triggered
    * controlledBy
    * contributed
    * generated
    * input to
* Run Time (ms): 826
* Total Relationships: 11948

## Label 'Provenance'

*Profile of nodes labeled 'Provenance'*

### Observations

* Inbound relationship types:
    * triggered
    * contributed
    * generated
    * input to
    * owns
* Outbound relationship types:
    * triggered
    * controlledBy
    * contributed
    * input to
    * generated
* Run Time (ms): 636
* Sample properties:
    * when_end (String) optional
    * content (String) optional
    * type (String) required
    * SGFs (String[]) optional
    * certainty (String) required
    * created (Long) required
    * when_start (String) optional
    * oid (String) required
    * activity (String) optional
    * subtype (String) required
    * workflow (String) optional
    * name (String) required
    * ownerid (String) required
    * metadata:implus:motif (String) optional
* Total nodes: 13225

## Label 'Actor'

*Profile of nodes labeled 'Actor'*

### Observations

* Inbound relationship types:
    * N/A
* Outbound relationship types:
    * owns
* Run Time (ms): 85
* Sample properties:
    * displayName (String) optional
    * aid (String) optional
    * type (String) optional
    * name (String) optional
    * created (Long) optional
* Total nodes: 25

## Label 'PrivilegeClass'

*Profile of nodes labeled 'PrivilegeClass'*

### Observations

* Inbound relationship types:
    * dominates
    * controlledBy
* Outbound relationship types:
    * dominates
* Run Time (ms): 68
* Sample properties:
    * created (Long) optional
    * name (String) optional
    * description (String) optional
    * pid (String) optional
    * type (String) optional
* Total nodes: 16

## Relationships Typed 'dominates'

*Profile of relationships of type 'dominates'*

### Observations

* Run Time (ms): 292
* Sample properties:
* Total relationships: 16
* domain:
    * PrivilegeClass
* range:
    * PrivilegeClass

## Relationships Typed 'owns'

*Profile of relationships of type 'owns'*

### Observations

* Run Time (ms): 170
* Sample properties:
* Total relationships: 77
* domain:
    * Actor
* range:
    * Provenance

## Relationships Typed 'triggered'

*Profile of relationships of type 'triggered'*

### Observations

* Run Time (ms): 111
* Sample properties:
    * workflow (String) required
* Total relationships: 1257
* domain:
    * Provenance
* range:
    * Provenance

## Relationships Typed 'controlledBy'

*Profile of relationships of type 'controlledBy'*

### Observations

* Run Time (ms): 86
* Sample properties:
* Total relationships: 178
* domain:
    * Provenance
* range:
    * PrivilegeClass

## Relationships Typed 'contributed'

*Profile of relationships of type 'contributed'*

### Observations

* Run Time (ms): 92
* Sample properties:
    * workflow (String) required
* Total relationships: 483
* domain:
    * Provenance
* range:
    * Provenance

## Relationships Typed 'generated'

*Profile of relationships of type 'generated'*

### Observations

* Run Time (ms): 117
* Sample properties:
    * workflow (String) required
* Total relationships: 2911
* domain:
    * Provenance
* range:
    * Provenance

## Relationships Typed 'input to'

*Profile of relationships of type 'input to'*

### Observations

* Run Time (ms): 139
* Sample properties:
    * workflow (String) required
* Total relationships: 7026
* domain:
    * Provenance
* range:
    * Provenance
