Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread Russ Burkert
The link:
https://tbgraph.wordpress.com/2018/09/09/article-recommendation-system-on-a-citation-network-using-personalized-pagerank-and-neo4j/

has some good info on working with NLP graphs.

On Wed, Oct 10, 2018 at 2:41 PM John Carlo  wrote:

> You could start with the 20 Newsgroups dataset:
> http://qwone.com/~jason/20Newsgroups/
>
> On Wednesday, 10 October 2018 at 17:42:37 UTC+2, Sakshi Srivastva
> wrote:
>>
>> Sir, I am looking for a dataset in which I can find hidden facts, like
>> the Panama Papers leak. Please suggest a similar big dataset.
>>
>> On Wed, Oct 10, 2018 at 7:34 PM John Carlo  wrote:
>>
>>> Hello Michael,
>>>
>>> thank you for your reply.
>>>
>>> I've re-implemented the DB structure using unique word nodes; the
>>> number of nodes dropped from 47,108,544 to 1,934,049.
>>>
>>> I still have a huge number of relationships (45,442,034) that now point
>>> to the unique nodes, and the queries are slow.
>>>
>>> My end goal is to find specific patterns in sentence structures, like
>>> the following example:
>>>
>>> (John)-[ACTION]->(eat)-[SUBJECT]->(apple)
>>>
>>> Any suggestion would be appreciated.
>>>
>>> thank you very much
>>>
>>> On Wednesday, 10 October 2018 at 00:50:22 UTC+2, Michael Hunger
>>> wrote:

 Yes, I would only create every word node once, and then link the
 sentence structures.
 In general, just finding all the word nodes is probably not your
 end goal, is it?

 Best to ask on the Community Site & Forum, in the Modeling and
 Cypher categories.
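A minimal Cypher sketch of this modeling approach (the `:token` label comes from the original post, and the `ACTION`/`SUBJECT` relationship types from John's later example; the `sentenceId` property is an illustrative assumption for keeping sentences apart, not something the thread specifies):

```cypher
// MERGE creates a word node only if no :token node with that text
// exists yet, so each distinct word is stored exactly once.
MERGE (s:token {text: "John"})
MERGE (v:token {text: "eat"})
MERGE (o:token {text: "apple"})

// The sentence structure lives on the relationships; tagging them
// with a sentence id keeps different sentences distinguishable.
MERGE (s)-[:ACTION  {sentenceId: 1}]->(v)
MERGE (v)-[:SUBJECT {sentenceId: 1}]->(o)
```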


 On Tue, Oct 9, 2018 at 11:00 PM John Carlo 
 wrote:

> Hello all,
>
> I've been using Neo4j for some weeks and I think it's awesome.
>
> I'm building an NLP application, and basically, I'm using Neo4j for
> storing the dependency graph generated by a semantic parser, something 
> like
> this:
>
> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>
> In the nodes, I store the single words contained in the sentences, and
> I connect them through relationships of a number of different types.
>
> For my application, I need to find all the nodes that contain a given
> word. Of course, I've already created an index on the word text field.
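For reference, the schema index mentioned here would, in the Neo4j 3.x syntax current at the time of this thread, be created as:

```cypher
// Schema index on the text property of :token nodes; this is what
// enables the NodeIndexSeek operator seen in the PROFILE output below.
CREATE INDEX ON :token(text)
```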
>
> I'm working on a very big dataset (by the way, the CSV importer is a
> great thing).
>
> On my laptop, the following query takes about 20 ms:
> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>
> Here are the details of the graph.db:
> 47,108,544 nodes
> 45,442,034 relationships
> 13.39 GiB db size
> Index created on the token.text field
>
> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
> 
> NodeIndexSeek
> 251,679 db hits
> ---
> Projection
> 251,678 db hits
> --
> ProduceResults
> 251,678 db hits
>
> I wonder if I'm doing something wrong in indexing such a large number
> of nodes. At the moment, I create a new node for each word I encounter
> in the text, even if the text is the same as in other nodes.
>
> Should I create a new node only when a new word is encountered,
> managing the sentence structures through relationships?
>
> Could you please help me with a suggestion or best practice to adopt
> for this specific case? I think that Neo4j is a great piece of software 
> and
> I'd like to make the most out of it :-)
>
> Thank you very much
>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to neo4j+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread John Carlo
You could start with the 20 Newsgroups dataset:
http://qwone.com/~jason/20Newsgroups/



Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread John Carlo
Sure! I'm going to post now. Thank you!



Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread 'Michael Hunger' via Neo4j
John, can you post on the Community Site & Forum?
Easier for me to answer there.




Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread Sakshi Srivastva
Sir, I am looking for a dataset in which I can find hidden facts, like
the Panama Papers leak. Please suggest a similar big dataset.



Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread John Carlo
Hello Michael, 

thank you for your reply.

I've re-implemented the DB structure using unique word nodes; the
number of nodes dropped from 47,108,544 to 1,934,049.

I still have a huge number of relationships (45,442,034) that now point to
the unique nodes, and the queries are slow.

My end goal is to find specific patterns in sentence structures, like the
following example:

(John)-[ACTION]->(eat)-[SUBJECT]->(apple)

Any suggestion would be appreciated.

thank you very much
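As a sketch, such a pattern could be queried once the graph uses unique word nodes like this (it assumes the dependency labels in the example are modeled as relationship types, which the thread does not confirm):

```cypher
// Anchor on an indexed word first, then expand the pattern; the index
// seek on :token(text) keeps the starting set of nodes small.
MATCH (s:token {text: "John"})-[:ACTION]->(v:token)-[:SUBJECT]->(o:token)
RETURN s.text, v.text, o.text
```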





Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread John Carlo
It's a custom dataset generated from a series of documents in XML format,
translated to CSV and then imported into Neo4j via the built-in CSV loader.
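A minimal sketch of such an import with Cypher's built-in loader, assuming a hypothetical words.csv with a text header column (the file name and column are illustrative, not from the thread):

```cypher
// Commit in batches so a multi-million-row file doesn't accumulate
// one huge transaction state in memory (Neo4j 3.x syntax).
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///words.csv" AS row
// MERGE keeps word nodes unique even across repeated rows.
MERGE (t:token {text: row.text})
```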




Re: [Neo4j] best practices for storing 40 millions of nodes

2018-10-10 Thread Sakshi Srivastva
Can you please tell me which dataset you are using?
