Re: [Neo4j] LOAD CSV creates nodes but does not set properties

Michael Hunger Mon, 23 Jun 2014 03:23:30 -0700

Please start a new thread for this discussion.

On 23.06.2014 at 11:02, Paul Damian <pauldamia...@gmail.com> wrote:


> Hey, 
> I'm trying to run a query to find 10 clients and the companies they 
> work for. I've used a query like this:
> match (c: Client)-[WORKS_FOR]->(co: Company)  return c, co limit 10
> However, it keeps returning a Java heap space error. Neo4j is installed on a VM 
> with Windows Server 2012 R2, an Intel Xeon @ 2.27 GHz and 8 GB of RAM. The graph 
> db is over 30 GB (which is also weird, since the SQL database that was used 
> to populate the graph is only 13 GB). What can I do to improve the query 
> performance besides adding indexes?
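A note on the pattern in that query, hedged because the data model is only known from this message: in Cypher, [WORKS_FOR] without a colon declares a relationship variable named WORKS_FOR and matches relationships of any type. Restricting the match to one type needs [:WORKS_FOR]. A minimal sketch, assuming the relationship type really is called WORKS_FOR:

```cypher
// The colon makes WORKS_FOR a relationship type rather than a variable name,
// so only relationships of that type are traversed.
MATCH (c:Client)-[:WORKS_FOR]->(co:Company)
RETURN c, co
LIMIT 10
```

Whether this alone resolves the heap space error is not something the thread confirms.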
> 
> 
> 
> On Wednesday, 18 June 2014, 16:34:10 UTC+3, Michael Hunger wrote:
> To me it sounds as if there is a big cross product happening,
> i.e. many Verticals with the same Id.
> 
> What happens if you do:
> 
> MATCH (v:Vertical)
> RETURN v.Id, count(*) 
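To list only the Ids that occur more than once, the count can be filtered after the aggregation. A sketch, assuming Id is meant to be unique per Vertical; the aliases id and n are illustrative:

```cypher
// Group Vertical nodes by Id and keep only the Ids shared by several nodes.
MATCH (v:Vertical)
WITH v.Id AS id, count(*) AS n
WHERE n > 1
RETURN id, n
ORDER BY n DESC
```

An empty result would mean every Vertical Id is unique, which would argue against the duplicate-Id explanation for the cross product.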
> 
> Michael
> 
> On 18.06.2014 at 15:26, Paul Damian <paulda...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I've tried with another file, which contains ClientId and VerticalId. The 
>> thing is, there are only 7 verticals and 11M clients, so there is an obvious 
>> one-to-many relationship there.
>> When I run 
>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Vertical.csv" AS c
>> WITH c LIMIT 100
>> MATCH (cli: Client { Id: toInt(c.ClientId)}), (vert: Vertical { Id: 
>> toInt(c.VerticalId)})
>> Return count(*)
>> it returns Neo.DatabaseError.Statement.ExecutionFailure.
>> I get the same result when I only match the verticals. 
>> However, if I run 
>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Vertical.csv" AS c
>> WITH c LIMIT 100
>> MATCH (cli: Client { Id: toInt(c.ClientId)})
>> Return count(*)
>>  it returns 100.
>> I think it has something to do with the fact that the first 100 verticals 
>> have the same Id
>> 
>> On Wednesday, 18 June 2014, 14:20:57 UTC+3, Michael Hunger wrote:
>> sorry
>> 
>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>> WITH c
>> LIMIT 100
>> MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: 
>> toInt(c.CityId)})
>> Return count(*)
>> 
>> 
>> On 18.06.2014 at 11:44, Paul Damian <paulda...@gmail.com> wrote:
>> 
>>> I cannot run this command. It returns invalid syntax. The only way I could run 
>>> it was 
>>> 
>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>> MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: toInt(c.CityId)})
>>> Return count(*) Limit 100
>>> 
>>> Also, I think a skype call would be great.
>>> 
>>> On Tuesday, 17 June 2014, 21:36:05 UTC+3, Michael Hunger wrote:
>>> Then something is really wrong.
>>> 
>>> What happens if you do
>>> 
>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>> Limit 100
>>> MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: toInt(c.CityId)})
>>> Return count(*)
>>> 
>>> I'm at a conference in Amsterdam this week
>>> but perhaps we can do a skype call next week?
>>> 
>>> Michael
>>> 
>>> 
>>> 
>>> Sent from mobile device
>>> 
>>> On 17.06.2014 at 18:48, Paul Damian <paulda...@gmail.com> wrote:
>>> 
>>>> Yes, I do. I keep getting a Java heap space error now. I'm using a commit 
>>>> size of 100.
>>>> 
>>>> On Tuesday, 17 June 2014, 19:28:05 UTC+3, Michael Hunger wrote:
>>>> OK, cool. And you have the indexes for both :City(Id) and :Client(Id)?
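For reference, in the Neo4j 2.x syntax used throughout this thread, the two indexes would be created with one statement each. This is a sketch for that version only; newer Neo4j releases use a different CREATE INDEX form.

```cypher
// One schema index per label/property pair used in the MATCH lookups.
CREATE INDEX ON :City(Id);
CREATE INDEX ON :Client(Id);
```

Without these, each MATCH on { Id: ... } has to scan all nodes with that label for every CSV row.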
>>>> 
>>>> 
>>>> Michael
>>>> 
>>>> On 17.06.2014 at 18:15, Paul Damian <paulda...@gmail.com> wrote:
>>>> 
>>>>> The first query returns 999996 which is the number of rows in the file 
>>>>> and the second one returns Neo.DatabaseError.Statement.ExecutionFailure
>>>>>  probably because of the null values. But then I run the following 
>>>>> command:
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" 
>>>>> AS c
>>>>>  MATCH (city:City { Id: toInt(c.CityId)})
>>>>> WHERE coalesce(c.CityId,"") <> ""
>>>>> RETURN count(*)
>>>>> 
>>>>> and I get 992980
>>>>> 
>>>>> 
>>>>> On Tuesday, 17 June 2014, 17:55:56 UTC+3, Michael Hunger wrote:
>>>>> No, you can just filter out the lines with no CityId.
>>>>> 
>>>>> Did you run my suggested commands?
>>>>> 
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>> MATCH (client: Client { Id: toInt(c.Id)})
>>>>> RETURN count(*)
>>>>> 
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>> MATCH (city: City { Id: toInt(c.CityId)})
>>>>> RETURN count(*)
>>>>> 
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>> return c
>>>>> limit 10
>>>>> 
>>>>> On 17.06.2014 at 16:37, Paul Damian <paulda...@gmail.com> wrote:
>>>>> 
>>>>>> In the file I only have 2 columns: one for the client Id, which is never 
>>>>>> null, and CityId, which may sometimes be null. Should I export the 
>>>>>> records from the SQL database leaving out the columns that contain null 
>>>>>> values?
>>>>>> 
>>>>>> On Tuesday, 17 June 2014, 15:39:14 UTC+3, Michael Hunger wrote:
>>>>>> If they don't have a value for CityId, do they still have empty columns 
>>>>>> there, like "user-id,,"?
>>>>>> 
>>>>>> You probably want to filter these rows?
>>>>>> 
>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>> WITH c WHERE coalesce(c.CityId,"") <> ""
>>>>>> ...
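Combined with the relationship load Paul posted on 16 June, the filtered import might look like the sketch below. It assumes the CSV headers are Id and CityId, as in the queries in this thread, and that the indexes on :Client(Id) and :City(Id) exist. The commit size of 10000 is only an illustrative value. Note the WITH before WHERE: WHERE cannot follow LOAD CSV directly.

```cypher
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
// Skip rows whose CityId column is missing or empty.
WITH c WHERE coalesce(c.CityId, "") <> ""
// Two separate MATCH clauses, each an indexed lookup by Id.
MATCH (client:Client { Id: toInt(c.Id) })
MATCH (city:City { Id: toInt(c.CityId) })
CREATE (client)-[:LOCATED_IN]->(city)
```

Rows with an empty CityId are dropped before toInt is applied, so no lookup is attempted with a null Id.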
>>>>>> 
>>>>>> On 17.06.2014 at 11:23, Paul Damian <paulda...@gmail.com> wrote:
>>>>>> 
>>>>>>> Well, the csv file contains some rows that do not have a value for 
>>>>>>> CityId, and the rows are unique with respect to the client Id. There are 11M 
>>>>>>> clients living in 14K cities. Is there a limit on the number of links per node?
>>>>>>> For now I've written a piece of code that reads from the file and creates each 
>>>>>>> relationship, but, as you can imagine, it runs really slowly in this 
>>>>>>> scenario.
>>>>>>>  
>>>>>>> Did you create an index on :Client(Id) and :City(Id)?
>>>>>>> 
>>>>>>> What happens if you do:
>>>>>>> 
>>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>>> MATCH (client: Client { Id: toInt(c.Id)})
>>>>>>> RETURN count(*)
>>>>>>> 
>>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>>> MATCH (city: City { Id: toInt(c.CityId)})
>>>>>>> RETURN count(*)
>>>>>>> 
>>>>>>> Each count should be equivalent to the # of rows in the file.
>>>>>>> 
>>>>>>> Michael
>>>>>>> 
>>>>>>> On 16.06.2014 at 17:47, Paul Damian <paulda...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Somehow I've managed to load all the nodes and now I'm trying to load 
>>>>>>>> the links as well. I read the nodes from csv file and create the 
>>>>>>>> relation between them. I run the following command:
>>>>>>>> USING PERIODIC COMMIT 100 
>>>>>>>>  LOAD CSV WITH HEADERS FROM 
>>>>>>>> "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>>>>  MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: 
>>>>>>>> toInt(c.CityId)})
>>>>>>>>  CREATE (client)-[r:LOCATED_IN]->(city)
>>>>>>>> 
>>>>>>>> Running with a smaller commit size returns this error 
>>>>>>>> Neo.DatabaseError.Statement.ExecutionFailure, while increasing the 
>>>>>>>> commit size to 10000 throws Neo.DatabaseError.General.UnknownFailure. 
>>>>>>>> Can you help me with this?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thursday, 5 June 2014, 12:05:18 UTC+3, Michael Hunger wrote:
>>>>>>>> Perhaps something with field or line terminators?
>>>>>>>> 
>>>>>>>> I assume it blows up the field separation.
>>>>>>>> 
>>>>>>>> Try to run:
>>>>>>>> 
>>>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Client.csv" AS 
>>>>>>>> c
>>>>>>>> RETURN { Id: toInt(c.Id), FirstName: c.FirstName, LastName: 
>>>>>>>> c.Lastname, Address: c.Address, ZipCode: toInt(c.ZipCode), Email: 
>>>>>>>> c.Email, Phone: c.Phone, Fax: c.Fax, BusinessName: c.BusinessName, 
>>>>>>>> URL: c.URL, Latitude: toFloat(c.Latitude), Longitude: 
>>>>>>>> toFloat(c.Longitude), AgencyId: toInt(c.AgencyId), RowStatus: 
>>>>>>>> toInt(c.RowStatus)} as data, c as line
>>>>>>>> LIMIT 3
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Jun 5, 2014 at 10:51 AM, Paul Damian <paulda...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>> I've tried using the shell and I get the same results: nodes with no 
>>>>>>>> properties.
>>>>>>>> I've created the csv file using MsSQL Server Export. Is it relevant?
>>>>>>>> 
>>>>>>>> About your curiosity: I figured I would import the nodes first, then 
>>>>>>>> the relationships from the connection tables. Am I doing it wrong?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> On Thursday, 5 June 2014, 09:54:31 UTC+3, Michael Hunger wrote:
>>>>>>>> I'd probably use a commit size in your case of 50k or 100k.
>>>>>>>> 
>>>>>>>> Try to use the neo4j-shell and not the web-interface.
>>>>>>>> 
>>>>>>>> Connect to neo4j using bin/neo4j-shell
>>>>>>>> 
>>>>>>>> Then run your commands ending with a semicolon.
>>>>>>>> 
>>>>>>>> Just curious: Your data is imported as one node per row? That's not 
>>>>>>>> really a graph structure.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 4, 2014 at 6:56 PM, Paul Damian <paulda...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>> Hi there,
>>>>>>>> 
>>>>>>>> I'm experimenting with Neo4j while benchmarking a bunch of NoSQL 
>>>>>>>> databases for my graduation paper. 
>>>>>>>> I'm using the web interface to populate the database. I've been able 
>>>>>>>> to load the smaller tables from my SQL database and LOAD CSV works 
>>>>>>>> fine.
>>>>>>>> By small, I mean a few columns (4-5) and some rows (1 million). 
>>>>>>>> However, when I try to upload a larger table (15 columns, 12 million 
>>>>>>>> rows), it creates the nodes but it doesn't set any properties.
>>>>>>>> I've tried reducing the number of records (to 100) and also the 
>>>>>>>> number of columns (just the Id property), but no luck so far.
>>>>>>>> 
>>>>>>>> The cypher command used is this one
>>>>>>>> USING PERIODIC COMMIT 100
>>>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Client.csv" AS 
>>>>>>>> c
>>>>>>>> CREATE (:Client { Id: toInt(c.Id), FirstName: c.FirstName, LastName: 
>>>>>>>> c.Lastname, Address: c.Address, ZipCode: toInt(c.ZipCode), Email: 
>>>>>>>> c.Email, Phone: c.Phone, Fax: c.Fax, BusinessName: c.BusinessName, 
>>>>>>>> URL: c.URL, Latitude: toFloat(c.Latitude), Longitude: 
>>>>>>>> toFloat(c.Longitude), AgencyId: toInt(c.AgencyId), RowStatus: 
>>>>>>>> toInt(c.RowStatus)})
>>>>>>>> 
>>>>>>>> Any help or pointers are welcome,
>>>>>>>> Paul
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "Neo4j" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>>>> an email to neo4j+un...@googlegroups.com.
>>>>>>>> 
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>> 
