Perhaps you should show the statement too? Not just the log output? :)

Use this:

CREATE INDEX ON :{Label}(LC_ID);   <- replace {Label} with your label (one statement per label)
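A toy Python model of why the index matters for the relationship load. This is not Neo4j code, and all function names are hypothetical. It assumes each REL row resolves its two endpoints with something like `MATCH (a:Label {LC_ID: ...}), (b:Label {LC_ID: ...})`. Without a schema index on `:Label(LC_ID)`, each lookup has to walk through the labelled nodes. With the index, it becomes a direct lookup. As far as I know, the shell's `auto-index` command only sets up a legacy auto-index, which a label-based MATCH does not consult. That would be consistent with it not speeding anything up.

```python
import csv
import io

# Toy model, not Neo4j internals: `nodes` stands in for the nodes carrying one
# label, each with an LC_ID property; `rows` stands in for the RELS csv.


def read_rel_rows(csv_text):
    """Parse a two-column relationship csv: column 0 = source id, column 1 = target id.

    int() plays the role of TOINT(csvline[0]) / TOINT(csvline[1]); blank lines are skipped.
    """
    return [(int(row[0]), int(row[1])) for row in csv.reader(io.StringIO(csv_text)) if row]


def find_by_scan(nodes, lc_id):
    """No index: check nodes one by one (roughly a label scan plus a filter)."""
    for steps, node in enumerate(nodes, start=1):
        if node["LC_ID"] == lc_id:
            return node, steps
    return None, len(nodes)


def load_rels(nodes, rel_rows, use_index):
    """Resolve both endpoints of every row; return (created pairs, lookup steps)."""
    # Building the dict once is the analogue of CREATE INDEX ON :Label(LC_ID).
    index = {node["LC_ID"]: node for node in nodes} if use_index else None
    created, steps = [], 0
    for src, dst in rel_rows:
        if use_index:
            a, b = index.get(src), index.get(dst)
            steps += 2  # one direct lookup per endpoint
        else:
            a, s1 = find_by_scan(nodes, src)
            b, s2 = find_by_scan(nodes, dst)
            steps += s1 + s2
        if a is not None and b is not None:
            created.append((src, dst))
    return created, steps


if __name__ == "__main__":
    nodes = [{"LC_ID": i} for i in range(18_000)]  # ~18k nodes, as in the thread
    rows = read_rel_rows("0,17999\n9000,42\n")
    print(load_rels(nodes, rows, use_index=False))  # ([(0, 17999), (9000, 42)], 27045)
    print(load_rels(nodes, rows, use_index=True))   # ([(0, 17999), (9000, 42)], 4)
```

In this model, the unindexed cost grows with rows × nodes, and the indexed cost grows with rows only. Real Neo4j costs are different, but this is one plausible reading of why the 52k-row RELS load crawls without a label index.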
On Fri, Dec 5, 2014 at 12:09 AM, José F. Morales <josef...@gmail.com> wrote:
> Andrii and Michael,
>
> Sorry for the delay in response. I was a little under the weather.
> ANYHOW, it looks like I figured out how to do the data loading! I was
> trying several approaches and the one using Michael's shell tools seems to
> have worked! There was info from Andrii that proved important as well
> (my_node_ID as integer). The loading of the 18k NODES was in seconds. When
> I tested the RELS with a tiny data set it worked perfectly. I am cleaning
> up the 52k RELS file after the first attempt failed because of a missing
> "'".
>
> My only issue is that the RELs loading is slow....
>
> commit after 1000 row(s) 0. 1%: nodes = 0 rels = 1000 properties = 7000
> time 7059450 ms total 7059450 ms
>
> Now I thought that if I created an index (below), it would be faster.
> Apparently not.
>
> neo4j-sh (?)$ auto-index LC_ID
>
> Enabling auto-indexing of Node properties: [LC_ID]
>
> Do I have this wrong? Should it have been CREATE INDEX ON :LC_ID?
>
> Jose
>
>
> On Monday, December 1, 2014 5:09:36 PM UTC-5, Andrii Stesin wrote:
>>
>> Hi José,
>>
>> On Monday, December 1, 2014 12:33:58 AM UTC+2, José F. Morales wrote:
>>>
>>> Ok, but how many valid distinct combinations of your 10 node labels may
>>>> exist?
>>>>
>>>
>>> JFM: 264
>>>
>>
>> This makes me think that maybe your target data model needs some
>> refactoring. What are the entities (classes), and what can be better
>> considered as attributes? Again, I'm not familiar with LabCard, so if
>> you give some explanations and a sample dataset which is publicly
>> available, I'd take a close look at it.
>>
>>
>>> JFM: Like I said, there are 264 unique combinations in all my nodes.
>>>> Some are redundant, full spelling of a term/phrase and an abbreviation.
>>>> Some are a code for a term/phrase. Some were created in anticipation of
>>>> other values I would create later.
>>>> I am trying to anticipate queries I'll make later.
>>>>
>>>
>> Once again, I foresee a data modelling issue here.
>>
>>
>>> JFM: Makes sense for speed. I guess it depends upon the size of one's
>>>>> data.
>>>>>
>>>>
>>
>> Sure it does :)
>>
>>
>>> Q3: “Skewer” is just an integer right? It corresponds in a way to
>>>>> my_node_id
>>>>>
>>>>
>>>>
>>>> No, it's a label! So in Cypher your node (suppose it has 2 labels
>>>> :LabelA and :LabelJ) is described like
>>>>
>>>> MATCH (n:LabelA:LabelJ:Skewer {my_node_id: 123454, p1: 'something', p2: 'something else', p3: 'etc.'})
>>>>
>>>>
>>> JFM: Got that!
>>>
>>> JFM: ok basic question... MATCH (n: <---What is "n"? Does it just
>>> indicate that it's a node of a particular class? What letter it is is
>>> arbitrary right? Is there a name for what "n" is? For a while there, I
>>> thought it was *my_node_ID*.
>>>
>>
>> *n* is just the name of the variable. Cypher, like any other programming
>> language, has a notion of "variable" which has its name and which can take
>> different values; here I chose *n* just arbitrarily for the
>> variable name.
>>
>>
>>> Q4: So does repeating the LOAD CSV with each file CLT_NODES_LabelA…J
>>>>> combine the various labels and their respective values with their
>>>>> corresponding nodes?
>>>>>
>>>>
>>>>
>>>> Label is not a variable, it does not have a value. It's just a label,
>>>> consider it a "tag".
>>>> Also *my_node_id* IS a variable so it does have a value.
>>>>
>>>
>>>
>>> JFM: OK, I am not understanding this. I understood a "Label" as a
>>> general category for a node.
>>>
>>
>> That's Ok, or maybe even better is to imagine a tag. A node may have
>> multiple tags (labels); they can be added and/or removed.
>>
>>
>>> This was as opposed to a "Property" that was specific to a particular
>>> node. As I understood it, a "Label" has different values.
>>>
>>
>> Label is just a label.
>> It doesn't have any value itself; it just marks
>> (tags) some (sub)set of your nodes and allows you to distinguish between
>> them. Labels may overlap. Consider the automotive domain, and let's take a look
>> at a data model for it.
>>
>> Brand seems better modelled as a label. Say `Opel`, `Volvo` or
>> `Peugeot`.
>> Kind of vehicle is definitely(???) a label. Say `Truck`, `SUV`, `Car`.
>> How to model some deeper things depends on what you are going to achieve.
>> Is body color a label or a property? Which approach is better: either
>>
>> MATCH (vhcl:Truck:Volvo {body_color: 'red', VIN: 'VE18727673826812634X65'})
>>
>> or
>>
>> MATCH (vhcl:Opel:Yellow:SUV {VIN: 'VE18727673826812634X65'})
>>
>> ? I'm not sure, it depends on the goal; as for me, I'd prefer color to be
>> a property of some exact single car (since you can decide to paint your
>> yellow car white or some other color, after all).
>>
>> But VIN is *definitely* a property of one exact single car.
>>
>> Is a car license plate a label or a property? Definitely neither,
>> because you can sell your car and the new owner will get another license plate
>> for it, so I'd model this as
>>
>> MATCH (vhcl:Car:Ford {body_color: 'pink', VIN: 'FGT87356873HU8745'})-[:HAS_LICENSE_PLATE]->(lp:LicensePlate {state: 'AL', str: 'WH4TWR'})
>>
>>
>> but as you see, `LicensePlate` obviously should never be mixed with
>> either `Car` or `Truck`, so they are different labels which do not
>> intersect.
>>
>> So that Label could be "Category" and there could be two categories, for
>>> example... CLT_SOURCE and CLT_TARGET. I thought that makes it like a
>>> variable. If not, the label is all the same on a given set of nodes and
>>> what's the point in that?
>>>
>>> JFM: OK, I get that *my_node_id* is a variable.
>>>
>>
>> Agh, exactly.
>>
>>
>>>
>>>>   1. When doing LabelA .csv you will create whatever uniquely
>>>>   numbered nodes were not already in the database, fill their properties
>>>>   (or maybe overwrite them?) and label the node (be it new or existing one)
>>>>   with LabelA - no matter what other labels the node (possibly) had,
>>>>
>>>> JFM: OK. I get it.
>>>
>>>>
>>>>   2. When doing LabelJ .csv you *again* will create whatever uniquely
>>>>   numbered nodes were not already in the database, *again* either
>>>>   fill or overwrite properties, and *again* label the node (be it
>>>>   new or existing one) with LabelJ - no matter what other labels the node
>>>>   (possibly) had,
>>>>
>>>> JFM: OK. I get it.
>>>
>>>>
>>>>   3. so if you created some node with the first file and labeled it
>>>>   LabelA, and the same unique *my_node_id* occurs in both the first and
>>>>   second files, your node will get 2 labels, LabelA and LabelJ.
>>>>
>>>> JFM: That's what I want!!
>>>
>>
>> Huh, Ok so far :)
>>
>>
>>> Q5: Since I think of my data in terms of the two classes of nodes in my
>>>>> Data model …[CLT_SOURCE —> CLT_TARGET ; CLT_TARGET —> CLT_SOURCE],
>>>>> after loading the nodes, how do I then get two classes of nodes?
>>>>>
>>>>
>>>> Make them 2 labels: CLTSource and CLTTarget respectively.
>>>>
>>>
>>> JFM: OK. Regarding the labels...my csv file has a column called DESC
>>> that has two values CLT_SOURCE and CLT_TARGET. You are saying that my
>>> Source csv should have a CLT_SOURCE column and my target csv
>>> should have a CLT_TARGET column? My csv files should NOT have a
>>> configuration as I described?
>>>
>>
>> What does CLT really mean in real life? I failed to parse :( sorry
>> for that. Once again, if you describe the LabCard domain and provide
>> me with a dataset, I'd be able to give you some better ideas (this also may
>> become a good tutorial sample case for future Neo4j users).
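Andrii's three numbered steps can be mimicked with a small in-memory Python model. In those steps, each label file is loaded separately, and labels accumulate on nodes that share a *my_node_id*. This only illustrates the semantics. It is not the Neo4j API, and `load_label_file`, the `p1` column and the sample rows are all hypothetical. Each call corresponds roughly to one per-file statement along the lines of `LOAD CSV FROM ... AS csvline MERGE (n:Skewer {my_node_id: TOINT(csvline[0])}) SET n:LabelA, n.p1 = csvline[1]`.

```python
# Toy model (not the Neo4j API) of "one LOAD CSV per label file":
# labels are tags (present or absent, no value); properties are key -> value
# pairs that a later file may overwrite.


def load_label_file(graph, label, rows):
    """Merge each row's node by id, set its p1 property, and add `label` to it.

    graph maps my_node_id -> {"labels": set, "props": dict}.
    rows are csv-style lists [my_node_id, p1] (hypothetical column layout).
    """
    for row in rows:
        node_id = int(row[0])  # TOINT(csvline[0])
        # MERGE: reuse the node if this id exists, otherwise create it with :Skewer
        node = graph.setdefault(node_id, {"labels": {"Skewer"}, "props": {}})
        node["props"]["p1"] = row[1]  # fill, or overwrite a value set by an earlier file
        node["labels"].add(label)  # existing labels on the node are left untouched
    return graph


if __name__ == "__main__":
    graph = {}
    load_label_file(graph, "LabelA", [["1", "a"], ["2", "b"]])
    load_label_file(graph, "LabelJ", [["2", "j"], ["3", "k"]])
    for node_id, node in sorted(graph.items()):
        print(node_id, sorted(node["labels"]), node["props"])
```

Running it prints node 2 as `2 ['LabelA', 'LabelJ', 'Skewer'] {'p1': 'j'}`. Node 2 appears in both files, so it carries both labels, and its `p1` holds the value from the second file. Nodes 1 and 3 each get only the label of the file they appear in.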
>>
>>
>>> JFM: Since my csv file has its A thru J columns with A (2) values, B (1), C
>>> (4), D (83), E (83), F (11), G (11), H (83), J (83), K (2), I should have A LOT
>>> of csv files instead of just two for nodes!
>>>
>>
>> Again, I strongly suspect a data modelling issue here.
>>
>>
>>> JFM: What I am not getting from this is that there is one csv file that has
>>>>> the CLTSOURCE and CLTTARGET labels in it. That contradicts what I said
>>>>> above because that would make only 1 csv file. I assume there is one
>>>>> LOAD CSV statement and the my_node_ID: TOINT(csvline[0]) and
>>>>> my_node_ID: TOINT(csvline[1]) refer presumably to two lines in that file.
>>>>>
>>>>
>> As soon as you have both src and target nodes already inside the
>> database, you need a .csv file which describes only relationships, in terms
>> of: the 1st column contains src node ids, the 2nd column contains dst node ids,
>> and thus 1 row of .csv describes 1 single relationship per (linked) pair of
>> nodes.
>>
>> For .csv with relationships, csvline[0] is a value of the *my_node_id* property
>>>>>> of the *source* node, csvline[1] is a value of the *my_node_id* property
>>>>>> of the *target* node, and TOINT() type conversion is used because my
>>>>>> personal preference is to use integers for ids.
>>>>>>
>>>>>
>>>>
>>>>> Is it that ToInt(csvline[0]) refers to a line of the REL.csv file?
>>>>>
>>>>>
>>>>> Does csvline[0] refer to a column in REL.csv as do csvline[2] and
>>>>> csvline[ZZ] (line 3) ?
>>>>>
>>>>
>>>>
>>> JFM: OK, I think I get it.
>>>
>>>
>>>> I think you can combine import of multiple .CSV files in a single LOAD
>>>> CSV statement but I never tried this mode.
>>>>
>>>> WBR,
>>>> Andrii
>>>>
>>>>
>>>
>>> JFM: Thanks!
>>>
>>
>> :)
>>
>> WBR,
>> Andrii
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to neo4j+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.