Hello, I am new to using OrientDB. I am using version 1.7 rc2. I am
building a graph model using the code snippet below. The code iterates
through a directory containing *.csv files. Each directory name denotes
an exchange name. Each exchange can contain hundreds or thousands of *.csv
files. Each *.csv file is named after an instrument. So the desired model
to build for this use case looks like:
[vertex] exchange --> [edge] lists --> [vertex] instrument --> [edge]
snapshots --> [vertex] date --> [edge] measure --> [vertex] eod [7 properties]
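As a rough way to see how many records this model produces, the sketch below mirrors the loop structure of the snippet further down: one vertex per exchange, one vertex and one edge per instrument, and two vertices plus two edges per data row. The names ModelSize, RecordCount and recordsFor are hypothetical and only illustrate the counting; they are not part of the original program.

```scala
// Hypothetical helper, not part of the original program.
object ModelSize {
  final case class RecordCount(vertices: Long, edges: Long) {
    def total: Long = vertices + edges
  }

  // dataRows is the total number of CSV rows across all files,
  // excluding the first line of each file, which the loader skips.
  def recordsFor(exchanges: Long, instruments: Long, dataRows: Long): RecordCount =
    RecordCount(
      vertices = exchanges + instruments + 2 * dataRows, // date + eod per row
      edges    = instruments + 2 * dataRows              // "lists", plus two edges per row
    )
}
```

For example, ModelSize.recordsFor(1, 2, 10) gives 23 vertices and 22 edges, so every CSV row turns into four database records under this model.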
For this test case I used 106 smaller files totaling 14 MB on disk.
After loading them into the model above, the on-disk database size
(measured with du -hc on Mac OS X) is 3.9 GB.
My concern is that there are over 64,000 files to process, totaling
5.33 GB of text data.
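To put a number on that concern, here is a back-of-the-envelope projection using only the figures above. It assumes the database grows linearly with the input size and that du reports binary units (1 GB = 1024 MB); both are assumptions, and the object name SizeEstimate is made up for this sketch.

```scala
// Rough projection only; linear growth is an assumption, not a measurement.
object SizeEstimate {
  val inputMb     = 14.0  // size of the 106 test files
  val dbGb        = 3.9   // resulting database size on disk
  val fullInputGb = 5.33  // size of the full data set

  // How many times larger the database is than the raw text
  val expansion: Double = dbGb * 1024 / inputMb

  // Projected database size for the full data set, in GB
  val projectedGb: Double = fullInputGb * expansion

  def main(args: Array[String]): Unit =
    println(f"expansion ~ $expansion%.0fx, projected ~ $projectedGb%.0f GB")
}
```

Under those assumptions the expansion factor is roughly 285x, which would put the full import at about 1.5 TB.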
Am I doing something wrong in the model/relationships etc. or is there an
optimization I can use?
<code snippet>
val dir = new File(directory.get)
val dirs = subdirs(dir)
var exchange: Vertex = null
var instrument: Vertex = null
var eod: Vertex = null
var date: Vertex = null
var source: Source = null
var linesIterator: Iterator[String] = null
// Graph handle (non-transactional)
val graph = factory.getNoTx()
try {
  for (d <- dirs) {
    println("Exchange: " + d.getName())
    // Create a new vertex for each exchange
    exchange = graph.addVertex()
    exchange.setProperty("name", d.getName())
    graph.getRawGraph().declareIntent(new OIntentMassiveInsert())
    // Iterate through the files in the directory
    for (f <- d.listFiles() if (selected.get.contains(d.getName))) {
      instrument = graph.addVertex()
      instrument.setProperty("symbol", f.getName().split(""".csv""")(0))
      // Add an edge from the exchange vertex to the instrument vertex
      exchange.addEdge("lists", instrument)
      source = Source.fromFile(f)
      linesIterator = source.getLines()
      var count = 0
      // Iterate through the lines in the file
      for (v <- linesIterator) {
        if (count < 1) {
          // Skip the first line of each file
          count += 1
        } else {
          var data = v.split(",")
          val size = data.size
          // Pad short rows with empty strings up to 7 columns
          if (size < 7) {
            val insert = new Array[String](7)
            for (i <- 0 until 7) {
              if (i >= size) {
                insert(i) = ""
              } else {
                insert(i) = data(i)
              }
            }
            data = insert
          }
          // One date vertex and one eod vertex per row
          date = graph.addVertex()
          instrument.addEdge("snapshots", date)
          eod = graph.addVertex()
          ElementHelper.setProperties(eod,
            "date", data(0),
            "open", doubleValue(data(1)).get,
            "high", doubleValue(data(2)).get,
            "low", doubleValue(data(3)).get,
            "close", doubleValue(data(4)).get,
            "volume", longValue(data(5)).get,
            "adjClose", doubleValue(data(6)).get)
          date.addEdge("measure", eod)
          date.setProperty("date", data(0))
        }
      }
      graph.commit()
      source.close()
    }
    instrument = null
    eod = null
    graph.getRawGraph().declareIntent(null)
  }
}
</code snippet>
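As a side note on the row handling in the snippet: String.split drops trailing empty fields, which is why a row can come back with fewer than 7 columns. The padding loop could be written more compactly with padTo from the Scala standard library. This is only a sketch; the object RowPadding and the function padRow are hypothetical names, and the column count of 7 is taken from the snippet.

```scala
// Hypothetical helper showing a shorter form of the padding loop.
object RowPadding {
  val ColumnCount = 7

  // Split a CSV line and pad it with empty strings up to ColumnCount.
  // Rows that already have ColumnCount or more fields are left unchanged.
  def padRow(line: String): Array[String] =
    line.split(",").padTo(ColumnCount, "")
}
```

For example, RowPadding.padRow("2014-01-02,1.0,2.0") returns an array of 7 elements in which the last four are empty strings.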
Thanks for any responses.