Hello, I am new to OrientDB and am using version 1.7 rc2. I am building a
graph model using the code snippet below. The code iterates through
directories containing *.csv files. Each directory name denotes an
exchange name, and each exchange can contain hundreds or thousands of
*.csv files. Each *.csv file is named after an instrument. The desired
model for this use case looks like:

[vertex] exchange -->[edge] lists --> [vertex] instrument -->[edge] 
snapshot --> [vertex] date --> [edge] snapshot --> [vertex] [7 properties] 
eod
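
To make the shape of this model concrete, here is a rough sketch that
tallies how many vertices and edges it produces for one exchange. The names
ModelSize, GraphSize and estimateElements are made up for illustration
(nothing here is OrientDB API), and it assumes one date vertex and one eod
vertex per data row, which is what my snippet below does.

```scala
object ModelSize {
  // Hypothetical tally of graph elements; illustrative only.
  final case class GraphSize(vertices: Long, edges: Long)

  // rowsPerFile holds the number of data rows (header excluded) in each
  // *.csv file of a single exchange.
  // Vertices: 1 exchange + 1 instrument per file + (date + eod) per row.
  // Edges: 1 "lists" edge per file + 2 edges per row.
  def estimateElements(rowsPerFile: Seq[Long]): GraphSize = {
    val files = rowsPerFile.size.toLong
    val rows = rowsPerFile.sum
    GraphSize(vertices = 1 + files + 2 * rows, edges = files + 2 * rows)
  }
}
```

For example, two files with 3 and 2 data rows give 13 vertices and 12
edges, so every CSV row costs two vertices and two edges.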

For this test case I used 106 smaller files totaling 14 MB on disk. After
processing them with the above model, the on-disk database size (measured
with du -hc on Mac OS X) is 3.9 GB.

My concern is that there are over 64,000 files to process, totaling
5.33 GB of text data.
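
As a back-of-the-envelope check, the sketch below extrapolates from the
test run to the full data set. It assumes the database grows linearly with
the amount of input text, which may not hold in practice, and it takes 1 GB
as 1024 MB. SizeProjection is just an illustrative name.

```scala
object SizeProjection {
  // Figures from the test run and the full data set described above.
  val sampleTextMb = 14.0
  val sampleDbGb = 3.9
  val fullTextGb = 5.33

  // On-disk database size per unit of CSV text in the sample.
  val expansion: Double = sampleDbGb * 1024 / sampleTextMb

  // Projected database size in GB, ASSUMING the same ratio at full scale.
  val projectedDbGb: Double = fullTextGb * expansion
}
```

Under that assumption the expansion factor is roughly 285x, which would put
the full database on the order of 1.5 TB.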

Am I doing something wrong in the model/relationships etc. or is there an 
optimization I can use?

<code snippet>

    val dir = new File(directory.get)
    val dirs = subdirs(dir)
    var exchange: Vertex = null
    var instrument: Vertex = null
    var eod: Vertex = null
    var date: Vertex = null
    var source: Source = null
    var linesIterator: Iterator[String] = null

    // Graph handle
    val graph = factory.getNoTx()

    try {
      for (d <- dirs) {
        println("Exchange: " + d.getName())
        // Create a new vertex for each exchange
        exchange = graph.addVertex()
        exchange.setProperty("name", d.getName())
        graph.getRawGraph().declareIntent(new OIntentMassiveInsert())
        // Iterate through the files in the directory
        for (f <- d.listFiles() if (selected.get.contains(d.getName))) {
          instrument = graph.addVertex()
          instrument.setProperty("symbol", f.getName().split(""".csv""")(0))
          // Add an edge from the exchange vertex to the instrument vertex
          exchange.addEdge("lists", instrument)
          source = Source.fromFile(f)
          linesIterator = source.getLines()
          var count = 0
          // Iterate through the lines in the file
          for (v <- linesIterator) {
            if (count < 1) {
              // Skip the first (header) line
              count += 1
            } else {
              var data = v.split(",")
              val size = data.size
              // Pad short rows with empty strings up to 7 fields
              if (size < 7) {
                val insert = new Array[String](7)
                for (i <- 0 until 7) {
                  if (i >= size) {
                    insert(i) = ""
                  } else {
                    insert(i) = data(i)
                  }
                }
                data = insert
              }
              date = graph.addVertex()
              instrument.addEdge("snapshots", date)
              eod = graph.addVertex()
              ElementHelper.setProperties(eod,
                "date", data(0),
                "open", doubleValue(data(1)).get,
                "high", doubleValue(data(2)).get,
                "low", doubleValue(data(3)).get,
                "close", doubleValue(data(4)).get,
                "volume", longValue(data(5)).get,
                "adjClose", doubleValue(data(6)).get)
              date.addEdge("measure", eod)
              date.setProperty("date", data(0))
            }
          }
          graph.commit()
          source.close()
        }
        instrument = null
        eod = null
        graph.getRawGraph().declareIntent(null)
      }
    }
</code snippet>
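
The manual padding loop in the snippet could probably be written more
compactly. The following is only a sketch using the standard library;
RowPadding and fields are illustrative names, not part of my actual code.

```scala
object RowPadding {
  // Split a CSV line and pad it with empty strings to at least `width`
  // fields. split(",", -1) keeps trailing empty fields, which plain
  // split(",") would drop; either way the short row ends up padded with "".
  def fields(line: String, width: Int = 7): Array[String] =
    line.split(",", -1).padTo(width, "")
}
```

With this, RowPadding.fields("2014-01-02,1.0,2.0") returns a 7-element
array whose last four entries are empty strings.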

Thanks for any responses.
