Re: [orientdb] Re: 14MB of text data expands to over 3.9 GB

Artem Orobets Fri, 04 Apr 2014 01:38:33 -0700

Hi,

Andrey, you are right, however "disk cache" might not be the exact answer to
the question.


@odbuser, the additional disk space is consumed by the write-ahead log (WAL),
which is used to ensure the durability of persisted data. When the server is
powered off or the db process is killed, the log is used to restore data that
had not been flushed to disk.
This log is automatically shrunk after reaching a threshold.
Unfortunately, in the general case this threshold can't be safely (for
durability) decreased until issue
#1603 <https://github.com/orientechnologies/orientdb/issues/1603> is
implemented.

However, on devices that don't have that much disk space you can decrease that
threshold, or disable the WAL altogether.

Take a look at the following configuration options:

   - storage.useWAL
   - storage.wal.maxSize
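
As a rough sketch of how these two options could be set from the same Scala
code that runs the import: the helper below assumes that OrientDB picks the
options up from JVM system properties when its configuration is first loaded,
and that storage.wal.maxSize is expressed in megabytes. Both assumptions should
be checked against the OrientDB version in use. The names WalSettings and
configure are made up for illustration and are not part of OrientDB.

```scala
// Hypothetical helper, not part of OrientDB: sets the two options listed
// above as JVM system properties.
// Assumption: this has to run before any OrientDB class is loaded,
// otherwise the values may be ignored.
object WalSettings {
  def configure(useWal: Boolean, walMaxSizeMb: Int): Unit = {
    require(walMaxSizeMb > 0, "WAL size limit must be positive")
    System.setProperty("storage.useWAL", useWal.toString)
    // Assumption: the value is interpreted as a size in megabytes.
    System.setProperty("storage.wal.maxSize", walMaxSizeMb.toString)
  }
}
```

Calling WalSettings.configure(useWal = true, walMaxSizeMb = 256) before the
graph factory is created would keep the log enabled with a smaller threshold.
Passing -Dstorage.useWAL=false on the JVM command line is the equivalent way to
set a system property without code. As noted above, lowering the threshold or
disabling the WAL trades away durability.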



Best regards,
Artem Orobets

*Orient Technologies, the Company behind OrientDB*


2014-04-04 2:37 GMT+03:00 odbuser <[email protected]>:

> Is there a way to control the size of the disk cache?  If this size is
> required for that amount of data, it needs to be clear in the OrientDB
> requirements.  On some systems, 4GB is not an issue but I also use much
> smaller devices that will never have that much disk available.
>
>
> On Thursday, April 3, 2014 11:36:17 AM UTC-4, Andrey Yesyev wrote:
>>
>> That's okay! It won't grow significantly bigger than 4 GB. If you shut down
>> the DB, the size will be a couple of MB at most. What you're seeing now
>> includes the disk cache.
>>
>> On Thursday, April 3, 2014 10:47:13 AM UTC-4, IQH wrote:
>>>
>>> Hello, I am new to using OrientDB. I am using version 1.7 rc2.  I am
>>> building a graph model using the code snippet below.  The code iterates
>>> through a directory containing *.csv files.  Each directory name denotes
>>> an exchange name.  Each exchange can contain 100s or 1000s of *.csv files.
>>> Each *.csv file name is an instrument's name.  So the desired model to build
>>> for this use case looks like:
>>>
>>> [vertex] exchange -->[edge] lists --> [vertex] instrument -->[edge]
>>> snapshot --> [vertex] date --> [edge] snapshot --> [vertex] [7 properties]
>>> eod
>>>
>>> For this test case I used 106 files of smaller size totaling 14 MB on
>>> disk.  After processing with the above model the on disk database size
>>> (with du -hc on Mac OS X) is 3.9 GB.
>>>
>>> My concern is there are over 64,000 files to process totaling 5.33 GB of
>>> text data.
>>>
>>> Am I doing something wrong in the model/relationships etc. or is there
>>> an optimization I can use?
>>>
>>> <code snippet>
>>>
>>>     val dir = new File(directory.get)
>>>     val dirs = subdirs(dir)
>>>     var exchange: Vertex = null
>>>     var instrument: Vertex = null
>>>     var eod: Vertex = null
>>>     var date: Vertex = null
>>>     var source: Source = null
>>>     var linesIterator: Iterator[String] = null
>>>
>>>     // Graph handle
>>>     val graph = factory.getNoTx()
>>>
>>>     try {
>>>       for (d <- dirs) {
>>>         println("Exchange: " + d.getName())
>>>         // Create a new vertex for each exchange
>>>         exchange = graph.addVertex()
>>>         exchange.setProperty("name", d.getName())
>>>         graph.getRawGraph().declareIntent(new OIntentMassiveInsert())
>>>
>>>         // Iterate through the files in the directory
>>>         for (f <- d.listFiles() if (selected.get.contains(d.getName))) {
>>>           instrument = graph.addVertex()
>>>           instrument.setProperty("symbol", f.getName().split(""".csv""")(0))
>>>           // Add an edge from the exchange vertex to the instrument vertex
>>>           exchange.addEdge("lists", instrument)
>>>           source = Source.fromFile(f)
>>>           linesIterator = source.getLines()
>>>           var count = 0
>>>
>>>           // Iterate through the lines in the file, skipping the header line
>>>           for (v <- linesIterator) {
>>>             if (count < 1) {
>>>               count += 1
>>>             } else {
>>>               var data = v.split(",")
>>>               val size = data.size
>>>               // Pad short rows with empty strings up to 7 columns
>>>               if (size < 7) {
>>>                 val insert = new Array[String](7)
>>>                 for (i <- 0 until 7) {
>>>                   if (i >= size) {
>>>                     insert(i) = ""
>>>                   } else {
>>>                     insert(i) = data(i)
>>>                   }
>>>                 }
>>>                 data = insert
>>>               }
>>>               date = graph.addVertex()
>>>               instrument.addEdge("snapshots", date)
>>>               eod = graph.addVertex()
>>>               ElementHelper.setProperties(eod, "date", data(0),
>>>                 "open", doubleValue(data(1)).get,
>>>                 "high", doubleValue(data(2)).get,
>>>                 "low", doubleValue(data(3)).get,
>>>                 "close", doubleValue(data(4)).get,
>>>                 "volume", longValue(data(5)).get,
>>>                 "adjClose", doubleValue(data(6)).get)
>>>               date.addEdge("measure", eod)
>>>               date.setProperty("date", data(0))
>>>             }
>>>           }
>>>           graph.commit()
>>>           source.close()
>>>         }
>>>         instrument = null
>>>         eod = null
>>>         graph.getRawGraph().declareIntent(null)
>>>       }
>>>     }
>>> </code snippet>
>>>
>>> Thanks for any responses.
>>>
>>  --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

