Hello Dave,

On 2012/07/10 1:01, Dave Beckett wrote:
Nick is correct about the serializer but the question was about the turtle
parser, and it is also valid.

The Raptor turtle (n3, trig) parser relies on flex and bison (aka lex+yacc),
and bison:
a) has to have the entire input in memory in one block in order to parse

This is the first time I have heard something like this about bison. flex definitely doesn't need all of its input in memory; it has a well-organized buffer mechanism (see YY_BUFFER_STATE, yyin, yy_scan_string, YY_INPUT, ...). So bison itself cannot require the whole input to be in memory. There may be an application- or implementation-specific reason for keeping everything in memory in raptor, but that would be a different story.

Regards,   Martin.

b) uses 32-bit unsigned int offsets

So Raptor has to assemble the input in memory (lots of alloc / realloc) and
ends up with a maximum 2 GB size.  A 5 GB file is not going to parse.

I have looked at fixing this several times, but writing a streaming lexer
and parser is damn hard - months of work.  Using ANTLR and other things
that do the same job looks like it would make things a lot more complex
(ANTLR is C++).  I've also looked at SQLite's lemon, but it doesn't stream
either, so it seems the only road to this is a lot of work.

Dave


On 7/9/12 1:30 AM, Nicholas Humfrey wrote:
Hello,

Yes, the Turtle serialiser puts everything into RAM in order to build a tree 
of the data and output a nice pretty file, with all the triples that share the 
same subject next to each other.

If you output as ntriples, then the output will be much faster and it won't 
try to load everything into RAM.

nick.


On 9 Jul 2012, at 02:15, Medha Atre wrote:

Hello,

I am trying to use the Raptor RDF parser library to parse a very large RDF/XML file of 
the LUBM dataset (synthetically generated) and convert it into a Turtle representation. 
The gzipped RDF/XML file itself is 5.1 GB (I feed its contents through a fifo 
and "rapper" reads from this fifo).

When I run the "rapper" command to convert RDF/XML into Turtle on this file, the 
memory utilization shoots up very high (it consumes almost all of my RAM, leaving me 
unable to do anything else on the computer).

I was wondering if there is any option to restrict the memory used by "rapper" tool? I checked 
"configure" and "rapper --help", but didn't find any such option.

Can someone please let me know what the best and easiest workaround for this is?

Thanks.

Medha

_______________________________________________
redland-dev mailing list
[email protected]
http://lists.librdf.org/mailman/listinfo/redland-dev
