Hi All!

I tried to use raptor-utils a while ago to convert very big files from ntriples format to turtle, but it was too slow. Looking at the code, I found that the turtle serializer collects all triples in memory and writes them out only once at the end. (If this is not true please correct me.) I think this is done to make each subject appear only once. You shouldn't do this IMO, because it means the performance on big inputs is really BAD!

Turtle is really meant to be a stream format, i.e. the serializer should not collect lots of triples. Collect triples while the subject stays the same and write them out as soon as the subject changes. This is IMO the right way to do it. If you want to optimize the output, you can just run 'sort' on the ntriples file before the conversion. sort does this job MUCH better.

Sorry, I don't have a patch and I'm not going to write one because I don't use rapper anymore. But I decided to write about this issue because it was the only shortcoming I've noticed. Thanks for the great software!

-- 
Alexander
_______________________________________________
redland-dev mailing list
[email protected]
http://lists.librdf.org/mailman/listinfo/redland-dev
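P.S. To illustrate what I mean by streaming, here is a minimal Python sketch. It is not raptor code and does not use any raptor API. It assumes the input is well-formed N-Triples with one triple per line; parse_ntriples and stream_turtle are names made up for this example, and the parser is deliberately simplistic (no escape handling or syntax checking). Each triple is written as soon as it is read and only the current subject is remembered, so memory use does not grow with the size of the input.

```python
import sys
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def parse_ntriples(lines: Iterable[str]) -> Iterator[Triple]:
    """Simplified N-Triples reader (assumes one well-formed triple per line)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace; the object is the rest.
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        yield subject, predicate, obj


def stream_turtle(triples: Iterable[Triple]) -> Iterator[str]:
    """Yield Turtle text chunk by chunk, remembering only the current subject."""
    current: Optional[str] = None
    for subject, predicate, obj in triples:
        if subject != current:
            if current is not None:
                yield " .\n"  # close the previous subject's block
            yield f"{subject}\n    {predicate} {obj}"
            current = subject
        else:
            # Same subject as before: continue its predicate list.
            yield f" ;\n    {predicate} {obj}"
    if current is not None:
        yield " .\n"


if __name__ == "__main__":
    sys.stdout.writelines(stream_turtle(parse_ntriples(sys.stdin)))
```

If the input is sorted first, for example with `sort input.nt | python nt2ttl.py > output.ttl` (nt2ttl.py being a hypothetical file name for the script above), all triples of a subject arrive together and each subject should appear only once in the output. On unsorted input a subject may be repeated, which is still valid Turtle.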
