Weak Performance of "application/json+rdf" serializer on big TripleCollections
and Serializer/Parser using Platform encoding instead of UTF-8
--------------------------------------------------------------------------------------------------------------------------------------------
Key: CLEREZZA-643
URL: https://issues.apache.org/jira/browse/CLEREZZA-643
Project: Clerezza
Issue Type: Improvement
Reporter: Rupert Westenthaler
Both the "application/json+rdf" serializer and parser use the platform-specific
default encoding instead of UTF-8.
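To illustrate the encoding problem, here is a minimal sketch (not the Clerezza
code; the class and method names are made up for illustration). An
OutputStreamWriter created without a charset uses the platform default, so the
bytes written can differ between machines. Passing StandardCharsets.UTF_8
explicitly avoids that:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of Clerezza.
public class Utf8WriterSketch {

    // Encodes text through a Writer with an explicitly chosen charset, so the
    // resulting bytes do not depend on the platform default encoding.
    // new OutputStreamWriter(out) without the second argument would use the
    // platform default instead.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUtf8("{\"value\":\"Z\u00fcrich\"}");
        System.out.println(bytes.length + " bytes");
    }
}
```

The same applies on the parser side: an InputStreamReader should be given
StandardCharsets.UTF_8 rather than relying on the default.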
In addition, the serializer performs very poorly on big graphs (at least when
using SimpleMGraph).
After some digging in the code I came to the conclusion that this is caused by
the repeated TripleCollection.filter(..) calls: first to filter all predicates
for a subject, and then all objects for each subject/predicate combination.
Trying to serialize a graph with 50k triples ended in several minutes at 100%
CPU.
With the next comment I will provide a patch with an implementation based on a
sorted array of the triples. With this approach one can serialize graphs with
100k triples in about 1 second. The patch also changes the encoding to UTF-8.
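The idea of the sorted-array approach can be sketched roughly as follows. This
is only an illustration under simplifying assumptions, not the attached patch:
triples are modelled as hypothetical String[] {subject, predicate, object}
arrays instead of Clerezza Triple objects, objects are written as plain strings
rather than full value objects, and no JSON escaping is done. After sorting by
subject, then predicate, all triples belonging to one subject/predicate group
are adjacent, so a single pass over the array is enough and no per-subject
filter(..) call is needed:

```java
import java.util.Arrays;

// Hypothetical example class, not part of Clerezza.
public class SortedTripleGroupingSketch {

    // Each triple is a String[] of {subject, predicate, object} (assumption).
    static String serialize(String[][] triples) {
        String[][] sorted = triples.clone();
        // Sort by subject, then predicate, then object.
        Arrays.sort(sorted, (a, b) -> {
            int c = a[0].compareTo(b[0]);
            if (c == 0) c = a[1].compareTo(b[1]);
            if (c == 0) c = a[2].compareTo(b[2]);
            return c;
        });
        StringBuilder json = new StringBuilder("{");
        String subject = null;
        String predicate = null;
        for (String[] t : sorted) {
            if (!t[0].equals(subject)) {
                // New subject: close the previous subject's last array and object.
                if (subject != null) json.append("]},");
                json.append('"').append(t[0]).append("\":{");
                subject = t[0];
                predicate = null;
            }
            if (!t[1].equals(predicate)) {
                // New predicate within the same subject: close the previous array.
                if (predicate != null) json.append("],");
                json.append('"').append(t[1]).append("\":[");
                predicate = t[1];
            } else {
                json.append(',');
            }
            json.append('"').append(t[2]).append('"');
        }
        if (subject != null) json.append("]}");
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        String[][] triples = {
            {"s1", "p1", "a"},
            {"s2", "p1", "c"},
            {"s1", "p2", "b"},
            {"s1", "p1", "d"},
        };
        System.out.println(serialize(triples));
    }
}
```

The cost is dominated by the sort, O(n log n) for n triples, followed by one
linear pass.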