Hello list,
A colleague of mine encountered an issue in the Raptor source that affects
automatic parser guessing on Windows, leading to behavior inconsistent
with Linux builds.
In raptor_internal.h, RAPTOR_READ_BUFFER_SIZE is defined as BUFSIZ if the
latter is defined. On my Linux system, this value is 8192.
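To check the value on a particular platform, something like the following sketch can help. It is not Raptor code: read_buffer_size() is a hypothetical helper that only mirrors the selection logic described here, with the 4096 fallback taken from raptor_internal.h.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Raptor): returns the size that
 * RAPTOR_READ_BUFFER_SIZE would take under the current definition,
 * i.e. BUFSIZ when it is defined, otherwise the 4096 fallback. */
static size_t read_buffer_size(void)
{
#ifdef BUFSIZ
  return (size_t)BUFSIZ;
#else
  return 4096;
#endif
}
```

Printing the result, for example with printf("%lu\n", (unsigned long)read_buffer_size()), shows which value a given toolchain picks.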
In raptor_parse.c, in raptor_world_guess_parser_name(), FIRSTN is the
number of bytes at the beginning of a document that the code should
analyze for syntax recognition. This is defined as 1024, a value small
enough to avoid false matches on documents that merely contain HTML/XML
examples (per the preceding comment).
The problem is that on Windows, BUFSIZ is defined as 512, and thus so is
RAPTOR_READ_BUFFER_SIZE. Raptor does not buffer more than this many bytes
at a time (see struct raptor_parser_s.buffer), so when syntax
recognition is enabled, Raptor examines only the first 512 bytes of
the document on Windows, compared to 1024 on Linux. This can lead to
differing results, as my colleague found.
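The effect can be sketched as follows. This is a simplified model under my own assumptions, not the actual Raptor code path: bytes_examined() is a hypothetical helper, and only the FIRSTN value is taken from raptor_parse.c.

```c
#include <stddef.h>

#define FIRSTN 1024  /* value quoted from raptor_parse.c */

/* Hypothetical helper (not part of Raptor): number of leading bytes the
 * guesser can inspect when at most read_buffer_size bytes of a
 * doc_len-byte document are buffered at a time. */
static size_t bytes_examined(size_t doc_len, size_t read_buffer_size)
{
  /* The parser never hands over more than one buffer's worth of data. */
  size_t len = doc_len < read_buffer_size ? doc_len : read_buffer_size;

  /* The guesser then looks at no more than FIRSTN bytes of that. */
  return len > FIRSTN ? FIRSTN : len;
}
```

For a 10000-byte document, bytes_examined(10000, 512) gives 512 while bytes_examined(10000, 8192) gives 1024, which is the Windows/Linux discrepancy described above.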
The attached patch provides (1) a compile-time check in raptor_parse.c to
ensure that RAPTOR_READ_BUFFER_SIZE is at least as large as FIRSTN, and
(2) a change to raptor_internal.h to use BUFSIZ only if it is greater than
4096 (this being the default value used if BUFSIZ is undefined).
Questions and comments are welcome.
--Daniel
--
Daniel Richard G. || [email protected] || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/
diff --git a/src/raptor_internal.h b/src/raptor_internal.h
index 5920150..5af98e8 100644
--- a/src/raptor_internal.h
+++ b/src/raptor_internal.h
@@ -444,7 +444,7 @@ raptor_namespace** raptor_namespace_stack_to_array(raptor_namespace_stack *nstac
/* Size of buffer to use when reading from a file */
-#ifdef BUFSIZ
+#if defined(BUFSIZ) && BUFSIZ > 4096
#define RAPTOR_READ_BUFFER_SIZE BUFSIZ
#else
#define RAPTOR_READ_BUFFER_SIZE 4096
diff --git a/src/raptor_parse.c b/src/raptor_parse.c
index e2f8704..6642c55 100644
--- a/src/raptor_parse.c
+++ b/src/raptor_parse.c
@@ -1319,6 +1319,9 @@ raptor_world_guess_parser_name(raptor_world* world,
* RDF/XML examples
*/
#define FIRSTN 1024
+#if FIRSTN > RAPTOR_READ_BUFFER_SIZE
+#error RAPTOR_READ_BUFFER_SIZE is not large enough
+#endif
if(buffer && len && len > FIRSTN) {
c = buffer[FIRSTN];
((char*)buffer)[FIRSTN] = '\0';
_______________________________________________
redland-dev mailing list
[email protected]
http://lists.librdf.org/mailman/listinfo/redland-dev