Hello list,

A colleague of mine encountered an issue in the Raptor source that affects automatic parser guessing on Windows, leading to behavior inconsistent with Linux builds.

In raptor_internal.h, RAPTOR_READ_BUFFER_SIZE is defined as BUFSIZ if the latter is defined. On my Linux system, this value is 8192.

In raptor_parse.c, in raptor_world_guess_parser_name(), FIRSTN is the number of bytes at the beginning of a document that the code analyzes for syntax recognition. It is defined as 1024, a value chosen to be small enough that the guesser is not misled by documents that contain HTML/XML examples (per the preceding comment).

The problem is that on Windows, BUFSIZ is defined as 512, and thus so is RAPTOR_READ_BUFFER_SIZE. Raptor does not buffer more than this many bytes at a time (see struct raptor_parser_s.buffer), so when syntax recognition is enabled, Raptor looks at only the first 512 bytes of the document on Windows, compared to 1024 on Linux. This can lead to differing guesses for the same document, as my colleague found.
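
To make the arithmetic concrete, here is a minimal sketch of how the read buffer size caps the number of bytes available for guessing. This is not Raptor code: guess_window() is a hypothetical helper written only for illustration, and only the FIRSTN value of 1024 is taken from raptor_parse.c.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as FIRSTN in raptor_parse.c */
#define FIRSTN 1024

/* Hypothetical helper (not part of Raptor): returns how many leading
 * bytes of a document are available for syntax guessing when at most
 * read_buffer_size bytes are buffered at a time. */
static size_t guess_window(size_t doc_len, size_t read_buffer_size)
{
  size_t len;

  assert(read_buffer_size > 0);

  /* The guesser never sees more than one buffer's worth of data */
  len = doc_len < read_buffer_size ? doc_len : read_buffer_size;

  /* ...and it looks at no more than the first FIRSTN bytes of that */
  return len > FIRSTN ? FIRSTN : len;
}
```

Under these assumptions, a 10000-byte document gives guess_window(10000, 8192) == 1024 with the Linux value of BUFSIZ, but guess_window(10000, 512) == 512 with the Windows value, so the two platforms base their guess on different amounts of text.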

The attached patch provides (1) a compile-time check in raptor_parse.c to ensure that RAPTOR_READ_BUFFER_SIZE is at least as large as FIRSTN, and (2) a change to raptor_internal.h to use BUFSIZ only if it is greater than 4096 (this being the default value used if BUFSIZ is undefined).

Questions and comments are welcome.


--Daniel


--
Daniel Richard G. || [email protected] || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/
diff --git a/src/raptor_internal.h b/src/raptor_internal.h
index 5920150..5af98e8 100644
--- a/src/raptor_internal.h
+++ b/src/raptor_internal.h
@@ -444,7 +444,7 @@ raptor_namespace** raptor_namespace_stack_to_array(raptor_namespace_stack *nstac
 
 
 /* Size of buffer to use when reading from a file */
-#ifdef BUFSIZ
+#if defined(BUFSIZ) && BUFSIZ > 4096
 #define RAPTOR_READ_BUFFER_SIZE BUFSIZ
 #else
 #define RAPTOR_READ_BUFFER_SIZE 4096
diff --git a/src/raptor_parse.c b/src/raptor_parse.c
index e2f8704..6642c55 100644
--- a/src/raptor_parse.c
+++ b/src/raptor_parse.c
@@ -1319,6 +1319,9 @@ raptor_world_guess_parser_name(raptor_world* world,
        * RDF/XML examples
        */
 #define FIRSTN 1024
+#if FIRSTN > RAPTOR_READ_BUFFER_SIZE
+#error RAPTOR_READ_BUFFER_SIZE is not large enough
+#endif
       if(buffer && len && len > FIRSTN) {
         c = buffer[FIRSTN];
         ((char*)buffer)[FIRSTN] = '\0';
_______________________________________________
redland-dev mailing list
[email protected]
http://lists.librdf.org/mailman/listinfo/redland-dev
