[ https://issues.apache.org/jira/browse/HADOOP-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525046 ]
Hadoop QA commented on HADOOP-1758:
-----------------------------------
+1
http://issues.apache.org/jira/secure/attachment/12364812/1758_01.patch applied
and successfully tested against trunk revision r572826.
Test results:
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/688/testReport/
Console output:
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/688/console
> processing escapes in a jute record is quadratic
> ------------------------------------------------
>
> Key: HADOOP-1758
> URL: https://issues.apache.org/jira/browse/HADOOP-1758
> Project: Hadoop
> Issue Type: Bug
> Components: record
> Affects Versions: 0.13.0
> Reporter: Dick King
> Assignee: Vivek Ratan
> Priority: Blocker
> Fix For: 0.15.0
>
> Attachments: 1758_01.patch
>
>
> The following code appears in hadoop/src/c++/librecordio/csvarchive.cc :
> static void replaceAll(std::string s, const char *src, char c)
> {
>   std::string::size_type pos = 0;
>   while (pos != std::string::npos) {
>     pos = s.find(src);
>     if (pos != std::string::npos) {
>       s.replace(pos, strlen(src), 1, c);
>     }
>   }
> }
> This is used in the context of replacing jute escapes in the code:
> void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
> {
>   t = readUptoTerminator(stream);
>   if (t[0] != '\'') {
>     throw new IOException("Errror deserializing string.");
>   }
>   t.erase(0, 1); /// erase first character
>   replaceAll(t, "%0D", 0x0D);
>   replaceAll(t, "%0A", 0x0A);
>   replaceAll(t, "%7D", 0x7D);
>   replaceAll(t, "%00", 0x00);
>   replaceAll(t, "%2C", 0x2C);
>   replaceAll(t, "%25", 0x25);
> }
> Since every s.find(src) call restarts from the beginning of the string, and
> every s.replace shifts the rest of the string down, each instance of an
> escape sequence costs a pass over the entire string; practically anything
> would be better. I would propose that within deserialize we allocate a
> char * [since each replacement is smaller than the original], scan for each
> %, and either do a general hex conversion in place or look for one of the
> six patterns, and after each replacement move down the unmodified text and
> resume the scan for % from that point.
> -dk
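The single-pass scan proposed in the description could look roughly like the
sketch below. It is not the attached 1758_01.patch; the names unescapeInPlace
and hexValue are hypothetical. It takes the "general hex conversion" option,
so it decodes any %XX pair rather than only the six patterns handled in
deserialize, and it works directly on the std::string instead of a separate
char * buffer, which is safe because the write position never passes the read
position.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if ch is not a hex digit.
static int hexValue(char ch)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  return -1;
}

// Hypothetical single-pass unescape: decodes every well-formed %XX in s.
// Each character is read once and written at most once, so the cost is
// linear in the length of s.
static void unescapeInPlace(std::string& s)
{
  const std::string::size_type n = s.size();
  std::string::size_type out = 0;  // next position to write
  std::string::size_type in = 0;   // next position to read
  while (in < n) {
    if (s[in] == '%' && in + 2 < n) {
      const int hi = hexValue(s[in + 1]);
      const int lo = hexValue(s[in + 2]);
      if (hi >= 0 && lo >= 0) {
        // Well-formed escape: emit the decoded byte, skip three input chars.
        s[out++] = static_cast<char>(hi * 16 + lo);
        in += 3;
        continue;
      }
    }
    // Ordinary character, or a '%' not followed by two hex digits: copy as-is.
    s[out++] = s[in++];
  }
  s.resize(out);  // drop the now-unused tail
}
```

With something like this, the six replaceAll calls in deserialize would
collapse to a single unescapeInPlace(t), assuming that decoding every %XX
pair (not just the six listed) is acceptable for the record format.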