Jim Apple has posted comments on this change. Change subject: IMPALA-2428: Support multiple-character string as the field delimiter ......................................................................
Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/3314/4//COMMIT_MSG Commit Message: Line 15: 2. tuple delimiter could not be the first byte of field delimiter What happens today if tuple delimiters and field delimiters are the same character? Same question about escape character. Line 17: 4. terminators could not contains '\0' Today, are terminators allowed to contain '\0'? http://gerrit.cloudera.org:8080/#/c/3314/4/be/src/exec/delimited-text-parser-test.cc File be/src/exec/delimited-text-parser-test.cc: Line 173: Validate(&single_null_no_escape_parser, string("a\0b|c\0d|e\0f", 11), 4, TUPLE_DELIM, 1, 3); long line (here and elsewhere). Consider using clang-format to help you find and fix these, though please do no clang format code you are not changing. http://gerrit.cloudera.org:8080/#/c/3314/1/be/src/exec/delimited-text-parser.cc File be/src/exec/delimited-text-parser.cc: Line 46: DCHECK(tuple_delim != field_delim[0]); > size of field_delim could not be 0. We have checked that in sql-parser.y an I see you have taken that out of the parser now. Why is that? http://gerrit.cloudera.org:8080/#/c/3314/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: Line 268: String fieldDelim = StringEscapeUtils.unescapeJava(rowFormat_.getFieldDelimiter()); > Just as parseDelim(), it turns escaped string into un-escaped string.For ex Why isn't it already used elsewhere? http://gerrit.cloudera.org:8080/#/c/3314/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsStorageDescriptor.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsStorageDescriptor.java: Line 171: if (escapeChar == (byte)fieldDelim.charAt(0) || > Done I do not see that fix in this code. http://gerrit.cloudera.org:8080/#/c/3314/4/tests/query_test/test_delimited_text.py File tests/query_test/test_delimited_text.py: Line 56: def test_create_table_rowformat(self, vector, unique_database): Please name this more appropriately - it only tests failure modes right now, so please name it as such. Line 61: 4. terminator could not contain \0.""" Please add these notes on each test so we know what is being tested in each place. -- To view, visit http://gerrit.cloudera.org:8080/3314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Yuanhao Luo <luoyuan...@software.ict.ac.cn> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Yuanhao Luo <luoyuan...@software.ict.ac.cn> Gerrit-HasComments: Yes