[
https://issues.apache.org/jira/browse/ORC-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136413#comment-16136413
]
ASF GitHub Bot commented on ORC-233:
------------------------------------
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/orc/pull/160#discussion_r134398544
--- Diff: java/mapreduce/src/test/org/apache/orc/mapreduce/TestMapreduceOrcOutputFormat.java ---
@@ -153,11 +153,55 @@ public void testColumnSelection() throws Exception {
assertEquals(false, reader.nextKeyValue());
}
+ @Test
+ public void testColumnSelectionBlank() throws Exception {
+ String typeStr = "struct<i:int,j:int,k:int>";
+ OrcConf.MAPRED_OUTPUT_SCHEMA.setString(conf, typeStr);
+ conf.set("mapreduce.output.fileoutputformat.outputdir", workDir.toString());
+ conf.setInt(OrcConf.ROW_INDEX_STRIDE.getAttribute(), 1000);
+ conf.setBoolean(OrcOutputFormat.SKIP_TEMP_DIRECTORY, true);
+ TaskAttemptID id = new TaskAttemptID("jt", 0, TaskType.MAP, 0, 1);
+ TaskAttemptContext attemptContext = new TaskAttemptContextImpl(conf, id);
+ OutputFormat<NullWritable, OrcStruct> outputFormat =
+ new OrcOutputFormat<OrcStruct>();
+ RecordWriter<NullWritable, OrcStruct> writer =
+ outputFormat.getRecordWriter(attemptContext);
- /**
- * Make sure that the writer ignores the OrcKey
- * @throws Exception
- */
+ // write 3000 rows with three integer columns
+ TypeDescription type = TypeDescription.fromString(typeStr);
+ OrcStruct row = (OrcStruct) OrcStruct.createValue(type);
+ NullWritable nada = NullWritable.get();
+ for (int r = 0; r < 3000; ++r) {
+ row.setFieldValue(0, new IntWritable(r));
+ row.setFieldValue(1, new IntWritable(r * 2));
+ row.setFieldValue(2, new IntWritable(r * 3));
+ writer.write(nada, row);
+ }
+ writer.close(attemptContext);
+
+ conf.set(OrcConf.INCLUDE_COLUMNS.getAttribute(), "");
+ FileSplit split = new FileSplit(new Path(workDir, "part-m-00000.orc"),
+ 0, 1000000, new String[0]);
+ RecordReader<NullWritable, OrcStruct> reader =
+ new OrcInputFormat<OrcStruct>().createRecordReader(split,
+ attemptContext);
+ // with no columns included, every field should read back as null
+ for (int r = 0; r < 3000; ++r) {
+ assertEquals(true, reader.nextKeyValue());
+ row = reader.getCurrentValue();
+ assertEquals(null, ((IntWritable) row.getFieldValue(0)));
+ assertEquals(null, row.getFieldValue(1));
+ assertEquals(null, ((IntWritable) row.getFieldValue(2)));
+ }
+ assertEquals(false, reader.nextKeyValue());
+ }
+
+
+
+ /**
+ * Make sure that the writer ignores the OrcKey
+ * @throws Exception
+ */
--- End diff --
Could you remove the redundant blank lines here, in lines 201 ~ 204?
> Allow `orc.include.columns` to be empty
> ---------------------------------------
>
> Key: ORC-233
> URL: https://issues.apache.org/jira/browse/ORC-233
> Project: ORC
> Issue Type: Bug
> Components: Java
> Affects Versions: 1.4.0
> Reporter: Dongjoon Hyun
>
> Apache ORC should support returning all NULLs by the following.
> {code}
> conf.set(OrcConf.INCLUDE_COLUMNS.getAttribute(), "")
> {code}
> Currently, it raises the following exceptions.
> {code}
> For input string: ""
> java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:592)
> at java.lang.Integer.parseInt(Integer.java:615)
> at org.apache.orc.mapred.OrcInputFormat.parseInclude(OrcInputFormat.java:69)
> {code}
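> The fix could be as simple as short-circuiting on an empty include string before the integer parsing that currently throws. A minimal sketch of that idea (a hypothetical `IncludeParser` helper, not the actual `OrcInputFormat.parseInclude` code):
> {code}
> // Hypothetical sketch, assuming the include value is a comma-separated
> // list of column ids; "" or null is treated as "include no columns".
> public class IncludeParser {
>   // numColumns: number of columns in the schema (index 0 is the root).
>   public static boolean[] parseInclude(int numColumns, String value) {
>     boolean[] result = new boolean[numColumns + 1];
>     // Returning early here avoids Integer.parseInt("") and the
>     // NumberFormatException shown above; all entries stay false.
>     if (value == null || value.trim().isEmpty()) {
>       return result;
>     }
>     for (String idStr : value.split(",")) {
>       result[Integer.parseInt(idStr.trim())] = true;
>     }
>     return result;
>   }
> }
> {code}
> With that guard, an empty `orc.include.columns` yields an all-false include array, and the reader returns rows whose fields are all NULL, matching the test above.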
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)