[ https://issues.apache.org/jira/browse/ORC-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136420#comment-16136420 ]
ASF GitHub Bot commented on ORC-233:
------------------------------------
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/orc/pull/160#discussion_r134399015
--- Diff: java/mapreduce/src/test/org/apache/orc/mapreduce/TestMapreduceOrcOutputFormat.java ---
@@ -153,11 +153,55 @@ public void testColumnSelection() throws Exception {
assertEquals(false, reader.nextKeyValue());
}
+ @Test
+ public void testColumnSelectionBlank() throws Exception {
+ String typeStr = "struct<i:int,j:int,k:int>";
+ OrcConf.MAPRED_OUTPUT_SCHEMA.setString(conf, typeStr);
+ conf.set("mapreduce.output.fileoutputformat.outputdir", workDir.toString());
+ conf.setInt(OrcConf.ROW_INDEX_STRIDE.getAttribute(), 1000);
+ conf.setBoolean(OrcOutputFormat.SKIP_TEMP_DIRECTORY, true);
+ TaskAttemptID id = new TaskAttemptID("jt", 0, TaskType.MAP, 0, 1);
+ TaskAttemptContext attemptContext = new TaskAttemptContextImpl(conf, id);
+ OutputFormat<NullWritable, OrcStruct> outputFormat =
+ new OrcOutputFormat<OrcStruct>();
+ RecordWriter<NullWritable, OrcStruct> writer =
+ outputFormat.getRecordWriter(attemptContext);
- /**
- * Make sure that the writer ignores the OrcKey
- * @throws Exception
- */
+ // write 3000 rows with three integer columns
+ TypeDescription type = TypeDescription.fromString(typeStr);
+ OrcStruct row = (OrcStruct) OrcStruct.createValue(type);
+ NullWritable nada = NullWritable.get();
+ for (int r = 0; r < 3000; ++r) {
+ row.setFieldValue(0, new IntWritable(r));
+ row.setFieldValue(1, new IntWritable(r * 2));
+ row.setFieldValue(2, new IntWritable(r * 3));
+ writer.write(nada, row);
+ }
+ writer.close(attemptContext);
+
+ conf.set(OrcConf.INCLUDE_COLUMNS.getAttribute(), "");
+ FileSplit split = new FileSplit(new Path(workDir, "part-m-00000.orc"),
+ 0, 1000000, new String[0]);
+ RecordReader<NullWritable, OrcStruct> reader =
+ new OrcInputFormat<OrcStruct>().createRecordReader(split,
--- End diff --
ditto.
> Allow `orc.include.columns` to be empty
> ---------------------------------------
>
> Key: ORC-233
> URL: https://issues.apache.org/jira/browse/ORC-233
> Project: ORC
> Issue Type: Bug
> Components: Java
> Affects Versions: 1.4.0
> Reporter: Dongjoon Hyun
>
> Apache ORC should support returning all NULLs when `orc.include.columns` is set to an empty string, as follows.
> {code}
> conf.set(OrcConf.INCLUDE_COLUMNS.getAttribute, "")
> {code}
> Currently, it raises the following exception.
> {code}
> For input string: ""
> java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:592)
> at java.lang.Integer.parseInt(Integer.java:615)
> at org.apache.orc.mapred.OrcInputFormat.parseInclude(OrcInputFormat.java:69)
> {code}
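The stack trace above is what Java produces when an empty include list is split on commas and each token is passed to Integer.parseInt: "".split(",") yields a single empty token, and parsing it throws. The sketch below reproduces that behavior and shows one way a parser could guard against it. It is only an illustration: the class IncludeColumnsSketch and the method parseIncludeSketch are hypothetical names, not ORC's parseInclude, and it assumes a flat struct in which field i has column id i + 1 and id 0 is the root.

```java
import java.util.Arrays;

public class IncludeColumnsSketch {

    // Hypothetical helper, not ORC's parseInclude. Assumes a flat struct
    // whose field i has column id i + 1, with id 0 being the root struct.
    public static boolean[] parseIncludeSketch(int fieldCount, String columnsStr) {
        boolean[] result = new boolean[fieldCount + 1];
        result[0] = true; // the root struct is always marked as included
        // "".split(",") yields one empty token, and Integer.parseInt("")
        // throws NumberFormatException, so return before splitting.
        // Treating null like an empty string is a simplification made here.
        if (columnsStr == null || columnsStr.trim().isEmpty()) {
            return result;
        }
        for (String idString : columnsStr.split(",")) {
            int field = Integer.parseInt(idString.trim());
            result[field + 1] = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // Empty selection: only the root is marked, no field is included.
        System.out.println(Arrays.toString(parseIncludeSketch(3, "")));
        // prints [true, false, false, false]
        // Fields 0 and 2 selected.
        System.out.println(Arrays.toString(parseIncludeSketch(3, "0,2")));
        // prints [true, true, false, true]
    }
}
```

With the early return, an empty selection produces an include array that marks no fields, which is what a reader would need in order to return NULL for every column as the issue requests.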