[
https://issues.apache.org/jira/browse/FLINK-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053839#comment-14053839
]
ASF GitHub Bot commented on FLINK-933:
--------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/incubator-flink/pull/47#discussion_r14607248
--- Diff:
stratosphere-java/src/main/java/eu/stratosphere/api/java/io/PrimitiveInputFormat.java
---
@@ -0,0 +1,73 @@
+/***********************************************************************************************************************
+ *
+ * Copyright (C) 2010-2013 by the Stratosphere project
(http://stratosphere.eu)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on
+ * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the
+ * specific language governing permissions and limitations under the
License.
+ *
+
**********************************************************************************************************************/
+package eu.stratosphere.api.java.io;
+
+import eu.stratosphere.api.common.io.DelimitedInputFormat;
+import eu.stratosphere.core.fs.FileInputSplit;
+import eu.stratosphere.core.fs.Path;
+import eu.stratosphere.types.parser.FieldParser;
+import eu.stratosphere.util.InstantiationUtil;
+
+import java.io.IOException;
+
+/**
+ * An input format that reads single field primitive data from a given
file. The difference between this and
+ * {@link eu.stratosphere.api.java.io.CsvInputFormat} is that it won't go
through {@link eu.stratosphere.api.java.tuple.Tuple1}.
+ */
+public class PrimitiveInputFormat<OT> extends DelimitedInputFormat<OT> {
+
+ private Class<OT> primitiveClass;
+
+ private static final byte CARRIAGE_RETURN = (byte) '\r';
+
+ private static final byte NEW_LINE = (byte) '\n';
+
+ private transient FieldParser<OT> parser;
+
+
+ public PrimitiveInputFormat(Path filePath, Class<OT> primitiveClass) {
+ super(filePath);
+ this.primitiveClass = primitiveClass;
+ }
+
+ public PrimitiveInputFormat(Path filePath, char delimiter, Class<OT>
primitiveClass) {
+ super(filePath);
+ this.primitiveClass = primitiveClass;
+ this.setDelimiter(delimiter);
+ }
+
+ @Override
+ public void open(FileInputSplit split) throws IOException {
+ super.open(split);
+ Class<? extends FieldParser<OT>> parserType =
FieldParser.getParserForType(primitiveClass);
+ if (parserType == null) {
+ throw new IllegalArgumentException("The type '" +
primitiveClass.getName() + "' is not supported for the primitive input
format.");
+ }
+ parser = InstantiationUtil.instantiate(parserType,
FieldParser.class);
+ }
+
+ @Override
+ public OT readRecord(OT reuse, byte[] bytes, int offset, int numBytes) {
+ //Check if \n is used as delimiter and the end of this line is
a \r, then remove \r from the line
+ if (this.getDelimiter() != null && this.getDelimiter().length
== 1
+ && this.getDelimiter()[0] == NEW_LINE &&
offset+numBytes >= 1
--- End diff --
Can we have records of length 0?
> Add an input format to read primitive types directly (not through tuples)
> -------------------------------------------------------------------------
>
> Key: FLINK-933
> URL: https://issues.apache.org/jira/browse/FLINK-933
> Project: Flink
> Issue Type: Bug
> Reporter: Stephan Ewen
> Assignee: Mingliang Qi
> Priority: Minor
> Labels: easyfix, features, starter
>
> Right now, reading primitive types goes either through custom formats (work
> intensive), or through CSV inputs. The latter return tuples.
> To read a sequence of primitives, you need to go though Tuple1, which is
> clumsy.
> I would suggest to add an input format to read primitive types line wise (or
> otherwise delimited), and also add a method to the environment for that.
--
This message was sent by Atlassian JIRA
(v6.2#6252)