[
https://issues.apache.org/jira/browse/FLINK-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923859#comment-15923859
]
ASF GitHub Bot commented on FLINK-6039:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/3529#discussion_r105856276
--- Diff: flink-core/src/main/java/org/apache/flink/types/Row.java ---
@@ -66,10 +66,11 @@ public int getArity() {
* Gets the field at the specified position.
* @param pos The position of the field, 0-based.
* @return The field at the specified position.
- * @throws IndexOutOfBoundsException Thrown, if the position is
negative, or equal to, or larger than the number of fields.
+ * Return null if the position is equal to, or larger than the number
of fields.
+ * @throws IndexOutOfBoundsException Thrown, if the position is
negative.
*/
public Object getField(int pos) {
- return fields[pos];
+ return pos >= fields.length ? null : fields[pos];
--- End diff --
This will cause overhead for basically every operation and should not be
done to support a minor feature.
If you expect that you might receive a `Row` which violates the expected
schema and you want to avoid an `IndexOutOfBoundsException` you should rather
check `getArity()`.
> Row of TableFunction should support flexible number of fields
> -------------------------------------------------------------
>
> Key: FLINK-6039
> URL: https://issues.apache.org/jira/browse/FLINK-6039
> Project: Flink
> Issue Type: Improvement
> Reporter: Zhuoluo Yang
> Assignee: Zhuoluo Yang
>
> In actual world, especially while processing logs with TableFunction. The
> formats of the logs in actual world are flexible. Thus, the number of fields
> should not be fixed.
> For examples, we should make the three following types of of TableFunction
> work.
> {code}
> // Test for incomplete row
> class TableFunc4 extends TableFunction[Row] {
> def eval(str: String): Unit = {
> if (str.contains("#")) {
> str.split("#").foreach({ s =>
> val row = new Row(3)
> row.setField(0, s) // And we only set values for one column
> collect(row)
> })
> }
> }
> override def getResultType: TypeInformation[Row] = {
> new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
> BasicTypeInfo.INT_TYPE_INFO,
> BasicTypeInfo.INT_TYPE_INFO)
> }
> }
> // Test for incomplete row
> class TableFunc5 extends TableFunction[Row] {
> def eval(str: String): Unit = {
> if (str.contains("#")) {
> str.split("#").foreach({ s =>
> val row = new Row(1) // ResultType is three columns, we have only
> one here
> row.setField(0, s)
> collect(row)
> })
> }
> }
> override def getResultType: TypeInformation[Row] = {
> new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
> BasicTypeInfo.INT_TYPE_INFO,
> BasicTypeInfo.INT_TYPE_INFO)
> }
> }
> // Test for overflow row
> class TableFunc6 extends TableFunction[Row] {
> def eval(str: String): Unit = {
> if (str.contains("#")) {
> str.split("#").foreach({ s =>
> val row = new Row(5) // ResultType is two columns, we have five
> columns here
> row.setField(0, s)
> row.setField(1, s.length)
> row.setField(2, s.length)
> row.setField(3, s.length)
> row.setField(4, s.length)
> collect(row)
> })
> }
> }
> override def getResultType: TypeInformation[Row] = {
> new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
> BasicTypeInfo.INT_TYPE_INFO)
> }
> }
> {code}
> Actually, the TableFunc4 and TableFunc6 has already worked correctly with
> current version. This issue will make TableFunc5 works.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)