[ https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241856#comment-14241856 ]
mustafa elbehery commented on FLINK-629:
----------------------------------------
Hi Robert,
I wrote a parser that turns tweets in JSON format into Java objects and
integrated it with Flink. When I was building an ETL job on top of it, Flink
threw a NullPointerException because the tweets dataset contains null values.
Debugging showed that the exception comes from the Avro serializer, which
cannot write null values. I contacted Stephan Ewen, and he recommended using a
filter to overwrite the null values with default values, because Flink does not
support null values yet. Below is the stack trace:
java.lang.NullPointerException: in de.tu_berlin.impro3.flink.model.tweet.Tweet in array null of array in field contributors of de.tu_berlin.impro3.flink.model.tweet.Tweet
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.flink.api.java.typeutils.runtime.AvroSerializer.serialize(AvroSerializer.java:112)
    at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
    at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
    at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
    at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.avro.reflect.ReflectDatumWriter.writeArray(ReflectDatumWriter.java:67)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    ... 9 more
14/12/01 16:08:39 INFO client.JobClient: 12/01/2014 16:08:38: Job execution switched to status FAILED
14/12/01 16:08:39 INFO client.JobClient: java.lang.NullPointerException: in de.tu_berlin.impro3.flink.model.tweet.Tweet in array null of array in field contributors of de.tu_berlin.impro3.flink.model.tweet.Tweet
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.flink.api.java.typeutils.runtime.AvroSerializer.serialize(AvroSerializer.java:112)
    at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
    at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
    at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
    at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.avro.reflect.ReflectDatumWriter.writeArray(ReflectDatumWriter.java:67)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    ... 9 more
14/12/01 16:08:39 INFO taskmanager.TaskManager: Shutting down TaskManager
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.NullPointerException: in de.tu_berlin.impro3.flink.model.tweet.Tweet in array null of array in field contributors of de.tu_berlin.impro3.flink.model.tweet.Tweet
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.flink.api.java.typeutils.runtime.AvroSerializer.serialize(AvroSerializer.java:112)
    at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
    at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
    at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
    at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.avro.reflect.ReflectDatumWriter.writeArray(ReflectDatumWriter.java:67)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    ... 9 more
    at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:349)
    at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:239)
    at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:51)
    at de.tu_berlin.impro3.flink.PrintTweetsTest.main(PrintTweetsTest.java:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
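
For anyone hitting the same trace, here is a rough sketch of the workaround described above: replace null fields with default values before the record reaches the serializer. The `Tweet` class here is a hypothetical stand-in (the real `de.tu_berlin.impro3.flink.model.tweet.Tweet` is not shown in this thread), and the field types and the `withDefaults` helper are assumptions made only for illustration. The sketch is plain Java without any Flink dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Tweet POJO; the real class is not shown here.
class Tweet {
    String text;
    long[] contributors; // assumed type; null when the JSON field is null

    Tweet(String text, long[] contributors) {
        this.text = text;
        this.contributors = contributors;
    }
}

final class NullDefaultsSketch {
    // Returns a copy in which every null field is replaced by a default value,
    // so that a serializer that rejects nulls never sees one.
    static Tweet withDefaults(Tweet t) {
        String text = (t.text == null) ? "" : t.text;
        long[] contributors = (t.contributors == null) ? new long[0] : t.contributors;
        return new Tweet(text, contributors);
    }

    public static void main(String[] args) {
        List<Tweet> parsed = Arrays.asList(
                new Tweet("hello", new long[] {42L}),
                new Tweet(null, null)); // what the parser may yield for sparse JSON

        List<Tweet> cleaned = new ArrayList<>();
        for (Tweet t : parsed) {
            cleaned.add(withDefaults(t));
        }
        for (Tweet t : cleaned) {
            System.out.println(t.text.length() + " chars, "
                    + t.contributors.length + " contributors");
        }
    }
}
```

The trace shows the record being serialized as it leaves the data source (`DataSourceTask.invoke` calling `OutputCollector.collect`), so it is probably safest to apply such defaults inside the parser itself, before the object is emitted, rather than in a later operator.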
> Add support for null values to the java api
> -------------------------------------------
>
> Key: FLINK-629
> URL: https://issues.apache.org/jira/browse/FLINK-629
> Project: Flink
> Issue Type: Improvement
> Components: Java API
> Reporter: Stephan Ewen
> Assignee: Gyula Fora
> Priority: Critical
> Labels: github-import
> Fix For: pre-apache
>
>
> Currently, many runtime operations fail when encountering a null value. Tuple
> serialization should allow null fields.
> I suggest adding a method to the tuples called `getFieldNotNull()` which
> throws a meaningful exception when the accessed field is null. That way, we
> simplify the logic of operators that should not deal with null fields, like
> key grouping or aggregations.
> Even though SQL allows grouping and aggregating of null values, I suggest
> excluding this from the Java API, because the SQL semantics of aggregating null
> fields are messy.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/629
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: enhancement, java api,
> Milestone: Release 0.5.1
> Created at: Wed Mar 26 00:27:49 CET 2014
> State: open
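
To make the `getFieldNotNull()` suggestion in the issue description concrete, here is a minimal sketch of how such an accessor could behave. `PairSketch` is a hypothetical two-field tuple written for this example, not Flink's tuple class, and the choice of `IllegalStateException` and the message wording are assumptions.

```java
// Hypothetical two-field tuple, for illustration only.
final class PairSketch<A, B> {
    final A f0;
    final B f1;

    PairSketch(A f0, B f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    // Plain accessor: may return null.
    Object getField(int pos) {
        switch (pos) {
            case 0: return f0;
            case 1: return f1;
            default: throw new IndexOutOfBoundsException("No field at position " + pos);
        }
    }

    // Accessor for operators that cannot handle nulls (e.g. key grouping):
    // fails with a descriptive message instead of a bare NullPointerException
    // somewhere further down the call stack.
    Object getFieldNotNull(int pos) {
        Object value = getField(pos);
        if (value == null) {
            throw new IllegalStateException(
                    "Field " + pos + " is null, but a non-null value is required");
        }
        return value;
    }

    public static void main(String[] args) {
        PairSketch<String, Integer> p = new PairSketch<>("key", null);
        System.out.println(p.getFieldNotNull(0));
        try {
            p.getFieldNotNull(1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With an accessor like this, an operator that must not see null keys could call `getFieldNotNull()` and let the exception identify the offending field position.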