[ 
https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241856#comment-14241856
 ] 

mustafa elbehery commented on FLINK-629:
----------------------------------------

Hi Robert, 

I wrote a parser that converts tweets in JSON format into Java objects and 
integrated it with Flink. When I ran an ETL job on top of it, Flink threw a 
NullPointerException because the tweets dataset contains null values. Debugging 
showed that the exception comes from the Avro serializer, which cannot write 
null values. I contacted Stephan Ewen, and he recommended using a Filter to 
overwrite the null values with default values, because Flink does not support 
null values yet. The stack trace is below:



java.lang.NullPointerException: in de.tu_berlin.impro3.flink.model.tweet.Tweet in array null of array in field contributors of de.tu_berlin.impro3.flink.model.tweet.Tweet
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.apache.flink.api.java.typeutils.runtime.AvroSerializer.serialize(AvroSerializer.java:112)
        at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
        at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
        at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
        at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
        at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
        at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.avro.reflect.ReflectDatumWriter.writeArray(ReflectDatumWriter.java:67)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
        ... 9 more

14/12/01 16:08:39 INFO client.JobClient: 12/01/2014 16:08:38:   Job execution switched to status FAILED
14/12/01 16:08:39 INFO client.JobClient: java.lang.NullPointerException: in de.tu_berlin.impro3.flink.model.tweet.Tweet in array null of array in field contributors of de.tu_berlin.impro3.flink.model.tweet.Tweet
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.apache.flink.api.java.typeutils.runtime.AvroSerializer.serialize(AvroSerializer.java:112)
        at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
        at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
        at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
        at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
        at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
        at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.avro.reflect.ReflectDatumWriter.writeArray(ReflectDatumWriter.java:67)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
        ... 9 more

14/12/01 16:08:39 INFO taskmanager.TaskManager: Shutting down TaskManager
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.NullPointerException: in de.tu_berlin.impro3.flink.model.tweet.Tweet in array null of array in field contributors of de.tu_berlin.impro3.flink.model.tweet.Tweet
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.apache.flink.api.java.typeutils.runtime.AvroSerializer.serialize(AvroSerializer.java:112)
        at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
        at org.apache.flink.runtime.io.network.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
        at org.apache.flink.runtime.io.network.api.RecordWriter.emit(RecordWriter.java:82)
        at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
        at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
        at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.avro.reflect.ReflectDatumWriter.writeArray(ReflectDatumWriter.java:67)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
        ... 9 more

        at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:349)
        at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:239)
        at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:51)
        at de.tu_berlin.impro3.flink.PrintTweetsTest.main(PrintTweetsTest.java:174)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
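
The workaround of replacing null values with defaults could look roughly like 
the sketch below. It is only an illustration under assumptions: the reduced 
`Tweet` class, its `text` field, the `long[]` element type of `contributors`, 
and the `TweetDefaults` helper are all hypothetical stand-ins, not the real 
model class. Because the trace shows the failure while the data source emits 
records (DataSourceTask -> OutputCollector -> AvroSerializer), the sketch 
assumes the defaults are applied inside the parser, before a tweet is handed 
to Flink.

```java
// Hypothetical, heavily reduced stand-in for the Tweet POJO named in the trace.
final class Tweet {
    String text;          // assumed field, for illustration only
    long[] contributors;  // field named in the exception; element type is assumed
}

// Hypothetical helper that the parser would call on every parsed tweet.
final class TweetDefaults {

    /** Replaces null fields with non-null defaults in place and returns the same object. */
    static Tweet fillNulls(Tweet tweet) {
        if (tweet.text == null) {
            tweet.text = "";
        }
        if (tweet.contributors == null) {
            // An empty array instead of null, so an array writer has something to iterate.
            tweet.contributors = new long[0];
        }
        return tweet;
    }

    public static void main(String[] args) {
        Tweet parsed = new Tweet();      // both fields start out null
        Tweet safe = fillNulls(parsed);
        System.out.println("text is empty: " + safe.text.isEmpty());
        System.out.println("contributors length: " + safe.contributors.length);
    }
}
```

Every nullable field of the real class would need the same treatment, which is 
why this is a workaround rather than a fix for the missing null support.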



> Add support for null values to the java api
> -------------------------------------------
>
>                 Key: FLINK-629
>                 URL: https://issues.apache.org/jira/browse/FLINK-629
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>            Reporter: Stephan Ewen
>            Assignee: Gyula Fora
>            Priority: Critical
>              Labels: github-import
>             Fix For: pre-apache
>
>
> Currently, many runtime operations fail when they encounter a null value. 
> Tuple serialization should allow null fields.
> I suggest adding a method to the tuples called `getFieldNotNull()`, which 
> throws a meaningful exception when the accessed field is null. That way, we 
> simplify the logic of operators that should not deal with null fields, like 
> key grouping or aggregations.
> Even though SQL allows grouping and aggregating null values, I suggest 
> excluding this from the Java API, because the SQL semantics of aggregating 
> null fields are messy.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/629
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: enhancement, java api, 
> Milestone: Release 0.5.1
> Created at: Wed Mar 26 00:27:49 CET 2014
> State: open
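
The `getFieldNotNull()` accessor proposed in the issue description could be 
sketched as follows. This is a minimal illustration, not Flink's tuple 
implementation: `SimpleTuple` and `NullFieldException` are hypothetical names 
introduced here, and the issue itself only asks for "a meaningful exception".

```java
// Hypothetical exception type carrying the position of the offending field.
final class NullFieldException extends RuntimeException {
    NullFieldException(int pos) {
        super("Field " + pos + " is null, but a non-null value was expected.");
    }
}

// Hypothetical minimal tuple that tolerates null fields.
final class SimpleTuple {
    private final Object[] fields;

    SimpleTuple(Object... fields) {
        this.fields = fields.clone();
    }

    /** Plain accessor: returns the field as-is, which may be null. */
    @SuppressWarnings("unchecked")
    <T> T getField(int pos) {
        return (T) fields[pos];
    }

    /** Strict accessor for operators such as key grouping that cannot handle null. */
    <T> T getFieldNotNull(int pos) {
        T value = this.<T>getField(pos);
        if (value == null) {
            throw new NullFieldException(pos);
        }
        return value;
    }

    public static void main(String[] args) {
        SimpleTuple tuple = new SimpleTuple("berlin", null);
        String city = tuple.getFieldNotNull(0);
        System.out.println(city);
        try {
            tuple.getFieldNotNull(1);
        } catch (NullFieldException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping both accessors lets serialization use the lenient one, while operators 
that cannot work with null fields fail early with a message that names the 
field instead of a bare NullPointerException.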



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
