[
https://issues.apache.org/jira/browse/PIG-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687481#comment-13687481
]
Cheolsoo Park commented on PIG-3357:
------------------------------------
This is because Pig always converts Python float to Java double.
{code:title=JythonUtils.java}
} else if (pyObject instanceof PyFloat) {
// J(P)ython is loosely typed, supports only float type,
// hence we convert everything to double to save precision
javaObj = pyObject.__tojava__(Double.class);
}
{code}
In fact, that is a good thing because Python floating point numbers are
implemented using double in C:
http://docs.python.org/2/library/stdtypes.html#typesnumeric
One workaround is to specify bytearray as the outputSchema of Python udf and do
a cast to float in Pig. That is,
{code:title=test.py}
@outputSchema("v: bytearray")
def test(v):
return v
{code}
{code:title=test.pig}
table_out = foreach table_in generate (float)udf.test(v);
{code}
Perhaps we should document this in the Pig manual.
> Pig doesn't take care of declared float type and converts it to double
> ----------------------------------------------------------------------
>
> Key: PIG-3357
> URL: https://issues.apache.org/jira/browse/PIG-3357
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11
> Environment: cdh 4.3.0
> Reporter: Sergey
>
> Here is the script:
> {code}
> register /usr/lib/pig/lib/avro-1.7.4.jar;
> register /usr/lib/pig/lib/json-simple-1.1.jar;
> register /usr/lib/pig/piggybank.jar;
> register test.py using jython as udf;
> table_in = load 'in' as (v: float);
> table_out = foreach table_in generate udf.test(v);
> store table_out into 'out' using
> org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"name": "test",
> "type": "float"}');
> {code}
> Here is UDF:
> {code}
> @outputSchema("v: float")
> def test(v):
> return v
> {code}
> Here is an input:
> {code}
> 1
> {code}
> Here is the stacktrace:
> java.lang.Exception:
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.io.IOException: Cannot convert to float:class java.lang.Double
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.io.IOException: Cannot convert to float:class java.lang.Double
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:260)
> at
> org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
> at
> org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
> at
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: java.io.IOException: Cannot convert to float:class java.lang.Double
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeFloat(PigAvroDatumWriter.java:281)
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:88)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
> ... 21 more
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira