[
https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776884#comment-13776884
]
Rohini Palaniswamy commented on PIG-2417:
-----------------------------------------
I see compilation fails with 4 similar errors in TestStreamingUDF.java when
running it on Linux. Works fine on Mac.
test/org/apache/pig/impl/builtin/TestStreamingUDF.java:287: error: unmappable
character for encoding UTF8
From javac documentation:
-encoding encoding
Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is
not specified, the platform default converter is used.
Not sure exactly what the platform default is on Mac, since inside a Java
program both file.encoding and Charset.defaultCharset() report UTF8. We should
either specify -encoding in the ant javac invocation or fix the test to use
\uxxxx escapes.
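
For illustration, a minimal sketch of the second option. The class name and the
sample string are hypothetical and are not taken from TestStreamingUDF.java.
The source below is pure ASCII, so it should compile the same way whatever
encoding javac assumes:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Pig test suite.
class UnicodeEscapeDemo {
    // The escape stands for U+00E9 (e with acute accent). Because the source
    // file contains only ASCII bytes, the javac -encoding setting cannot
    // change how this literal is read.
    static final String ESCAPED = "caf\u00e9";

    // Encode explicitly as UTF-8 instead of relying on the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build the same string from the code point, without any literal.
        String expected = "caf" + new String(Character.toChars(0xE9));
        System.out.println(ESCAPED.equals(expected));   // true
        System.out.println(ESCAPED.length());           // 4
        System.out.println(utf8Bytes(ESCAPED).length);  // 5
    }
}
```

The first option (passing -encoding to javac) would leave the test source
unchanged but makes the build depend on the flag being set everywhere.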
[~daijy],
Does it compile fine on Windows?
> Streaming UDFs - allow users to easily write UDFs in scripting languages
> with no JVM implementation.
> -----------------------------------------------------------------------------------------------------
>
> Key: PIG-2417
> URL: https://issues.apache.org/jira/browse/PIG-2417
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.12.0
> Reporter: Jeremy Karn
> Assignee: Jeremy Karn
> Fix For: 0.12.0
>
> Attachments: PIG-2417-4.patch, PIG-2417-5.patch, PIG-2417-6.patch,
> PIG-2417-7.patch, PIG-2417-8.patch, PIG-2417-9-1.patch, PIG-2417-9-2.patch,
> PIG-2417-9.patch, PIG-2417-e2e.patch, streaming2.patch, streaming3.patch,
> streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in
> scripting languages with no JVM implementation or a limited JVM
> implementation. The initial proposal is outlined here:
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF
> from an embedded JVM UDF. I'd propose something like the following (although
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python')
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to
> the cluster which is responsible for reading the input stream, deserializing
> the input data, passing it to the user written script, serializing that
> script output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This
> class will likely share some of the existing code in POStream and
> ExecutableManager (where it makes sense to pull out shared code) to stream
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the
> POStream operator directly. This would involve inserting the POStream
> operator instead of the POUserFunc operator whenever we encountered a
> streaming UDF while building the physical plan. This approach seemed
> problematic because there would need to be a lot of changes in order to
> support POStream in all of the places we want to be able to use UDFs (for
> example, to operate on a single field inside of a foreach statement).
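
The controller script responsibilities listed in the description above (read
the input stream, deserialize, call the user code, serialize, write the output
stream) can be sketched roughly as follows. This is only an illustration under
stated assumptions: the real controller would be written in the scripting
language itself, and the class name, method names, and the tab-delimited,
one-tuple-per-line wire format here are all hypothetical rather than what the
patch uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical sketch of a controller loop; not the code from the patch.
class ControllerLoopSketch {
    // Assumed wire format: one tuple per line, fields separated by tabs.
    static void run(BufferedReader in, PrintWriter out,
                    Function<String[], String> userFunc) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {      // read the input stream
            String[] fields = line.split("\t", -1);   // deserialize the tuple
            String result = userFunc.apply(fields);   // pass it to user code
            out.print(result);                        // serialize the result
            out.print('\n');                          // one result per line
        }
        out.flush();                                  // write to the output
    }

    // Convenience wrapper that runs the loop over an in-memory string.
    static String runOnString(String input, Function<String[], String> userFunc) {
        StringWriter sink = new StringWriter();
        try {
            run(new BufferedReader(new StringReader(input)),
                new PrintWriter(sink), userFunc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // A stand-in "user UDF" that concatenates the first two fields.
        System.out.print(runOnString("a\tb\nc\td\n", f -> f[0] + f[1]));
    }
}
```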