[ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612937#action_12612937 ]
Santhosh Srinivasan commented on PIG-303:
-----------------------------------------
The conversion from any pig type to byte array is broken.
The cast functionality is used in the following scenarios:
1. Cast bytes to appropriate pig types during load
2. Cast one pig type to another during execution
3. Cast pig types to appropriate storage representation during a store
Out of these three scenarios, POCast plays a role in the first two. The third
scenario influences the behavior of POCast.
Currently, POCast uses the load function to convert bytes to the appropriate
pig type (scenario 1). After the load, users can apply casts anywhere in the
pipeline. This includes converting a pig type (other than byte array) to a
byte array and then converting that byte array to the same or a different pig
type (scenario 2). Consider the hypothetical use of casts below.
{code}
a = load 'myfile' as (t: tuple(i: int, f: float));
b = foreach a generate (bytearray) $0;
c = foreach b generate (tuple(int, int)) $0;
{code}
The tuple is first cast to a byte array and then cast back to a tuple. To
support this kind of cast, the byte array representation should retain
information about the original type it was cast from. This information is
conceptually encapsulated in the load function, which already knows how to
convert bytes to pig types. The inverse mechanism, converting pig types to
bytes, fits naturally in the load function as well. Pig can then use the
conversion and inversion hooks in the load function to convert bytes to pig
types and vice versa during pipeline execution (scenario 2).
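The round trip in the example above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class TupleRoundTrip, its method names, and the "(1,2.5)" text encoding are hypothetical and are not Pig's actual code. In this sketch the knowledge of the byte format lives in one place (standing in for the load function), and the field types come from the target of the cast.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the conversion hooks of a text-based load function.
class TupleRoundTrip {

    // Pig type -> bytes: encode the tuple as "(f1,f2,...)" in UTF-8 (assumed format).
    static byte[] toBytes(List<Object> tuple) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(tuple.get(i));
        }
        return sb.append(')').toString().getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> pig type: parse each field according to the cast's target types.
    static List<Object> bytesToTuple(byte[] bytes, List<Class<?>> fieldTypes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        String inner = s.substring(1, s.length() - 1);
        List<Object> out = new ArrayList<>();
        if (inner.isEmpty()) {
            return out; // empty tuple "()"
        }
        String[] parts = inner.split(",");
        for (int i = 0; i < parts.length; i++) {
            Class<?> type = fieldTypes.get(i);
            if (type == Integer.class) {
                out.add(Integer.valueOf(parts[i]));
            } else if (type == Float.class) {
                out.add(Float.valueOf(parts[i]));
            } else {
                out.add(parts[i]); // fall back to the raw text
            }
        }
        return out;
    }
}
```

Because both directions share one encoding, casting a tuple to bytes and back with the same field types returns an equal tuple. A cast to different field types would only succeed where the text of each field parses as the new type.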
The obvious benefit of this approach: store functions that understand the byte
representation of the data can now convert the bytes back into the format of
choice (scenario 3).
Summary:
1. Load function interface supports toBytes for each pig type in addition to
bytesToInteger, bytesToLong, etc.
2. POCast uses the load function to convert bytes to pig types and vice versa
3. PigStorage will be extended to support complex types (tuples, bags, maps)
and provide inverse functions, i.e., convert pig types to their byte representation.
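A minimal sketch of what point 1 could look like is given below. Only the names bytesToInteger, bytesToLong, and toBytes come from the summary; the interface name ByteConversions, the class TextConversions, the method bytesToCharArray, and the UTF-8 text encoding are assumptions made for illustration, not the actual load function interface or PigStorage code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical interface: conversion hooks plus their proposed inverses.
interface ByteConversions {
    Integer bytesToInteger(byte[] b);
    Long bytesToLong(byte[] b);
    String bytesToCharArray(byte[] b); // assumed name, not taken from the summary

    // Proposed inverse hooks, one overload per pig type.
    byte[] toBytes(Integer i);
    byte[] toBytes(Long l);
    byte[] toBytes(String s);
}

// Assumed text-based implementation: values are stored as UTF-8 strings.
class TextConversions implements ByteConversions {
    private static String text(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    private static byte[] bytes(Object o) {
        return String.valueOf(o).getBytes(StandardCharsets.UTF_8);
    }

    public Integer bytesToInteger(byte[] b) {
        return b == null ? null : Integer.valueOf(text(b).trim());
    }

    public Long bytesToLong(byte[] b) {
        return b == null ? null : Long.valueOf(text(b).trim());
    }

    public String bytesToCharArray(byte[] b) {
        return b == null ? null : text(b);
    }

    public byte[] toBytes(Integer i) {
        return i == null ? null : bytes(i);
    }

    public byte[] toBytes(Long l) {
        return l == null ? null : bytes(l);
    }

    public byte[] toBytes(String s) {
        return s == null ? null : bytes(s);
    }
}
```

With hooks like these, a cast operator could delegate a chararray-to-bytearray cast to toBytes(String) instead of relying on a Java class cast, and a store function could reuse the same methods to write values out.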
> POCast does not cast chararray to bytearray
> -------------------------------------------
>
> Key: PIG-303
> URL: https://issues.apache.org/jira/browse/PIG-303
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Santhosh Srinivasan
> Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
>
> When chararray is cast to bytearray, the query execution fails due to
> ClassCastException. The problem is inside the getNext(DataByteArray) code in
> POCast.