[ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612937#action_12612937 ]

Santhosh Srinivasan commented on PIG-303:
-----------------------------------------

The conversion from any pig type to byte array is broken. 

The cast functionality is used in the following scenarios:

1. Cast bytes to appropriate pig types during load
2. Cast one pig type to another during execution
3. Cast pig types to appropriate storage representation during a store

Of these three scenarios, POCast is directly involved in the first two. POCast 
does not perform the third, but the third scenario still influences how POCast 
should behave.

Currently, POCast uses the load function to convert bytes to the appropriate 
pig type (Scenario 1). During pipeline execution, after the load, users can 
apply casts wherever they see fit. This includes converting a pig type (other 
than byte array) to a byte array and then converting that byte array to the 
same or a different pig type (Scenario 2). Consider the hypothetical use of 
casts below.

{code}

a = load 'myfile' as (t: tuple(i: int, f: float));

b = foreach a generate (bytearray) $0;

c = foreach b generate (tuple(int, int)) $0;
{code}

The tuple is first cast to a byte array and then cast back to a tuple, this 
time with a different schema. To support such casts, the byte array 
representation must retain enough information about the original type for the 
bytes to be decoded again. Knowledge of that representation is conceptually 
encapsulated in the load function, which already converts bytes to pig types. 
The inverse mechanism, converting pig types to bytes, therefore fits naturally 
in the load function as well. Pig can then use the load function's conversion 
and inverse hooks to convert bytes to pig types and back during pipeline 
execution (Scenario 2).
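To make the round trip concrete, here is a minimal Java sketch, assuming a load function whose byte format is plain text loosely modeled on PigStorage, with a flat tuple written as "(f1,f2,...)". The interface TupleBytesConverter, the classes TextTupleConverter and RoundTripDemo, and the method castToIntTuple are hypothetical. bytesToInteger follows the naming used in this comment, while bytesToTuple and the overloaded toBytes signatures are assumptions, not Pig's actual interface. In this format, the parentheses are the retained type information that lets bytesToTuple recognize the bytes as a tuple again.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical conversion hooks of a load function; not Pig's real interface.
interface TupleBytesConverter {
    Integer bytesToInteger(byte[] b);
    List<byte[]> bytesToTuple(byte[] b);   // fields come back as bytearrays
    byte[] toBytes(Integer i);
    byte[] toBytes(Float f);
    byte[] toBytes(List<?> tuple);
}

// Assumed text format loosely modeled on PigStorage: a flat tuple is "(f1,f2,...)".
// Nested tuples and field values containing ',' are not handled.
class TextTupleConverter implements TupleBytesConverter {
    private static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    private static String text(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    public Integer bytesToInteger(byte[] b) {
        String s = text(b).trim();
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return (int) Double.parseDouble(s); // "2.5" -> 2, like a float-to-int cast
        }
    }

    public List<byte[]> bytesToTuple(byte[] b) {
        String s = text(b).trim();
        if (!s.startsWith("(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a tuple: " + s);
        }
        List<byte[]> fields = new ArrayList<>();
        String body = s.substring(1, s.length() - 1);
        if (!body.isEmpty()) {
            for (String f : body.split(",", -1)) fields.add(utf8(f));
        }
        return fields;
    }

    public byte[] toBytes(Integer i) { return utf8(i.toString()); }
    public byte[] toBytes(Float f) { return utf8(f.toString()); }

    public byte[] toBytes(List<?> tuple) {
        StringBuilder sb = new StringBuilder("(");
        for (int k = 0; k < tuple.size(); k++) {
            if (k > 0) sb.append(',');
            sb.append(text(encodeField(tuple.get(k))));
        }
        return utf8(sb.append(')').toString());
    }

    // Dispatch each field to the matching scalar toBytes hook.
    private byte[] encodeField(Object field) {
        if (field instanceof Integer) return toBytes((Integer) field);
        if (field instanceof Float) return toBytes((Float) field);
        throw new IllegalArgumentException("unsupported field type: " + field);
    }
}

class RoundTripDemo {
    // Models (tuple(int, int)) applied to a bytearray: decode the tuple, then cast each field.
    static List<Integer> castToIntTuple(TupleBytesConverter conv, byte[] bytes) {
        List<Integer> out = new ArrayList<>();
        for (byte[] field : conv.bytesToTuple(bytes)) out.add(conv.bytesToInteger(field));
        return out;
    }

    public static void main(String[] args) {
        TupleBytesConverter conv = new TextTupleConverter();
        List<Object> t = Arrays.<Object>asList(1, 2.5f); // t: tuple(i: int, f: float)
        byte[] b = conv.toBytes(t);                       // (bytearray) $0
        System.out.println(castToIntTuple(conv, b));      // (tuple(int, int)) $0; prints [1, 2]
    }
}
```

Because the same object both writes and reads the bytes, the (bytearray) and (tuple(int, int)) casts agree on the format; pairing one load function's toBytes with another's bytesToTuple would carry no such guarantee.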

An obvious benefit of this approach: store functions that understand the byte 
representation of the data can convert those bytes into the storage format of 
their choice (Scenario 3).

Summary:

1. The load function interface will support toBytes for each pig type, in 
addition to bytesToInteger, bytesToLong, etc.
2. POCast will use the load function to convert bytes to pig types and vice 
versa.
3. PigStorage will be extended to support complex types (tuples, bags, maps) 
and to provide the inverse functions, i.e., convert pig types to their byte 
representation.
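Summary items 1 and 2 can be sketched as a cast operator that delegates every bytes-to-type and type-to-bytes conversion to the load function it was given. This is a minimal Java illustration, not POCast's actual code: LoadConversions, Utf8Conversions, PigType and CastSketch are hypothetical names, only integer and chararray are covered, and UTF-8 text is an assumed encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical subset of a load function's hooks (summary item 1).
interface LoadConversions {
    Integer bytesToInteger(byte[] b);
    String bytesToCharArray(byte[] b);
    byte[] toBytes(Integer i);
    byte[] toBytes(String s);
}

enum PigType { INTEGER, CHARARRAY, BYTEARRAY }

// Assumed UTF-8 text encoding; a real load function may use any format.
class Utf8Conversions implements LoadConversions {
    public Integer bytesToInteger(byte[] b) {
        return Integer.valueOf(new String(b, StandardCharsets.UTF_8).trim());
    }
    public String bytesToCharArray(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    public byte[] toBytes(Integer i) { return i.toString().getBytes(StandardCharsets.UTF_8); }
    public byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
}

// Sketch of summary item 2: the cast operator holds no encoding logic of its own
// and routes every bytes <-> pig type conversion through the load function.
class CastSketch {
    private final LoadConversions loadFunc;

    CastSketch(LoadConversions loadFunc) { this.loadFunc = loadFunc; }

    Object cast(Object value, PigType target) {
        if (value == null) return null;
        if (value instanceof byte[]) {              // bytes -> pig type
            byte[] b = (byte[]) value;
            switch (target) {
                case INTEGER:   return loadFunc.bytesToInteger(b);
                case CHARARRAY: return loadFunc.bytesToCharArray(b);
                case BYTEARRAY: return b;
            }
        }
        if (target == PigType.BYTEARRAY) {          // pig type -> bytes (the new inverse hooks)
            if (value instanceof Integer) return loadFunc.toBytes((Integer) value);
            if (value instanceof String) return loadFunc.toBytes((String) value);
        }
        // Casts between two non-bytearray types are outside this sketch.
        throw new IllegalArgumentException(
            "unsupported cast of " + value.getClass().getSimpleName() + " to " + target);
    }
}
```

Keeping the encoding out of the cast operator means a different load function can change the byte format without any change to the cast logic.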

> POCast does not cast chararray to bytearray
> -------------------------------------------
>
>                 Key: PIG-303
>                 URL: https://issues.apache.org/jira/browse/PIG-303
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> When chararray is cast to bytearray, query execution fails with a 
> ClassCastException. The problem is in the getNext(DataByteArray) method of 
> POCast.
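The issue text does not show the failing code, so the following is only a guess at the shape of the bug: a reference cast of a chararray value (a Java String) to a byte array type, which the JVM rejects with a ClassCastException at run time. ByteArrayValue is a hypothetical stand-in for Pig's DataByteArray, and buggyCast and fixedCast are illustrative names. The fix converts the string to bytes instead of casting it; UTF-8 is an assumed encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Pig's DataByteArray; not the real class.
final class ByteArrayValue {
    private final byte[] data;
    ByteArrayValue(byte[] data) { this.data = data.clone(); }
    byte[] get() { return data.clone(); }
}

class ChararrayToBytearray {
    // Assumed failure mode: the input is a String (chararray), so a plain
    // reference cast to the byte-array type throws ClassCastException.
    static ByteArrayValue buggyCast(Object input) {
        return (ByteArrayValue) input;
    }

    // Sketch of a fix: convert instead of casting. Under the proposal in this
    // comment, the encoding would come from the load function's toBytes hook.
    static ByteArrayValue fixedCast(Object input) {
        if (input == null) return null;
        if (input instanceof ByteArrayValue) return (ByteArrayValue) input;
        if (input instanceof String) {
            return new ByteArrayValue(((String) input).getBytes(StandardCharsets.UTF_8));
        }
        throw new IllegalArgumentException("no bytearray conversion for " + input.getClass().getName());
    }
}
```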

-- 
This message is automatically generated by JIRA.