[ 
https://issues.apache.org/jira/browse/CASSANDRA-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189334#comment-13189334
 ] 

Brandon Williams commented on CASSANDRA-3371:
---------------------------------------------

bq. 1. Fix schema so that this ticket's problem is resolved

v4 does this; however, it's not quite all of what we want.

bq. 2. have the default return value from CassandraStorage be (key, column, 
value) as is thought of for transposing wide rows

After thinking about this more, that's the wrong approach. If you DO want to 
work within the row, you would have to do an expensive group to get back what 
we had before (a nested structure), whereas breaking that structure up into 
(k, c, v) is extremely cheap if that's what you want.  So ultimately, we need 
to stick with a bag for spillage, and thus keep the existing schema.  v4 does 
this.
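
As a rough illustration of that cost asymmetry, here is a minimal sketch in 
plain Java. It does not use Pig or CassandraStorage classes; the class name 
WideRowSketch and the use of String[] pairs in place of tuples are assumptions 
made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java collections stand in for Pig tuples and bags.
final class WideRowSketch {
    // Nested form: one row key plus a "bag" of (column, value) pairs.
    // Flattening is a single pass that emits one (key, column, value)
    // triple per pair, with no need to look at any other row.
    static List<String[]> flatten(String key, List<String[]> columns) {
        List<String[]> triples = new ArrayList<>();
        for (String[] col : columns) {
            triples.add(new String[] {key, col[0], col[1]});
        }
        return triples;
    }

    // Going the other way needs a group by key over all triples before any
    // row is complete; in a distributed job that typically means a shuffle.
    static Map<String, List<String[]>> regroup(List<String[]> triples) {
        Map<String, List<String[]>> rows = new LinkedHashMap<>();
        for (String[] t : triples) {
            rows.computeIfAbsent(t[0], k -> new ArrayList<>())
                .add(new String[] {t[1], t[2]});
        }
        return rows;
    }
}
```

Starting from the nested form keeps both options open: flatten when you want 
(k, c, v), or work on the bag directly when you want the whole row.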

v4 also names the *values* of indexed/validated columns after the column name, 
which is more pygmalion-style, since you'll always want to filter on the value, 
not the name.

The problem, however, is that we get strange parsing errors again:

{noformat}
ERROR 1200: Pig script failed to parse: 
<file foo.pig, line 3, column 7> pig script failed to validate: 
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find 
field owner_id in :tuple(name:chararray,owner_id:chararray)
{noformat}

This seems related to the fact that, schema-wise, a bag can only contain a 
single tuple, but that tuple can then contain any number of items.  Apparently 
this is only a hard requirement in 0.9 or later, but I tested it up to trunk, 
so it doesn't look like it's going anywhere.

In practice, however, getNext doesn't actually return this 'container' tuple.  
If you do return it, you get casting errors.
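
To make the shape mismatch concrete, here is a small sketch using the numbers 
from the issue description quoted below (a declared inner tuple of 16 fields 
versus 7 returned tuples of 2 values each). It is plain Java with lists 
standing in for tuples and bags; BagShapeSketch and its methods are 
hypothetical names, not part of Pig or CassandraStorage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: lists stand in for Pig tuples and bags; no Pig classes are used.
final class BagShapeSketch {
    // A bag schema carries one tuple schema that every tuple in the bag is
    // expected to match. Count the tuples whose arity differs from it.
    static int arityMismatches(int declaredFields, List<List<String>> bag) {
        int mismatches = 0;
        for (List<String> tuple : bag) {
            if (tuple.size() != declaredFields) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Build a bag of `tuples` tuples, each holding `arity` placeholder fields.
    static List<List<String>> bagOf(int tuples, int arity) {
        List<List<String>> bag = new ArrayList<>();
        for (int i = 0; i < tuples; i++) {
            bag.add(Collections.nCopies(arity, "x"));
        }
        return bag;
    }
}
```

With a declared arity of 16 and a bag built by bagOf(7, 2), every one of the 7 
tuples is counted as a mismatch; with a declared arity of 2, none are.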

I'm not really sure how we can fix this, and other examples of LoadMetadata 
implemented with bags are hard to come by.


                
> Cassandra inferred schema and actual data don't match
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.8.7
>            Reporter: Pete Warden
>            Assignee: Brandon Williams
>         Attachments: 3371-v2.txt, 3371-v3.txt, 3371-v4.txt, pig.diff
>
>
> It's looking like there may be a mismatch between the schema that's being 
> reported by the latest CassandraStorage.java, and the data that's actually 
> returned. Here's an example:
> rows = LOAD 'cassandra://Frap/PhotoVotes' USING CassandraStorage();
> DESCRIBE rows;
> rows: {key: chararray,columns: {(name: chararray,value: 
> bytearray,photo_owner: chararray,value_photo_owner: bytearray,pid: 
> chararray,value_pid: bytearray,matched_string: 
> chararray,value_matched_string: bytearray,src_big: chararray,value_src_big: 
> bytearray,time: chararray,value_time: bytearray,vote_type: 
> chararray,value_vote_type: bytearray,voter: chararray,value_voter: 
> bytearray)}}
> DUMP rows;
> (691831038_1317937188.48955,{(photo_owner,1596090180),(pid,6855155124568798560),(matched_string,),(src_big,),(time,Thu
>  Oct 06 14:39:48 -0700 2011),(vote_type,album_dislike),(voter,691831038)})
> getSchema() is reporting the columns as an inner bag of tuples, each of which 
> contains 16 values. In fact, getNext() seems to return an inner bag 
> containing 7 tuples, each of which contains two values. 
> It appears that things got out of sync with this change:
> http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java?r1=1177083&r2=1177082&pathrev=1177083
> See more discussion at:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/pig-cassandra-problem-quot-Incompatible-field-schema-quot-error-tc6882703.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira