[ 
https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496547#comment-14496547
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7410:
-----------------------------------------------

org/apache/cassandra/hadoop/pig/CqlNativeStorage.java:149
{noformat}
        if (t.getType(0) == DataType.TUPLE)
        {
            if (bulkOutputFormat)
            {
                cqlQueryFromTuple(null, t, 0);
            }
            else if (t.getType(1) == DataType.TUPLE)
            {
                Map<String, ByteBuffer> key = tupleToKeyMap((Tuple)t.get(0));
                cqlQueryFromTuple(key, t, 1);
            }
            else
                throw new IOException("Second argument in output must be a tuple");
        }
        else
            throw new IOException("First argument in output must be a tuple");
{noformat}

Personally, I don't like this style of input validation.
It is much better to validate the input up front, in a flat sequence of checks:

{noformat}
if (t.getType(0) != DataType.TUPLE)
    throw ....
if (t.getType(1) != DataType.TUPLE)
    throw ....

// now we know input is ok, so we can focus on doing real stuff
{noformat}

Moreover, {{cqlQueryFromTuple}} does the same validation again...
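
To make the flat style concrete, here is a minimal, self-contained sketch. It is not the actual patch: {{FieldType}} and {{route}} are hypothetical stand-ins for Pig's {{DataType}} constants and for the calls to {{cqlQueryFromTuple}}, so that the example compiles without Pig or Cassandra on the classpath. It assumes, as in the quoted code, that the second field only has to be a tuple when bulkOutputFormat is off.

```java
import java.io.IOException;
import java.util.List;

class FlatValidationSketch {
    // Hypothetical stand-in for Pig's DataType constants.
    enum FieldType { TUPLE, CHARARRAY }

    // Validates the field types first, then picks the branch.
    // Returns a label instead of building a CQL query, to keep the sketch standalone.
    static String route(List<FieldType> fieldTypes, boolean bulkOutputFormat) throws IOException {
        if (fieldTypes.get(0) != FieldType.TUPLE)
            throw new IOException("First argument in output must be a tuple");
        // The second field is only inspected when bulk output is disabled.
        if (!bulkOutputFormat && fieldTypes.get(1) != FieldType.TUPLE)
            throw new IOException("Second argument in output must be a tuple");

        // From here on the input is known to be valid.
        return bulkOutputFormat ? "bulk" : "keyed";
    }
}
```

With the checks at the top, the remaining logic is a plain two-way branch with no error handling interleaved.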

------------

org.apache.cassandra.hadoop.pig.CqlNativeStorage#setStoreLocation:

This method is a copy-paste from
org.apache.cassandra.hadoop.pig.CqlStorage#setStoreLocation
with only a minor section related to bulkOutputFormat added.

Any reason for not using super.setStoreLocation()?
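
What I have in mind is roughly the following. This is only a sketch with hypothetical classes ({{BaseStorageSketch}}, {{NativeStorageSketch}}) and made-up configuration keys, not the real CqlStorage code, and it assumes the bulkOutputFormat section can run after the inherited logic has finished.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parent storage class.
class BaseStorageSketch {
    protected final Map<String, String> conf = new HashMap<>();

    public void setStoreLocation(String location) {
        // Shared setup that the subclass should not have to duplicate.
        conf.put("output.location", location);
    }
}

// Hypothetical stand-in for the subclass that adds bulk output support.
class NativeStorageSketch extends BaseStorageSketch {
    private final boolean bulkOutputFormat;

    NativeStorageSketch(boolean bulkOutputFormat) {
        this.bulkOutputFormat = bulkOutputFormat;
    }

    @Override
    public void setStoreLocation(String location) {
        // Reuse the inherited logic instead of copying it.
        super.setStoreLocation(location);
        // Only the part that is specific to this subclass lives here.
        if (bulkOutputFormat)
            conf.put("output.format", "bulk");
    }
}
```

If the bulk-specific section has to run in the middle of the inherited method, a small protected hook in the parent class would serve the same purpose without the copy.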

------------

org/apache/cassandra/io/sstable/CQLSSTableWriter.java:
{noformat}
               try
               {
                   Schema.instance.load(ksm);
               }
               catch (Exception e)
               {
                   // It may get an exception of "Attempting to load already loaded column family"
               }
{noformat}
OK, I get it, but what if the load fails on the first attempt? The user is
never told that something went wrong, or why.
Also, can you elaborate on why it may need to load the schema multiple times?
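
One possible shape for this, as a sketch only: skip the load when the keyspace is already known, and let a genuine failure propagate with context. {{SchemaLoadSketch}}, {{loadOnce}} and the {{Runnable}} loader are hypothetical; they stand in for the real {{Schema.instance.load(ksm)}} call and say nothing about how Schema tracks loaded keyspaces.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SchemaLoadSketch {
    // Names of keyspaces this writer has already loaded (hypothetical bookkeeping).
    private static final Set<String> LOADED = ConcurrentHashMap.newKeySet();

    // Runs the loader at most once per keyspace.
    // Returns true if the loader ran, false if the keyspace was already loaded.
    static boolean loadOnce(String keyspace, Runnable loader) {
        if (!LOADED.add(keyspace))
            return false; // already loaded, nothing to do and nothing to swallow

        try {
            loader.run();
            return true;
        } catch (RuntimeException e) {
            // A real failure: forget the keyspace so a retry is possible, and tell the caller why.
            LOADED.remove(keyspace);
            throw new IllegalStateException("Failed to load schema for keyspace " + keyspace, e);
        }
    }
}
```

This way the "already loaded" case is handled explicitly, and any other exception reaches the user instead of disappearing in an empty catch block.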


> Pig support for BulkOutputFormat as a parameter in url
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7410
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>            Priority: Minor
>             Fix For: 2.0.15
>
>         Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 
> 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, 
> CASSANDRA-7410-v2-2.1-branch.txt
>
>
> Add BulkOutputFormat support in Pig url



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
