[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095606#comment-14095606
 ] 

Jonathan Ellis commented on CASSANDRA-6927:
---

Why add the Config.isClientMode check to isLocalDC?

 Create a CQL3 based bulk OutputFormat
 -

 Key: CASSANDRA-6927
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Paul Pak
Priority: Minor
  Labels: cql3, hadoop
 Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, 
 trunk-6927-v4.txt, trunk-6927-v5.txt, trunk-6927.txt


 This is the CQL compatible version of BulkOutputFormat.  CqlOutputFormat 
 exists, but doesn't write SSTables directly, similar to 
 ColumnFamilyOutputFormat for thrift.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-13 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095675#comment-14095675
 ] 

Paul Pak commented on CASSANDRA-6927:
-

[~jbellis] Ah, good catch. I had to think back a while, but if I recall 
correctly I was initially working off of 2.1.0-beta1, which set isLocalDC in 
the StreamRateLimiter constructor like this:
{code}
isLocalDC = DatabaseDescriptor.getLocalDataCenter().equals(
    DatabaseDescriptor.getEndpointSnitch().getDatacenter(peer));
{code}
which was throwing an NPE because DatabaseDescriptor.getLocalDataCenter() was 
returning null. At that time I mistakenly thought Config.isClientMode() was 
inversely related to whether a local dc was being used, so I used 
isClientMode() to short circuit and bypass the NPE. However, since then I see 
that the appropriate null checks were added and that 'is client mode' isn't 
related to 'is local dc' as I had originally thought. I'll remove that from the 
patch.
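
A null-safe comparison of the kind described above can be sketched as follows. This is only an illustration: the `isLocalDc` helper and its plain `String` arguments are hypothetical stand-ins for the values returned by `DatabaseDescriptor.getLocalDataCenter()` and the snitch, and it is not the actual check that was added to `StreamRateLimiter`.

```java
// Hypothetical stand-in for the data center comparison; not Cassandra code.
final class LocalDcCheck {
    // Returns false instead of throwing when the local data center is unknown (null).
    static boolean isLocalDc(String localDc, String peerDc) {
        return localDc != null && localDc.equals(peerDc);
    }

    public static void main(String[] args) {
        System.out.println(isLocalDc("dc1", "dc1"));  // same data center
        System.out.println(isLocalDc("dc1", "dc2"));  // different data center
        System.out.println(isLocalDc(null, "dc1"));   // local data center not configured
    }
}
```

With a guard like this, a missing local data center simply yields `false`, so no unrelated flag such as `isClientMode()` is needed to avoid the NPE.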



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092532#comment-14092532
 ] 

Piotr Kołaczkowski commented on CASSANDRA-6927:
---

Ok, doing the review today. Thanks for the reminder.

 Create a CQL3 based bulk OutputFormat
 -

 Key: CASSANDRA-6927
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Paul Pak
Priority: Minor
  Labels: cql3, hadoop
 Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, 
 trunk-6927-v4.txt, trunk-6927.txt




[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092559#comment-14092559
 ] 

Piotr Kołaczkowski commented on CASSANDRA-6927:
---

Patch does not apply to trunk:
{noformat}
$ git apply trunk-6927-v4.txt

trunk-6927-v4.txt:148: trailing whitespace.

trunk-6927-v4.txt:150: trailing whitespace.

trunk-6927-v4.txt:153: trailing whitespace.
protected final int bufferSize; 
trunk-6927-v4.txt:158: trailing whitespace.

trunk-6927-v4.txt:326: trailing whitespace.
} 
error: patch failed: src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:20
error: src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java: patch does not apply
error: patch failed: src/java/org/apache/cassandra/streaming/StreamManager.java:76
error: src/java/org/apache/cassandra/streaming/StreamManager.java: patch does not apply
{noformat}



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092590#comment-14092590
 ] 

Piotr Kołaczkowski commented on CASSANDRA-6927:
---

+1
I applied the patch using an IDE (it seems to have more robust algorithms for 
applying patches than the default in git) and all looks good. 
Please update the patch to make it mergeable and move it to testing.




[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-11 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092920#comment-14092920
 ] 

Paul Pak commented on CASSANDRA-6927:
-

[~pkolaczk] New patch (trunk-6927-v5.txt) attached. This was generated against 
the following commit, so I believe it should work without issues for you:

commit c7e191ba128841b3e67a168e0d0fb97ca2eed2dd
Merge: d2b24de 50ee3a7
Author: Aleksey Yeschenko alek...@apache.org
Date:   Mon Aug 11 17:33:47 2014 +0300

Merge branch 'cassandra-2.1' into trunk





[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-08-05 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086598#comment-14086598
 ] 

Brandon Williams commented on CASSANDRA-6927:
-

[~sixpak32577] Are you still working on this?



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-10 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057846#comment-14057846
 ] 

Paul Pak commented on CASSANDRA-6927:
-

I'm most likely going to move the 2 config setter/getter helper methods from 
CqlConfigHelper to CqlBulkOutputFormat. I think it'll make it clearer to the 
user that these configs are to be used with the CqlBulkOutputFormat, similar to 
the static config helper methods on other Hadoop OutputFormats (e.g. 
FileOutputFormat).

 Create a CQL3 based bulk OutputFormat
 -

 Key: CASSANDRA-6927
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Paul Pak
Priority: Minor
  Labels: cql3, hadoop
 Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927.txt




[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-09 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056404#comment-14056404
 ] 

Paul Pak commented on CASSANDRA-6927:
-

[~alexliu68] Hm, I'm not sure I like that approach. It's more complex than it 
needs to be and confusing for users. For one, the user would also need to know 
to set the cassandra.columnfamily.multipleoutputs property when using 
MultipleOutputs. Simply using MultipleOutputs should be enough indication of 
that without having to set a property saying so. Additionally, if I saw both 
CqlConfigHelper.setColumnFamilySchema(Configuration conf, String columnFamily) 
AND CqlConfigHelper.setColumnFamilySchema(Configuration conf) methods, I would 
be confused as to which one I was supposed to use. And using the wrong one 
would most likely lead to errors. As it currently stands, there is no ambiguity 
and it's not as if specifying the columnFamily is any extra burden for the 
user. The columnFamily for a particular schema and insert statement should be 
well known... it's part of the script itself.



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-09 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056472#comment-14056472
 ] 

Alex Liu commented on CASSANDRA-6927:
-

@paul pak The default use case is a single output, and the format of 
Hadoop properties is not specific to a column family by default. Though the 
solution is not perfect, it does preserve the default format for single output. 
By default, you don't need to set cassandra.columnfamily.multipleoutputs.

Some validation could be added to getColumnFamilySchema(Configuration 
conf, String columnFamily) and getColumnFamilySchema(Configuration conf), so 
that if the user calls the wrong variant of 
CqlConfigHelper.setColumnFamilySchema(Configuration conf, String columnFamily) 
or CqlConfigHelper.setColumnFamilySchema(Configuration conf), an exception 
is thrown.

Other output formats don't support MultipleOutputs yet, so this is fairly new. 
Either solution works; adding cassandra.columnfamily.multipleoutputs just keeps 
the default property format in sync with the other output formats.






[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-09 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056557#comment-14056557
 ] 

Paul Pak commented on CASSANDRA-6927:
-

[~alexliu68] Hi Alex, thanks for your input. The fact that Hadoop properties 
aren't naturally specific to a column family is precisely the reason for not 
having generic schema/insertStatement properties and expecting them to apply to 
a particular column family, even if you happen to be working with only one 
column family. If some property value only applies to a specific column family, 
why not indicate it as such in the property key? It's certainly clearer and 
safer.

Also, what would be the benefit of having overloaded set/getColumnFamily* 
methods? They require additional validations to ensure the proper ones were 
used for the appropriate scenario, as opposed to having unambiguous ones that 
don't require any validation and work in all cases. The only possible benefit I 
can see is if there was a case where a column family was either unknown or not 
applicable, but that will never be the case with these schema/insertStatements 
properties.

In general, I prefer an approach where one solution works in all scenarios over 
one that entails variations of settings/methods that apply differently in 
different scenarios. It adds unnecessary complexity without any benefits and 
is prone to user confusion, misuse, and error.



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054786#comment-14054786
 ] 

Piotr Kołaczkowski commented on CASSANDRA-6927:
---

CqlOutputFormat, L31, L33:
{noformat}OutputFormat that allows reduce tasks insert the binded variable 
values{noformat}
binded -> bound

org.apache.cassandra.hadoop.AbstractBulkRecordWriter.ExternalClient#init,
org.apache.cassandra.hadoop.AbstractBulkRecordWriter.ExternalClient#createThriftClient:
They duplicate quite a lot of code found in 
org.apache.cassandra.hadoop.ConfigHelper#getClientFromOutputAddressList. 
Additionally, the ExternalClient code hardcodes a reference to FramedTransport 
and doesn't use the configured ITransportFactory (extension point used by e.g. 
DSE to plugin kerberos authentication). I know this code was there before (it 
was moved from BulkRecordWriter), but maybe this is a good occasion to clean it 
up a little?

org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter#prepareWriter, L88:
Unchecked result of mkdirs.
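
One way to check the result of `mkdirs` is sketched below. The `ensureDirectory` helper and the directory names are hypothetical and are not taken from the patch; the sketch only shows the general pattern of failing with a descriptive message.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper; not part of CqlBulkRecordWriter.
final class EnsureDirectory {
    // Creates the directory (and any missing parents) or throws if that is not possible.
    static void ensureDirectory(File dir) throws IOException {
        // mkdirs() returns false both on failure and when the directory already exists,
        // so the isDirectory() check distinguishes the two cases.
        if (!dir.mkdirs() && !dir.isDirectory())
            throw new IOException("Could not create output directory: " + dir);
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("bulk-output").toFile();
        File out = new File(base, "myKeyspace/myColumnFamily");
        ensureDirectory(out);
        System.out.println(out.isDirectory());
    }
}
```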

org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java:89:
getColumnFamilySchema result is not checked; might return null and cause NPE 
instead of descriptive message.

org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java:96
org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter#getColumnFamilyInsertStatement
 - as above, needs proper null check; 

I think those two should be checked in checkOutputSpecs at the level of 
CqlBulkOutputFormat (needs override); better fail early.
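
The fail-early idea can be sketched roughly as follows. To stay self-contained, a plain `Map` stands in for Hadoop's `Configuration`, and the property prefixes and the `checkOutputSpecs` signature shown here are assumptions for illustration, not the names used by the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Hadoop's Configuration.
final class OutputSpecCheck {
    // Assumed property prefixes; the real key names are defined by the patch.
    static final String SCHEMA_KEY = "example.columnfamily.schema";
    static final String INSERT_KEY = "example.columnfamily.insert.statement";

    // Validates both required properties up front, before any record is written.
    static void checkOutputSpecs(Map<String, String> conf, String columnFamily) {
        require(conf, SCHEMA_KEY + "." + columnFamily);
        require(conf, INSERT_KEY + "." + columnFamily);
    }

    private static void require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null || value.trim().isEmpty())
            throw new IllegalArgumentException("Missing required property: " + key);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SCHEMA_KEY + ".users", "CREATE TABLE ks.users (id int PRIMARY KEY)");
        try {
            checkOutputSpecs(conf, "users"); // insert statement is missing
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking at job submission time means a misconfigured job is rejected with a descriptive message instead of failing later with an NPE inside a task.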

org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter#write doesn't invoke 
Hadoop progress method, as BulkRecordWriter does:
{noformat}
if (null != progress)
   progress.progress();
if (null != context)
   HadoopCompat.progress(context);
{noformat}

org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java:144
{noformat}
  throw new IOException("Error adding row", e);
{noformat}
Maybe logging the key in the error message would be useful here? 
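
Including the key in the message might look like the sketch below. The `addRow` method is a hypothetical stand-in that always fails, used only to exercise the error path; the real writer call and key type are not shown in this thread.

```java
import java.io.IOException;

// Hypothetical illustration of a more descriptive error message.
final class RowErrorExample {
    // Stand-in for the underlying writer call; always fails for demonstration.
    static void addRow(Object key) {
        throw new IllegalStateException("simulated failure");
    }

    static void write(Object key) throws IOException {
        try {
            addRow(key);
        } catch (RuntimeException e) {
            // Keep the original exception as the cause and name the offending key.
            throw new IOException("Error adding row with key: " + key, e);
        }
    }

    public static void main(String[] args) {
        try {
            write("user-42");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```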






 Create a CQL3 based bulk OutputFormat
 -

 Key: CASSANDRA-6927
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Paul Pak
Priority: Minor
  Labels: cql3, hadoop
 Attachments: 6927-2.0-branch-v2.txt, trunk-6927.txt




[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-08 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055656#comment-14055656
 ] 

Paul Pak commented on CASSANDRA-6927:
-

[~pkolaczk] Above issues have been addressed in trunk-6927-v3.txt. I reverted a 
change from [~alexliu68] in CqlBulkRecordWriter retrieving the schema and 
insertStatement and refactored it. The reason why I couldn't have a generic 
config for the schema and insertStatement, but needed columnFamily-specific 
ones is to accommodate writing to multiple columnFamilies using MultipleOutputs.



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-08 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055678#comment-14055678
 ] 

Paul Pak commented on CASSANDRA-6927:
-

By doing:
{code:java}
MultipleOutputs.addNamedOutput(job, myColumnFamily,
    CqlBulkOutputFormat.class, Object.class, List.class);
CqlConfigHelper.setColumnFamilySchema(conf, myColumnFamily,
    "CREATE TABLE myKeyspace.myColumnFamily ...");
CqlConfigHelper.setColumnFamilyInsertStatement(conf, myColumnFamily,
    "UPDATE myKeyspace.myColumnFamily SET ");
{code}
you'll be able to write to multiple columnFamilies by doing:
{code:java}
MultipleOutputs multiOutputs = ...
multiOutputs.write(myColumnFamily, null, values);
{code}
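
The reason column-family-specific keys allow several outputs to coexist can be sketched as follows. A plain `Map` stands in for Hadoop's `Configuration`, and the `example.columnfamily.schema` prefix is an assumption for illustration; only the pattern of appending "." and the column family name to the key is taken from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-column-family property keys; not the CqlConfigHelper code.
final class PerColumnFamilyConfig {
    // Assumed prefix; the real constant value is not shown in this thread.
    static final String SCHEMA_PREFIX = "example.columnfamily.schema";

    static void setColumnFamilySchema(Map<String, String> conf, String columnFamily, String schema) {
        conf.put(SCHEMA_PREFIX + "." + columnFamily, schema);
    }

    static String getColumnFamilySchema(Map<String, String> conf, String columnFamily) {
        return conf.get(SCHEMA_PREFIX + "." + columnFamily);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setColumnFamilySchema(conf, "users", "CREATE TABLE ks.users (id int PRIMARY KEY)");
        setColumnFamilySchema(conf, "events", "CREATE TABLE ks.events (id int PRIMARY KEY)");
        // Each column family resolves to its own schema; neither overwrites the other.
        System.out.println(getColumnFamilySchema(conf, "users"));
        System.out.println(getColumnFamilySchema(conf, "events"));
    }
}
```

With a single generic key, the second call would overwrite the first, which is why one job could not write to two column families.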



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-07-08 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055822#comment-14055822
 ] 

Alex Liu commented on CASSANDRA-6927:
-

[~sixpak32577] You can create a boolean property, 
cassandra.columnfamily.multipleoutputs, in CqlConfigHelper, so that if it's set to 
true, getColumnFamilySchema(Configuration conf, String columnFamily) is used; 
otherwise getColumnFamilySchema(Configuration conf) is used. By default, it's set 
to false.



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-06-18 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036616#comment-14036616
 ] 

Alex Liu commented on CASSANDRA-6927:
-

CASSANDRA-7412 has Pig support for the CQL-based bulk output format.



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-06-17 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034314#comment-14034314
 ] 

Alex Liu commented on CASSANDRA-6927:
-

[~sixpak32577] I will take your code and make some interface changes.

 Create a CQL3 based bulk OutputFormat
 -

 Key: CASSANDRA-6927
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Paul Pak
Priority: Minor
  Labels: cql3, hadoop
 Attachments: trunk-6927.txt




[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-06-17 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034378#comment-14034378
 ] 

Paul Pak commented on CASSANDRA-6927:
-

@alexliu68 What type of interface changes did you have in mind?



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-06-17 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034441#comment-14034441
 ] 

Alex Liu commented on CASSANDRA-6927:
-

v2 on the cassandra-2.0 branch is attached. It changes the interface from 
OutputFormat<ByteBuffer, List<Mutation>> to OutputFormat<Object, List<Mutation>>, 
where Object is a placeholder for an object that is not used internally, so 
it can just be null. List<Mutation> holds the bound variable values for the 
insert statement.

I also removed the "." + columnFamily in
{code}
private String getColumnFamilySchema()
{
    return conf.get(COLUMNFAMILY_SCHEMA + "." + columnFamily);
}

private String getColumnFamilyInsertStatement()
{
    return conf.get(COLUMNFAMILY_INSERT_STATEMENT + "." + columnFamily);
}
{code}

and did some code format cleanup.



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-04-29 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984588#comment-13984588
 ] 

Paul Pak commented on CASSANDRA-6927:
-

[~pkolaczk] Hi Piotr, any idea on when this may get reviewed?



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-03-27 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949386#comment-13949386
 ] 

Paul Pak commented on CASSANDRA-6927:
-

Initial thought is to use the CQLSSTableWriter to write the SSTable files, and 
then use a variation of the SSTableLoader to do the load.  The only main 
difference I see with the new table loader compared to SSTableLoader would be 
in the way that CFMetaData is determined, since the thrift call for the 
keyspace ignores CQL3 tables.  Any feedback on this approach would be welcome.

 Create a CQL3 based bulk OutputFormat
 -

 Key: CASSANDRA-6927
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Paul Pak
Priority: Minor
  Labels: cql3, hadoop



[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat

2014-03-27 Thread Paul Pak (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949398#comment-13949398
 ] 

Paul Pak commented on CASSANDRA-6927:
-

One aspect that doesn't line up perfectly is that CQLSSTableWriter.addRow() and 
.rawAddRow() methods simply take the column values or name-value pairs as 
parameters into the stored procedure, while Hadoop's RecordWriter.write() 
method separates its parameters by keys and values.  My plan is to have the new 
writer typed with <List<ByteBuffer>, List<ByteBuffer>>, and when the 
.write(List<ByteBuffer>, List<ByteBuffer>) method internally calls 
CQLSSTableWriter.rawAddRow(List<ByteBuffer>), just append the values list to 
the keys list.
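
The planned merge of keys and values can be sketched in isolation as follows. The `mergeForRawAddRow` helper is hypothetical and only shows the list concatenation; the call into `CQLSSTableWriter.rawAddRow` itself is left out so that the example runs without Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing how write(keys, values) could build a single row.
final class KeyValueMerge {
    // Returns a new list with the key columns first, followed by the value columns.
    static List<ByteBuffer> mergeForRawAddRow(List<ByteBuffer> keys, List<ByteBuffer> values) {
        List<ByteBuffer> row = new ArrayList<>(keys.size() + values.size());
        row.addAll(keys);
        row.addAll(values);
        return row;
    }

    private static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<ByteBuffer> keys = Arrays.asList(bytes("id-1"));
        List<ByteBuffer> values = Arrays.asList(bytes("alice"), bytes("alice@example.com"));
        List<ByteBuffer> row = mergeForRawAddRow(keys, values);
        System.out.println(row.size()); // 3 columns: one key followed by two values
    }
}
```

For this to work, the column order in the insert statement would have to list the key columns first, in the same order as the keys list.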
