[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-09-27 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: trunk-4208-v3.txt

I've attached the new patch (v3) rebased against trunk.

 ColumnFamilyOutputFormat should support writing to multiple column families
 ---

 Key: CASSANDRA-4208
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4208
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Robbie Strickland
 Attachments: cassandra-1.1-4208.txt, cassandra-1.1-4208-v2.txt, 
 cassandra-1.1-4208-v3.txt, cassandra-1.1-4208-v4.txt, trunk-4208.txt, 
 trunk-4208-v2.txt, trunk-4208-v3.txt


 It is not currently possible to output records to more than one column family 
 in a single reducer.  Considering that writing values to Cassandra often 
 involves multiple column families (e.g. updating your index when you insert a 
 new value), this seems overly restrictive.  I am submitting a patch that 
 moves the specification of column family from the job configuration to the 
 write() call in ColumnFamilyRecordWriter.
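
 To illustrate the idea in the description above, here is a minimal sketch 
 of a writer whose write() call names the target column family, so a single 
 reducer can update a data row and its index entry together.  The class 
 MultiFamilyWriterSketch, its methods, and the column family names are 
 hypothetical stand-ins written in plain Java; this is not the actual 
 ColumnFamilyRecordWriter API and not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record writer; NOT the real ColumnFamilyRecordWriter.
class MultiFamilyWriterSketch {
    // Rows buffered per column family, in first-seen order.
    private final Map<String, List<String>> rowsByFamily = new LinkedHashMap<>();

    // The target column family is an argument of each write,
    // rather than a single job-wide configuration value.
    void write(String columnFamily, String rowKey, String value) {
        rowsByFamily.computeIfAbsent(columnFamily, cf -> new ArrayList<>())
                    .add(rowKey + "=" + value);
    }

    // Returns the buffered rows for one column family (empty if none were written).
    List<String> rowsFor(String columnFamily) {
        return rowsByFamily.getOrDefault(columnFamily, new ArrayList<>());
    }

    public static void main(String[] args) {
        MultiFamilyWriterSketch writer = new MultiFamilyWriterSketch();
        // One reducer invocation writes the value and its index entry.
        writer.write("Users", "user42", "alice@example.com");
        writer.write("UsersByEmail", "alice@example.com", "user42");
        System.out.println(writer.rowsFor("Users"));
        System.out.println(writer.rowsFor("UsersByEmail"));
    }
}
```

 With a job-wide column family setting, the second write() above would need 
 a separate job; passing the column family per call removes that limit.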

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-09-21 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: cassandra-1.1-4208-v4.txt

I've attached a new patch that removes the check for a null output CF on 
BulkOutputFormat.  This allows BOF to use the MultipleOutputs API.



[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-10 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: cassandra-1.1-4208-v2.txt

I've attached a patch that adds a setOutputColumnFamily() overload that takes 
both keyspace and CF.  The one outstanding issue, which I've commented on in 
CFOF, is that checkOutputSpecs() cannot currently ensure that a CF has been 
specified either through setOutputColumnFamily() or MultipleOutputs.

Unfortunately MultipleOutputs.getNamedOutputsList()--which would be the right 
way to do this--is currently private.  So we either don't do the check and let 
it throw an NPE at runtime, or we duplicate the code in MultipleOutputs to grab 
the values from config ourselves.  Not sure which is the lesser of two evils. 
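
As a rough illustration of the second option (reading the values from the 
config ourselves), the check could look something like the sketch below.  
It uses a plain Map in place of the Hadoop Configuration, and both key names 
are placeholders assumed for this example; the real keys would have to be 
taken from the Cassandra and MultipleOutputs sources.  The assumption that 
named outputs are stored as a whitespace-separated list is also only for 
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an output-spec check; not the code from the patch.
class OutputSpecCheckSketch {
    // Placeholder key names, assumed for illustration only.
    static final String OUTPUT_CF_KEY = "example.output.columnfamily";
    static final String NAMED_OUTPUTS_KEY = "example.namedoutputs";

    // Fails fast at job setup instead of letting an NPE surface at runtime.
    static void checkOutputSpecs(Map<String, String> conf) {
        String cf = conf.get(OUTPUT_CF_KEY);
        // Assumed format: named outputs as a whitespace-separated list.
        String named = conf.getOrDefault(NAMED_OUTPUTS_KEY, "").trim();
        boolean hasExplicitCf = cf != null && !cf.isEmpty();
        boolean hasNamedOutputs = !named.isEmpty();
        if (!hasExplicitCf && !hasNamedOutputs) {
            throw new IllegalStateException(
                "No output column family set, and no named outputs configured");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NAMED_OUTPUTS_KEY, "Users UsersByEmail");
        checkOutputSpecs(conf); // passes: named outputs are present
        System.out.println("output specs ok");
    }
}
```

The trade-off is the one noted above: this duplicates knowledge of how 
MultipleOutputs stores its settings, in exchange for an early, readable 
error.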





[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-04 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: cassandra-1.1-4208.txt

It appears I was mistaken about the MultipleOutputs issue being resolved only 
in trunk.  It's resolved in the mapred package in trunk, but the new version in 
mapreduce dates back to at least 1.0.1.  It still references FileOutputFormat, 
but the attached patch gets around this by using the same config key.  I have 
attached a new patch based on Cassandra 1.1 and Hadoop 1.0.2.  The changes are 
actually minimal.  Let me know your thoughts...





[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-03 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: trunk-4208-v2.txt

I've added a patch to allow support for MultipleOutputs. Hadoop trunk now 
contains a new version of MultipleOutputs that should support this out of the 
box, although I am submitting a patch to deal with an inconsistency that could 
cause future issues with non-file formats.

The basic solution involves changing the config key for output CF to match the 
basename key being written by MultipleOutputs. I had to make related changes 
to CassandraStorage and TestRingCache, as well as some minor changes to 
ColumnFamilyInputFormat to account for some interface changes in Hadoop trunk.
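
The key-matching idea can be sketched roughly as follows.  This is a 
plain-Java model with a Map standing in for the Hadoop Configuration; the 
class, the method names, and the key name "example.output.basename" are all 
hypothetical, and the real key is whichever one MultipleOutputs writes the 
base name to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of sharing one config key; not the code from the patch.
class SharedKeySketch {
    // Placeholder key name, assumed for illustration only.
    static final String BASE_NAME_KEY = "example.output.basename";

    // Stand-in for what MultipleOutputs does per named output:
    // copy the job config and record the output's name under the basename key.
    static Map<String, String> contextFor(Map<String, String> jobConf, String namedOutput) {
        Map<String, String> taskConf = new HashMap<>(jobConf);
        taskConf.put(BASE_NAME_KEY, namedOutput);
        return taskConf;
    }

    // Stand-in for the output format: it reads its target column family
    // from the same key, so each named output maps to one column family.
    static String outputColumnFamily(Map<String, String> conf) {
        return conf.get(BASE_NAME_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        System.out.println(outputColumnFamily(contextFor(jobConf, "Users")));
        System.out.println(outputColumnFamily(contextFor(jobConf, "UsersByEmail")));
    }
}
```

Because both sides agree on the key, no Cassandra-specific hook is needed in 
this model: naming an output after a column family is enough to route 
records to it.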

So the bottom line is this will work if people use Hadoop and Cassandra trunk 
with both patches applied. The original patch can be used as a temporary 
solution if needed.





[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-01 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: trunk-4208.txt





[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-01 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Comment: was deleted

(was: I created an issue regarding the specificity of MultipleOutputs to 
FileOutputFormat. Linked here as an FYI.)
