[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-09-27 Thread Robbie Strickland (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: trunk-4208-v3.txt

I've attached the new patch (v3) rebased against trunk.

 ColumnFamilyOutputFormat should support writing to multiple column families
 ---

 Key: CASSANDRA-4208
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4208
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Robbie Strickland
 Attachments: cassandra-1.1-4208.txt, cassandra-1.1-4208-v2.txt, 
 cassandra-1.1-4208-v3.txt, cassandra-1.1-4208-v4.txt, trunk-4208.txt, 
 trunk-4208-v2.txt, trunk-4208-v3.txt


 It is not currently possible to output records to more than one column family 
 in a single reducer.  Considering that writing values to Cassandra often 
 involves multiple column families (e.g. updating your index when you insert a 
 new value), this seems overly restrictive.  I am submitting a patch that 
 moves the specification of column family from the job configuration to the 
 write() call in ColumnFamilyRecordWriter.
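
 The idea described above can be sketched in plain Java with no Hadoop or
 Cassandra dependencies. PerWriteCfWriter and its in-memory buffer are
 hypothetical stand-ins, not the classes touched by the patch; the sketch only
 shows the target column family being passed to write() rather than fixed once
 per job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record writer; not the real ColumnFamilyRecordWriter.
class PerWriteCfWriter {
    // Buffered mutations, grouped by the column family they target.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    // The column family is chosen on each call instead of once in the job configuration.
    void write(String columnFamily, String rowKey, String value) {
        if (columnFamily == null || columnFamily.isEmpty()) {
            throw new IllegalArgumentException("column family must be specified");
        }
        pending.computeIfAbsent(columnFamily, cf -> new ArrayList<>())
               .add(rowKey + "=" + value);
    }

    // Returns the buffered mutations for one column family (empty if none).
    List<String> pendingFor(String columnFamily) {
        return pending.getOrDefault(columnFamily, new ArrayList<>());
    }

    public static void main(String[] args) {
        PerWriteCfWriter writer = new PerWriteCfWriter();
        // One reducer call updates both the data CF and its index CF.
        writer.write("Users", "u1", "alice");
        writer.write("UsersByName", "alice", "u1");
        System.out.println(writer.pendingFor("Users"));
        System.out.println(writer.pendingFor("UsersByName"));
    }
}
```

 The column family names Users and UsersByName are made up for illustration.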

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-09-21 Thread Robbie Strickland (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: cassandra-1.1-4208-v4.txt

I've attached a new patch that removes the check for a null output CF on 
BulkOutputFormat.  This allows BOF to use the MultipleOutputs API.


[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-10 Thread Robbie Strickland (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: cassandra-1.1-4208-v2.txt

I've attached a patch that adds a setOutputColumnFamily() overload that takes
both the keyspace and the CF. The one outstanding issue, which I've commented
on in CFOF, is that checkOutputSpecs() cannot currently ensure that a CF has
been specified through either setOutputColumnFamily() or MultipleOutputs.

Unfortunately MultipleOutputs.getNamedOutputsList(), which would be the right
way to do this, is currently private. So we either skip the check and let it
throw an NPE at runtime, or we duplicate the code in MultipleOutputs to grab
the values from config ourselves. Not sure which is the lesser of two evils.
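
The second option could look roughly like the sketch below. A plain Map stands
in for Hadoop's Configuration so the example is self-contained. Both key names
are assumptions: mapreduce.multipleoutputs is believed to be the key under
which the mapreduce version of MultipleOutputs stores its space-separated list
of named outputs, and OUTPUT_CF_KEY is only a placeholder for whatever key CFOF
actually reads. Both should be verified against the Hadoop and Cassandra
versions in use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical sketch; a Map replaces org.apache.hadoop.conf.Configuration.
class OutputSpecCheckSketch {
    // Assumption: key holding MultipleOutputs' space-separated named outputs.
    static final String NAMED_OUTPUTS_KEY = "mapreduce.multipleoutputs";
    // Placeholder: the real key read by ColumnFamilyOutputFormat may differ.
    static final String OUTPUT_CF_KEY = "example.output.columnfamily";

    // Duplicates the lookup that the private getNamedOutputsList() would provide.
    static List<String> namedOutputs(Map<String, String> conf) {
        List<String> names = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(conf.getOrDefault(NAMED_OUTPUTS_KEY, ""), " ");
        while (st.hasMoreTokens()) {
            names.add(st.nextToken());
        }
        return names;
    }

    // Fails at job submission rather than with an NPE at write time.
    static void checkOutputSpecs(Map<String, String> conf) {
        String cf = conf.get(OUTPUT_CF_KEY);
        boolean hasCf = cf != null && !cf.isEmpty();
        if (!hasCf && namedOutputs(conf).isEmpty()) {
            throw new UnsupportedOperationException(
                "specify an output column family or at least one named output");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NAMED_OUTPUTS_KEY, "Users UsersByName");
        checkOutputSpecs(conf); // passes: two named outputs are configured
        System.out.println(namedOutputs(conf));
    }
}
```

The cost of this approach is that the check silently breaks if MultipleOutputs
ever changes how it stores the list.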


[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-04 Thread Robbie Strickland (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: cassandra-1.1-4208.txt

It appears I was mistaken about the MultipleOutputs issue being resolved only
in trunk. It's resolved in the mapred package in trunk, but the new version in
mapreduce dates back at least to 1.0.1. It still references FileOutputFormat,
but the attached patch gets around this by using the same config key. I have
attached a new patch based on Cassandra 1.1 and Hadoop 1.0.2. The changes are
actually minimal. Let me know your thoughts...


[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-03 Thread Robbie Strickland (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: trunk-4208-v2.txt

I've added a patch to allow support for MultipleOutputs. Hadoop trunk now
contains a new version of MultipleOutputs that should support this out of the
box, although I am submitting a patch to deal with an inconsistency that could
cause future issues with non-file formats.

The basic solution involves changing the config key for output CF to match the
basename key being written by MultipleOutputs. I had to make related changes
to CassandraStorage and TestRingCache, as well as some minor changes to
ColumnFamilyInputFormat to account for some interface changes in Hadoop trunk.

So the bottom line is this will work if people use Hadoop and Cassandra trunk
with both patches applied. The original patch can be used as a temporary
solution if needed.
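
The key-matching approach can be illustrated with the sketch below, again using
a plain Map in place of Hadoop's Configuration. The key name
mapreduce.output.basename is an assumption about what MultipleOutputs writes
for each named output, and every method here is a hypothetical stand-in rather
than code from either patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of sharing one config key between job setup and MultipleOutputs.
class BasenameKeySketch {
    // Assumption: the key MultipleOutputs sets to the named output's base name.
    static final String BASENAME_KEY = "mapreduce.output.basename";

    // Single-output jobs store the column family under the same key.
    static void setOutputColumnFamily(Map<String, String> conf, String columnFamily) {
        conf.put(BASENAME_KEY, columnFamily);
    }

    // Stand-in for what MultipleOutputs does per named output: copy the job
    // configuration and overwrite the base name with the output's name.
    static Map<String, String> contextFor(Map<String, String> jobConf, String namedOutput) {
        Map<String, String> copy = new HashMap<>(jobConf);
        copy.put(BASENAME_KEY, namedOutput);
        return copy;
    }

    // The output format resolves its target column family from that one key.
    static String outputColumnFamily(Map<String, String> conf) {
        return conf.get(BASENAME_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        setOutputColumnFamily(jobConf, "Users");
        System.out.println(outputColumnFamily(jobConf));
        System.out.println(outputColumnFamily(contextFor(jobConf, "UsersByName")));
    }
}
```

Because both paths write the same key, the record writer needs no special case
for named outputs; it simply reads the column family from its own context.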


[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-01 Thread Robbie Strickland (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Attachment: trunk-4208.txt


[jira] [Updated] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families

2012-05-01 Thread Robbie Strickland (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-4208:
-

Comment: was deleted

(was: I created an issue regarding the specificity of MultipleOutputs to 
FileOutputFormat. Linked here as an FYI.)

