[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095606#comment-14095606 ] Jonathan Ellis commented on CASSANDRA-6927: --- Why add the Config.isClientMode check to isLocalDC? Create a CQL3 based bulk OutputFormat - Key: CASSANDRA-6927 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Paul Pak Priority: Minor Labels: cql3, hadoop Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927-v4.txt, trunk-6927-v5.txt, trunk-6927.txt This is the CQL compatible version of BulkOutputFormat. CqlOutputFormat exists, but doesn't write SSTables directly, similar to ColumnFamilyOutputFormat for thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095675#comment-14095675 ] Paul Pak commented on CASSANDRA-6927:
[~jbellis] Ah, good catch. I had to think back a while, but if I recall correctly I was initially working off of 2.1.0-beta1, which set isLocalDC in the StreamRateLimiter constructor like this:
{code}
isLocalDC = DatabaseDescriptor.getLocalDataCenter().equals(
        DatabaseDescriptor.getEndpointSnitch().getDatacenter(peer));
{code}
which was throwing an NPE because DatabaseDescriptor.getLocalDataCenter() was returning null. At that time I mistakenly thought Config.isClientMode() was inversely related to whether a local DC was being used, so I used isClientMode() to short-circuit and bypass the NPE. Since then, the appropriate null checks have been added, and I now see that 'is client mode' isn't related to 'is local dc' as I had originally thought. I'll remove that check from the patch.
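The null checks mentioned above are not quoted in the thread. As a minimal sketch of what a null-safe comparison could look like, assuming plain String datacenter names (the class and method names here are hypothetical and are not the Cassandra code):

```java
// Hypothetical stand-in for the isLocalDC computation; not the actual StreamRateLimiter code.
final class LocalDcCheck {
    // Returns true only when the local datacenter name is known and equals the peer's.
    // A null local datacenter yields false instead of a NullPointerException.
    static boolean isLocalDc(String localDc, String peerDc) {
        return localDc != null && localDc.equals(peerDc);
    }
}
```

With this shape, an unknown local datacenter is simply treated as "not local", so no client-mode flag is needed to avoid the NPE.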
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092532#comment-14092532 ] Piotr Kołaczkowski commented on CASSANDRA-6927: Ok, doing the review today. Thanks for the reminder.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092559#comment-14092559 ] Piotr Kołaczkowski commented on CASSANDRA-6927:
Patch does not apply to trunk:
{noformat}
$ git apply trunk-6927-v4.txt
trunk-6927-v4.txt:148: trailing whitespace.
trunk-6927-v4.txt:150: trailing whitespace.
trunk-6927-v4.txt:153: trailing whitespace.
    protected final int bufferSize;
trunk-6927-v4.txt:158: trailing whitespace.
trunk-6927-v4.txt:326: trailing whitespace.
    }
error: patch failed: src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:20
error: src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java: patch does not apply
error: patch failed: src/java/org/apache/cassandra/streaming/StreamManager.java:76
error: src/java/org/apache/cassandra/streaming/StreamManager.java: patch does not apply
{noformat}
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092590#comment-14092590 ] Piotr Kołaczkowski commented on CASSANDRA-6927:
+1 I applied the patch using my IDE (it seems to have stronger algorithms for applying patches than the default in git) and all looks good. Please update the patch to make it mergeable and move to testing.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092920#comment-14092920 ] Paul Pak commented on CASSANDRA-6927:
[~pkolaczk] New patch (trunk-6927-v5.txt) attached. This was generated against the following commit, so I believe it should work without issues for you:

commit c7e191ba128841b3e67a168e0d0fb97ca2eed2dd
Merge: d2b24de 50ee3a7
Author: Aleksey Yeschenko alek...@apache.org
Date: Mon Aug 11 17:33:47 2014 +0300

    Merge branch 'cassandra-2.1' into trunk
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086598#comment-14086598 ] Brandon Williams commented on CASSANDRA-6927: [~sixpak32577] Are you still working on this?
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057846#comment-14057846 ] Paul Pak commented on CASSANDRA-6927: I'm most likely going to move the 2 config setter/getter helper methods from CqlConfigHelper to CqlBulkOutputFormat. I think it'll make it clearer to the user that these configs are to be used with the CqlBulkOutputFormat, similar to the static config helper methods on other Hadoop OutputFormats (e.g. FileOutputFormat).
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056404#comment-14056404 ] Paul Pak commented on CASSANDRA-6927:
[~alexliu68] Hm, I'm not sure I like that approach. It's more complex than it needs to be and confusing for users. For one, the user would also need to know to set the cassandra.columnfamily.multipleoutputs property when using MultipleOutputs. Simply using MultipleOutputs should be enough indication of that without having to set a property saying so.
Additionally, if I saw both CqlConfigHelper.setColumnFamilySchema(Configuration conf, String columnFamily) AND CqlConfigHelper.setColumnFamilySchema(Configuration conf) methods, I would be confused as to which one I was supposed to use, and using the wrong one would most likely lead to errors.
As it currently stands, there is no ambiguity, and it's not as if specifying the columnFamily is any extra burden for the user. The columnFamily for a particular schema and insert statement should be well known... it's part of the script itself.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056472#comment-14056472 ] Alex Liu commented on CASSANDRA-6927:
@paul pak The default use case is a single output, and by default the format of the Hadoop properties is not specific to a columnfamily. Though the solution is not perfect, it does preserve the default format for a single output. By default, you don't need to set cassandra.columnfamily.multipleoutputs.
Some validation could be added to getColumnFamilySchema(Configuration conf, String columnFamily) and getColumnFamilySchema(Configuration conf), so that if the user calls the wrong one of CqlConfigHelper.setColumnFamilySchema(Configuration conf, String columnFamily) or CqlConfigHelper.setColumnFamilySchema(Configuration conf), an exception is thrown.
The other output formats don't support MultipleOutputs yet, so this is fairly new. Either solution works; adding cassandra.columnfamily.multipleoutputs just keeps the default property format in sync with the other output formats.
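For readers following the two proposals, here is a minimal sketch of the flag-based lookup with validation described in this comment. It is an illustration only: a plain Map stands in for Hadoop's Configuration, the class name and the schema key prefix are hypothetical, and only the cassandra.columnfamily.multipleoutputs property name comes from the thread.

```java
import java.util.Map;

// Hypothetical illustration of the flag-based proposal; not code from any attached patch.
final class SchemaLookup {
    // Property name taken from the discussion.
    static final String MULTIPLE_OUTPUTS = "cassandra.columnfamily.multipleoutputs";
    // Assumed key prefix, for illustration only.
    static final String SCHEMA_KEY = "example.columnfamily.schema";

    // Picks the generic or the per-column-family key depending on the flag,
    // and fails with a descriptive message when the expected key is missing.
    static String schemaFor(Map<String, String> conf, String columnFamily) {
        // Boolean.parseBoolean(null) is false, so the flag defaults to single-output mode.
        boolean multiple = Boolean.parseBoolean(conf.get(MULTIPLE_OUTPUTS));
        String key = multiple ? SCHEMA_KEY + "." + columnFamily : SCHEMA_KEY;
        String schema = conf.get(key);
        if (schema == null) {
            throw new IllegalStateException(
                "No schema configured under '" + key + "' (" + MULTIPLE_OUTPUTS + "=" + multiple + ")");
        }
        return schema;
    }
}
```

The validation step is what catches a mismatch between the setter that was used and the mode the flag selects.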
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056557#comment-14056557 ] Paul Pak commented on CASSANDRA-6927:
[~alexliu68] Hi Alex, thanks for your input. The fact that Hadoop properties aren't naturally specific to a column family is precisely the reason for not having generic schema/insertStatement properties and expecting them to apply to a particular column family, even if you happen to be working with only one column family. If some property value only applies to a specific column family, why not indicate it as such in the property key? It's certainly clearer and safer.
Also, what would be the benefit of having overloaded set/getColumnFamily* methods? They require additional validations to ensure the proper ones were used for the appropriate scenario, as opposed to having unambiguous ones that don't require any validation and work in all cases. The only possible benefit I can see is if there was a case where a column family was either unknown or not applicable, but that will never be the case with these schema/insertStatement properties.
In general, I prefer an approach where one solution works in all scenarios over one that entails variations of settings/methods that apply differently in different scenarios. The latter adds unnecessary complexity without any benefits and is prone to user confusion, misuse, and error.
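The idea of indicating the column family in the property key can be sketched as follows. This is a minimal illustration, not the patch itself: a plain Map stands in for Hadoop's Configuration, and the class name and key prefixes are assumptions.

```java
import java.util.Map;

// Hypothetical helper showing column-family-specific property keys.
final class PerTableConfig {
    // Assumed key prefixes, for illustration only.
    static final String SCHEMA_PREFIX = "example.columnfamily.schema";
    static final String INSERT_PREFIX = "example.columnfamily.insert.statement";

    // Every setter and getter takes the column family, so there is a single,
    // unambiguous way to address a setting whether one table or many are written.
    static void setSchema(Map<String, String> conf, String columnFamily, String schema) {
        conf.put(SCHEMA_PREFIX + "." + columnFamily, schema);
    }

    static String getSchema(Map<String, String> conf, String columnFamily) {
        return conf.get(SCHEMA_PREFIX + "." + columnFamily);
    }

    static void setInsertStatement(Map<String, String> conf, String columnFamily, String statement) {
        conf.put(INSERT_PREFIX + "." + columnFamily, statement);
    }

    static String getInsertStatement(Map<String, String> conf, String columnFamily) {
        return conf.get(INSERT_PREFIX + "." + columnFamily);
    }
}
```

Because the key always carries the column family, settings for several tables can coexist in one job configuration without a mode flag.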
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054786#comment-14054786 ] Piotr Kołaczkowski commented on CASSANDRA-6927:
CqlOutputFormat, L31, L33:
{noformat}OutputFormat that allows reduce tasks insert the binded variable values{noformat}
binded -> bound

org.apache.cassandra.hadoop.AbstractBulkRecordWriter.ExternalClient#init, org.apache.cassandra.hadoop.AbstractBulkRecordWriter.ExternalClient#createThriftClient:
They duplicate quite a lot of code found in org.apache.cassandra.hadoop.ConfigHelper#getClientFromOutputAddressList. Additionally, the ExternalClient code hardcodes a reference to FramedTransport and doesn't use the configured ITransportFactory (an extension point used by e.g. DSE to plug in Kerberos authentication). I know this code was there before (it was moved from BulkRecordWriter), but maybe this is a good occasion to clean it up a little?

org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter#prepareWriter, L88:
Unchecked result of mkdirs.

org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java:89:
getColumnFamilySchema result is not checked; it might return null and cause an NPE instead of a descriptive message.

org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java:96
org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter#getColumnFamilyInsertStatement - as above, needs a proper null check. I think those two should be checked in checkOutputSpecs at the level of CqlBulkOutputFormat (needs override); better to fail early.

org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter#write doesn't invoke the Hadoop progress method, as BulkRecordWriter does:
{noformat}
if (null != progress)
    progress.progress();
if (null != context)
    HadoopCompat.progress(context);
{noformat}

org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java:144
{noformat}
throw new IOException("Error adding row", e);
{noformat}
Maybe logging the key in the error message would be useful here?
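Two of the review points above (the unchecked mkdirs result and the missing null checks that should fail early) can be illustrated with a small sketch. It is a sketch under stated assumptions, not the patch: a plain Map stands in for the job configuration, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;

// Hypothetical fail-early checks, of the kind an overridden checkOutputSpecs could perform.
final class OutputSpecChecks {
    // Rejects a missing or blank setting with a descriptive message,
    // instead of letting a null surface later as a NullPointerException.
    static void requireSetting(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required setting: " + key);
        }
    }

    // mkdirs() returns false both on failure and when the directory already exists,
    // so the result is combined with isDirectory() before reporting an error.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Could not create output directory: " + dir);
        }
    }
}
```

Running such checks before any task starts writing means a misconfigured job fails at submission time with a message naming the missing key.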
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055656#comment-14055656 ] Paul Pak commented on CASSANDRA-6927: [~pkolaczk] Above issues have been addressed in trunk-6927-v3.txt. I reverted a change from [~alexliu68] in CqlBulkRecordWriter retrieving the schema and insertStatement and refactored it. The reason why I couldn't have a generic config for the schema and insertStatement, but needed columnFamily-specific ones is to accommodate writing to multiple columnFamilies using MultipleOutputs.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055678#comment-14055678 ] Paul Pak commented on CASSANDRA-6927:
By doing:
{code:java}
MultipleOutputs.addNamedOutput(job, myColumnFamily, CqlBulkOutputFormat.class, Object.class, List.class);
CqlConfigHelper.setColumnFamilySchema(conf, myColumnFamily, "CREATE TABLE myKeyspace.myColumnFamily ...");
CqlConfigHelper.setColumnFamilyInsertStatement(conf, myColumnFamily, "UPDATE myKeyspace.myColumnFamily SET ");
{code}
you'll be able to write to multiple columnFamilies by doing:
{code:java}
MultipleOutputs multiOutputs = ...
multiOutputs.write(myColumnFamily, null, values);
{code}
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055822#comment-14055822 ] Alex Liu commented on CASSANDRA-6927:
[~sixpak32577] You can create a boolean property, cassandra.columnfamily.multipleoutputs, in CqlConfigHelper, so that if it's set to true, getColumnFamilySchema(Configuration conf, String columnFamily) is used; otherwise getColumnFamilySchema(Configuration conf) is used. By default, it's set to false.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036616#comment-14036616 ] Alex Liu commented on CASSANDRA-6927:
CASSANDRA-7412 adds Pig support for the CQL-based bulk output format.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034314#comment-14034314 ] Alex Liu commented on CASSANDRA-6927: [~sixpak32577] I will take your code and make some interface changes.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034378#comment-14034378 ] Paul Pak commented on CASSANDRA-6927: @alexliu68 What type of interface changes did you have in mind?
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034441#comment-14034441 ] Alex Liu commented on CASSANDRA-6927:
v2 on the cassandra-2.0 branch is attached. It changes the interface from OutputFormat<ByteBuffer, List<Mutation>> to OutputFormat<Object, List<Mutation>>, where Object is a placeholder for an object that is not used internally, so it can simply be null. List<Mutation> holds the values bound to the variables of the insert statement. I also removed the "." + columnFamily in
{code}
private String getColumnFamilySchema()
{
    return conf.get(COLUMNFAMILY_SCHEMA + "." + columnFamily);
}

private String getColumnFamilyInsertStatement()
{
    return conf.get(COLUMNFAMILY_INSERT_STATEMENT + "." + columnFamily);
}
{code}
and did some code format cleanup.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984588#comment-13984588 ] Paul Pak commented on CASSANDRA-6927: [~pkolaczk] Hi Piotr, any idea on when this may get reviewed?
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949386#comment-13949386 ] Paul Pak commented on CASSANDRA-6927:
My initial thought is to use the CQLSSTableWriter to write the SSTable files, and then use a variation of the SSTableLoader to do the load. The main difference I see between the new table loader and SSTableLoader would be in the way that CFMetaData is determined, since the Thrift call for the keyspace ignores CQL3 tables. Any feedback on this approach would be welcome.
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949398#comment-13949398 ] Paul Pak commented on CASSANDRA-6927:
One aspect that doesn't line up perfectly is that the CQLSSTableWriter.addRow() and .rawAddRow() methods simply take the column values or name-value pairs as parameters into the stored procedure, while Hadoop's RecordWriter.write() method separates its parameters into keys and values. My plan is to have the new writer typed with <List<ByteBuffer>, List<ByteBuffer>>, and when the .write(List<ByteBuffer>, List<ByteBuffer>) method internally calls CQLSSTableWriter.rawAddRow(List<ByteBuffer>), just append the values list to the keys list.
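The plan of appending the values list to the keys list can be sketched as below. This is a minimal illustration with a hypothetical class name; it only builds the combined list and does not call CQLSSTableWriter.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the single list of bound values for one row
// from the key and value lists handed to a RecordWriter-style write(keys, values).
final class RowAssembler {
    // Returns a new list with the keys first and the values after them,
    // leaving both input lists unmodified.
    static List<ByteBuffer> combine(List<ByteBuffer> keys, List<ByteBuffer> values) {
        List<ByteBuffer> row = new ArrayList<>(keys.size() + values.size());
        row.addAll(keys);
        row.addAll(values);
        return row;
    }
}
```

Copying into a new list, rather than appending to the caller's keys list in place, is a design choice of this sketch; it avoids surprising callers that reuse their lists between writes.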