[jira] [Commented] (NIFI-901) Create processors to get/put data with Apache Cassandra

Benjamin Janssen (JIRA) Tue, 13 Oct 2015 21:20:14 -0700

    [ 
https://issues.apache.org/jira/browse/NIFI-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956226#comment-14956226
 ]


Benjamin Janssen commented on NIFI-901:
---------------------------------------

Been brushing up on CQL and I'm starting to foresee some difficulties.

The first issue is that, with CQL, Cassandra loses a lot of the nice features 
of NoSQL databases.  From what I've been able to gather, there is no longer a 
way to refer to a cell by row name + column name.  Instead, each table must 
have a schema assigned to it, and the row and column names are constructed 
from the fields that make up the "primary key" in the SQL-like language.  This 
makes it difficult to build a simple generic processor that reads the row and 
column from the FlowFile attributes and dumps the content into the cell; the 
processor would somehow have to be schema-aware.
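
To make the schema-awareness problem concrete, here is a minimal Python sketch 
(not NiFi code) of what such a processor would need to be told per table.  The 
table name, column names, and the build_insert helper are all hypothetical; the 
only point is that the primary key columns have to be supplied from somewhere 
before FlowFile attributes can be mapped onto a CQL statement.

```python
from typing import Dict, List, Sequence, Tuple


def build_insert(
    table: str,
    key_columns: Sequence[str],
    value_column: str,
    attributes: Dict[str, str],
    content: bytes,
) -> Tuple[str, List[object]]:
    """Build a parameterized CQL INSERT from FlowFile-like inputs.

    Hypothetical helper. `key_columns` is the table's primary key, which
    cannot be discovered from the FlowFile alone, so it has to come from
    processor configuration (or from the cluster's schema metadata).
    Identifiers are interpolated directly, so `table` and the column names
    must come from trusted configuration, never from FlowFile data.
    """
    missing = [col for col in key_columns if col not in attributes]
    if missing:
        raise ValueError(f"missing primary key attributes: {missing}")

    columns = list(key_columns) + [value_column]
    placeholders = ", ".join("?" for _ in columns)
    cql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

    # Bind values in the same order as the column list.
    params: List[object] = [attributes[col] for col in key_columns]
    params.append(content)
    return cql, params


# Example: a table whose primary key is (sensor_id, reading_time).
cql, params = build_insert(
    table="metrics.readings",
    key_columns=["sensor_id", "reading_time"],
    value_column="payload",
    attributes={"sensor_id": "s-17", "reading_time": "2015-10-13T21:20:14Z"},
    content=b"42.0",
)
print(cql)
```

The statement uses "?" placeholders in the style of a prepared statement, so 
the values themselves are never spliced into the CQL text.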

As for batching on the Put side of things, the CQL3 documentation seems to 
imply that batching should not be used when seeking performance improvements 
(http://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html), but this 
seems to be mostly directed at the BATCH construct.  I think it would be fine 
to batch (without using the BATCH keyword) by buffering updates to a single 
primary key (note that the primary key in CQL refers to the combination of 
fields that defines the row AND column that will be written to).  I'm not sure 
this level of buffering is worth doing, though.
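
The kind of buffering I mean could look roughly like the Python sketch below. 
It assumes that later updates to the same column simply overwrite earlier ones 
(last write wins).  The WriteBuffer class and its method names are hypothetical 
and not part of any Cassandra driver.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

PrimaryKey = Tuple[object, ...]
Write = Tuple[PrimaryKey, Dict[str, object]]


class WriteBuffer:
    """Coalesce pending updates per primary key (hypothetical sketch)."""

    def __init__(self, max_keys: int = 100) -> None:
        self.max_keys = max_keys
        self._pending: "OrderedDict[PrimaryKey, Dict[str, object]]" = OrderedDict()

    def add(self, key: PrimaryKey, values: Dict[str, object]) -> List[Write]:
        """Merge `values` into the pending write for `key`.

        Returns the writes to send if the buffer is full, else an empty list.
        """
        self._pending.setdefault(key, {}).update(values)
        if len(self._pending) >= self.max_keys:
            return self.flush()
        return []

    def flush(self) -> List[Write]:
        """Return one write per primary key and empty the buffer."""
        writes = list(self._pending.items())
        self._pending.clear()
        return writes


buf = WriteBuffer(max_keys=2)
first = buf.add(("s-17", "t1"), {"payload": b"1"})
second = buf.add(("s-17", "t1"), {"payload": b"2", "unit": "C"})  # same key
third = buf.add(("s-18", "t1"), {"payload": b"3"})  # second key, triggers flush
print(len(third))
```

Each flushed entry would be sent as its own ordinary write, so nothing here 
relies on the BATCH construct.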

Combining these two issues, I'm wondering whether FlowFiles should be 
structured so that they have no content and the information to insert is 
contained solely within the attributes, or whether the content should instead 
be required to be in a JSON-like format that defines the information needed 
for the update.  I think both of these approaches would limit the overall size 
of an entry that could be inserted, but I'm not sure we want to be loading 
particularly huge objects into Cassandra anyway.
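
As a rough illustration of the JSON option, the FlowFile content might be a 
small document like the one parsed in this Python sketch.  The field names 
(table, key, values) and the size cap are assumptions made up for the example, 
not a proposed final format.

```python
import json
from typing import Any, Dict

MAX_CONTENT_BYTES = 1024 * 1024  # arbitrary cap chosen for this sketch


def parse_update(content: bytes) -> Dict[str, Any]:
    """Validate FlowFile content that describes a single update."""
    if len(content) > MAX_CONTENT_BYTES:
        raise ValueError("content too large for a single update")

    doc = json.loads(content.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("update must be a JSON object")

    required = (
        ("table", str, "string"),
        ("key", dict, "object"),
        ("values", dict, "object"),
    )
    for field, expected_type, type_name in required:
        if not isinstance(doc.get(field), expected_type):
            raise ValueError(f"'{field}' must be a JSON {type_name}")

    if not doc["key"]:
        raise ValueError("'key' must name at least one primary key column")
    return doc


content = (
    b'{"table": "metrics.readings",'
    b' "key": {"sensor_id": "s-17", "reading_time": "2015-10-13T21:20:14Z"},'
    b' "values": {"payload": "42.0"}}'
)
update = parse_update(content)
print(sorted(update["key"]))
```

The attribute-only option would carry the same three pieces of information, 
just spread across attribute names instead of one document.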

Thoughts?

> Create processors to get/put data with Apache Cassandra
> -------------------------------------------------------
>
>                 Key: NIFI-901
>                 URL: https://issues.apache.org/jira/browse/NIFI-901
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Joseph Witt
>              Labels: beginner
>             Fix For: 0.4.0
>
>
> Develop processors to interact with Apache Cassandra.  The current http 
> processors may actually support this as is but such configuration may be too 
> complex to provide the quality user experience desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
