Re: Stable cassandra version with frozen UDTs

2017-06-26 Thread Ali Akhtar
So, which Cassandra version is the most stable / production-ready
currently? I'm fine with reverting to 2.x if needed.

On Mon, Jun 26, 2017 at 8:37 PM, Michael Shuler 
wrote:

> On 06/26/2017 10:17 AM, Vladimir Yudovin wrote:
> >
> > In terms of tick-tock releases, odd releases (e.g. 3.11) are bug fixes.
>
> The last tick-tock feature release was 3.10. Tick-tock releases are no
> more. The project has moved back to development on stable release series.
>
> The bug-fixes on top of the features developed during the tick-tock
> cycle will live on as the ongoing 3.11 release series. 3.11.0 was *just*
> released and announced here on the user@ list. The cassandra-3.11 branch
> will get ongoing 3.11.x releases as a stable production series.
>
> Y'all know where to post bugs[0] while testing out and deploying this
> new Apache Cassandra release series. :)
>
> --
> Warm regards,
> Michael
>
> [0] https://issues.apache.org/jira/browse/CASSANDRA
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Stable cassandra version with frozen UDTs

2017-06-26 Thread Ali Akhtar
Is that version going to be stable for production? I'm looking for
something that I can just install, add nodes when needed, but otherwise not
have to worry or think about, even if it means downgrading to a lower
version and rewriting some of the code involving UDTs.

On Mon, Jun 26, 2017 at 3:51 PM, Vladimir Yudovin <vla...@winguzone.com>
wrote:

> The latest comment in this JIRA is "I've committed to 3.11". The 3.11 change log
> also contains "* Fix validation of non-frozen UDT cells (CASSANDRA-12916)"
> (merged from 3.10).
> So try version 3.11.
>
> Best regards, Vladimir Yudovin,
> *Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting*
>
>
>  On Thu, 22 Jun 2017 10:17:15 -0400 *Ali Akhtar <ali.rac...@gmail.com
> <ali.rac...@gmail.com>>* wrote 
>
> I'm running Cassandra 3.9, but it doesn't seem stable. E.g., one of my
> nodes recently crashed with the message
>
> 'org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
> Unexpected error deserializing mutation; saved to
> /tmp/mutation3976606415170694683dat.  This may be caused by replaying a
> mutation against a table with the same name but incompatible schema.
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
> enough bytes to read 0th field board_id'
>
> It looks like this particular bug is fixed in 3.10:
> https://issues.apache.org/jira/browse/CASSANDRA-12916
>
> Is there a stable version with support for frozen UDTs that I should use?
> If not, should I change my UDT code to use text, and revert to a 2.x
> version which is stable? I'm still in development, so it will be a pain,
> but I can revert to non-frozen UDTs.
>
>
>


Stable cassandra version with frozen UDTs

2017-06-22 Thread Ali Akhtar
I'm running Cassandra 3.9, but it doesn't seem stable. E.g., one of my nodes
recently crashed with the message

'org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
Unexpected error deserializing mutation; saved to
/tmp/mutation3976606415170694683dat.  This may be caused by replaying a
mutation against a table with the same name but incompatible schema.
Exception follows: org.apache.cassandra.serializers.MarshalException: Not
enough bytes to read 0th field board_id'

It looks like this particular bug is fixed in 3.10:
https://issues.apache.org/jira/browse/CASSANDRA-12916

Is there a stable version with support for frozen UDTs that I should use?
If not, should I change my UDT code to use text, and revert to a 2.x
version which is stable? I'm still in development, so it will be a pain,
but I can revert to non-frozen UDTs.


Cassandra cost vs an RDBMS?

2017-06-15 Thread Ali Akhtar
A client recently inquired about the costs of running Cassandra vs a
traditional RDBMS like Postgres or MySQL, in the cloud.

They are releasing a B2B product similar to Slack, Trello, etc., which will
have a free tier. And they're concerned about the costs of running it on
Cassandra, and whether it may be too expensive if it gets popular.

They have a write heavy workload, where data is being received 24/7,
analyzed and the results written to Cassandra. A few times a day, users
will view the results of the analysis, which will be the read portion of
the system.

It's my understanding that it may cost slightly more, e.g. 10-15%, to run
this system on Cassandra vs an RDBMS, because it needs more nodes and a
higher tier of AWS / GCE instances to run.

Can anyone who has experience scaling Cassandra share their insights?

Costs, metrics (e.g users, requests per second), etc would be really
helpful!


Counter being incremented extra times

2017-04-27 Thread Ali Akhtar
I have the following schema:

CREATE TABLE total_volume (
team_id text,
channel_id text,
volume counter,
PRIMARY KEY (team_id, channel_id)
);

I've written an integration test, using CassandraUnit, which runs a loop
200 times, on each iteration executing the query:

UPDATE total_volume SET volume = volume + 1 WHERE team_id = ? AND
channel_id = ?

via an Accessor (Java driver).
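A minimal sketch of such an accessor (interface and method names are
hypothetical; the query is the one above):

import com.datastax.driver.mapping.MappingManager;
import com.datastax.driver.mapping.annotations.Accessor;
import com.datastax.driver.mapping.annotations.Query;

@Accessor
public interface VolumeAccessor {
    // Counter columns can only be changed via UPDATE ... SET c = c + n.
    @Query("UPDATE total_volume SET volume = volume + 1 WHERE team_id = ? AND channel_id = ?")
    void increment(String teamId, String channelId);
}

// In the test:
// VolumeAccessor accessor = new MappingManager(session).createAccessor(VolumeAccessor.class);
// for (int i = 0; i < 200; i++) accessor.increment("team-1", "channel-1");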

Afterwards, I read the volume back, using:

SELECT * FROM total_volume WHERE team_id = ? AND channel_id = ?

and block the thread until the volume returned is 200.

The problem is, the value always gets stuck at 230, never at 200.

Why is this happening? Since I incremented the counter exactly 200 times,
shouldn't the final value be 200 and not 230?


Deserializing a json string directly to a java class using Jackson?

2017-04-11 Thread Ali Akhtar
I have a table containing a column `foo` which is a string, and is json.

I have a class called `Foo` which maps to `foo_json` and can be serialized
/ deserialized using Jackson.

Is it possible to define the column as `private Foo foo`, rather than
defining it as `private String foo` and manually deserializing it?

From
https://docs.datastax.com/en/drivers/java/3.1/com/datastax/driver/extras/codecs/json/JacksonJsonCodec.html
it looks like one just has to add that Maven dependency? Does anything else
have to be done?
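For what it's worth, a minimal sketch of the wiring (hedged: the contact
point is a placeholder and Foo is the mapped class from the question):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.extras.codecs.json.JacksonJsonCodec;

Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .build();

// Registering the codec lets the driver convert the JSON text column to and
// from Foo via Jackson, so the mapped field can be `private Foo foo`.
cluster.getConfiguration()
        .getCodecRegistry()
        .register(new JacksonJsonCodec<>(Foo.class));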


Effective partition key for time series data, which allows range queries?

2017-03-27 Thread Ali Akhtar
I have a use case where the data for individual users is being tracked, and
every 15 minutes or so, the data for the past 15 minutes is inserted into
the table.

The table schema looks like:
user id, timestamp, foo, bar, etc.

Where foo, bar, etc are the items being tracked, and their values over the
past 15 minutes.

I initially planned to use the user id as the primary key of the table.
But, I realized that this may cause really wide rows (tracking for 24
hours means 96 records inserted (1 for each 15 min window), over 1 year
this means 36k records per user, over 2 years, 72k, etc).

I know the limit of wide rows is billions of records, but I've heard that
the practical limit is much lower.

So I considered using a composite primary key: (user, timestamp)

If I'm correct, the above should create a new row for each user & timestamp
logged.

However, will I still be able to do range queries on the timestamp, to e.g.
return the data for the last week?

E.g. select * from data where user_id = 'foo' and timestamp >= '<1 month
ago>' and timestamp <= '' ?
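For reference, a sketch of that composite-key design (table and tracked
columns are illustrative, assuming a Session named session): user_id is the
partition key and the timestamp a clustering column, so per-user range
queries on the timestamp are allowed:

import java.util.Date;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// One partition per user; one row per (user, 15-minute window).
session.execute("CREATE TABLE IF NOT EXISTS user_tracking ("
        + " user_id text,"
        + " ts timestamp,"
        + " foo int,"
        + " bar int,"
        + " PRIMARY KEY ((user_id), ts)"
        + ") WITH CLUSTERING ORDER BY (ts DESC)");

// Range query on the clustering column, restricted to one partition:
Date now = new Date();
Date monthAgo = Date.from(Instant.now().minus(30, ChronoUnit.DAYS));
session.execute(session.prepare(
        "SELECT * FROM user_tracking WHERE user_id = ? AND ts >= ? AND ts <= ?")
        .bind("foo", monthAgo, now));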


Grouping time series data into blocks of times

2017-03-18 Thread Ali Akhtar
I have a use case where a stream of time series data is coming in.

Each item in the stream has a timestamp of when it was sent, and covers the
activity that happened within a 5 minute timespan.

I need to group the items together into 30 minute blocks of time.

E.g, say I receive the following items:

5:00 PM, 5:05 PM, 5:10 PM... 5:30 PM, 6:20 PM

I need to group the messages from 5:00 PM to 5:30 PM into one block, and
put the 6:20 PM message into another block.

It seems simple enough to do if, for each message, I look up the last
received message. If it was within 30 minutes, then the message goes into
the current block. Otherwise, a new block is started.

My concern is about messages that arrive out of order, or are processed
concurrently.

Saving and reading them with Consistency=ALL would be bad for performance,
and I've had issues where queries have failed due to timeouts with those
settings (and timeouts can't be increased on a per query basis).

Would it be better to use Redis, or another database, to use as a helper /
companion to C*?

Or perhaps, all messages should just be stored first, and then ~30 minutes
later, a job is run which gets all messages within the last 30 mins, sorts
them by time, and then groups them into blocks of time?
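For the fixed-window variant of that last idea (30-minute blocks aligned to
the clock, rather than blocks anchored at the first message - a deliberate
simplification), the block key can be computed per message with no lookups.
A sketch:

import java.time.Duration;
import java.time.Instant;

// Floors a message timestamp to the start of its 30-minute window, so all
// messages in the same half-hour get the same block key regardless of the
// order in which they arrive or are processed.
static Instant blockStart(Instant messageTime) {
    long windowMillis = Duration.ofMinutes(30).toMillis();
    return Instant.ofEpochMilli((messageTime.toEpochMilli() / windowMillis) * windowMillis);
}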


Re: Ye old singleton debate

2017-03-15 Thread Ali Akhtar
+1. Would be awesome if this could be mocked / tested.

On Thu, Mar 16, 2017 at 3:47 AM, Edward Capriolo 
wrote:

> This question came up today:
>
> OK, say you mock, how do you construct a working multi-process
> representation of how C* actually works from within a unit test without
> running the code that actually constructs the cluster?
>
> 1) Don't do that (construct a multinode cluster in a test) just mock the
> crap out of it.
>
> http://www.baeldung.com/mockito-verify
>
> 2) dtests
> Dtests don't actually do this in the classic sense. One challenge is
> code coverage. For many projects I use Cobertura:
> http://www.mojohaus.org/cobertura-maven-plugin/. Cobertura can't (as far as
> I know) instrument N JVMs and give you coverage. Bringing up a full-on
> cluster to test something is slow, compute intensive, and quite hard.
>
> 3) Fix it
> https://issues.apache.org/jira/browse/CASSANDRA-7837
> https://issues.apache.org/jira/browse/CASSANDRA-10283
>
> *Impossible you say!  No NoSQL JAVA DATABASE CAN DO THIS!!!*
>
> https://www.elastic.co/guide/en/elasticsearch/reference/
> current/integration-tests.html
>
> Wouldn't that be just the bees knees???
>
>
>
>
>
>
>


Re: Not timing out some queries (Java driver)

2016-12-22 Thread Ali Akhtar
The replication factor is the default - I haven't changed it. Would
tweaking it help?

On Thu, Dec 22, 2016 at 8:41 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Vladimir,
>
> I'm receiving a batch of messages which are out of order, and I need to
> process those messages in order.
>
> My solution is to write them to a cassandra table first, where they'll be
> ordered by their timestamp.
>
> Then read them back from that table, knowing that they'll be ordered.
>
> But for this to work, I need the data to be available immediately after I
> write it. For this, I think I need consistency = ALL.
>
> On Thu, Dec 22, 2016 at 8:29 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
>> What is the replication factor? Why not use CONSISTENCY QUORUM? It's faster
>> and safe enough.
>>
>> Best regards, Vladimir Yudovin,
>> *Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting*
>>
>>
>>  On Thu, 22 Dec 2016 10:14:14 -0500 *Ali Akhtar
>> <ali.rac...@gmail.com <ali.rac...@gmail.com>>* wrote 
>>
>> Is it possible to provide these options per query rather than set them
>> globally?
>>
>> On Thu, Dec 22, 2016 at 7:15 AM, Voytek Jarnot <voytek.jar...@gmail.com>
>> wrote:
>>
>> cassandra.yaml has various timeouts such as read_request_timeout,
>> range_request_timeout, write_request_timeout, etc.  The driver does as well
>> (via Cluster -> Configuration -> SocketOptions -> setReadTimeoutMillis).
>>
>> Not sure if you can (or would want to) set them to "forever", but it's a
>> starting point.
>>
>> On Wed, Dec 21, 2016 at 7:10 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> I have some queries which need to be processed in a consistent manner.
>> I'm setting the consistency level = ALL option on these queries.
>>
>> However, I've noticed that sometimes these queries fail because of a
>> timeout (2 seconds).
>>
>> In my use case, for certain queries, I want them to never time out and
>> block until they have been acknowledged by all nodes.
>>
>> Is that possible thru the Datastax Java driver, or another way?
>>
>>
>>
>


Re: Not timing out some queries (Java driver)

2016-12-22 Thread Ali Akhtar
Vladimir,

I'm receiving a batch of messages which are out of order, and I need to
process those messages in order.

My solution is to write them to a cassandra table first, where they'll be
ordered by their timestamp.

Then read them back from that table, knowing that they'll be ordered.

But for this to work, I need the data to be available immediately after I
write it. For this, I think I need consistency = ALL.

On Thu, Dec 22, 2016 at 8:29 PM, Vladimir Yudovin <vla...@winguzone.com>
wrote:

> What is the replication factor? Why not use CONSISTENCY QUORUM? It's faster
> and safe enough.
>
> Best regards, Vladimir Yudovin,
> *Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting*
>
>
>  On Thu, 22 Dec 2016 10:14:14 -0500 *Ali Akhtar <ali.rac...@gmail.com
> <ali.rac...@gmail.com>>* wrote 
>
> Is it possible to provide these options per query rather than set them
> globally?
>
> On Thu, Dec 22, 2016 at 7:15 AM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
> cassandra.yaml has various timeouts such as read_request_timeout,
> range_request_timeout, write_request_timeout, etc.  The driver does as well
> (via Cluster -> Configuration -> SocketOptions -> setReadTimeoutMillis).
>
> Not sure if you can (or would want to) set them to "forever", but it's a
> starting point.
>
> On Wed, Dec 21, 2016 at 7:10 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> I have some queries which need to be processed in a consistent manner. I'm
> setting the consistency level = ALL option on these queries.
>
> However, I've noticed that sometimes these queries fail because of a
> timeout (2 seconds).
>
> In my use case, for certain queries, I want them to never time out and
> block until they have been acknowledged by all nodes.
>
> Is that possible thru the Datastax Java driver, or another way?
>
>
>


Re: Not timing out some queries (Java driver)

2016-12-22 Thread Ali Akhtar
Is it possible to provide these options per query rather than set them
globally?
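In the 3.x Java driver both can be set per statement - the consistency
level and the driver-side read timeout (the server-side timeouts in
cassandra.yaml stay global). A sketch, with placeholder table and values:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Per-statement overrides, applied to this query only.
Statement stmt = new SimpleStatement(
        "SELECT * FROM my_table WHERE id = ?", "some-id")
        .setConsistencyLevel(ConsistencyLevel.ALL)
        .setReadTimeoutMillis(60000); // overrides SocketOptions for this query
session.execute(stmt);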

On Thu, Dec 22, 2016 at 7:15 AM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:

> cassandra.yaml has various timeouts such as read_request_timeout,
> range_request_timeout, write_request_timeout, etc.  The driver does as well
> (via Cluster -> Configuration -> SocketOptions -> setReadTimeoutMillis).
>
> Not sure if you can (or would want to) set them to "forever", but it's a
> starting point.
>
> On Wed, Dec 21, 2016 at 7:10 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I have some queries which need to be processed in a consistent manner.
>> I'm setting the consistency level = ALL option on these queries.
>>
>> However, I've noticed that sometimes these queries fail because of a
>> timeout (2 seconds).
>>
>> In my use case, for certain queries, I want them to never time out and
>> block until they have been acknowledged by all nodes.
>>
>> Is that possible thru the Datastax Java driver, or another way?
>>
>
>


Re: Processing time series data in order

2016-12-21 Thread Ali Akhtar
The batch size can be large, so in-memory ordering isn't an option,
unfortunately.

On Thu, Dec 22, 2016 at 7:09 AM, Jesse Hodges <hodges.je...@gmail.com>
wrote:

> Depending on the expected max out-of-order window, why not order them in
> memory? Then you don't need to reread from Cassandra, in case of a problem
> you can reread data from Kafka.
>
> -Jesse
>
> > On Dec 21, 2016, at 7:24 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> >
> > - I'm receiving a batch of messages to a Kafka topic.
> >
> > Each message has a timestamp, however the messages can arrive / get
> processed out of order. I.e. event 1's timestamp could've been a few seconds
> before event 2, and event 2 could still get processed before event 1.
> >
> > - I know the number of messages that are sent per batch.
> >
> > - I need to process the messages in order. The messages are basically
> providing the history of an item. I need to be able to track the history
> accurately (i.e., if an event occurred 3 times, I need to accurately log the
> dates of the first, 2nd, and 3rd time it occurred).
> >
> > The approach I'm considering is:
> >
> > - Creating a cassandra table which is ordered by the timestamp of the
> messages.
> >
> > - Once a batch of messages has arrived, writing them all to cassandra,
> counting on them being ordered by the timestamp even if they are processed
> out of order.
> >
> > - Then iterating over the messages in the cassandra table, to process
> them in order.
> >
> > However, I'm concerned about Cassandra's eventual consistency. Could it
> be that even though I wrote the messages, they are not there when I try to
> read them (which would be almost immediately after they are written)?
> >
> > Should I enforce consistency = ALL to make sure the messages will be
> available immediately after being written?
> >
> > Is there a better way to handle this thru either Kafka streams or
> Cassandra?
>


Processing time series data in order

2016-12-21 Thread Ali Akhtar
- I'm receiving a batch of messages to a Kafka topic.

Each message has a timestamp; however, the messages can arrive / get
processed out of order. I.e. event 1's timestamp could've been a few seconds
before event 2, and event 2 could still get processed before event 1.

- I know the number of messages that are sent per batch.

- I need to process the messages in order. The messages are basically
providing the history of an item. I need to be able to track the history
accurately (i.e., if an event occurred 3 times, I need to accurately log the
dates of the first, 2nd, and 3rd time it occurred).

The approach I'm considering is:

- Creating a cassandra table which is ordered by the timestamp of the
messages.

- Once a batch of messages has arrived, writing them all to cassandra,
counting on them being ordered by the timestamp even if they are processed
out of order.

- Then iterating over the messages in the cassandra table, to process them
in order.

However, I'm concerned about Cassandra's eventual consistency. Could it be
that even though I wrote the messages, they are not there when I try to
read them (which would be almost immediately after they are written)?

Should I enforce consistency = ALL to make sure the messages will be
available immediately after being written?

Is there a better way to handle this thru either Kafka streams or Cassandra?
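One hedged alternative to ALL, sketched below with placeholder names: if
the write and the later read both use QUORUM (so W + R > RF), the read must
overlap at least one replica that accepted the write, which gives
read-your-writes without requiring every node to respond:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.SimpleStatement;

// Write at QUORUM; ts is the clustering column that orders the messages.
session.execute(new SimpleStatement(
        "INSERT INTO events_by_item (item_id, ts, payload) VALUES (?, ?, ?)",
        "item-1", new java.util.Date(), "payload")
        .setConsistencyLevel(ConsistencyLevel.QUORUM));

// Read back at QUORUM; rows come back sorted by ts within the partition.
session.execute(new SimpleStatement(
        "SELECT * FROM events_by_item WHERE item_id = ?", "item-1")
        .setConsistencyLevel(ConsistencyLevel.QUORUM));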


Not timing out some queries (Java driver)

2016-12-21 Thread Ali Akhtar
I have some queries which need to be processed in a consistent manner. I'm
setting the consistency level = ALL option on these queries.

However, I've noticed that sometimes these queries fail because of a
timeout (2 seconds).

In my use case, for certain queries, I want them to never time out and
block until they have been acknowledged by all nodes.

Is that possible thru the Datastax Java driver, or another way?


Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
I am truly sorry, Raghavendra. It didn't occur to me that you could be a
beginner.

On Mon, Nov 14, 2016 at 11:46 PM, Jon Haddad <jonathan.had...@gmail.com>
wrote:

> Think about it like this.  You just started using Cassandra for the first
> time.  You have a question, you find there’s a mailing list, and you ask.
> You have zero experience with the DB and are an outsider to a community.
> You ask anyways, because it’s where the Apache website says to go.  You get
> back 2 sarcastic responses which aren’t helpful at all.  You, Ali, are the
> first contact with the community and it's a negative one.  Your joke,
> however funny it is, excludes someone who isn’t on the inside.  They don’t
> get the elbow to the ribs, haha, we’re just having fun, they get the “wow,
> all I did was ask a question and I got made fun of” feeling.
>
> Everyone is a beginner, and an outsider, at some point.  Please keep in
> mind that no-one has any understanding of the intent of your jokes when all
> they have is a 2 sentence response that is obviously not meant to be
> helpful.
>
> Jon
>
> On Nov 14, 2016, at 10:25 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Excuse me? I did not make fun of anyone. I gave valid suggestions that are
> all theoretically possible.
>
> If it came off in a condescending way, i am genuinely sorry.
>
> On 14 Nov 2016 11:22 pm, "Jon Haddad" <jonathan.had...@gmail.com> wrote:
>
>> You’ve asked a lot of questions on this mailing list, and you’ve gotten
>> help on a ton of beginner issues.  Making fun of someone for asking similar
>> beginner questions is not cool at all.  Cut it out.
>>
>>
>>
>> On Nov 14, 2016, at 10:13 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Another solution could be to print the raw bytes to paper, and write the
>> page numbers to cassandra. Playback will be challenging with this method
>> however, unless interns are available to transcribe the papers back to a
>> digital format.
>>
>> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar <ali.rac...@gmail.com>
>> wrote:
>>
>>> The video can be written to floppy diskettes, and the serial numbers of
>>> the diskettes can be written to cassandra.
>>>
>>> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin <oskar.kjel...@gmail.com
>>> > wrote:
>>>
>>>> The actual video is not stored in Cassandra. You need to use a proper
>>>> origin like s3.
>>>>
>>>> Although you can probably store it in Cassandra, it's not a good idea.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On 14 nov. 2016, at 18:02, raghavendra vutti <
>>>> raghu9raghaven...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > Just wanted to know how Hulu or Netflix store videos in
>>>> Cassandra.
>>>> >
>>>> > Do they just use references to the video files in the form of URL's
>>>> and store in the DB??
>>>> >
>>>> > Could someone please help me on this.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Raghavendra.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>>
>


Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
Excuse me? I did not make fun of anyone. I gave valid suggestions that are
all theoretically possible.

If it came off in a condescending way, i am genuinely sorry.

On 14 Nov 2016 11:22 pm, "Jon Haddad" <jonathan.had...@gmail.com> wrote:

> You’ve asked a lot of questions on this mailing list, and you’ve gotten
> help on a ton of beginner issues.  Making fun of someone for asking similar
> beginner questions is not cool at all.  Cut it out.
>
>
>
> On Nov 14, 2016, at 10:13 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Another solution could be to print the raw bytes to paper, and write the
> page numbers to cassandra. Playback will be challenging with this method
> however, unless interns are available to transcribe the papers back to a
> digital format.
>
> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> The video can be written to floppy diskettes, and the serial numbers of
>> the diskettes can be written to cassandra.
>>
>> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin <oskar.kjel...@gmail.com>
>> wrote:
>>
>>> The actual video is not stored in Cassandra. You need to use a proper
>>> origin like s3.
>>>
>>> Although you can probably store it in Cassandra, it's not a good idea.
>>>
>>> Sent from my iPhone
>>>
>>> > On 14 nov. 2016, at 18:02, raghavendra vutti <
>>> raghu9raghaven...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Just wanted to know how Hulu or Netflix store videos in
>>> Cassandra.
>>> >
>>> > Do they just use references to the video files in the form of URL's
>>> and store in the DB??
>>> >
>>> > Could someone please help me on this.
>>> >
>>> >
>>> > Thanks,
>>> > Raghavendra.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>
>


Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
Another solution could be to print the raw bytes to paper, and write the
page numbers to cassandra. Playback will be challenging with this method
however, unless interns are available to transcribe the papers back to a
digital format.

On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> The video can be written to floppy diskettes, and the serial numbers of
> the diskettes can be written to cassandra.
>
> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin <oskar.kjel...@gmail.com>
> wrote:
>
>> The actual video is not stored in Cassandra. You need to use a proper
>> origin like s3.
>>
>> Although you can probably store it in Cassandra, it's not a good idea.
>>
>> Sent from my iPhone
>>
>> > On 14 nov. 2016, at 18:02, raghavendra vutti <
>> raghu9raghaven...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > Just wanted to know how Hulu or Netflix store videos in Cassandra.
>> >
>> > Do they just use references to the video files in the form of URL's and
>> store in the DB??
>> >
>> > Could someone please help me on this.
>> >
>> >
>> > Thanks,
>> > Raghavendra.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>


Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
The video can be written to floppy diskettes, and the serial numbers of the
diskettes can be written to cassandra.

On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin 
wrote:

> The actual video is not stored in Cassandra. You need to use a proper
> origin like s3.
>
> Although you can probably store it in Cassandra, it's not a good idea.
>
> Sent from my iPhone
>
> > On 14 nov. 2016, at 18:02, raghavendra vutti <
> raghu9raghaven...@gmail.com> wrote:
> >
> > Hi,
> >
> > Just wanted to know how Hulu or Netflix store videos in Cassandra.
> >
> > Do they just use references to the video files in the form of URL's and
> store in the DB??
> >
> > Could someone please help me on this.
> >
> >
> > Thanks,
> > Raghavendra.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Consistency when adding data to collections concurrently?

2016-11-13 Thread Ali Akhtar
Yeah, except I guess there's a minor debate left on whether it'd be more
performant to store the labels in their own table, and do a read query each
time the parent item is fetched.

Or if they should be kept as a set on the parent item and take the
penalty when updating / deleting labels. (Which will be rare, by the way.)

On Sun, Nov 13, 2016 at 5:38 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> So problem solved!
>
> On Sun, Nov 13, 2016 at 1:37 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Yeah, I am using set (not set though)
>>
>> On Sun, Nov 13, 2016 at 5:36 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Yes you'd have to know the UDT values since it's part of the primary key
>>> to query your data.
>>>
>>> If I were you I would stick to using a set and use UPDATE my_table
>>> SET labels = labels + ;
>>>
>>> It does work well with concurrent updates.
>>>
>>> On Sun, Nov 13, 2016 at 1:32 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> But then how would you query it? You'd need to know all the values of
>>>> the udt, right?
>>>>
>>>> On Sun, Nov 13, 2016 at 5:30 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> "Also can you make a UDT a clustered key?" --> yes if it's frozen
>>>>>
>>>>> On Sun, Nov 13, 2016 at 1:25 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If I wanted to get all values for an item, including its labels, how
>>>>>> would that be done in the above case?
>>>>>>
>>>>>> Also can you make a UDT a clustered key?
>>>>>>
>>>>>> On Sun, Nov 13, 2016 at 4:33 AM, Manoj Khangaonkar <
>>>>>> khangaon...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Instead of using a collection, consider making label a clustered
>>>>>>> column.
>>>>>>>
>>>>>>> With this each request will essentially append a column (label) to
>>>>>>> the partition.
>>>>>>>
>>>>>>> To get all labels would be a simple query
>>>>>>>
>>>>>>> select label from table where partitionkey = "value".
>>>>>>>
>>>>>>> In general, read + update of a column is an anti-pattern in
>>>>>>> Cassandra - which is what you are doing. What I am suggesting
>>>>>>> above is appending more columns and not updating existing columns.
>>>>>>>
>>>>>>> regards
>>>>>>>
>>>>>>> regards
>>>>>>>
>>>>>>> On Sat, Nov 12, 2016 at 2:34 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have a table where each record contains a list of labels.
>>>>>>>>
>>>>>>>> I have an endpoint which responds to new labels being added to a
>>>>>>>> record by the user.
>>>>>>>>
>>>>>>>> Consider the following scenario:
>>>>>>>>
>>>>>>>> - Record X, labels = []
>>>>>>>> - User selects 2 labels, clicks a button, and 2 http requests are
>>>>>>>> generated.
>>>>>>>> - The server receives request for Label 1 and Label 2 at the same
>>>>>>>> time.
>>>>>>>> - Both requests see the labels as empty, add 1 label to the
>>>>>>>> collection, and send it.
>>>>>>>> - Record state as label 1 request sees it: [1], as label 2 sees it:
>>>>>>>> [2]
>>>>>>>>
>>>>>>>> How will the above conflict be resolved? What can I do so I end up
>>>>>>>> with [1, 2] instead of either [1] or [2] after both requests have been
>>>>>>>> processed?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> http://khangaonkar.blogspot.com/
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Consistency when adding data to collections concurrently?

2016-11-13 Thread Ali Akhtar
Yeah, I am using set (not set though)

On Sun, Nov 13, 2016 at 5:36 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Yes you'd have to know the UDT values since it's part of the primary key
> to query your data.
>
> If I were you I would stick to using a set and use UPDATE my_table
> SET labels = labels + ;
>
> It does work well with concurrent updates.
>
> On Sun, Nov 13, 2016 at 1:32 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> But then how would you query it? You'd need to know all the values of the
>> udt, right?
>>
>> On Sun, Nov 13, 2016 at 5:30 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> "Also can you make a UDT a clustered key?" --> yes if it's frozen
>>>
>>> On Sun, Nov 13, 2016 at 1:25 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> If I wanted to get all values for an item, including its labels, how
>>>> would that be done in the above case?
>>>>
>>>> Also can you make a UDT a clustered key?
>>>>
>>>> On Sun, Nov 13, 2016 at 4:33 AM, Manoj Khangaonkar <
>>>> khangaon...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Instead of using a collection, consider making label a clustered
>>>>> column.
>>>>>
>>>>> With this each request will essentially append a column (label) to the
>>>>> partition.
>>>>>
>>>>> To get all labels would be a simple query
>>>>>
>>>>> select label from table where partitionkey = "value".
>>>>>
>>>>> In general, read + update of a column is an anti-pattern in
>>>>> Cassandra - which is what you are doing. What I am suggesting
>>>>> above is appending more columns and not updating existing columns.
>>>>>
>>>>> regards
>>>>>
>>>>> regards
>>>>>
>>>>> On Sat, Nov 12, 2016 at 2:34 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have a table where each record contains a list of labels.
>>>>>>
>>>>>> I have an endpoint which responds to new labels being added to a
>>>>>> record by the user.
>>>>>>
>>>>>> Consider the following scenario:
>>>>>>
>>>>>> - Record X, labels = []
>>>>>> - User selects 2 labels, clicks a button, and 2 http requests are
>>>>>> generated.
>>>>>> - The server receives request for Label 1 and Label 2 at the same
>>>>>> time.
>>>>>> - Both requests see the labels as empty, add 1 label to the
>>>>>> collection, and send it.
>>>>>> - Record state as label 1 request sees it: [1], as label 2 sees it:
>>>>>> [2]
>>>>>>
>>>>>> How will the above conflict be resolved? What can I do so I end up
>>>>>> with [1, 2] instead of either [1] or [2] after both requests have been
>>>>>> processed?
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://khangaonkar.blogspot.com/
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Consistency when adding data to collections concurrently?

2016-11-13 Thread Ali Akhtar
But then how would you query it? You'd need to know all the values of the
udt, right?

On Sun, Nov 13, 2016 at 5:30 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> "Also can you make a UDT a clustered key?" --> yes if it's frozen
>
> On Sun, Nov 13, 2016 at 1:25 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> If I wanted to get all values for an item, including its labels, how
>> would that be done in the above case?
>>
>> Also can you make a UDT a clustered key?
>>
>> On Sun, Nov 13, 2016 at 4:33 AM, Manoj Khangaonkar <khangaon...@gmail.com
>> > wrote:
>>
>>> Hi,
>>>
>>> Instead of using a collection, consider making label a clustered column.
>>>
>>> With this each request will essentially append a column (label) to the
>>> partition.
>>>
>>> To get all labels would be a simple query
>>>
>>> select label from table where partitionkey = "value".
>>>
>>> In general, read + update of a column is an anti-pattern in Cassandra
>>> - which is what you are doing. What I am suggesting
>>> above is appending more columns and not updating existing columns.
>>>
>>> regards
>>>
>>> regards
>>>
>>> On Sat, Nov 12, 2016 at 2:34 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> I have a table where each record contains a list of labels.
>>>>
>>>> I have an endpoint which responds to new labels being added to a record
>>>> by the user.
>>>>
>>>> Consider the following scenario:
>>>>
>>>> - Record X, labels = []
>>>> - User selects 2 labels, clicks a button, and 2 http requests are
>>>> generated.
>>>> - The server receives request for Label 1 and Label 2 at the same time.
>>>> - Both requests see the labels as empty, add 1 label to the collection,
>>>> and send it.
>>>> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>>>>
>>>> How will the above conflict be resolved? What can I do so I end up with
>>>> [1, 2] instead of either [1] or [2] after both requests have been 
>>>> processed?
>>>>
>>>
>>>
>>>
>>> --
>>> http://khangaonkar.blogspot.com/
>>>
>>
>>
>


Re: Consistency when adding data to collections concurrently?

2016-11-13 Thread Ali Akhtar
If I wanted to get all values for an item, including its labels, how would
that be done in the above case?

Also can you make a UDT a clustered key?

On Sun, Nov 13, 2016 at 4:33 AM, Manoj Khangaonkar <khangaon...@gmail.com>
wrote:

> Hi,
>
> Instead of using a collection, consider making label a clustered column.
>
> With this each request will essentially append a column (label) to the
> partition.
>
> To get all labels would be a simple query
>
> select label from table where partitionkey = "value".
>
> In general, read + update of a column is an anti-pattern in Cassandra -
> which is what you are doing. What I am suggesting
> above is appending more columns and not updating existing columns.
>
> regards
>
> regards
>
> On Sat, Nov 12, 2016 at 2:34 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I have a table where each record contains a list of labels.
>>
>> I have an endpoint which responds to new labels being added to a record
>> by the user.
>>
>> Consider the following scenario:
>>
>> - Record X, labels = []
>> - User selects 2 labels, clicks a button, and 2 http requests are
>> generated.
>> - The server receives request for Label 1 and Label 2 at the same time.
>> - Both requests see the labels as empty, add 1 label to the collection,
>> and send it.
>> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>>
>> How will the above conflict be resolved? What can I do so I end up with
>> [1, 2] instead of either [1] or [2] after both requests have been processed?
>>
>
>
>
> --
> http://khangaonkar.blogspot.com/
>
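A sketch of the design Manoj describes (hypothetical table name): each
label becomes its own row in the record's partition, so concurrent adds are
plain inserts with nothing to read-modify-write:

import com.datastax.driver.core.PreparedStatement;

session.execute("CREATE TABLE IF NOT EXISTS record_labels ("
        + " record_id text,"
        + " label text,"
        + " PRIMARY KEY ((record_id), label)"
        + ")");

// Two concurrent requests each insert their own row; no conflict to resolve.
PreparedStatement addLabel = session.prepare(
        "INSERT INTO record_labels (record_id, label) VALUES (?, ?)");
session.execute(addLabel.bind("record-x", "label-1"));
session.execute(addLabel.bind("record-x", "label-2"));

// All labels for one record:
session.execute("SELECT label FROM record_labels WHERE record_id = ?", "record-x");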


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
Just to be clear, doing mapper.save() will do an insert rather than an
update?

On Sat, Nov 12, 2016 at 9:36 PM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> I believe you are correct that the implementation taking the Set is the
> right one to use.
>
> On Sat, Nov 12, 2016 at 9:44 AM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Or it could even take Set<Label> as the first bound var:
>>
>> void addLabel(Set<Label> label, String id);
>>
>>
>> On Sat, Nov 12, 2016 at 8:41 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Andrew,
>>
>> I was thinking about setting up an accessor with that query and a bound
>> variable ? which binds to the instance being added, e.g:
>>
>> @Query("UPDATE my_table SET labels = labels + ? WHERE id = ?")
>> void addLabel(Label label, String id);
>>
>> Will that  work?
>>
>> On Sat, Nov 12, 2016 at 8:38 PM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> You can do it in a SimpleStatement assuming you provide the CQL exactly
>> as you provided, but in a PreparedStatement it will not work because CQL
>> prohibits providing bind values in collection literals.  For it to work you
>> could provide a List of UDT values in a bound prepared statement, i.e.:
>>
>> UserType udtType = cluster.getMetadata().getKeyspace("k").getUserType("u");
>> UDTValue value = udtType.newValue();
>> value.setString(0, "data");
>>
>> PreparedStatement p0 = session.prepare("UPDATE my_table SET labels =
>> labels + ? where id = ?");
>> BoundStatement b0 = p0.bind(Lists.newArrayList(value), 0);
>> session.execute(b0);
>>
>> Thanks,
>> Andy
>>
>> On Sat, Nov 12, 2016 at 9:02 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Looks like the trick was to use [] around the udt value literal.
>>
>> Any way to do this using the java driver?
>>
>> On Sat, Nov 12, 2016 at 7:58 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Changing the double quotes to single quotes gives:
>>
>> UPDATE my_table SET labels = labels + {id: 'foo'} where id = '';
>>
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Invalid user type literal for labels of type list<frozen>"
>>
>>
>> On Sat, Nov 12, 2016 at 7:50 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> The question is about appending to a set of frozen UDTs and how to do
>> that while avoiding the race condition.
>>
>> If I run:
>>
>>  UPDATE my_table SET labels = labels + {id: "foo"} where id = 'xx';
>>
>> I get:
>>
>> SyntaxException: line 1:57 no viable alternative at input '}' (...=
>> labels + {id: ["fo]o"}...)
>>
>> Here labels is set<frozen<label>>
>>
>> On Sat, Nov 12, 2016 at 7:40 PM, Vladimir Yudovin <vla...@winguzone.com>
>> wrote:
>>
>> If I used consistency = ALL both when getting the record, and when saving
>> the record, will that avoid the race condition?
>> If I use consistency level = all, will that cause it to end up with [1,2]?
>> No. Even if you have only one host it's possible that two threads first
>> both read data and then overwrite the existing value one by one.
>>
>> The list is actually of a list<frozen<label>> and not a text (I used
>> text for simplification, apologies).
>> In that case, will updates still merge the list values instead of
>> overwriting them?
>> Do you mean the UPDATE CQL operation? Yes, it adds new values to the list,
>> allowing duplicates.
>>
>> When setting a new value to a list, C* will do a read-delete-write
>> internally, e.g. read the current list, remove all its values (by a range
>> tombstone) and then write the new list.
>> As I mentioned duplicates are allowed in LIST, and as DOC says:
>>
>> These update operations are implemented internally without any
>> read-before-write. Appending and prepending a new element to the list
>> writes only the new element.
>>
>> Only when using index
>>
>> When you add an element at a particular position, Cassandra reads the
>> entire list, and then writes only the updated element. Consequently, adding
>> an element at a particular position results in greater latency than
>> appending or prefixing an element to a list.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>> Cassandra. Launch your cluster in minutes.*

Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
Or it could even take Set<Label> as the first bound var:

void addLabel(Set<Label> label, String id);


On Sat, Nov 12, 2016 at 8:41 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Andrew,
>
> I was thinking about setting up an accessor with that query and a bound
> variable ? which binds to the instance being added, e.g:
>
> @Query("UPDATE my_table SET labels = labels + ? WHERE id = ?")
> void addLabel(Label label, String id);
>
> Will that  work?
>
> On Sat, Nov 12, 2016 at 8:38 PM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
>> You can do it in a SimpleStatement assuming you provide the CQL exactly
>> as you provided, but in a PreparedStatement it will not work because CQL
>> prohibits providing bind values in collection literals.  For it to work you
>> could provide a List of UDT values in a bound prepared statement, i.e.:
>>
>> UserType udtType = cluster.getMetadata().getKeyspace("k").getUserType("u");
>> UDTValue value = udtType.newValue();
>> value.setString(0, "data");
>>
>> PreparedStatement p0 = session.prepare("UPDATE my_table SET labels =
>> labels + ? where id = ?");
>> BoundStatement b0 = p0.bind(Lists.newArrayList(value), 0);
>> session.execute(b0);
>>
>> Thanks,
>> Andy
>>
>> On Sat, Nov 12, 2016 at 9:02 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Looks like the trick was to use [] around the udt value literal.
>>>
>>> Any way to do this using the java driver?
>>>
>>> On Sat, Nov 12, 2016 at 7:58 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> Changing the double quotes to single quotes gives:
>>>>
>>>> UPDATE my_table SET labels = labels + {id: 'foo'} where id = '';
>>>>
>>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>>> message="Invalid user type literal for labels of type list<frozen>"
>>>>
>>>>
>>>> On Sat, Nov 12, 2016 at 7:50 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>>> The question is about appending to a set of frozen UDTs and how to do
>>>>> that while avoiding the race condition.
>>>>>
>>>>> If I run:
>>>>>
>>>>>  UPDATE my_table SET labels = labels + {id: "foo"} where id = 'xx';
>>>>>
>>>>> I get:
>>>>>
>>>>> SyntaxException: line 1:57 no viable alternative at input '}' (...=
>>>>> labels + {id: ["fo]o"}...)
>>>>>
>>>>> Here labels is set<frozen<label>>
>>>>>
>>>>> On Sat, Nov 12, 2016 at 7:40 PM, Vladimir Yudovin <
>>>>> vla...@winguzone.com> wrote:
>>>>>
>>>>>> If I used consistency = ALL both when getting the record, and when
>>>>>> saving the record, will that avoid the race condition?
>>>>>> If I use consistency level = all, will that cause it to end up with
>>>>>> [1,2]?
>>>>>> No. Even if you have only one host it's possible that two threads
>>>>>> first both read data and then overwrite the existing value one by one.
>>>>>>
>>>>>> The list is actually of a list<frozen<label>> and not a text (I used
>>>>>> text for simplification, apologies).
>>>>>> In that case, will updates still merge the list values instead of
>>>>>> overwriting them?
>>>>>> Do you mean the UPDATE CQL operation? Yes, it adds new values to the
>>>>>> list, allowing duplicates.
>>>>>>
>>>>>> When setting a new value to a list, C* will do a read-delete-write
>>>>>> internally, e.g. read the current list, remove all its values (by a range
>>>>>> tombstone) and then write the new list.
>>>>>> As I mentioned duplicates are allowed in LIST, and as DOC says:
>>>>>>
>>>>>> These update operations are implemented internally without any
>>>>>> read-before-write. Appending and prepending a new element to the
>>>>>> list writes only the new element.
>>>>>>
>>>>>> Only when using index
>>>>>>
>>>>>> When you add an element at a particular position, Cassandra reads the
>>>>>> entire list, and then writes only the updated element. Consequently,
>>>>>> adding an element at a particular position results in greater latency
>>>>>> than appending or prefixing an element to a list.

Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
Andrew,

I was thinking about setting up an accessor with that query and a bound
variable ? which binds to the instance being added, e.g:

@Query("UPDATE my_table SET labels = labels + ? WHERE id = ?")
void addLabel(Label label, String id);

Will that  work?

On Sat, Nov 12, 2016 at 8:38 PM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> You can do it in a SimpleStatement assuming you provide the CQL exactly as
> you provided, but in a PreparedStatement it will not work because CQL
> prohibits providing bind values in collection literals.  For it to work you
> could provide a List of UDT values in a bound prepared statement, i.e.:
>
> UserType udtType = cluster.getMetadata().getKeyspace("k").getUserType("u");
> UDTValue value = udtType.newValue();
> value.setString(0, "data");
>
> PreparedStatement p0 = session.prepare("UPDATE my_table SET labels =
> labels + ? where id = ?");
> BoundStatement b0 = p0.bind(Lists.newArrayList(value), 0);
> session.execute(b0);
>
> Thanks,
> Andy
>
> On Sat, Nov 12, 2016 at 9:02 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Looks like the trick was to use [] around the udt value literal.
>>
>> Any way to do this using the java driver?
>>
>> On Sat, Nov 12, 2016 at 7:58 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Changing the double quotes to single quotes gives:
>>>
>>> UPDATE my_table SET labels = labels + {id: 'foo'} where id = '';
>>>
>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>> message="Invalid user type literal for labels of type list<frozen>"
>>>
>>>
>>> On Sat, Nov 12, 2016 at 7:50 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> The question is about appending to a set of frozen UDTs and how to do
>>>> that while avoiding the race condition.
>>>>
>>>> If I run:
>>>>
>>>>  UPDATE my_table SET labels = labels + {id: "foo"} where id = 'xx';
>>>>
>>>> I get:
>>>>
>>>> SyntaxException: line 1:57 no viable alternative at input '}' (...=
>>>> labels + {id: ["fo]o"}...)
>>>>
>>>> Here labels is set<frozen<label>>
>>>>
>>>> On Sat, Nov 12, 2016 at 7:40 PM, Vladimir Yudovin <vla...@winguzone.com
>>>> > wrote:
>>>>
>>>>> If I used consistency = ALL both when getting the record, and when
>>>>> saving the record, will that avoid the race condition?
>>>>> If I use consistency level = all, will that cause it to end up with
>>>>> [1,2]?
>>>>> No. Even if you have only one host it's possible that two threads
>>>>> first both read data and then overwrite the existing value one by one.
>>>>>
>>>>> The list is actually of a list<frozen<label>> and not a text (I used
>>>>> text for simplification, apologies).
>>>>> In that case, will updates still merge the list values instead of
>>>>> overwriting them?
>>>>> Do you mean the UPDATE CQL operation? Yes, it adds new values to the
>>>>> list, allowing duplicates.
>>>>>
>>>>> When setting a new value to a list, C* will do a read-delete-write
>>>>> internally, e.g. read the current list, remove all its values (by a range
>>>>> tombstone) and then write the new list.
>>>>> As I mentioned duplicates are allowed in LIST, and as DOC says:
>>>>>
>>>>> These update operations are implemented internally without any
>>>>> read-before-write. Appending and prepending a new element to the list
>>>>> writes only the new element.
>>>>>
>>>>> Only when using index
>>>>>
>>>>> When you add an element at a particular position, Cassandra reads the
>>>>> entire list, and then writes only the updated element. Consequently, 
>>>>> adding
>>>>> an element at a particular position results in greater latency than
>>>>> appending or prefixing an element to a list.
>>>>>
>>>>>
>>>>> Best regards, Vladimir Yudovin,
>>>>>
>>>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>>>> Cassandra. Launch your cluster in minutes.*
>>>>>
>>>>>
>>>>>  On Sat, 12 Nov 2016 07:57:36 -0500*Ali Akhtar
>>>>> <ali.rac...@gmail.com <ali.rac...@gmail.com>>* wrote 

Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
Looks like the trick was to use [] around the udt value literal.

Any way to do this using the java driver?

On Sat, Nov 12, 2016 at 7:58 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Changing the double quotes to single quotes gives:
>
> UPDATE my_table SET labels = labels + {id: 'foo'} where id = '';
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid user type literal for labels of type list<frozen>"
>
>
> On Sat, Nov 12, 2016 at 7:50 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> The question is about appending to a set of frozen UDTs and how to do
>> that while avoiding the race condition.
>>
>> If I run:
>>
>>  UPDATE my_table SET labels = labels + {id: "foo"} where id = 'xx';
>>
>> I get:
>>
>> SyntaxException: line 1:57 no viable alternative at input '}' (...=
>> labels + {id: ["fo]o"}...)
>>
>> Here labels is set<frozen<label>>
>>
>> On Sat, Nov 12, 2016 at 7:40 PM, Vladimir Yudovin <vla...@winguzone.com>
>> wrote:
>>
>>> If I used consistency = ALL both when getting the record, and when
>>> saving the record, will that avoid the race condition?
>>> If I use consistency level = all, will that cause it to end up with
>>> [1,2]?
>>> No. Even if you have only one host it's possible that two threads first
>>> both read data and then overwrite the existing value one by one.
>>>
>>> The list is actually of a list<frozen<label>> and not a text (I used
>>> text for simplification, apologies).
>>> In that case, will updates still merge the list values instead of
>>> overwriting them?
>>> Do you mean the UPDATE CQL operation? Yes, it adds new values to the list,
>>> allowing duplicates.
>>>
>>> When setting a new value to a list, C* will do a read-delete-write
>>> internally, e.g. read the current list, remove all its values (by a range
>>> tombstone) and then write the new list.
>>> As I mentioned duplicates are allowed in LIST, and as DOC says:
>>>
>>> These update operations are implemented internally without any
>>> read-before-write. Appending and prepending a new element to the list
>>> writes only the new element.
>>>
>>> Only when using index
>>>
>>> When you add an element at a particular position, Cassandra reads the
>>> entire list, and then writes only the updated element. Consequently, adding
>>> an element at a particular position results in greater latency than
>>> appending or prefixing an element to a list.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>> Cassandra. Launch your cluster in minutes.*
>>>
>>>
>>>  On Sat, 12 Nov 2016 07:57:36 -0500*Ali Akhtar
>>> <ali.rac...@gmail.com <ali.rac...@gmail.com>>* wrote 
>>>
>>> The labels collection is of the type set<frozen<label>>, where label is
>>> a UDT containing: id, name, description, all text fields.
>>>
>>> On Sat, Nov 12, 2016 at 5:54 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> The problem isn't just the update / insert though, right? Don't frozen
>>> entities get overwritten completely? So if I had [1] [2] being written as
>>> updates, won't each update overwrite the set completely, so I'll end up
>>> with either one of them instead of [1,2]?
>>>
>>> On Sat, Nov 12, 2016 at 5:50 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> Maybe you should use my Achilles mapper, which does generate UPDATE
>>> statements on collections and not only INSERT
>>> On 12 Nov 2016 13:08, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>
>>> I am using the Java Cassandra mapper for all of these cases, so my code
>>> looks like this:
>>>
>>> Item myItem = myaccessor.get( itemId );
>>> Mapper mapper = mappingManager.create( Item.class );
>>>
>>> myItem.labels.add( newLabel );
>>> mapper.save( myItem );
>>>
>>> On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> Thanks DuyHai, I will switch to using a set.
>>>
>>> But I'm still not sure how to resolve the original question.
>>>
>>> - Original labels = []
>>> - Request 1 arrives with label = 1, and request 2 arrives with label = 2
>>> - Updates are sent to c* with labels = [1] and labels = [2]
>>> simultaneously.

Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
Changing the double quotes to single quotes gives:

UPDATE my_table SET labels = labels + {id: 'foo'} where id = '';

InvalidRequest: Error from server: code=2200 [Invalid query]
message="Invalid user type literal for labels of type list<frozen>"


On Sat, Nov 12, 2016 at 7:50 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> The question is about appending to a set of frozen UDTs and how to do that
> while avoiding the race condition.
>
> If I run:
>
>  UPDATE my_table SET labels = labels + {id: "foo"} where id = 'xx';
>
> I get:
>
> SyntaxException: line 1:57 no viable alternative at input '}' (...= labels
> + {id: ["fo]o"}...)
>
> Here labels is set<frozen<label>>
>
> On Sat, Nov 12, 2016 at 7:40 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
>> If I used consistency = ALL both when getting the record, and when saving
>> the record, will that avoid the race condition?
>> If I use consistency level = all, will that cause it to end up with [1,2]?
>> No. Even if you have only one host it's possible that two threads first
>> both read data and then overwrite the existing value one by one.
>>
>> The list is actually of a list<frozen<label>> and not a text (I used
>> text for simplification, apologies).
>> In that case, will updates still merge the list values instead of
>> overwriting them?
>> Do you mean the UPDATE CQL operation? Yes, it adds new values to the list,
>> allowing duplicates.
>>
>> When setting a new value to a list, C* will do a read-delete-write
>> internally, e.g. read the current list, remove all its values (by a range
>> tombstone) and then write the new list.
>> As I mentioned duplicates are allowed in LIST, and as DOC says:
>>
>> These update operations are implemented internally without any
>> read-before-write. Appending and prepending a new element to the list
>> writes only the new element.
>>
>> Only when using index
>>
>> When you add an element at a particular position, Cassandra reads the
>> entire list, and then writes only the updated element. Consequently, adding
>> an element at a particular position results in greater latency than
>> appending or prefixing an element to a list.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>> Cassandra. Launch your cluster in minutes.*
>>
>>
>>  On Sat, 12 Nov 2016 07:57:36 -0500*Ali Akhtar <ali.rac...@gmail.com
>> <ali.rac...@gmail.com>>* wrote 
>>
>> The labels collection is of the type set<frozen<label>>, where label is
>> a UDT containing: id, name, description, all text fields.
>>
>> On Sat, Nov 12, 2016 at 5:54 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> The problem isn't just the update / insert though, right? Don't frozen
>> entities get overwritten completely? So if I had [1] [2] being written as
>> updates, won't each update overwrite the set completely, so I'll end up
>> with either one of them instead of [1,2]?
>>
>> On Sat, Nov 12, 2016 at 5:50 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Maybe you should use my Achilles mapper, which does generate UPDATE
>> statements on collections and not only INSERT
>> On 12 Nov 2016 13:08, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>
>> I am using the Java Cassandra mapper for all of these cases, so my code
>> looks like this:
>>
>> Item myItem = myaccessor.get( itemId );
>> Mapper mapper = mappingManager.create( Item.class );
>>
>> myItem.labels.add( newLabel );
>> mapper.save( myItem );
>>
>> On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Thanks DuyHai, I will switch to using a set.
>>
>> But I'm still not sure how to resolve the original question.
>>
>> - Original labels = []
>> - Request 1 arrives with label = 1, and request 2 arrives with label = 2
>> - Updates are sent to c* with labels = [1] and labels = [2]
>> simultaneously.
>>
>> What will happen in the above case? Will it cause the labels to end up as
>> [1,2] (what I want) or either [1] or [2]?
>>
>> If I use consistency level = all, will that cause it to end up with [1,2]?
>>
>> On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Don't use list, use set instead. If you need ordering of insertion, use a
>> map<timeuuid,text> where timeuuid is generated by the client to guarantee
>> insertion order
>>
>

Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
The question is about appending to a set of frozen UDTs and how to do that
while avoiding the race condition.

If I run:

 UPDATE my_table SET labels = labels + {id: "foo"} where id = 'xx';

I get:

SyntaxException: line 1:57 no viable alternative at input '}' (...= labels
+ {id: ["fo]o"}...)

Here labels is set<frozen<label>>

On Sat, Nov 12, 2016 at 7:40 PM, Vladimir Yudovin <vla...@winguzone.com>
wrote:

> If I used consistency = ALL both when getting the record, and when saving
> the record, will that avoid the race condition?
> If I use consistency level = all, will that cause it to end up with [1,2]?
> No. Even if you have only one host it's possible that two threads first
> both read data and than overwrite existing value one by one.
>
> The list is actually a list<frozen<label>> and not a text (I used text
> for simplification, apologies).
> In that case, will updates still merge the list values instead of
> overwriting them?
> Do you mean the UPDATE CQL operation? Yes, it adds new values to the
> list, allowing duplicates.
>
> When setting a new value to a list, C* will do a read-delete-write
> internally, i.e. read the current list, remove all its values (by a range
> tombstone) and then write the new list.
> As I mentioned, duplicates are allowed in LIST, and as the docs say:
>
> These update operations are implemented internally without any
> read-before-write. Appending and prepending a new element to the list
> writes only the new element.
>
> Only when setting an element at a particular position:
>
> When you add an element at a particular position, Cassandra reads the
> entire list, and then writes only the updated element. Consequently, adding
> an element at a particular position results in greater latency than
> appending or prefixing an element to a list.
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
>  On Sat, 12 Nov 2016 07:57:36 -0500*Ali Akhtar <ali.rac...@gmail.com
> <ali.rac...@gmail.com>>* wrote ----
>
> The labels collection is of the type set<frozen<label>>, where label is a
> udt containing: id, name, description, all text fields.
>
> On Sat, Nov 12, 2016 at 5:54 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> The problem isn't just the update / insert though, right? Don't frozen
> entities get overwritten completely? So if I had [1] [2] being written as
> updates, won't each update overwrite the set completely, so i'll end up
> with either one of them instead of [1,2]?
>
> On Sat, Nov 12, 2016 at 5:50 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Maybe you should use my Achilles mapper, which does generate UPDATE
> statements on collections and not only INSERT
> On 12 Nov 2016 at 13:08, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
> I am using the Java Cassandra mapper for all of these cases, so my code
> looks like this:
>
> Item myItem = myaccessor.get( itemId );
> Mapper<Item> mapper = mappingManager.mapper( Item.class );
>
> myItem.labels.add( newLabel );
> mapper.save( myItem );
>
> On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Thanks DuyHai, I will switch to using a set.
>
> But I'm still not sure how to resolve the original question.
>
> - Original labels = []
> - Request 1 arrives with label = 1, and request 2 arrives with label = 2
> - Updates are sent to c* with labels = [1] and labels = [2] simultaneously.
>
> What will happen in the above case? Will it cause the labels to end up as
> [1,2] (what I want) or either [1] or [2]?
>
> If I use consistency level = all, will that cause it to end up with [1,2]?
>
> On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Don't use list, use set instead. If you need ordering of insertion, use a
> map<timeuuid,text> where timeuuid is generated by the client to guarantee
> insertion order
>
> When setting a new value to a list, C* will do a read-delete-write
> internally, i.e. read the current list, remove all its values (by a range
> tombstone) and then write the new list. Please note that prepend & append
> operations on list do not require this read-delete-write and thus perform
> slightly better
>
> On Sat, Nov 12, 2016 at 11:34 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> I have a table where each record contains a list of labels.
>
> I have an endpoint which responds to new labels being added to a record by
> the user.
>
> Consider the following scenario:
>
> - Record X, labels = []
> - User selects 2 labels, clicks a button, and 2 http requests are
> generated.
> - The server receives request for Label 1 and Label 2 at the same time.
> - Both requests see the labels as empty, add 1 label to the collection,
> and send it.
> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>
> How will the above conflict be resolved? What can I do so I end up with
> [1, 2] instead of either [1] or [2] after both requests have been processed?
>
>
>


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
The labels collection is of the type set<frozen<label>>, where label is a
udt containing: id, name, description, all text fields.

On Sat, Nov 12, 2016 at 5:54 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> The problem isn't just the update / insert though, right? Don't frozen
> entities get overwritten completely? So if I had [1] [2] being written as
> updates, won't each update overwrite the set completely, so i'll end up
> with either one of them instead of [1,2]?
>
> On Sat, Nov 12, 2016 at 5:50 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Maybe you should use my Achilles mapper, which does generate UPDATE
>> statements on collections and not only INSERT
>> On 12 Nov 2016 at 13:08, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>
>>> I am using the Java Cassandra mapper for all of these cases, so my code
>>> looks like this:
>>>
>>> Item myItem = myaccessor.get( itemId );
>>> Mapper<Item> mapper = mappingManager.mapper( Item.class );
>>>
>>> myItem.labels.add( newLabel );
>>> mapper.save( myItem );
>>>
>>> On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> Thanks DuyHai, I will switch to using a set.
>>>>
>>>> But I'm still not sure how to resolve the original question.
>>>>
>>>> - Original labels = []
>>>> - Request 1 arrives with label = 1, and request 2 arrives with label = 2
>>>> - Updates are sent to c* with labels = [1] and labels = [2]
>>>> simultaneously.
>>>>
>>>> What will happen in the above case? Will it cause the labels to end up
>>>> as [1,2] (what I want) or either [1] or [2]?
>>>>
>>>> If I use consistency level = all, will that cause it to end up with
>>>> [1,2]?
>>>>
>>>> On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> Don't use list, use set instead. If you need ordering of insertion,
>>>>> use a map<timeuuid,text> where timeuuid is generated by the client to
>>>>> guarantee insertion order
>>>>>
>>>>> When setting a new value to a list, C* will do a read-delete-write
>>>>> internally, i.e. read the current list, remove all its values (by a range
>>>>> tombstone) and then write the new list. Please note that prepend & append
>>>>> operations on list do not require this read-delete-write and thus perform
>>>>> slightly better
>>>>>
>>>>> On Sat, Nov 12, 2016 at 11:34 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have a table where each record contains a list of labels.
>>>>>>
>>>>>> I have an endpoint which responds to new labels being added to a
>>>>>> record by the user.
>>>>>>
>>>>>> Consider the following scenario:
>>>>>>
>>>>>> - Record X, labels = []
>>>>>> - User selects 2 labels, clicks a button, and 2 http requests are
>>>>>> generated.
>>>>>> - The server receives request for Label 1 and Label 2 at the same
>>>>>> time.
>>>>>> - Both requests see the labels as empty, add 1 label to the
>>>>>> collection, and send it.
>>>>>> - Record state as label 1 request sees it: [1], as label 2 sees it:
>>>>>> [2]
>>>>>>
>>>>>> How will the above conflict be resolved? What can I do so I end up
>>>>>> with [1, 2] instead of either [1] or [2] after both requests have been
>>>>>> processed?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
The problem isn't just the update / insert though, right? Don't frozen
entities get overwritten completely? So if I had [1] [2] being written as
updates, won't each update overwrite the set completely, so i'll end up
with either one of them instead of [1,2]?

On Sat, Nov 12, 2016 at 5:50 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Maybe you should use my Achilles mapper, which does generate UPDATE
> statements on collections and not only INSERT
> On 12 Nov 2016 at 13:08, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
>> I am using the Java Cassandra mapper for all of these cases, so my code
>> looks like this:
>>
>> Item myItem = myaccessor.get( itemId );
>> Mapper<Item> mapper = mappingManager.mapper( Item.class );
>>
>> myItem.labels.add( newLabel );
>> mapper.save( myItem );
>>
>> On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Thanks DuyHai, I will switch to using a set.
>>>
>>> But I'm still not sure how to resolve the original question.
>>>
>>> - Original labels = []
>>> - Request 1 arrives with label = 1, and request 2 arrives with label = 2
>>> - Updates are sent to c* with labels = [1] and labels = [2]
>>> simultaneously.
>>>
>>> What will happen in the above case? Will it cause the labels to end up
>>> as [1,2] (what I want) or either [1] or [2]?
>>>
>>> If I use consistency level = all, will that cause it to end up with
>>> [1,2]?
>>>
>>> On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> Don't use list, use set instead. If you need ordering of insertion, use
>>>> a map<timeuuid,text> where timeuuid is generated by the client to guarantee
>>>> insertion order
>>>>
>>>> When setting a new value to a list, C* will do a read-delete-write
>>>> internally, i.e. read the current list, remove all its values (by a range
>>>> tombstone) and then write the new list. Please note that prepend & append
>>>> operations on list do not require this read-delete-write and thus perform
>>>> slightly better
>>>>
>>>> On Sat, Nov 12, 2016 at 11:34 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a table where each record contains a list of labels.
>>>>>
>>>>> I have an endpoint which responds to new labels being added to a
>>>>> record by the user.
>>>>>
>>>>> Consider the following scenario:
>>>>>
>>>>> - Record X, labels = []
>>>>> - User selects 2 labels, clicks a button, and 2 http requests are
>>>>> generated.
>>>>> - The server receives request for Label 1 and Label 2 at the same time.
>>>>> - Both requests see the labels as empty, add 1 label to the
>>>>> collection, and send it.
>>>>> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>>>>>
>>>>> How will the above conflict be resolved? What can I do so I end up
>>>>> with [1, 2] instead of either [1] or [2] after both requests have been
>>>>> processed?
>>>>>
>>>>
>>>>
>>>
>>


Deadlock in callbacks to async operations (Java)

2016-11-12 Thread Ali Akhtar
At https://datastax.github.io/java-driver/manual/async/ the docs say to not
do any blocking operations within the callback of an async operation. This
example is given as one that can cause a deadlock:

ListenableFuture<ResultSet> resultSet = Futures.transform(session,
    new Function<Session, ResultSet>() {
        public ResultSet apply(Session session) {
            // Synchronous operation in a callback.
            // DON'T DO THIS! It might deadlock.
            return session.execute("select release_version from system.local");
        }
    });

Will the above example work if instead of session.execute, it was doing
session.executeAsync()?
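
For comparison, a minimal sketch of the non-blocking variant, assuming
Guava's Futures.transformAsync (older Guava versions spell this as a
Futures.transform overload taking an AsyncFunction). executeAsync returns
a future immediately, so the callback never blocks a driver thread:

ListenableFuture<ResultSet> resultSet = Futures.transformAsync(session,
    new AsyncFunction<Session, ResultSet>() {
        public ListenableFuture<ResultSet> apply(Session session) {
            // ResultSetFuture implements ListenableFuture<ResultSet>,
            // so it can be returned directly without blocking.
            return session.executeAsync("select release_version from system.local");
        }
    });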


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
I am using the Java Cassandra mapper for all of these cases, so my code
looks like this:

Item myItem = myaccessor.get( itemId );
Mapper<Item> mapper = mappingManager.mapper( Item.class );

myItem.labels.add( newLabel );
mapper.save( myItem );

On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Thanks DuyHai, I will switch to using a set.
>
> But I'm still not sure how to resolve the original question.
>
> - Original labels = []
> - Request 1 arrives with label = 1, and request 2 arrives with label = 2
> - Updates are sent to c* with labels = [1] and labels = [2] simultaneously.
>
> What will happen in the above case? Will it cause the labels to end up as
> [1,2] (what I want) or either [1] or [2]?
>
> If I use consistency level = all, will that cause it to end up with [1,2]?
>
> On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Don't use list, use set instead. If you need ordering of insertion, use a
>> map<timeuuid,text> where timeuuid is generated by the client to guarantee
>> insertion order
>>
>> When setting a new value to a list, C* will do a read-delete-write
>> internally, i.e. read the current list, remove all its values (by a range
>> tombstone) and then write the new list. Please note that prepend & append
>> operations on list do not require this read-delete-write and thus perform
>> slightly better
>>
>> On Sat, Nov 12, 2016 at 11:34 AM, Ali Akhtar <ali.rac...@gmail.com>
>> wrote:
>>
>>> I have a table where each record contains a list of labels.
>>>
>>> I have an endpoint which responds to new labels being added to a record
>>> by the user.
>>>
>>> Consider the following scenario:
>>>
>>> - Record X, labels = []
>>> - User selects 2 labels, clicks a button, and 2 http requests are
>>> generated.
>>> - The server receives request for Label 1 and Label 2 at the same time.
>>> - Both requests see the labels as empty, add 1 label to the collection,
>>> and send it.
>>> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>>>
>>> How will the above conflict be resolved? What can I do so I end up with
>>> [1, 2] instead of either [1] or [2] after both requests have been processed?
>>>
>>
>>
>


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
Thanks DuyHai, I will switch to using a set.

But I'm still not sure how to resolve the original question.

- Original labels = []
- Request 1 arrives with label = 1, and request 2 arrives with label = 2
- Updates are sent to c* with labels = [1] and labels = [2] simultaneously.

What will happen in the above case? Will it cause the labels to end up as
[1,2] (what I want) or either [1] or [2]?

If I use consistency level = all, will that cause it to end up with [1,2]?

On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Don't use list, use set instead. If you need ordering of insertion, use a
> map<timeuuid,text> where timeuuid is generated by the client to guarantee
> insertion order
>
> When setting a new value to a list, C* will do a read-delete-write
> internally, i.e. read the current list, remove all its values (by a range
> tombstone) and then write the new list. Please note that prepend & append
> operations on list do not require this read-delete-write and thus perform
> slightly better
>
> On Sat, Nov 12, 2016 at 11:34 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I have a table where each record contains a list of labels.
>>
>> I have an endpoint which responds to new labels being added to a record
>> by the user.
>>
>> Consider the following scenario:
>>
>> - Record X, labels = []
>> - User selects 2 labels, clicks a button, and 2 http requests are
>> generated.
>> - The server receives request for Label 1 and Label 2 at the same time.
>> - Both requests see the labels as empty, add 1 label to the collection,
>> and send it.
>> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>>
>> How will the above conflict be resolved? What can I do so I end up with
>> [1, 2] instead of either [1] or [2] after both requests have been processed?
>>
>
>


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
If I used consistency = ALL both when getting the record, and when saving
the record, will that avoid the race condition?

On Sat, Nov 12, 2016 at 4:26 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> I'm responding to a 3rd party API, so I have no control over sending the
> labels together instead of one by one. In this case, the API will send them
> one by one.
>
> The list is actually a list<frozen<label>> and not a text (I used text
> for simplification, apologies).
>
> In that case, will updates still merge the list values instead of
> overwriting them?
>
> On Sat, Nov 12, 2016 at 4:15 PM, Oskar Kjellin <oskar.kjel...@gmail.com>
> wrote:
>
>> Could you not send both labels in one request? Race conditions should
>> still be handled as Vladimir suggests. But in this specific case the client
>> could send both as 1 request thus simplifying the solution.
>>
>> /Oskar
>>
>> On 12 nov. 2016, at 12:05, Vladimir Yudovin <vla...@winguzone.com> wrote:
>>
>> Hi Ali,
>>
>> >What can I do so I end up with [1, 2] instead of either [1] or [2] after
>> both requests have been processed?
>> Use UPDATE, not INSERT. Thus new labels will be added to list, without
>> overwriting old ones. Also consider usage of SET instead of LIST to avoid
>> duplicates.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
>> Launch your cluster in minutes.*
>>
>>
>>  On Sat, 12 Nov 2016 05:34:24 -0500*Ali Akhtar <ali.rac...@gmail.com
>> <ali.rac...@gmail.com>>* wrote 
>>
>> I have a table where each record contains a list of labels.
>>
>> I have an endpoint which responds to new labels being added to a record
>> by the user.
>>
>> Consider the following scenario:
>>
>> - Record X, labels = []
>> - User selects 2 labels, clicks a button, and 2 http requests are
>> generated.
>> - The server receives request for Label 1 and Label 2 at the same time.
>> - Both requests see the labels as empty, add 1 label to the collection,
>> and send it.
>> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>>
>> How will the above conflict be resolved? What can I do so I end up with
>> [1, 2] instead of either [1] or [2] after both requests have been processed?
>>
>>
>>
>


Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
I'm responding to a 3rd party API, so I have no control over sending the
labels together instead of one by one. In this case, the API will send them
one by one.

The list is actually a list<frozen<label>> and not a text (I used text
for simplification, apologies).

In that case, will updates still merge the list values instead of
overwriting them?

On Sat, Nov 12, 2016 at 4:15 PM, Oskar Kjellin <oskar.kjel...@gmail.com>
wrote:

> Could you not send both labels in one request? Race conditions should
> still be handled as Vladimir suggests. But in this specific case the client
> could send both as 1 request thus simplifying the solution.
>
> /Oskar
>
> On 12 nov. 2016, at 12:05, Vladimir Yudovin <vla...@winguzone.com> wrote:
>
> Hi Ali,
>
> >What can I do so I end up with [1, 2] instead of either [1] or [2] after
> both requests have been processed?
> Use UPDATE, not INSERT. Thus new labels will be added to list, without
> overwriting old ones. Also consider usage of SET instead of LIST to avoid
> duplicates.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
>  On Sat, 12 Nov 2016 05:34:24 -0500*Ali Akhtar <ali.rac...@gmail.com
> <ali.rac...@gmail.com>>* wrote 
>
> I have a table where each record contains a list of labels.
>
> I have an endpoint which responds to new labels being added to a record by
> the user.
>
> Consider the following scenario:
>
> - Record X, labels = []
> - User selects 2 labels, clicks a button, and 2 http requests are
> generated.
> - The server receives request for Label 1 and Label 2 at the same time.
> - Both requests see the labels as empty, add 1 label to the collection,
> and send it.
> - Record state as label 1 request sees it: [1], as label 2 sees it: [2]
>
> How will the above conflict be resolved? What can I do so I end up with
> [1, 2] instead of either [1] or [2] after both requests have been processed?
>
>
>


Consistency when adding data to collections concurrently?

2016-11-12 Thread Ali Akhtar
I have a table where each record contains a list of labels.

I have an endpoint which responds to new labels being added to a record by
the user.

Consider the following scenario:

- Record X, labels = []
- User selects 2 labels, clicks a button, and 2 http requests are generated.
- The server receives request for Label 1 and Label 2 at the same time.
- Both requests see the labels as empty, add 1 label to the collection, and
send it.
- Record state as label 1 request sees it: [1], as label 2 sees it: [2]

How will the above conflict be resolved? What can I do so I end up with [1,
2] instead of either [1] or [2] after both requests have been processed?
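
What makes the UPDATE-with-append approach suggested earlier in this
thread work is that collection appends are written as independent cells
and merged server-side, so two concurrent appends of different elements
both survive regardless of ordering. A minimal sketch, assuming a table
named records with a set<text> labels column:

UPDATE records SET labels = labels + {'1'} WHERE id = 'X';
UPDATE records SET labels = labels + {'2'} WHERE id = 'X';
-- Both appends persist; labels ends up as {'1', '2'}.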


Re: Having Counters in a Collection, like a map<int, counter>?

2016-11-09 Thread Ali Akhtar
The only issue with the last 2 solutions is, they require knowing the key
in advance in order to look up the counters.

The keys however are dynamic in my case.
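
With the layout DuyHai describes below, dynamic keys are not a problem:
map_key is a clustering column, so any key can be incremented without
declaring it first, and the whole simulated map can be read back in one
query. A minimal sketch against that table (? marks the partition key
bind value):

UPDATE my_counters_map SET count = count + 1
WHERE id = ? AND map_name = 'votes' AND map_key = 42;

SELECT map_key, count FROM my_counters_map
WHERE id = ? AND map_name = 'votes';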

On Wed, Nov 9, 2016 at 5:47 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> "Is there a way to do this in c* which doesn't require creating 1 table
> per type of map<int, counter> that i need?"
>
> You're lucky, it's possible with some tricks
>
>
> CREATE TABLE my_counters_map (
>  id uuid,
>  map_name text,
>  map_key int,
>  count counter,
>  PRIMARY KEY ((id), map_name, map_key)
> );
>
> This table can be seen as:
>
> Map <partition_key, SortedMap<map_name, SortedMap<map_key, counter>>>
>
> The couple (map_key, counter) simulates your map
>
> The clustering column map_name allows you to have multiple maps of
> counters for a single partition_key
>
>
>
> On Wed, Nov 9, 2016 at 1:32 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
>> Unfortunately it's impossible to use counters inside collections, or to
>> mix them with other non-counter columns:
>>
>> CREATE TABLE cnt (id int PRIMARY KEY , cntmap MAP<int,counter>);
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Counters are not allowed inside collections: map<int, counter>"
>>
>> CREATE TABLE cnt (id int PRIMARY KEY , cnt1 counter, txt text);
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Cannot mix counter and non counter columns in the same table"
>>
>>
>> >Is there a way to do this in c* which doesn't require creating 1 table
>> per type of map<int, counter> that i need?
>> But you don't need to create a separate table per counter, just use
>> one row per counter:
>>
>> CREATE TABLE cnt (id int PRIMARY KEY , value counter);
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
>> Launch your cluster in minutes.*
>>
>>
>>  On Wed, 09 Nov 2016 07:17:53 -0500*Ali Akhtar <ali.rac...@gmail.com
>> <ali.rac...@gmail.com>>* wrote 
>>
>> I have a use-case where I need to have a dynamic number of counters.
>>
>> The easiest way to do this would be to have a map<int, counter> where the
>> int is the key, and the counter is the value which is incremented /
>> decremented. E.g if something related to 5 happened, then i'd get the
>> counter for 5 and increment / decrement it.
>>
>> I also need to have multiple map<int, counter>s of this type, where each
>> int is a key referring to something different.
>>
>> Is there a way to do this in c* which doesn't require creating 1 table
>> per type of map<int, counter> that i need?
>>
>>
>>
>


Having Counters in a Collection, like a map<int, counter>?

2016-11-09 Thread Ali Akhtar
I have a use-case where I need to have a dynamic number of counters.

The easiest way to do this would be to have a map<int, counter> where the
int is the key, and the counter is the value which is incremented /
decremented. E.g if something related to 5 happened, then i'd get the
counter for 5 and increment / decrement it.

I also need to have multiple map<int, counter>s of this type, where each
int is a key referring to something different.

Is there a way to do this in c* which doesn't require creating 1 table per
type of map<int, counter> that i need?


Re: Improving performance where a lot of updates and deletes are required?

2016-11-08 Thread Ali Akhtar
Does TTL also cause tombstones?

On Tue, Nov 8, 2016 at 6:57 PM, Vladimir Yudovin <vla...@winguzone.com>
wrote:

> >The deletes will be done at a scheduled time, probably at the end of the
> day, each day.
>
> Probably you can use TTL?
> http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
>  On Tue, 08 Nov 2016 05:04:12 -0500*Ali Akhtar <ali.rac...@gmail.com
> <ali.rac...@gmail.com>>* wrote 
>
> I have a use case where a lot of updates and deletes to a table will be
> necessary.
>
> The deletes will be done at a scheduled time, probably at the end of the
> day, each day.
>
> Updates will be done throughout the day, as new data comes in.
>
> Are there any guidelines on improving cassandra's performance for this use
> case? Any caveats to be aware of? Any tips, like running nodetool repair
> every X days?
>
> Thanks.
>
>
>
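
For reference, Vladimir's TTL suggestion in CQL form (a sketch; the table
and column names are hypothetical). On the follow-up question above: yes,
an expired TTL'd cell is treated as a tombstone until it is purged after
gc_grace_seconds:

-- Write with a 7-day TTL; the cell expires on its own:
INSERT INTO events (id, payload) VALUES (uuid(), 'data') USING TTL 604800;

-- Or set a default TTL for every write to the table:
ALTER TABLE events WITH default_time_to_live = 604800;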


Re: Improving performance where a lot of updates and deletes are required?

2016-11-08 Thread Ali Akhtar
Yes, because there will also be a lot of inserts, and the linear
scalability that c* offers is required.

But the inserts aren't static, and the data that comes in will need to be
updated in response to user events.

Data which hasn't been touched for over a week has to be deleted.
(Sensitive data, so better to delete when its out of date rather than store
it).

Couldn't really do the weekly tables without massively complicating my
report generation, as the entire dataset needs to be queried for generating
certain reports.

So my question is really about how to get the best out of c* in this sort
of scenario.

On Tue, Nov 8, 2016 at 3:05 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Are you sure Cassandra is a good fit for this kind of heavy update &
> delete scenario ?
>
> Otherwise, you can always use several tables (one table/day, rotating
> through 7 days for a week) and do a truncate of the table at the end of the
> day.
>
> On Tue, Nov 8, 2016 at 11:04 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I have a use case where a lot of updates and deletes to a table will be
>> necessary.
>>
>> The deletes will be done at a scheduled time, probably at the end of the
>> day, each day.
>>
>> Updates will be done throughout the day, as new data comes in.
>>
>> Are there any guidelines on improving cassandra's performance for this
>> use case? Any caveats to be aware of? Any tips, like running nodetool
>> repair every X days?
>>
>> Thanks.
>>
>
>


Improving performance where a lot of updates and deletes are required?

2016-11-08 Thread Ali Akhtar
I have a use case where a lot of updates and deletes to a table will be
necessary.

The deletes will be done at a scheduled time, probably at the end of the
day, each day.

Updates will be done throughout the day, as new data comes in.

Are there any guidelines on improving cassandra's performance for this use
case? Any caveats to be aware of? Any tips, like running nodetool repair
every X days?

Thanks.


Re: Using a Set for UDTs, how is uniqueness established?

2016-11-07 Thread Ali Akhtar
Huh, so that means updates to the udt values won't be possible?

Sticking to a map<string, udt> then.

On Mon, Nov 7, 2016 at 5:31 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> So, to compare UDT values, Cassandra will compare them field by field. So
> that udt1.equals(udt2) results in:
>
>   udt1.field1.equals(udt2.field1)
> && udt1.field2.equals(udt2.field2)
> ...
> && udt1.fieldN.equals(udt2.fieldN)
>
> Your idea of using field "id" to distinguish between UDT value is good
> e.g. if the "id" value mismatches then the 2 UDT are different. However, if
> the "id" values do match, it does not guarantee that the UDT values match
> since it requires that all other fields match.
>
>
>
> On Mon, Nov 7, 2016 at 1:14 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I have a UDT which contains a text 'id' field, which should be used to
>> establish the uniqueness of the UDT.
>>
>> I'd like to have a set<udt> field in a table, and I'd like to use the
>> id of the udts to establish uniqueness.
>>
>> Any ideas how this can be done? Also using Java, and c* 3.7
>>
>
>


Using a Set for UDTs, how is uniqueness established?

2016-11-07 Thread Ali Akhtar
I have a UDT which contains a text 'id' field, which should be used to
establish the uniqueness of the UDT.

I'd like to have a set<udt> field in a table, and I'd like to use the id
of the udts to establish uniqueness.

Any ideas how this can be done? Also using Java, and c* 3.7
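
The map<string, udt> resolution from the reply above, as a CQL sketch:
keying the map by the UDT's id means re-adding the same id overwrites the
value instead of duplicating it (type and table names are hypothetical):

CREATE TYPE label (id text, name text, description text);

CREATE TABLE items (
  id text PRIMARY KEY,
  labels map<text, frozen<label>>
);

-- Re-adding the same id replaces the previous value:
UPDATE items SET labels['foo'] = {id: 'foo', name: 'Foo', description: 'x'}
WHERE id = '1';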


Re: Are Cassandra writes are faster than reads?

2016-11-06 Thread Ali Akhtar
tl;dr? I just want to know if updates are bad for performance, and if so,
for how long.

On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <b...@instaclustr.com> wrote:

> Check out https://wiki.apache.org/cassandra/WritePathForUsers for the
> full gory details.
>
> On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> How long does it take for updates to get merged / compacted into the main
>> data file?
>>
>> On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <b...@instaclustr.com> wrote:
>>
>> To add some flavor as to how the commitlog implementation is so quick.
>>
>> It only flushes to disk every 10s by default. So writes are effectively
>> done to memory and then to disk asynchronously later on. This is generally
>> accepted to be OK, as the write is also going to other nodes.
>>
>> You can of course change this behavior to flush on each write or to skip
>> the commitlog altogether (danger!). This however will change how "safe"
>> things are from a durability perspective.
>>
>> On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>
>> Cassandra writes are particularly fast, for a few reasons:
>>
>>
>>
>> 1)   Most writes go to a commitlog (append-only file, written
>> linearly, so particularly fast in terms of disk operations) and then pushed
>> to the memTable. Memtable is flushed in batches to the permanent data
>> files, so it buffers many mutations and then does a sequential write to
>> persist that data to disk.
>>
>> 2)   Reads may have to merge data from many SSTables on disk.
>> Because the writes (described very briefly in step 1) write to immutable
>> files, updates/deletes have to be merged on read – this is extra effort for
>> the read path.
>>
>>
>>
>> If you don’t do much in terms of overwrites/deletes, and your partitions
>> are particularly small, and your data fits in RAM (probably mmap/page cache
>> of data files, unless you’re using the row cache), reads may be very fast
>> for you. Certainly individual reads on low-merge workloads can be < 0.1ms.
>>
>>
>>
>> -  Jeff
>>
>>
>>
>> *From: *Vikas Jaiman <er.vikasjai...@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Sunday, November 6, 2016 at 12:42 PM
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *Are Cassandra writes are faster than reads?
>>
>>
>>
>> Hi all,
>>
>>
>>
>> Are Cassandra writes faster than reads? If yes, why is this so? I am
>> using consistency 1 and the data is in memory.
>>
>>
>>
>> Vikas
>>
>> --
>> Ben Bromhead
>> CTO | Instaclustr <https://www.instaclustr.com/>
>> +1 650 284 9692
>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>
>>
>> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>
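
The commitlog behavior Ben describes maps to these cassandra.yaml
settings (a sketch showing the 3.x defaults):

# Flush the commitlog to disk periodically rather than on every write:
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Skipping the commitlog entirely (danger!) is a keyspace-level option:
# ALTER KEYSPACE my_ks WITH durable_writes = false;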


Re: Are Cassandra writes are faster than reads?

2016-11-06 Thread Ali Akhtar
How long does it take for updates to get merged / compacted into the main
data file?

On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead  wrote:

> To add some flavor as to how the commitlog implementation is so quick.
>
> It only flushes to disk every 10s by default. So writes are effectively
> done to memory and then to disk asynchronously later on. This is generally
> accepted to be OK, as the write is also going to other nodes.
>
> You can of course change this behavior to flush on each write or to skip
> the commitlog altogether (danger!). This however will change how "safe"
> things are from a durability perspective.
>
> On Sun, Nov 6, 2016, 12:51 Jeff Jirsa  wrote:
>
>> Cassandra writes are particularly fast, for a few reasons:
>>
>>
>>
>> 1)   Most writes go to a commitlog (append-only file, written
>> linearly, so particularly fast in terms of disk operations) and then pushed
>> to the memTable. Memtable is flushed in batches to the permanent data
>> files, so it buffers many mutations and then does a sequential write to
>> persist that data to disk.
>>
>> 2)   Reads may have to merge data from many SSTables on disk.
>> Because the writes (described very briefly in step 1) write to immutable
>> files, updates/deletes have to be merged on read – this is extra effort for
>> the read path.
>>
>>
>>
>> If you don’t do much in terms of overwrites/deletes, and your partitions
>> are particularly small, and your data fits in RAM (probably mmap/page cache
>> of data files, unless you’re using the row cache), reads may be very fast
>> for you. Certainly individual reads on low-merge workloads can be < 0.1ms.
>>
>>
>>
>> -  Jeff
>>
>>
>>
>> *From: *Vikas Jaiman 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Sunday, November 6, 2016 at 12:42 PM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *Are Cassandra writes are faster than reads?
>>
>>
>>
>> Hi all,
>>
>>
>>
>> Are Cassandra writes faster than reads? If yes, why is this so? I am
>> using consistency 1 and the data is in memory.
>>
>>
>>
>> Vikas
>>
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Re: Using Instants for timestamps in Java mappings?

2016-11-04 Thread Ali Akhtar
This should be added to core

On Fri, Nov 4, 2016 at 2:37 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> You can:
>
> 1) Define your own codec:
> http://datastax.github.io/java-driver/manual/custom_codecs/
> 2) Use the InstantCodec from the Java driver extras module:
> https://github.com/datastax/java-driver/blob/3.x/driver-extras/src/main/java/com/datastax/driver/extras/codecs/jdk8/InstantCodec.java
>
> On Fri, Nov 4, 2016 at 8:39 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Is it possible to use Instants to represent timestamp columns in java
>> mappings of cassandra tables? (Using the official java driver)
>>
>
>


Using Instants for timestamps in Java mappings?

2016-11-04 Thread Ali Akhtar
Is it possible to use Instants to represent timestamp columns in java
mappings of cassandra tables? (Using the official java driver)
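
DuyHai's suggestion above in code: a minimal sketch of registering the
driver-extras InstantCodec so timestamp columns map to java.time.Instant
(assumes the driver-extras artifact is on the classpath):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.CodecRegistry;
import com.datastax.driver.extras.codecs.jdk8.InstantCodec;

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withCodecRegistry(new CodecRegistry().register(InstantCodec.instance))
    .build();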


Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Ali Akhtar
> I agree it makes code messier

That's not a trivial point.

It isn't just about doing two queries to read. It's also having to do two
queries to write and remembering to do that in all the places.

On Wed, Nov 2, 2016 at 1:56 AM, Cody Yancey <yan...@uber.com> wrote:

> I agree it makes code messier, but you aren't really losing anything by
> separating them out into separate tables and then doing parallel queries.
>
> Counter tables already don't support atomic batch operations (all batch
> operations are unlogged), CAS operations (LWTs not supported) and have a
> whole host of other gotchas that don't apply as much to Cassandra but have
> more to do with the mathematical underpinnings of non-idempotent operations
> in a world where the Two Generals problem is unsolved.
>
> If Cassandra WAS to allow mixing of storage paradigms into one "logical"
> table it would probably be just two separate tables under the hood anyway
> since the write path is so different.
>
> This isn't Stockholm Syndrome for Cassandra as much as it is Stockholm
> Syndrome for databases. I've never used a database that could count very
> well, even non-distributed databases like postgres or mysql. Cassandra's
> implementation is at least fast and scalable.
>
> Thanks,
> Cody
>
> On Tue, Nov 1, 2016 at 2:13 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> Here is a solution that I have leveraged. Ignore the count of the value
>> and use a multi-part column name as its value.
>>
>> For example:
>>
>> create column family stuff (
>> rowkey string,
>> column string,
>> value string,
>> counter_to_ignore long,
>> primary key( rowkey, column, value));
>>
>>
>>
>> On Tue, Nov 1, 2016 at 9:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> That's a terrible gotcha rule.
>>>
>>> On Tue, Nov 1, 2016 at 6:27 PM, Cody Yancey <yan...@uber.com> wrote:
>>>
>>>> In your table schema, you have KEYS and you have VALUES. Your KEYS are
>>>> text, but they could be any non-counter type or compound thereof. KEYS
>>>> obviously cannot ever be counters.
>>>>
>>>> Your VALUES, however, must be either all counters or all non-counters.
>>>> The official example you posted conforms to this limitation.
>>>>
>>>> Thanks,
>>>> Cody
>>>>
>>>> On Nov 1, 2016 7:16 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>
>>>>> I'm not referring to the primary key, just to other columns.
>>>>>
>>>>> My primary key is a text, and my table contains a mix of texts, ints,
>>>>> and timestamps.
>>>>>
>>>>> If I try to change one of the ints to a counter and run the create
>>>>> table query, I get the error ' Cannot mix counter and non counter
>>>>> columns in the same table'
>>>>>
>>>>>
>>>>> On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey <yan...@uber.com> wrote:
>>>>>
>>>>>> For counter tables, non-counter types are of course allowed in the
>>>>>> primary key. Counters would be meaningless otherwise.
>>>>>>
>>>>>> Thanks,
>>>>>> Cody
>>>>>>
>>>>>> On Nov 1, 2016 7:00 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>>>
>>>>>>> In the documentation for counters:
>>>>>>>
>>>>>>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_count
>>>>>>> er_t.html
>>>>>>>
>>>>>>> The example table is created via:
>>>>>>>
>>>>>>> CREATE TABLE counterks.page_view_counts
>>>>>>>   (counter_value counter,
>>>>>>>   url_name varchar,
>>>>>>>   page_name varchar,
>>>>>>>   PRIMARY KEY (url_name, page_name)
>>>>>>> );
>>>>>>>
>>>>>>> Yet if I try to create a table with a mixture of texts, ints,
>>>>>>> timestamps, and counters, i get the error ' Cannot mix counter and non
>>>>>>> counter columns in the same table'
>>>>>>>
>>>>>>> Is that supposed to be allowed or not allowed, given that the
>>>>>>> official example contains a mix of counters and non-counters?
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>


Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Ali Akhtar
^ Stockholm syndrome :)

On Tue, Nov 1, 2016 at 10:54 PM, Robert Wille <rwi...@fold3.com> wrote:

> I used to think it was terrible as well. But it really isn’t. Just put
> your non-counter columns in a separate table with the same primary key. If
> you want to query both counter and non-counter columns at the same time,
> just query both tables at the same time with asynchronous queries.
>
> On Nov 1, 2016, at 7:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> That's a terrible gotcha rule.
>
> On Tue, Nov 1, 2016 at 6:27 PM, Cody Yancey <yan...@uber.com> wrote:
>
>> In your table schema, you have KEYS and you have VALUES. Your KEYS are
>> text, but they could be any non-counter type or compound thereof. KEYS
>> obviously cannot ever be counters.
>>
>> Your VALUES, however, must be either all counters or all non-counters.
>> The official example you posted conforms to this limitation.
>>
>> Thanks,
>> Cody
>>
>> On Nov 1, 2016 7:16 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>
>>> I'm not referring to the primary key, just to other columns.
>>>
>>> My primary key is a text, and my table contains a mix of texts, ints,
>>> and timestamps.
>>>
>>> If I try to change one of the ints to a counter and run the create table
>>> query, I get the error ' Cannot mix counter and non counter columns in
>>> the same table'
>>>
>>>
>>> On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey <yan...@uber.com> wrote:
>>>
>>>> For counter tables, non-counter types are of course allowed in the
>>>> primary key. Counters would be meaningless otherwise.
>>>>
>>>> Thanks,
>>>> Cody
>>>>
>>>> On Nov 1, 2016 7:00 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>
>>>>> In the documentation for counters:
>>>>>
>>>>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
>>>>>
>>>>> The example table is created via:
>>>>>
>>>>> CREATE TABLE counterks.page_view_counts
>>>>>   (counter_value counter,
>>>>>   url_name varchar,
>>>>>   page_name varchar,
>>>>>   PRIMARY KEY (url_name, page_name)
>>>>> );
>>>>>
>>>>> Yet if I try to create a table with a mixture of texts, ints,
>>>>> timestamps, and counters, i get the error ' Cannot mix counter and non
>>>>> counter columns in the same table'
>>>>>
>>>>> Is that supposed to be allowed or not allowed, given that the official
>>>>> example contains a mix of counters and non-counters?
>>>>>
>>>>
>>>
>
>
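
Robert's two-table split, sketched in CQL: same primary key in both
tables, counters in one, everything else in the other, queried in
parallel and merged client-side (table names are hypothetical):

CREATE TABLE page_meta (
  url_name varchar,
  page_name varchar,
  title text,
  PRIMARY KEY (url_name, page_name)
);

CREATE TABLE page_counts (
  url_name varchar,
  page_name varchar,
  counter_value counter,
  PRIMARY KEY (url_name, page_name)
);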


Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Ali Akhtar
That's a terrible gotcha rule.

On Tue, Nov 1, 2016 at 6:27 PM, Cody Yancey <yan...@uber.com> wrote:

> In your table schema, you have KEYS and you have VALUES. Your KEYS are
> text, but they could be any non-counter type or compound thereof. KEYS
> obviously cannot ever be counters.
>
> Your VALUES, however, must be either all counters or all non-counters. The
> official example you posted conforms to this limitation.
>
> Thanks,
> Cody
>
> On Nov 1, 2016 7:16 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
>> I'm not referring to the primary key, just to other columns.
>>
>> My primary key is a text, and my table contains a mix of texts, ints, and
>> timestamps.
>>
>> If I try to change one of the ints to a counter and run the create table
>> query, I get the error ' Cannot mix counter and non counter columns in
>> the same table'
>>
>>
>> On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey <yan...@uber.com> wrote:
>>
>>> For counter tables, non-counter types are of course allowed in the
>>> primary key. Counters would be meaningless otherwise.
>>>
>>> Thanks,
>>> Cody
>>>
>>> On Nov 1, 2016 7:00 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>
>>>> In the documentation for counters:
>>>>
>>>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
>>>>
>>>> The example table is created via:
>>>>
>>>> CREATE TABLE counterks.page_view_counts
>>>>   (counter_value counter,
>>>>   url_name varchar,
>>>>   page_name varchar,
>>>>   PRIMARY KEY (url_name, page_name)
>>>> );
>>>>
>>>> Yet if I try to create a table with a mixture of texts, ints,
>>>> timestamps, and counters, i get the error ' Cannot mix counter and non
>>>> counter columns in the same table'
>>>>
>>>> Is that supposed to be allowed or not allowed, given that the official
>>>> example contains a mix of counters and non-counters?
>>>>
>>>
>>


Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Ali Akhtar
I'm not referring to the primary key, just to other columns.

My primary key is a text, and my table contains a mix of texts, ints, and
timestamps.

If I try to change one of the ints to a counter and run the create table
query, I get the error ' Cannot mix counter and non counter columns in the
same table'


On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey <yan...@uber.com> wrote:

> For counter tables, non-counter types are of course allowed in the primary
> key. Counters would be meaningless otherwise.
>
> Thanks,
> Cody
>
> On Nov 1, 2016 7:00 AM, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
>> In the documentation for counters:
>>
>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
>>
>> The example table is created via:
>>
>> CREATE TABLE counterks.page_view_counts
>>   (counter_value counter,
>>   url_name varchar,
>>   page_name varchar,
>>   PRIMARY KEY (url_name, page_name)
>> );
>>
>> Yet if I try to create a table with a mixture of texts, ints, timestamps,
>> and counters, i get the error ' Cannot mix counter and non counter columns
>> in the same table'
>>
>> Is that supposed to be allowed or not allowed, given that the official
>> example contains a mix of counters and non-counters?
>>
>


Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Ali Akhtar
In the documentation for counters:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html

The example table is created via:

CREATE TABLE counterks.page_view_counts
  (counter_value counter,
  url_name varchar,
  page_name varchar,
  PRIMARY KEY (url_name, page_name)
);

Yet if I try to create a table with a mixture of texts, ints, timestamps,
and counters, i get the error ' Cannot mix counter and non counter columns
in the same table'

Is that supposed to be allowed or not allowed, given that the official
example contains a mix of counters and non-counters?


Specifying multiple conditions for lightweight conditions?

2016-11-01 Thread Ali Akhtar
In the following query:

UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200'
WHERE id = '1'
IF last_due_at < '2013-01-01 00:00:00+0200';

The intent is to change the value of 'last_due_at' as long as 'last_due_at'
isn't already set to a later date than the one I've supplied.

The problem is, last_due_at starts off with an initial value of null, so
the above query fails.

If I try the following:


UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200'
WHERE id = '1'
IF last_due_at < '2013-01-01 00:00:00+0200' OR last_due_at = null;

That fails due to a syntax error.

Is there any other way to achieve this?
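
CQL conditions don't support OR, so one workaround (a sketch) is to make
sure the column is never null by writing a sentinel timestamp when the
row is created; the single < condition then always applies:

INSERT INTO project (id, last_due_at)
VALUES ('1', '1970-01-01 00:00:00+0000')
IF NOT EXISTS;

UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200'
WHERE id = '1'
IF last_due_at < '2013-01-01 00:00:00+0200';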


Cannot restrict clustering columns by IN relations when a collection is selected by the query

2016-10-27 Thread Ali Akhtar
I have the following table schema:

*CREATE TABLE ticket_by_member (*
* project_id text,*
* member_id text,*
* ticket_id text,*
* ticket ticket,*
*assigned_members list,*
* votes list,*
*labels list,*
* PRIMARY KEY ( project_id, member_id, ticket_id )*
*);*

I have a scenario where I need to show all tickets for a particular
project, by a group of member ids.

I think it would be more efficient to do this as an IN query of the type:
*project_id = x AND member_id IN (...)*, instead of doing multiple queries
of: *project_id = x AND member_id = y*

I tried to setup an accessor for this, as the following:

*   @Query("SELECT * FROM ticket_by_member WHERE project_id = ? AND
member_id IN(?)" )*

*Result cardsByMembers(String projectId,
List<String> memberIds);*

But when I call this method, I get the exception:

 java.util.concurrent.ExecutionException:
com.datastax.driver.core.exceptions.InvalidQueryException: Cannot restrict
clustering columns by IN relations when a collection is selected by the
query

Any ideas on why this isn't working?
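
Since IN on a clustering column is rejected whenever the query selects a
collection, the usual workaround is one async query per member, gathered
client-side. A minimal sketch against the driver's Session (assumes the
usual driver and java.util imports; error handling omitted):

List<ResultSetFuture> futures = new ArrayList<>();
for (String memberId : memberIds) {
    futures.add(session.executeAsync(
        "SELECT * FROM ticket_by_member WHERE project_id = ? AND member_id = ?",
        projectId, memberId));
}
for (ResultSetFuture f : futures) {
    ResultSet rs = f.getUninterruptibly(); // merge rows client-side
}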


Re: which one of the following choices is more efficient?

2016-10-26 Thread Ali Akhtar
You would need to do each write twice, and the data will take up twice the
space as it's duplicated in two places.

On Wed, Oct 26, 2016 at 1:17 PM, Kant Kodali <k...@peernova.com> wrote:

> I guess the question can be rephrased into "What is the overhead of
> creating and maintaining an additional table?"
>
> On Wed, Oct 26, 2016 at 1:12 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Depends on the use case. No one right answer.
>>
>> On Wed, Oct 26, 2016 at 1:03 PM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> If one were given a choice of fitting all the data into one table vs
>>> fitting the data into two tables while say (keeping all the runtime and
>>> space complexity for CRUD operations the same in either case)  which one
>>> would you choose and why?
>>>
>>>
>>
>


Re: which one of the following choices is more efficient?

2016-10-26 Thread Ali Akhtar
Depends on the use case. No one right answer.

On Wed, Oct 26, 2016 at 1:03 PM, Kant Kodali  wrote:

> If one were given a choice of fitting all the data into one table vs
> fitting the data into two tables while say (keeping all the runtime and
> space complexity for CRUD operations the same in either case)  which one
> would you choose and why?
>
>


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
I want some of the newer UDT features, like not needing to have frozen UDTs

On Tue, Oct 25, 2016 at 6:34 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> 3.0.x? Isn't 3.7 stable?
>
> On Tue, Oct 25, 2016 at 6:32 AM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> If you're not in prod *yet*, I once again recommend not using 3.9 for
>> anything serious.  Use the latest 3.0.x.
>>
>> On Mon, Oct 24, 2016 at 6:19 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Stefania,
>>>
>>> As this is just on my dev laptop, what I'm really looking for is a quick
>>> 1-2 min fix to this solution that will act as a workaround.
>>>
>>> If I drop the keyspace and recreate it, will that fix this problem? Or
>>> do I need to uninstall 3.9 and go back to 3.7?
>>>
>>> I probably have made a couple of changes to the schema of the tables
>>> after I first created them.
>>>
>>> Happy to share the schema with you privately if it will lead to a 1-2
>>> min fix to this problem for me.
>>>
>>> On Tue, Oct 25, 2016 at 5:59 AM, Stefania Alborghetti <
>>> stefania.alborghe...@datastax.com> wrote:
>>>
>>> Did the schema change? This would be 12397.
>>>
>>> If not, and if you don't mind sharing the data, or you have the steps to
>>> reproduce it, could you please open a ticket so it can be looked at? You
>>> need to attach the schema as well.
>>>
>>> On Mon, Oct 24, 2016 at 9:33 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> Its 'text'.  Don't know the answer of the 2nd question.
>>>
>>> On Mon, Oct 24, 2016 at 6:31 PM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>> What type is board id? Is the value a tombstone?
>>>
>>> On Mon, Oct 24, 2016 at 1:38 AM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>> Thanks, but I did come across those, it doesn't look like they provide a
>>> resolution.
>>>
>>> On Mon, Oct 24, 2016 at 1:36 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> You may read those:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-12121
>>> https://issues.apache.org/jira/browse/CASSANDRA-12397
>>>
>>> On Mon, Oct 24, 2016 at 10:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> Any workarounds that don't involve me having to figure out how to
>>> uninstall and re-install a different version?
>>>
>>> On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> 3.9..
>>>
>>> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> Which version of C* ? There was similar issues with commitlogs in
>>> tic-toc versions.
>>>
>>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> I have a single node cassandra installation on my dev laptop, which is
>>> used just for dev / testing.
>>>
>>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>>> run it via 'sudo service cassandra start'.
>>>
>>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>>
>>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>>> Exiting due to error while processing commit log during initialization.*
>>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>>> Unexpected error deserializing mutation; saved to
>>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>>> mutation against a table with the same name but incompatible schema.
>>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>>> enough bytes to read 0th field board_id*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>>> [apache-cassandra-3.9.ja

Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
3.0.x? Isn't 3.7 stable?

On Tue, Oct 25, 2016 at 6:32 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> If you're not in prod *yet*, I once again recommend not using 3.9 for
> anything serious.  Use the latest 3.0.x.
>
> On Mon, Oct 24, 2016 at 6:19 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Stefania,
>>
>> As this is just on my dev laptop, what I'm really looking for is a quick
>> 1-2 min fix to this solution that will act as a workaround.
>>
>> If I drop the keyspace and recreate it, will that fix this problem? Or do
>> I need to uninstall 3.9 and go back to 3.7?
>>
>> I probably have made a couple of changes to the schema of the tables
>> after I first created them.
>>
>> Happy to share the schema with you privately if it will lead to a 1-2 min
>> fix to this problem for me.
>>
>> On Tue, Oct 25, 2016 at 5:59 AM, Stefania Alborghetti <
>> stefania.alborghe...@datastax.com> wrote:
>>
>> Did the schema change? This would be 12397.
>>
>> If not, and if you don't mind sharing the data, or you have the steps to
>> reproduce it, could you please open a ticket so it can be looked at? You
>> need to attach the schema as well.
>>
>> On Mon, Oct 24, 2016 at 9:33 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Its 'text'.  Don't know the answer of the 2nd question.
>>
>> On Mon, Oct 24, 2016 at 6:31 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>> What type is board id? Is the value a tombstone?
>>
>> On Mon, Oct 24, 2016 at 1:38 AM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Thanks, but I did come across those, it doesn't look like they provide a
>> resolution.
>>
>> On Mon, Oct 24, 2016 at 1:36 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> You may read those:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-12121
>> https://issues.apache.org/jira/browse/CASSANDRA-12397
>>
>> On Mon, Oct 24, 2016 at 10:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>> wrote:
>>
>> Any workarounds that don't involve me having to figure out how to
>> uninstall and re-install a different version?
>>
>> On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> 3.9..
>>
>> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Which version of C* ? There was similar issues with commitlogs in tic-toc
>> versions.
>>
>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> I have a single node cassandra installation on my dev laptop, which is
>> used just for dev / testing.
>>
>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>> run it via 'sudo service cassandra start'.
>>
>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>
>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>> Exiting due to error while processing commit log during initialization.*
>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>> Unexpected error deserializing mutation; saved to
>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>> mutation against a table with the same name but incompatible schema.
>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>> enough bytes to read 0th field board_id*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
>> [apache-cassandra-3.9.jar:3.9]

Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
Stefania,

As this is just on my dev laptop, what I'm really looking for is a quick
1-2 min fix that will act as a workaround.

If I drop the keyspace and recreate it, will that fix this problem? Or do I
need to uninstall 3.9 and go back to 3.7?

I probably have made a couple of changes to the schema of the tables after
I first created them.

Happy to share the schema with you privately if it will lead to a 1-2 min
fix to this problem for me.

On Tue, Oct 25, 2016 at 5:59 AM, Stefania Alborghetti <
stefania.alborghe...@datastax.com> wrote:

> Did the schema change? This would be 12397.
>
> If not, and if you don't mind sharing the data, or you have the steps to
> reproduce it, could you please open a ticket so it can be looked at? You
> need to attach the schema as well.
>
> On Mon, Oct 24, 2016 at 9:33 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> It's 'text'.  Don't know the answer to the 2nd question.
>>
>> On Mon, Oct 24, 2016 at 6:31 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> What type is board id? Is the value a tombstone?
>>>
>>> On Mon, Oct 24, 2016 at 1:38 AM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>>> Thanks, but I did come across those, it doesn't look like they provide
>>>> a resolution.
>>>>
>>>> On Mon, Oct 24, 2016 at 1:36 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>> You may read those:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-12121
>>>> https://issues.apache.org/jira/browse/CASSANDRA-12397
>>>>
>>>> On Mon, Oct 24, 2016 at 10:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>> Any workarounds that don't involve me having to figure out how to
>>>> uninstall and re-install a different version?
>>>>
>>>> On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>> 3.9..
>>>>
>>>> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>> Which version of C* ? There were similar issues with commitlogs in
>>>> tick-tock versions.
>>>>
>>>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>> I have a single node cassandra installation on my dev laptop, which is
>>>> used just for dev / testing.
>>>>
>>>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>>>> run it via 'sudo service cassandra start'.
>>>>
>>>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>>>
>>>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>>>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>>>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>>>> Exiting due to error while processing commit log during initialization.*
>>>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>>>> Unexpected error deserializing mutation; saved to
>>>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>>>> mutation against a table with the same name but incompatible schema.
>>>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>>>> enough bytes to read 0th field board_id*
>>>> * at
>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>>>> [apache-cassandra-3.9.jar:3.9]*
>>>> * at
>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>>>> [apache-cassandra-3.9.jar:3.9]*
>>>> * at
>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>>>> [apache-cassandra-3.9.jar:3.9]*
>>>> * at
>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>>>> [apache-cassandra-3.9.jar:3.9]*
>>>> * at
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>>>> [apache-cassandra-3.9.jar:3.9]*
>>>> * at
>>>> org.apache.cassandra.db.commitlog.Comm

Doing an upsert into a collection?

2016-10-24 Thread Ali Akhtar
Say I have this UDT:

*CREATE TYPE rating (*
* user text,*
* rating int*
*);*

And, I have this table:

*CREATE TABLE movie (*
* id text,*
* name text,*
* ratings list<frozen<rating>>,*
* PRIMARY KEY ( id )*
*);*

Say a user 'bob' rated a movie as a 5. Is it possible to do something like
this:

*UPDATE movie set ratings.rating = 5 WHERE ratings.user = 'bob'*

And have that query either update bob's previous rating if he had already
rated, or have it insert a new Rating into the ratings w/ user = bob,
rating = 5?

If not, can this be achieved with a map instead of a list?

Thanks.
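
The field-level UPDATE sketched above isn't valid CQL, but a map keyed by the
user gives exactly that upsert behaviour. A sketch (names illustrative; note
that an UPDATE must still name the row's primary key in its WHERE clause):

    CREATE TABLE movie (
        id text PRIMARY KEY,
        name text,
        ratings map<text, int>   -- user -> rating
    );

    -- Inserts bob's rating if absent, overwrites it if he already rated:
    UPDATE movie SET ratings['bob'] = 5 WHERE id = 'movie-1';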


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
It's 'text'.  Don't know the answer to the 2nd question.

On Mon, Oct 24, 2016 at 6:31 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> What type is board id? Is the value a tombstone?
>
> On Mon, Oct 24, 2016 at 1:38 AM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Thanks, but I did come across those, it doesn't look like they provide a
>> resolution.
>>
>> On Mon, Oct 24, 2016 at 1:36 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> You may read those:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-12121
>> https://issues.apache.org/jira/browse/CASSANDRA-12397
>>
>> On Mon, Oct 24, 2016 at 10:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>> wrote:
>>
>> Any workarounds that don't involve me having to figure out how to
>> uninstall and re-install a different version?
>>
>> On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> 3.9..
>>
>> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Which version of C* ? There were similar issues with commitlogs in
>> tick-tock versions.
>>
>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> I have a single node cassandra installation on my dev laptop, which is
>> used just for dev / testing.
>>
>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>> run it via 'sudo service cassandra start'.
>>
>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>
>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>> Exiting due to error while processing commit log during initialization.*
>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>> Unexpected error deserializing mutation; saved to
>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>> mutation against a table with the same name but incompatible schema.
>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>> enough bytes to read 0th field board_id*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
>> [apache-cassandra-3.9.jar:3.9]*
>>
>>
>> I then have to do 'sudo rm -rf /var/lib/cassandra/commitlog/*' which
>> fixes the problem, but then I lose all of my data.
>>
>> It looks like it's saying there wasn't enough data to read the field
>> 'board_id', any ideas why that would be?
>>
>>
>>
>>
>>
>>
>>


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
Thanks, but I did come across those, it doesn't look like they provide a
resolution.

On Mon, Oct 24, 2016 at 1:36 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> You may read those:
>
> https://issues.apache.org/jira/browse/CASSANDRA-12121
> https://issues.apache.org/jira/browse/CASSANDRA-12397
>
> On Mon, Oct 24, 2016 at 10:24 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Any workarounds that don't involve me having to figure out how to
>> uninstall and re-install a different version?
>>
>> On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> 3.9..
>>>
>>> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> Which version of C* ? There were similar issues with commitlogs in
>>>> tick-tock versions.
>>>>
>>>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a single node cassandra installation on my dev laptop, which is
>>>>> used just for dev / testing.
>>>>>
>>>>> Recently, whenever I restart my laptop, Cassandra fails to start when
>>>>> I run it via 'sudo service cassandra start'.
>>>>>
>>>>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>>>>
>>>>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>>>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>>>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>>>>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>>>>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>>>>> Exiting due to error while processing commit log during initialization.*
>>>>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>>>>> Unexpected error deserializing mutation; saved to
>>>>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>>>>> mutation against a table with the same name but incompatible schema.
>>>>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>>>>> enough bytes to read 0th field board_id*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>> * at
>>>>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
>>>>> [apache-cassandra-3.9.jar:3.9]*
>>>>>
>>>>>
>>>>> I then have to do 'sudo rm -rf /var/lib/cassandra/commitlog/*' which
>>>>> fixes the problem, but then I lose all of my data.
>>>>>
>>>>> It looks like it's saying there wasn't enough data to read the field
>>>>> 'board_id', any ideas why that would be?
>>>>>
>>>>
>>>>
>>>
>>
>


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
Any workarounds that don't involve me having to figure out how to uninstall
and re-install a different version?

On Mon, Oct 24, 2016 at 1:24 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> 3.9..
>
> On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Which version of C* ? There were similar issues with commitlogs in
>> tick-tock versions.
>>
>> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> I have a single node cassandra installation on my dev laptop, which is
>>> used just for dev / testing.
>>>
>>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>>> run it via 'sudo service cassandra start'.
>>>
>>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>>
>>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>>> Exiting due to error while processing commit log during initialization.*
>>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>>> Unexpected error deserializing mutation; saved to
>>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>>> mutation against a table with the same name but incompatible schema.
>>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>>> enough bytes to read 0th field board_id*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>>> [apache-cassandra-3.9.jar:3.9]*
>>> * at
>>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
>>> [apache-cassandra-3.9.jar:3.9]*
>>>
>>>
>>> I then have to do 'sudo rm -rf /var/lib/cassandra/commitlog/*' which
>>> fixes the problem, but then I lose all of my data.
>>>
>>> It looks like it's saying there wasn't enough data to read the field
>>> 'board_id', any ideas why that would be?
>>>
>>
>>
>


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread Ali Akhtar
3.9..

On Mon, Oct 24, 2016 at 1:22 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Which version of C* ? There were similar issues with commitlogs in
> tick-tock versions.
>
> On Mon, Oct 24, 2016 at 4:18 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I have a single node cassandra installation on my dev laptop, which is
>> used just for dev / testing.
>>
>> Recently, whenever I restart my laptop, Cassandra fails to start when I
>> run it via 'sudo service cassandra start'.
>>
>> Doing a tail on /var/log/cassandra/system.log gives this log:
>>
>> *INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
>> /var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
>> /var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
>> *ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
>> Exiting due to error while processing commit log during initialization.*
>> *org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>> Unexpected error deserializing mutation; saved to
>> /tmp/mutation9186356142128811141dat.  This may be caused by replaying a
>> mutation against a table with the same name but incompatible schema.
>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>> enough bytes to read 0th field board_id*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>> [apache-cassandra-3.9.jar:3.9]*
>> * at
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
>> [apache-cassandra-3.9.jar:3.9]*
>>
>>
>> I then have to do 'sudo rm -rf /var/lib/cassandra/commitlog/*' which
>> fixes the problem, but then I lose all of my data.
>>
>> It looks like it's saying there wasn't enough data to read the field
>> 'board_id', any ideas why that would be?
>>
>
>


CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-23 Thread Ali Akhtar
I have a single node cassandra installation on my dev laptop, which is used
just for dev / testing.

Recently, whenever I restart my laptop, Cassandra fails to start when I run
it via 'sudo service cassandra start'.

Doing a tail on /var/log/cassandra/system.log gives this log:

*INFO  [main] 2016-10-24 07:08:02,950 CommitLog.java:166 - Replaying
/var/lib/cassandra/commitlog/CommitLog-6-1476907676969.log,
/var/lib/cassandra/commitlog/CommitLog-6-1476907676970.log,
/var/lib/cassandra/commitlog/CommitLog-6-1477269052845.log*
*ERROR [main] 2016-10-24 07:08:03,357 JVMStabilityInspector.java:82 -
Exiting due to error while processing commit log during initialization.*
*org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
Unexpected error deserializing mutation; saved to
/tmp/mutation9186356142128811141dat.  This may be caused by replaying a
mutation against a table with the same name but incompatible schema.
Exception follows: org.apache.cassandra.serializers.MarshalException: Not
enough bytes to read 0th field board_id*
* at
org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:410)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:343)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:202)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
[apache-cassandra-3.9.jar:3.9]*
* at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
[apache-cassandra-3.9.jar:3.9]*


I then have to do 'sudo rm -rf /var/lib/cassandra/commitlog/*' which fixes
the problem, but then I lose all of my data.

It looks like it's saying there wasn't enough data to read the field
'board_id', any ideas why that would be?
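
A less destructive workaround for a dev box than deleting the commitlog:
flush everything before shutdown, so there is nothing left to replay. A
sketch, assuming the same service scripts used above:

    nodetool drain               # flush all memtables; nothing remains to replay
    sudo service cassandra stop

After a drain, startup has no old mutations to deserialize, which sidesteps
the replay failure when a table's schema was altered after mutations had
already been written to the commitlog (CASSANDRA-12397).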


Re: Speeding up schema generation during tests

2016-10-23 Thread Ali Akhtar
I'm using https://github.com/jsevellec/cassandra-unit and haven't come
across any race issues or problems. Cassandra-unit takes care of creating
the schema before it runs the tests.

On Sun, Oct 23, 2016 at 6:17 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Ok I have added -Dcassandra.unsafesystem=true and my tests are broken.
>
> The reason is that I create some schemas before executing tests.
>
> When enabling unsafesystem, Cassandra does not block on the schema flush, so
> you may run into race conditions where the test starts using the created
> schema before it has been fully flushed to disk:
>
> See C* source code here: https://github.com/apache/cassandra/blob/trunk/
> src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L278-L282
>
> static void flush()
> {
>     if (!DatabaseDescriptor.isUnsafeSystem())
>         ALL.forEach(table -> FBUtilities.waitOnFuture(getSchemaCFS(table).forceFlush()));
> }
>
> I don't know how it worked out for you but it didn't for me...
>
> On Wed, Oct 19, 2016 at 9:45 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Ohh didn't know such system property exist, nice idea!
>>
>> On Wed, Oct 19, 2016 at 9:40 AM, horschi <hors...@gmail.com> wrote:
>>
>>> Have you tried starting Cassandra with -Dcassandra.unsafesystem=true ?
>>>
>>>
>>> On Wed, Oct 19, 2016 at 9:31 AM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> As I said, when I bootstrap the server and create some keyspace,
>>>> sometimes the schema is not fully initialized and when the test code tries
>>>> to insert data, it fails.
>>>>
>>>> I did not have time to dig into the source code to find the root cause,
>>>> maybe it's something really stupid and simple to fix. If you want to
>>>> investigate and try out my CassandraDaemon server, I'd be happy to get
>>>> feedback
>>>>
>>>> On Wed, Oct 19, 2016 at 9:22 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks. I've disabled durable writes but this is still pretty slow
>>>>> (about 10 seconds).
>>>>>
>>>>> What issues did you run into with your impl?
>>>>>
>>>>> On Wed, Oct 19, 2016 at 12:15 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> There are a lot of pre-flight checks when starting the Cassandra
>>>>>> server, and they take time.
>>>>>>
>>>>>> For integration testing, I have developed a modified CassandraDaemon
>>>>>> here that removes most of those checks:
>>>>>>
>>>>>> https://github.com/doanduyhai/Achilles/blob/master/achilles-
>>>>>> embedded/src/main/java/info/archinnov/achilles/embedded/Achi
>>>>>> llesCassandraDaemon.java
>>>>>>
>>>>>> The problem is that I fell into weird scenarios where a keyspace wasn't
>>>>>> created in a timely manner, so I just stopped using this impl for the
>>>>>> moment; just look at it and do whatever you want.
>>>>>>
>>>>>> Another idea for testing is to disable durable writes to speed up
>>>>>> mutations (CREATE KEYSPACE ... WITH durable_writes=false)
>>>>>>
>>>>>> On Wed, Oct 19, 2016 at 3:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is there a way to speed up the creation of keyspace + tables during
>>>>>>> integration tests? I am using an RF of 1, with SimpleStrategy, but it 
>>>>>>> still
>>>>>>> takes up to 10-15 seconds.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
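
For in-process test servers such as cassandra-unit, the same flag can be set
programmatically. A sketch, assuming the embedded node starts in this same
JVM after the property is set; the property is unsafe by design (flush waits
are skipped), so keep it strictly to throwaway test instances:

    public class TestBootstrap {
        static {
            // Equivalent to -Dcassandra.unsafesystem=true on the command line.
            System.setProperty("cassandra.unsafesystem", "true");
        }
    }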


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
"from a particular query" should be " from a particular country"

On Sun, Oct 23, 2016 at 2:36 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> They can be, but I would assume that if your Cassandra data model is
> inefficient for the kind of queries you want to do, Spark won't magically
> take that away.
>
> For example, say you have a users table. Each user has a country, which
> isn't a partitioning key or clustering key.
>
> If you wanted to calculate the number of all users from a particular
> query, there's no way to do that in the previous data model other than to
> do a full table scan and count the users from that country.
>
> Spark can do this full table scan for you and return the number of
> records. Maybe it can spread the work across multiple servers. But it
> can't reduce the amount of work that has to be done.
>
> Otoh, if you were okay with creating a new table in which the country is
> part of the primary key, and for each user that signed up, you created a
> record in this user_by_country table, then it would be a very fast query to
> look up the users in a particular country, as country is then the primary
> key.
>
>
>
> On Sun, Oct 23, 2016 at 2:18 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> I like the multi data centre resilience in cassandra.
>>
>> I think that's a plus one for cassandra.
>>
>> Ali, complex analytics can be done in spark, right?
>>
>> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>
>> >
>>
>> > I would say it depends on your use case.
>> >
>> > If you need a lot of queries that require joins, or complex analytics
>> of the kind that Cassandra isn't suited for, then HDFS / HBase may be
>> better.
>> >
>> > If you can work with the cassandra way of doing things (creating new
>> tables for each query you'll need to do, duplicating data - doing extra
>> writes for faster reads) , then Cassandra should work for you. It is easier
>> to setup and do dev ops with, in my experience.
>> >
>> > On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>> >>
>>
>> >> I mean. HDFS and HBase.
>> >>
>> >> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com>
>> wrote:
>>
>> >>>
>>
>> >>> By Hadoop do you mean HDFS?
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>> >>>>
>>
>> >>>> Hi All,
>> >>>>
>> >>>> I read the following comparison between hadoop and cassandra. Seems
>> the conclusion is that we use hadoop for the data lake (cold data) and
>> Cassandra for hot data (real-time data).
>> >>>>
>> >>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>> <http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop>
>> >>>>
>> >>>> My question is, can we just use cassandra to rule them all ?
>> >>>>
>> >>>> What we are trying to achieve is to minimize the moving part on our
>> system.
>> >>>>
>> >>>> Any response would be really appreciated.
>> >>>>
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>> --
>> >>>> Welly Tambunan
>> >>>> Triplelands
>> >>>>
>> >>>> http://weltam.wordpress.com <http://weltam.wordpress.com>
>> >>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Welly Tambunan
>> >> Triplelands
>> >>
>> >> http://weltam.wordpress.com <http://weltam.wordpress.com>
>> >> http://www.triplelands.com <http://www.triplelands.com/blog/>
>> >
>> >
>>
>
>


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
They can be, but I would assume that if your Cassandra data model is
inefficient for the kind of queries you want to do, Spark won't magically
take that away.

For example, say you have a users table. Each user has a country, which
isn't a partitioning key or clustering key.

If you wanted to calculate the number of all users from a particular query,
there's no way to do that in the previous data model other than to do a
full table scan and count the users from that country.

Spark can do this full table scan for you and return the number of records.
Maybe it can spread the work across multiple servers. But it can't reduce
the amount of work that has to be done.

Otoh, if you were okay with creating a new table in which the country is
part of the primary key, and for each user that signed up, you created a
record in this user_by_country table, then it would be a very fast query to
look up the users in a particular country, as country is then the primary
key.



On Sun, Oct 23, 2016 at 2:18 PM, Welly Tambunan <if05...@gmail.com> wrote:

> I like the multi data centre resilience in cassandra.
>
> I think that's a plus one for cassandra.
>
> Ali, complex analytics can be done in spark, right?
>
> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
> >
>
> > I would say it depends on your use case.
> >
> > If you need a lot of queries that require joins, or complex analytics of
> the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
> >
> > If you can work with the cassandra way of doing things (creating new
> tables for each query you'll need to do, duplicating data - doing extra
> writes for faster reads) , then Cassandra should work for you. It is easier
> to setup and do dev ops with, in my experience.
> >
> > On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com>
> wrote:
>
> >>
>
> >> I mean. HDFS and HBase.
> >>
> >> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com>
> wrote:
>
> >>>
>
> >>> By Hadoop do you mean HDFS?
> >>>
> >>>
> >>>
> >>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com>
> wrote:
>
> >>>>
>
> >>>> Hi All,
> >>>>
> >>>> I read the following comparison between hadoop and cassandra. Seems
> the conclusion is that we use hadoop for the data lake (cold data) and
> Cassandra for hot data (real-time data).
> >>>>
> >>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
> <http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop>
> >>>>
> >>>> My question is, can we just use cassandra to rule them all ?
> >>>>
> >>>> What we are trying to achieve is to minimize the moving part on our
> system.
> >>>>
> >>>> Any response would be really appreciated.
> >>>>
> >>>>
> >>>> Cheers
> >>>>
> >>>> --
> >>>> Welly Tambunan
> >>>> Triplelands
> >>>>
> >>>> http://weltam.wordpress.com <http://weltam.wordpress.com>
> >>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Welly Tambunan
> >> Triplelands
> >>
> >> http://weltam.wordpress.com <http://weltam.wordpress.com>
> >> http://www.triplelands.com <http://www.triplelands.com/blog/>
> >
> >
>
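
A sketch of the query-driven table described above (column names are
illustrative):

    CREATE TABLE users_by_country (
        country text,
        user_id uuid,
        name text,
        PRIMARY KEY ((country), user_id)
    );

    -- Written alongside the main users table on every signup:
    INSERT INTO users_by_country (country, user_id, name)
    VALUES ('PK', uuid(), 'Ali');

    -- A single-partition read instead of a full table scan:
    SELECT * FROM users_by_country WHERE country = 'PK';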


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
I would say it depends on your use case.

If you need a lot of queries that require joins, or complex analytics of
the kind that Cassandra isn't suited for, then HDFS / HBase may be better.

If you can work with the cassandra way of doing things (creating new tables
for each query you'll need to do, duplicating data - doing extra writes for
faster reads) , then Cassandra should work for you. It is easier to setup
and do dev ops with, in my experience.

On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com> wrote:

> I mean. HDFS and HBase.
>
> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> By Hadoop do you mean HDFS?
>>
>>
>>
>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I read the following comparison between hadoop and cassandra. Seems the
>>> conclusion is that we use hadoop for the data lake (cold data) and
>>> Cassandra for hot data (real-time data).
>>>
>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>>>
>>> My question is, can we just use cassandra to rule them all ?
>>>
>>> What we are trying to achieve is to minimize the moving part on our
>>> system.
>>>
>>> Any response would be really appreciated.
>>>
>>>
>>> Cheers
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
By Hadoop do you mean HDFS?



On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com> wrote:

> Hi All,
>
> I read the following comparison between hadoop and cassandra. Seems the
> conclusion is that we use hadoop for the data lake (cold data) and
> Cassandra for hot data (real-time data).
>
> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>
> My question is, can we just use cassandra to rule them all ?
>
> What we are trying to achieve is to minimize the moving part on our
> system.
>
> Any response would be really appreciated.
>
>
> Cheers
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>


Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Ali Akhtar
It seems obvious.

On Sun, Oct 23, 2016 at 1:15 PM, Kant Kodali <k...@peernova.com> wrote:

> where does it say counter is implemented as long?
>
> On Sun, Oct 23, 2016 at 1:13 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Probably: https://docs.oracle.com/javase/8/docs/api/java/lan
>> g/Long.html#MAX_VALUE
>>
>> On Sun, Oct 23, 2016 at 1:12 PM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> What is the maximum value of Cassandra Counter Column?
>>>
>>
>>
>
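
For the record, CQL defines counter as a 64-bit signed integer, so the
ceiling is 2^63 - 1 = 9223372036854775807, i.e. Java's Long.MAX_VALUE as
linked above. A sketch (table name illustrative):

    CREATE TABLE page_views (
        page text PRIMARY KEY,
        views counter   -- 64-bit signed; writable only via increments
    );

    UPDATE page_views SET views = views + 1 WHERE page = '/home';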


Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Ali Akhtar
Probably:
https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#MAX_VALUE

On Sun, Oct 23, 2016 at 1:12 PM, Kant Kodali <k...@peernova.com> wrote:

> What is the maximum value of Cassandra Counter Column?
>


Re: Speeding up schema generation during tests

2016-10-19 Thread Ali Akhtar
Horschi, you are the hero gotham deserves.

Test time reduced from 10 seconds to 800 ms

On Wed, Oct 19, 2016 at 12:40 PM, horschi <hors...@gmail.com> wrote:

> Have you tried starting Cassandra with -Dcassandra.unsafesystem=true ?
>
>
> On Wed, Oct 19, 2016 at 9:31 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> As I said, when I bootstrap the server and create some keyspace,
>> sometimes the schema is not fully initialized and when the test code tries
>> to insert data, it fails.
>>
>> I did not have time to dig into the source code to find the root cause,
>> maybe it's something really stupid and simple to fix. If you want to
>> investigate and try out my CassandraDaemon server, I'd be happy to get
>> feedback
>>
>> On Wed, Oct 19, 2016 at 9:22 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Thanks. I've disabled durable writes but this is still pretty slow
>>> (about 10 seconds).
>>>
>>> What issues did you run into with your impl?
>>>
>>> On Wed, Oct 19, 2016 at 12:15 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>> There are a lot of pre-flight checks when starting the Cassandra server,
>> and they take time.
>>>>
>>>> For integration testing, I have developed a modified CassandraDaemon
>>>> here that removes most of those checks:
>>>>
>>>> https://github.com/doanduyhai/Achilles/blob/master/achilles-
>>>> embedded/src/main/java/info/archinnov/achilles/embedded/Achi
>>>> llesCassandraDaemon.java
>>>>
>>>> The problem is that I fell into weird scenarios where a keyspace wasn't
>>>> created in a timely manner, so I just stopped using this impl for the
>>>> moment; just look at it and do whatever you want.
>>>>
>>>> Another idea for testing is to disable durable writes to speed up
>>>> mutations (CREATE KEYSPACE ... WITH durable_writes=false)
>>>>
>>>> On Wed, Oct 19, 2016 at 3:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>>> Is there a way to speed up the creation of keyspace + tables during
>>>>> integration tests? I am using an RF of 1, with SimpleStrategy, but it 
>>>>> still
>>>>> takes up to 10-15 seconds.
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Speeding up schema generation during tests

2016-10-19 Thread Ali Akhtar
Thanks. I've disabled durable writes but this is still pretty slow (about
10 seconds).

What issues did you run into with your impl?

On Wed, Oct 19, 2016 at 12:15 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> There are a lot of pre-flight checks when starting the Cassandra server,
> and they take time.
>
> For integration testing, I have developed a modified CassandraDaemon here
> that removes most of those checks:
>
> https://github.com/doanduyhai/Achilles/blob/master/achilles-
> embedded/src/main/java/info/archinnov/achilles/embedded/
> AchillesCassandraDaemon.java
>
> The problem is that I fell into weird scenarios where a keyspace wasn't
> created in a timely manner, so I just stopped using this impl for the
> moment; just look at it and do whatever you want.
>
> Another idea for testing is to disable durable writes to speed up mutations
> (CREATE KEYSPACE ... WITH durable_writes=false)
>
> On Wed, Oct 19, 2016 at 3:24 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Is there a way to speed up the creation of keyspace + tables during
>> integration tests? I am using an RF of 1, with SimpleStrategy, but it still
>> takes up to 10-15 seconds.
>>
>
>
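
The keyspace option is spelled durable_writes and rides along with the
replication map. A test-only sketch (keyspace name illustrative; skipping the
commitlog means a crash loses unflushed writes, so never use this for data
you care about):

    CREATE KEYSPACE test_ks
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
        AND durable_writes = false;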


Speeding up schema generation during tests

2016-10-18 Thread Ali Akhtar
Is there a way to speed up the creation of keyspace + tables during
integration tests? I am using an RF of 1, with SimpleStrategy, but it still
takes up to 10-15 seconds.


Re: mapper.save() throws a ThreadPool error (Java)

2016-10-11 Thread Ali Akhtar
Uh, yeah, I'm a moron. I was doing this inside a try-with-resources block, and
the class containing my session was autoclosing the session at the end of it
(i.e. try (Environment env = new Environment())).

Nvm, I'm an idiot

On Tue, Oct 11, 2016 at 8:29 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> This is a little urgent, so any help would be greatly appreciated.
>
> On Tue, Oct 11, 2016 at 8:22 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> I'm creating a session, connecting to it, then creating a
>> mappingManager(), then obtaining a mapper for MyPojo.class
>>
>> If I then try to do mapper.save(myPojo), I get the following stacktrace:
>>
>> Oct 11, 2016 8:16:26 PM com.google.common.util.concurrent.ExecutionList
>> executeListener
>> SEVERE: RuntimeException while executing runnable
>> com.google.common.util.concurrent.Futures$ChainingListenable
>> Future@5164e29 with executor com.google.common.util.concurr
>> ent.MoreExecutors$ListeningDecorator@5f77d54d
>> java.util.concurrent.RejectedExecutionException: Task
>> com.google.common.util.concurrent.Futures$ChainingListenable
>> Future@5164e29 rejected from java.util.concurrent.ThreadPoo
>> lExecutor@53213dad[Terminated, pool size = 0, active threads = 0, queued
>> tasks = 0, completed tasks = 0]
>> at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.
>> rejectedExecution(ThreadPoolExecutor.java:2047)
>> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExe
>> cutor.java:823)
>> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolEx
>> ecutor.java:1369)
>> at com.google.common.util.concurrent.MoreExecutors$ListeningDec
>> orator.execute(MoreExecutors.java:484)
>> at com.google.common.util.concurrent.ExecutionList.executeListe
>> ner(ExecutionList.java:156)
>> at com.google.common.util.concurrent.ExecutionList.add(Executio
>> nList.java:101)
>> at com.google.common.util.concurrent.AbstractFuture.addListener
>> (AbstractFuture.java:170)
>> at com.google.common.util.concurrent.Futures.transform(Futures.java:608)
>> at com.datastax.driver.core.SessionManager.toPreparedStatement(
>> SessionManager.java:200)
>> at com.datastax.driver.core.SessionManager.prepareAsync(Session
>> Manager.java:161)
>> at com.datastax.driver.core.AbstractSession.prepareAsync(Abstra
>> ctSession.java:134)
>> at com.datastax.driver.mapping.Mapper.getPreparedQueryAsync(Map
>> per.java:121)
>> at com.datastax.driver.mapping.Mapper.saveQueryAsync(Mapper.java:224)
>> at com.datastax.driver.mapping.Mapper.saveAsync(Mapper.java:307)
>> at com.datastax.driver.mapping.Mapper.save(Mapper.java:270)
>>
>>
>>
>> Any ideas what's causing this? Afaik I'm doing all the steps asked for
>>
>>
>


Re: mapper.save() throws a ThreadPool error (Java)

2016-10-11 Thread Ali Akhtar
This is a little urgent, so any help would be greatly appreciated.

On Tue, Oct 11, 2016 at 8:22 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> I'm creating a session, connecting to it, then creating a
> mappingManager(), then obtaining a mapper for MyPojo.class
>
> If I then try to do mapper.save(myPojo), I get the following stacktrace:
>
> Oct 11, 2016 8:16:26 PM com.google.common.util.concurrent.ExecutionList
> executeListener
> SEVERE: RuntimeException while executing runnable com.google.common.util.
> concurrent.Futures$ChainingListenableFuture@5164e29 with executor
> com.google.common.util.concurrent.MoreExecutors$
> ListeningDecorator@5f77d54d
> java.util.concurrent.RejectedExecutionException: Task
> com.google.common.util.concurrent.Futures$ChainingListenableFuture@5164e29
> rejected from java.util.concurrent.ThreadPoolExecutor@53213dad[Terminated,
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
> at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(
> ThreadPoolExecutor.java:2047)
> at java.util.concurrent.ThreadPoolExecutor.reject(
> ThreadPoolExecutor.java:823)
> at java.util.concurrent.ThreadPoolExecutor.execute(
> ThreadPoolExecutor.java:1369)
> at com.google.common.util.concurrent.MoreExecutors$
> ListeningDecorator.execute(MoreExecutors.java:484)
> at com.google.common.util.concurrent.ExecutionList.
> executeListener(ExecutionList.java:156)
> at com.google.common.util.concurrent.ExecutionList.add(
> ExecutionList.java:101)
> at com.google.common.util.concurrent.AbstractFuture.
> addListener(AbstractFuture.java:170)
> at com.google.common.util.concurrent.Futures.transform(Futures.java:608)
> at com.datastax.driver.core.SessionManager.toPreparedStatement(
> SessionManager.java:200)
> at com.datastax.driver.core.SessionManager.prepareAsync(
> SessionManager.java:161)
> at com.datastax.driver.core.AbstractSession.prepareAsync(
> AbstractSession.java:134)
> at com.datastax.driver.mapping.Mapper.getPreparedQueryAsync(
> Mapper.java:121)
> at com.datastax.driver.mapping.Mapper.saveQueryAsync(Mapper.java:224)
> at com.datastax.driver.mapping.Mapper.saveAsync(Mapper.java:307)
> at com.datastax.driver.mapping.Mapper.save(Mapper.java:270)
>
>
>
> Any ideas what's causing this? Afaik I'm doing all the steps asked for
>
>


mapper.save() throws a ThreadPool error (Java)

2016-10-11 Thread Ali Akhtar
I'm creating a session, connecting to it, then creating a mappingManager(),
then obtaining a mapper for MyPojo.class

If I then try to do mapper.save(myPojo), I get the following stacktrace:

Oct 11, 2016 8:16:26 PM com.google.common.util.concurrent.ExecutionList
executeListener
SEVERE: RuntimeException while executing runnable
com.google.common.util.concurrent.Futures$ChainingListenableFuture@5164e29
with executor
com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@5f77d54d
java.util.concurrent.RejectedExecutionException: Task
com.google.common.util.concurrent.Futures$ChainingListenableFuture@5164e29
rejected from java.util.concurrent.ThreadPoolExecutor@53213dad[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at
com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:484)
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at
com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:101)
at
com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:170)
at com.google.common.util.concurrent.Futures.transform(Futures.java:608)
at
com.datastax.driver.core.SessionManager.toPreparedStatement(SessionManager.java:200)
at
com.datastax.driver.core.SessionManager.prepareAsync(SessionManager.java:161)
at
com.datastax.driver.core.AbstractSession.prepareAsync(AbstractSession.java:134)
at com.datastax.driver.mapping.Mapper.getPreparedQueryAsync(Mapper.java:121)
at com.datastax.driver.mapping.Mapper.saveQueryAsync(Mapper.java:224)
at com.datastax.driver.mapping.Mapper.saveAsync(Mapper.java:307)
at com.datastax.driver.mapping.Mapper.save(Mapper.java:270)



Any ideas what's causing this? Afaik I'm doing all the steps asked for


Re: Java Driver - Specifying parameters for an IN() query?

2016-10-11 Thread Ali Akhtar
Justin,

I'm asking how to bind a parameter for IN queries thru the java driver.

On Tue, Oct 11, 2016 at 7:22 PM, Justin Cameron <jus...@instaclustr.com>
wrote:

> You need to specify the values themselves.
>
> CREATE TABLE user (
> id int,
> type text,
> val1 int,
> val2 text,
> PRIMARY KEY ((id, type), val1, val2)
> );
>
> SELECT * FROM user WHERE id = 1 AND type IN ('user', 'admin') AND val1 =
> 3 AND val2 IN ('a', 'v', 'd');
>
> On Tue, 11 Oct 2016 at 07:11 Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Do you send the values themselves, or send them as an array / collection?
> Or will both work?
>
> On Tue, Oct 11, 2016 at 7:10 PM, Justin Cameron <jus...@instaclustr.com>
> wrote:
>
> You can pass multiple values to the IN clause, however they can only be
> used on the last column in the partition key and/or the last column in the
> full primary key.
>
> Example:
>
> 'Select * from my_table WHERE pk = 'test' And ck IN (1, 2)'
>
>
> On Tue, 11 Oct 2016 at 06:15 Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> If I wanted to create an accessor, and have a method which does a query
> like this:
>
> 'Select * from my_table WHERE pk = ? And ck IN (?)'
>
> And there were multiple options that could go inside the IN() query, how
> can I specify that? Will it e.g, let me pass in an array as the 2nd
> variable?
>
> --
>
> Justin Cameron
>
> Senior Software Engineer | Instaclustr
>
>
>
>
> This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
> Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
>
>
>
> --
>
> Justin Cameron
>
> Senior Software Engineer | Instaclustr
>
>
>
>
> This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
> Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>


Re: Java Driver - Specifying parameters for an IN() query?

2016-10-11 Thread Ali Akhtar
Ah, thanks, good catch.

If I send a List / Array as value for the last param, will that get bound
as expected?

On Tue, Oct 11, 2016 at 7:16 PM, horschi <hors...@gmail.com> wrote:

> Hi Ali,
>
> do you perhaps want "'Select * from my_table WHERE pk = ? And ck IN ?'" ?
> (Without the brackets around the question mark)
>
> regards,
> Ch
>
> On Tue, Oct 11, 2016 at 3:14 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> If I wanted to create an accessor, and have a method which does a query
>> like this:
>>
>> 'Select * from my_table WHERE pk = ? And ck IN (?)'
>>
>> And there were multiple options that could go inside the IN() query, how
>> can I specify that? Will it e.g, let me pass in an array as the 2nd
>> variable?
>>
>
>
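
A sketch of both ways to bind the IN values with driver 3.x, per horschi's
note above (bare ? with no parentheses); table, row class, and method names
are illustrative, and session is assumed to be an open Session:

    // 1. Plain prepared statement: a List binds to the single IN marker.
    PreparedStatement ps = session.prepare(
            "SELECT * FROM my_table WHERE pk = ? AND ck IN ?");
    ResultSet rs = session.execute(ps.bind("test", Arrays.asList(1, 2)));

    // 2. Mapper accessor: declare the IN parameter as a List. Obtain it via
    // new MappingManager(session).createAccessor(MyTableAccessor.class).
    @Accessor
    public interface MyTableAccessor {
        @Query("SELECT * FROM my_table WHERE pk = ? AND ck IN ?")
        Result<MyRow> rows(String pk, List<Integer> cks);
    }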


Re: Java Driver - Specifying parameters for an IN() query?

2016-10-11 Thread Ali Akhtar
Do you send the values themselves, or send them as an array / collection?
Or will both work?

On Tue, Oct 11, 2016 at 7:10 PM, Justin Cameron <jus...@instaclustr.com>
wrote:

> You can pass multiple values to the IN clause, however they can only be
> used on the last column in the partition key and/or the last column in the
> full primary key.
>
> Example:
>
> 'Select * from my_table WHERE pk = 'test' And ck IN (1, 2)'
>
>
> On Tue, 11 Oct 2016 at 06:15 Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> If I wanted to create an accessor, and have a method which does a query
>> like this:
>>
>> 'Select * from my_table WHERE pk = ? And ck IN (?)'
>>
>> And there were multiple options that could go inside the IN() query, how
>> can I specify that? Will it e.g, let me pass in an array as the 2nd
>> variable?
>>
> --
>
> Justin Cameron
>
> Senior Software Engineer | Instaclustr
>
>
>
>
> This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
> Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>


Java Driver - Specifying parameters for an IN() query?

2016-10-11 Thread Ali Akhtar
If I wanted to create an accessor, and have a method which does a query
like this:

'Select * from my_table WHERE pk = ? And ck IN (?)'

And there were multiple options that could go inside the IN() query, how
can I specify that? Will it e.g, let me pass in an array as the 2nd
variable?


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
Is it possible to use fields on the UDT as primary / cluster keys?

On Tue, Oct 11, 2016 at 9:49 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Yeah, you're right, it does work if I run it thru cqlsh. I was using
> DevCenter which shows that error.
>
> On Tue, Oct 11, 2016 at 9:48 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
>> That works for me.   Are you sure you are on 3.6+?  What error message
>> are you getting?
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 11:25 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> CREATE TYPE test (
>>> foo text,
>>> bar text
>>> );
>>>
>>> CREATE TABLE test_table (
>>> id text,
>>> this_doesnt_work test,
>>> PRIMARY KEY (id)
>>> );
>>>
>>> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
>>> andrew.tolb...@datastax.com> wrote:
>>>
>>> Can you please share an example where it doesn't work?
>>>
>>> Thanks,
>>> Andy
>>>
>>> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> Not sure I understand the question, sorry.
>>>
>>> The column isn't part of the primary key.
>>>
>>> I defined a UDT and then I tried to define a column (not primary or
>>> cluster key) as being of that type, but it doesn't let me do that unless I
>>> set it as frozen. Docs indicate otherwise though
>>>
>>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>>> andrew.tolb...@datastax.com> wrote:
>>>
>>> Is the column you are using that has the UDT type the primary key, or part of
>>> the primary key?  If that is the case it still needs to be frozen (the same
>>> goes for list, set, tuple as part of primary key).  This is the error I get
>>> when I try that:
>>>
>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>>> basics"
>>>
>>> Andy
>>>
>>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCrea
>>> teUDT.html
>>>
>>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>>> UDTs that contain only non-collection fields.
>>>
>>> However if I create a type with 4-5 all text fields, and try to use that
>>> type in another table, I get told to use frozen , even though I'm on
>>> cassandra 3.9
>>>
>>> >  show VERSION
>>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>>
>>> Any ideas?
>>>
>>>
>>>
>>>
>
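
On the follow-up question above: CQL does not let a PRIMARY KEY reference an
individual field inside a UDT, but the whole UDT can be a key component as
long as it is frozen (the error quoted above, "Invalid non-frozen user-defined
type for PRIMARY KEY component", is the flip side of this). A sketch against
the test type above:

    CREATE TABLE test_by_udt (
        k frozen<test>,   -- the entire UDT value acts as the partition key
        v text,
        PRIMARY KEY (k)
    );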


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
Yeah, you're right, it does work if I run it thru cqlsh. I was using
DevCenter which shows that error.

On Tue, Oct 11, 2016 at 9:48 AM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> That works for me.   Are you sure you are on 3.6+?  What error message are
> you getting?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:25 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> CREATE TYPE test (
>> foo text,
>> bar text
>> );
>>
>> CREATE TABLE test_table (
>> id text,
>> this_doesnt_work test,
>> PRIMARY KEY (id)
>> );
>>
>> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Can you please share an example where it doesn't work?
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Not sure I understand the question, sorry.
>>
>> The column isn't part of the primary key.
>>
>> I defined a UDT and then I tried to define a column (not primary or
>> cluster key) as being of that type, but it doesn't let me do that unless I
>> set it as frozen. Docs indicate otherwise though
>>
>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Is the column you are using that has the UDT type the primary key, or part of
>> the primary key?  If that is the case it still needs to be frozen (the same
>> goes for list, set, tuple as part of primary key).  This is the error I get
>> when I try that:
>>
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>> basics"
>>
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/
>> useCreateUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However if I create a type with 4-5 all text fields, and try to use that
>> type in another table, I get told to use frozen , even though I'm on
>> cassandra 3.9
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>
>>
>>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Awesome, thank you.

Perhaps this should be updated on the docs here:
http://docs.datastax.com/en/developer/java-driver//3.1/manual/udts/



On Tue, Oct 11, 2016 at 9:27 AM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> Indeed it is possible to use UDTs with the mapper (docs
> <http://datastax.github.io/java-driver/manual/object_mapper/creating/#mapping-user-types>).
> Pojos are annotated with @UDT and their fields are mapped with @Field (like
> table pojos are annotated with @Table and @Column respectively).  You are
> correct in that you can then use that type for a field on a @Table
> annotated class.
>
> Thanks,
> Andy
>
>
>
> On Mon, Oct 10, 2016 at 11:23 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Thanks.
>>
>> Btw, is it possible to use UDTs and have them mapped via the java driver?
>> If so, how does that work - do I just create a pojo for the UDT, and use
>> @Column on the fields, and it will work if I define a field in the table
>> mapping class as being of that pojo type?
>>
>> On Tue, Oct 11, 2016 at 8:57 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> I agree this would be a nice mechanism for the driver mapper given the
>> difference between java field name conventions and how cql column names are
>> typically defined.   I've created JAVA-1316
>> <https://datastax-oss.atlassian.net/browse/JAVA-1316> for this.
>>
>> Thanks,
>> Andy
>>
>>
>>
>> On Mon, Oct 10, 2016 at 10:30 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Please fix this.
>>
>>
>>
>> On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Hi Ali,
>>
>> As far as I know this hasn't changed.  Either the field name on the class
>> has to match the name of the column or you have to use the @Column with the
>> name attribute to set the column name being mapped by that field.
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Jackson has a NamingStrategy which lets you automatically map snake_case
>> fields in JSON to camelCase fields on the Java class.
>>
>> Last time I worked w/ Cassandra, I didn't find anything like that, and
>> had to define an @Column annotation for each field.
>>
>> Please tell me this has changed now?
>>
>>
>>
>>
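A minimal sketch of the @UDT mapping described above, against the 3.x object
mapper; the keyspace, table, and type names here are invented for
illustration:

import com.datastax.driver.mapping.annotations.Field;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
import com.datastax.driver.mapping.annotations.UDT;

// @UDT and @Field mirror @Table and @Column: the pojo maps the CQL type.
@UDT(keyspace = "myks", name = "address")
class Address {
    @Field(name = "street")
    private String street;

    public String getStreet() { return street; }
    public void setStreet(String street) { this.street = street; }
}

@Table(keyspace = "myks", name = "users")
class User {
    @PartitionKey
    private String id;

    // Typing the field as the @UDT-annotated class is enough; the mapper
    // converts it to and from the CQL user type.
    private Address address;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Address getAddress() { return address; }
    public void setAddress(Address address) { this.address = address; }
}

Reads and writes then go through the usual mapper, e.g.
new MappingManager(session).mapper(User.class).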


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
CREATE TYPE test (
foo text,
bar text
);

CREATE TABLE test_table (
id text,
this_doesnt_work test,
PRIMARY KEY (id)
);

On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> Can you please share an example where it doesn't work?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Not sure I understand the question, sorry.
>>
>> The column isn't part of the primary key.
>>
>> I defined a UDT and then tried to define a column (not a primary or
>> clustering key) as being of that type, but it doesn't let me do that unless I
>> set it as frozen. The docs indicate otherwise, though.
>>
>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Is the column that has the UDT type the primary key, or part of the primary
>> key?  If that is the case it still needs to be frozen (the same goes for
>> list, set, and tuple as part of a primary key).  This is the error I get
>> when I try that:
>>
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>> basics"
>>
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/
>> useCreateUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However, if I create a type with 4-5 fields (all text) and try to use that
>> type in another table, I get told to use frozen, even though I'm on
>> Cassandra 3.9.
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>
>>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Thanks.

Btw, is it possible to use UDTs and have them mapped via the java driver?
If so, how does that work - do I just create a pojo for the UDT, and use
@Column on the fields, and it will work if I define a field in the table
mapping class as being of that pojo type?

On Tue, Oct 11, 2016 at 8:57 AM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> I agree this would be a nice mechanism for the driver mapper given the
> difference between java field name conventions and how cql column names are
> typically defined.   I've created JAVA-1316
> <https://datastax-oss.atlassian.net/browse/JAVA-1316> for this.
>
> Thanks,
> Andy
>
>
>
> On Mon, Oct 10, 2016 at 10:30 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Please fix this.
>>
>>
>>
>> On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Hi Ali,
>>
>> As far as I know this hasn't changed.  Either the field name on the class
>> has to match the name of the column or you have to use the @Column with the
>> name attribute to set the column name being mapped by that field.
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Jackson has a NamingStrategy which lets you automatically map snake_case
>> fields in JSON to camelCase fields on the Java class.
>>
>> Last time I worked w/ Cassandra, I didn't find anything like that, and
>> had to define an @Column annotation for each field.
>>
>> Please tell me this has changed now?
>>
>>
>>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
Not sure I understand the question, sorry.

The column isn't part of the primary key.

I defined a UDT and then tried to define a column (not a primary or clustering
key) as being of that type, but it doesn't let me do that unless I set it
as frozen. The docs indicate otherwise, though.

On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> Is the column that has the UDT type the primary key, or part of the primary
> key?  If that is the case it still needs to be frozen (the same goes for
> list, set, and tuple as part of a primary key).  This is the error I get
> when I try that:
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid non-frozen user-defined type for PRIMARY KEY component
> basics"
>
> Andy
>
> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/
>> useCreateUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However, if I create a type with 4-5 fields (all text) and try to use that
>> type in another table, I get told to use frozen, even though I'm on
>> Cassandra 3.9.
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Please fix this.



On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <andrew.tolb...@datastax.com
> wrote:

> Hi Ali,
>
> As far as I know this hasn't changed.  Either the field name on the class
> has to match the name of the column or you have to use the @Column with the
> name attribute to set the column name being mapped by that field.
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Jackson has a NamingStrategy which lets you automatically map snake_case
>> fields in JSON to camelCase fields on the Java class.
>>
>> Last time I worked w/ Cassandra, I didn't find anything like that, and
>> had to define an @Column annotation for each field.
>>
>> Please tell me this has changed now?
>>
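Until something like the NamingStrategy requested in this thread exists
(JAVA-1316 tracks it), the workaround Andy describes is explicit @Column
names. A short sketch, with made-up keyspace, table, and column names:

import java.util.Date;

import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;

@Table(keyspace = "myks", name = "user_events")
class UserEvent {
    // Each snake_case column is mapped to its camelCase field by hand.
    @PartitionKey
    @Column(name = "user_id")
    private String userId;

    @Column(name = "created_at")
    private Date createdAt;

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }
    public Date getCreatedAt() { return createdAt; }
    public void setCreatedAt(Date createdAt) { this.createdAt = createdAt; }
}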
>>


Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
According to
http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html

>  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
that contain only non-collection fields.

However, if I create a type with 4-5 fields (all text) and try to use that
type in another table, I get told to use frozen, even though I'm on
Cassandra 3.9.

>  show VERSION
> [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]

Any ideas?


NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Jackson has a NamingStrategy which lets you automatically map snake_case
fields in JSON to camelCase fields on the Java class.

Last time I worked w/ Cassandra, I didn't find anything like that, and had
to define an @Column annotation for each field.

Please tell me this has changed now?


Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
Okay... so, how would you achieve the above scenario in Cassandra?

On Tue, Oct 11, 2016 at 3:25 AM, Peddi, Praveen <pe...@amazon.com> wrote:

> That's not just a bad idea; it's impossible. Any field that is part of the
> primary key is immutable. You should read up on the Cassandra documentation
> and understand the basics before you start using it. Otherwise you could
> easily abuse it inadvertently.
>
> Praveen
>
> On Oct 10, 2016, at 6:22 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> E.g. if I wanted to select * from foo where last_updated <= ?
>
> In this case (I believe) last_updated will have to be a clustering key.
> If the record got updated and I wanted to update last_updated accordingly,
> that's a bad idea?
>
> :S
>
> On Tue, Oct 11, 2016 at 3:19 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Huh - So if I wanted to search / filter by a timestamp field, and this
>> timestamp needed to get updated, that won't be possible?
>>
>> On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet <
>> nicolas.douil...@gmail.com> wrote:
>>
>>> If I understand the answers correctly, the solution to your ordering
>>> question is to use clustering keys.
>>> I agree, but I just wanted to warn you about one limitation: the
>>> values of keys can't be updated, except by using a delete and then an
>>> insert.
>>> (In the case of your song example, putting the rating as a key can be
>>> tricky if the value has to be frequently updated.)
>>>
>>>
>>> Le lun. 10 oct. 2016 à 22:15, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> a écrit :
>>>
>>>> Looks like ordering by multiple columns in Cassandra has a few sides that
>>>> are not obvious.
>>>> I wasn’t able to find this information in the official documentation
>>>> but it’s quite well described here:
>>>> http://stackoverflow.com/questions/35708118/where-and-order-
>>>> by-clauses-in-cassandra-cql
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 10 Oct 2016, at 21:55, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>
>>>> No, we didn't record the talk this time unfortunately :(
>>>>
>>>> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>> Really helpful slides. Is there a video to go with them?
>>>>
>>>> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>> Yes it is possible, read this: http://www.slideshare.ne
>>>> t/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>>>>
>>>> and the following slides
>>>>
>>>> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>> Is it possible to have multiple clustering keys in cassandra, or some
>>>> other way to order by multiple columns?
>>>>
>>>> For example, say I have a table of songs, and each song has a rating
>>>> and a date.
>>>>
>>>> I want to sort songs by rating first, and then with newer songs on top.
>>>>
>>>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>>>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>>>
>>>> Like this:
>>>>
>>>> Select * from songs order by rating, createdAt
>>>>
>>>> Is this possible?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
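One way to achieve the scenario asked about above, as a sketch (the table and
column names are invented, not from the thread): make rating and created_at
clustering columns, and accept that changing either one means a delete plus
an insert.

CREATE TABLE songs_by_genre (
    genre text,
    rating int,
    created_at timestamp,
    song_id uuid,
    title text,
    PRIMARY KEY ((genre), rating, created_at, song_id)
) WITH CLUSTERING ORDER BY (rating DESC, created_at DESC, song_id ASC);

-- Rows in a partition come back pre-sorted: highest rating first,
-- newest first within the same rating.
SELECT * FROM songs_by_genre WHERE genre = 'rock';

-- Changing a clustering value (e.g. a re-rating) is a delete plus an
-- insert; a logged batch to the same partition keeps the pair atomic.
BEGIN BATCH
    DELETE FROM songs_by_genre
     WHERE genre = 'rock' AND rating = 4
       AND created_at = '2016-02-01' AND song_id = 62c36092-82a1-3a00-93d1-46196ee77204;
    INSERT INTO songs_by_genre (genre, rating, created_at, song_id, title)
    VALUES ('rock', 5, '2016-02-01', 62c36092-82a1-3a00-93d1-46196ee77204, 'Some Song');
APPLY BATCH;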


Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
E.g. if I wanted to select * from foo where last_updated <= ?

In this case (I believe) last_updated will have to be a clustering key. If
the record got updated and I wanted to update last_updated accordingly,
that's a bad idea?

:S

On Tue, Oct 11, 2016 at 3:19 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Huh - So if I wanted to search / filter by a timestamp field, and this
> timestamp needed to get updated, that won't be possible?
>
> On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet <
> nicolas.douil...@gmail.com> wrote:
>
>> If I understand the answers correctly, the solution to your ordering
>> question is to use clustering keys.
>> I agree, but I just wanted to warn you about one limitation: the
>> values of keys can't be updated, except by using a delete and then an
>> insert.
>> (In the case of your song example, putting the rating as a key can be
>> tricky if the value has to be frequently updated.)
>>
>>
>> Le lun. 10 oct. 2016 à 22:15, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> a écrit :
>>
>>> Looks like ordering by multiple columns in Cassandra has a few sides that
>>> are not obvious.
>>> I wasn’t able to find this information in the official documentation but
>>> it’s quite well described here:
>>> http://stackoverflow.com/questions/35708118/where-and-order-
>>> by-clauses-in-cassandra-cql
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 10 Oct 2016, at 21:55, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>> No, we didn't record the talk this time unfortunately :(
>>>
>>> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>> Really helpful slides. Is there a video to go with them?
>>>
>>> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> Yes it is possible, read this: http://www.slideshare.ne
>>> t/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>>>
>>> and the following slides
>>>
>>> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>> Is it possible to have multiple clustering keys in cassandra, or some
>>> other way to order by multiple columns?
>>>
>>> For example, say I have a table of songs, and each song has a rating and
>>> a date.
>>>
>>> I want to sort songs by rating first, and then with newer songs on top.
>>>
>>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>>
>>> Like this:
>>>
>>> Select * from songs order by rating, createdAt
>>>
>>> Is this possible?
>>>
>>>
>>>
>>>
>>>
>>>
>


Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
Huh - So if I wanted to search / filter by a timestamp field, and this
timestamp needed to get updated, that won't be possible?

On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet <
nicolas.douil...@gmail.com> wrote:

> If I understand the answers correctly, the solution to your ordering
> question is to use clustering keys.
> I agree, but I just wanted to warn you about one limitation: the
> values of keys can't be updated, except by using a delete and then an
> insert.
> (In the case of your song example, putting the rating as a key can be
> tricky if the value has to be frequently updated.)
>
>
> Le lun. 10 oct. 2016 à 22:15, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> a écrit :
>
>> Looks like ordering by multiple columns in Cassandra has a few sides that
>> are not obvious.
>> I wasn’t able to find this information in the official documentation but
>> it’s quite well described here:
>> http://stackoverflow.com/questions/35708118/where-and-
>> order-by-clauses-in-cassandra-cql
>>
>> Thanks,
>> Mikhail
>>
>> On 10 Oct 2016, at 21:55, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>> No, we didn't record the talk this time unfortunately :(
>>
>> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Really helpful slides. Is there a video to go with them?
>>
>> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>> Yes it is possible, read this: http://www.slideshare.
>> net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>>
>> and the following slides
>>
>> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>> Is it possible to have multiple clustering keys in cassandra, or some
>> other way to order by multiple columns?
>>
>> For example, say I have a table of songs, and each song has a rating and
>> a date.
>>
>> I want to sort songs by rating first, and then with newer songs on top.
>>
>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>
>> Like this:
>>
>> Select * from songs order by rating, createdAt
>>
>> Is this possible?
>>
>>
>>
>>
>>
>>


Re: Where to change the datacenter name?

2016-10-10 Thread Ali Akhtar
Yeah, so what's happening is, I'm running Cassandra through a Docker image in
production, and over there it is using the datacenter name that I
specified through an env variable.

But on my local machine, Cassandra is annoyingly insisting on 'datacenter1'.

So in order to maintain the same .cql scripts for setting up the db, I
either need to change the dc name locally or in production.

It looks like I should just leave it as 'datacenter1' in production.

On Tue, Oct 11, 2016 at 1:19 AM, Amit Trivedi <tria...@gmail.com> wrote:

> I believe it is coming from system.local. You can verify by executing
>
> select data_center from system.local;
>
> I would be careful changing the datacenter name, particularly in production.
> This is essentially because, if the datacenter change requires a snitch
> configuration change, it may result in stale data depending on token values
> and snitch settings, and there is a risk of a node reporting invalid or
> missing data to the client.
>
>
>
> On Mon, Oct 10, 2016 at 4:08 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> So I see this:
>>
>> cluster_name: 'Test Cluster'
>>
>> But when I grep -i or ctrl + f for 'datacenter1' in cassandra.yaml, I
>> don't see that anywhere except in a comment.
>>
>>
>> Yet when I do nodetool status, I see: datacenter1
>>
>> And unless I define my replication as {'class':
>> 'NetworkTopologyStrategy', 'datacenter1': 3} when creating my keyspace,
>> my inserts / selects don't work because it says 0 replicas available (i.e.
>> if I use anything other than 'datacenter1' in the above stmt).
>>
>> I don't see 'datacenter1' in rackdc.properties. So my question is, which
>> file contains 'datacenter1'?
>>
>> On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson <a...@datascale.io> wrote:
>>
>>> There is a cluster name in the cassandra.yaml for naming the cluster,
>>> aka data center. Then you assign keyspaces to the data center within the
>>> CREATE KEYSPACE stmt with NetworkTopology.
>>>
>>>
>>> On Monday, October 10, 2016, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>
>>>> Where can I change the default name 'datacenter1'? I've looked through
>>>> the configuration files in /etc/cassandra , and can't find where this value
>>>> is being defined.
>>>>
>>>
>>>
>>> --
>>>
>>> Adam Hutson
>>> Data Architect | DataScale
>>> +1 (417) 224-5212
>>> a...@datascale.io
>>>
>>
>>
>
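For the record: when the default SimpleSnitch is in use, 'datacenter1' lives
in no config file at all; the snitch hardcodes it. Making the name explicit
(and controllable in both environments) means switching snitches. A sketch,
with example values:

# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
dc=DC1
rack=rack1

This appears to be what the Docker route amounts to as well: the official
image's CASSANDRA_DC variable takes effect only together with
CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch, and writes this same
properties file.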


Re: Where to change the datacenter name?

2016-10-10 Thread Ali Akhtar
So I see this:

cluster_name: 'Test Cluster'

But when I grep -i or ctrl + f for 'datacenter1' in cassandra.yaml, I don't
see that anywhere except in a comment.


Yet when I do nodetool status, I see: datacenter1

And unless I define my replication as {'class':
'NetworkTopologyStrategy', 'datacenter1': 3} when creating my keyspace,
my inserts / selects don't work because it says 0 replicas available (i.e.
if I use anything other than 'datacenter1' in the above stmt).

I don't see 'datacenter1' in rackdc.properties. So my question is, which
file contains 'datacenter1'?

On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson <a...@datascale.io> wrote:

> There is a cluster name in the cassandra.yaml for naming the cluster, aka
> data center. Then you assign keyspaces to the data center within the CREATE
> KEYSPACE stmt with NetworkTopology.
>
>
> On Monday, October 10, 2016, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Where can I change the default name 'datacenter1'? I've looked through
>> the configuration files in /etc/cassandra , and can't find where this value
>> is being defined.
>>
>
>
> --
>
> Adam Hutson
> Data Architect | DataScale
> +1 (417) 224-5212
> a...@datascale.io
>


Where to change the datacenter name?

2016-10-10 Thread Ali Akhtar
Where can I change the default name 'datacenter1'? I've looked through the
configuration files in /etc/cassandra , and can't find where this value is
being defined.


Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
Really helpful slides. Is there a video to go with them?

On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Yes it is possible, read this: http://www.slideshare.
> net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>
> and the following slides
>
> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Is it possible to have multiple clustering keys in cassandra, or some
>> other way to order by multiple columns?
>>
>> For example, say I have a table of songs, and each song has a rating and
>> a date.
>>
>> I want to sort songs by rating first, and then with newer songs on top.
>>
>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>
>> Like this:
>>
>> Select * from songs order by rating, createdAt
>>
>> Is this possible?
>>
>
>


Doing a calculation in a query?

2016-10-10 Thread Ali Akhtar
I have a table for tracking orders. Each order has an `ordered_at` field
(can be a timestamp, or a long with the milliseconds of the timestamp) and a
`shipped_at` field (ditto, timestamp or long).

`ordered_at` tracks when the order was made.

`shipped_at` tracks when the order was shipped.

When retrieving the orders, I need to calculate an additional field, called
'shipment_delay'. This is simply 'shipped_at - ordered_at', i.e. how long
it took between when the order was made and when it was shipped.

The tricky part is that if an order isn't yet shipped, it should just
return how many days it has been since the order was made.

E.g., if an order was made on Jan 1st and shipped on Jan 5th, shipment_delay =
4 days (in milliseconds if needed).

If an order was made on Jan 1st but not yet shipped, and today is Jan 10th,
then shipment_delay = 9 days.

I then need to sort the orders in the order of 'shipment_delay desc', i.e.
show the orders which took the longest at the top.

Is it possible to define 'shipment_delay' at the table or query level, so
it can be used in the 'order by' clause, or will this ordering have to be
done myself after the data is received?

Thanks.
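Cassandra has no computed columns and can't ORDER BY an expression, so this
either becomes a column maintained on every write or, more simply, a
client-side computation. A minimal client-side sketch in Java; the Order
class and its fields are hypothetical, not from any schema in this thread:

import java.time.Instant;
import java.util.Comparator;
import java.util.List;

class Order {
    final Instant orderedAt;
    final Instant shippedAt;   // null if the order hasn't shipped yet

    Order(Instant orderedAt, Instant shippedAt) {
        this.orderedAt = orderedAt;
        this.shippedAt = shippedAt;
    }

    // shipped_at - ordered_at, falling back to now - ordered_at for orders
    // that haven't shipped.
    long shipmentDelayMillis(Instant now) {
        Instant end = (shippedAt != null) ? shippedAt : now;
        return end.toEpochMilli() - orderedAt.toEpochMilli();
    }
}

class ShipmentReport {
    // Longest delay first, i.e. 'shipment_delay desc'.
    static void sortByDelayDesc(List<Order> orders) {
        Instant now = Instant.now();   // pin "now" so the comparator stays consistent
        orders.sort(Comparator.comparingLong(
                (Order o) -> o.shipmentDelayMillis(now)).reversed());
    }
}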


  1   2   >