RE: [External] Re: Cassandra ad hoc search options

2017-02-02 Thread Yu, John

Thanks for all the input. Is anyone aware of future enhancements in Cassandra 
itself in this area (besides the ticket below)? I tried looking into 4.0 but 
haven’t found much yet.

Regards,
John

From: Justin Cameron [mailto:jus...@instaclustr.com]
Sent: Tuesday, January 31, 2017 10:13 AM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

+1

On Tue, 31 Jan 2017 at 10:04 Jonathan Haddad <j...@jonhaddad.com> wrote:
With regard to having DCs for specific workloads, it would be nice to have
per-DC indexes. See https://issues.apache.org/jira/browse/CASSANDRA-12663.

On Tue, Jan 31, 2017 at 9:52 AM Justin Cameron <jus...@instaclustr.com> wrote:
Lucene/Elassandra and Spark serve different purposes.

Lucene & Elassandra are designed for real-time queries that have predicates on
columns not in the Cassandra primary key (i.e. searches). For example, you
might have a "person" table with person_id as the primary key but want to allow
users of your app to search for users by their last name.

Spark is designed for batch and/or streaming analytical workloads (it can also
do other things, but these are its primary uses). For example, you might want
to know how many people of different age groups use your application.

Ideally you should separate these workloads from each other and from your
operational workload (standard C* queries) into their own Cassandra
datacenters, as they each have very different performance impacts &
requirements.

On Tue, 31 Jan 2017 at 00:57 vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
You can also have a look at https://github.com/strapdata/elassandra

2017-01-31 9:50 GMT+01:00 vincent gromakowski <vincent.gromakow...@gmail.com>:
The problem with ad hoc queries on Cassandra (with Spark or not) is that
Cassandra's partition model needs to be respected to avoid full-scan queries
(the link you mentioned explains all of them). With FiloDB, which works on
Cassandra, you can push down predicates on the partition key and segment key
in an arbitrary order, resulting in fewer full-scan queries. Another advantage
is that computed columns can also prune partitions or segments, reducing the
reads based on a subpart of the key (like a time range of 2 hours or 10 min).
Anyway, it's not magic, and in my own analysis I don't see FiloDB as a fully
ad hoc query solution, but it's much better than pure Cassandra. You can
easily push down predicates on any combination of 1 to 3-5 columns, depending
on the dataset, whereas with pure Cassandra you need to provide a value for the
first key to push down the predicate on the second key, then the third key...

2017-01-31 8:56 GMT+01:00 Yu, John <john...@sandc.com>:
Thanks. I thought you had given up Lucene for Spark, but it seems your Lucene
still works.

Spark also has a Cassandra connector, and my questions were more towards that.
From
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md,
it seems there are limitations on how much one can select the data to support
ad hoc queries. It seems mostly limited to clustering columns. Maybe in other
cases it would result in a full scan, but that’s going to be very slow.

Regards,
John

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com]
Sent: Monday, January 30, 2017 10:20 PM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

Hi,

> Are you using the DataStax connector as well?
Yes, we used it to query the Lucene index.

> Does it support querying against any column well (not just clustering columns)?
Yes, it does. We used Lucene particularly for this purpose.
(For more details, see:
1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
2. https://www.youtube.com/watch?v=Hg5s-hXy_-M)

> I’m wondering how it could build the index around them “on-the-fly”
You can build indexes at run time, but it takes time (it took a lot of time on
our cluster, and CPU utilization went through the roof).

> did you use Spark for the full set of data or just partial
We weren't allowed to install Spark (a tech decision).
There are some tech discussions going on about the bulk job ecosystem.

Hence, as a workaround, we used a faster scan utility.
For all ad hoc purposes/scripts, you could do a full scan.

I hope it helps.

Regards

On Tue, Jan 31, 2017 at 4:11 AM, Yu, John <john...@sandc.com> wrote:
A follow up question is: did you use Spark for the full set of data or just 
partial? In our case, I feel we need all the data to support ad hoc queries 
(with multiple conditional filters).

Thanks,
John

From: Yu, John [mailto:john...@sandc.com]
Sent: Monday, January 30, 2017 12:04 AM
To: user@cassandra.apache.

Re: [External] Re: Cassandra ad hoc search options

2017-01-31 Thread Justin Cameron

+1


Re: [External] Re: Cassandra ad hoc search options

2017-01-31 Thread Jonathan Haddad

With regard to having DCs for specific workloads, it would be nice to have
per-DC indexes. See https://issues.apache.org/jira/browse/CASSANDRA-12663.



Re: [External] Re: Cassandra ad hoc search options

2017-01-31 Thread Justin Cameron

Lucene/Elassandra and Spark serve different purposes.

Lucene & Elassandra are designed for real-time queries that have predicates
on columns not in the Cassandra primary key (i.e. searches). For example, you
might have a "person" table with person_id as the primary key but want to
allow users of your app to search for users by their last name.

Spark is designed for batch and/or streaming analytical workloads (it can
also do other things, but these are its primary uses). For example, you
might want to know how many people of different age groups use your
application.

Ideally you should separate these workloads from each other and from your
operational workload (standard C* queries) into their own Cassandra
datacenters, as they each have very different performance impacts &
requirements.
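
To make the distinction concrete, here is a minimal Python sketch that uses
plain in-memory data rather than any Cassandra, Lucene, or Spark API. The
`people` rows, the `last_name_index` dictionary, and the age buckets are all
made up for illustration. The first part mimics a search-style lookup on a
non-key column through an inverted index; the second mimics a batch-style
aggregation that reads every row.

```python
from collections import Counter, defaultdict

# Hypothetical "person" table keyed by person_id (illustrative data only).
people = {
    1: {"last_name": "Smith", "age": 34},
    2: {"last_name": "Jones", "age": 27},
    3: {"last_name": "Smith", "age": 61},
    4: {"last_name": "Nguyen", "age": 45},
}

# Search-style workload: an inverted index on a non-key column lets a
# real-time query find matching rows without touching every row.
last_name_index = defaultdict(list)
for person_id, row in people.items():
    last_name_index[row["last_name"]].append(person_id)


def search_by_last_name(name):
    """Return the person_ids whose last_name equals `name`."""
    return sorted(last_name_index.get(name, []))


# Analytics-style workload: a batch job reads every row and aggregates.
def count_by_age_group(rows, bucket_size=20):
    """Count people per age bucket, e.g. 20-39, 40-59."""
    counts = Counter()
    for row in rows:
        low = (row["age"] // bucket_size) * bucket_size
        counts[f"{low}-{low + bucket_size - 1}"] += 1
    return dict(counts)


smiths = search_by_last_name("Smith")
age_groups = count_by_age_group(people.values())
```

The lookup touches only the index entry for one name, while the aggregation
visits every row, which is one reason the two workloads load a cluster so
differently.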


Re: [External] Re: Cassandra ad hoc search options

2017-01-31 Thread vincent gromakowski

You can also have a look at https://github.com/strapdata/elassandra



Re: [External] Re: Cassandra ad hoc search options

2017-01-31 Thread vincent gromakowski

The problem with ad hoc queries on Cassandra (with Spark or not) is that
Cassandra's partition model needs to be respected to avoid full-scan queries
(the link you mentioned explains all of them). With FiloDB, which works on
Cassandra, you can push down predicates on the partition key and segment key
in an arbitrary order, resulting in fewer full-scan queries. Another
advantage is that computed columns can also prune partitions or segments,
reducing the reads based on a subpart of the key (like a time range of
2 hours or 10 min).
Anyway, it's not magic, and in my own analysis I don't see FiloDB as a fully
ad hoc query solution, but it's much better than pure Cassandra. You can
easily push down predicates on any combination of 1 to 3-5 columns, depending
on the dataset, whereas with pure Cassandra you need to provide a value for
the first key to push down the predicate on the second key, then the third
key...
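
As a rough illustration of the key-ordering rule described above, the sketch
below models a simplified version of it in Python: predicates can be pushed
down only if every partition key column is restricted and the restricted
clustering columns form a prefix of the declared clustering order. This is a
toy model with equality predicates only, not the actual planner of Cassandra,
the Spark connector, or FiloDB, and the `sensor_id`, `day`, `hour`, `minute`,
and `status` column names are hypothetical.

```python
def can_push_down(partition_key, clustering_key, restricted):
    """Simplified model of the key-order rule for plain Cassandra.

    partition_key:  list of partition key column names
    clustering_key: list of clustering column names, in declared order
    restricted:     column names the query restricts by equality
    """
    restricted = set(restricted)
    # Every partition key column must be given, or all partitions are read.
    if not all(col in restricted for col in partition_key):
        return False
    # Clustering columns must be restricted in order: no gaps allowed.
    seen_gap = False
    for col in clustering_key:
        if col in restricted:
            if seen_gap:
                return False
        else:
            seen_gap = True
    # Any other restricted column would need an index or filtering.
    return restricted <= set(partition_key) | set(clustering_key)


pk = ["sensor_id"]
ck = ["day", "hour", "minute"]

prefix_ok = can_push_down(pk, ck, {"sensor_id", "day"})
gap_rejected = can_push_down(pk, ck, {"sensor_id", "hour"})
no_partition_key = can_push_down(pk, ck, {"day"})
non_key_column = can_push_down(pk, ck, {"sensor_id", "status"})
```

In this model only the first query qualifies; the other three would need
filtering, an index, or a scan.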


RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John

Thanks. I thought you had given up Lucene for Spark, but it seems your Lucene
still works.

Spark also has a Cassandra connector, and my questions were more towards that.
From
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md,
it seems there are limitations on how much one can select the data to support
ad hoc queries. It seems mostly limited to clustering columns. Maybe in other
cases it would result in a full scan, but that’s going to be very slow.

Regards,
John

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com]
Sent: Monday, January 30, 2017 10:20 PM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

Hi,

> Are you using the DataStax connector as well?
Yes, we used it to query the Lucene index.

> Does it support querying against any column well (not just clustering columns)?
Yes, it does. We used Lucene particularly for this purpose.
(For more details, see:
1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
2. https://www.youtube.com/watch?v=Hg5s-hXy_-M)

> I’m wondering how it could build the index around them “on-the-fly”
You can build indexes at run time, but it takes time (it took a lot of time on
our cluster, and CPU utilization went through the roof).

> did you use Spark for the full set of data or just partial
We weren't allowed to install Spark (a tech decision).
There are some tech discussions going on about the bulk job ecosystem.

Hence, as a workaround, we used a faster scan utility.
For all ad hoc purposes/scripts, you could do a full scan.

I hope it helps.

Regards
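
The thread does not say how the faster scan utility mentioned above works.
One common way to speed up a full scan, shown here purely as an assumption,
is to split the partitioner's token ring into contiguous sub-ranges and read
each one with its own query, possibly in parallel. The sketch assumes the
Murmur3Partitioner token range of -2^63 to 2^63 - 1; the `ks.person` table
and `person_id` key are hypothetical, and the functions only build CQL
strings rather than talking to a cluster.

```python
MIN_TOKEN = -(2 ** 63)   # lowest Murmur3Partitioner token
MAX_TOKEN = 2 ** 63 - 1  # highest Murmur3Partitioner token


def split_token_ring(num_splits):
    """Split [MIN_TOKEN, MAX_TOKEN] into contiguous inclusive sub-ranges."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    step = total // num_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(num_splits):
        # The last split absorbs any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == num_splits - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def scan_queries(table, partition_key, num_splits):
    """Build one CQL range query per sub-range (strings only, no I/O)."""
    return [
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) >= {lo} AND token({partition_key}) <= {hi}"
        for lo, hi in split_token_ring(num_splits)
    ]


ranges = split_token_ring(4)
queries = scan_queries("ks.person", "person_id", 4)
```

Each generated query covers a disjoint slice of the ring, so the slices can
be handed to separate workers without reading any row twice.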


On Tue, Jan 31, 2017 at 4:11 AM, Yu, John <john...@sandc.com> wrote:
A follow up question is: did you use Spark for the full set of data or just 
partial? In our case, I feel we need all the data to support ad hoc queries 
(with multiple conditional filters).

Thanks,
John

From: Yu, John [mailto:john...@sandc.com]
Sent: Monday, January 30, 2017 12:04 AM
To: user@cassandra.apache.org
Subject: RE: [External] Re: Cassandra ad hoc search options

Thanks for the input! Are you using the DataStax connector as well? Does it 
support querying against any column well (not just clustering columns)? I’m 
wondering how it could build the index around them “on-the-fly”.

Regards,
John

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com]
Sent: Friday, January 27, 2017 12:15 AM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

Hi
We used the Stratio Lucene plugin with C* 3.0.3.

It helped to solve a lot of read patterns and served well for prefix searches.
But it created problems, as repairs failed repeatedly.
We might have used it suboptimally, not sure.

Later, we had to do away with it, and tried to serve most of the read patterns
with materialised views (currently C* 3.0.9).

Currently, for ad hoc queries, we use Spark or a full scan.

Regards,

On Fri, Jan 27, 2017 at 1:03 PM, Yu, John <john...@sandc.com> wrote:
Thanks a lot. Mind sharing a couple of points where you feel it’s better than
the alternatives?

Regards,
John

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Thursday, January 26, 2017 2:33 PM
To: user@cassandra.apache.org
Subject: [External] Re: Cassandra ad hoc search options

> With Cassandra, what are the options for ad hoc query/search similar to RDBMS?

Your best options are Spark w/ the DataStax connector or Presto.  Cassandra 
isn't built for ad-hoc queries so you need to use other tools to make it work.

On Thu, Jan 26, 2017 at 2:22 PM Yu, John <john...@sandc.com> wrote:
Hi All,

Hope I can get some help here. We’re using Cassandra for services, and recently
we’ve been adding UI support.
With Cassandra, what are the options for ad hoc query/search similar to RDBMS?
We love the features of Cassandra, but it seems to be a known “weakness” that it
doesn’t come with strong support for indexing and ad hoc queries. There has been
some recent development with SASI as part of secondary indexes. However, I heard
in a video that it should not be used extensively.

Does anyone have much experience with SASI? How does it compare to the Lucene plugin?
What is the direction of Apache Cassandra in the search area?

We’re also looking into Solr or ElasticSearch integration, but it seems it
might take more effort, and possibly involve data duplication.
For Solr, we don’t have DSE.
Sorry if this has been asked before, but I haven’t seen a more complete answer.

Thanks!
John

NOTICE OF CONFIDENTIALITY:
This message may contain information that is considered confidential and which 
may be prohibited from disclosure under applicable law or by contractual 
agreement. The information is intended solely for

RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John

Does this work with Cassandra, or provide an alternative? Thanks.

From: vincent gromakowski [mailto:vincent.gromakow...@gmail.com]
Sent: Monday, January 30, 2017 11:38 PM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

I gave Spark + FiloDB a try, and it's very interesting for ad hoc queries.


Re: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread vincent gromakowski

I gave Spark + FiloDB a try, and it's very interesting for ad hoc queries.


Re: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread siddharth verma

Hi,
*Are you using the DataStax connector as well?*
Yes, we used it to query the Lucene index.

*Does it support querying against any column well (not just clustering
columns)?*
Yes, it does. We used Lucene specifically for this purpose.
For more details, see:
1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
2. https://www.youtube.com/watch?v=Hg5s-hXy_-M

*I’m wondering how it could build the index around them “on-the-fly”*
You can build indexes at run time, but it takes time. It took a long time on
our cluster, and CPU utilization went through the roof.

*Did you use Spark for the full set of data or just partial?*
We weren't allowed to install Spark (a tech decision).
There are some tech discussions going on about the bulk job ecosystem.

Hence, as a workaround, we used a faster scan utility.
For all ad hoc purposes and scripts, you could do a full scan.
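
As a rough illustration of what a parallel full scan involves (a sketch only, not taken from any particular scan utility): the token ring is split into sub-ranges and each sub-range is read with its own token() range query, so the reads can be spread over workers. The sketch assumes the default Murmur3Partitioner; the keyspace, table, and column names are hypothetical, and it only builds the CQL strings rather than executing them.

```python
# Minimal sketch: split the Murmur3 token ring into contiguous sub-ranges and
# build one CQL range query per sub-range. All names passed in are hypothetical.

MIN_TOKEN = -2**63      # lowest token value under Murmur3Partitioner
MAX_TOKEN = 2**63 - 1   # highest token value under Murmur3Partitioner


def split_token_ring(num_splits):
    """Return (start, end) pairs that cover the whole ring without gaps."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(keyspace, table, partition_key, num_splits):
    """Build one SELECT per sub-range; only the first range includes its lower bound."""
    queries = []
    for i, (start, end) in enumerate(split_token_ring(num_splits)):
        lower_op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) {lower_op} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries
```

Each generated query can then be handed to a separate worker. Any extra filtering on non-key columns would still have to happen on the client side, which is why this stays a full scan.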

I hope it helps.

Regards



-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)

RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John

A follow-up question: did you use Spark on the full data set or just part of 
it? In our case, I feel we need all the data to support ad hoc queries 
(with multiple conditional filters).

Thanks,
John


RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John

Thanks for the input! Are you using the DataStax connector as well? Does it 
support querying against any column well (not just clustering columns)? I’m 
wondering how it could build the index around them “on-the-fly”.

Regards,
John


Re: [External] Re: Cassandra ad hoc search options

2017-01-27 Thread siddharth verma

Hi
We used the Stratio Lucene plugin with C* 3.0.3.

It helped solve a lot of read patterns and served well for prefix searches.
But it created problems, as repairs failed repeatedly.
We might have used it suboptimally, not sure.

Later, we had to do away with it, and tried to serve most of the read
patterns with materialised views (currently C* 3.0.9).
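
To illustrate the materialised-view approach: a view re-keys the base table by the column you want to query on. Below is a minimal sketch that only builds the CQL statement. The keyspace, table, and column names (app, person, person_id, last_name) are hypothetical, and it assumes a Cassandra version with CREATE MATERIALIZED VIEW (3.0 or later).

```python
def materialized_view_cql(keyspace, base_table, view_name,
                          base_primary_key, new_partition_key):
    """Build a CREATE MATERIALIZED VIEW statement keyed by `new_partition_key`.

    `base_primary_key` lists the base table's primary key columns in order.
    """
    # Every base primary key column must also be part of the view's primary key.
    clustering = [c for c in base_primary_key if c != new_partition_key]
    view_key = [new_partition_key] + clustering
    # Each view primary key column must be restricted with IS NOT NULL.
    not_null = " AND ".join(f"{c} IS NOT NULL" for c in view_key)
    primary_key = f"(({new_partition_key})" + "".join(f", {c}" for c in clustering) + ")"
    return (
        f"CREATE MATERIALIZED VIEW {keyspace}.{view_name} AS "
        f"SELECT * FROM {keyspace}.{base_table} "
        f"WHERE {not_null} "
        f"PRIMARY KEY {primary_key}"
    )


# Hypothetical usage: look up people by last name instead of person_id.
print(materialized_view_cql("app", "person", "person_by_last_name",
                            ["person_id"], "last_name"))
```

This only covers read patterns known in advance. Each view is one more table that is written on every update, so it does not replace a truly ad hoc query path.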

Currently, for ad hoc queries, we use Spark or a full scan.

Regards,



-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)

RE: [External] Re: Cassandra ad hoc search options

2017-01-26 Thread Yu, John

Thanks a lot. Would you mind sharing a couple of points where you feel it’s 
better than the alternatives?

Regards,
John

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Thursday, January 26, 2017 2:33 PM
To: user@cassandra.apache.org
Subject: [External] Re: Cassandra ad hoc search options

> With Cassandra, what are the options for ad hoc query/search similar to RDBMS?

Your best options are Spark w/ the DataStax connector or Presto.  Cassandra 
isn't built for ad-hoc queries so you need to use other tools to make it work.

On Thu, Jan 26, 2017 at 2:22 PM Yu, John <john...@sandc.com> wrote:
Hi All,

Hope I can get some help here. We’re using Cassandra for services, and recently 
we started adding UI support.
With Cassandra, what are the options for ad hoc query/search similar to an RDBMS? 
We love the features of Cassandra, but it seems to be a known “weakness” that it 
doesn’t come with strong support for indexing and ad hoc queries. There has been 
some recent development with SASI as part of secondary indexing. However, I heard 
in a video that it should not be used extensively.

Does anyone have much experience with SASI? How does it compare to the Lucene plugin?
What is the direction of Apache Cassandra in the search area?

We’re also looking into Solr or ElasticSearch integration, but it seems it 
might take more effort, and possibly involve data duplication.
For Solr, we don’t have DSE.
Sorry if this has been asked before, but I haven’t seen a more complete answer.

Thanks!
John

NOTICE OF CONFIDENTIALITY:
This message may contain information that is considered confidential and which 
may be prohibited from disclosure under applicable law or by contractual 
agreement. The information is intended solely for the use of the individual or 
entity named above. If you are not the intended recipient, you are hereby 
notified that any disclosure, copying, distribution or use of the information 
contained in or attached to this message is strictly prohibited. If you have 
received this email transmission in error, please notify the sender by replying 
to this email and then delete it from your system.
