Similarity Search on Cassandra

2017-06-11 Thread Antonio Mourão
Hi devs,

I am a master's degree student and I'm implementing Similarity Search on
Cassandra.
I have a short paper on it, but it is written in Portuguese. Here is the link:
https://www.researchgate.net/publication/309121006_Busca_por_Similaridade_no_CassandraDB
Currently I have to implement a new operator that represents the similarity
search, something like:

select * from Table where PK SIMILARITY KEY;

This new operator will behave as a mix of direct search and range (or full)
search.
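
To make the "mix of direct and range search" idea concrete, here is a minimal in-memory sketch. It assumes integer keys and a hypothetical distance radius, and it uses a sorted map as a stand-in for a table; the class and method names are illustrative only and have nothing to do with Cassandra's actual read path or with the design in the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SimilaritySketch {
    // Hypothetical stand-in for a table: sorted keys mapped to row values.
    private final NavigableMap<Integer, String> rows = new TreeMap<>();

    public void put(int key, String value) {
        rows.put(key, value);
    }

    // Direct search: at most one row, found by exact key.
    public String direct(int key) {
        return rows.get(key);
    }

    // Range search: every row whose key lies in [from, to], in key order.
    public List<String> range(int from, int to) {
        return new ArrayList<>(rows.subMap(from, true, to, true).values());
    }

    // Similarity search (assumed semantics): rows whose key is within
    // `radius` of the probe key. Radius 0 degenerates to a direct search;
    // anything larger becomes a bounded range search around the key.
    public List<String> similar(int key, int radius) {
        if (radius == 0) {
            List<String> out = new ArrayList<>();
            String hit = direct(key);
            if (hit != null) {
                out.add(hit);
            }
            return out;
        }
        return range(key - radius, key + radius);
    }
}
```

In this picture the choice between the two access paths depends only on the radius, which is the kind of decision point being asked about below.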
I have already found where the parsing is done and where the data is read from the
SSTables, but I cannot find where the decision is made on which type of
search to use (direct or range/full).
I'm analyzing the code and debugging the execution, but I'm lost in the
execution queues: after the system parses the query, I cannot find what
the next step is.
Is there any documentation of the code or any guidance you can give me?

Best regards.


Reg:- Cassandra Data modelling for Search

2017-06-11 Thread @Nandan@
Hi,

I am currently working on data modeling for a video company that has
different types of users as well as different user functionality.
My concern right now is the video search module, which searches on different
fields.

Query patterns are as follows:
1) Select video by actor.
2) Select video by producer.
3) Select video by music.
4) Select video by actor and producer.
5) Select video by actor and music.

Note: In short, we want to build an advanced search module that lets us
search on any combination of fields and get the desired results.

Search also needs partial matching: if a user searches for the title
"Harry", we should return all videos whose title contains "Harry" at any
position.

My idea is to create separate tables such as video_by_actor,
video_by_producer, etc., and implement Solr queries on all tables. Otherwise,
is there any other way to implement this search module
effectively?
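
To illustrate the table-per-query idea, here is a rough in-memory sketch in Java. Each map plays the role of one lookup table keyed by the fields of one query pattern. The class, method, and field names are hypothetical, the music patterns are left out for brevity, and the title scan only stands in for whatever text index (Solr or otherwise) would do partial matching in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VideoSearchSketch {
    // One lookup structure per query pattern, mirroring video_by_actor etc.
    private final Map<String, List<String>> videoByActor = new HashMap<>();
    private final Map<String, List<String>> videoByProducer = new HashMap<>();
    // Composite key (actor, producer) for the combined query pattern.
    private final Map<List<String>, List<String>> videoByActorAndProducer = new HashMap<>();
    private final List<String> titles = new ArrayList<>();

    // Every write fans out to all lookup structures (denormalization).
    public void addVideo(String title, String actor, String producer) {
        titles.add(title);
        videoByActor.computeIfAbsent(actor, k -> new ArrayList<>()).add(title);
        videoByProducer.computeIfAbsent(producer, k -> new ArrayList<>()).add(title);
        videoByActorAndProducer
                .computeIfAbsent(Arrays.asList(actor, producer), k -> new ArrayList<>())
                .add(title);
    }

    public List<String> byActor(String actor) {
        return videoByActor.getOrDefault(actor, Collections.emptyList());
    }

    public List<String> byProducer(String producer) {
        return videoByProducer.getOrDefault(producer, Collections.emptyList());
    }

    public List<String> byActorAndProducer(String actor, String producer) {
        return videoByActorAndProducer.getOrDefault(
                Arrays.asList(actor, producer), Collections.emptyList());
    }

    // Partial title match: exact-key lookups cannot answer this, so the
    // sketch falls back to scanning every title.
    public List<String> titleContains(String fragment) {
        List<String> out = new ArrayList<>();
        for (String t : titles) {
            if (t.contains(fragment)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

The sketch shows the trade-off in this approach: reads are single key lookups, but every new query pattern adds another structure that each write has to update.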

Please suggest.

Best regards,


Re: AutoBoxing overhead in lambda expressions

2017-06-11 Thread Jeff Jirsa
Generally speaking,

Patches that decrease boxing/unboxing overhead are great, and we've made a
point of committing some of those in the past (such as in CASSANDRA-12199
and CASSANDRA-8019), though there are also some examples where we've
decided not to apply boxing changes to critical code paths that didn't have
a meaningful performance impact.
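
As a rough illustration of the kind of change under discussion, the sketch below contrasts a generic functional interface with its primitive specialization from java.util.function. The class and method names are hypothetical and are not taken from any Cassandra patch; whether the difference is measurable on a given code path still has to be benchmarked.

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

public class BoxingSketch {
    // Generic interface: each call boxes the int argument to an Integer
    // and unboxes the Integer result.
    static final Function<Integer, Integer> BOXED_INC = x -> x + 1;

    // Primitive specialization: works on int directly, no Integer objects.
    static final IntUnaryOperator PRIMITIVE_INC = x -> x + 1;

    static long sumBoxed(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += BOXED_INC.apply(i); // autoboxing on the way in and out
        }
        return total;
    }

    static long sumPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += PRIMITIVE_INC.applyAsInt(i); // stays primitive throughout
        }
        return total;
    }
}
```

Both methods compute the same result; only the amount of boxing differs.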

I would expect that if you refactor it in chunks - that is, break it down
into a handful of patches, where each patch is contained to a specific
subsystem (compaction, commitlog, repair, messaging, etc) so it could be
reasonably reviewed - such refactoring would likely be appreciated,
especially if it came with quantifiable performance increase (using
something like cstar https://github.com/datastax/cstar_perf or stress runs
or microbenchmarks or similar).

- Jeff


On Fri, Jun 9, 2017 at 10:50 AM, Ameya Sanjay Sanjay Ketkar <
ketk...@oregonstate.edu> wrote:

> Hello,
>
> I am a graduate researcher at Oregon State University, currently studying
> lambda expressions in Java open source projects.
> Using a generic functional interface where a specialized primitive
> functional interface could be used is one of the lambda smells I am focussing
> on.
> For instance, Function<Integer, Integer> could be IntUnaryOperator.
>
> https://github.com/apache/cassandra/pull/116
>
> is an instance of refactoring such a lambda smell.
> I would love to contribute to this project by refactoring such lambda
> smells.
>
> I am developing a tool to detect such opportunities.
> But all the refactorings are done manually, and every instance is checked and
> verified by me before creating a pull request.
>
> Regards,
> Ameya Ketkar
>
> Graduate Research Assistant
> Oregon State University