Re: Counters question - is there a better way to count

2013-12-05 Thread Andy Twigg
How many distinct uid,someid pairs will you have?
On Dec 5, 2013 3:44 PM, "Christopher Wirt" wrote:

> I want to build a really simple column family which counts the occurrence
> of a single event X.
>
> Once we reach Y occurrences of X, the counter resets to 0.
>
> The obvious way to do this is with a counter CF.
>
> CREATE TABLE xcounter1 (
>   id uuid,
>   someid int,
>   count counter,
>   PRIMARY KEY (id, someid)
> );
>
> This is how I’ve always done it in the past, but I’ve been told to avoid
> counters for various reasons (performance, consistency, etc.).
>
> I’m not too bothered about 100% absolute consistency; however, read
> performance is certainly a big concern.
>
> So I was thinking that, to avoid using counters, I could do something like this:
>
> CREATE TABLE xcounter2 (
>   id uuid,
>   someid int,
>   time timeuuid,
>   PRIMARY KEY (id, someid, time)
> );
>
> Then retrieve all events for an (id, someid) pair and count them in memory.
> Delete all rows for that pair once I hit Y.
>
> Or I could do this:
>
> CREATE TABLE xcounter3 (
>   id uuid,
>   someid int,
>   time timeuuid,
>   Ycount int,
>   PRIMARY KEY (id, someid, time)
> );
>
> Insert a running ‘Ycount’ on each occurrence of the event.
> On reading, retrieve only the most recently inserted Ycount value.
> Then delete all records once I hit the magic Y value.
>
> Anyone have any interesting thoughts or insight on what is likely to give
> me the best read performance?
>
> There will be hundreds of someid values for each id. Reads will be 5-10x the writes.
>
> Thanks,
>
> Chris
>
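
The two non-counter designs in the quoted message can be compared with a small in-memory model. This is only a sketch under stated assumptions: plain Python dictionaries stand in for the tables, the class names and the threshold Y = 3 are hypothetical, and nothing here talks to Cassandra. It does show one difference that matters for reads: the xcounter2-style read touches every event row, while the xcounter3-style read touches only the newest row but needs the previous count at write time.

```python
from collections import defaultdict

Y = 3  # hypothetical reset threshold


class EventLogCounter:
    """xcounter2-style: one row per event, counted in memory on read."""

    def __init__(self, limit):
        self.limit = limit
        self.rows = defaultdict(list)  # (id, someid) -> list of timestamps

    def record(self, key, ts):
        self.rows[key].append(ts)
        if len(self.rows[key]) >= self.limit:
            del self.rows[key]  # models deleting every row for the pair

    def read(self, key):
        # Work grows with the number of stored events for the pair.
        return len(self.rows.get(key, []))


class RunningCountCounter:
    """xcounter3-style: each event row carries the running count."""

    def __init__(self, limit):
        self.limit = limit
        self.rows = defaultdict(list)  # (id, someid) -> list of (ts, ycount)

    def record(self, key, ts):
        rows = self.rows[key]
        previous = rows[-1][1] if rows else 0  # needs the last count first
        rows.append((ts, previous + 1))
        if previous + 1 >= self.limit:
            del self.rows[key]

    def read(self, key):
        rows = self.rows.get(key)
        return rows[-1][1] if rows else 0  # only the newest row is inspected


if __name__ == "__main__":
    key = ("some-uuid", 7)
    for counter in (EventLogCounter(Y), RunningCountCounter(Y)):
        counter.record(key, 1)
        counter.record(key, 2)
        print(type(counter).__name__, counter.read(key))
```

Both models report the same counts; the choice between them is about how much data a read has to touch versus how much work a write has to do.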


Re: Is there update-in-place on maps?

2013-08-06 Thread Andy Twigg
Counters can be atomically incremented (
http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter,
and use that: c=map.get(k); c.incr()
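
One way to read the suggestion is as an extra level of indirection: the map value is not the count itself but the UUID of a separately stored counter. The following is a minimal in-memory sketch of that idea, not Cassandra client code; the names `pointer_map`, `counters` and `incr` are hypothetical, and a `dict` plus a `collections.Counter` stand in for the map column and the counter table.

```python
import uuid
from collections import Counter

pointer_map = {}      # stands in for the map: key -> UUID of a counter
counters = Counter()  # stands in for a counter table keyed by UUID


def incr(k, delta=1):
    """Look up (or create) the counter that k points to, then increment it."""
    counter_id = pointer_map.setdefault(k, uuid.uuid4())
    counters[counter_id] += delta
    return counters[counter_id]


if __name__ == "__main__":
    incr("clicks")
    incr("clicks")
    print(counters[pointer_map["clicks"]])
```

The map entry is written once per key and never changes afterwards; all later updates go to the counter it points to.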


On 6 August 2013 11:01, Jan Algermissen wrote:

>
> On 06.08.2013, at 11:36, Andy Twigg wrote:
>
> > Store pointers to counters as map values?
>
> Sorry, but this fits into nothing I know about C* so far - can you explain?
>
> Jan
>
>


-- 
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538


Re: Is there update-in-place on maps?

2013-08-06 Thread Andy Twigg
Store pointers to counters as map values?


Re: random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Andy Twigg
How would you implement range queries?



On 29 May 2013 17:49, Hiller, Dean wrote:

> We recently ran into too much data in one CF because LCS can't really run
> in parallel on one CF in a single tier, which got me thinking: why doesn't
> the CF directory have 100 or 1000 directories (0-999), with Cassandra hashing
> the key to choose a directory and then putting it in one of the
> sstables in that directory?  This would lead to
>
>  1.  Parallel compaction of LCS in a single CF: faster
> compactions, since there is less to sort in each directory (and it can be
> done in parallel too)
>  2.  Help with fast key lookups, as the key hashes to one of the 1000
> directories very quickly and then just needs to be found in one of the
> sstables, which are sorted (there would be 1000x fewer sstables in each
> directory than in one big CF)
>
> Am I on crack here? Or does that seem like it would be a pretty good
> direction to go?
>
> Maybe this is only because our system has 98% of its data in one CF while
> other systems have 10% of their data in each CF though.  I still tend to
> think a lot of people will end up with 80% of their data in one CF and 20%
> in all the other CFs. Isn't the Pareto principle a natural tendency, and if it
> is, maybe the above feature should be considered?
>
> Later,
> Dean
>
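
The layout proposed in the quoted message, and the range-query question asked in reply, can be illustrated with a toy model. This is a sketch under assumptions, not Cassandra's storage engine: `NUM_DIRS`, `HashedStore` and the use of CRC32 as the hash are all hypothetical, and each "directory" is just one sorted list of keys. A point lookup touches a single directory, while a range scan has to visit every directory and merge the results, because hashing scatters adjacent keys.

```python
import bisect
import zlib

NUM_DIRS = 8  # hypothetical; the quoted message suggests 100 or 1000


def dir_for(key: str) -> int:
    """Hash a key to the directory that would hold it."""
    return zlib.crc32(key.encode("utf-8")) % NUM_DIRS


class HashedStore:
    def __init__(self):
        # One sorted list of keys per directory.
        self.dirs = [[] for _ in range(NUM_DIRS)]

    def put(self, key: str) -> None:
        d = self.dirs[dir_for(key)]
        i = bisect.bisect_left(d, key)
        if i == len(d) or d[i] != key:
            d.insert(i, key)

    def contains(self, key: str) -> bool:
        # Point lookup: exactly one directory is searched.
        d = self.dirs[dir_for(key)]
        i = bisect.bisect_left(d, key)
        return i < len(d) and d[i] == key

    def range(self, lo: str, hi: str) -> list:
        # Range scan over [lo, hi): every directory must be consulted.
        out = []
        for d in self.dirs:
            out.extend(d[bisect.bisect_left(d, lo):bisect.bisect_left(d, hi)])
        return sorted(out)


if __name__ == "__main__":
    store = HashedStore()
    for k in "abcdefghij":
        store.put(k)
    print(store.contains("c"), store.range("c", "f"))
```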



-- 
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538


Re: subscribe request

2013-02-14 Thread Andy Twigg
i was hoping for a rick roll.

On 14 February 2013 16:55, Eric Evans wrote:
> This is new.
>
> On Thu, Feb 14, 2013 at 9:24 AM, Muntasir Raihan Rahman
>  wrote:
>>
>>
>> --
>> Best Regards
>> Muntasir Raihan Rahman
>> Email: muntasir.rai...@gmail.com
>> Phone: 1-217-979-9307
>> Department of Computer Science,
>> University of Illinois Urbana Champaign,
>> 3111 Siebel Center,
>> 201 N. Goodwin Avenue,
>> Urbana, IL  61801
>
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu


Fwd: British Conference on Databases 2013 - Big Data special.

2013-01-14 Thread Andy Twigg
Dear all,

[Apologies if you receive this CFP multiple times or are uninterested]

I am organizing the British National Conference on Databases (BNCOD) this year and we
would very much like to see some industrial contributions around Big Data.
How have you used Hadoop, HBase, Cassandra, or machine learning techniques in
a way that others might want to know about? All contributions are
peer-reviewed so the quality should be somewhat high. Get in touch with me
if you have any questions.

29th British National Conference on Databases
University of Oxford, United Kingdom
8-10 July 2013
http://www.cs.ox.ac.uk/bncod2013/

CALL FOR PAPERS
Abstract deadline: 31 January 2013
Paper deadline: 7 February 2013


BNCOD 2013 seeks research papers for presentation at the conference
and subsequent publication. It welcomes research papers on a broad
range of topics related to data-centric computation. For some years,
every edition of BNCOD has centred around a main theme, acting as a
focal point for keynote addresses, tutorials, and research papers. The
theme of BNCOD 2013 will be Big Data; it encompasses a growing need to
manage data that is too big, too fast, or too hard for the existing
technology (Sam Madden: From Databases to Big Data. IEEE Internet
Computing 16(3): 4-6 (2012)).

BNCOD promises a very exciting programme featuring keynotes and
tutorials by distinguished researchers.

Christoph Koch will speak on Compilation and Synthesis in Big Data
Analytics and Dan Suciu will speak on Big Data Begets Big Database
Theory. There will be a further keynote by Peter Buneman.

There will also be tutorials on Querying Big Social Data by Wenfei Fan
and on Big Data Analytics by Chris Re.


TOPICS OF INTEREST

The topics listed below are intended as a sample; we encourage
submissions on all data-centric topics.

Systems for Data Management: data system architecture; storage,
replication and consistency; physical representations; query and
dataflow processing

Scalable Data Analysis: complex queries and search; approximate
querying; scalable statistical methods; management of uncertainty and
reasoning at scale; data privacy and security; data mining and
knowledge discovery

Management of Very Large Data Systems: availability; adaptivity and
self-tuning; power management; virtualization

Data Models and Languages: XML and semi-structured data; multi-media,
temporal and spatial data; data streams; declarative languages;
language interfaces for databases

Domain-Specific Data Management: methods and systems for science;
networks and mobility; ubiquitous computing; sensor databases

Management of Web and Heterogeneous Data: information extraction;
information integration; meta-data management; data cleaning; service
oriented architectures

User Interfaces and Social Data: data visualization; collaborative
data analysis and curation; social networks; email and messaging
analytics

Data and Knowledge: knowledge base management; reasoning over
incomplete and/or inconsistent data; ontology-based data access;
ontology querying; semantic query optimization; storing and
manipulating RDF data


SUBMISSION GUIDELINES

The conference management tool for the submission of abstracts and
papers is accessible at
http://www.easychair.org/conferences/?conf=bncod2013

Full papers (from 12 to 14 pages), short papers (from 4 to 10 pages),
system descriptions and demonstrations (from 4 to 10 pages) may be
submitted.

As in previous years, papers will be published by Springer as a volume
in the Lecture Notes in Computer Science (LNCS) series. Accepted
papers will only be published if they are presented in person by a
registered author at the conference.

Submissions are reviewed in a single-blind manner. They must be in PDF
and formatted according to the Springer guidelines for the LNCS
series:
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0




-- 
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538


