Re: [Status Update] Apache Cassandra backend for Sling

2013-06-30 Thread Dishara Wijewardana
On Fri, Jun 28, 2013 at 4:37 AM, Ian Boston  wrote:

> Hi,
> Have you tried the TypeInferringSerializer for the value serializer ?
> That claims to detect what the column value is based on the byte array.
>
> Failing that, I would consider making everything byte[] and using your own
> serializer that writes and reads values to a byte[] using DataInputStream
> and DataOutputStream.
>
> [2] is an example of a serializer written for that purpose that was used
> with Cassandra over raw Thrift. It's not easy to read what it outputs to the
> storage layer, but it is compact and efficient. I would not use it directly,
> as it does some very specific things like slicing large byte[]s into 1MB
> chunks and bypassing the 64K limit on reading and writing UTF8 strings with
> DataInputStream.
>
> Try the TypeInferringSerializer first. If it works, great; no need to do
> anything more complex.
>

Hi,
In fact I was able to add as many params as I wanted with the same
configuration. TypeInferringSerializer looks useful too, though, and I
might need it in the future.
Also, rather than storing resource metadata as String values, how about
storing a serialized object as you mentioned? It would be clearer, but I
am not sure about the performance. With multi-valued columns like
metadata, we currently have to insert the values into a single String as
comma-separated values. Would it scale better if we had a Bean for
CassandraResource? What do you think?

I have also done a first cut of this, with many TODOs ;-). The getResource
method is implemented and currently prints all the content, but I have
not implemented the methods in CassandraResource yet. This is just a POC
to test whether the proposed model works, and apparently it does [1]. See
the CassandraDataPopulator class, a plain Java test class added for the
moment to test the POC. (I am moving this to a proper JUnit test.)

TODOs
- Finish the end-to-end implementation of CassandraResource,
CassandraResourceProvider, etc. (in progress).
- Move to the JUnit test framework and write more tests for each scenario,
which I can later extend with Mockito (I am still not clear how Mockito
comes into the picture).
- Change the implementation based on feedback from the community.
- Parameterize the constants as much as possible so they are read from a
properties file.


[1] -
https://cassandra-backend-for-sling.googlecode.com/svn/trunk/main/cassandra

Thanks

>
>
> Ian
>
> 1
>
> http://hector-client.github.io/hector/source/content/API/core/0.8.0-2/me/prettyprint/cassandra/serializers/TypeInferringSerializer.html
>
> 2
>
> https://github.com/ieb/sparsemapcontent/tree/master/core/src/main/java/org/sakaiproject/nakamura/lite/storage/spi/types
>
>
> On 28 June 2013 05:14, Dishara Wijewardana 
> wrote:
>
> > Hi Ian,
> > I am having a problem with CQL..
> >
> > For example:
> > CqlQuery<String,String,Long> cqlQuery = new CqlQuery<String,String,Long>(
> >     keyspace, new StringSerializer(), new StringSerializer(),
> >     new LongSerializer());
> > cqlQuery.setQuery("insert into mytable (KEY,password,gender,userid) "
> >     + "values (3,'pass1','male',34);");
> > QueryResult<CqlRows<String,String,Long>> result = cqlQuery.execute();
> >
> > This will successfully insert the row with pass1,male and 34 values under
> > rowId=3.
> >
> > But in the Sling scenario we have more columns, so we need more
> > serializers for a query, as follows:
> > CqlQuery<String,String,String> cqlQuery = new CqlQuery<String,String,String>(
> >     keyspace, new StringSerializer(), new StringSerializer(),
> >     new StringSerializer(), new StringSerializer());
> > cqlQuery.setQuery("insert into mytable "
> >     + "(KEY,path,resourceType,resourceSuperType,metadata) values "
> >     + "(3,'/content/cassandra/foo/bar','nt:cassandra','nt:super','metadata');");
> > QueryResult<CqlRows<String,String,String>> result = cqlQuery.execute();
> >
> > Here I am using the me.prettyprint.cassandra.model.CqlQuery class. Any
> > idea how to proceed with this?
> >
> > Am I doing something wrong, or is this a limitation of the API I am
> > using?
> >
> >
> > On Thu, Jun 27, 2013 at 7:41 AM, Dishara Wijewardana <
> > ddwijeward...@gmail.com> wrote:
> >
> > >
> > >
> > > On Thu, Jun 27, 2013 at 4:26 AM, Ian Boston  wrote:
> > >
> > >> On 27 June 2013 02:34, Dishara Wijewardana 
> > >> wrote:
> > >>
> > >> > On Tue, Jun 25, 2013 at 4:52 AM, Ian Boston  wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > (I might have errors in the CQL and Cassandra schema, and the
> > >> > > functions need proper escaping)
> > >> > >
> > >> > >
> > >> > > Example 1:
> > >> > > Zero depth tree with UUID as the rowid or key.
> > >> > >
> > >> > > URL /content/cassandra/pictures/13f58d5c95c70b6f
> > >> > >
> > >> > > then the column family is pictures and the URL -> ROWID function
> > >> > > just results in the ROWID being 13f58d5c95c70b6f and
> > >> > >
> > >> > > String cql = mapOfCassandraMappers.get("pictures")
> > >> > >     .getCQL("pictures", "13f58d5c95c70b6f");
> > >> > > System.err.println(cql);
> > >> > >
> > >> > > where

Re: [Status Update] Apache Cassandra backend for Sling

2013-06-30 Thread Ian Boston
Hi Dishara,

I've taken the liberty of creating a code review at [1]. It covers all
commits, and I've emailed you the comments separately. I think it would be
good to get into the habit of looking at the code this way, as it often
removes confusion introduced by the English language (which has many
compilers ;), and mine has been known to be buggy at times).


More comments inline below: (BTW, excellent progress!)

Best Regards
Ian


1 https://codereview.appspot.com/10811044/



On 30 June 2013 22:52, Dishara Wijewardana  wrote:

> On Fri, Jun 28, 2013 at 4:37 AM, Ian Boston  wrote:
>
> > Hi,
> > Have you tried the TypeInferringSerializer for the value serializer ?
> > That claims to detect what the column value is based on the byte array.
> >
> > Failing that, I would consider making everything byte[] and using your
> > own serializer that writes and reads values to a byte[] using
> > DataInputStream and DataOutputStream.
> >
> > [2] is an example of a serializer written for that purpose that was used
> > with Cassandra over raw Thrift. It's not easy to read what it outputs to
> > the storage layer, but it is compact and efficient. I would not use it
> > directly, as it does some very specific things like slicing large byte[]s
> > into 1MB chunks and bypassing the 64K limit on reading and writing UTF8
> > strings with DataInputStream.
> >
> > Try the TypeInferringSerializer first. If it works, great; no need to do
> > anything more complex.
> >
>
> Hi,
> In fact I was able to add as many params as I wanted with the same
> configuration. TypeInferringSerializer looks useful too, though, and I
> might need it in the future.
> Also, rather than storing resource metadata as String values, how about
> storing a serialized object as you mentioned?


I suspect that TypeInferringSerializer will do a better job of serializing
than the approach I mentioned. Only consider writing your own if there is
a real and demonstrated need for it.
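
For reference, the "write your own" fallback mentioned above could look roughly like the sketch below. It is a minimal illustration using only java.io, assuming a one-byte type tag followed by the payload; the class name TypedValueCodec and the tag values are hypothetical and are not taken from Hector or from the serializer linked at [2].

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: encodes a value as a one-byte type tag followed by its payload. */
final class TypedValueCodec {
    private static final byte TYPE_STRING = 1;
    private static final byte TYPE_LONG = 2;
    private static final byte TYPE_BOOLEAN = 3;

    private TypedValueCodec() {}

    /** Serializes a String, Long or Boolean into a byte[]. */
    static byte[] encode(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value instanceof String) {
                out.writeByte(TYPE_STRING);
                // writeUTF is limited to 65535 encoded bytes; longer strings need chunking.
                out.writeUTF((String) value);
            } else if (value instanceof Long) {
                out.writeByte(TYPE_LONG);
                out.writeLong((Long) value);
            } else if (value instanceof Boolean) {
                out.writeByte(TYPE_BOOLEAN);
                out.writeBoolean((Boolean) value);
            } else {
                throw new IllegalArgumentException("Unsupported type: " + value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the type tag and returns the value as the matching Java type. */
    static Object decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte type = in.readByte();
            switch (type) {
                case TYPE_STRING: return in.readUTF();
                case TYPE_LONG: return in.readLong();
                case TYPE_BOOLEAN: return in.readBoolean();
                default: throw new IOException("Unknown type tag: " + type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The output is compact but opaque in the storage layer, which is the trade-off described above and one reason to prefer TypeInferringSerializer unless a real need shows up.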


> It would be clearer, but I am not sure about the performance. With
> multi-valued columns like metadata, we currently have to insert the
> values into a single String as comma-separated values. Would it scale
> better if we had a Bean for CassandraResource? What do you think?
>

Put one property per column in Cassandra if possible. IIRC it does a good
job of serializing data, and doesn't need a pre-defined schema as
traditional RDBMSs do. The serialisation I mentioned was mostly used to
get schemaless storage into an RDBMS.
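
As a rough illustration of "one property per column", the sketch below flattens a property map into column name/value pairs. The class name PropertyColumns and the name[index] convention for multi-valued properties are assumptions made for this example only; they are not part of the proposed model or of any Cassandra API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps each resource property to its own column. */
final class PropertyColumns {
    private PropertyColumns() {}

    /**
     * Flattens properties into column name -> value.
     * A List value becomes one indexed column per element (assumed convention).
     */
    static Map<String, String> toColumns(Map<String, Object> properties) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof List) {
                List<?> values = (List<?>) value;
                for (int i = 0; i < values.size(); i++) {
                    columns.put(entry.getKey() + "[" + i + "]", String.valueOf(values.get(i)));
                }
            } else {
                columns.put(entry.getKey(), String.valueOf(value));
            }
        }
        return columns;
    }
}
```

Unlike a comma-separated String, this keeps a value such as "a,b" intact, because no delimiter has to be parsed back out of the stored data.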



>
> I have also done a first cut of this, with many TODOs ;-). The getResource
> method is implemented and currently prints all the content, but I have
> not implemented the methods in CassandraResource yet. This is just a POC
> to test whether the proposed model works, and apparently it does [1].


Yes, this is a great start! I didn't find too many issues with the approach,
as you will see from the comments on the code review.




> See the CassandraDataPopulator class, a plain Java test class added for
> the moment to test the POC. (I am moving this to a proper JUnit test.)
>

Good.


>
> TODOs
> - Finish the end-to-end implementation of CassandraResource,
> CassandraResourceProvider, etc. (in progress).
> - Move to the JUnit test framework and write more tests for each scenario,
> which I can later extend with Mockito (I am still not clear how Mockito
> comes into the picture).
>

When you write the unit tests, if you find that you need to hand-implement
anything (e.g. ResourceResolver) to make your tests work, don't. Use mocks.
You can even mock concrete classes, so you could mock the behaviour of the
Hector API to respond in a pre-defined way to certain CQL queries. This will
eliminate the need to have a real Cassandra server present when running the
basic unit tests.
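
To make the "pre-defined responses" idea concrete without pulling in a dependency, here is a sketch that uses java.lang.reflect.Proxy from the JDK as a stand-in for a mocking library such as Mockito. The CqlExecutor interface is hypothetical, a seam one might place in front of the Hector calls; it is not part of Hector or Sling, and a dynamic proxy only works for interfaces, whereas Mockito can also mock concrete classes.

```java
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical seam in front of the Hector calls; not part of Hector or Sling. */
interface CqlExecutor {
    Map<String, String> fetchRow(String cql);
}

final class CannedCql {
    private CannedCql() {}

    /**
     * Builds a CqlExecutor that answers known queries from a fixed table
     * and returns an empty row for any other query.
     */
    static CqlExecutor mockExecutor(Map<String, Map<String, String>> cannedRows) {
        Map<String, Map<String, String>> rows = new HashMap<>(cannedRows);
        return (CqlExecutor) Proxy.newProxyInstance(
                CqlExecutor.class.getClassLoader(),
                new Class<?>[] {CqlExecutor.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("fetchRow")) {
                        Map<String, String> row = rows.get((String) args[0]);
                        return row != null ? row : Collections.<String, String>emptyMap();
                    }
                    // Only fetchRow is supported in this sketch.
                    throw new UnsupportedOperationException(method.getName());
                });
    }
}
```

A test can then hand the canned executor to the code under test and assert on the resulting resource, with no Cassandra server running.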




> - Change the implementation based on feedback from the community.
> - Parameterize the constants as much as possible so they are read from a
> properties file.
>

These should come from OSGi properties. See the comments on
CassandraResourceProvider.
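
A minimal sketch of what reading such settings could look like, assuming the component receives its configuration as a Map on activation. The class name CassandraSettings, the property names, and the defaults are all hypothetical and chosen only for illustration (9160 is the usual Thrift port).

```java
import java.util.Map;

/** Hypothetical settings holder; property names and defaults are illustrative only. */
final class CassandraSettings {
    static final String PROP_HOST = "cassandra.host";
    static final String PROP_PORT = "cassandra.port";
    static final String PROP_KEYSPACE = "cassandra.keyspace";

    final String host;
    final int port;
    final String keyspace;

    /** Reads each setting from the configuration map, falling back to a default when absent. */
    CassandraSettings(Map<String, Object> config) {
        this.host = asString(config.get(PROP_HOST), "localhost");
        this.port = asInt(config.get(PROP_PORT), 9160);
        this.keyspace = asString(config.get(PROP_KEYSPACE), "sling");
    }

    private static String asString(Object value, String fallback) {
        return value == null ? fallback : value.toString();
    }

    private static int asInt(Object value, int fallback) {
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        // Configuration values may also arrive as strings.
        return value == null ? fallback : Integer.parseInt(value.toString().trim());
    }
}
```

Keeping the parsing in one place means the provider itself never touches raw configuration keys, and the constants no longer need a separate properties file.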






>
>
> [1] -
> https://cassandra-backend-for-sling.googlecode.com/svn/trunk/main/cassandra
>
> Thanks
>

Excellent progress, thank you!
Ian


>
> >
> >
> > Ian
> >
> > 1
> >
> >
> http://hector-client.github.io/hector/source/content/API/core/0.8.0-2/me/prettyprint/cassandra/serializers/TypeInferringSerializer.html
> >
> > 2
> >
> >
> https://github.com/ieb/sparsemapcontent/tree/master/core/src/main/java/org/sakaiproject/nakamura/lite/storage/spi/types
> >
> >

Sling Proposal

2013-06-30 Thread Ben Zahler
Hi all,
I have done some work on selectors and security in CQ lately, and in the
process I've had an idea for how to handle some of the issues in Sling.
From my point of view, this could well be integrated into Sling, but it can
also easily work as an addition, so I'd like to hear some feedback from you.

The basic idea is to have the developer of a component/template define the 
selectors allowed on the component. I've used a property sling:allowedSelectors 
to do so.
In a servlet filter, we can then check all the allowed selectors in the
application and verify whether the request's selectors are valid.
Of course, there are quite a few open questions/points:

  *   Should the allowed selectors be cached?
  *   Servlets with a sling.servlet.selectors property need to be included as well
  *   Should the sling:allowedSelectors configuration be component or template 
based? Component based means the definition is where the selectors are actually 
implemented, template based provides more accurate means of checking whether 
request selectors are valid.
  *   How can multisites be configured?

Attached is a very basic implementation of the Servlet Filter. Be aware that 
installing this into a CQ author instance will break some things as the default 
CQ selectors are not supported.
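
The core check described above can be sketched independently of the servlet API. The example below is not the attached filter; it is a simplified illustration in which the class name SelectorCheck is hypothetical and the selectors are parsed from the last path segment only. In a real Sling filter they would more likely come from SlingHttpServletRequest.getRequestPathInfo().getSelectors(), and the allowed set from the sling:allowedSelectors property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating the allowed-selectors check. */
final class SelectorCheck {
    private SelectorCheck() {}

    /** Extracts selectors from the last path segment: name.sel1.sel2.ext -> [sel1, sel2]. */
    static List<String> selectors(String path) {
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        String[] parts = lastSegment.split("\\.");
        List<String> result = new ArrayList<>();
        // parts[0] is the resource name and the final part is the extension.
        for (int i = 1; i < parts.length - 1; i++) {
            result.add(parts[i]);
        }
        return result;
    }

    /** Returns true only if every selector in the request path is in the allowed set. */
    static boolean allAllowed(String path, Set<String> allowed) {
        return allowed.containsAll(selectors(path));
    }
}
```

A filter built on this would reject the request (for example with a 404) when allAllowed returns false, and pass it down the chain otherwise.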

So basically, my question to you is if you think this is an interesting feature 
or if you consider this rather unnecessary. ;-)

Best regards
Ben Zahler

Inside Solutions AG | Felsenstrasse 11 | 4450 Sissach | Switzerland
Phone: +41 61 551 00 40 | Direct: +41 61 551 00 43
http://www.inside-solutions.ch