[jira] [Commented] (DRILL-92) Cassandra storage engine

2021-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274215#comment-17274215
 ] 

ASF GitHub Bot commented on DRILL-92:
-

vvysotskyi merged pull request #2152:
URL: https://github.com/apache/drill/pull/2152


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Cassandra storage engine
> ------------------------
>
>                 Key: DRILL-92
>                 URL: https://issues.apache.org/jira/browse/DRILL-92
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Steven Phillips
>            Assignee: Vova Vysotskyi
>            Priority: Major
>             Fix For: 1.19.0
>
>         Attachments: DRILL-92-Cassandra-Storage.patch, DRILL-92-CassandraStorage.patch, DRILL-92.patch, DRILL-CASSANDRA.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-92) Cassandra storage engine

2021-01-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274085#comment-17274085
 ] 

ASF GitHub Bot commented on DRILL-92:
-

cgivre commented on pull request #2152:
URL: https://github.com/apache/drill/pull/2152#issuecomment-769509780


   @vvysotskyi Can we merge this?





[jira] [Commented] (DRILL-92) Cassandra storage engine

2021-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270429#comment-17270429
 ] 

ASF GitHub Bot commented on DRILL-92:
-

vvysotskyi commented on a change in pull request #2152:
URL: https://github.com/apache/drill/pull/2152#discussion_r562906079



##
File path: contrib/storage-cassandra/src/main/java/org/apache/drill/exec/store/cassandra/CassandraStoragePlugin.java
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.cassandra;
+
+import org.apache.calcite.adapter.cassandra.CalciteUtils;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.exec.ops.OptimizerRulesContext;
+import org.apache.drill.exec.planner.PlannerPhase;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.AbstractStoragePlugin;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.SchemaFactory;
+import org.apache.drill.exec.store.cassandra.schema.CassandraRootDrillSchemaFactory;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableSet;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Set;
+
+public class CassandraStoragePlugin extends AbstractStoragePlugin {
+
+  private final CassandraStorageConfig config;
+  private final SchemaFactory schemaFactory;
+
+  public CassandraStoragePlugin(
+      CassandraStorageConfig config, DrillbitContext context, String name) {
+    super(context, name);
+    this.config = config;
+    this.schemaFactory = new CassandraRootDrillSchemaFactory(name, this);
+  }
+
+  @Override
+  public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws IOException {
+    schemaFactory.registerSchemas(schemaConfig, parent);
+  }
+
+  @Override
+  public CassandraStorageConfig getConfig() {
+    return config;
+  }
+
+  @Override
+  public boolean supportsRead() {
+    return true;
+  }
+
+  @Override
+  public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext optimizerContext, PlannerPhase phase) {
+    switch (phase) {
+      case LOGICAL_PRUNE_AND_JOIN:
+      case LOGICAL_PRUNE:
+      case PARTITION_PRUNING:
+        return Collections.emptySet();

Review comment:
   Oh no, I forgot to change it here. Thanks for catching it; I have combined it 
with the default case.
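
   For readers following along, the change described above (folding the pruning phases into the default branch) might look roughly like the self-contained sketch below. The `Phase` enum, the `OptimizerRulesSketch` class, and the `String` rule name are hypothetical stand-ins for Drill's `PlannerPhase` and Calcite's `RelOptRule`; which phase actually carries rules in the plugin is an assumption here, not something this thread states.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for Drill's PlannerPhase; only a few phases are listed.
enum Phase { LOGICAL_PRUNE_AND_JOIN, LOGICAL_PRUNE, PARTITION_PRUNING, PHYSICAL }

class OptimizerRulesSketch {

  // Returns the rule names to register for a planner phase.
  // Phases without plugin-specific rules all fall through to the default
  // branch instead of being listed as separate cases with the same result.
  static Set<String> rulesFor(Phase phase) {
    switch (phase) {
      case PHYSICAL:
        // Placeholder rule name; an assumption made for this sketch only.
        return Collections.singleton("cassandra-physical-rule");
      default:
        return Collections.emptySet();
    }
  }
}
```

   With this shape, `rulesFor(Phase.LOGICAL_PRUNE)` and `rulesFor(Phase.PARTITION_PRUNING)` both return the same empty set without needing their own `case` labels.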

##
File path: contrib/storage-cassandra/pom.xml
##
@@ -0,0 +1,96 @@
+
+
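+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.19.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-cassandra</artifactId>
+
+  <name>contrib/cassandra-storage-plugin</name>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>${calcite.groupId}</groupId>
+      <artifactId>calcite-cassandra</artifactId>
+      <version>${calcite.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>commons-logging</groupId>
+          <artifactId>commons-logging</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>com.datastax.cassandra</groupId>
+          <artifactId>cassandra-driver-core</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>com.scylladb</groupId>
+      <artifactId>scylla-driver-core</artifactId>
+      <version>3.10.1-scylla-0</version>
+    </dependency>
+    <dependency>
+      <groupId>org.ow2.asm</groupId>
+      <artifactId>asm</artifactId>
+      <version>${asm.version}</version>
+      <scope>runtime</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill</groupId>
+      <artifactId>drill-common</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.github.nosan</groupId>
+      <artifactId>embedded-cassandra</artifactId>
+      <version>4.0.0</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>net.hydromatic</groupId>
+      <artifactId>foodmart-data-json</artifactId>
+      <version>0.4</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+</project>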

Review comment:
   Thanks, fixed.






[jira] [Commented] (DRILL-92) Cassandra storage engine

2021-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270386#comment-17270386
 ] 

ASF GitHub Bot commented on DRILL-92:
-

ihuzenko commented on a change in pull request #2152:
URL: https://github.com/apache/drill/pull/2152#discussion_r562858218



##
File path: contrib/storage-cassandra/pom.xml
##
@@ -0,0 +1,96 @@
+
+
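+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.19.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-cassandra</artifactId>
+
+  <name>contrib/cassandra-storage-plugin</name>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>${calcite.groupId}</groupId>
+      <artifactId>calcite-cassandra</artifactId>
+      <version>${calcite.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>commons-logging</groupId>
+          <artifactId>commons-logging</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>com.datastax.cassandra</groupId>
+          <artifactId>cassandra-driver-core</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>com.scylladb</groupId>
+      <artifactId>scylla-driver-core</artifactId>
+      <version>3.10.1-scylla-0</version>
+    </dependency>
+    <dependency>
+      <groupId>org.ow2.asm</groupId>
+      <artifactId>asm</artifactId>
+      <version>${asm.version}</version>
+      <scope>runtime</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill</groupId>
+      <artifactId>drill-common</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.github.nosan</groupId>
+      <artifactId>embedded-cassandra</artifactId>
+      <version>4.0.0</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>net.hydromatic</groupId>
+      <artifactId>foodmart-data-json</artifactId>
+      <version>0.4</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+</project>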

Review comment:
   new line

##
File path: contrib/storage-cassandra/src/main/java/org/apache/drill/exec/store/cassandra/CassandraStoragePlugin.java
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.cassandra;
+
+import org.apache.calcite.adapter.cassandra.CalciteUtils;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.exec.ops.OptimizerRulesContext;
+import org.apache.drill.exec.planner.PlannerPhase;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.AbstractStoragePlugin;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.SchemaFactory;
+import org.apache.drill.exec.store.cassandra.schema.CassandraRootDrillSchemaFactory;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableSet;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Set;
+
+public class CassandraStoragePlugin extends AbstractStoragePlugin {
+
+  private final CassandraStorageConfig config;
+  private final SchemaFactory schemaFactory;
+
+  public CassandraStoragePlugin(
+      CassandraStorageConfig config, DrillbitContext context, String name) {
+    super(context, name);
+    this.config = config;
+    this.schemaFactory = new CassandraRootDrillSchemaFactory(name, this);
+  }
+
+  @Override
+  public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws IOException {
+    schemaFactory.registerSchemas(schemaConfig, parent);
+  }
+
+  @Override
+  public CassandraStorageConfig getConfig() {
+    return config;
+  }
+
+  @Override
+  public boolean supportsRead() {
+    return true;
+  }
+
+  @Override
+  public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext optimizerContext, PlannerPhase phase) {
+    switch (phase) {
+      case LOGICAL_PRUNE_AND_JOIN:
+      case LOGICAL_PRUNE:
+      case PARTITION_PRUNING:
+        return Collections.emptySet();

Review comment:
   Technically, this also returns an empty immutable set, just like the **default** 
case. 






[jira] [Commented] (DRILL-92) Cassandra storage engine

2021-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269911#comment-17269911
 ] 

ASF GitHub Bot commented on DRILL-92:
-

vvysotskyi opened a new pull request #2152:
URL: https://github.com/apache/drill/pull/2152


   # [DRILL-92](https://issues.apache.org/jira/browse/DRILL-92): Cassandra 
storage plugin implementation based on Calcite adapter
   
   ## Description
   This PR introduces a plugin for Cassandra and Scylla based on Calcite's 
Cassandra adapter.
   
   Please note that this PR requires 
https://github.com/vvysotskyi/drill-calcite/pull/3 to be merged.
   
   ## Documentation
   The docs on the Drill web site should be updated to mention support for the 
Cassandra and Scylla plugin.
   
   ## Testing
   Added unit tests; checked against Cassandra and Scylla.
   





[jira] [Commented] (DRILL-92) Cassandra storage engine

2021-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269712#comment-17269712
 ] 

ASF GitHub Bot commented on DRILL-92:
-

vvysotskyi opened a new pull request #2152:
URL: https://github.com/apache/drill/pull/2152


   # [DRILL-92](https://issues.apache.org/jira/browse/DRILL-92): Cassandra 
storage plugin implementation based on Calcite adapter
   
   ## Description
   This PR introduces a plugin for Cassandra and Scylla based on Calcite's 
Cassandra adapter.
   
   Please note that this PR requires 
https://github.com/vvysotskyi/drill-calcite/pull/3 to be merged.
   
   ## Documentation
   The docs on the Drill web site should be updated to mention support for the 
Cassandra and Scylla plugin.
   
   ## Testing
   Added unit tests; checked against Cassandra and Scylla.
   





> Cassandra storage engine
> ------------------------
>
>                 Key: DRILL-92
>                 URL: https://issues.apache.org/jira/browse/DRILL-92
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Steven Phillips
>            Priority: Major
>             Fix For: Future
>
>         Attachments: DRILL-92-Cassandra-Storage.patch, DRILL-92-CassandraStorage.patch, DRILL-92.patch, DRILL-CASSANDRA.patch
>
>






[jira] [Commented] (DRILL-92) Cassandra storage engine

2018-11-12 Thread Davide Gesino (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684178#comment-16684178
 ] 

Davide Gesino commented on DRILL-92:


Any update on this issue?



[jira] [Commented] (DRILL-92) Cassandra storage engine

2016-05-31 Thread SK (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308124#comment-15308124
 ] 

SK commented on DRILL-92:
-

Any new update or timeline on Cassandra storage engine availability in Drill? 

> Cassandra storage engine
> ------------------------
>
>                 Key: DRILL-92
>                 URL: https://issues.apache.org/jira/browse/DRILL-92
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Steven Phillips
>             Fix For: Future
>
>         Attachments: DRILL-92-Cassandra-Storage.patch, DRILL-92-CassandraStorage.patch, DRILL-92.patch, DRILL-CASSANDRA.patch
>
>






[jira] [Commented] (DRILL-92) Cassandra storage engine

2015-04-29 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518841#comment-14518841
 ] 

Robert Stupp commented on DRILL-92:
---

FYI there's a ticket for an interesting enhancement to C* to read a token 
range: https://issues.apache.org/jira/browse/CASSANDRA-9259

> Cassandra storage engine
> ------------------------
>
>                 Key: DRILL-92
>                 URL: https://issues.apache.org/jira/browse/DRILL-92
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Steven Phillips
>            Assignee: Yash Sharma
>             Fix For: Future
>
>         Attachments: DRILL-92-Cassandra-Storage.patch, DRILL-92-CassandraStorage.patch, DRILL-92.patch, DRILL-CASSANDRA.patch








[jira] [Commented] (DRILL-92) Cassandra storage engine

2015-04-26 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513026#comment-14513026
 ] 

Robert Stupp commented on DRILL-92:
---

[~yash...@gmail.com] Sorry for the late response. Here are my comments:

Tables & Keyspace cache - you don’t need to cache this. It is already cached, so 
it’s duplicated effort. Removing the cache makes your code less complex - 
e.g. all the cache-related {{catch ExecutionException}} handling goes away.

I’m not sure whether the connection cache in {{CassandraConnectionManager}} 
really works. For example, if hosts 127.0.0.1 and 127.0.0.2 share the 
same {{Cluster}} instance and the cache decides to evict 127.0.0.1, the 
instance for 127.0.0.2 no longer works.
Besides that, you query the cache using a {{List<String>}} but add entries using a 
{{String}} as the key.
As a proposal: use the cluster name as the cache key and query by that.
I’m also not sure whether you can always close the {{Cluster}} instance, since 
such an instance may still be in use by a long-running operation.
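
To illustrate the proposal above, here is a minimal sketch of a registry keyed by cluster name. {{ClusterRegistry}}, its type parameter, and the connector function are hypothetical names invented for this example; they are not part of the patch or of the Java Driver. A real version would hold driver {{Cluster}} objects and would still have to decide when closing one is safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry: one shared handle per cluster name, regardless of
// which contact point was used to reach the cluster.
class ClusterRegistry<C> {

  private final Map<String, C> byClusterName = new ConcurrentHashMap<>();
  private final Function<String, C> connector;

  ClusterRegistry(Function<String, C> connector) {
    this.connector = connector;
  }

  // Lookups and inserts use the same key type (the cluster name), so a
  // lookup cannot miss because of a List-vs-String key mismatch.
  C get(String clusterName) {
    return byClusterName.computeIfAbsent(clusterName, connector);
  }

  // Number of distinct clusters currently registered.
  int size() {
    return byClusterName.size();
  }
}
```

Because nothing is evicted per host, two hosts of the same cluster always resolve to the same handle.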

Both {{CassandraConnectionManager}} and {{CassandraSchemaFactory}} create 
their own {{Cluster}} instances and therefore independent resources 
(connections, threads, etc.). These should be merged. If both classes are used in 
completely different contexts, please ignore this comment.
Also, the {{Cluster}} instance in {{CassandraSchemaFactory}} is never closed.

bq. Endpoint affinity & Partition token
If you’re using the code just to assign ranges to Drill hosts, then that should 
be fine.
But do not assume anything about the tokens assigned to a C* host. Code that does 
so depends heavily on the individual cluster configuration (partitioner, topology, 
node placement (DC, rack)) and the keyspace configuration. It’s not that easy, but 
manageable.

In {{org.apache.drill.exec.store.cassandra.CassandraRecordReader#setup}} you’re 
using {{QueryBuilder.token}} for paging / slicing. Unfortunately, that will not 
work. Assume that the C* cluster uses vnodes (the default is 256 vnodes per 
C* node). Vnode tokens are assigned randomly to endpoints (= nodes) - it’s not 
like the old-fashioned single token per node. You just cannot slice using the 
{{token()}} function. What’s more, it’s quite difficult to split slices cleanly so 
that they match both C* nodes/vnodes *and* Drill _sub scan ranges_ (is this the 
correct wording?). Clean slicing across all nodes/vnodes is one of the weak 
sides of C*. That’s why Hadoop-on-C* recommends avoiding vnodes - it has 
the same problem. Let me think a bit about that - maybe I can provide a 
solution or at least a workaround.

For unit tests: take a look at https://github.com/doanduyhai/Achilles - it has 
some nice support for unit tests, which may make all the manual work of setting up 
and filling keyspaces/tables superfluous.

Is it a Drill requirement that {{CassandraRecordReader#updateValueVector}} only 
mutates using {{String}}s?

General code comments:
* there’s some unused code that can safely be removed
* in {{CassandraRecordReader}}: you can safely replace the 
{{clazz.isInstance()}} sequence with {{clazz == ClassName.class}}
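
As an illustration of the second point, the sketch below compares class tokens by identity. The class name, method name, and type labels are made up for this example and are not taken from the patch; identity comparison is exact and does not match subclasses, which is fine for final classes such as {{String}}, {{Long}} and {{Integer}}.

```java
class ColumnTypeSketch {

  // Maps a column's Java class to a label by comparing class tokens directly.
  // The labels are placeholders chosen for this sketch only.
  static String labelFor(Class<?> clazz) {
    if (clazz == String.class) {
      return "VARCHAR";
    }
    if (clazz == Long.class) {
      return "BIGINT";
    }
    if (clazz == Integer.class) {
      return "INT";
    }
    return "UNKNOWN";
  }
}
```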

Note: the patch does not apply to the current master, only to master as of 
March 31st. There have been some breaking API changes in Drill since then.

For the future: I don’t know whether the current code supports Cassandra’s 
user-defined types or collections (maps, sets, lists). If not, that might be a 
nice feature for later.




[jira] [Commented] (DRILL-92) Cassandra storage engine

2015-03-29 Thread Yash Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385660#comment-14385660
 ] 

Yash Sharma commented on DRILL-92:
--

[~snazy] + All: On the points raised in the review comments:
- Multiple initial contact points: The patch now handles multiple initial 
contact points and passes all of them when connecting to the cluster. 
Tested by bringing down some of the nodes.
- Session handling: The patch now uses the cluster instance rather than the 
session.
- Endpoint affinity & Partition token: The code does consider endpoint affinity: 
if a Drill endpoint is also a Cassandra node, it gets an 
affinity. The partition token, however, serves a different purpose. The partition 
scheme just ensures that different Drill sub-scans do not fetch the same 
range of keys twice. It works as a range restriction for each sub-scan. We are 
not using it to determine where the partition data actually lives.
- Tables & Keyspace cache: This is not cached for the purpose of querying 
Cassandra. It is schema information that we need when the user wants 
to describe a table, list the tables in a keyspace, etc. We just cache it. 
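
As a rough illustration of the range restriction described above, the sketch below splits a token space into contiguous, non-overlapping ranges, one per sub-scan. It is not the code from the patch: {{TokenRangeSketch}} and {{split}} are hypothetical names, and the sketch assumes the Murmur3Partitioner, whose tokens are signed 64-bit values. The ranges only divide the key space between sub-scans; they say nothing about which node holds the data.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TokenRangeSketch {

  // Assumed token space: signed 64-bit values, [-2^63, 2^63 - 1].
  static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
  static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

  // Splits [MIN, MAX] into `parts` contiguous, non-overlapping ranges.
  // Each element is {startInclusive, endInclusive}.
  static List<long[]> split(int parts) {
    if (parts < 1) {
      throw new IllegalArgumentException("parts must be >= 1");
    }
    // Total number of tokens in the space (2^64), kept in BigInteger to avoid overflow.
    BigInteger total = MAX.subtract(MIN).add(BigInteger.ONE);
    BigInteger n = BigInteger.valueOf(parts);
    List<long[]> ranges = new ArrayList<>();
    for (int i = 0; i < parts; i++) {
      BigInteger start = MIN.add(total.multiply(BigInteger.valueOf(i)).divide(n));
      BigInteger end = MIN.add(total.multiply(BigInteger.valueOf(i + 1)).divide(n))
          .subtract(BigInteger.ONE);
      ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
    }
    return ranges;
  }
}
```

Each sub-scan could then restrict its query to its own {{start}}/{{end}} pair, so no two sub-scans read the same token.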

Thanks for the review comments.  Please share your thoughts on the new patch.

> Cassandra storage engine
> ------------------------
>
>                 Key: DRILL-92
>                 URL: https://issues.apache.org/jira/browse/DRILL-92
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Steven Phillips
>            Assignee: Yash Sharma
>             Fix For: Future
>
>         Attachments: DRILL-92-Cassandra-Storage.patch, DRILL-92.patch, DRILL-CASSANDRA.patch








[jira] [Commented] (DRILL-92) Cassandra storage engine

2015-02-27 Thread Yash Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340453#comment-14340453
 ] 

Yash Sharma commented on DRILL-92:
--

Thanks [~snazy]: I have definitely missed a lot in this patch, which started 
as a PoC. 
There is a lot of work to be done along the lines you describe to make Drill and 
Cassandra interoperable.
Thanks

> Cassandra storage engine
> ------------------------
>
>                 Key: DRILL-92
>                 URL: https://issues.apache.org/jira/browse/DRILL-92
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Steven Phillips
>            Assignee: Yash Sharma
>             Fix For: Future
>
>         Attachments: DRILL-92.patch, DRILL-CASSANDRA.patch








[jira] [Commented] (DRILL-92) Cassandra storage engine

2015-02-24 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335238#comment-14335238
 ] 

Robert Stupp commented on DRILL-92:
---

I went over your patch just to see how the actual C* integration has been 
implemented. Tbh - I don’t know how Drill works - but I know how C* and the 
Java Driver work.

Please let me explain some things in advance. A Cassandra cluster consists of 
many nodes. Some of them might be down without affecting the integrity of the 
whole cluster. Because some hosts might be down, the DataStax Java 
Driver lets you specify *multiple* initial contact points - and multiple 
initial contact points (maybe 3 per data center) should be passed to 
{{Cluster.Builder}}. All connections to a C* cluster are managed by the 
{{Cluster}} instance - not directly by {{Session}}. That means: to effectively 
close connections to a cluster, you have to close the {{Cluster}} instance. 
Further, the {{Cluster}} instance learns about all other nodes in the C* 
cluster - i.e. it will know all nodes in the cluster and which token ranges they 
serve, and it will make a best-effort attempt to route DML 
statements (SELECT/INSERT/UPDATE/DELETE) directly to the nodes that hold replicas 
of the affected data. A typical application does not care about where data actually 
lives - the Java Driver handles that for you.

The lines in {{CassandraGroupScan}} calling 
{{com.datastax.driver.core.Metadata#getReplicas}} are wrong. Which nodes are 
replicas for a keyspace is defined by the replication strategy and the 
per-keyspace configuration. The method you’re calling determines the hosts for 
a specific _partition key_ - but you’re passing in the class name of the 
partitioner. Those are completely different things.
Although it is not completely wrong, I’d encourage you not to assume which nodes 
hold the tokens you intend to request (in {{CassandraUtil}}). Several 
other things influence where data “lives” - e.g. datacenter and rack 
awareness.

{{CassandraSchemaFactory}} contains a keyspace cache and a table cache. That’s 
completely superfluous, since the Java Driver already holds that information in 
the {{Cluster}} instance, and it gets automagically updated when the cluster 
topology and/or the schema changes. That kind of metadata is essential for the 
Java Driver to work and is always present.

I’d recommend starting with a different approach and treating the current patch 
as a _proof-of-concept_ (you may of course carry over working code):
# Learn a bit more about C* and the Java Driver architecture ;)
# Forget about accessing the “nearest” node in an initial attempt - you can add 
that later anyway. BTW, that only makes sense if you have Drill slaves 
(I don’t know whether such things exist) running on each C* node.
# Start with a simple cluster to work against. Take a look at _ccm_ - it’s a 
neat tool that spawns a C* cluster with multiple nodes on your local machine: 
https://github.com/pcmanus/ccm/.
# Once you have a basic implementation running, you can improve it by adding 
datacenter awareness to your client (it’s basically just a simple configuration 
using {{Cluster.Builder}}), authentication against the C* cluster, and some other 
fine-tuning.

Feel free to ask questions on the C* user mailing list u...@cassandra.apache.org or 
on freenode IRC #cassandra. There are many people happy to answer individual 
questions. Just ask - don’t ask to ask :)

