Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-07 Thread via GitHub


frankgh closed pull request #19: CASSANDRA-19024 Fix bulk reading when using 
identifiers that need quotes
URL: https://github.com/apache/cassandra-analytics/pull/19


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-07 Thread via GitHub


frankgh commented on PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#issuecomment-1845780320

   Closed via 
https://github.com/apache/cassandra-analytics/commit/457b36bcb3c8a865cca83ca6c402246798113ab4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414720469


##
cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java:
##
@@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey,
 this.isPartitionKey = isPartitionKey;
 this.isClusteringColumn = isClusteringColumn;
 this.isStaticColumn = isStaticColumn;
-this.name = name.replaceAll("\"", "");
+this.name = name;

Review Comment:
   ok, took a deeper look. I think the confusion is mainly on the EndToEnd 
testing code. So what I've done is address the test setup in testing code. I 
have made it so that we always quote the fields, when quotes are needed for 
testing purposes. 
   
   I have also added a test case that validates the column filter 
(`PartialRowBuilder` class). That way we guarantee that we are able to retrieve 
columns that are requested in the schema where columns are case sensitive in 
Cassandra



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414386801


##
cassandra-analytics-core/src/main/spark3/org/apache/cassandra/spark/sparksql/CassandraScanBuilder.java:
##
@@ -135,17 +136,16 @@ private List 
buildPartitionKeyFilters()
 {
 List partitionKeyColumnNames = 
dataLayer.cqlTable().partitionKeys().stream().map(CqlField::name).collect(Collectors.toList());
 Map> partitionKeyValues = 
FilterUtils.extractPartitionKeyValues(pushedFilters, new 
HashSet<>(partitionKeyColumnNames));
-if (partitionKeyValues.size() > 0)
-{
-List> orderedValues = 
partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList());
-return FilterUtils.cartesianProduct(orderedValues).stream()
-.map(this::buildFilter)
-.collect(Collectors.toList());
-}
-else
+
+if (partitionKeyValues.isEmpty())
 {
-return new ArrayList<>();
+return Collections.emptyList();
 }
+
+List> orderedValues = 
partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList());
+return FilterUtils.cartesianProduct(orderedValues).stream()
+  .map(this::buildFilter)
+  .collect(Collectors.toList());

Review Comment:
   reverted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414386089


##
cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/DataLayer.java:
##
@@ -171,8 +171,8 @@ public CassandraVersion version()
 public abstract boolean isInPartition(int partitionId, BigInteger token, 
ByteBuffer key);
 
 public List partitionKeyFiltersInRange(
-int partitionId,
-List partitionKeyFilters) throws 
NoMatchFoundException
+int partitionId,
+List partitionKeyFilters) throws NoMatchFoundException

Review Comment:
   reverted



##
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java:
##
@@ -1851,19 +1848,19 @@ public void testUdtsWithNulls(CassandraBridge bridge)
   for (long pk = 0; pk < Tester.DEFAULT_NUM_ROWS; pk++)
   {
   Map value = ImmutableMap.of(
-pk < midPoint ? "a" : "b", 
RandomUtils.randomValue(bridge.text()).toString(),
-"c", 
RandomUtils.randomValue(bridge.text()).toString());
+  pk < midPoint ? "a" : "b", 
RandomUtils.randomValue(bridge.text()).toString(),
+  "c", RandomUtils.randomValue(bridge.text()).toString());

Review Comment:
   reverted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414386415


##
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java:
##
@@ -154,15 +154,15 @@ public void testBasicSingleClusteringKey(CassandraBridge 
bridge)
 public void testSingleClusteringKeyOrderBy(CassandraBridge bridge)
 {
 qt().forAll(TestUtils.cql3Type(bridge), TestUtils.sortOrder())
-.checkAssert((clusteringKeyType, sortOrder) ->
+.checkAssert((clusteringKeyType, sortOrder) -> {
 Tester.builder(TestSchema.builder()
  .withPartitionKey("a", 
bridge.bigint())
  .withClusteringKey("b", 
clusteringKeyType)
  .withColumn("c", bridge.bigint())
  .withSortOrder(sortOrder))
   .withExpectedRowCountPerSSTable(Tester.DEFAULT_NUM_ROWS)
-  .run()
-);
+  .run();
+});

Review Comment:
   reverted them already 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414346332


##
cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java:
##
@@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey,
 this.isPartitionKey = isPartitionKey;
 this.isClusteringColumn = isClusteringColumn;
 this.isStaticColumn = isStaticColumn;
-this.name = name.replaceAll("\"", "");
+this.name = name;

Review Comment:
   Handling of UDTs is different from handling of keyspace / table. For UDTs, 
the bridge was already taking care of preserving the quotes. However, for user 
facing options (the keyspace and table specifically), we need special handling 
based on whether the user explicitly requests to quote identifiers



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


yifan-c commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414339007


##
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java:
##
@@ -154,15 +154,15 @@ public void testBasicSingleClusteringKey(CassandraBridge 
bridge)
 public void testSingleClusteringKeyOrderBy(CassandraBridge bridge)
 {
 qt().forAll(TestUtils.cql3Type(bridge), TestUtils.sortOrder())
-.checkAssert((clusteringKeyType, sortOrder) ->
+.checkAssert((clusteringKeyType, sortOrder) -> {
 Tester.builder(TestSchema.builder()
  .withPartitionKey("a", 
bridge.bigint())
  .withClusteringKey("b", 
clusteringKeyType)
  .withColumn("c", bridge.bigint())
  .withSortOrder(sortOrder))
   .withExpectedRowCountPerSSTable(Tester.DEFAULT_NUM_ROWS)
-  .run()
-);
+  .run();
+});

Review Comment:
   I am fine with keeping them, if it is trouble to revert all of them. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414336436


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java:
##
@@ -343,7 +346,7 @@ private CompletionStage> 
createSnapshot(RingRespon
 snapshotName, keyspace, table, datacenter, 
ringEntry.fqdn());
 SidecarInstance sidecarInstance = new 
SidecarInstanceImpl(ringEntry.fqdn(), sidecarClientConfig.effectivePort());
 createSnapshotFuture = sidecar
-   .createSnapshot(sidecarInstance, 
keyspace, table, snapshotName)
+   .createSnapshot(sidecarInstance, 
maybeQuotedKeyspace, maybeQuotedTable, snapshotName)

Review Comment:
   however, the cleanup of snapshots happens on a different thread, if an 
exception occurs before the bridge has been initialized (i.e an error occurs 
when getting the cassandra version `String cassandraVersion = 
getEffectiveCassandraVersionForRead(clusterConfig, nodeSettings);` ) then the 
shutdown hook will be triggered and it will try to cleanup snapshots, even 
though the snapshot was never created in the first place



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414334713


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java:
##
@@ -343,7 +346,7 @@ private CompletionStage> 
createSnapshot(RingRespon
 snapshotName, keyspace, table, datacenter, 
ringEntry.fqdn());
 SidecarInstance sidecarInstance = new 
SidecarInstanceImpl(ringEntry.fqdn(), sidecarClientConfig.effectivePort());
 createSnapshotFuture = sidecar
-   .createSnapshot(sidecarInstance, 
keyspace, table, snapshotName)
+   .createSnapshot(sidecarInstance, 
maybeQuotedKeyspace, maybeQuotedTable, snapshotName)

Review Comment:
   if the bridge is initialized correctly, the fields will never be null when 
we reach this point.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414331310


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java:
##
@@ -128,6 +128,7 @@ public class BulkSparkConf implements Serializable
 public final int commitThreadsPerInstance;
 protected final int effectiveSidecarPort;
 protected final int userProvidedSidecarPort;
+public boolean quoteIdentifiers;

Review Comment:
   changes related to bulk writes, moving to another PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#issuecomment-1839240525

   > But I would just write it explicitly that "Cassandra retreats all 
identifiers as they are in lower case unless quoted
   
   Yeah, this makes more sense, I will update the commit message


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414326962


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataSourceHelper.java:
##
@@ -90,7 +94,7 @@ public static DataLayer getDataLayer(
 }
 }
 
-protected static Cache, CassandraDataLayer> 
getCassandraDataLayerCache()
+public static Cache, CassandraDataLayer> 
getCassandraDataLayerCache()

Review Comment:
   the change is not needed, I will revert



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414315065


##
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java:
##
@@ -154,15 +154,15 @@ public void testBasicSingleClusteringKey(CassandraBridge 
bridge)
 public void testSingleClusteringKeyOrderBy(CassandraBridge bridge)
 {
 qt().forAll(TestUtils.cql3Type(bridge), TestUtils.sortOrder())
-.checkAssert((clusteringKeyType, sortOrder) ->
+.checkAssert((clusteringKeyType, sortOrder) -> {
 Tester.builder(TestSchema.builder()
  .withPartitionKey("a", 
bridge.bigint())
  .withClusteringKey("b", 
clusteringKeyType)
  .withColumn("c", bridge.bigint())
  .withSortOrder(sortOrder))
   .withExpectedRowCountPerSSTable(Tester.DEFAULT_NUM_ROWS)
-  .run()
-);
+  .run();
+});

Review Comment:
   I'll revert the unnecessary changes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414295872


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java:
##
@@ -239,12 +241,23 @@ public void initialize(@NotNull ClientConfig options)
 LOGGER.info("Initialized Cassandra Bulk Reader with 
effectiveNumberOfCores={}", effectiveNumberOfCores);
 }
 
-private int initBulkReader(@NotNull ClientConfig options,
-   CompletableFuture 
nodeSettingsFuture,
-   CompletableFuture ringFuture) 
throws ExecutionException, InterruptedException
+private int initBulkReader(@NotNull ClientConfig options) throws 
ExecutionException, InterruptedException
 {
 Preconditions.checkArgument(keyspace != null, "Keyspace must be 
non-null for Cassandra Bulk Reader");
 Preconditions.checkArgument(table != null, "Table must be non-null for 
Cassandra Bulk Reader");
+
ShutdownHookManager.addShutdownHook(org.apache.spark.util.ShutdownHookManager.TEMP_DIR_SHUTDOWN_PRIORITY(),
+ScalaFunctions.wrapLambda(() -> 
shutdownHook(options)));
+
+NodeSettings nodeSettings = sidecar.nodeSettings().get();
+String cassandraVersion = 
getEffectiveCassandraVersionForRead(clusterConfig, nodeSettings);
+Partitioner partitioner = Partitioner.from(nodeSettings.partitioner());
+bridge = CassandraBridgeFactory.get(cassandraVersion);
+// optionally quote identifiers if the option has been set, we need an 
instance for the bridge
+maybeQuoteKeyspaceAndTable();

Review Comment:
   yeah, the constructor that calls this method is only called in the 
executors. For the driver, it will use this method to initialize the keyspace 
and table



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414272495


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/WriterOptions.java:
##
@@ -45,5 +45,10 @@ public enum WriterOptions implements WriterOption
 ROW_BUFFER_MODE,
 SSTABLE_DATA_SIZE_IN_MB,
 TTL,
-TIMESTAMP
+TIMESTAMP,
+/**
+ * Option that specifies whether the identifiers (i.e. keyspace, table 
name, column names) should be quoted to
+ * support mixed case and reserved keyword names for these fields.
+ */
+QUOTE_IDENTIFIERS,

Review Comment:
   this should go in CASSANDRA-19031 instead. I will clean it up here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


yifan-c commented on PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#issuecomment-1838940923

   > Cassandra in nature is case insensitive when creating identifiers (i.e. 
keyspace names, table names, column names, etc). We can define a case-sensitive 
identifier or we can use a reserved word as an identifier by quoting it during 
DDL creation.
   
   I won't say that Cassandra is case-insensitive in nature. The statement is 
not helpful to understand the commit, which is essentially adding a new option 
to toggle quote the identifiers whenever desired. That said, I won't be 
insisting on removing the line. But I would just write it explicitly that 
"Cassandra retreats all identifiers as they are in lower case unless quoted".
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-12-04 Thread via GitHub


yifan-c commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1413883025


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/clients/SidecarInstanceImpl.java:
##
@@ -90,14 +90,14 @@ public String toString()
 
 private void readObject(ObjectInputStream in) throws IOException, 
ClassNotFoundException
 {
-LOGGER.warn("Falling back to JDK deserialization");
+LOGGER.debug("Falling back to JDK deserialization");

Review Comment:
   +1 on changing the log level to debug. If warning on serialization method is 
desired, it should be logged separated and warn just once. 



##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/WriterOptions.java:
##
@@ -45,5 +45,10 @@ public enum WriterOptions implements WriterOption
 ROW_BUFFER_MODE,
 SSTABLE_DATA_SIZE_IN_MB,
 TTL,
-TIMESTAMP
+TIMESTAMP,
+/**
+ * Option that specifies whether the identifiers (i.e. keyspace, table 
name, column names) should be quoted to
+ * support mixed case and reserved keyword names for these fields.
+ */
+QUOTE_IDENTIFIERS,

Review Comment:
   why it is a writer option?



##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java:
##
@@ -343,7 +346,7 @@ private CompletionStage> 
createSnapshot(RingRespon
 snapshotName, keyspace, table, datacenter, 
ringEntry.fqdn());

Review Comment:
   The log message should use `maybeQuotedKeyspace` and `maybeQuotedTable` 
instead.
   The parameters in the log message above (`LOGGER.warn("Skip snapshot 
creating when node is joining or down "`) should be updated too. 
   And the log messages in `listInstance` method.



##
cassandra-analytics-core/src/main/spark3/org/apache/cassandra/spark/sparksql/CassandraScanBuilder.java:
##
@@ -135,17 +136,16 @@ private List 
buildPartitionKeyFilters()
 {
 List partitionKeyColumnNames = 
dataLayer.cqlTable().partitionKeys().stream().map(CqlField::name).collect(Collectors.toList());
 Map> partitionKeyValues = 
FilterUtils.extractPartitionKeyValues(pushedFilters, new 
HashSet<>(partitionKeyColumnNames));
-if (partitionKeyValues.size() > 0)
-{
-List> orderedValues = 
partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList());
-return FilterUtils.cartesianProduct(orderedValues).stream()
-.map(this::buildFilter)
-.collect(Collectors.toList());
-}
-else
+
+if (partitionKeyValues.isEmpty())
 {
-return new ArrayList<>();
+return Collections.emptyList();
 }
+
+List> orderedValues = 
partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList());
+return FilterUtils.cartesianProduct(orderedValues).stream()
+  .map(this::buildFilter)
+  .collect(Collectors.toList());

Review Comment:
   It looks like only the if and else blocks are switched. I would discourage 
such refactoring. 
   We can keep this change since it is done, but let's refrain from doing so in 
the future patches. 



##
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java:
##
@@ -1851,19 +1848,19 @@ public void testUdtsWithNulls(CassandraBridge bridge)
   for (long pk = 0; pk < Tester.DEFAULT_NUM_ROWS; pk++)
   {
   Map value = ImmutableMap.of(
-pk < midPoint ? "a" : "b", 
RandomUtils.randomValue(bridge.text()).toString(),
-"c", 
RandomUtils.randomValue(bridge.text()).toString());
+  pk < midPoint ? "a" : "b", 
RandomUtils.randomValue(bridge.text()).toString(),
+  "c", RandomUtils.randomValue(bridge.text()).toString());

Review Comment:
   It is hard to read. It is not obvious that those 2 lines are the parameters 
for the `of` method without indentations. 
   Please restore the formatting. 



##
cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java:
##
@@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey,
 this.isPartitionKey = isPartitionKey;
 this.isClusteringColumn = isClusteringColumn;
 this.isStaticColumn = isStaticColumn;
-this.name = name.replaceAll("\"", "");
+this.name = name;

Review Comment:
   > we no longer need to drop the quotes, since we no longer pass the quoted 
string from here: 
https://github.com/apache/cassandra-analytics/pull/19/files#diff-143d0bc336629bfe78bf9f0c64d83803701d38b22a4be736323277862b784b25R522
   
   This comment is confusing to me. The new test cases in `EndToEndTests` pass 
quoted names, and the tests fail if quotation marks are removed. 
   Fo

Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-11-30 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1411410934


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java:
##
@@ -950,204 +973,7 @@ protected void await(CountDownLatch latch)
 }
 }
 
-public static final class ClientConfig

Review Comment:
   moved this to its own class



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-11-13 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1391835706


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/utils/CqlUtils.java:
##
@@ -95,25 +95,6 @@ else if (schema.charAt(index) == '(')
 return schema.substring(0, index + 1);
 }
 
-/**
- * The schema might contain quotes, but on-disk the table path doesn't 
have quotes, so causes problems for create/list snapshot

Review Comment:
   this is now handled by Sidecar properly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-11-13 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1391834218


##
cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java:
##
@@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey,
 this.isPartitionKey = isPartitionKey;
 this.isClusteringColumn = isClusteringColumn;
 this.isStaticColumn = isStaticColumn;
-this.name = name.replaceAll("\"", "");
+this.name = name;

Review Comment:
   we no longer need to drop the quotes, since we no longer pass the quoted 
string from here: 
https://github.com/apache/cassandra-analytics/pull/19/files#diff-143d0bc336629bfe78bf9f0c64d83803701d38b22a4be736323277862b784b25R522



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]

2023-11-13 Thread via GitHub


frankgh commented on code in PR #19:
URL: 
https://github.com/apache/cassandra-analytics/pull/19#discussion_r1391833647


##
cassandra-four-zero/src/main/java/org/apache/cassandra/spark/reader/SchemaBuilder.java:
##
@@ -519,7 +519,7 @@ private List buildFields(TableMetadata metadata, 
Map