Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh closed pull request #19: CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes URL: https://github.com/apache/cassandra-analytics/pull/19 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#issuecomment-1845780320 Closed via https://github.com/apache/cassandra-analytics/commit/457b36bcb3c8a865cca83ca6c402246798113ab4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414720469 ## cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java: ## @@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey, this.isPartitionKey = isPartitionKey; this.isClusteringColumn = isClusteringColumn; this.isStaticColumn = isStaticColumn; -this.name = name.replaceAll("\"", ""); +this.name = name; Review Comment: ok, took a deeper look. I think the confusion is mainly on the EndToEnd testing code. So what I've done is address the test setup in testing code. I have made it so that we always quote the fields, when quotes are needed for testing purposes. I have also added a test case that validates the column filter (`PartialRowBuilder` class). That way we guarantee that we are able to retrieve columns that are requested in the schema where columns are case sensitive in Cassandra -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414386801 ## cassandra-analytics-core/src/main/spark3/org/apache/cassandra/spark/sparksql/CassandraScanBuilder.java: ## @@ -135,17 +136,16 @@ private List buildPartitionKeyFilters() { List partitionKeyColumnNames = dataLayer.cqlTable().partitionKeys().stream().map(CqlField::name).collect(Collectors.toList()); Map> partitionKeyValues = FilterUtils.extractPartitionKeyValues(pushedFilters, new HashSet<>(partitionKeyColumnNames)); -if (partitionKeyValues.size() > 0) -{ -List> orderedValues = partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList()); -return FilterUtils.cartesianProduct(orderedValues).stream() -.map(this::buildFilter) -.collect(Collectors.toList()); -} -else + +if (partitionKeyValues.isEmpty()) { -return new ArrayList<>(); +return Collections.emptyList(); } + +List> orderedValues = partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList()); +return FilterUtils.cartesianProduct(orderedValues).stream() + .map(this::buildFilter) + .collect(Collectors.toList()); Review Comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414386089 ## cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/DataLayer.java: ## @@ -171,8 +171,8 @@ public CassandraVersion version() public abstract boolean isInPartition(int partitionId, BigInteger token, ByteBuffer key); public List partitionKeyFiltersInRange( -int partitionId, -List partitionKeyFilters) throws NoMatchFoundException +int partitionId, +List partitionKeyFilters) throws NoMatchFoundException Review Comment: reverted ## cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java: ## @@ -1851,19 +1848,19 @@ public void testUdtsWithNulls(CassandraBridge bridge) for (long pk = 0; pk < Tester.DEFAULT_NUM_ROWS; pk++) { Map value = ImmutableMap.of( -pk < midPoint ? "a" : "b", RandomUtils.randomValue(bridge.text()).toString(), -"c", RandomUtils.randomValue(bridge.text()).toString()); + pk < midPoint ? "a" : "b", RandomUtils.randomValue(bridge.text()).toString(), + "c", RandomUtils.randomValue(bridge.text()).toString()); Review Comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414386415 ## cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java: ## @@ -154,15 +154,15 @@ public void testBasicSingleClusteringKey(CassandraBridge bridge) public void testSingleClusteringKeyOrderBy(CassandraBridge bridge) { qt().forAll(TestUtils.cql3Type(bridge), TestUtils.sortOrder()) -.checkAssert((clusteringKeyType, sortOrder) -> +.checkAssert((clusteringKeyType, sortOrder) -> { Tester.builder(TestSchema.builder() .withPartitionKey("a", bridge.bigint()) .withClusteringKey("b", clusteringKeyType) .withColumn("c", bridge.bigint()) .withSortOrder(sortOrder)) .withExpectedRowCountPerSSTable(Tester.DEFAULT_NUM_ROWS) - .run() -); + .run(); +}); Review Comment: reverted them already -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414346332 ## cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java: ## @@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey, this.isPartitionKey = isPartitionKey; this.isClusteringColumn = isClusteringColumn; this.isStaticColumn = isStaticColumn; -this.name = name.replaceAll("\"", ""); +this.name = name; Review Comment: Handling of UDTs is different from handling of keyspace / table. For UDTs, the bridge was already taking care of preserving the quotes. However, for user facing options (the keyspace and table specifically), we need special handling based on whether the user explicitly requests to quote identifiers -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
yifan-c commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414339007 ## cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java: ## @@ -154,15 +154,15 @@ public void testBasicSingleClusteringKey(CassandraBridge bridge) public void testSingleClusteringKeyOrderBy(CassandraBridge bridge) { qt().forAll(TestUtils.cql3Type(bridge), TestUtils.sortOrder()) -.checkAssert((clusteringKeyType, sortOrder) -> +.checkAssert((clusteringKeyType, sortOrder) -> { Tester.builder(TestSchema.builder() .withPartitionKey("a", bridge.bigint()) .withClusteringKey("b", clusteringKeyType) .withColumn("c", bridge.bigint()) .withSortOrder(sortOrder)) .withExpectedRowCountPerSSTable(Tester.DEFAULT_NUM_ROWS) - .run() -); + .run(); +}); Review Comment: I am fine with keeping them, if it is trouble to revert all of them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414336436 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java: ## @@ -343,7 +346,7 @@ private CompletionStage> createSnapshot(RingRespon snapshotName, keyspace, table, datacenter, ringEntry.fqdn()); SidecarInstance sidecarInstance = new SidecarInstanceImpl(ringEntry.fqdn(), sidecarClientConfig.effectivePort()); createSnapshotFuture = sidecar - .createSnapshot(sidecarInstance, keyspace, table, snapshotName) + .createSnapshot(sidecarInstance, maybeQuotedKeyspace, maybeQuotedTable, snapshotName) Review Comment: however, the cleanup of snapshots happens on a different thread, if an exception occurs before the bridge has been initialized (i.e an error occurs when getting the cassandra version `String cassandraVersion = getEffectiveCassandraVersionForRead(clusterConfig, nodeSettings);` ) then the shutdown hook will be triggered and it will try to cleanup snapshots, even though the snapshot was never created in the first place -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414334713 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java: ## @@ -343,7 +346,7 @@ private CompletionStage> createSnapshot(RingRespon snapshotName, keyspace, table, datacenter, ringEntry.fqdn()); SidecarInstance sidecarInstance = new SidecarInstanceImpl(ringEntry.fqdn(), sidecarClientConfig.effectivePort()); createSnapshotFuture = sidecar - .createSnapshot(sidecarInstance, keyspace, table, snapshotName) + .createSnapshot(sidecarInstance, maybeQuotedKeyspace, maybeQuotedTable, snapshotName) Review Comment: if the bridge is initialized correctly, the fields will never be null when we reach this point. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414331310 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java: ## @@ -128,6 +128,7 @@ public class BulkSparkConf implements Serializable public final int commitThreadsPerInstance; protected final int effectiveSidecarPort; protected final int userProvidedSidecarPort; +public boolean quoteIdentifiers; Review Comment: changes related to bulk writes, moving to another PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#issuecomment-1839240525 > But I would just write it explicitly that "Cassandra retreats all identifiers as they are in lower case unless quoted Yeah, this makes more sense, I will update the commit message -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414326962 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataSourceHelper.java: ## @@ -90,7 +94,7 @@ public static DataLayer getDataLayer( } } -protected static Cache, CassandraDataLayer> getCassandraDataLayerCache() +public static Cache, CassandraDataLayer> getCassandraDataLayerCache() Review Comment: the change is not needed, I will revert -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414315065 ## cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java: ## @@ -154,15 +154,15 @@ public void testBasicSingleClusteringKey(CassandraBridge bridge) public void testSingleClusteringKeyOrderBy(CassandraBridge bridge) { qt().forAll(TestUtils.cql3Type(bridge), TestUtils.sortOrder()) -.checkAssert((clusteringKeyType, sortOrder) -> +.checkAssert((clusteringKeyType, sortOrder) -> { Tester.builder(TestSchema.builder() .withPartitionKey("a", bridge.bigint()) .withClusteringKey("b", clusteringKeyType) .withColumn("c", bridge.bigint()) .withSortOrder(sortOrder)) .withExpectedRowCountPerSSTable(Tester.DEFAULT_NUM_ROWS) - .run() -); + .run(); +}); Review Comment: I'll revert the unnecessary changes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414295872 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java: ## @@ -239,12 +241,23 @@ public void initialize(@NotNull ClientConfig options) LOGGER.info("Initialized Cassandra Bulk Reader with effectiveNumberOfCores={}", effectiveNumberOfCores); } -private int initBulkReader(@NotNull ClientConfig options, - CompletableFuture nodeSettingsFuture, - CompletableFuture ringFuture) throws ExecutionException, InterruptedException +private int initBulkReader(@NotNull ClientConfig options) throws ExecutionException, InterruptedException { Preconditions.checkArgument(keyspace != null, "Keyspace must be non-null for Cassandra Bulk Reader"); Preconditions.checkArgument(table != null, "Table must be non-null for Cassandra Bulk Reader"); + ShutdownHookManager.addShutdownHook(org.apache.spark.util.ShutdownHookManager.TEMP_DIR_SHUTDOWN_PRIORITY(), +ScalaFunctions.wrapLambda(() -> shutdownHook(options))); + +NodeSettings nodeSettings = sidecar.nodeSettings().get(); +String cassandraVersion = getEffectiveCassandraVersionForRead(clusterConfig, nodeSettings); +Partitioner partitioner = Partitioner.from(nodeSettings.partitioner()); +bridge = CassandraBridgeFactory.get(cassandraVersion); +// optionally quote identifiers if the option has been set, we need an instance for the bridge +maybeQuoteKeyspaceAndTable(); Review Comment: yeah, the constructor that calls this method is only called in the executors. For the driver, it will use this method to initialize the keyspace and table -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1414272495 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/WriterOptions.java: ## @@ -45,5 +45,10 @@ public enum WriterOptions implements WriterOption ROW_BUFFER_MODE, SSTABLE_DATA_SIZE_IN_MB, TTL, -TIMESTAMP +TIMESTAMP, +/** + * Option that specifies whether the identifiers (i.e. keyspace, table name, column names) should be quoted to + * support mixed case and reserved keyword names for these fields. + */ +QUOTE_IDENTIFIERS, Review Comment: this should go in CASSANDRA-19031 instead. I will clean it up here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
yifan-c commented on PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#issuecomment-1838940923 > Cassandra in nature is case insensitive when creating identifiers (i.e. keyspace names, table names, column names, etc). We can define a case-sensitive identifier or we can use a reserved word as an identifier by quoting it during DDL creation. I won't say that Cassandra is case-insensitive in nature. The statement is not helpful to understand the commit, which is essentially adding a new option to toggle quote the identifiers whenever desired. That said, I won't be insisting on removing the line. But I would just write it explicitly that "Cassandra retreats all identifiers as they are in lower case unless quoted". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
yifan-c commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1413883025 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/clients/SidecarInstanceImpl.java: ## @@ -90,14 +90,14 @@ public String toString() private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException { -LOGGER.warn("Falling back to JDK deserialization"); +LOGGER.debug("Falling back to JDK deserialization"); Review Comment: +1 on changing the log level to debug. If warning on serialization method is desired, it should be logged separated and warn just once. ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/WriterOptions.java: ## @@ -45,5 +45,10 @@ public enum WriterOptions implements WriterOption ROW_BUFFER_MODE, SSTABLE_DATA_SIZE_IN_MB, TTL, -TIMESTAMP +TIMESTAMP, +/** + * Option that specifies whether the identifiers (i.e. keyspace, table name, column names) should be quoted to + * support mixed case and reserved keyword names for these fields. + */ +QUOTE_IDENTIFIERS, Review Comment: why it is a writer option? ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java: ## @@ -343,7 +346,7 @@ private CompletionStage> createSnapshot(RingRespon snapshotName, keyspace, table, datacenter, ringEntry.fqdn()); Review Comment: The log message should use `maybeQuotedKeyspace` and `maybeQuotedTable` instead. The parameters in the log message above (`LOGGER.warn("Skip snapshot creating when node is joining or down "`) should be updated too. And the log messages in `listInstance` method. ## cassandra-analytics-core/src/main/spark3/org/apache/cassandra/spark/sparksql/CassandraScanBuilder.java: ## @@ -135,17 +136,16 @@ private List buildPartitionKeyFilters() { List partitionKeyColumnNames = dataLayer.cqlTable().partitionKeys().stream().map(CqlField::name).collect(Collectors.toList()); Map> partitionKeyValues = FilterUtils.extractPartitionKeyValues(pushedFilters, new HashSet<>(partitionKeyColumnNames)); -if (partitionKeyValues.size() > 0) -{ -List> orderedValues = partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList()); -return FilterUtils.cartesianProduct(orderedValues).stream() -.map(this::buildFilter) -.collect(Collectors.toList()); -} -else + +if (partitionKeyValues.isEmpty()) { -return new ArrayList<>(); +return Collections.emptyList(); } + +List> orderedValues = partitionKeyColumnNames.stream().map(partitionKeyValues::get).collect(Collectors.toList()); +return FilterUtils.cartesianProduct(orderedValues).stream() + .map(this::buildFilter) + .collect(Collectors.toList()); Review Comment: It looks like only the if and else blocks are switched. I would discourage such refactoring. We can keep this change since it is done, but let's refrain from doing so in the future patches. ## cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/EndToEndTests.java: ## @@ -1851,19 +1848,19 @@ public void testUdtsWithNulls(CassandraBridge bridge) for (long pk = 0; pk < Tester.DEFAULT_NUM_ROWS; pk++) { Map value = ImmutableMap.of( -pk < midPoint ? "a" : "b", RandomUtils.randomValue(bridge.text()).toString(), -"c", RandomUtils.randomValue(bridge.text()).toString()); + pk < midPoint ? "a" : "b", RandomUtils.randomValue(bridge.text()).toString(), + "c", RandomUtils.randomValue(bridge.text()).toString()); Review Comment: It is hard to read. It is not obvious that those 2 lines are the parameters for the `of` method without indentations. Please restore the formatting. ## cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java: ## @@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey, this.isPartitionKey = isPartitionKey; this.isClusteringColumn = isClusteringColumn; this.isStaticColumn = isStaticColumn; -this.name = name.replaceAll("\"", ""); +this.name = name; Review Comment: > we no longer need to drop the quotes, since we no longer pass the quoted string from here: https://github.com/apache/cassandra-analytics/pull/19/files#diff-143d0bc336629bfe78bf9f0c64d83803701d38b22a4be736323277862b784b25R522 This comment is confusing to me. The new test cases in `EndToEndTests` pass quoted names, and the tests fail if quotation marks are removed. Fo
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1411410934 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java: ## @@ -950,204 +973,7 @@ protected void await(CountDownLatch latch) } } -public static final class ClientConfig Review Comment: moved this to its own class -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1391835706 ## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/utils/CqlUtils.java: ## @@ -95,25 +95,6 @@ else if (schema.charAt(index) == '(') return schema.substring(0, index + 1); } -/** - * The schema might contain quotes, but on-disk the table path doesn't have quotes, so causes problems for create/list snapshot Review Comment: this is now handled by Sidecar properly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1391834218 ## cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/CqlField.java: ## @@ -263,7 +263,7 @@ public CqlField(boolean isPartitionKey, this.isPartitionKey = isPartitionKey; this.isClusteringColumn = isClusteringColumn; this.isStaticColumn = isStaticColumn; -this.name = name.replaceAll("\"", ""); +this.name = name; Review Comment: we no longer need to drop the quotes, since we no longer pass the quoted string from here: https://github.com/apache/cassandra-analytics/pull/19/files#diff-143d0bc336629bfe78bf9f0c64d83803701d38b22a4be736323277862b784b25R522 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Re: [PR] CASSANDRA-19024 Fix bulk reading when using identifiers that need quotes [cassandra-analytics]
frankgh commented on code in PR #19: URL: https://github.com/apache/cassandra-analytics/pull/19#discussion_r1391833647 ## cassandra-four-zero/src/main/java/org/apache/cassandra/spark/reader/SchemaBuilder.java: ## @@ -519,7 +519,7 @@ private List buildFields(TableMetadata metadata, Map