absurdfarce commented on code in PR #1952:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1952#discussion_r1926337271


##########
core/src/main/java/com/datastax/oss/driver/api/core/type/DataTypes.java:
##########
@@ -69,12 +66,21 @@ public static DataType custom(@NonNull String className) {
 
     /* Vector support is currently implemented as a custom type but is also 
parameterized */
     if (className.startsWith(DefaultVectorType.VECTOR_CLASS_NAME)) {
-      List<String> params =
-          paramSplitter.splitToList(
-              className.substring(
-                  DefaultVectorType.VECTOR_CLASS_NAME.length() + 1, 
className.length() - 1));
-      DataType subType = classNameParser.parse(params.get(0), 
AttachmentPoint.NONE);
-      int dimensions = Integer.parseInt(params.get(1));
+      String paramsString =

Review Comment:
   I agree with @SiyaoIsHiding that the original impl doesn't handle nested 
vector types.  For example, with the current code the following fails on 
"test_type_string_nested" but passes everywhere else:
   
   ```
   package com.datastax.oss.driver.api.core.type;
   
   import com.datastax.oss.driver.internal.core.type.DefaultVectorType;
   import org.junit.Test;
   
   import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
   
   public class DataTypesTest {
   
       private DataType float3 = DataTypes.vectorOf(DataTypes.FLOAT,3);
   
       @Test
       public void test_type_string_basic() {
           String typeString = "org.apache.cassandra.db.marshal.VectorType(" +
                   "org.apache.cassandra.db.marshal.FloatType,3)";
           DataType dataType = DataTypes.custom(typeString);
           assertThat(dataType).isInstanceOf(DefaultVectorType.class);
           DefaultVectorType result = (DefaultVectorType)dataType;
           assertThat(result.getElementType()).isEqualTo(DataTypes.FLOAT);
           assertThat(result.getDimensions()).isEqualTo(3);
       }
   
       @Test
       public void test_type_string_nested() {
           String typeString = "org.apache.cassandra.db.marshal.VectorType(" +
                   "org.apache.cassandra.db.marshal.VectorType(" +
                   "org.apache.cassandra.db.marshal.FloatType,3), 3)";
           DataType dataType = DataTypes.custom(typeString);
           assertThat(dataType).isInstanceOf(DefaultVectorType.class);
           DefaultVectorType result = (DefaultVectorType)dataType;
           assertThat(result.getElementType()).isEqualTo(float3);
           assertThat(result.getDimensions()).isEqualTo(3);
       }
   
       @Test
       public void test_vector_data_type_basic() {
           assertThat(float3).isInstanceOf(DefaultVectorType.class);
           DefaultVectorType result = (DefaultVectorType)float3;
           assertThat(result.getElementType()).isEqualTo(DataTypes.FLOAT);
           assertThat(result.getDimensions()).isEqualTo(3);
       }
   
       @Test
       public void test_vector_data_type_nested() {
           DataType float3 = DataTypes.vectorOf(DataTypes.FLOAT,3);
           DataType float3Nested = DataTypes.vectorOf(float3, 3);
           assertThat(float3Nested).isInstanceOf(DefaultVectorType.class);
           DefaultVectorType result = (DefaultVectorType)float3Nested;
           assertThat(result.getElementType()).isEqualTo(float3);
           assertThat(result.getDimensions()).isEqualTo(3);
       }
   }
   ```
   
   The changes in this PR make that test pass in it's entirety.
   
   I _will_ argue that the implementation in this PR does split the logic for 
handling these argument lists into two places.  Note that 
DataTypeClassNameParser [_already 
has_](https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/schema/parsing/DataTypeClassNameParser.java#L244-L267)
 logic for handling type parameters for type strings.  This PR then 
re-implements something similar in DataTypes.custom().  This has the effect of 
actually bouncing back and forth between the two classes when dealing with 
nested types: DataTypes.custom() calls DTCNP.parse() which calls 
DataTypes.custom() if it doesn't find a customized handler for a type and so on.
   
   This back-and-forth behaviour seems to be the key to the whole thing.  To me 
it's indicating that DataTypeClassNameParser hasn't been properly updated to 
handle vector types yet.  A quick review of the source code confirms that 
that's the case.  I'll note that I can also make the DataTypesTest class above 
pass via the following change:
   
   ```
   $ git diff                                                                   
                                             [29/29]
   diff --git 
a/core/src/main/java/com/datastax/oss/driver/api/core/type/DataTypes.java 
b/core/src/main/java/com/datastax/oss/driver/api/core/type/DataTypes.java
   index 4447e2830..492fc121c 100644
   --- a/core/src/main/java/com/datastax/oss/driver/api/core/type/DataTypes.java
   +++ b/core/src/main/java/com/datastax/oss/driver/api/core/type/DataTypes.java
   @@ -65,29 +65,8 @@ public class DataTypes {  
        if (className.equals("org.apache.cassandra.db.marshal.DurationType")) 
return DURATION;
     
        /* Vector support is currently implemented as a custom type but is also 
parameterized */                                                                
                                                   -    if 
(className.startsWith(DefaultVectorType.VECTOR_CLASS_NAME)) {
   -      String paramsString =     
   -          className.substring(                                              
                                                                                
                                                   
   -              DefaultVectorType.VECTOR_CLASS_NAME.length() + 1, 
className.length() - 1);                                                        
                                                               
   -      int lastCommaIndex = paramsString.lastIndexOf(',');                   
    
   -      if (lastCommaIndex == -1) {                                           
                           
   -        throw new IllegalArgumentException(                                 
                           
   -            String.format(                                                  
                           
   -                "Invalid vector type %s, expected format is %s<subtype, 
dimensions>",
   -                className, DefaultVectorType.VECTOR_CLASS_NAME));        
   -      }                                                                     
                           
   -      String subTypeString = paramsString.substring(0, 
lastCommaIndex).trim();   
   -      String dimensionsString = paramsString.substring(lastCommaIndex + 
1).trim();
   -                                                                            
                           
   -      DataType subType = classNameParser.parse(subTypeString, 
AttachmentPoint.NONE);
   -      int dimensions = Integer.parseInt(dimensionsString);
   -      if (dimensions <= 0) {
   -        throw new IllegalArgumentException(
   -            String.format(
   -                "Request to create vector of size %d, size must be 
positive", dimensions));
   -      }               
   -      return new DefaultVectorType(subType, dimensions);
   -    }
   +    if (className.startsWith(DefaultVectorType.VECTOR_CLASS_NAME))
   +      return classNameParser.parse(className, AttachmentPoint.NONE);
        return new DefaultCustomType(className);
      }
   diff --git 
a/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/schema/parsing/DataTypeClassNameParser.java
 
b/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/schema/parsing/Da
   taTypeClassNameParser.java
   index fd6f1a4bd..4cf77d467 100644                                            
                           
   --- 
a/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/schema/parsing/DataTypeClassNameParser.java
   +++ 
b/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/schema/parsing/DataTypeClassNameParser.java
   @@ -26,17 +26,21 @@ import 
com.datastax.oss.driver.api.core.type.UserDefinedType;
    import com.datastax.oss.driver.api.core.type.codec.TypeCodecs;
    import com.datastax.oss.driver.internal.core.context.InternalDriverContext;
    import com.datastax.oss.driver.internal.core.type.DefaultTupleType;
   +import com.datastax.oss.driver.internal.core.type.DefaultVectorType;
    import com.datastax.oss.driver.internal.core.type.UserDefinedTypeBuilder;
    import com.datastax.oss.driver.internal.core.type.codec.ParseUtils;
    import 
com.datastax.oss.driver.shaded.guava.common.annotations.VisibleForTesting;
    import com.datastax.oss.driver.shaded.guava.common.collect.ImmutableList;
    import com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap;
    import com.datastax.oss.protocol.internal.util.Bytes;
   +
    import java.util.ArrayList;
    import java.util.Collections;
   +import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
   +
    import net.jcip.annotations.ThreadSafe;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
   @@ -164,6 +168,13 @@ public class DataTypeClassNameParser implements 
DataTypeParser {
          return new DefaultTupleType(componentTypesBuilder.build(), 
attachmentPoint);
        }
     
   +    if (next.startsWith("org.apache.cassandra.db.marshal.VectorType")) {
   +      Iterator<String> rawTypes = parser.getTypeParameters().iterator();
   +      DataType subtype = parse(rawTypes.next(), userTypes, attachmentPoint, 
logPrefix);
   +      int dimensions = Integer.parseInt(rawTypes.next());
   +      return DataTypes.vectorOf(subtype, dimensions);
   +    }
   +
        DataType type = NATIVE_TYPES_BY_CLASS_NAME.get(next);
        return type == null ? DataTypes.custom(toParse) : type;
      }
   ```
   
   I'd argue this fix (or something very much like it) is much preferred 
because (a) it keeps the arg parsing logic consolidated in 
DataTypeClassNameParser (which seems like the ideal place to put it) and (b) it 
improves DTCNP so that it can be used for additional vector-related 
functionality in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to