[ https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Rabinowitz updated SPARK-33172:

Spark SQL CodeGenerator does not check for UserDefined type
-----------------------------------------------------------

                Key: SPARK-33172
                URL: https://issues.apache.org/jira/browse/SPARK-33172
            Project: Spark
         Issue Type: New Feature
         Components: SQL
   Affects Versions: 2.4.7, 3.0.1
           Reporter: David Rabinowitz
           Priority: Minor

Description:
The CodeGenerator takes the DataType given to {{getValueFromVector()}} as is and generates code based on that type. The generated code is not aware of the actual type, and therefore cannot be compiled. For example, using a DataFrame with a Spark ML Vector (VectorUDT) column, the generated code is:

{{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 4));}}

This leads to the following runtime error:

{{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 153, Column 126: No applicable constructor/method found for actual parameters "int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
{{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 153, Column 126: No applicable constructor/method found for actual parameters "int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
{{    at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}}
{{...}}

The generated call passes two arguments (the row index and the struct's field count, {{4}}), but {{ColumnVector.getStruct()}} accepts only the row index. The failure then throws Spark into an infinite loop of this error.

The solution is quite simple: {{getValueFromVector()}} should match and handle {{UserDefinedType}} the same way {{CodeGenerator.javaType()}} does.
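A minimal, self-contained Java toy model of the dispatch described above, showing why the two-argument call is emitted and what unwrapping the UDT first changes. It is not Spark code. The assumption is that {{getValueFromVector()}} special-cases only a bare {{StructType}} (emitting the one-argument {{ColumnVector.getStruct(int)}}) and otherwise delegates to a row-style {{getValue()}} that unwraps a UDT to its SQL type and emits {{getStruct(ordinal, numFields)}}. The class {{UdtCodegenSketch}}, its nested type classes, and the methods {{getValueFromVectorBuggy()}} and {{getValueFromVectorFixed()}} are hypothetical names invented for illustration.

```java
/**
 * Toy model (NOT Spark's API) of how a Spark-style code generator chooses the
 * accessor expression for a column value. All names are hypothetical stand-ins
 * that only mirror the shape of the logic described in SPARK-33172.
 */
final class UdtCodegenSketch {

    /** Hypothetical stand-in for a Spark SQL DataType. */
    interface DataType {}

    static final class IntegerType implements DataType {}

    /** Struct type; only the number of fields matters for this sketch. */
    static final class StructType implements DataType {
        final int size;
        StructType(int size) { this.size = size; }
    }

    /** A user-defined type whose values are stored as an underlying SQL type. */
    static final class UserDefinedType implements DataType {
        final DataType sqlType;
        UserDefinedType(DataType sqlType) { this.sqlType = sqlType; }
    }

    /** Row-style accessor: a row's getStruct takes (ordinal, numFields). */
    static String getValue(String input, DataType dt, String ordinal) {
        if (dt instanceof IntegerType) {
            return input + ".getInt(" + ordinal + ")";
        } else if (dt instanceof StructType) {
            return input + ".getStruct(" + ordinal + ", " + ((StructType) dt).size + ")";
        } else if (dt instanceof UserDefinedType) {
            // Unwraps the UDT, but still produces the row-style two-argument call.
            return getValue(input, ((UserDefinedType) dt).sqlType, ordinal);
        }
        throw new IllegalArgumentException("unsupported type: " + dt);
    }

    /** Current behaviour (assumed): only a bare StructType gets the one-argument getStruct. */
    static String getValueFromVectorBuggy(String vector, DataType dt, String rowId) {
        if (dt instanceof StructType) {
            return vector + ".getStruct(" + rowId + ")";
        }
        return getValue(vector, dt, rowId);
    }

    /** Proposed fix (sketch): unwrap UDTs first, as javaType() does, then dispatch. */
    static String getValueFromVectorFixed(String vector, DataType dt, String rowId) {
        DataType actual = dt;
        while (actual instanceof UserDefinedType) {
            actual = ((UserDefinedType) actual).sqlType;
        }
        if (actual instanceof StructType) {
            return vector + ".getStruct(" + rowId + ")";
        }
        return getValue(vector, actual, rowId);
    }

    public static void main(String[] args) {
        // A VectorUDT-like type: a UDT backed by a 4-field struct (the 4 in the generated code).
        DataType vectorLike = new UserDefinedType(new StructType(4));
        // Prints col.getStruct(rowIdx, 4); ColumnVector has no such two-argument method.
        System.out.println(getValueFromVectorBuggy("col", vectorLike, "rowIdx"));
        // Prints col.getStruct(rowIdx), the one-argument form ColumnVector does offer.
        System.out.println(getValueFromVectorFixed("col", vectorLike, "rowIdx"));
    }
}
```

Unwrapping before the {{StructType}} check is the whole change: the UDT reaches the vector-specific branch as its underlying struct, so the one-argument {{getStruct(int)}} is emitted instead of the row-style call.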