codejoyan opened a new issue #2592:
URL: https://github.com/apache/hudi/issues/2592


   **Description:**
   The documentation says Hudi works with Spark 2.x & Spark 3.x versions (https://hudi.apache.org/docs/quick-start-guide.html), but I have not been able to use hudi-spark-bundle_2.11 version 0.7.0 with Spark 2.3.0 and Scala 2.11.12. Is there a specific spark-avro package that has to be used?
   
   I am trying to write a DataFrame read from an ORC file. The job fails with the error below, and any inputs will be very helpful:
   java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I
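
   (A minimal sketch of a sanity check that can be run in the same spark-shell; it only confirms the version mismatch rather than fixes it. The class and method names are copied from the error message above; the assumption is that the accessor exists on Spark 2.4+ but not on Spark 2.3.0, so the check should print true on 2.4+ and false on 2.3.0.)

   // Hedged sketch: look up, via reflection, whether the running Spark exposes the
   // Decimal.minBytesForPrecision accessor that the Hudi 0.7.0 bundle calls.
   val decimalCompanion = Class.forName("org.apache.spark.sql.types.Decimal$")
   println(decimalCompanion.getDeclaredMethods.exists(_.getName == "minBytesForPrecision"))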
   
   The cluster I am working with runs Spark 2.3.0 and there is no immediate upgrade planned. I wanted to check whether there is any way to make Hudi 0.7.0 work with Spark 2.3.0.
   
   Note: I am able to use Spark 2.3.0 with hudi-spark-bundle-0.5.0-incubating.jar
   
   **spark-shell Command**
   spark-shell \
    --packages org.apache.hudi:hudi-spark-bundle_2.11:0.7.0,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.avro:avro:1.8.2 \
    --conf spark.driver.extraClassPath=/u/users/j0s0j7j/.ivy2/jars/org.apache.avro_avro-1.8.2.jar \
    --conf spark.executor.extraClassPath=/u/users/j0s0j7j/.ivy2/jars/org.apache.avro_avro-1.8.2.jar \
    --conf "spark.sql.hive.convertMetastoreParquet=false" \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   
   **Stack Trace**
   scala> val BaseTableIncrLoad = spark.sql(
            s"""select * from $srcSchema.$srcTable
               |where load_time >= ('2021-01-20') and load_time < ('2021-01-21')
               |""".stripMargin)

   scala> val transformedDF = BaseTableIncrLoad.withColumn("partitionpath",
            concat(lit("part_str_col="), col("part_str_col"), lit("/part_date_col="), col("part_date_col")))
   
   scala> transformedDF.write.format("org.apache.hudi").
            |         options(getQuickstartWriteConfigs).
            |         option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col1").
            |         //option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col2").
            |         option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col3,col4,col5").
            |         option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
            |         option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.ComplexKeyGenerator").
            |         option("hoodie.upsert.shuffle.parallelism", "20").
            |         option("hoodie.insert.shuffle.parallelism", "20").
            |         option(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES, 128 * 1024 * 1024).
            |         option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, 128 * 1024 * 1024).
            |         option(HoodieWriteConfig.TABLE_NAME, "targetTableHudi").
            |         mode(SaveMode.Append).
            |         save(targetPath)
       21/02/22 07:14:03 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
       java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I
         at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:156)
         at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
         at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
         at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
         at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
         at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:52)
         at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:139)
         at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
         at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
         at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
         at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
         at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
         at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
         at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
         at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
         ... 62 elided

