davidsheard opened a new issue #1759:
URL: https://github.com/apache/hudi/issues/1759


   Hi,
   
   We can't seem to get our Hudi table to show up in Hive on Cloudera. We have 
dropped the Hudi jar into the Hive Auxiliary JARs Directory and restarted Hive, 
but no luck. We are hoping to demo the merits of Hudi but can't until we 
rectify the Hive issue.
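   For reference, Hive reads Hudi tables through the hudi-hadoop-mr-bundle jar, not the Spark bundle, so it is worth confirming which jar actually landed in the auxiliary directory (the path below is illustrative; use whatever Cloudera Manager shows for "Hive Auxiliary JARs Directory"):

```shell
# Illustrative path -- substitute your configured aux jars directory.
ls /var/lib/hive/aux-jars | grep -i hudi
# For Hive queries against Hudi tables, hudi-hadoop-mr-bundle (matching
# the Hudi version, here 0.5.3) should appear in this listing.
```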
   
   Spark Config:
   
   SparkConf conf = new SparkConf();
   conf.setAppName("Hudi Test");
   conf.set("spark.debug.maxToStringFields", "100");
   conf.set("spark.sql.shuffle.partitions", "2001");
   conf.set("spark.sql.warehouse.dir", "/user/hive/warehouse");
   conf.set("spark.sql.autoBroadcastJoinThreshold", "31457280");
   conf.set("spark.sql.hive.filesourcePartitionFileCacheSize", "2000000000");
   conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic");
   conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true");
   conf.set("spark.storage.replication.proactive", "true");
   
   Hudi Config:
   
   forms.write()
        .format("org.apache.hudi")
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "id_trans")
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "id_form_str")
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "update_dttm")
        .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), "true")
        .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL())
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())
        .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), "jdbc:hive2://localhost:10000")
        .option(HoodieWriteConfig.TABLE_NAME, "david.davhudi2")
        .mode(SaveMode.Append)
        .save(savePath);
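   One likely cause, for what it's worth: the options above set HIVE_URL_OPT_KEY but never enable Hive sync, so the writer would not register the table with the metastore at all. A minimal sketch of the extra options (Hudi 0.5.x option names; the database, table, and partition values are assumptions inferred from the config above):

```java
// Sketch only: enabling Hudi's built-in Hive sync. Without
// HIVE_SYNC_ENABLED_OPT_KEY, the HIVE_URL option has no effect.
forms.write()
     .format("org.apache.hudi")
     // ... existing record key / partition / precombine / table-type options ...
     .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), "true")
     .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), "jdbc:hive2://localhost:10000")
     .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), "david")     // assumed
     .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), "davhudi2")     // assumed
     .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), "id_form_str")
     .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY(),
             "org.apache.hudi.hive.MultiPartKeysValueExtractor")
     .mode(SaveMode.Append)
     .save(savePath);
```

   Note also that HoodieWriteConfig.TABLE_NAME is set to "david.davhudi2"; the Hive sync options take the database and table as separate values rather than a dotted name.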
   
   Environment Description
   
       Hudi version : 0.5.3
   
       Spark version : 2.4.0
   
       Cloudera version : 6.3.3
   
       Hadoop version : 3.0.0
   
       Storage (HDFS/S3/GCS..) : HDFS
   
       Running on Docker? (yes/no) : No
   
   Spark-submit:
   output_name=`date +%s`
   log4j_setting='-Dlog4j.configuration=file:log4j.properties'
   echo "Running Spark-submit"
   echo `date`
   
   # Repeating --files replaces the earlier value, so both files go in one
   # comma-separated list; each wrapped line also needs a trailing backslash.
   SPARK_CMD="spark2-submit \
    --files log4j.properties,/etc/hive/conf.cloudera.hive/hive-site.xml \
    --conf spark.driver.extraJavaOptions=${log4j_setting} \
    --conf spark.executor.extraJavaOptions=${log4j_setting} \
    --master yarn \
    --deploy-mode client \
    --num-executors 30 \
    --executor-memory 16g \
    --driver-memory 10g \
    --queue root.adhoc.dataScientists \
    --conf spark.scheduler.mode=FAIR \
    --conf yarn.nodemanager.vmem-check-enabled=false \
    --conf spark.executor.memoryOverhead=1072 \
    --conf spark.driver.memoryOverhead=2048 \
    --conf spark.executor.cores=2 \
    --conf spark.kryoserializer.buffer.max=2000m \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.sql.hive.convertMetastoreParquet=false \
    --conf spark.executor.heartbeatInterval=120s \
    --conf spark.network.timeout=600s \
    --conf spark.sql.catalogImplementation=hive \
    --class 'hudi.DataLoader' \
   'hudi-poc-0.0.1-SNAPSHOT.jar'  $1"
   
   eval nohup $SPARK_CMD > logs/run_hudi_forms_bulk.log 2>&1 &
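   Once the job finishes, a quick way to check whether the table was actually registered (connection string and database name taken from the config above, so adjust as needed):

```shell
# Sketch: confirm the table landed in the Hive metastore via beeline.
beeline -u jdbc:hive2://localhost:10000 -e "SHOW TABLES IN david;"
# And check the driver log captured above for any hive-sync errors:
grep -i hive logs/run_hudi_forms_bulk.log
```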
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

