[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-821765451 @nsivabalan I really tried; our migration to Hudi has been delayed by more than 2 months, and we have had a ticket open with AWS for 2 months with no solution given to us. We will create a file in the table folder with all the table columns; when a new dataframe has columns that differ from that file, we will enable Hudi Hive sync. I think that could solve the problem for now, until AWS gives us a better solution. Another approach we want to try is to sync the Hive table through the metastore, disabling JDBC. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
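For reference, the "sync through the metastore, disabling JDBC" approach mentioned above maps to a few Hudi write options. A minimal sketch, assuming `df` is an existing Spark DataFrame and the table name/path are placeholders; `hoodie.datasource.hive_sync.mode` is only available in Hudi 0.9.0+, while older versions expose only `hoodie.datasource.hive_sync.use_jdbc` (record key, precombine field, etc. are omitted for brevity):

```scala
// Sketch only: table name and S3 path are hypothetical.
df.write
  .format("hudi")
  .option("hoodie.table.name", "order_delivery_failure")
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.mode", "hms")       // talk to the Hive metastore directly (Hudi 0.9.0+)
  .option("hoodie.datasource.hive_sync.use_jdbc", "false") // skip the HiveServer2 JDBC connection
  .mode("append")
  .save("s3://my-bucket/hudi/order_delivery_failure")
```

With `hms` mode the sync tool bypasses HiveServer2 entirely, which sidesteps the connection problem described in this issue; on EMR with Glue as the metastore this goes straight to the Glue catalog client.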
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-818455364 Hello guys, I think it is something related to EMR and Hive; we built our own solution to enable Hive sync only when the schema changes. It could be a good feature for Hudi: enable sync only when the schema changes or a new partition arrives. I will close the issue, because it is not something related to Hudi.
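A minimal sketch of the "sync only when the schema changes" workaround described above, assuming the column list is persisted in a small file inside the table folder; the file name `.columns` and the helper itself are hypothetical, not part of Hudi:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame
import java.nio.charset.StandardCharsets

// Hypothetical helper: decide whether Hive sync is needed by comparing the
// incoming DataFrame's columns against a column snapshot kept in the table folder.
def shouldSyncHive(df: DataFrame, tablePath: String, fs: FileSystem): Boolean = {
  val schemaFile = new Path(s"$tablePath/.columns") // file name is an assumption
  val newCols = df.columns.sorted.mkString("\n")
  val changed =
    if (!fs.exists(schemaFile)) true // first write: no snapshot yet, so sync
    else {
      val in = fs.open(schemaFile)
      val oldCols =
        try scala.io.Source.fromInputStream(in, "UTF-8").mkString
        finally in.close()
      oldCols != newCols
    }
  if (changed) { // refresh the snapshot so the next run compares against it
    val out = fs.create(schemaFile, true)
    try out.write(newCols.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }
  changed
}
```

The returned flag would then drive `hoodie.datasource.hive_sync.enable` for that write. Note this covers only column changes; the "new partition arrives" case would need a similar check on the partition values.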
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-810773796 @bvaradar which Hive version do you use? How many jobs run at the same time? Probably in your case you have a big Hive deployment that can handle a lot of requests; in my case I use Hive inside EMR only to integrate with Glue.
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-785933427 @bvaradar I think it is a Hive issue; I'm trying to increase the Hive heap size, and I hope it helps. I process the tables in threads, so I have almost 20 Hive connections open. Do you have any experience with Hudi and Hive? Because Hudi probably executes only simple queries to verify the table schema.
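On EMR, the Hive heap increase mentioned above is typically applied through a cluster configuration classification rather than by editing hive-env.sh by hand. A sketch, with an illustrative value (4096 MB is an assumption, not a recommendation):

```json
[
  {
    "Classification": "hive-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HEAPSIZE": "4096"
        }
      }
    ]
  }
]
```

This JSON is passed via the `--configurations` option when creating the cluster; it raises the heap for the Hive daemons (including HiveServer2), which is where the ~20 concurrent sync connections put pressure.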
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-785309606 Hello guys, I found some new errors:

```
21/02/24 18:53:18 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table order_delivery_failure
	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:211)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:148)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
	at hudiwriter.HudiWriter.merge(HudiWriter.scala:79)
	at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:34)
	at jobs.TableProcessor.start(TableProcessor.scala:86)
	at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
```
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-784705748 @umehrot2 @bvaradar if you have any idea, please help me; this is the only problem preventing me from putting Hudi into production.
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-784695940 https://user-images.githubusercontent.com/36298331/108935409-68927780-762c-11eb-9c3f-591f1b626557.png I'm having the problem right now; on the master side you can see a lot of CPU usage, but I don't understand why Hive is using so much CPU.
rubenssoto commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-783534393 https://user-images.githubusercontent.com/36298331/108744715-07c64a80-7519-11eb-8b02-98261e74474d.png Sometimes it takes a while to show the error; these jobs finish in 5 minutes, yet spend more than 20 minutes trying to connect to Hive.