[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-04-16 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-821765451


   @nsivabalan 
   I really tried; our migration to Hudi has been delayed by more than 2 
months, and we have had a ticket open with AWS for 2 months with no solution 
given to us.
   
   We will create a file in the table folder with all of the table's columns; 
when a new dataframe has columns that differ from that file, we will enable 
Hudi Hive sync. I think it could solve the problem for now, until AWS gives us 
a better solution.
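   A minimal sketch of the schema-tracking workaround described above, shown in plain Python for illustration (the helper names and the `_columns.txt` file name are hypothetical; the real job would read and write this file on S3 under the table folder):

   ```python
   # Sketch of the "only sync Hive when the schema changed" workaround.
   # All names here are illustrative, not from the actual job.

   def load_tracked_columns(path):
       """Read the previously recorded column list, one name per line."""
       try:
           with open(path) as f:
               return [line.strip() for line in f if line.strip()]
       except FileNotFoundError:
           return None  # first run: no schema recorded yet

   def should_enable_hive_sync(tracked_columns, dataframe_columns):
       """Enable Hive sync only when the incoming schema differs."""
       if tracked_columns is None:
           return True  # nothing recorded yet; sync once to create the table
       return sorted(tracked_columns) != sorted(dataframe_columns)

   def save_tracked_columns(path, dataframe_columns):
       """Persist the current schema for the next run's comparison."""
       with open(path, "w") as f:
           f.write("\n".join(dataframe_columns))
   ```

   The result of `should_enable_hive_sync` would then drive the value passed to `hoodie.datasource.hive_sync.enable` on the Hudi write.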
   
   Another approach we want to try is to sync the Hive table through the 
metastore directly, disabling JDBC.
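   For reference, Hudi exposes a writer option to bypass JDBC and talk to the metastore directly. A sketch of the relevant Spark write options (`hoodie.datasource.hive_sync.use_jdbc` is a real Hudi option; the database and table names below are placeholders):

   ```python
   # Hudi Hive-sync options for syncing via the metastore instead of JDBC.
   # Database/table names are placeholders, not from the actual job.
   hudi_hive_sync_options = {
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.use_jdbc": "false",  # skip HiveServer2 JDBC
       "hoodie.datasource.hive_sync.database": "my_database",  # placeholder
       "hoodie.datasource.hive_sync.table": "my_table",        # placeholder
   }
   # These would be passed to the writer, e.g.:
   # df.write.format("hudi").options(**hudi_hive_sync_options).mode("append").save(path)
   ```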


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-04-12 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-818455364


   Hello Guys,
   
   I think it is something related to EMR and Hive; we built our own solution 
to only enable Hive sync when the schema changes.
   
   It could be a good feature for Hudi: enable sync only when the schema 
changes or a new partition arrives.
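   The condition suggested above (sync only on a schema change or a new partition) can be sketched as a single decision function; shown in plain Python for illustration, with hypothetical names:

   ```python
   def needs_hive_sync(old_columns, new_columns,
                       synced_partitions, incoming_partitions):
       """Sync only when the schema changed or a new partition value arrived."""
       schema_changed = sorted(old_columns) != sorted(new_columns)
       has_new_partition = bool(set(incoming_partitions) - set(synced_partitions))
       return schema_changed or has_new_partition
   ```

   On every other write the job could skip Hive sync entirely, avoiding the per-commit metastore connections that overload the EMR master.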
   
   I will close the issue, because it is not something related to Hudi.






[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-03-30 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-810773796


   @bvaradar which Hive version do you use? How many jobs run at the same 
time?
   
   Probably in your case you have a large Hive deployment to serve many 
requests; in my case I use Hive inside EMR only to integrate with Glue.






[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-02-25 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-785933427


   @bvaradar I think it is a Hive issue. I'm trying to increase the Hive heap 
size, and I hope it helps.
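   On EMR, one way to raise the Hive heap is a cluster configuration classification; a sketch (the 2048 MB value is illustrative, and whether `hive-env`/`HADOOP_HEAPSIZE` governs the metastore heap on a given EMR release should be verified against the EMR documentation):

   ```json
   [
     {
       "Classification": "hive-env",
       "Configurations": [
         {
           "Classification": "export",
           "Properties": {
             "HADOOP_HEAPSIZE": "2048"
           }
         }
       ]
     }
   ]
   ```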
   
   I process the tables in threads, so I have almost 20 Hive connections open.
   
   Do you have any experience with Hudi and Hive? Because Hudi probably 
executes simple queries to verify the table schema.
   
   







[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-02-24 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-785309606


   Hello Guys,
   
   I found some new errors:
   
   21/02/24 18:53:18 ERROR HiveSyncTool: Got runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table order_delivery_failure
	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:211)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:148)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
	at hudiwriter.HudiWriter.merge(HudiWriter.scala:79)
	at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:34)
	at jobs.TableProcessor.start(TableProcessor.scala:86)
	at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)

[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-02-23 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-784705748


   @umehrot2 @bvaradar if you have any idea, please help me; this is the only 
problem preventing me from moving Hudi to production.







[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-02-23 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-784695940


   Screenshot: https://user-images.githubusercontent.com/36298331/108935409-68927780-762c-11eb-9c3f-591f1b626557.png
   
   I'm having the problem right now. On the master node you can see a lot of 
CPU usage, but I don't understand why Hive is using so much CPU.







[GitHub] [hudi] rubenssoto commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-02-22 Thread GitBox


rubenssoto commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-783534393


   Screenshot: https://user-images.githubusercontent.com/36298331/108744715-07c64a80-7519-11eb-8b02-98261e74474d.png
   
   Sometimes it takes a while for the error to show up; these jobs run in 5 
minutes, but spend more than 20 minutes trying to connect to Hive.


