Spark Streaming to JDBC
cution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:534) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:532) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:531) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279) ... 1 more 21/09/03 15:18:56 INFO SparkContext: Invoking stop() from shutdown hook igyu
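There is no built-in JDBC sink for Structured Streaming, so the usual pattern (Spark 2.4+) is to write each micro-batch with the ordinary batch JDBC writer via foreachBatch. A minimal sketch, assuming df is the streaming DataFrame built elsewhere; the JDBC URL, table and credentials below are placeholders:

import org.apache.spark.sql.DataFrame

val query = df.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // reuse the batch JDBC writer for every micro-batch
    batchDF.write
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/mydb")   // placeholder
      .option("dbtable", "target_table")                // placeholder
      .option("user", "user")
      .option("password", "password")
      .mode("append")
      .save()
  }
  .option("checkpointLocation", "/tmp/jdbc-sink-checkpoint")
  .start()

query.awaitTermination()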
Type mismatch with Encoders.javaSerialization(Row.getClass)
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())
lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = x.getString(schemas.get(i).name)
  }
  val seq = Seq(vs: _*)
  val record = Row.fromSeq(seq)
  record
})(Encoders.javaSerialization(Row.getClass))
  .toDF(arr: _*)

I get an error:

type mismatch;
 found   : Class[?0] where type ?0 <: org.apache.spark.sql.Row.type
 required: Class[org.apache.spark.sql.Row]
  })(Encoders.javaSerialization(Row.getClass))

igyu
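The compile error comes from Row.getClass: that is the class of the Row companion object (Row.type), not Class[Row], so it does not satisfy Class[org.apache.spark.sql.Row]. For Row results the more usual route is an encoder derived from the target schema, e.g. RowEncoder. A rough sketch under the question's own names (schemas, lines, fastjson's JSON.parseObject), assuming every column is read as a string and reading each field from the parsed object rather than from the raw string x:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// assumption: schemas is a java.util.List whose elements expose a `name` field, as in the question
val structType = StructType(schemas.asScala.map(s => StructField(s.name, StringType, nullable = true)))

val rowDS = lines.map { x =>
  val obj = JSON.parseObject(x)                                        // fastjson, as in the question
  Row.fromSeq(structType.fieldNames.map(n => obj.getString(n)).toSeq)  // read from obj, not from the raw string x
}(RowEncoder(structType))

val df = rowDS.toDF(structType.fieldNames: _*)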
How can I read from two different HBase clusters with Kerberos?
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1992) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1360) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1140) at com.join.hbase.reader.HbaseReader.initKerberos(HbaseReader.scala:203) at com.join.hbase.reader.HbaseReader.beforeDo(HbaseReader.scala:138) at com.join.Synctool$.main(Synctool.scala:327) at com.join.Synctool.main(Synctool.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673) Caused by: javax.security.auth.login.LoginException: Cannot locate KDC at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2070) at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1982) ... 11 more Caused by: KrbException: Cannot locate KDC at sun.security.krb5.Config.getKDCList(Config.java:1121) at sun.security.krb5.KdcComm.send(KdcComm.java:218) at sun.security.krb5.KdcComm.send(KdcComm.java:200) at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:335) at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:488) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:780) ... 25 more Caused by: KrbException: Generic error (description in e-text) (60) - Unable to locate KDC for realm JOIN.COM at sun.security.krb5.Config.getKDCFromDNS(Config.java:1218) at sun.security.krb5.Config.getKDCList(Config.java:1094) ... 
30 more Exception in thread "main" org.apache.spark.SparkException: Application application_1627287887991_1323 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:1158) at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1606) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 21/08/23 11:40:29 INFO util.ShutdownHookManager: Shutdown hook called igyu
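The root cause line is "Unable to locate KDC for realm JOIN.COM": the krb5.conf visible to the driver and executors does not define that realm (or is not found at all). A sketch of one way to handle two kerberized HBase clusters, logging in to each with its own UGI; the krb5.conf path, site files, principals and keytabs are placeholders, and the krb5.conf must list the KDCs of both realms and be present on (or shipped to) every node:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

// point the JVM at a krb5.conf that defines the [realms] entries of BOTH clusters
System.setProperty("java.security.krb5.conf", "/etc/krb5-both-realms.conf")   // placeholder path

def loginTo(siteXmls: Seq[String], principal: String, keytab: String): UserGroupInformation = {
  val conf: Configuration = HBaseConfiguration.create()
  siteXmls.foreach(f => conf.addResource(new Path(f)))    // that cluster's hbase-site.xml / core-site.xml
  conf.set("hadoop.security.authentication", "kerberos")
  UserGroupInformation.setConfiguration(conf)
  UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
}

val ugiA = loginTo(Seq("/etc/clusterA/hbase-site.xml"), "user@REALM.A", "/keytabs/a.keytab")
val ugiB = loginTo(Seq("/etc/clusterB/hbase-site.xml"), "user@REALM.B", "/keytabs/b.keytab")
// run each cluster's reads inside its own ugi.doAs { ... } so the two logins do not clash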
How can I use sparkContext.addFile
In spark-shell I can run

val url = "hdfs://nameservice1/user/jztwk/config.json"
Spark.sparkContext.addFile(url)
val json_str = readLocalFile(SparkFiles.get(url.split("/").last))

but when I build a jar and submit it with

spark-submit --master yarn --deploy-mode cluster --principal jztwk/had...@join.com --keytab /hadoop/app/jztwk.keytab --class com.join.Synctool --jars hdfs://nameservice1/sparklib/* jztsynctools-1.0-SNAPSHOT.jar

I get this error:

ERROR yarn.Client: Application diagnostics message: User class threw exception: java.io.FileNotFoundException: /hadoop/yarn/nm1/usercache/jztwk/appcache/application_1627287887991_0571/spark-020a769c-6d9c-42ff-9bb2-1407cf6ed0bc/userFiles-1f57a3ed-22fa-4464-84e4-e549685b0d2d/hadoop/yarn/nm1/usercache/jztwk/appcache/application_1627287887991_0571/spark-020a769c-6d9c-42ff-9bb2-1407cf6ed0bc/userFiles-1f57a3ed-22fa-4464-84e4-e549685b0d2d/config.json (No such file or directory)

igyu
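The doubled directory in the FileNotFoundException suggests that the path returned by SparkFiles.get was resolved against the SparkFiles root a second time inside readLocalFile. One way to sidestep this in yarn-cluster mode is to skip addFile entirely and read the JSON straight from HDFS. A minimal sketch, reusing the Spark session variable and HDFS URL from the question and replacing readLocalFile with a plain stream read:

import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

// read the config straight from HDFS; works the same in client and cluster deploy mode
val fs = FileSystem.get(Spark.sparkContext.hadoopConfiguration)
val in = fs.open(new Path("hdfs://nameservice1/user/jztwk/config.json"))
val json_str = try Source.fromInputStream(in, "UTF-8").mkString finally in.close()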
How can I use LoadIncrementalHFiles?
DF.toJavaRDD.rdd.hbaseBulkLoadThinRows(hbaseContext,
  TableName.valueOf(config.getString("table")),
  R => {
    val rowKey = Bytes.toBytes(R.getAs[String](name))
    val family = Bytes.toBytes(_family)
    val qualifier = Bytes.toBytes(name)
    var value: Array[Byte] = Bytes.toBytes(R.getAs[String](name))
    familyQualifiersValues += (family, qualifier, value)
    (new ByteArrayWrapper(rowKey), familyQualifiersValues)
  },
  config.getString("tmp"))

val table = connection.getTable(TableName.valueOf(config.getString("table")))
val load = new LoadIncrementalHFiles(conf)
load.doBulkLoad(new Path(config.getString("tmp")), connection.getAdmin, table,
  connection.getRegionLocator(TableName.valueOf(config.getString("table"))))

I get this error:

21/08/19 15:12:22 INFO LoadIncrementalHFiles: Split occurred while grouping HFiles, retry attempt 9 with 1 files remaining to group or split
21/08/19 15:12:22 INFO LoadIncrementalHFiles: Trying to load hfile=file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd first=Optional[0001] last=Optional[0003]
21/08/19 15:12:22 WARN LoadIncrementalHFiles: Attempt to bulk load region containing into table sparktest1 with files [family:f path:file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd] failed. This is recoverable and they will be retried.
21/08/19 15:12:22 INFO LoadIncrementalHFiles: Split occurred while grouping HFiles, retry attempt 10 with 1 files remaining to group or split
21/08/19 15:12:22 ERROR LoadIncrementalHFiles: - Bulk load aborted with some files not yet loaded: - file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd
Exception in thread "main" java.io.IOException: Retry attempted 10 times without completing, bailing out at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.performBulkLoad(LoadIncrementalHFiles.java:419) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:342) at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:256) at com.join.hbase.writer.HbaseWriter.saveTo(HbaseWriter.scala:167) at com.join.Synctool$.main(Synctool.scala:587) at com.join.Synctool.main(Synctool.scala)

file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd does exist, so the hbaseBulkLoadThinRows call itself seems to be OK. On the official site I found this example:

rdd.hbaseBulkLoad(TableName.valueOf(tableName),
  t => {
    val rowKey = t._1
    val family: Array[Byte] = t._2(0)._1
    val qualifier = t._2(0)._2
    val value = t._2(0)._3
    val keyFamilyQualifier = new KeyFamilyQualifier(rowKey, family, qualifier)
    Seq((keyFamilyQualifier, value)).iterator
  },
  stagingFolder.getPath)

val load = new LoadIncrementalHFiles(config)
load.doBulkLoad(new Path(stagingFolder.getPath), conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))

There the same stagingFolder.getPath is passed to both hbaseBulkLoad and LoadIncrementalHFiles, and in my case hbaseBulkLoadThinRows wrote its HFiles to a local path.

igyu
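The staging directory here is file:/d:/tmp, a path on the local Windows machine; the HBase region servers cannot read HFiles from the client's local filesystem, which is why every bulk-load attempt is retried and finally aborted. A sketch with the staging directory on HDFS instead; the staging path is an assumption, and rowMapper stands for the mapping function from the question:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles

// stage the HFiles on HDFS, where every region server can read them
val stagingDir = "hdfs://nameservice1/tmp/bulkload/sparktest1"   // assumed path

DF.toJavaRDD.rdd.hbaseBulkLoadThinRows(hbaseContext,
  TableName.valueOf(config.getString("table")),
  rowMapper,                                                     // the mapping function from the question
  stagingDir)

val tableName = TableName.valueOf(config.getString("table"))
val load = new LoadIncrementalHFiles(conf)
load.doBulkLoad(new Path(stagingDir), connection.getAdmin,
  connection.getTable(tableName), connection.getRegionLocator(tableName))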
About a Spark-on-HBase problem
1.run(NettyRpcConnection.java:344) at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495) at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905) at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine. at sun.nio.ch.SocketDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:469) at org.apache.hbase.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:405) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:939) at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:360) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:906) at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1370) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:739) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:731) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:717) at org.apache.hbase.thirdparty.io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:739) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:731) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:717) at org.apache.hbase.thirdparty.io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:739) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:754) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:778) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:747) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:801) at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1036) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:305) ... 9 more

igyu
How can I configure hive.metastore.warehouse.dir?
r$$anonfun$execute$1.apply(RuleExecutor.scala:76) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76) at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121) at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106) at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47) at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:61) at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:60) at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66) at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66) at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72) at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68) at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77) at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668) at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:325) at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:311) at com.join.hive.writer.HiveJdbcWriter.saveTo(HiveJdbcWriter.scala:87) at com.join.Synctool$.main(Synctool.scala:249) at com.join.Synctool.main(Synctool.scala) Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'hivetest' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:44) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:46) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:338) at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireTableExists(ExternalCatalog.scala:49) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireTableExists(InMemoryCatalog.scala:46) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.getTable(InMemoryCatalog.scala:333) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:146) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701) at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730) ... 46 more
INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/D:/file/code/Java/jztsynctools/spark-warehouse/').

I think hive.metastore.warehouse.dir is null, so the hivetest database cannot be found. But I have already set

proper.setProperty("spark.sql.warehouse.dir", "/user/hive/warehouse")
proper.setProperty("hive.metastore.warehouse.dir", "/user/hive/warehouse")
proper.setProperty("hive.metastore.uris", "thrift://bigdser1:9083")

and I also used

sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\core-site.xml")
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\hdfs-site.xml")
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\hive-site.xml")
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\yarn-site.xml")

and I still get the same error.

igyu
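The stack trace goes through InMemoryCatalog, which means the SparkSession was built without Hive support, so the hive.metastore.* properties are never consulted and only the in-memory catalog's default database exists. A sketch of building the session against the real metastore (the URI and warehouse path come from the question, the app name is a placeholder); these settings must be in place before the session is first created:

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("jztsynctools")                                    // placeholder
  .config("hive.metastore.uris", "thrift://bigdser1:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport()                                        // switches the catalog from in-memory to Hive
  .getOrCreate()

sparkSession.sql("show databases").show()                     // 'hivetest' should now be visible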
A question about ShellBasedUnixGroupsMapping
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:379) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:615) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3383) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) at org.apache.spark.sql.Dataset.head(Dataset.scala:2544) at org.apache.spark.sql.Dataset.take(Dataset.scala:2758) at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254) at org.apache.spark.sql.Dataset.showString(Dataset.scala:291) at org.apache.spark.sql.Dataset.show(Dataset.scala:745) at org.apache.spark.sql.Dataset.show(Dataset.scala:704) at org.apache.spark.sql.Dataset.show(Dataset.scala:713) at com.join.hive.reader.HiveReader.readFrom(HiveReader.scala:15) at com.join.Synctool$.main(Synctool.scala:200) at com.join.Synctool.main(Synctool.scala)

But I am using LdapGroupsMapping. How can I fix it?

igyu
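The visible frames stop before the actual exception, but if this is the usual ShellBasedUnixGroupsMapping failure (Hadoop shelling out to id/groups on a machine where that does not work), the thing to verify is that the LdapGroupsMapping setting actually reaches the Hadoop Configuration the Spark client uses. A sketch, with placeholder LDAP values:

// make sure the mapping class and its LDAP settings are on the Configuration Spark uses;
// all LDAP values below are placeholders
val hconf = sparkSession.sparkContext.hadoopConfiguration
hconf.set("hadoop.security.group.mapping", "org.apache.hadoop.security.LdapGroupsMapping")
hconf.set("hadoop.security.group.mapping.ldap.url", "ldap://ldap-host:389")
hconf.set("hadoop.security.group.mapping.ldap.bind.user", "cn=reader,dc=example,dc=com")
hconf.set("hadoop.security.group.mapping.ldap.bind.password", "***")
hconf.set("hadoop.security.group.mapping.ldap.base", "dc=example,dc=com")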
How can I write data to Hive with JDBC?
anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668) at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:325) at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:311) at com.join.hive.writer.HiveWriter.saveTo(HiveWriter.scala:39) at com.join.synctool$.main(synctool.scala:43) at com.join.synctool.main(synctool.scala) Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'ods_job_log' not found in database 'default'; at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81) at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81) at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84) at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:120) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:736) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:146) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730) igyu
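insertInto is being called with an unqualified table name, so Spark looks for ods_job_log in the default database and fails. A sketch that qualifies the table with its database and creates it first if needed; the database name ods and the column list are placeholders:

// qualify the table with its database and make sure it exists before insertInto
sparkSession.sql(
  "CREATE TABLE IF NOT EXISTS ods.ods_job_log (id STRING, msg STRING) STORED AS PARQUET")  // placeholder schema
DF.write.mode("append").insertInto("ods.ods_job_log")

// alternatively, let Spark create the table from the DataFrame itself
DF.write.mode("overwrite").saveAsTable("ods.ods_job_log")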
How can I write data to FTP?
itTask(HadoopMapReduceCommitProtocol.scala:225) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1442) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248) ... 10 more 21/08/10 08:33:44 INFO SparkContext: Invoking stop() from shutdown hook igyu
How can I write to FTP?
performCommit$1(SparkHadoopMapRedUtil.scala:50) at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77) at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1442) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248) ... 10 more 21/08/09 14:50:14 INFO SparkContext: Invoking stop() from shutdown hook igyu
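The failure happens in the commit step, which renames files out of _temporary, and the Hadoop FTP filesystem handles that step poorly. A common workaround is to write the output to a filesystem where the normal commit works and then push the finished file over FTP with commons-net. A sketch; the host, credentials and paths mirror the related FTP question below but should be treated as placeholders:

import java.io.{File, FileInputStream}
import org.apache.commons.net.ftp.{FTP, FTPClient}

// 1) write to a filesystem where the commit/rename step works
DF.coalesce(1).write.option("header", "true").mode("overwrite").csv("/tmp/ftp_out")

// 2) push the finished part file to the FTP server
val ftp = new FTPClient()
ftp.connect("10.3.87.51", 21)
ftp.login("ftpuser", "ftpuser")
ftp.enterLocalPassiveMode()
ftp.setFileType(FTP.BINARY_FILE_TYPE)
val part = new File("/tmp/ftp_out").listFiles().filter(_.getName.endsWith(".csv")).head
val in = new FileInputStream(part)
try ftp.storeFile("/sparkftp/" + part.getName, in) finally in.close()
ftp.logout()
ftp.disconnect()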
How can I read from FTP?
val ftpUrl = "ftp://ftpuser:ftpuser@10.3.87.51:21/sparkftp/"; val schemas = StructType(List( new StructField("name", DataTypes.StringType, true), new StructField("age", DataTypes.IntegerType, true), new StructField("remk", DataTypes.StringType, true))) val DF = sparkSession.read.format("csv") .schema(schemas) .option("header","true") .load(ftpUrl) // .filter("created<=1602864000") DF.printSchema() DF.show() I get error Exception in thread "main" java.lang.IllegalArgumentException: Illegal pattern component: XXX at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282) at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149) at org.apache.commons.lang3.time.FastDatePrinter.(FastDatePrinter.java:142) at org.apache.commons.lang3.time.FastDateFormat.(FastDateFormat.java:384) at org.apache.commons.lang3.time.FastDateFormat.(FastDateFormat.java:369) at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91) at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88) at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82) at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165) at org.apache.spark.sql.execution.datasources.csv.CSVOptions.(CSVOptions.scala:139) at org.apache.spark.sql.execution.datasources.csv.CSVOptions.(CSVOptions.scala:41) at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:105) at org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:129) at org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:165) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:312) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:310) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:330) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:615) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3383) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) at org.apache.spark.sql.Dataset.head(Dataset.scala:2544) at org.apache.spark.sql.Dataset.take(Dataset.scala:2758) at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254) at org.apache.spark.sql.Dataset.showString(Dataset.scala:291) at org.apache.spark.sql.Dataset.show(Dataset.scala:745) at org.apache.spark.sql.Dataset.show(Dataset.scala:704) at org.apache.spark.sql.Dataset.show(Dataset.scala:713) at com.join.ftp.reader.FtpReader.readFrom(FtpReader.scala:40) at com.join.synctool$.main(synctool.scala:41) at com.join.synctool.main(synctool.scala) 21/08/09 11:15:08 INFO SparkContext: Invoking stop() from shutdown hook igyu
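"Illegal pattern component: XXX" is the CSV reader building its default timestamp pattern (which ends in XXX) with an older commons-lang3 FastDateFormat that does not understand XXX. The two usual ways out are to align commons-lang3 on the classpath to 3.5 or newer, or to give the reader explicit patterns. A sketch of the latter, reusing ftpUrl and schemas from the question:

// either put commons-lang3 >= 3.5 on the classpath, or avoid the default pattern entirely
val DF = sparkSession.read.format("csv")
  .schema(schemas)
  .option("header", "true")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")   // no 'XXX' component
  .option("dateFormat", "yyyy-MM-dd")
  .load(ftpUrl)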
How can I transform RDD[Seq[String]] to RDD[Row]?
val ftpUrl = "ftp://test:test@ip:21/upload/test/_temporary/0/_temporary/task_2019124756_0002_m_00_0/*"; val rdd = spark.sparkContext.wholeTextFiles(ftpUrl) val value = rdd.map(_._2).map(csv=>csv.split(",").toSeq) val schemas = StructType(List( new StructField("id", DataTypes.StringType, true), new StructField("name", DataTypes.StringType, true), new StructField("year", DataTypes.IntegerType, true), new StructField("city", DataTypes.StringType, true))) val DF = spark.createDataFrame(value,schemas) How can I createDataFrame igyu
How can I write data to Hive with JDBC?
$Processor$ExecuteStatement.getResult(TCLIService.java:1422) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:567) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.ParseException:line 1:53 cannot recognize input near 'TEXT' ',' 'password' in column type at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:221) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:75) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:68) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:564) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1425) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1398) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205) ... 15 more 21/07/30 17:16:39 INFO SparkContext: Invoking stop() from shutdown hook 21/07/30 17:16:39 INFO SparkUI: Stopped Spark web UI at http://WIN-20201231YGA:4040 igyu
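The ParseException is Hive rejecting TEXT in the CREATE TABLE statement; Hive has no TEXT column type, so the DDL sent over JDBC should use STRING (or VARCHAR(n)) instead. A sketch over the Hive JDBC driver; the URL, principal and table layout are placeholders:

import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection(
  "jdbc:hive2://bigdser1:10000/default;principal=hive/_HOST@JOIN.COM")               // placeholder URL
val stmt = conn.createStatement()
stmt.execute(
  "CREATE TABLE IF NOT EXISTS users (name STRING, password STRING) STORED AS ORC")   // STRING, not TEXT
stmt.close()
conn.close()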
How can I sync two Hive clusters?
I want to read data from Hive cluster1 and write it to Hive cluster2. How can I do it? Note: Kerberos is enabled on both cluster1 and cluster2.

igyu
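A single Spark job normally talks to one Hive metastore, so a common pattern is to read through cluster1's metastore and land the data on cluster2's HDFS, then register it there (for example over Hive JDBC or with a beeline DDL). Kerberos-wise the job needs credentials valid for both clusters, either via cross-realm trust or by requesting delegation tokens for the second namenode (spark.yarn.access.hadoopFileSystems). A rough sketch; every name below is a placeholder:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("hive.metastore.uris", "thrift://cluster1-metastore:9083")       // cluster1's metastore
  // ask YARN for tokens on cluster2's namenode as well (cross-realm trust required)
  .config("spark.yarn.access.hadoopFileSystems", "hdfs://cluster2-ns")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.sql("SELECT * FROM hivetest.src_table")
df.write.mode("overwrite")
  .parquet("hdfs://cluster2-ns/user/hive/warehouse/hivetest.db/dst_table")
// then register the files on cluster2, e.g. CREATE EXTERNAL TABLE ... LOCATION ... over Hive JDBC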