spark streaming to jdbc

2021-09-03 Thread igyu
cution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:534)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:532)
at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:531)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
... 1 more
21/09/03 15:18:56 INFO SparkContext: Invoking stop() from shutdown hook
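
Structured Streaming in this Spark version has no built-in JDBC sink, so a common workaround is to write each micro-batch through the batch JDBC writer via foreachBatch. A minimal sketch, assuming a streaming DataFrame named streamingDF and placeholder connection settings:

import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

val jdbcUrl = "jdbc:mysql://dbhost:3306/testdb"   // placeholder URL
val props = new Properties()
props.setProperty("user", "dbuser")               // placeholder credentials
props.setProperty("password", "dbpass")

val query = streamingDF.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Each micro-batch is an ordinary DataFrame, so the batch JDBC writer applies.
    batch.write.mode(SaveMode.Append).jdbc(jdbcUrl, "target_table", props)
  }
  .option("checkpointLocation", "/tmp/jdbc-sink-checkpoint")
  .start()

query.awaitTermination()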


igyu


type mismatch

2021-09-02 Thread igyu
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())

lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = obj.getString(schemas.get(i).name)
  }
  val seq = Seq(vs: _*)
  val record = Row.fromSeq(seq)
  record
})(Encoders.javaSerialization(Row.getClass))
  .toDF(arr: _*)

I get an error:

type mismatch;
 found   : Class[?0] where type ?0 <: org.apache.spark.sql.Row.type
 required: Class[org.apache.spark.sql.Row]
})(Encoders.javaSerialization(Row.getClass))
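
Row.getClass is the class of the Row companion object, hence Class[Row.type] rather than Class[Row]. One possible fix, sketched under the assumption that schemas is a Java list of field descriptors exposing a name, that every column can be read as a string, and that JSON is fastjson (as the parseObject/getString calls suggest): build a StructType from the field names and pass an explicit RowEncoder instead of Encoders.javaSerialization.

import com.alibaba.fastjson.JSON
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val structType = StructType(
  (0 until schemas.size()).map(i =>
    StructField(schemas.get(i).name, StringType, nullable = true)))

val df = lines.map { x =>
  val obj = JSON.parseObject(x)
  // Field order follows the schema, so the Row matches structType.
  Row.fromSeq((0 until schemas.size()).map(i => obj.getString(schemas.get(i).name)))
}(RowEncoder(structType))

With the schema carried by the encoder, the trailing .toDF(arr: _*) is no longer needed; the column names come from structType.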


igyu


How can I read two different hbase cluster with kerberos

2021-08-22 Thread igyu
 
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1992)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1360)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1140)
at com.join.hbase.reader.HbaseReader.initKerberos(HbaseReader.scala:203)
at com.join.hbase.reader.HbaseReader.beforeDo(HbaseReader.scala:138)
at com.join.Synctool$.main(Synctool.scala:327)
at com.join.Synctool.main(Synctool.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: javax.security.auth.login.LoginException: Cannot locate KDC
at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at 
org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2070)
at 
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1982)
... 11 more
Caused by: KrbException: Cannot locate KDC
at sun.security.krb5.Config.getKDCList(Config.java:1121)
at sun.security.krb5.KdcComm.send(KdcComm.java:218)
at sun.security.krb5.KdcComm.send(KdcComm.java:200)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:335)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:488)
at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:780)
... 25 more
Caused by: KrbException: Generic error (description in e-text) (60) - Unable to 
locate KDC for realm JOIN.COM
at sun.security.krb5.Config.getKDCFromDNS(Config.java:1218)
at sun.security.krb5.Config.getKDCList(Config.java:1094)
... 30 more

Exception in thread "main" org.apache.spark.SparkException: Application 
application_1627287887991_1323 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1158)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1606)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/08/23 11:40:29 INFO util.ShutdownHookManager: Shutdown hook called
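
"Cannot locate KDC for realm JOIN.COM" usually means the JVM's Kerberos configuration only describes one of the two realms. One possible approach, sketched with hypothetical paths, principals and realm names: point the JVM at a krb5.conf whose [realms]/[domain_realm] sections cover both clusters, then log in to each cluster with its own keytab and UGI and open each HBase connection inside the matching doAs.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import org.apache.hadoop.hbase.security.User
import org.apache.hadoop.security.UserGroupInformation

// krb5.conf listing BOTH realms (hypothetical path).
System.setProperty("java.security.krb5.conf", "/etc/krb5-merged.conf")

def connect(hbaseSiteXml: String, principal: String, keytab: String): Connection = {
  val conf = HBaseConfiguration.create()
  conf.addResource(new Path(hbaseSiteXml))            // hbase-site.xml of that cluster
  conf.set("hadoop.security.authentication", "kerberos")
  UserGroupInformation.setConfiguration(conf)
  val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
  ugi.doAs(new PrivilegedExceptionAction[Connection] {
    override def run(): Connection = ConnectionFactory.createConnection(conf, User.create(ugi))
  })
}

val connA = connect("/conf/clusterA/hbase-site.xml", "user/hosta@REALM.A", "/keytabs/a.keytab")
val connB = connect("/conf/clusterB/hbase-site.xml", "user/hostb@REALM.B", "/keytabs/b.keytab")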





igyu


How can I use sparkContext.addFile

2021-08-20 Thread igyu
In spark-shell I can run:

val url = "hdfs://nameservice1/user/jztwk/config.json"
spark.sparkContext.addFile(url)
val json_str = readLocalFile(SparkFiles.get(url.split("/").last))

but when I build a jar package and submit it:

spark-submit --master yarn --deploy-mode cluster --principal 
jztwk/had...@join.com --keytab /hadoop/app/jztwk.keytab --class 
com.join.Synctool --jars hdfs://nameservice1/sparklib/* 
jztsynctools-1.0-SNAPSHOT.jar

I get an error:

 ERROR yarn.Client: Application diagnostics message: User class threw 
exception: java.io.FileNotFoundException: 
/hadoop/yarn/nm1/usercache/jztwk/appcache/application_1627287887991_0571/spark-020a769c-6d9c-42ff-9bb2-1407cf6ed0bc/userFiles-1f57a3ed-22fa-4464-84e4-e549685b0d2d/hadoop/yarn/nm1/usercache/jztwk/appcache/application_1627287887991_0571/spark-020a769c-6d9c-42ff-9bb2-1407cf6ed0bc/userFiles-1f57a3ed-22fa-4464-84e4-e549685b0d2d/config.json
 (No such file or directory)
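
From the doubled path in the message it looks as if an absolute path, rather than just the file name, is being passed to SparkFiles.get. A minimal sketch of the intended usage (readLocalFile replaced here by a plain file read):

import scala.io.Source
import org.apache.spark.SparkFiles

val url = "hdfs://nameservice1/user/jztwk/config.json"
spark.sparkContext.addFile(url)

// SparkFiles.get expects only the file name; it returns the local copy's path
// inside the driver/executor working directory.
val localPath = SparkFiles.get("config.json")
val json_str = Source.fromFile(localPath).mkString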




but 



igyu


How can I LoadIncrementalHFiles

2021-08-19 Thread igyu
 DF.toJavaRDD.rdd.hbaseBulkLoadThinRows(hbaseContext, 
TableName.valueOf(config.getString("table")), R => {
  val rowKey = Bytes.toBytes(R.getAs[String](name))
  val family = Bytes.toBytes(_family)
  val qualifier = Bytes.toBytes(name)
  val value: Array[Byte] = Bytes.toBytes(R.getAs[String](name))
  familyQualifiersValues += (family, qualifier, value)
}
  }

  (new ByteArrayWrapper(rowKey), familyQualifiersValues)
}, config.getString("tmp"))

val table = 
connection.getTable(TableName.valueOf(config.getString("table")))
val load = new LoadIncrementalHFiles(conf)
load.doBulkLoad(new Path(config.getString("tmp")),
  connection.getAdmin, table, 
connection.getRegionLocator(TableName.valueOf(config.getString("table"
  }

I get an error:

21/08/19 15:12:22 INFO LoadIncrementalHFiles: Split occurred while grouping 
HFiles, retry attempt 9 with 1 files remaining to group or split
21/08/19 15:12:22 INFO LoadIncrementalHFiles: Trying to load 
hfile=file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd first=Optional[0001] 
last=Optional[0003]
21/08/19 15:12:22 WARN LoadIncrementalHFiles: Attempt to bulk load region 
containing  into table sparktest1 with files [family:f 
path:file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd] failed.  This is 
recoverable and they will be retried.
21/08/19 15:12:22 INFO LoadIncrementalHFiles: Split occurred while grouping 
HFiles, retry attempt 10 with 1 files remaining to group or split
21/08/19 15:12:22 ERROR LoadIncrementalHFiles: 
-
Bulk load aborted with some files not yet loaded:
-
  file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd

Exception in thread "main" java.io.IOException: Retry attempted 10 times 
without completing, bailing out
at 
org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.performBulkLoad(LoadIncrementalHFiles.java:419)
at 
org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:342)
at 
org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:256)
at com.join.hbase.writer.HbaseWriter.saveTo(HbaseWriter.scala:167)
at com.join.Synctool$.main(Synctool.scala:587)
at com.join.Synctool.main(Synctool.scala)



file:/d:/tmp/f/bb4706276d5d40c5b3014cc74dc39ddd does exist,
so the hbaseBulkLoadThinRows step itself seems OK.

In the official docs I find:

rdd.hbaseBulkLoad(TableName.valueOf(tableName),
  t => {
   val rowKey = t._1
   val family:Array[Byte] = t._2(0)._1
   val qualifier = t._2(0)._2
   val value = t._2(0)._3
   val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier)
   Seq((keyFamilyQualifier, value)).iterator
  },
  stagingFolder.getPath)
val load = new LoadIncrementalHFiles(config)
load.doBulkLoad(new Path(stagingFolder.getPath),
  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))

The path given to hbaseBulkLoad and to LoadIncrementalHFiles is the same
(stagingFolder.getPath),
and it looks like hbaseBulkLoad expected a local file.
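
A likely cause, though only a guess from the log: the staging directory is a local Windows path (file:/d:/tmp), which the region servers cannot read when doBulkLoad runs, so every attempt falls back into the split/retry loop. Writing the HFiles to an HDFS staging path reachable by both the client and the region servers usually avoids this. A sketch with a hypothetical staging path; buildThinRow stands for the same row-building function as above:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles

val stagingDir = "hdfs://nameservice1/tmp/hfile-staging"   // hypothetical HDFS path

DF.toJavaRDD.rdd.hbaseBulkLoadThinRows(hbaseContext,
  TableName.valueOf(config.getString("table")),
  buildThinRow,
  stagingDir)

val tableName = TableName.valueOf(config.getString("table"))
val table = connection.getTable(tableName)
val load = new LoadIncrementalHFiles(conf)
load.doBulkLoad(new Path(stagingDir), connection.getAdmin, table,
  connection.getRegionLocator(tableName))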


igyu


about spark on hbase problem

2021-08-17 Thread igyu
1.run(NettyRpcConnection.java:344)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at 
org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine.
at sun.nio.ch.SocketDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:469)
at 
org.apache.hbase.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:405)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:939)
at 
org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:360)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:906)
at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1370)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:739)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:731)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:717)
at 
org.apache.hbase.thirdparty.io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:739)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:731)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:717)
at 
org.apache.hbase.thirdparty.io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:739)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:754)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:778)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:747)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:801)
at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1036)
at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:305)
... 9 more


igyu


How can I config hive.metastore.warehouse.dir

2021-08-11 Thread igyu
r$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at 
org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:61)
at 
org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:60)
at 
org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66)
at 
org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66)
at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:325)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:311)
at com.join.hive.writer.HiveJdbcWriter.saveTo(HiveJdbcWriter.scala:87)
at com.join.Synctool$.main(Synctool.scala:249)
at com.join.Synctool.main(Synctool.scala)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: 
Database 'hivetest' not found;
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:44)
at 
org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:46)
at 
org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:338)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireTableExists(ExternalCatalog.scala:49)
at 
org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireTableExists(InMemoryCatalog.scala:46)
at 
org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.getTable(InMemoryCatalog.scala:333)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730)
... 46 more

INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of 
spark.sql.warehouse.dir 
('file:/D:/file/code/Java/jztsynctools/spark-warehouse/').

I think hive.metastore.warehouse.dir is null, so it cannot find the 'hivetest' database.

But I set:
proper.setProperty("spark.sql.warehouse.dir", "/user/hive/warehouse")
proper.setProperty("hive.metastore.warehouse.dir", "/user/hive/warehouse")
proper.setProperty("hive.metastore.uris", "thrift://bigdser1:9083")
and use:
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\core-site.xml")
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\hdfs-site.xml")
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\hive-site.xml")
sparkSession.sparkContext.hadoopConfiguration.addResource("D:\\file\\yarn-site.xml")
and I still get the same error.
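
The stack trace goes through InMemoryCatalog, which suggests the session is not talking to the Hive metastore at all; in that case the warehouse properties alone cannot help. A sketch of building the session with Hive support enabled and the metastore settings passed as session config (the app name is a placeholder, the other values come from the message):

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("jztsynctools")
  .config("hive.metastore.uris", "thrift://bigdser1:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport()   // use the Hive external catalog instead of InMemoryCatalog
  .getOrCreate()

// With Hive support enabled the database should resolve:
sparkSession.sql("USE hivetest")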


igyu


about ShellBasedUnixGroupsMapping question

2021-08-11 Thread igyu
 org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:379)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:615)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
at 
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3383)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2544)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2758)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
at org.apache.spark.sql.Dataset.show(Dataset.scala:745)
at org.apache.spark.sql.Dataset.show(Dataset.scala:704)
at org.apache.spark.sql.Dataset.show(Dataset.scala:713)
at com.join.hive.reader.HiveReader.readFrom(HiveReader.scala:15)
at com.join.Synctool$.main(Synctool.scala:200)
at com.join.Synctool.main(Synctool.scala)

But I use LdapGroupsMapping.

How can I fix it?
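
If the goal is for this job to resolve groups through LDAP instead of ShellBasedUnixGroupsMapping, the mapping class has to be set in the Hadoop configuration the Spark job actually loads, either in core-site.xml or programmatically. A sketch with placeholder LDAP settings:

val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
hadoopConf.set("hadoop.security.group.mapping",
  "org.apache.hadoop.security.LdapGroupsMapping")
// Placeholder LDAP connection details:
hadoopConf.set("hadoop.security.group.mapping.ldap.url", "ldap://ldap-host:389")
hadoopConf.set("hadoop.security.group.mapping.ldap.bind.user", "cn=admin,dc=example,dc=com")
hadoopConf.set("hadoop.security.group.mapping.ldap.bind.password", "secret")
hadoopConf.set("hadoop.security.group.mapping.ldap.base", "dc=example,dc=com")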



igyu


How can I write data to hive with jdbc

2021-08-10 Thread igyu
anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:325)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:311)
at com.join.hive.writer.HiveWriter.saveTo(HiveWriter.scala:39)
at com.join.synctool$.main(synctool.scala:43)
at com.join.synctool.main(synctool.scala)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table 
or view 'ods_job_log' not found in database 'default';
at 
org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at 
org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:120)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:736)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730)
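
insertInto requires the target table to already exist in the catalog, and here 'ods_job_log' is not in the 'default' database. Either create the table first, or let Spark create it with saveAsTable, qualifying the database name. A sketch, where the 'ods' database name is hypothetical:

// Creates the table if it does not exist yet:
DF.write.mode("append").saveAsTable("ods.ods_job_log")

// If the table already exists in another database, qualify it for insertInto too:
DF.write.mode("append").insertInto("ods.ods_job_log")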



igyu


How can I write data to ftp

2021-08-09 Thread igyu
itTask(HadoopMapReduceCommitProtocol.scala:225)
at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1442)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
... 10 more
21/08/10 08:33:44 INFO SparkContext: Invoking stop() from shutdown hook


igyu


How can I write to ftp

2021-08-08 Thread igyu
performCommit$1(SparkHadoopMapRedUtil.scala:50)
at 
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
at 
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225)
at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1442)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
... 10 more
21/08/09 14:50:14 INFO SparkContext: Invoking stop() from shutdown hook



igyu


How can I read ftp

2021-08-08 Thread igyu
val ftpUrl = "ftp://ftpuser:ftpuser@10.3.87.51:21/sparkftp/";

val schemas = StructType(List(
  new StructField("name", DataTypes.StringType, true),
  new StructField("age", DataTypes.IntegerType, true),
  new StructField("remk", DataTypes.StringType, true)))

val DF = sparkSession.read.format("csv")
  .schema(schemas)
  .option("header", "true")
  .load(ftpUrl)
//  .filter("created<=1602864000")

DF.printSchema()
DF.show()
I get an error:

Exception in thread "main" java.lang.IllegalArgumentException: Illegal pattern 
component: XXX
at 
org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at 
org.apache.commons.lang3.time.FastDatePrinter.(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.(FastDateFormat.java:384)
at org.apache.commons.lang3.time.FastDateFormat.(FastDateFormat.java:369)
at 
org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
at 
org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at 
org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at 
org.apache.spark.sql.execution.datasources.csv.CSVOptions.(CSVOptions.scala:139)
at 
org.apache.spark.sql.execution.datasources.csv.CSVOptions.(CSVOptions.scala:41)
at 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:105)
at 
org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:129)
at 
org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:165)
at 
org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:312)
at 
org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:310)
at 
org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:330)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:615)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
at 
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3383)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2544)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2758)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
at org.apache.spark.sql.Dataset.show(Dataset.scala:745)
at org.apache.spark.sql.Dataset.show(Dataset.scala:704)
at org.apache.spark.sql.Dataset.show(Dataset.scala:713)
at com.join.ftp.reader.FtpReader.readFrom(FtpReader.scala:40)
at com.join.synctool$.main(synctool.scala:41)
at com.join.synctool.main(synctool.scala)
21/08/09 11:15:08 INFO SparkContext: Invoking stop() from shutdown hook
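
"Illegal pattern component: XXX" from FastDatePrinter usually means an older commons-lang3 on the classpath that does not understand the XXX timezone pattern Spark's CSV options use by default. Two possible workarounds: align commons-lang3 with the version Spark ships (3.5+), or override the timestamp/date patterns so XXX is never used. A sketch of the latter; the patterns themselves are only examples:

val DF = sparkSession.read.format("csv")
  .schema(schemas)
  .option("header", "true")
  // Avoid the default pattern ending in XXX, which old commons-lang3 rejects.
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .option("dateFormat", "yyyy-MM-dd")
  .load(ftpUrl)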



igyu


How can transform RDD[Seq[String]] to RDD[ROW]

2021-08-05 Thread igyu
val ftpUrl = 
"ftp://test:test@ip:21/upload/test/_temporary/0/_temporary/task_2019124756_0002_m_00_0/*";
val rdd = spark.sparkContext.wholeTextFiles(ftpUrl)
val value = rdd.map(_._2).map(csv=>csv.split(",").toSeq)

val schemas = StructType(List(
new StructField("id", DataTypes.StringType, true),
new StructField("name", DataTypes.StringType, true),
new StructField("year", DataTypes.IntegerType, true),
new StructField("city", DataTypes.StringType, true)))
val DF = spark.createDataFrame(value,schemas)
How can I create the DataFrame?
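
createDataFrame needs an RDD[Row] whose values match the schema types, so each Seq[String] has to be wrapped in a Row and the year column converted to Int. A minimal sketch, assuming every line has exactly the four comma-separated columns:

import org.apache.spark.sql.Row

val rows = rdd.map(_._2).map { csv =>
  val f = csv.split(",").map(_.trim)
  // Order and types must match the StructType: id, name, year (Int), city.
  Row(f(0), f(1), f(2).toInt, f(3))
}

val DF = spark.createDataFrame(rows, schemas)
DF.show()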



igyu


How can I write data to hive with jdbc

2021-08-04 Thread igyu
anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:325)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:311)
at com.join.hive.writer.HiveWriter.saveTo(HiveWriter.scala:39)
at com.join.synctool$.main(synctool.scala:43)
at com.join.synctool.main(synctool.scala)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table 
or view 'ods_job_log' not found in database 'default';
at 
org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at 
org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:120)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:736)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730)



igyu


How can I write data to hive with jdbc

2021-07-30 Thread igyu
$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:567)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.ParseException:line 1:53 cannot recognize input 
near 'TEXT' ',' 'password' in column type
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:221)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:75)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:68)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:564)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1425)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1398)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205)
... 15 more
21/07/30 17:16:39 INFO SparkContext: Invoking stop() from shutdown hook
21/07/30 17:16:39 INFO SparkUI: Stopped Spark web UI at 
http://WIN-20201231YGA:4040
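
The generated CREATE TABLE uses the column type TEXT, which HiveQL does not accept (Hive uses STRING). When writing through the JDBC data source, the column types can be overridden with the createTableColumnTypes option; only the password column name is visible in the error, the other names and the URL below are placeholders:

DF.write
  .format("jdbc")
  .option("url", "jdbc:hive2://hiveserver:10000/default")   // placeholder HiveServer2 URL
  .option("dbtable", "target_table")                        // placeholder table name
  // Map string columns explicitly so the generated DDL does not contain TEXT.
  .option("createTableColumnTypes", "name STRING, password STRING")
  .mode("append")
  .save()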



igyu


How can I sync 2 hive cluster

2021-07-29 Thread igyu
I want to read data from hive cluster1
and write the data to hive cluster2.

How can I do it?

Note: cluster1 and cluster2 both have Kerberos enabled.
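
One possible approach, only a sketch: a single SparkSession can use one Hive metastore, so read from cluster1's Hive and write the data to cluster2's HDFS with a fully qualified URI, then register the files as a table on cluster2. With two Kerberized clusters the job also needs delegation tokens for the second filesystem (spark.yarn.access.hadoopFileSystems), and cross-realm trust or a principal accepted by both clusters is assumed. Table and service names below are placeholders:

// spark-submit setting so tokens are obtained for cluster2's HDFS as well:
//   --conf spark.yarn.access.hadoopFileSystems=hdfs://cluster2-nameservice

val df = sparkSession.table("hivetest.some_table")   // read via cluster1's metastore

df.write
  .mode("overwrite")
  .parquet("hdfs://cluster2-nameservice/user/hive/warehouse/hivetest.db/some_table")

// On cluster2, point a table at the written files, e.g. in beeline:
//   CREATE EXTERNAL TABLE hivetest.some_table (...) STORED AS PARQUET
//   LOCATION '/user/hive/warehouse/hivetest.db/some_table';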



igyu