Corrupt files on HDFS prevent Hive from reading

2020-02-29 Thread allanqinjy
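Hi all,
   I have a question. When writing data to HDFS, I frequently end up with corrupt files that make Hive throw an exception on read. The code that writes to HDFS is below, followed by the exception Hive reports when a SELECT fails because of a corrupt file; deleting the corrupt file makes the query work again. How can I avoid producing these corrupt files? Has anyone run into this and solved it effectively?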


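// The type arguments were lost in the archived message. SequenceFileWriter<K, V>
// writes Tuple2<K, V> elements, so K and V stand for the job's Writable types.
BucketingSink<Tuple2<K, V>> HDFS_SINK = new BucketingSink<>(path);
HDFS_SINK.setBucketer(new DateTimeBucketer<>(format));
// Pending and in-progress files get the prefix "flink_" instead of the default "_".
HDFS_SINK.setPendingPrefix("flink_");
HDFS_SINK.setInProgressPrefix("flink_");
HDFS_SINK.setPartPrefix("pulsar_part");
HDFS_SINK.setInactiveBucketThreshold(bucketThreshold);
// Block-compressed SequenceFile output using Snappy.
HDFS_SINK.setWriter(new SequenceFileWriter<>("SnappyCodec", SequenceFile.CompressionType.BLOCK));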




2020-02-29 18:31:30,747 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.io.EOFException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:227)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:137)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: java.io.EOFException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:225)
... 11 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:2158)
at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:2224)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2299)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:109)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:84)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 15 more
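
One thing that may be worth checking here (an assumption, not something confirmed in this thread): Hadoop and Hive input formats normally skip files whose names start with "_" or ".", and BucketingSink's default in-progress and pending prefix is "_". With the prefix changed to "flink_", files that are still being written are probably no longer skipped, and reading a half-written block-compressed SequenceFile could end in an EOFException like the one above. The sketch below only mirrors that naming rule in plain Java; the class name, the method name and the sample file names are hypothetical and chosen for illustration.

```java
public class HiddenFileCheck {

    // Mirrors the usual Hadoop hidden-file rule: names starting with
    // '_' or '.' are skipped when input files are listed.
    public static boolean isHiddenFromHadoop(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    public static void main(String[] args) {
        // Hypothetical names following the sink's
        // <prefix><partPrefix>-<subtask>-<counter><suffix> pattern.
        String[] names = {
            "_pulsar_part-0-0.in-progress",      // default "_" prefix
            "flink_pulsar_part-0-0.in-progress", // prefix configured above
            "pulsar_part-0-0"                    // finished part file
        };
        for (String name : names) {
            String verdict = isHiddenFromHadoop(name) ? "skipped" : "visible to readers";
            System.out.println(name + " -> " + verdict);
        }
    }
}
```

If this turns out to be the cause, keeping the default "_" prefix (or another prefix that starts with "_" or ".") for in-progress and pending files should keep unfinished files out of Hive queries.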

Re: Hive Source With Kerberos authentication problem

2020-02-29 Thread 叶贤勋
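Hi 李锐, thank you for your reply.
The earlier problem has been resolved by setting yarn.resourcemanager.principal.
However, another problem has now come up; please help take a look.
Background: the Flink job still uses a Kerberos-secured Hive as both source and sink. The same code passes Kerberos authentication when tested locally and can query Hive and insert data into it, but once the job is submitted to the cluster it fails with a Kerberos authentication error.
Flink: 1.9.1; flink-1.9.1/lib/ contains flink-dist_2.11-1.9.1.jar, flink-shaded-hadoop-2-uber-2.7.5-7.0.jar, log4j-1.2.17.jar, slf4j-log4j12-1.7.15.jar
Hive: 2.1.1
Main jars the Flink job depends on: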
[INFO] +- org.apache.flink:flink-table-api-java:jar:flink-1.9.1:compile
[INFO] |  +- org.apache.flink:flink-table-common:jar:flink-1.9.1:compile
[INFO] |  |  \- org.apache.flink:flink-core:jar:flink-1.9.1:compile
[INFO] |  |     +- org.apache.flink:flink-annotations:jar:flink-1.9.1:compile
[INFO] |  |     +- org.apache.flink:flink-metrics-core:jar:flink-1.9.1:compile
[INFO] |  |     \- com.esotericsoftware.kryo:kryo:jar:2.24.0:compile
[INFO] |  |        +- com.esotericsoftware.minlog:minlog:jar:1.2:compile
[INFO] |  |        \- org.objenesis:objenesis:jar:2.1:compile
[INFO] |  +- com.google.code.findbugs:jsr305:jar:1.3.9:compile
[INFO] |  \- org.apache.flink:force-shading:jar:1.9.1:compile
[INFO] +- org.apache.flink:flink-table-planner-blink_2.11:jar:flink-1.9.1:compile
[INFO] |  +- org.apache.flink:flink-table-api-scala_2.11:jar:flink-1.9.1:compile
[INFO] |  |  +- org.scala-lang:scala-reflect:jar:2.11.12:compile
[INFO] |  |  \- org.scala-lang:scala-compiler:jar:2.11.12:compile
[INFO] |  +- org.apache.flink:flink-table-api-java-bridge_2.11:jar:flink-1.9.1:compile
[INFO] |  |  +- org.apache.flink:flink-java:jar:flink-1.9.1:compile
[INFO] |  |  \- org.apache.flink:flink-streaming-java_2.11:jar:1.9.1:compile
[INFO] |  +- org.apache.flink:flink-table-api-scala-bridge_2.11:jar:flink-1.9.1:compile
[INFO] |  |  \- org.apache.flink:flink-scala_2.11:jar:flink-1.9.1:compile
[INFO] |  +- org.apache.flink:flink-table-runtime-blink_2.11:jar:flink-1.9.1:compile
[INFO] |  |  +- org.codehaus.janino:janino:jar:3.0.9:compile
[INFO] |  |  \- org.apache.calcite.avatica:avatica-core:jar:1.15.0:compile
[INFO] |  \- org.reflections:reflections:jar:0.9.10:compile
[INFO] +- org.apache.flink:flink-table-planner_2.11:jar:flink-1.9.1:compile
[INFO] +- org.apache.commons:commons-lang3:jar:3.9:compile
[INFO] +- com.typesafe.akka:akka-actor_2.11:jar:2.5.21:compile
[INFO] |  +- org.scala-lang:scala-library:jar:2.11.8:compile
[INFO] |  +- com.typesafe:config:jar:1.3.3:compile
[INFO] |  \- org.scala-lang.modules:scala-java8-compat_2.11:jar:0.7.0:compile
[INFO] +- org.apache.flink:flink-sql-client_2.11:jar:1.9.1:compile
[INFO] |  +- org.apache.flink:flink-clients_2.11:jar:1.9.1:compile
[INFO] |  |  \- org.apache.flink:flink-optimizer_2.11:jar:1.9.1:compile
[INFO] |  +- org.apache.flink:flink-streaming-scala_2.11:jar:1.9.1:compile
[INFO] |  +- log4j:log4j:jar:1.2.17:compile
[INFO] |  \- org.apache.flink:flink-shaded-jackson:jar:2.9.8-7.0:compile
[INFO] +- org.apache.flink:flink-json:jar:1.9.1:compile
[INFO] +- org.apache.flink:flink-csv:jar:1.9.1:compile
[INFO] +- org.apache.flink:flink-hbase_2.11:jar:1.9.1:compile
[INFO] +- org.apache.hbase:hbase-server:jar:2.2.1:compile
[INFO] |  +- org.apache.hbase.thirdparty:hbase-shaded-protobuf:jar:2.2.1:compile
[INFO] |  +- org.apache.hbase.thirdparty:hbase-shaded-netty:jar:2.2.1:compile
[INFO] |  +- org.apache.hbase.thirdparty:hbase-shaded-miscellaneous:jar:2.2.1:compile
[INFO] |  |  \- com.google.errorprone:error_prone_annotations:jar:2.3.3:compile
[INFO] |  +- org.apache.hbase:hbase-common:jar:2.2.1:compile
[INFO] |  |  \- com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1:compile
[INFO] |  +- org.apache.hbase:hbase-http:jar:2.2.1:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-util:jar:9.3.27.v20190418:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-util-ajax:jar:9.3.27.v20190418:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-http:jar:9.3.27.v20190418:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-security:jar:9.3.27.v20190418:compile
[INFO] |  |  +- org.glassfish.jersey.core:jersey-server:jar:2.25.1:compile
[INFO] |  |  |  +- org.glassfish.jersey.core:jersey-common:jar:2.25.1:compile
[INFO] |  |  |  |  +- org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.25.1:compile
[INFO] |  |  |  |  \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile
[INFO] |  |  |  +- org.glassfish.jersey.core:jersey-client:jar:2.25.1:compile
[INFO] |  |  |  +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.25.1:compile
[INFO] |  |  |  +- javax.annotation:javax.annotation-api:jar:1.2:compile
[INFO] |  |  |  +- org.glassfish.hk2:hk2-api:jar:2.5.0-b32:compile
[INFO] |  |  |  |  +- org.glassfish.hk2:hk2-utils:jar:2.5.0-b32:compile
[INFO] |  |  |  |  \- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.5.0-b32:compile
[INFO] |  |  |  +- org.glassfish.hk2.external:javax.inject:jar:2.5.0-b32:compile
[INFO] |  |  |  \- org.glassfish.hk2:hk2-locator:jar:2.5.0-b32:compile
[INFO] |  |  +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.25.1:compile
[INFO] |  |  \- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile
[INFO