Hi Owen, thank you for your reply.

I've heard some people say ORC stands for "Owen's RC file" haha ;)

Also, some people told me after I posted that this is already a known issue with AWS EMR 4.0.0.

They said it might be a compatibility issue between Hive 0.13.1 and Spark 1.4.1.
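
For anyone trying to reproduce this, the versions are easy to confirm from the shell: sc.version prints the Spark version, and the "Initializing HiveMetastoreConnection version 0.13.1" line in the log below shows the Hive side. A quick check looks something like this (output from my EMR 4.0.0 cluster):

scala> sc.version
res0: String = 1.4.1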

AWS will launch EMR 4.1.0 in a couple of weeks with Spark 1.5 and a newer version of Hive.

I hope it works properly in 4.1.0.

The error log is below, but please don't spend too much time on it.
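
In case anyone else hits this before 4.1.0 ships: as I mentioned in my first mail, reading the same files from HDFS did not throw, so copying the data over and reading it there can work as a stopgap. This is a rough sketch only, and the HDFS destination path here is just an example:

# copy the ORC data from S3 onto the cluster's HDFS first
hadoop distcp s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/ /tmp/orc-eu/

scala> val fromHdfs = sqlContext.read.format("orc").load("hdfs:///tmp/orc-eu/")
scala> fromHdfs.head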

Thank you very much



scala> val ORCFile = sqlContext.read.format("orc").load("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000")



2015-09-13 07:33:29,228 INFO  [main] fs.EmrFileSystem 
(EmrFileSystem.java:initialize(107)) - Consistency disabled, using 
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation 
2015-09-13 07:33:29,314 INFO  [main] amazonaws.latency 
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], 
ServiceName=[Amazon S3], AWSRequestID=[CF49E1372BEF2E81], 
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, 
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, 
ClientExecuteTime=[85.608], HttpRequestTime=[85.101], 
HttpClientReceiveResponseTime=[13.891], RequestSigningTime=[0.259], 
ResponseProcessingTime=[0.007], HttpClientSendRequestTime=[0.305], 
2015-09-13 07:33:29,351 INFO  [main] amazonaws.latency 
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], 
ServiceName=[Amazon S3], AWSRequestID=[55B8C5E6009F0246], 
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, 
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, 
ClientExecuteTime=[32.776], HttpRequestTime=[13.17], 
HttpClientReceiveResponseTime=[10.961], RequestSigningTime=[0.28], 
ResponseProcessingTime=[19.042], HttpClientSendRequestTime=[0.295], 
2015-09-13 07:33:29,421 INFO  [main] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:open(1159)) - Opening 
's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
 for reading 
2015-09-13 07:33:29,477 INFO  [main] amazonaws.latency 
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], 
ServiceName=[Amazon S3], AWSRequestID=[F698A6A43297754E], 
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, 
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, 
ClientExecuteTime=[53.698], HttpRequestTime=[50.815], 
HttpClientReceiveResponseTime=[48.774], RequestSigningTime=[0.372], 
ResponseProcessingTime=[0.861], HttpClientSendRequestTime=[0.362], 
2015-09-13 07:33:29,478 INFO  [main] metrics.MetricsSaver 
(MetricsSaver.java:<init>(915)) - Thread 1 created MetricsLockFreeSaver 1 
2015-09-13 07:33:29,479 INFO  [main] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:retrievePair(292)) - Stream for key 
'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
 seeking to position '217260502' 
2015-09-13 07:33:29,590 INFO  [main] amazonaws.latency 
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], 
ServiceName=[Amazon S3], AWSRequestID=[AD631A8AE229AFE7], 
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, 
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, 
ClientExecuteTime=[109.859], HttpRequestTime=[109.204], 
HttpClientReceiveResponseTime=[58.468], RequestSigningTime=[0.286], 
ResponseProcessingTime=[0.133], HttpClientSendRequestTime=[0.327], 
2015-09-13 07:33:29,753 INFO  [main] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:listStatus(896)) - listStatus 
s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000
 with recursive false 
2015-09-13 07:33:29,877 INFO  [main] hive.HiveContext 
(Logging.scala:logInfo(59)) - Initializing HiveMetastoreConnection version 
0.13.1 using Spark classes. 
2015-09-13 07:33:30,593 WARN  [main] util.NativeCodeLoader 
(NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable 
2015-09-13 07:33:30,622 INFO  [main] metastore.HiveMetaStore 
(HiveMetaStore.java:newRawStore(493)) - 0: Opening raw store with implemenation 
class:org.apache.hadoop.hive.metastore.ObjectStore 
2015-09-13 07:33:30,641 INFO  [main] metastore.ObjectStore 
(ObjectStore.java:initialize(246)) - ObjectStore, initialize called 
2015-09-13 07:33:30,782 INFO  [main] DataNucleus.Persistence 
(Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will 
be ignored 
2015-09-13 07:33:30,782 INFO  [main] DataNucleus.Persistence 
(Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown 
unknown - will be ignored 
2015-09-13 07:33:31,208 INFO  [main] metastore.ObjectStore 
(ObjectStore.java:getPMF(315)) - Setting MetaStore object pin classes with 
hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
 
2015-09-13 07:33:32,375 INFO  [main] DataNucleus.Datastore 
(Log4JLogger.java:info(77)) - The class 
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
"embedded-only" so does not have its own datastore table. 
2015-09-13 07:33:32,376 INFO  [main] DataNucleus.Datastore 
(Log4JLogger.java:info(77)) - The class 
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so 
does not have its own datastore table. 
2015-09-13 07:33:32,470 INFO  [main] DataNucleus.Datastore 
(Log4JLogger.java:info(77)) - The class 
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
"embedded-only" so does not have its own datastore table. 
2015-09-13 07:33:32,470 INFO  [main] DataNucleus.Datastore 
(Log4JLogger.java:info(77)) - The class 
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so 
does not have its own datastore table. 
2015-09-13 07:33:32,558 INFO  [main] DataNucleus.Query 
(Log4JLogger.java:info(77)) - Reading in results for query 
"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is 
closing 
2015-09-13 07:33:32,561 INFO  [main] metastore.ObjectStore 
(ObjectStore.java:setConf(229)) - Initialized ObjectStore 
2015-09-13 07:33:32,816 INFO  [main] metastore.HiveMetaStore 
(HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore 
2015-09-13 07:33:32,819 INFO  [main] metastore.HiveMetaStore 
(HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore 
2015-09-13 07:33:32,888 INFO  [main] metastore.HiveMetaStore 
(HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since 
config is empty 
2015-09-13 07:33:33,343 INFO  [main] session.SessionState 
(SessionState.java:start(360)) - No Tez session required at this point. 
hive.execution.engine=mr. 
ORCFile: org.apache.spark.sql.DataFrame = [h_header1: string, h_header2: 
string, h_header3: string, h_header4: string, h_header5: string, h_header6: 
string, h_header7: string, h_header8: string, h_header9: string, body: 
map<string,string>, yymmdd: int, country: string] 
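
Note that the load itself succeeds, so the footer and schema of the ORC file are read fine; it is only split generation (OrcInputFormat.getSplits, see the trace further down) that blows up. Schema-only calls still work, for example:

scala> ORCFile.printSchema
root
 |-- h_header1: string (nullable = true)
 |-- h_header2: string (nullable = true)
...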




scala> ORCFile.head 



2015-09-13 07:33:41,080 INFO  [main] sources.DataSourceStrategy 
(Logging.scala:logInfo(59)) - Selected 1 partitions out of 1, pruned 0.0% 
partitions. 
2015-09-13 07:33:41,169 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - ensureFreeSpace(243112) called with curMem=0, 
maxMem=280248975 
2015-09-13 07:33:41,171 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory 
(estimated size 237.4 KB, free 267.0 MB) 
2015-09-13 07:33:41,214 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - ensureFreeSpace(22100) called with curMem=243112, 
maxMem=280248975 
2015-09-13 07:33:41,215 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 21.6 KB, free 267.0 MB) 
2015-09-13 07:33:41,216 INFO  [sparkDriver-akka.actor.default-dispatcher-3] 
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_0_piece0 
in memory on 10.0.0.112:48218 (size: 21.6 KB, free: 267.2 MB) 
2015-09-13 07:33:41,221 INFO  [main] spark.SparkContext 
(Logging.scala:logInfo(59)) - Created broadcast 0 from head at <console>:22 
2015-09-13 07:33:41,396 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - ensureFreeSpace(244448) called with 
curMem=265212, maxMem=280248975 
2015-09-13 07:33:41,396 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - Block broadcast_1 stored as values in memory 
(estimated size 238.7 KB, free 266.8 MB) 
2015-09-13 07:33:41,422 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - ensureFreeSpace(22567) called with curMem=509660, 
maxMem=280248975 
2015-09-13 07:33:41,422 INFO  [main] storage.MemoryStore 
(Logging.scala:logInfo(59)) - Block broadcast_1_piece0 stored as bytes in 
memory (estimated size 22.0 KB, free 266.8 MB) 
2015-09-13 07:33:41,423 INFO  [sparkDriver-akka.actor.default-dispatcher-3] 
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_1_piece0 
in memory on 10.0.0.112:48218 (size: 22.0 KB, free: 267.2 MB) 
2015-09-13 07:33:41,426 INFO  [main] spark.SparkContext 
(Logging.scala:logInfo(59)) - Created broadcast 1 from head at <console>:22 
2015-09-13 07:33:41,495 INFO  [main] log.PerfLogger 
(PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=OrcGetSplits 
from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2015-09-13 07:33:41,497 INFO  [main] Configuration.deprecation 
(Configuration.java:warnOnceIfDeprecated(1049)) - mapred.input.dir is 
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
2015-09-13 07:33:41,501 INFO  [ORC_GET_SPLITS #0] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:listStatus(896)) - listStatus 
s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000
 with recursive false 
2015-09-13 07:33:41,504 INFO  [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:open(1159)) - Opening 
's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
 for reading 
2015-09-13 07:33:41,593 INFO  [ORC_GET_SPLITS #1] amazonaws.latency 
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], 
ServiceName=[Amazon S3], AWSRequestID=[8DFE404E45BFD9CD], 
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, 
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, 
ClientExecuteTime=[88.129], HttpRequestTime=[86.932], 
HttpClientReceiveResponseTime=[42.613], RequestSigningTime=[0.539], 
ResponseProcessingTime=[0.142], HttpClientSendRequestTime=[0.337], 
2015-09-13 07:33:41,594 INFO  [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:retrievePair(292)) - Stream for key 
'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
 seeking to position '217260502' 
2015-09-13 07:33:41,674 INFO  [ORC_GET_SPLITS #1] amazonaws.latency 
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], 
ServiceName=[Amazon S3], AWSRequestID=[040D77B7E7E76AA5], 
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, 
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, 
ClientExecuteTime=[79.608], HttpRequestTime=[79.064], 
HttpClientReceiveResponseTime=[36.843], RequestSigningTime=[0.222], 
ResponseProcessingTime=[0.11], HttpClientSendRequestTime=[0.343], 
2015-09-13 07:33:41,681 ERROR [ORC_GET_SPLITS #1] orc.OrcInputFormat 
(OrcInputFormat.java:run(826)) - Unexpected Exception 
java.lang.ArrayIndexOutOfBoundsException: 3 
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException: serious problem 
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:466)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:919)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:944)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.getPartitions(HadoopRDD.scala:375)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
        at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:121)
        at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:125)
        at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
        at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
        at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1210)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC$$iwC.<init>(<console>:35)
        at $iwC$$iwC.<init>(<console>:37)
        at $iwC.<init>(<console>:39)
        at <init>(<console>:41)
        at .<init>(<console>:45)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)




--
ca...@korea.com
cazen....@samsung.com
http://www.Cazen.co.kr

> On Sep 13, 2015, at 3:00 PM, Owen O'Malley <omal...@apache.org> wrote:
> 
> Do you have a stack trace of the array out of bounds exception? I don't 
> remember an array out of bounds problem off the top of my head. A stack trace 
> will tell me a lot, obviously.
> 
> If you are using Spark 1.4 that implies Hive 0.13, which is pretty old. It 
> may be a problem that we fixed a while ago.
> 
> Thanks,
>    Owen
> 
> 
> 
> On Sat, Sep 12, 2015 at 8:15 AM, Cazen Lee <cazen....@gmail.com> wrote:
> Good Day!
> 
> I think there are some problems between ORC and AWS EMRFS.
> 
> When I was trying to read ORC files over 150 MB from S3, an 
> ArrayIndexOutOfBoundsException occurred.
> 
> I'm sure it's an issue on the AWS side, because there was no exception when 
> reading from HDFS or S3NativeFileSystem.
> 
> Parquet works fine, but that's inconvenient (almost all of our system is 
> based on ORC).
> 
> Does anybody know about this issue?
> 
> I've tried Spark 1.4.1 (EMR 4.0.0), and there is no patch note about this for 1.5.
> 
> Thank You
