Andras Istvan Nagy created KYLIN-4299:
-----------------------------------------

             Summary: Issue with building real-time segment cache into HBase when using S3 as working dir
                 Key: KYLIN-4299
                 URL: https://issues.apache.org/jira/browse/KYLIN-4299
             Project: Kylin
          Issue Type: Bug
          Components: Real-time Streaming
    Affects Versions: v3.0.0-alpha2
            Reporter: Andras Istvan Nagy


We have an issue when using S3 as the working dir for Kylin together with 
real-time streaming. Our goal is to keep no state in HDFS, so that the runtime 
environment running Kylin becomes stateless. 
Our HBase data is already on S3, but {{kylin.env.hdfs-working-dir}} also holds 
persistent data (cube dictionaries), so it needs to be on S3 as well. Only then 
is it possible to fail over to a new cluster without having to rebuild all 
cubes.

We are using the real-time streaming feature in Kylin, which persists segment 
caches hourly; an MR job then merges those hourly segments into HBase. These MR 
jobs fail with the following exception:
{code:java}
Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
    at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
    at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
    at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
    at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
    at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
    at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
    at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
    at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
    at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
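
The trace shows DistributedFileSystem.listStatus being handed an s3:// path, 
so the file system object in use appears to be the cluster default (HDFS) 
rather than one resolved from the path itself. The sketch below is a 
simplified, stdlib-only model of that scheme/authority check, not Hadoop's or 
Kylin's actual code; the class WrongFsSketch, the nested BoundFs, the helper 
forPath and the example URIs are all hypothetical. In Hadoop terms, the 
presumed difference is between FileSystem.get(conf) and 
path.getFileSystem(conf), and whether ColumnarFilesReader really does the 
former is an assumption on our side.

```java
import java.net.URI;
import java.util.Objects;

public class WrongFsSketch {

    /** Hypothetical stand-in for a file system bound to one scheme + authority. */
    static final class BoundFs {
        final URI uri;

        BoundFs(String uri) {
            this.uri = URI.create(uri);
        }

        // Simplified model of the check: reject any fully qualified path whose
        // scheme or authority differs from this file system's own URI.
        void checkPath(String path) {
            URI p = URI.create(path);
            if (p.getScheme() == null) {
                return; // unqualified path, treated as belonging to this FS
            }
            boolean sameScheme = p.getScheme().equalsIgnoreCase(uri.getScheme());
            boolean sameAuthority = Objects.equals(p.getAuthority(), uri.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + uri);
            }
        }
    }

    // Hypothetical helper: derive the file system from the path itself, which
    // is the idea behind resolving the FS per path instead of per cluster.
    static BoundFs forPath(String path) {
        URI p = URI.create(path);
        return new BoundFs(p.getScheme() + "://" + p.getAuthority());
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from a real cluster.
        String segmentPath = "s3://example-bucket/kylin/stream/segment";
        BoundFs defaultFs = new BoundFs("hdfs://namenode:8020");

        try {
            defaultFs.checkPath(segmentPath); // default FS + S3 path
        } catch (IllegalArgumentException e) {
            System.out.println("default FS rejected the path: " + e.getMessage());
        }

        forPath(segmentPath).checkPath(segmentPath); // FS resolved from the path
        System.out.println("path-derived FS accepted the path");
    }
}
```

If this model matches what happens in the MR job, resolving the file system 
from the segment path should let the same code work for both HDFS and S3 
working dirs.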



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
