[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

2018-09-11 Thread Iñigo Martinez (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610423#comment-16610423
 ] 

Iñigo Martinez commented on KYLIN-3555:
---

This problem is NOT present in 2.4.0, or at least it does not raise an exception there.
{code:java}
2018-09-11 12:39:06,197 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://-emr-kylin
2018-09-11 12:39:06,248 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
2018-09-11 12:39:06,390 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile not exists.
2018-09-11 12:39:06,500 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://-emr-kylin
2018-09-11 12:39:06,505 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
2018-09-11 12:39:06,552 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile not exists.
2018-09-11 12:39:06,652 INFO [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] execution.ExecutableManager:411 : job id:f8416975-eea6-4500-9cb7-4374f28451dc-15 from RUNNING to SUCCEED
2018-09-11 12:39:06,695 INFO [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] execution.ExecutableManager:411 : job id:f8416975-eea6-4500-9cb7-4374f28451dc from RUNNING to SUCCEED
2018-09-11 12:39:06,696 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] execution.AbstractExecutable:310 : no need to send email, user list is empty
{code}

> Garbage collection on HBase step fails with S3 selected as storage
> --
>
> Key: KYLIN-3555
> URL: https://issues.apache.org/jira/browse/KYLIN-3555
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.4.1
> Reporter: Iñigo Martinez
> Priority: Major
> Labels: build
> Attachments: Screenshot from 2018-09-11 12-31-25.png
>
>
> When building a cube with S3 selected as storage, the build process fails at
> the last step.
> Although S3 has been defined as storage, the cleanup task tries to delete from
> HDFS and, of course, there is no such file on HDFS.
>  
> {code:java}
> 2018-09-11 12:27:56,311 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXX-emr-kylin
> 2018-09-11 12:27:57,364 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns is dropped.
> 2018-09-11 12:27:58,104 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile is dropped.
> 2018-09-11 12:27:58,140 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020
> 2018-09-11 12:27:58,142 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
> 2018-09-11 12:27:58,147 ERROR [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:68 : job:f8416975-eea6-4500-9cb7-4374f28451dc-15 execute finished with exception
> java.io.FileNotFoundException: File /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
> at

[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

2018-09-11 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611665#comment-16611665
 ] 

Shaofeng SHI commented on KYLIN-3555:
-

How did you configure the filesystem in kylin.properties? In particular, the
following parameters:

kylin.storage.hbase.cluster-fs=

kylin.env.hdfs-working-dir=

> Garbage collection on HBase step fails with S3 selected as storage
> --
>
> Key: KYLIN-3555
> URL: https://issues.apache.org/jira/browse/KYLIN-3555
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.4.1
> Reporter: Iñigo Martinez
> Priority: Major
> Labels: build
> Attachments: Screenshot from 2018-09-11 12-31-25.png
>
>
> When building a cube with S3 selected as storage, the build process fails at
> the last step.
> Although S3 has been defined as storage, the cleanup task tries to delete from
> HDFS and, of course, there is no such file on HDFS.
>  
> {code:java}
> 2018-09-11 12:27:56,311 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXX-emr-kylin
> 2018-09-11 12:27:57,364 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns is dropped.
> 2018-09-11 12:27:58,104 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile is dropped.
> 2018-09-11 12:27:58,140 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020
> 2018-09-11 12:27:58,142 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
> 2018-09-11 12:27:58,147 ERROR [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:68 : job:f8416975-eea6-4500-9cb7-4374f28451dc-15 execute finished with exception
> java.io.FileNotFoundException: File /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
> at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
> at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
> at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.dropHdfsPathOnCluster(HDFSPathGarbageCollectionStep.java:95)
> at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.doWork(HDFSPathGarbageCollectionStep.java:65)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

2018-09-12 Thread Iñigo Martinez (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612285#comment-16612285
 ] 

Iñigo Martinez commented on KYLIN-3555:
---

Hi Shaofeng.

This is our config.

kylin.env.hdfs-working-dir=s3://XXX-emr-kylin/kylin/
kylin.storage.hbase.cluster-fs=s3://XXX-emr-kylin/hbase/

In 2.4.0 the config is exactly the same and this problem is not present.

We have compared 2.4.1 with 2.4.0, and it seems that some changes have been
made to the garbage collection method.

https://github.com/apache/kylin/commit/3177d79ca5cd8533164319acda8676684a6d307e#diff-784d6aaca261296ea18c7dd2de78
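
The stack trace in the issue shows the exception coming from a listStatus call on the parent directory, right after the child path was reported as "not exists". The following is only a minimal sketch of guarding such a listing, not the Kylin code: it uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, the class and method names are hypothetical, and the assumption that the step lists the parent in order to remove it when empty is ours.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical stand-in for a cleanup step; java.nio.file replaces the Hadoop FileSystem API.
class GuardedCleanup {

    /**
     * Deletes the given path if it exists (a file or an empty directory), then
     * removes the parent directory only if it exists and is empty.
     * Returns true if the path itself was deleted.
     */
    static boolean dropPath(Path path) throws IOException {
        boolean dropped = Files.deleteIfExists(path);

        Path parent = path.getParent();
        // Guard: never list a parent that is missing on this filesystem.
        if (parent != null && Files.isDirectory(parent)) {
            boolean empty;
            try (Stream<Path> children = Files.list(parent)) {
                empty = !children.findAny().isPresent();
            }
            if (empty) {
                Files.delete(parent);
            }
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("gc-sketch");
        // Neither "cube" nor "cube/hfile" exists here, as on the HDFS side in the log.
        Path missing = root.resolve("cube").resolve("hfile");
        System.out.println("dropped: " + dropPath(missing));
    }
}
```

With the guard in place, a path that is absent on one filesystem is skipped instead of failing the whole step.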

 



[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

2018-09-13 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614204#comment-16614204
 ] 

Shaofeng SHI commented on KYLIN-3555:
-

The "kylin.storage.hbase.cluster-fs" property needs to be a file system URI, not an
absolute path. In this case, since you use S3 for both,
"kylin.storage.hbase.cluster-fs" can be left empty.
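
To illustrate the difference between a file system URI and a full path, here is a minimal sketch using only java.net.URI. The class and method names are hypothetical, and this is not how Kylin validates the property; it only shows one way to read "file system URI" as scheme plus authority with no path component.

```java
import java.net.URI;

// Hypothetical helper, not part of Kylin.
class ClusterFsCheck {

    /** True if the value has a scheme and an authority and no path beyond "/". */
    static boolean isFileSystemRoot(String value) {
        URI uri = URI.create(value.trim());
        String path = uri.getPath();
        boolean noPath = path == null || path.isEmpty() || path.equals("/");
        return uri.getScheme() != null && uri.getAuthority() != null && noPath;
    }

    /** Drops the path component and keeps only scheme://authority. */
    static String toFileSystemRoot(String value) {
        URI uri = URI.create(value.trim());
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        String configured = "s3://XXX-emr-kylin/hbase/";
        System.out.println(isFileSystemRoot(configured));  // false: carries the path /hbase/
        System.out.println(toFileSystemRoot(configured));  // s3://XXX-emr-kylin
    }
}
```

Under this reading, the value "s3://XXX-emr-kylin/hbase/" from the reported config is a path on the file system rather than the file system itself.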



[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

2018-09-14 Thread Iñigo Martinez (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614536#comment-16614536
 ] 

Iñigo Martinez commented on KYLIN-3555:
---

Hi Shaofeng.

According to the installation documentation for S3 / EMR, we should use an absolute URI.

[http://kylin.apache.org/docs23/install/kylin_aws_emr.html]

With a relative path, it falls back to HDFS. This causes trouble for us because
we use a different cluster for Hive jobs, and the only way to share files between
the Hive and Kylin build clusters is through S3 storage.
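
The fallback to HDFS follows from how a path without a scheme is resolved against the cluster's default file system. The sketch below illustrates this with java.net.URI as a stand-in for Hadoop's path qualification; the class and method names are hypothetical, and the default file system address is simply the one that appears in the log above.

```java
import java.net.URI;

// Hypothetical illustration; java.net.URI stands in for Hadoop's path qualification.
class DefaultFsResolution {

    /** Resolves a configured working dir against the default file system. */
    static URI qualify(URI defaultFs, String configuredPath) {
        // An absolute URI (with its own scheme) is returned unchanged;
        // a scheme-less path inherits scheme and authority from defaultFs.
        return defaultFs.resolve(configuredPath);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020/");

        // Scheme-less path: ends up on HDFS.
        System.out.println(qualify(defaultFs, "/kylin/"));
        // Fully qualified URI: stays on S3.
        System.out.println(qualify(defaultFs, "s3://XXX-emr-kylin/kylin/"));
    }
}
```

This is why a working directory that must be shared across clusters needs the full s3:// URI rather than a bare path.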



[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

2018-11-01 Thread Gaurav Rawat (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672507#comment-16672507
 ] 

Gaurav Rawat commented on KYLIN-3555:
-

I faced the same problem with similar settings. I am leaving
"kylin.storage.hbase.cluster-fs" empty for now, and it appears to be working: I no
longer see the error in the last step.
