[jira] [Updated] (KYLIN-3035) How to use Kylin on EMR with S3 as hbase storage

2017-11-12 Thread Shawn Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Wang updated KYLIN-3035:
--
Description: 
Can somebody give an example of how to use Kylin on EMR with S3 as HBase 
storage, in a way that supports reusing a previously built cube on a new EMR 
cluster after the original cluster has been terminated?

My purpose is simple:
1. use transient EMR cluster to build cubes
2. use a persistent cluster to handle query requests

Of course, the clusters should share the same HBase storage, so I set up the 
cluster to use S3 as HBase storage. After 2.2.0 fixed the "HFile not written 
to S3" issue, I built a sample cube successfully, using these configurations:

EMR:
{noformat}
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "s3://kylin-emrfs/hbase-production"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  },
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.metadata.tableName": "KylinEmrFSMetadata"
    }
  }
]
{noformat}

kylin.properties:
{noformat}
kylin.env.hdfs-working-dir=s3://kylin/kylin-emrfs/kylin-working-dir
kylin.server.mode=all
{noformat}

Then I created a new cluster with the same EMR configuration and Kylin in 
query mode, but Kylin fails to start, with these errors:
{noformat}
2017-11-13 07:33:44,415 INFO  
[main-SendThread(ip-172-31-1-10.cn-north-1.compute.internal:2181)] 
zookeeper.ClientCnxn:876 : Socket connection established to 
ip-172-31-1-10.cn-north-1.compute.internal/172.31.1.10:2181, initiating session
2017-11-13 07:33:44,422 INFO  
[main-SendThread(ip-172-31-1-10.cn-north-1.compute.internal:2181)] 
zookeeper.ClientCnxn:1299 : Session establishment complete on server 
ip-172-31-1-10.cn-north-1.compute.internal/172.31.1.10:2181, sessionid = 
0x15fb4173c100156, negotiated timeout = 4
2017-11-13 07:33:48,380 DEBUG [main] hbase.HBaseConnection:279 : HTable 
'kylin_metadata' already exists
Exception in thread "main" java.lang.IllegalArgumentException: Failed to find 
metadata store by url: kylin_metadata@hbase
at 
org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:89)
at 
org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:101)
at 
org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:94)
at 
org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:41)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed 
after attempts=1, exceptions:
Mon Nov 13 07:33:48 UTC 2017, RpcRetryingCaller{globalStartTime=1510558428667, 
pause=100, retries=1}, java.net.ConnectException: Connection refused

at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:864)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:830)
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.internalGetFromHTable(HBaseResourceStore.java:385)
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.getFromHTable(HBaseResourceStore.java:363)
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.existsImpl(HBaseResourceStore.java:116)
at 
org.apache.kylin.common.persistence.ResourceStore.exists(ResourceStore.java:144)
at 
org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:84)
... 3 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupConnection(RpcClientImpl.java:416)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:722)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:909)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1244)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractR

[jira] [Updated] (KYLIN-3035) How to use Kylin on EMR with S3 as hbase storage

2017-11-13 Thread Shawn Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Wang updated KYLIN-3035:
--
Description: 
Can somebody give an example of how to use Kylin on EMR with S3 as HBase 
storage, in a way that supports reusing a previously built cube on a new EMR 
cluster after the original cluster has been terminated?

My purpose is simple:
1. use transient EMR cluster to build cubes
2. use a persistent cluster to handle query requests

Of course, the clusters should share the same HBase storage, so I set up the 
cluster to use S3 as HBase storage. After 2.2.0 fixed the "HFile not written 
to S3" issue, I built a sample cube successfully, using these configurations:

EMR:
{noformat}
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "s3://kylin-emrfs/hbase-production"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  },
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.metadata.tableName": "KylinEmrFSMetadata"
    }
  }
]
{noformat}

kylin.properties:
{noformat}
kylin.env.hdfs-working-dir=s3://kylin-emrfs/kylin-working-dir
kylin.server.mode=all
{noformat}
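
For comparison, the kylin.properties on the second, query-only cluster would presumably differ only in the server mode. This is a sketch, assuming the working directory is kept identical to the one above so that both clusters point at the same S3 location; it is not a verified working configuration:
{noformat}
# Assumed: same S3 working directory as on the build cluster
kylin.env.hdfs-working-dir=s3://kylin-emrfs/kylin-working-dir
# Query-only node; does not run the job engine
kylin.server.mode=query
{noformat}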

Then I created a new cluster with the same EMR configuration and Kylin in 
query mode, but Kylin fails to start, with these errors:
{noformat}
2017-11-13 07:33:44,415 INFO  
[main-SendThread(ip-172-31-1-10.cn-north-1.compute.internal:2181)] 
zookeeper.ClientCnxn:876 : Socket connection established to 
ip-172-31-1-10.cn-north-1.compute.internal/172.31.1.10:2181, initiating session
2017-11-13 07:33:44,422 INFO  
[main-SendThread(ip-172-31-1-10.cn-north-1.compute.internal:2181)] 
zookeeper.ClientCnxn:1299 : Session establishment complete on server 
ip-172-31-1-10.cn-north-1.compute.internal/172.31.1.10:2181, sessionid = 
0x15fb4173c100156, negotiated timeout = 4
2017-11-13 07:33:48,380 DEBUG [main] hbase.HBaseConnection:279 : HTable 
'kylin_metadata' already exists
Exception in thread "main" java.lang.IllegalArgumentException: Failed to find 
metadata store by url: kylin_metadata@hbase
at 
org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:89)
at 
org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:101)
at 
org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:94)
at 
org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:41)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed 
after attempts=1, exceptions:
Mon Nov 13 07:33:48 UTC 2017, RpcRetryingCaller{globalStartTime=1510558428667, 
pause=100, retries=1}, java.net.ConnectException: Connection refused

at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:864)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:830)
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.internalGetFromHTable(HBaseResourceStore.java:385)
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.getFromHTable(HBaseResourceStore.java:363)
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.existsImpl(HBaseResourceStore.java:116)
at 
org.apache.kylin.common.persistence.ResourceStore.exists(ResourceStore.java:144)
at 
org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:84)
... 3 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupConnection(RpcClientImpl.java:416)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:722)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:909)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1244)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClie