How to config the size of one Region?

2016-02-18 Thread 吴国泉wgq
Hi all,
   I wonder: if I increase the size of a region, will it hurt the latency of
read operations?
   The official guide says the default region size is 10 GB.
   I configured 100 GB per region on my cluster,
   because my RegionServer hardware is:
 RAM:  128 GB
 CPU:  24 cores
 Disk: 3 TB * 10 (30 TB total)
Each RegionServer has 30 TB of disk, so with the default configuration
(10 GB per region) that would mean too many regions per server
(about 30 TB / 3 (HDFS replication) / 10 GB = 1000).

But when I retrieve from a 100 GB store file, it takes longer to return the
result (about 50 ms, sometimes 100 ms+) than a 200 MB store file does
(below 10 ms).

I would like to know whether the size of the region is what causes the
difference.

If 100 GB is too big for a region, what size would be suitable for my
region servers?
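
For reference, the split threshold behind the 10 GB default is
hbase.hregion.max.filesize; it can be raised cluster-wide in hbase-site.xml or
per table through the client API. A minimal sketch of the per-table route,
assuming the HBase 1.x Java client; the table name and the 30 GB value are
illustrative only:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SetRegionSize {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("usertable");      // illustrative table name
      HTableDescriptor desc = admin.getTableDescriptor(table);
      // A region of this table splits once a store grows past this size (30 GB here).
      desc.setMaxFileSize(30L * 1024 * 1024 * 1024);
      admin.modifyTable(table, desc);
    }
  }
}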



吴国泉   wgq.wu
Post: DBA, HBase
Email: wgq...@qunar.com
Tel: 13051697997
Adr: 17F, China Electronics Building





Re: Store Large files on HBase/HDFS

2016-02-18 Thread Jameson Li
Maybe you can parse the HDFS image file, transform the entries into HFiles,
and bulk load them into HBase tables.
-- remember to pre-partition the HBase table.
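
For the simpler pattern asked about in the quoted question below (document
bytes written to HDFS, a metadata row in HBase pointing at them), here is a
minimal Java sketch; the table, column family, qualifiers, and paths are
illustrative assumptions, not a recommended schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DocumentStore {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);

    // 1) Write the large document to HDFS.
    String docId = "doc-0001";                                  // illustrative id
    Path docPath = new Path("/data/docs/" + docId + ".pdf");    // illustrative path
    try (FSDataOutputStream out = fs.create(docPath)) {
      out.write(Bytes.toBytes("...document bytes..."));         // placeholder content
    }

    // 2) Store metadata plus the HDFS location in HBase.
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("doc_meta"))) {
      Put put = new Put(Bytes.toBytes(docId));
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("hdfs_path"),
          Bytes.toBytes(docPath.toString()));
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("size"),
          Bytes.toBytes(fs.getFileStatus(docPath).getLen()));
      table.put(put);

      // 3) To GET: read the metadata row, then open the HDFS file it points to.
      Result r = table.get(new Get(Bytes.toBytes(docId)));
      Path stored = new Path(Bytes.toString(
          r.getValue(Bytes.toBytes("m"), Bytes.toBytes("hdfs_path"))));
      fs.open(stored).close();                                  // stream the document here
    }
  }
}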

2016-02-18 7:40 GMT+08:00 Arun Patel :

> I would like to store large documents (over 100 MB) on HDFS and insert
> metadata in HBase.
>
> 1) Users will use the HBase REST API for PUT and GET requests to store and
> retrieve documents. In this case, how do I PUT and GET documents to/from
> HDFS? What are the recommended ways of storing and accessing documents
> on HDFS that provide optimum performance?
>
> Can you please share any sample code?  or a Github project?
>
> 2)  What are the performance issues I need to know?
>
> Regards,
> Arun
>



-- 


Thanks & Regards,
李剑 Jameson Li
Focus on Hadoop, MySQL


Re: Error : starting spark-shell with phoenix client jar

2016-02-18 Thread Ted Yu
If you cannot wait for the next HDP release with the fix for PHOENIX-2608,
please consider rebuilding Phoenix with the patch from PHOENIX-2608 applied.

Cheers

On Thu, Feb 18, 2016 at 6:19 PM, Divya Gehlot 
wrote:

> Thanks, Ted, for getting me to the root cause of the issue.
> I am using the Hortonworks distribution HDP 2.3.4. How can I upgrade?
> Could you provide me the steps?
>
>
>
>
>
> On 18 February 2016 at 20:39, Ted Yu  wrote:
>
>> This was likely caused by a version conflict of the Jackson dependencies
>> between Spark and Phoenix:
>> Phoenix uses 1.8.8 while Spark uses 1.9.13.
>>
>> One solution is to upgrade the Jackson version in Phoenix.
>> See PHOENIX-2608.
>>
>>
>> On Thu, Feb 18, 2016 at 12:31 AM, Divya Gehlot 
>> wrote:
>>
>> > Hi,
>> > I am getting following error while starting spark shell with phoenix
>> > clients
>> > spark-shell  --jars
>> > /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
>> > --driver-class-path
>> > /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
>> > --master yarn-client
>> >
>> > StackTrace :
>> >
>> > >  INFO TimelineClientImpl: Timeline service address:
>> > >
>> >
>> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8188/ws/v1/timeline/
>> > > java.lang.NoSuchMethodError:
>> > >
>> >
>> org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
>> > > at
>> > >
>> >
>> org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)
>> > > at
>> > >
>> >
>> org.apache.hadoop.yarn.util.timeline.TimelineUtils.(TimelineUtils.java:50)
>> > > at
>> > >
>> >
>> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:172)
>> > > at
>> > >
>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> > > at
>> > >
>> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:108)
>> > > at
>> > >
>> >
>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
>> > > at
>> > >
>> >
>> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
>> > > at
>> org.apache.spark.SparkContext.(SparkContext.scala:523)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
>> > > at $iwC$$iwC.(:9)
>> > > at $iwC.(:18)
>> > > at (:20)
>> > > at .(:24)
>> > > at .()
>> > > at .(:7)
>> > > at .()
>> > > at $print()
>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > at
>> > >
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > > at
>> > >
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > > at java.lang.reflect.Method.invoke(Method.java:606)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>> > > at
>> > > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>> > > at
>> > org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>> > > at
>> > org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>> > > at
>> > >
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>> > > at
>> org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
>> > > at
>> > > org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
>> > > at
>> > > org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
>> > > at
>> > org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
>> > > at
>> > >
>> >
>> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
>> > > at
>> > >
>> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
>> > >

Re: Error : starting spark-shell with phoenix client jar

2016-02-18 Thread Divya Gehlot
Thanks, Ted, for getting me to the root cause of the issue.
I am using the Hortonworks distribution HDP 2.3.4. How can I upgrade?
Could you provide me the steps?





On 18 February 2016 at 20:39, Ted Yu  wrote:

> This was likely caused by a version conflict of the Jackson dependencies
> between Spark and Phoenix:
> Phoenix uses 1.8.8 while Spark uses 1.9.13.
>
> One solution is to upgrade the Jackson version in Phoenix.
> See PHOENIX-2608.
>
>
> On Thu, Feb 18, 2016 at 12:31 AM, Divya Gehlot 
> wrote:
>
> > Hi,
> > I am getting following error while starting spark shell with phoenix
> > clients
> > spark-shell  --jars
> > /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
> > --driver-class-path
> > /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
> > --master yarn-client
> >
> > StackTrace :
> >
> > >  INFO TimelineClientImpl: Timeline service address:
> > >
> >
> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8188/ws/v1/timeline/
> > > java.lang.NoSuchMethodError:
> > >
> >
> org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
> > > at
> > >
> >
> org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)
> > > at
> > >
> >
> org.apache.hadoop.yarn.util.timeline.TimelineUtils.(TimelineUtils.java:50)
> > > at
> > >
> >
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:172)
> > > at
> > >
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > > at
> > > org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:108)
> > > at
> > >
> >
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
> > > at
> > >
> >
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
> > > at org.apache.spark.SparkContext.(SparkContext.scala:523)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
> > > at $iwC$$iwC.(:9)
> > > at $iwC.(:18)
> > > at (:20)
> > > at .(:24)
> > > at .()
> > > at .(:7)
> > > at .()
> > > at $print()
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > at
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:606)
> > > at
> > >
> >
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> > > at
> > >
> >
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
> > > at
> > > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> > > at
> > org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> > > at
> > org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> > > at
> > >
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> > > at
> org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
> > > at
> > > org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
> > > at
> > > org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
> > > at
> > org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
> > > at
> > >
> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> > > at
> > >
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$

Re: HBase Cluster not responding after shutting down one of slave nodes

2016-02-18 Thread Samir Ahmic
Thanks for explaining, Hironori.
By shutting down the DataNode process on host7516, we also have to add Hadoop
recovery time to the total recovery time.
Here is a great blog post explaining how the Hadoop recovery process works:

http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/

Regards
Samir

On Thu, Feb 18, 2016 at 11:04 AM, おぎばやしひろのり  wrote:

> Samir,
>
> Thank you for your advice.
> The actual operation I performed to shut down host7516 was just clicking the
> "stop" button on our VM console. I don't know the internals, but I can see
> some process-terminating messages in /var/log/messages, so it looks
> like a kind of graceful shutdown.
> About reproducing: yes, I can, but not every time, maybe once in 3 or 4
> attempts. I am not sure what makes the difference.
>
> I also checked the regionserver logs. There were warnings related to
> DataNode access; this is because there was a DataNode on the shut-down
> host7516, too.
> Regarding log splitting, I couldn't find any errors. Both hosts finished
> the task within a few seconds.
>
> --- host7517
> 2016-02-17 15:39:59,002 INFO  [LruBlockCacheStatsExecutor]
> hfile.LruBlockCache: totalSize=420.92 KB, freeSize=399.19 MB,
> max=399.60 MB, blockCount=0, accesses=24227, hits=0, hitRatio=0,
> cachingAccesses=0, cachingHits=0, cachingHitsRatio=0
> ,evictions=7289, evicted=0, evictedPerRun=0.0
> 2016-02-17 15:39:59,869 INFO  [HOST7517:16020Replication Statistics
> #0] regionserver.Replication: Normal source for cluster 1: Total
> replicated edits: 0, currently replicating from:
>
> hdfs://hdpts/apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451
> at position: 96609590
>
> 2016-02-17 15:40:50,953 WARN  [ResponseProcessor for block
> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125]
> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125
> java.io.IOException: Bad response ERROR for block
> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 from
> datanode DatanodeInfoWithStorage[192.168.184.73:50010
> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:786)
> 2016-02-17 15:40:50,954 WARN  [DataStreamer for file
>
> /apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451
> block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125]
> hdfs.DFSClient: Error Recovery for block
> BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 in
> pipeline DatanodeInfoWithStorage[192.168.185.249:50010
> ,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK],
> DatanodeInfoWithStorage[192.168.184.85:50010
> ,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK],
> DatanodeInfoWithStorage[192.168.184.73:50010
> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]:
> bad datanode DatanodeInfoWithStorage[192.168.184.73:50010
> ,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
> ...
> 2016-02-17 15:56:55,633 INFO  [SplitLogWorker-HOST7517:16020]
> coordination.ZkSplitLogWorkerCoordination: worker
> host7517.mydomain,16020,1455618296942 acquired task
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230
> 2016-02-17 15:56:55,677 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
> wal.WALSplitter: Splitting wal:
>
> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230,
> length=83
> 2016-02-17 15:56:55,677 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
> wal.WALSplitter: DistributedLogReplay = false
> 2016-02-17 15:56:55,685 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
> util.FSHDFSUtils: Recovering lease on dfs file
>
> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
> 2016-02-17 15:56:55,696 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
> util.FSHDFSUtils: recoverLease=false, attempt=0 on
>
> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
> after 11ms
> 2016-02-17 15:56:59,698 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
> util.FSHDFSUtils: recoverLease=true, attempt=1 on
>
> file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020
> 2016-02-17 15:56:59,771 INFO
> [RS_LOG_REPLAY_OPS-HOST7517:16020-0-Writer-1] wal.WALSplitter:
> Creating writer
>
> path=hdfs://hdpts/apps/hbase/data/data/default/usertable/64199c31957c01b5bd9ee50b02e1f7fd/recovered.edits/0542072.temp
> region=64199c31957c01b5bd9ee50b02e1f7fd
> ...(similar lines)

Re: Error : starting spark-shell with phoenix client jar

2016-02-18 Thread Ted Yu
This was likely caused by a version conflict of the Jackson dependencies between
Spark and Phoenix:
Phoenix uses 1.8.8 while Spark uses 1.9.13.

One solution is to upgrade the Jackson version in Phoenix.
See PHOENIX-2608.


On Thu, Feb 18, 2016 at 12:31 AM, Divya Gehlot 
wrote:

> Hi,
> I am getting following error while starting spark shell with phoenix
> clients
> spark-shell  --jars
> /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
> --driver-class-path
> /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
> --master yarn-client
>
> StackTrace :
>
> >  INFO TimelineClientImpl: Timeline service address:
> >
> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8188/ws/v1/timeline/
> > java.lang.NoSuchMethodError:
> >
> org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
> > at
> >
> org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)
> > at
> >
> org.apache.hadoop.yarn.util.timeline.TimelineUtils.(TimelineUtils.java:50)
> > at
> >
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:172)
> > at
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> > org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:108)
> > at
> >
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
> > at
> >
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
> > at org.apache.spark.SparkContext.(SparkContext.scala:523)
> > at
> >
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
> > at $iwC$$iwC.(:9)
> > at $iwC.(:18)
> > at (:20)
> > at .(:24)
> > at .()
> > at .(:7)
> > at .()
> > at $print()
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> >
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> > at
> >
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
> > at
> > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> > at
> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> > at
> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> > at
> > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> > at
> >
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> > at
> >
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
> > at
> >
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
> > at
> > org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
> > at
> >
> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
> > at
> > org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
> > at
> >
> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
> > at
> org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
> > at
> >
> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
> > at
> > org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> > at
> >
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> > at org.apache.spark.repl.SparkILoop.org
> > $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> > at
> org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> > at org.apache.spark.repl.Main$.main(Main.scala:31)
> > at org.apache.spark.repl.Main.main(Main.scala)
> >

Re: disable major compaction per table

2016-02-18 Thread Ted Yu
Please see the following for a description of data block encoding:
http://hbase.apache.org/book.html#compression

If you have further questions on data block encoding, please start another
thread.
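
For context, data block encoding (which deduplicates the repeated key parts
asked about below) and block compression such as Snappy are both
per-column-family settings. A hedged sketch of enabling them through the Java
admin API, assuming HBase 1.x; the table and family names are illustrative:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class EnableEncoding {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("t1");          // illustrative table
      HColumnDescriptor cf = new HColumnDescriptor("d");  // illustrative, existing family
      // FAST_DIFF encodes repeated parts of adjacent keys; Snappy compresses whole blocks.
      cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
      cf.setCompressionType(Compression.Algorithm.SNAPPY);
      admin.modifyColumn(table, cf);                      // modifies the existing family "d"
    }
  }
}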

On Thu, Feb 18, 2016 at 3:16 AM, Shushant Arora 
wrote:

> Thanks!
>
> Does HBase compress repeated values in keys and columns? Say column :location
> has the value (ASIA): will that be repeated with each key, or will HBase's
> Snappy compression handle it?
>
> Does the same apply to repeated values of a column?
>
> Thanks!
>
> On Wed, Feb 17, 2016 at 7:14 AM, Ted Yu  wrote:
>
> > bq. hbase.hregion.majorcompaction = 0 per table/column family
> >
> > I searched code base but didn't find relevant test case for the above.
> > Mind giving me some pointer ?
> >
> > Thanks
> >
> > On Tue, Feb 16, 2016 at 5:38 PM, Vladimir Rodionov <
> vladrodio...@gmail.com
> > >
> > wrote:
> >
> > > 1.does major compaction in hbase runs per table basis.
> > >
> > > Per Region
> > >
> > > 2.By default every 24 hours?
> > >
> > > In older versions - yes. Current  (1.x+) - 7 days
> > >
> > > 3.Can I disable automatic major compaction for few tables while keep it
> > > enable for rest of tables?
> > >
> > > yes, you can. You can set
> > >
> > > hbase.hregion.majorcompaction = 0 per table/column family
> > >
> > > 4.Does hbase put ,get and delete are blocked while major compaction and
> > are
> > > working in minor compaction?
> > >
> > > No, they are not.
> > >
> > > -Vlad
> > >
> > > On Tue, Feb 16, 2016 at 4:51 PM, Ted Yu  wrote:
> > >
> > > > For #2, see http://hbase.apache.org/book.html#managed.compactions
> > > >
> > > > For #3, I don't think so.
> > > >
> > > > On Tue, Feb 16, 2016 at 4:46 PM, Shushant Arora <
> > > shushantaror...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > 1.does major compaction in hbase runs per table basis.
> > > > > 2.By default every 24 hours?
> > > > > 3.Can I disable automatic major compaction for few tables while
> keep
> > it
> > > > > enable for rest of tables?
> > > > >
> > > > > 4.Does hbase put ,get and delete are blocked while major compaction
> > and
> > > > are
> > > > > working in minor compaction?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>
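
For reference, the per-table/column-family override quoted above
(hbase.hregion.majorcompaction = 0) can be applied through the table
descriptor; a minimal sketch assuming the HBase 1.x Java admin API, with an
illustrative table name (the same key can also be set on an HColumnDescriptor
for a single family):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DisableMajorCompaction {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("t1");               // illustrative table name
      HTableDescriptor desc = admin.getTableDescriptor(table);
      // 0 disables time-based major compactions for this table only;
      // manually triggered major compactions still work.
      desc.setConfiguration("hbase.hregion.majorcompaction", "0");
      admin.modifyTable(table, desc);
    }
  }
}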


Re: disable major compaction per table

2016-02-18 Thread Shushant Arora
Thanks!

Does HBase compress repeated values in keys and columns? Say column :location
has the value (ASIA): will that be repeated with each key, or will HBase's
Snappy compression handle it?

Does the same apply to repeated values of a column?

Thanks!

On Wed, Feb 17, 2016 at 7:14 AM, Ted Yu  wrote:

> bq. hbase.hregion.majorcompaction = 0 per table/column family
>
> I searched code base but didn't find relevant test case for the above.
> Mind giving me some pointer ?
>
> Thanks
>
> On Tue, Feb 16, 2016 at 5:38 PM, Vladimir Rodionov  >
> wrote:
>
> > 1.does major compaction in hbase runs per table basis.
> >
> > Per Region
> >
> > 2.By default every 24 hours?
> >
> > In older versions - yes. Current  (1.x+) - 7 days
> >
> > 3.Can I disable automatic major compaction for few tables while keep it
> > enable for rest of tables?
> >
> > yes, you can. You can set
> >
> > hbase.hregion.majorcompaction = 0 per table/column family
> >
> > 4.Does hbase put ,get and delete are blocked while major compaction and
> are
> > working in minor compaction?
> >
> > No, they are not.
> >
> > -Vlad
> >
> > On Tue, Feb 16, 2016 at 4:51 PM, Ted Yu  wrote:
> >
> > > For #2, see http://hbase.apache.org/book.html#managed.compactions
> > >
> > > For #3, I don't think so.
> > >
> > > On Tue, Feb 16, 2016 at 4:46 PM, Shushant Arora <
> > shushantaror...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > 1.does major compaction in hbase runs per table basis.
> > > > 2.By default every 24 hours?
> > > > 3.Can I disable automatic major compaction for few tables while keep
> it
> > > > enable for rest of tables?
> > > >
> > > > 4.Does hbase put ,get and delete are blocked while major compaction
> and
> > > are
> > > > working in minor compaction?
> > > >
> > > > Thanks
> > > >
> > >
> >
>


Re: HBase Cluster not responding after shutting down one of slave nodes

2016-02-18 Thread おぎばやしひろのり
Samir,

Thank you for your advice.
The actual operation I performed to shut down host7516 was just clicking the
"stop" button on our VM console. I don't know the internals, but I can see
some process-terminating messages in /var/log/messages, so it looks
like a kind of graceful shutdown.
About reproducing: yes, I can, but not every time, maybe once in 3 or 4
attempts. I am not sure what makes the difference.

I also checked the regionserver logs. There were warnings related to
DataNode access; this is because there was a DataNode on the shut-down
host7516, too.
Regarding log splitting, I couldn't find any errors. Both hosts finished
the task within a few seconds.

--- host7517
2016-02-17 15:39:59,002 INFO  [LruBlockCacheStatsExecutor]
hfile.LruBlockCache: totalSize=420.92 KB, freeSize=399.19 MB,
max=399.60 MB, blockCount=0, accesses=24227, hits=0, hitRatio=0,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=0
,evictions=7289, evicted=0, evictedPerRun=0.0
2016-02-17 15:39:59,869 INFO  [HOST7517:16020Replication Statistics
#0] regionserver.Replication: Normal source for cluster 1: Total
replicated edits: 0, currently replicating from:
hdfs://hdpts/apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451
at position: 96609590

2016-02-17 15:40:50,953 WARN  [ResponseProcessor for block
BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125]
hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125
java.io.IOException: Bad response ERROR for block
BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 from
datanode 
DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:786)
2016-02-17 15:40:50,954 WARN  [DataStreamer for file
/apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451
block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125]
hdfs.DFSClient: Error Recovery for block
BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 in
pipeline 
DatanodeInfoWithStorage[192.168.185.249:50010,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK],
DatanodeInfoWithStorage[192.168.184.85:50010,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK],
DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]:
bad datanode 
DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
...
2016-02-17 15:56:55,633 INFO  [SplitLogWorker-HOST7517:16020]
coordination.ZkSplitLogWorkerCoordination: worker
host7517.mydomain,16020,1455618296942 acquired task
/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230
2016-02-17 15:56:55,677 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
wal.WALSplitter: Splitting wal:
hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230,
length=83
2016-02-17 15:56:55,677 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
wal.WALSplitter: DistributedLogReplay = false
2016-02-17 15:56:55,685 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
util.FSHDFSUtils: Recovering lease on dfs file
hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
2016-02-17 15:56:55,696 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
util.FSHDFSUtils: recoverLease=false, attempt=0 on
file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
after 11ms
2016-02-17 15:56:59,698 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
util.FSHDFSUtils: recoverLease=true, attempt=1 on
file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020
2016-02-17 15:56:59,771 INFO
[RS_LOG_REPLAY_OPS-HOST7517:16020-0-Writer-1] wal.WALSplitter:
Creating writer
path=hdfs://hdpts/apps/hbase/data/data/default/usertable/64199c31957c01b5bd9ee50b02e1f7fd/recovered.edits/0542072.temp
region=64199c31957c01b5bd9ee50b02e1f7fd
...(similar lines)
2016-02-17 15:57:01,001 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
wal.WALSplitter: Split writers finished
2016-02-17 15:57:01,034 INFO  [split-log-closeStream-3]
wal.WALSplitter: Rename
hdfs://hdpts/apps/hbase/data/data/default/usertable/06ed7277b7b9539a3ba597e0041acb12/recovered.edits/0054100.temp
to 
hdfs://hdpts/apps/hbase/data/data/default/usertable/06ed7277b7b9539a3ba597e0041acb12/recovered.edits/0055498
...(similar lines)
2016-02-17 15:57:01,332 INFO  [RS_LOG_REPLAY_OPS-HOST7517:16020-0]
wal.WALSplitter: Processed 73326 edits across 53 regions; edits
skipped=0; log 
file=hdfs://h

Filters & getNextCellHint details

2016-02-18 Thread Laurent Senta
Hi
I'm trying to use the key hints in a filter.

What I want to do:
- Retrieve the content of a given family A if, and only if, there is some
data in another column family B.

There are some details about filter.getNextCellHint I'm not sure of:
- Are the families and qualifiers processed by the filter in lexicographic
order?
- When the cell hint is for another family, does HBase "jump" to the next
family without processing every qualifier in the current one?
(family B is very large; without batching I'll get random scan timeouts).

My implementation of HasDependentFamilyFilter(familyB):
- processCell(x): if family(x) == familyB => found=true, return
seek_using_hint
- nextKeyHint(x): return cell(row(x), family(x) + 1, ...)
- filterRow(): return found;

Is that correct even for very large familyB?

This is for HBase on CDH5.5, which should be based on HBase 1.0.
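
A minimal sketch of such a filter against the HBase 1.0-era Filter API (the
class name follows the message above; the exact helper methods may differ
between versions, and the serialization a deployed server-side filter needs is
omitted):

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class HasDependentFamilyFilter extends FilterBase {
  private final byte[] dependentFamily;   // family B: keep the row only if it has cells here
  private boolean found = false;

  public HasDependentFamilyFilter(byte[] dependentFamily) {
    this.dependentFamily = dependentFamily;
  }

  @Override
  public void reset() {
    found = false;                        // called at the start of every row
  }

  @Override
  public ReturnCode filterKeyValue(Cell c) {
    if (CellUtil.matchingFamily(c, dependentFamily)) {
      found = true;
      // One cell is enough to know the family exists; ask the scanner to seek past it
      // instead of walking every qualifier of the large family.
      return ReturnCode.SEEK_NEXT_USING_HINT;
    }
    return ReturnCode.INCLUDE;            // keep cells of the other families
  }

  @Override
  public Cell getNextCellHint(Cell c) {
    // Seek to the first cell of the family that sorts immediately after this one
    // (current family plus a 0x00 byte), skipping the rest of family B in this row.
    byte[] nextFamily = Bytes.add(CellUtil.cloneFamily(c), new byte[] { 0x00 });
    return KeyValueUtil.createFirstOnRow(CellUtil.cloneRow(c), nextFamily, new byte[0]);
  }

  @Override
  public boolean hasFilterRow() {
    return true;                          // ensure filterRow() is consulted per row
  }

  @Override
  public boolean filterRow() {
    return !found;                        // true means "exclude this row"
  }
}

Note that filterRow() in this API returns true to exclude a row, which is the
opposite sense of the pseudocode above.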

Thanks for taking the time to take a look,
Bests.


Error : starting spark-shell with phoenix client jar

2016-02-18 Thread Divya Gehlot
Hi,
I am getting the following error while starting the Spark shell with the
Phoenix client:
spark-shell  --jars
/usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
--driver-class-path
/usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar
--master yarn-client

StackTrace :

>  INFO TimelineClientImpl: Timeline service address:
> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8188/ws/v1/timeline/
> java.lang.NoSuchMethodError:
> org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
> at
> org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)
> at
> org.apache.hadoop.yarn.util.timeline.TimelineUtils.(TimelineUtils.java:50)
> at
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:172)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:108)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
> at
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
> at org.apache.spark.SparkContext.(SparkContext.scala:523)
> at
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
> at $iwC$$iwC.(:9)
> at $iwC.(:18)
> at (:20)
> at .(:24)
> at .()
> at .(:7)
> at .()
> at $print()
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> at
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
> at
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> at
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> at
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
> at
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
> at
> org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
> at
> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
> at
> org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
> at
> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
> at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
> at
> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
> at
> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org
> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMai