Re: Surprisingly, RAID0 provides the best IO performance whereas no RAID the worst

2016-08-02 Thread Dejan Menges
Hi Shady,

Great point, I didn't know that. Thanks a lot - I will definitely check whether
this was only related to the HWX distribution.

Thanks a lot, and sorry if I spammed this topic, it wasn't my intention at
all.

Dejan

On Tue, Aug 2, 2016 at 9:37 AM Shady Xu  wrote:

> Hi Dejan,
>
> I checked on GitHub and found that DEFAULT_DATA_SOCKET_SIZE is located in
> the hadoop-hdfs-project/hadoop-hdfs-client/ package in the Apache version
> of Hadoop, whereas it is in hadoop-hdfs-project/hadoop-hdfs/ in the
> Hortonworks one. I am not sure if that means the parameter affects the
> performance of the Hadoop client in Apache HDFS but the performance of the
> DataNode in Hortonworks HDFS. If that's the case, maybe it's a bug
> introduced by Hortonworks?
>
> 2016-08-01 17:47 GMT+08:00 Dejan Menges :
>
>> Hi Shady,
>>
>> We did extensive tests on this and received a fix from Hortonworks, which
>> we are probably the first and only ones to test, most likely tomorrow
>> evening. If the Hortonworks guys are reading this, maybe they know the
>> official HDFS ticket ID for this, if there is one, as I cannot find it in
>> our correspondence. Long story short - a single server had RAID controllers
>> with 1G and 2G cache (both scenarios were tested). It started just as a
>> simple benchmark test using TestDFSIO after trying to narrow down the best
>> configuration on the server side (discussions like this one, JBOD, RAID0,
>> benchmarking, etc.). However, with 10-12 disks in a single server and the
>> mentioned controllers, we got 6-10 times higher write speed when not using
>> replication (meaning replication factor one). It really took months to
>> narrow it down to a single hardcoded value,
>> HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (just looking into the patch). In the
>> end,
>> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
>> basically limited the write speed to this constant when using replication,
>> which is super annoying (especially in a context where more or less
>> everyone is now using network speeds faster than 100Mbps). This can be found
>> in
>> b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
>>
>> On Mon, Aug 1, 2016 at 11:39 AM Shady Xu  wrote:
>>
>>> Thanks Allen. I am aware of the fact you mentioned and am wondering what
>>> the await and svctm are on your cluster nodes. If there is no significant
>>> difference, maybe I should try other ways to tune my HBase.
>>>
>>> And Dejan, I've never heard of or noticed what you described. If that's
>>> true it's really disappointing - please notify us if there's any progress.
>>>
>>> 2016-08-01 15:33 GMT+08:00 Dejan Menges :
>>>
>>>> Sorry for jumping in, but speaking of performance... it took us a while
>>>> to figure out why, whatever disk/RAID0 performance you have, when it
>>>> comes to HDFS and a replication factor bigger than one, disk write speed
>>>> drops to 100Mbps... After long, long tests with Hortonworks, they found
>>>> that the issue is that someone at some point in history hardcoded a value
>>>> somewhere, and whatever setup you have, you are limited to it. Luckily we
>>>> have a quite powerful testing environment and the plan is to test this
>>>> patch later this week. I'm not sure if there's an official HDFS bug for
>>>> this; I checked our internal history but didn't see anything like that.
>>>>
>>>> This was quite disappointing, as whatever tuning, controllers and setups
>>>> you use, it all goes down the drain with this.
>>>>
>>>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer  wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2016-07-30 20:12 (-0700), Shady Xu  wrote:
>>>>> > Thanks Andrew, I know about the disk failure risk and that it's one
>>>>> of the
>>>>> > reasons why we should use JBOD. But JBOD provides worse performance
>>>>> than
>>>>> > RAID 0.
>>>>>
>>>>> It's not about failure: it's about speed.  RAID0 performance will drop
>>>>> like a rock if any one disk in the set is slow. When all the drives are
>>>>> performing at peak, yes, it's definitely faster.  But over time, drive
>>>>> speed will decline (sometimes to half speed or less!) usually prior to a
>>>>> failure. This failure may take a while, so in the mean time your cluster 
>>>>> is
>>>>> getting slower ... and slower ... and slower ...
>>>>>
>>>>> As a result, JBOD will be significantly faster over the _lifetime_ of
>>>>> the disks vs. a comparison made _today_.
>>>>>
>>>>>
>>>>>
>>>
>


Re: Surprisingly, RAID0 provides the best IO performance whereas no RAID the worst

2016-08-01 Thread Dejan Menges
Hi Shady,

We did extensive tests on this and received a fix from Hortonworks, which we
are probably the first and only ones to test, most likely tomorrow evening. If
the Hortonworks guys are reading this, maybe they know the official HDFS
ticket ID for this, if there is one, as I cannot find it in our correspondence.
Long story short - a single server had RAID controllers with 1G and 2G cache
(both scenarios were tested). It started just as a simple benchmark test
using TestDFSIO after trying to narrow down the best configuration on the
server side (discussions like this one, JBOD, RAID0, benchmarking, etc.).
However, with 10-12 disks in a single server and the mentioned controllers, we
got 6-10 times higher write speed when not using replication (meaning
replication factor one). It really took months to narrow it down to a single
hardcoded value, HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (just looking into the
patch). In the end,
tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
basically limited the write speed to this constant when using replication,
which is super annoying (especially in a context where more or less everyone
is now using network speeds faster than 100Mbps). This can be found in
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
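
For anyone hitting the same limit, a hedged sketch of how this could be made
configurable in hdfs-site.xml once a fix lands. The property names below are an
assumption based on upstream work to make the data-transfer socket buffers
configurable, so verify them against the actual patch/release; a value of 0 is
meant to let the OS auto-tune the buffers:

<!-- Assumed property names; verify against your Hadoop release. 0 = let the OS auto-tune the buffer. -->
<property>
  <name>dfs.datanode.transfer.socket.recv.buffer.size</name>
  <value>0</value>
</property>
<property>
  <name>dfs.datanode.transfer.socket.send.buffer.size</name>
  <value>0</value>
</property>
<property>
  <name>dfs.client.socket.send.buffer.size</name>
  <value>0</value>
</property>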

On Mon, Aug 1, 2016 at 11:39 AM Shady Xu  wrote:

> Thanks Allen. I am aware of the fact you mentioned and am wondering what the
> await and svctm are on your cluster nodes. If there is no significant
> difference, maybe I should try other ways to tune my HBase.
>
> And Dejan, I've never heard of or noticed what you described. If that's true
> it's really disappointing - please notify us if there's any progress.
>
> 2016-08-01 15:33 GMT+08:00 Dejan Menges :
>
>> Sorry for jumping in, but speaking of performance... it took us a while to
>> figure out why, whatever disk/RAID0 performance you have, when it comes to
>> HDFS and a replication factor bigger than one, disk write speed drops to
>> 100Mbps... After long, long tests with Hortonworks, they found that the
>> issue is that someone at some point in history hardcoded a value somewhere,
>> and whatever setup you have, you are limited to it. Luckily we have a quite
>> powerful testing environment and the plan is to test this patch later this
>> week. I'm not sure if there's an official HDFS bug for this; I checked our
>> internal history but didn't see anything like that.
>>
>> This was quite disappointing, as whatever tuning, controllers and setups
>> you use, it all goes down the drain with this.
>>
>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer  wrote:
>>
>>>
>>>
>>> On 2016-07-30 20:12 (-0700), Shady Xu  wrote:
>>> > Thanks Andrew, I know about the disk failure risk and that it's one of
>>> the
>>> > reasons why we should use JBOD. But JBOD provides worse performance
>>> than
>>> > RAID 0.
>>>
>>> It's not about failure: it's about speed.  RAID0 performance will drop
>>> like a rock if any one disk in the set is slow. When all the drives are
>>> performing at peak, yes, it's definitely faster.  But over time, drive
>>> speed will decline (sometimes to half speed or less!) usually prior to a
>>> failure. This failure may take a while, so in the mean time your cluster is
>>> getting slower ... and slower ... and slower ...
>>>
>>> As a result, JBOD will be significantly faster over the _lifetime_ of
>>> the disks vs. a comparison made _today_.
>>>
>>>
>>>
>


Re: Surprisingly, RAID0 provides the best IO performance whereas no RAID the worst

2016-08-01 Thread Dejan Menges
Sorry for jumping in, but speaking of performance... it took us a while to
figure out why, whatever disk/RAID0 performance you have, when it comes to
HDFS and a replication factor bigger than one, disk write speed drops to
100Mbps... After long, long tests with Hortonworks, they found that the issue
is that someone at some point in history hardcoded a value somewhere, and
whatever setup you have, you are limited to it. Luckily we have a quite
powerful testing environment and the plan is to test this patch later this
week. I'm not sure if there's an official HDFS bug for this; I checked our
internal history but didn't see anything like that.

This was quite disappointing, as whatever tuning, controllers and setups you
use, it all goes down the drain with this.

On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer  wrote:

>
>
> On 2016-07-30 20:12 (-0700), Shady Xu  wrote:
> > Thanks Andrew, I know about the disk failure risk and that it's one of
> the
> > reasons why we should use JBOD. But JBOD provides worse performance than
> > RAID 0.
>
> It's not about failure: it's about speed.  RAID0 performance will drop
> like a rock if any one disk in the set is slow. When all the drives are
> performing at peak, yes, it's definitely faster.  But over time, drive
> speed will decline (sometimes to half speed or less!) usually prior to a
> failure. This failure may take a while, so in the mean time your cluster is
> getting slower ... and slower ... and slower ...
>
> As a result, JBOD will be significantly faster over the _lifetime_ of the
> disks vs. a comparison made _today_.
>
>
>


Re: Hadoop In Real Scenario

2016-06-22 Thread Dejan Menges
Hello Renjith,

Hortonworks has a self-contained sandbox that you can just download and spin
up to see how everything looks:

http://hortonworks.com/downloads/#sandbox

Cheers,
Dejan

On Wed, Jun 22, 2016 at 6:33 PM Renjith  wrote:

> Hello All,
>
> Before proceeding, I'd like to seek expert advice from the group on whether
> we can use Docker for Mac and then get a Docker image for Hadoop. I observe
> that Docker is lightweight, as containers share the host system kernel and
> use less RAM.
>
> Kindly advise, as I am going to remove VMware Fusion from my Mac because it
> occupies the majority of my Mac's memory.
>
> Thanks,
> Renjith
>
> On 22 Jun 2016, at 08:17, Phillip Wu  wrote:
>
> You should be able to run it on a Mac, as there is a Java runtime for the Mac.
>
> Hadoop binaries are Java binaries.
>
> *From:* Renjith Gk [mailto:renjit...@gmail.com ]
> *Sent:* Wednesday, 22 June 2016 12:15 PM
> *To:* Phillip Wu 
> *Cc:* user@hadoop.apache.org
> *Subject:* RE: Hadoop In Real Scenario
>
>
> Thanks, Philip.
>
> As you mentioned, it runs on Unix/Linux. Can it run on a Unix-flavoured
> machine (macOS), which I am using, rather than installing VMware for Mac and
> then installing a Linux OS + Java SDK + Hadoop?
> On 22 Jun 2016 07:15, "Phillip Wu"  wrote:
>
> This is my understanding:
> Hadoop provides a filesystem called HDFS which allows you to store and
> delete files, but not update them.
> There are other products that can be installed on top of Hadoop, e.g. Hive,
> that provide SQL access to HDFS.
>
> Hadoop can be a single node or many nodes.
>
> Each node can be a namenode or a datanode.
> Namenodes store information about files; datanodes store the file contents.
>
> The code is Java and so can run on most Unix/Linux servers.
>
> Installation is by:
> 1. Download the software onto one node
> 2. ungzip/untar the software
> 3. Configure the software
> 4. Copy the same software and configuration to other node(s)
> 5. Format the namenode
> 6. Start the Hadoop daemons
>
> -Original Message-
> From: Renjith [mailto:renjit...@gmail.com]
> Sent: Tuesday, 21 June 2016 1:30 AM
> To: user@hadoop.apache.org
> Subject: Hadoop In Real Scenario
>
> Dear Mates,
>
> I am a beginner in Hadoop and very new to this.
>
> I would like to know how Hadoop is used. Is there any UI to allocate the
> nodes and store files, or is it done at the backend, and should we have the
> backend knowledge?
>
> Can you provide a real scenario where it is used?
>
> I came to know about Ambari; kindly provide some information on this.
>
> Can it run on any Unix flavour? From a Hadoop administrator's role, what
> are the activities?
>
> Thanks,
> Renjith
>
>
>
>
>
>


Re: Reliability of Hadoop

2016-05-27 Thread Dejan Menges
Hi Deepak,

Hadoop is just a platform (Hadoop and everything around it) - a toolset to do
what you want to do.

If you are writing bad code, you can't blame the programming language; it's
you not being able to write good code. There's also nothing bad about using
commodity hardware (and I'm not sure I understand what 'commodity software'
means). At this very moment, while we are exchanging these messages - how much
do we know or care about which hardware the mail servers are running on? We
don't, nor do we care.

As for whitepapers and use cases, the internet is full of them.

My company keeps the majority of its really important data in the Hadoop
ecosystem. Some of the best software developers I have met so far are writing
different kinds of code on top of it, from analytics to in-house software and
plugins for various things.

However, I'm not sure that anyone on any mailing list can give you the answers
you need. I would start with the official documentation and with understanding
how each specific component works in depth, and why it works the way it works.

My 2c

Cheers,
Dejan

On Fri, May 27, 2016 at 9:41 PM Deepak Goel  wrote:

> Sorry once again if I am wrong, or my comments are without significance
>
> I am not saying Hadoop is bad or good... It is just that Hadoop might be
> indirectly encouraging commodity hardware and software to be developed,
> which is convenient but might not be very good (also, the cost factor is
> unproven, with no proper case studies or whitepapers).
>
> It is like the fast food industry, which is very convenient (a commodity)
> but is causing obesity all over the world (and hence also many illnesses,
> poor health and social trauma, so the cost of a burger is actually far more
> than what a company charges when you eat it).
>
> In effect, what Hadoop (and all the other commercial software around it) is
> saying is that it's OK if you have bad software (application, JVM, OS) - I
> will provide another piece of software that will hide all your problems...
> We might all just go the obesity way in the software industry too.
>
>
>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
>
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>
> On Sat, May 28, 2016 at 12:51 AM, J. Rottinghuis 
> wrote:
>
>> We run several clusters of thousands of nodes (as do many companies), our
>> largest one has over 10K nodes. Disks, machines, memory, and network fail
>> all the time. The larger the scale, the higher the odds that some machine
>> is bad on a given day. On the other hand, scale helps. If a single node out
>> of 10K fails, 9,999 others participate in re-distributing state. Even a
>> rack failure isn't a big deal most of the time (plus typically a rack fails
>> due to a TOR issue, so the data is offline, but typically not lost
>> permanently).
>>
>> Hadoop is designed to deal with this, and by-and-large it does. Critical
>> components (such as Namenodes) can be configured to run in an HA pair with
>> automatic failover. There is quite a bit of work going on by many in the
>> Hadoop community to keep pushing the boundaries of scale.
>>
>> A node or a rack failing in a large cluster actually has less impact than
>> at smaller scale. With a 5-node cluster, if 1 machine crashes you've taken
>> 20% capacity (disk and compute) offline. 1 out of 1K barely registers.
>> Ditto with a 3-rack cluster. Lose a rack and 1/3rd of your capacity is
>> offline.
>>
>> It is large-scale coordinated failure you should worry about. Think
>> several rows of racks coming offline due to power failure, a DC going
>> offline due to fire in the building etc. Those are hard to deal with in
>> software within a single DC. They should also be more rare, but as many
>> companies have experienced, large scale coordinated failures do
>> occasionally happen.
>>
>> As to your question in the other email thread, it is a well-established
>> pattern that scaling horizontally with commodity hardware (and letting
>> software such as Hadoop deal with failures) help with both scale and
>> reducing cost.
>>
>> Cheers,
>>
>> Joep
>>
>>
>> On Fri, May 27, 2016 at 11:02 AM, Arun Natva 
>> wrote:
>>
>>> Deepak,
>>> I have managed clusters where worker nodes crashed, disks failed..
>>> HDFS takes care of the data replication unless you lose so many of the
>>> nodes that there is not enough space to fit the replicas.
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On May 27, 2016, at 11:54 AM, Deepak Goel  wrote:
>>>
>>>
>>> Hey
>>>
>>> Namaskara~Nalama~Guten Tag~Bonjour
>>>
>>> We are yet to see any server go down among our cluster nodes in the
>>> production environment. Has anyone seen reliability problems in their
>>> production environment? How many times?
>>>
>>> Thanks
>>> Deepak
>>>

Re: Redefine yarn.application.classpath value

2016-02-07 Thread Dejan Menges
Hello Jose,

For YARN's classpath, depending on how you installed everything on Ubuntu,
also take a look into yarn-env.sh and inside
/etc/default/${whateveryarnorhadoopfile}.

However, I would personally expect it to be in yarn-env.sh.
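
If it does turn out to live in yarn-site.xml on your setup, a minimal sketch
of the property with the commonly documented default-style entries (treat the
exact list as an assumption to verify against your Hadoop version and layout):

<!-- yarn-site.xml: illustrative classpath entries; verify against your Hadoop release. -->
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  </value>
</property>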

Cheers

On Sun, Feb 7, 2016 at 2:02 AM José Luis Larroque 
wrote:

> On SO, someone recommended this to me:
>
> "Check the place where you are setting $HADOOP_CONF_DIR; you might be
> setting that in multiple places. The $HADOOP_CONF_DIR is added by the
> bin/hadoop script to the front of the path."
>
> I'm setting it in hadoop-env.sh:
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/local/hadoop/etc/hadoop"}
>
> But it isn't working; the classpath remains the same. I tried removing it,
> but it was the same anyway.
>
> Please, any help would be greatly appreciated!
>
> Bye!
> Jose
>
> 2016-01-30 13:42 GMT-03:00 José Luis Larroque :
>
>> Hi guys!
>>
>> I have Ubuntu 14.04 LTS with a Hadoop 2.4.0 single-node cluster. When I
>> execute yarn classpath on the console, it gives me this:
>>
>> /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/yarn/lib/*
>>
>> As you can see, the first value is repeated three times. I looked for a
>> way to redefine or reset this. The only way, as far as I know, is
>> redefining a property in yarn-site.xml, like this:
>>
>>  
>> Classpath for typical applications.
>>  yarn.application.classpath
>> 
>> $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, 
>> $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADO$
>>  
>>   
>>
>> But yarn.application.classpath remains the same, without changes, even
>> if I stop every Hadoop process and start them all again. What other
>> options do I have? Am I doing something wrong here?
>>
>
>


Re: HDFS short-circuit tokens expiring

2016-01-20 Thread Dejan Menges
Hi Nick,

I hit exactly the same thing, and in our case the tokens were expiring too
quickly. What we increased were
dfs.client.read.shortcircuit.streams.cache.size
and dfs.client.read.shortcircuit.streams.cache.expiry.ms.
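
For reference, a minimal hdfs-site.xml sketch of those two settings on the
client side (e.g. the RegionServer's hdfs-site.xml); the values below are
illustrative examples, not recommendations:

<!-- Client-side short-circuit read stream cache; values here are illustrative examples only. -->
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>4096</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <value>600000</value>
</property>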

Hope this helps.

Best,
Dejan

On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk  wrote:

> Hi folks,
>
> This looks like it sits in the intersection between HDFS and HBase. My
> region server logs are flooding with messages like
> "SecretManager$InvalidToken: access control error while attempting to set
> up short-circuit access to  ... is expired" [0].
>
> These logs correspond with responseTooSlow WARNings from the region server.
>
> Maybe I have misconfigured short-circuit reads? Such an expiration seems
> like something the client, or the client's consumer, should handle by re-negotiating.
>
> Thanks a lot,
> -n
>
> [0]
>
> 2016-01-19 22:10:14,432 INFO
>  [B.defaultRpcServer.handler=4,queue=1,port=16020]
> shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not
> load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken
> exception.
> org.apache.hadoop.security.token.SecretManager$InvalidToken: access
> control error while attempting to set up short-circuit access to  path> token with block_token_identifier (expiryDate=1453194430724,
> keyId=1508822027, userId=hbase,
> blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, access
> modes=[READ]) is expired.
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
> at
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
> at
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
> at
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
> at
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
> at
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
> at
> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
> at
> org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
> at
> org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
> at
> org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
> at o

Re: unsubscribe

2015-11-05 Thread Dejan Menges
Ok, this is not a joke...

On Thu, Nov 5, 2015 at 3:06 PM Bourre, Marc <
marc.bou...@ehealthontario.on.ca> wrote:

> Unsubcribe
>
>
>
> *From:* mark charts [mailto:mcha...@yahoo.com]
> *Sent:* Thursday, November 05, 2015 8:54 AM
> *To:* user@hadoop.apache.org; wadood.chaudh...@instinet.com
> *Subject:* Re: unsubscribe
>
>
>
> Welcome to the human race. You can bring a horse to water, but you can't
> make it drink.
>
>
>
>
>
> On Thursday, November 5, 2015 8:49 AM, Daniel Jankovic 
> wrote:
>
>
>
> So, here's a cool thing we can do to these charming people that are too
> lazy to go and read a bit about a group that they so cheerfully joined:
>
>
>
> Create a rule in your e-mail client to reply "Why don't you read this:
> https://hadoop.apache.org/mailing_lists.html";
>
>
>
> It would be the right thing to do, because they honored us with their
> laziness.
>
>
>
> I know I'm doing it.
>
>
>
> On Wed, Nov 4, 2015 at 8:15 PM,  wrote:
>
>  unsubscribe
>
> -
> *Wadood Chaudhary*
> 309 West 49th Street
> New York, NY 10019 US
>
>
>
> From: Cao Yi 
> To: user@hadoop.apache.org,
> Date: 10/28/2015 01:45 AM
> Subject: unsubscribe
> --
>
>
>
>
> unsubscribe
>
>
>
>


Re: Start/stop scripts - particularly start-dfs.sh - in Hortonworks Data Platform 2.3.X

2015-10-24 Thread Dejan Menges
Hi Stephen,

Under /usr/hdp/version there's an etc/ subfolder with init scripts, among them
one for the DataNode too (hadoop-hdfs-datanode is the name).

Cheers
On Oct 24, 2015 7:00 PM, "Stephen Boesch"  wrote:

> OK, I will continue on the HDP list. I am already using the hdfs command for
> all of those individual commands, but they are *not* a replacement for the
> single start-dfs.sh.
>
>
>
> 2015-10-24 9:48 GMT-07:00 Ted Yu :
>
>> See /usr/hdp/current/hadoop-hdfs-client/bin/hdfs which calls hdfs.distro
>>
>> At the top of hdfs.distro, you would see the usage:
>>
>> function print_usage(){
>>   echo "Usage: hdfs [--config confdir] COMMAND"
>>   echo "   where COMMAND is one of:"
>>   echo "  dfs  run a filesystem command on the file
>> systems supported in Hadoop."
>>   echo "  namenode -format format the DFS filesystem"
>>   echo "  secondarynamenoderun the DFS secondary namenode"
>>   echo "  namenode run the DFS namenode"
>>   echo "  journalnode  run the DFS journalnode"
>>
>> BTW since this question is vendor specific, I suggest continuing on
>> vendor's forum.
>>
>> Cheers
>>
>> On Fri, Oct 23, 2015 at 7:06 AM, Stephen Boesch 
>> wrote:
>>
>>>
>>> We are setting up automated deployments on a headless system: so using
>>> the GUI is not an option here.  When we search for those scripts under
>>> HDP they are not found:
>>>
>>> $ pwd
>>> /usr/hdp/current
>>>
>>> Which scripts exist in HDP ?
>>>
>>> [stack@s1-639016 current]$ find -L . -name \*.sh
>>> ...
>>>
>>> There are ZERO start/stop sh scripts..
>>>
>>> In particular I am interested in the *start-dfs.sh* script that starts
>>> the namenode(s) , journalnode, and datanodes.
>>>
>>>
>>
>


DataNode does not start after changing RAID controller

2015-10-03 Thread Dejan Menges
Hi,

We had a situation where the RAID controller died in one of our nodes, and we
obviously had to replace it. After replacing it, everything looks good from
the system side, but the DataNode doesn't want to start anymore:

https://gist.github.com/dejo1307/5ca4946275eb81aa96f1

Using HDP 2.2, Hadoop version is 2.6.0.2.2.6.0-2800,
racb70ecfae2c3c5ab46e24b0caebceaec16fdcd0

To be honest, this is the first time this has happened to us, and I'm not
quite sure how to proceed. Removing the content of the data folders would
probably help the DataNode start, but that would kill our (however bad) data
locality in this cluster.

Thanks a lot!

Dejan


HIVE-9223 or running multiple Hive queries from single client

2015-09-22 Thread Dejan Menges
Hi,

Does anyone know if there are any plans related to this ticket:

https://issues.apache.org/jira/browse/HIVE-9223

I also asked for an update in the ticket itself, just to be sure.

Thanks a lot,

Dejan


Re: YARN - avoid removing container shell scripts

2015-07-27 Thread Dejan Menges
Found it - yarn.nodemanager.delete.debug-delay-sec
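
For reference, a minimal yarn-site.xml sketch (the default is 0, i.e. clean up
immediately; 600 below is just an example value to keep launch scripts around
for 10 minutes):

<!-- Delay NodeManager cleanup of finished containers' localized dirs and launch scripts (example: 10 minutes). -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>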

On Mon, Jul 27, 2015 at 2:25 PM Dejan Menges  wrote:

> Hi,
>
> I remember there was an option to retain container launch scripts for some
> period of time, but at this moment I can neither remember what parameter it
> was, nor find it in the documentation.
>
> Any information would be appreciated!
>
> Cheers,
> Dejan
>


YARN - avoid removing container shell scripts

2015-07-27 Thread Dejan Menges
Hi,

I remember there was an option to retain container launch scripts for some
period of time, but at this moment I can neither remember what parameter it
was, nor find it in the documentation.

Any information would be appreciated!

Cheers,
Dejan


HDFS Short-Circuit Local Reads

2015-06-20 Thread Dejan Menges
Hi,

We have been using (still, until Monday) HDP 2.1 for quite some time now, and
SC local reads have been enabled all the time. In the beginning, we followed
the Hortonworks recommendations and set the SC cache size to 256, with the
default 5 minutes to invalidate entries, and that's where the problems started.

At some point we started using multigets. After a very short time they started
timing out on our side. We played with different timeouts, and Graphite showed
(metric hbase.regionserver.RegionServer.get_mean) that the load on three
nodes, out of all of them, increased drastically. Looking into logs, googling,
and going through the documentation over and over again, we found some
discussion that the SC cache should be no lower than 4096. After setting it to
4096, our problem was solved. For some time.

At some point our data usage patterns changed, and as we already had
monitoring for this, multigets started timing out again, with monitoring
showing they were timing out on two nodes where the number of open sockets was
~3-4k per node, while on all the others it was 400-500. Narrowing this down a
little, we found some strangely oversized regions, did some splitting and some
manual merges, and HBase redistributed the data, but the issue was still
there. And then I found the next three things (here come the questions):

- With a cache size of 4096 and the (default) 5-minute cache expiry timeout,
we saw exactly this error in the logs every ten minutes:

2015-06-18 14:26:07,093 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error
creating DomainSocket
2015-06-18 14:26:07,093 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x3d1dc8c9): failed to load
1109699858_BP-1988583858-172.22.5.40-1424448407690
--
2015-06-18 14:36:07,135 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error
creating DomainSocket
2015-06-18 14:36:07,136 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x3d1dc8c9): failed to load
1109704764_BP-1988583858-172.22.5.40-1424448407690
--
2015-06-18 14:46:07,137 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error
creating DomainSocket
2015-06-18 14:46:07,138 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x3d1dc8c9): failed to load
1105787899_BP-1988583858-172.22.5.40-1424448407690

- After increasing the SC cache to 8192 (as on those couple of nodes that
were getting up to 5-7k sockets, 4096 obviously wasn't enough):
- Our multigets are no longer taking 20-30 seconds but are again done within
5 seconds, which is our client timeout.
- netstat -tanlp | grep -c 50010 now shows ~2800 open local SC sockets per
node.

Why would those errors be logged exactly every 10 minutes with a 4096 cache
size and a 5-minute expiry timeout?

Why would increasing the SC cache also 'balance' the number of open SC sockets
across all nodes?

Am I right that hbase.regionserver.RegionServer.get_mean shows the mean number
of gets per unit of time, not the time needed to perform a get? If I'm right,
increasing this made gets faster in our case. If I'm wrong, it made gets
slower, but then it sped up our multigets, which has been twisting my brain
after narrowing this down for a week.

How should the cache size and the expiry timeout correlate with each other?
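
For concreteness, a minimal hdfs-site.xml sketch of the short-circuit read
settings discussed above (the domain socket path is only a placeholder; the
cache size is the value we ended up with, and the expiry is the 5-minute
default mentioned earlier):

<!-- Short-circuit local reads: socket path is a placeholder; cache values are the ones from this thread. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>8192</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <value>300000</value>
</property>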

Thanks a lot!


When is DataNode 'bad'?

2015-06-10 Thread Dejan Menges
Hi,

From time to time I see some reduce tasks failing with this:

Error: java.io.IOException: Failed to replace a bad datanode on the
existing pipeline due to no more good datanodes being available to try. The
current failed datanode replacement policy is DEFAULT, and a client may
configure this via
'dfs.client.block.write.replace-datanode-on-failure.policy' in its
configuration.

I don't see any issues in HDFS during this period (for example, for the
specific node on which this happened, I checked the logs, and the only thing
happening at that specific point was that a pipeline was recovering).

So I'm not quite sure how there are no more good datanodes in a cluster of 15
nodes with replication factor three?

Also, regarding
http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
- there is a parameter called
dfs.client.block.write.replace-datanode-on-failure.best-effort which I cannot
currently find. From which Hadoop version can this parameter be used, and how
much sense does it make to use it to avoid issues like the one above?
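
For reference, a hedged hdfs-site.xml sketch of the client-side knobs involved
(whether best-effort exists depends on the Hadoop release, so treat it as an
assumption to verify against your version):

<!-- Client-side pipeline recovery behaviour; best-effort availability depends on the Hadoop version. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>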

This is on Hadoop 2.4 (Hortonworks 2.1); we are currently preparing an upgrade
to 2.2, and I'm not sure whether this is maybe a known issue or something I
don't get.

Thanks a lot,
Dejan


Socket Timeout Exception

2015-05-26 Thread Dejan Menges
Hi,

I'm seeing this exception on every HDFS node once in a while on one cluster:

2015-05-26 13:37:31,831 INFO  datanode.DataNode
(BlockSender.java:sendPacket(566)) - Failed to send data:
java.net.SocketTimeoutException: 1 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/
172.22.5.34:31684]

2015-05-26 13:37:31,831 INFO  DataNode.clienttrace
(BlockSender.java:sendBlock(738)) - src: /172.22.5.34:50010, dest: /
172.22.5.34:31684, bytes: 12451840, op: HDFS_READ, cliID:
DFSClient_hb_rs_my-hadoop-node-fqdn,60020,1432041913240_-1351889511_35,
offset: 47212032, srvID: 9bfc58b8-94b0-40a5-ba33-6d712fa1faa2, blockid:
BP-1988583858-172.22.5.40-1424448407690:blk_1105314202_31576629, duration:
10486866121

2015-05-26 13:37:31,831 WARN  datanode.DataNode
(DataXceiver.java:readBlock(541)) - DatanodeRegistration(172.22.5.34,
datanodeUuid=9bfc58b8-94b0-40a5-ba33-6d712fa1faa2, infoPort=50075,
ipcPort=8010,
storageInfo=lv=-55;cid=CID-962af1ea-201a-4d27-ae80-e4a7b712f1ac;nsid=109597947;c=0):Got
exception while serving
BP-1988583858-172.22.5.40-1424448407690:blk_1105314202_31576629 to /
172.22.5.34:31684

java.net.SocketTimeoutException: 1 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/
172.22.5.34:31684]

at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)

at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)

at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)

at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)

at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)

at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)

at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)

at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)

at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)

at java.lang.Thread.run(Thread.java:745)

2015-05-26 13:37:31,831 ERROR datanode.DataNode (DataXceiver.java:run(250))
- my-hadoop-node-fqdn:50010:DataXceiver error processing READ_BLOCK
operation  src: /172.22.5.34:31684 dst: /172.22.5.34:50010

java.net.SocketTimeoutException: 1 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/
172.22.5.34:31684]

at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)

at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)

at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)

at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)

at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)

at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)

at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)

at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)

at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)

at java.lang.Thread.run(Thread.java:745)


...and it's basically only complaining about itself. On the same node there
are HDFS, a RegionServer and YARN.

I'm struggling a little bit with how to interpret this. The funny thing is
that this is our live cluster, the one where we are writing everything. I'm
wondering whether it's possible that the HBase flush size (256M) is a problem
while the block size is 128M.
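
For context, the timeout in these traces should be governed by the DataNode's
socket write timeout; a minimal hdfs-site.xml sketch with what I believe are
the stock property names and defaults (treat them as assumptions to verify):

<!-- Assumed stock names/defaults for the data-transfer timeouts; values are in milliseconds. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value> <!-- 8 minutes -->
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value> <!-- client read timeout -->
</property>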

Any advice where to look is welcome!

Thanks,
Dejan


Re: HDFS Installation

2014-04-13 Thread Dejan Menges
Your output says permission denied for SSH to localhost. Try to fix that first
(there are a bunch of tutorials on setting up passwordless SSH).
On Apr 13, 2014 7:37 PM, "Ekta Agrawal"  wrote:

> Hi,
>
> I started with the "ssh localhost" command.
> Does anything else need to be checked for SSH?
>
> Then I stopped all the services that were running with "stop-all.sh"
> and started them again with "start-all.sh".
>
> I have copied the terminal output for some of the commands below.
>
> I don't know why, after start-all.sh, it says it is starting the namenode
> and does not show any failure, but when I check with jps it does not list
> the namenode.
>
> I tried opening the namenode UI in the browser. It also does not open.
>
>
> 
>
> These is the way it executed on terminal:
>
> hduser@ubuntu:~$ ssh localhost
> hduser@localhost's password:
> Welcome to Ubuntu 12.04.2 LTS
>
>  * Documentation:  https://help.ubuntu.com/
>
> 459 packages can be updated.
> 209 updates are security updates.
>
> Last login: Sun Feb  2 00:28:46 2014 from localhost
>
>
>
>
> hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
> 14/04/07 01:44:20 INFO namenode.NameNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = ubuntu/127.0.0.1
> STARTUP_MSG:   args = [-format]
> STARTUP_MSG:   version = 1.0.3
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
> 1335192; compiled by 'hortonfo' on Tue May  8 20:31:25 UTC 2012
> /
> Re-format filesystem in /app/hadoop/tmp/dfs/name ? (Y or N) y
> Format aborted in /app/hadoop/tmp/dfs/name
> 14/04/07 01:44:27 INFO namenode.NameNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.0.1
> /
>
>
> hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
> starting namenode, logging to /usr/local/hadoop/libexec/../
> logs/hadoop-hduser-namenode-ubuntu.out
> ehduser@localhost's password:
> hduser@localhost's password: localhost: Permission denied, please try
> again.
> localhost: starting datanode, logging to /usr/local/hadoop/libexec/../
> logs/hadoop-hduser-datanode-ubuntu.out
> hduser@
>
>
>
>
>
> On Sun, Apr 13, 2014 at 9:14 PM, Mahesh Khandewal 
> wrote:
>
>> Ekta, it may be an SSH problem. First check SSH.
>>
>>
>> On Sun, Apr 13, 2014 at 8:46 PM, Ekta Agrawal 
>> wrote:
>>
>>> I already used the same guide to install Hadoop.
>>>
>>> If HDFS does not require anything except the Hadoop single-node
>>> installation, then the installation part is complete.
>>>
>>> I tried running bin/hadoop dfs -mkdir /foodir
>>> bin/hadoop dfsadmin -safemode enter
>>>
>>> These commands are giving the following exception:
>>>
>>> 14/04/07 00:23:09 INFO ipc.Client: Retrying connect to server:localhost/
>>> 127.0.0.1:54310. Already tried 9 time(s).
>>> Bad connection to FS. command aborted. exception: Call to localhost/
>>> 127.0.0.1:54310 failed on connection exception:
>>> java.net.ConnectException: Connection refused
>>>
>>> Can somebody help me understand why this is happening?
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Apr 13, 2014 at 10:33 AM, Mahesh Khandewal <
>>> mahesh.k@gmail.com> wrote:
>>>
 I think HDFS comes with the Hadoop installation itself.
 You just need to run a script like
 bin/start-dfs.sh from the $HADOOP_HOME path.


 On Sun, Apr 13, 2014 at 10:27 AM, Ekta Agrawal <
 ektacloudst...@gmail.com> wrote:

> Can anybody suggest a good tutorial to install HDFS and work with
> HDFS?
>
> I installed Hadoop on Ubuntu as a single node. I can see those services
> running.
>
> But how do I install and work with HDFS? Please give some guidance.
>


>>>
>>
>