Re: issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread ch huang
I changed the config on all datanodes, setting dfs.datanode.max.xcievers to 131072,
and restarted all the DNs, but it still did not help.
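For reference, a minimal sketch of what that change would look like in hdfs-site.xml
on each datanode (the value is the one from this message; on newer releases the
equivalent, preferred key is dfs.datanode.max.transfer.threads):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>131072</value>
</property>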

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri wrote:

>  Did you check your unix open file limit and data node xceiver value ?
>
> Is it too low for the number of blocks/data in your cluster ?
>
> Thanks,
> Anurag Tangri
>
> On Feb 20, 2014, at 6:57 PM, ch huang  wrote:
>
>   hi,maillist:
>   i see the following info in my hdfs log ,and the block belong to
> the file which write by scribe ,i do not know why
> is there any limit in hdfs system ?
>
> 2014-02-21 10:33:30,235 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock
> BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240
> received exc
> eption java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecov
> ered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> 2014-02-21 10:33:30,235 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.11.12,
> storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
> ort=50075, ipcPort=50020,
> storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
> exception while serving BP-1043055049-192.168.11.11-1382442676
> 609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, b
> lk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> 2014-02-21 10:33:30,236 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver
> error processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: /
> 192.168.11.12:50010
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
>
>


Re: issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread ch huang
One more question: if I need to increase the datanode xceiver value,
do I also need to add it to my NN config file?



On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri wrote:

>  Did you check your unix open file limit and data node xceiver value ?
>
> Is it too low for the number of blocks/data in your cluster ?
>
> Thanks,
> Anurag Tangri
>
> On Feb 20, 2014, at 6:57 PM, ch huang  wrote:
>
>   hi,maillist:
>   i see the following info in my hdfs log ,and the block belong to
> the file which write by scribe ,i do not know why
> is there any limit in hdfs system ?
>
> 2014-02-21 10:33:30,235 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock
> BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240
> received exc
> eption java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecov
> ered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> 2014-02-21 10:33:30,235 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.11.12,
> storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
> ort=50075, ipcPort=50020,
> storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
> exception while serving BP-1043055049-192.168.11.11-1382442676
> 609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, b
> lk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> 2014-02-21 10:33:30,236 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver
> error processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: /
> 192.168.11.12:50010
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
>
>


Re: Capacity Scheduler capacity vs. maximum-capacity

2014-02-20 Thread Vinod Kumar Vavilapalli

Yes, it does take those extra resources away from queue B and give them back to
queue A. How quickly it takes them away depends on whether preemption is enabled
or not. If preemption is not enabled, it 'takes away' as and when containers from
queue B start finishing.
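For illustration, a minimal capacity-scheduler.xml sketch of such a setup (queue
names and percentages here are hypothetical): queue B is guaranteed 50% but may
grow to 60% by borrowing queue A's idle share, and without preemption the
borrowed 10% is handed back only as queue B's containers finish.

<property>
  <name>yarn.scheduler.capacity.root.A.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.maximum-capacity</name>
  <value>60</value>
</property>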

+Vinod

On Feb 19, 2014, at 5:35 PM, Alex Nastetsky  wrote:

> Will the scheduler take away the 10% from queue B and give it back to queue A 
> even if queue B needs it? If not, it would seem that the scheduler is 
> reneging on its guarantee.




Re: issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread ch huang
I use the default value, which seems to be 4096.

I also checked the hdfs user's ulimit, and it is large enough:

-bash-4.1$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 514914
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 32768
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 65536
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri wrote:

>  Did you check your unix open file limit and data node xceiver value ?
>
> Is it too low for the number of blocks/data in your cluster ?
>
> Thanks,
> Anurag Tangri
>
> On Feb 20, 2014, at 6:57 PM, ch huang  wrote:
>
>   hi,maillist:
>   i see the following info in my hdfs log ,and the block belong to
> the file which write by scribe ,i do not know why
> is there any limit in hdfs system ?
>
> 2014-02-21 10:33:30,235 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock
> BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240
> received exc
> eption java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecov
> ered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> 2014-02-21 10:33:30,235 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.11.12,
> storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
> ort=50075, ipcPort=50020,
> storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
> exception while serving BP-1043055049-192.168.11.11-1382442676
> 609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, b
> lk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> 2014-02-21 10:33:30,236 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver
> error processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: /
> 192.168.11.12:50010
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
>
>


Re: hadoop-3.0.0-SNAPSHOT runtime exception when trying PI example

2014-02-20 Thread Vinod Kumar Vavilapalli
Are you trying to run it on a cluster or locally? There is something called
'local mode' that is getting activated here (note that the job-id contains a
_local_ string).

You need to point your configuration at the real ResourceManager. Share your
configs, or better, share the resource whose instructions you are following.
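For example, a minimal sketch of the two settings that usually control this
(the ResourceManager hostname below is hypothetical):

mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>
</property>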

+Vinod

On Feb 19, 2014, at 10:09 PM, Chen, Richard  wrote:

> Dear Hadoop User group,
>  
> This is my first post, so please bear with me if I make any mistakes here.
>  
> Recently I successfully downloaded and compiled Hadoop trunk code (svn co 
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-trunk)
> I also deployed the hadoop-3.0.0-SNAPSHOT distribution to a physical cluster
> of 5 nodes, each running 64-bit CentOS 6.
>  
> After careful configuration, when I tried running the 1st PI job example:
>  $hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar pi 4 10
>  
> The result was successfully calculated but some exception message was also 
> dumped during the execution and I couldn’t figure out why.
> The exception message goes like this:
>  
> Starting Job
> 14/02/20 13:34:12 INFO Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 14/02/20 13:34:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
> processName=JobTracker, sessionId=
> 14/02/20 13:34:12 INFO input.FileInputFormat: Total input paths to process : 4
> 14/02/20 13:34:12 INFO mapreduce.JobSubmitter: number of splits:4
> 14/02/20 13:34:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_local1598491292_0001
> 14/02/20 13:34:12 INFO mapreduce.Job: The url to track the job: 
> http://localhost:8080/
> 14/02/20 13:34:12 INFO mapreduce.Job: Running job: job_local1598491292_0001
> 14/02/20 13:34:12 INFO mapred.LocalJobRunner: OutputCommitter set in config 
> null
> 14/02/20 13:34:12 INFO mapred.LocalJobRunner: OutputCommitter is 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 14/02/20 13:34:13 INFO mapred.LocalJobRunner: Waiting for map tasks
> 14/02/20 13:34:13 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local1598491292_0001_m_00_0
> 14/02/20 13:34:13 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 14/02/20 13:34:13 INFO mapred.MapTask: Processing 
> split:hdfs://samdev06:9000/user/hadoop/QuasiMonteCarlo_1392874450186_1650772770/in/part0:0+118
> 14/02/20 13:34:13 INFO mapred.MapTask: Map output collector class = 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 14/02/20 13:34:13 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 14/02/20 13:34:13 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 14/02/20 13:34:13 INFO mapred.MapTask: soft limit at 83886080
> 14/02/20 13:34:13 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 14/02/20 13:34:13 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 14/02/20 13:34:13 INFO mapred.LocalJobRunner:
> 14/02/20 13:34:13 INFO mapred.MapTask: Starting flush of map output
> 14/02/20 13:34:13 INFO mapred.MapTask: Spilling map output
> 14/02/20 13:34:13 INFO mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 
> 104857600
> 14/02/20 13:34:13 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 
> 26214392(104857568); length = 5/6553600
> 14/02/20 13:34:13 INFO mapred.MapTask: Finished spill 0
> 14/02/20 13:34:13 INFO mapred.Task: 
> Task:attempt_local1598491292_0001_m_00_0 is done. And is in the process 
> of committing
> 14/02/20 13:34:13 WARN mapred.Task: Could not find output size
> java.io.FileNotFoundException: File does not exist: 
> /tmp/hadoop/mapred/local/localRunner/hadoop/jobcache/job_local1598491292_0001/attempt_local1598491292_0001_m_00_0/output/file.out
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
> at 
> org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
> at 
> org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
> at org.apache.hadoop.mapred.Task.done(Task.java:1048)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Thr

Re: issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread ch huang
Hi, I use CDH 4.4.

On Fri, Feb 21, 2014 at 12:04 PM, Ted Yu  wrote:

> Which hadoop release are you using ?
>
> Cheers
>
>
> On Thu, Feb 20, 2014 at 8:57 PM, ch huang  wrote:
>
>>  hi,maillist:
>>   i see the following info in my hdfs log ,and the block belong
>> to the file which write by scribe ,i do not know why
>> is there any limit in hdfs system ?
>>
>> 2014-02-21 10:33:30,235 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock
>> BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240
>> received exc
>> eption java.io.IOException: Replica gen stamp < block genstamp,
>> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
>> replica=ReplicaWaitingToBeRecov
>> ered, blk_-8536558734938003208_3820986, RWR
>>   getNumBytes() = 35840
>>   getBytesOnDisk()  = 35840
>>   getVisibleLength()= -1
>>   getVolume()   = /data/4/dn/current
>>   getBlockFile()=
>> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>>   unlinked=false
>> 2014-02-21 10:33:30,235 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.11.12,
>> storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
>> ort=50075, ipcPort=50020,
>> storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
>> exception while serving BP-1043055049-192.168.11.11-1382442676
>> 609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
>> java.io.IOException: Replica gen stamp < block genstamp,
>> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
>> replica=ReplicaWaitingToBeRecovered, b
>> lk_-8536558734938003208_3820986, RWR
>>   getNumBytes() = 35840
>>   getBytesOnDisk()  = 35840
>>   getVisibleLength()= -1
>>   getVolume()   = /data/4/dn/current
>>   getBlockFile()=
>> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>>   unlinked=false
>> at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>> at java.lang.Thread.run(Thread.java:744)
>> 2014-02-21 10:33:30,236 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver
>> error processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: /
>> 192.168.11.12:50010
>> java.io.IOException: Replica gen stamp < block genstamp,
>> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
>> replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
>>   getNumBytes() = 35840
>>   getBytesOnDisk()  = 35840
>>   getVisibleLength()= -1
>>   getVolume()   = /data/4/dn/current
>>   getBlockFile()=
>> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>>   unlinked=false
>> at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>> at java.lang.Thread.run(Thread.java:744)
>>
>
>


Re: history server for 2 clusters

2014-02-20 Thread Vinod Kumar Vavilapalli
Interesting use-case and setup. We never had this use-case in mind; so far we
have assumed one history-server per YARN cluster, so you may be running into
issues where that assumption is not valid.

Why do you need two separate YARN clusters for the same underlying data on 
HDFS? And if that can't change, why can't you have two history-servers?
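If you did run one history server per cluster, each cluster's mapred-site.xml
would simply point at its own server; a sketch with hypothetical hostnames:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyserver1:19888</value>
</property>

The second cluster would use the same two keys but its own host, e.g.
historyserver2.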

+Vinod

On Feb 20, 2014, at 6:08 PM, Anfernee Xu  wrote:

> Hi,
> 
> I'm at 2.2.0 release and I have a HDFS cluster which is shared by 2 YARN(MR) 
> cluster, also I have a single shared history server, what I'm seeing is I can 
> see all job summary for all jobs from history server UI, I also can see task 
> log for jobs running in one cluster, but if I want to see log for jobs 
> running in another cluster, it showed me below error
> 
> Logs not available for attempt_1392933787561_0024_m_00_0. Aggregation may 
> not be complete, Check back later or try the nodemanager at 
> slc03jvt.mydomain.com:31303 
> 
> Here's my configuration:
> 
> Note: my history server is running on RM node of the MR cluster where I can 
> see the log.
> 
> 
> mapred-site.xml
>
> <property>
>   <name>mapreduce.jobhistory.address</name>
>   <value>slc00dgd:10020</value>
>   <description>MapReduce JobHistory Server IPC host:port</description>
> </property>
>
> <property>
>   <name>mapreduce.jobhistory.webapp.address</name>
>   <value>slc00dgd:19888</value>
>   <description>MapReduce JobHistory Server Web UI host:port</description>
> </property>
>
> --yarn-site.xml
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>true</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
>   <value>dc</value>
> </property>
> 
> Above configuration are almost same for both clusters, the only difference is 
> "yarn.nodemanager.remote-app-log-dir-suffix", they have different suffix.
> 
> 
> 
> -- 
> --Anfernee




Re: issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread Anurag Tangri
Did you check your unix open file limit and data node xceiver value ?

Is it too low for the number of blocks/data in your cluster ? 

Thanks,
Anurag Tangri

> On Feb 20, 2014, at 6:57 PM, ch huang  wrote:
> 
> hi,maillist:
>   i see the following info in my hdfs log ,and the block belong to 
> the file which write by scribe ,i do not know why
> is there any limit in hdfs system ?
>  
> 2014-02-21 10:33:30,235 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> opReadBlock 
> BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240 
> received exc
> eption java.io.IOException: Replica gen stamp < block genstamp, 
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
>  replica=ReplicaWaitingToBeRecov
> ered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()= 
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> 2014-02-21 10:33:30,235 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(192.168.11.12, 
> storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
> ort=50075, ipcPort=50020, 
> storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
>  exception while serving BP-1043055049-192.168.11.11-1382442676
> 609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
> java.io.IOException: Replica gen stamp < block genstamp, 
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
>  replica=ReplicaWaitingToBeRecovered, b
> lk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()= 
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> 2014-02-21 10:33:30,236 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error 
> processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: 
> /192.168.11.12:50010
> java.io.IOException: Replica gen stamp < block genstamp, 
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
>  replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()= 
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)


Re: any optimize suggestion for high concurrent write into hdfs?

2014-02-20 Thread Suresh Srinivas
Another alternative is to write block sized chunks into multiple hdfs files 
concurrently followed by concat to all those into a single file. 

Sent from phone

> On Feb 20, 2014, at 8:15 PM, Chen Wang  wrote:
> 
> Ch,
> you may consider using flume as it already has a flume sink that can sink to 
> hdfs. What I did is to set up a flume listening on an Avro sink, and then 
> sink to hdfs. Then in my application, i just send my data to avro socket.
> Chen
> 
> 
>> On Thu, Feb 20, 2014 at 5:07 PM, ch huang  wrote:
>> hi,maillist:
>>   is there any optimize for large of write into hdfs in same time ? 
>> thanks
> 



Re: any optimize suggestion for high concurrent write into hdfs?

2014-02-20 Thread Chen Wang
Ch,
you may consider using flume as it already has a flume sink that can sink
to hdfs. What I did is to set up a flume listening on an Avro sink, and
then sink to hdfs. Then in my application, i just send my data to avro
socket.
Chen


On Thu, Feb 20, 2014 at 5:07 PM, ch huang  wrote:

> hi,maillist:
>   is there any optimize for large of write into hdfs in same time
> ? thanks
>


Re: issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread Ted Yu
Which hadoop release are you using ?

Cheers


On Thu, Feb 20, 2014 at 8:57 PM, ch huang  wrote:

> hi,maillist:
>   i see the following info in my hdfs log ,and the block belong to
> the file which write by scribe ,i do not know why
> is there any limit in hdfs system ?
>
> 2014-02-21 10:33:30,235 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock
> BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240
> received exc
> eption java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecov
> ered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> 2014-02-21 10:33:30,235 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.11.12,
> storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
> ort=50075, ipcPort=50020,
> storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
> exception while serving BP-1043055049-192.168.11.11-1382442676
> 609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, b
> lk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> 2014-02-21 10:33:30,236 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver
> error processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: /
> 192.168.11.12:50010
> java.io.IOException: Replica gen stamp < block genstamp,
> block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
> replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
>   getNumBytes() = 35840
>   getBytesOnDisk()  = 35840
>   getVisibleLength()= -1
>   getVolume()   = /data/4/dn/current
>   getBlockFile()=
> /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
>   unlinked=false
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
>


No job shown in Hadoop resource manager web UI when running jobs in the cluster

2014-02-20 Thread Chen, Richard
Dear group,

I compiled Hadoop 2.2.0 for x64 and am running it on a cluster. When I do hadoop job
-list or hadoop job -list all, it throws an NPE like this:
14/01/28 17:18:39 INFO Configuration.deprecation: session.id is deprecated. 
Instead, use dfs.metrics.session-id
14/01/28 17:18:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:504)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:312)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1237)
Also, the Hadoop web apps such as the job history UI (I turned on the jobhistory
server) show no jobs running and no jobs finished, although I was running jobs.
Please help me solve this problem.
Thanks!!

Richard Chen



issue about write append into hdfs "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation "

2014-02-20 Thread ch huang
hi, maillist:
  I see the following info in my HDFS log. The block belongs to a
file which is written by Scribe, and I do not know why this happens.
Is there any limit in the HDFS system?

2014-02-21 10:33:30,235 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock
BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240
received exc
eption java.io.IOException: Replica gen stamp < block genstamp,
block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
replica=ReplicaWaitingToBeRecov
ered, blk_-8536558734938003208_3820986, RWR
  getNumBytes() = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()   = /data/4/dn/current
  getBlockFile()=
/data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
2014-02-21 10:33:30,235 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.11.12,
storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoP
ort=50075, ipcPort=50020,
storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got
exception while serving BP-1043055049-192.168.11.11-1382442676
609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
java.io.IOException: Replica gen stamp < block genstamp,
block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
replica=ReplicaWaitingToBeRecovered, b
lk_-8536558734938003208_3820986, RWR
  getNumBytes() = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()   = /data/4/dn/current
  getBlockFile()=
/data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
at
org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:744)
2014-02-21 10:33:30,236 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver
error processing READ_BLOCK operation  src: /192.168.11.15:56564 dest: /
192.168.11.12:50010
java.io.IOException: Replica gen stamp < block genstamp,
block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240,
replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes() = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()   = /data/4/dn/current
  getBlockFile()=
/data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
at
org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:205)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:744)


history server for 2 clusters

2014-02-20 Thread Anfernee Xu
Hi,

I'm on the 2.2.0 release and I have an HDFS cluster which is shared by 2
YARN (MR) clusters, plus a single shared history server. I can see the job
summary for all jobs from the history server UI, and I can also see task
logs for jobs running in one cluster, but if I want to see logs for jobs
running in the other cluster, it shows me the error below:

Logs not available for attempt_1392933787561_0024_m_00_0. Aggregation
may not be complete, Check back later or try the nodemanager at
slc03jvt.mydomain.com:31303

Here's my configuration:

Note: my history server is running on the RM node of the MR cluster whose
logs I can see.


mapred-site.xml

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>slc00dgd:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>slc00dgd:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>

--yarn-site.xml

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>dc</value>
</property>

The above configuration is almost the same for both clusters; the only difference
is "yarn.nodemanager.remote-app-log-dir-suffix", which has a different suffix in each.



-- 
--Anfernee


any optimize suggestion for high concurrent write into hdfs?

2014-02-20 Thread ch huang
hi, maillist:
  Is there any optimization for a large number of concurrent writes into HDFS
at the same time? Thanks.


Re: datanode is slow

2014-02-20 Thread Haohui Mai
It looks like your datanode is overloaded. You can scale your system by
adding more datanodes.

You can also try tightening the admission control to recover. You can lower
dfs.datanode.max.transfer.threads so that the datanode accepts fewer
concurrent requests (but this also means that it serves fewer clients at a
time).
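For reference, a minimal hdfs-site.xml sketch of that knob (the value below is
hypothetical; the default is 4096):

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>2048</value>
</property>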

~Haohui



On Thu, Feb 20, 2014 at 8:44 AM, lei liu  wrote:

> I use Hbase0.94 and CDH4. There are 25729 tcp connections in one
> machine,example:
> hadoop@apayhbs081 ~ $ netstat -a | wc -l
> 25729
>
> The linux configration is :
>softcore0
>hardrss 1
>hardnproc   20
>softnproc   20
>hardnproc   50
>hardnproc   0
>maxlogins   4
>nproc  20480
> nofile 204800
>
>
> When there are 25729 tcp connections in one machine, the datanode is very
> slow.
> How can I resolve the question?
>
>
>
>



Re: Reg:Hive query with mapreduce

2014-02-20 Thread Shekhar Sharma
Assuming you are using TextInputFormat and your data set is comma-separated,
where the second column is empId and the third column is salary, your map
function would look something like this:



import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FooMapper extends Mapper<LongWritable, Text, Text, NullWritable>
{
    @Override
    public void map(LongWritable offset, Text empRecord, Context context)
            throws IOException, InterruptedException
    {
        // splits[1] = empId, splits[2] = salary
        String[] splits = empRecord.toString().split(",");
        double salary = Double.parseDouble(splits[2]);
        if (salary > 12000)
        {
            context.write(new Text(splits[1]), NullWritable.get());
        }
    }
}


Set the number of reduce tasks to zero.

The number of output files will be equal to the number of map tasks in this
case. If you want a single output file, set mapred.min.split.size to a value
at least as large as your input; that will spawn only one map task and you
will get one output file.

Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Feb 20, 2014 at 5:55 PM, Ranjini Rathinam wrote:

> Hi,
>
> How to implement the Hive query such as
>
> select * from table comp;
>
> select empId from comp where sal>12000;
>
> in mapreduce.
>
> Need to use this query in mapreduce code. How to implement the above query
> in the code using mapreduce , JAVA.
>
>
> Please provide the sample code.
>
> Thanks in advance for the support
>
> Regards
>
> Ranjini
>
>
>
>
>


datanode is slow

2014-02-20 Thread lei liu
I use HBase 0.94 and CDH4. There are 25729 TCP connections on one
machine, for example:
hadoop@apayhbs081 ~ $ netstat -a | wc -l
25729

The Linux limits configuration is:
   soft  core      0
   hard  rss       1
   hard  nproc     20
   soft  nproc     20
   hard  nproc     50
   hard  nproc     0
         maxlogins 4
         nproc     20480
         nofile    204800


When there are 25729 TCP connections on one machine, the datanode is very
slow.
How can I resolve this?


har file globbing problem

2014-02-20 Thread Dan Buchan
We have a dataset of ~8 million files, about 0.5 to 2 MB each, and we're
having trouble getting them analysed after building a har file.

The files are already in a pre-existing directory structure, with two
nested sets of dirs and 20-100 PDFs at the bottom of each leaf of the dir
tree.

user->hadoop->/all_the_files/*/*/*.pdf

It was trivial to move these to hdfs and to build a har archive; I used the
following command to make the archive

bin/hadoop archive -archiveName test.har -p /user/hadoop/
all_the_files/*/*/ /user/hadoop/

Listing the contents of the har (bin/hadoop fs -lsr
har:///user/hadoop/epc_test.har) shows everything as I'd expect.

When we come to run the hadoop job with this command, trying to wildcard
the archive:

bin/hadoop jar My.jar har:///user/hadoop/test.har/all_the_files/*/*/ output

it fails with the following exception

Exception in thread "main" java.lang.IllegalArgumentException: Can not
create a Path from an empty string

Running the job with the non-archived files is fine i.e:

bin/hadoop jar My.jar all_the_files/*/*/ output

However this only works for our modest test set of files. Any substantial
number of files quickly makes the namenode run out of memory.

Can you use file globs with the har archives? Is there a different way to
build the archive to just include the files which I've missed?
I appreciate that a sequence file might be a better fit for this task but
I'd like to know the solution to this issue if there is one.

-- 
 

t.  020 7739 3277
a. 131 Shoreditch High Street, London E1 6JE


Re: Service Level Authorization

2014-02-20 Thread Alex Nastetsky
If your test1 queue is under test queue, then you have to specify the path
in the same way:

yarn.scheduler.capacity.root.test.test1.acl_submit_applications (you are
missing the "test")

Also, check whether your "hadoop" user is a member of the "hadoop" user group:
that group is the default value of mapreduce.cluster.administrators in
mapred-site.xml, and users of that group can submit jobs to and administer all
queues.
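For illustration, a capacity-scheduler.xml sketch of the queue path described
above (the user and queue names are the ones from this thread). Note that queue
ACLs are combined with those of their ancestors, so the parent "test" queue (and
"root") may also need restrictive ACLs; a value of a single space means no users.

<property>
  <name>yarn.scheduler.capacity.root.test.test1.acl_submit_applications</name>
  <value>jcfernandez </value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.test.acl_submit_applications</name>
  <value> </value>
</property>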


On Thu, Feb 20, 2014 at 11:28 AM, Juan Carlos  wrote:

> Yes, that is what I'm looking for, but I couldn't find this information
> for hadoop 2.2.0. I saw mapreduce.cluster.acls.enabled it's now the
> parameter to use. But I don't know how to set my ACLs.
> I'm using capacity schedurler and I've created 3 new queues test (which is
> under root at the same level as default) and test1 and test2, which are
> under test. As I said, I enabled mapreduce.cluster.acls.enabled in
> mapred-site.xml and later added the parameter
> yarn.scheduler.capacity.root.test1.acl_submit_applications with value
> "jcfernandez ". If I submit a job to queue test1 with user hadoop, it
> allows it to run it.
> Which is my error?
>
>
> 2014-02-20 16:41 GMT+01:00 Alex Nastetsky :
>
> Juan,
>>
>> What kind of information are you looking for? The service level ACLs are
>> for limiting which services can communicate under certain protocols, by
>> username or user group.
>>
>> Perhaps you are looking for client level ACL, something like the
>> MapReduce ACLs?
>> https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Job+Authorization
>>
>> Alex.
>>
>>
>> 2014-02-20 4:58 GMT-05:00 Juan Carlos :
>>
>> Where could I find some information about ACL? I only could find the
>>> available in
>>> http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html,
>>>  which isn't so detailed.
>>> Regards
>>>
>>> Juan Carlos Fernández Rodríguez
>>> Consultor Tecnológico
>>>
>>> Telf: +34918105294
>>> Móvil: +34639311788
>>>
>>> CEDIANT
>>> Centro para el Desarrollo, Investigación y Aplicación de Nuevas
>>> Tecnologías
>>> HPC Business Solutions
>>>
>>
>


Re: Service Level Authorization

2014-02-20 Thread Juan Carlos
Yes, that is what I'm looking for, but I couldn't find this information for
hadoop 2.2.0. I saw that mapreduce.cluster.acls.enabled is now the parameter
to use, but I don't know how to set my ACLs.
I'm using the capacity scheduler and I've created 3 new queues: test (which is
under root, at the same level as default) and test1 and test2, which are
under test. As I said, I enabled mapreduce.cluster.acls.enabled in
mapred-site.xml and later added the parameter
yarn.scheduler.capacity.root.test1.acl_submit_applications with value
"jcfernandez ". But if I submit a job to queue test1 as user hadoop, it is
still allowed to run.
What is my error?


2014-02-20 16:41 GMT+01:00 Alex Nastetsky :

> Juan,
>
> What kind of information are you looking for? The service level ACLs are
> for limiting which services can communicate under certain protocols, by
> username or user group.
>
> Perhaps you are looking for client level ACL, something like the MapReduce
> ACLs?
> https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Job+Authorization
>
> Alex.
>
>
> 2014-02-20 4:58 GMT-05:00 Juan Carlos :
>
> Where could I find some information about ACL? I only could find the
>> available in
>> http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html,
>>  which isn't so detailed.
>> Regards
>>
>> Juan Carlos Fernández Rodríguez
>> Consultor Tecnológico
>>
>> Telf: +34918105294
>> Móvil: +34639311788
>>
>> CEDIANT
>> Centro para el Desarrollo, Investigación y Aplicación de Nuevas
>> Tecnologías
>> HPC Business Solutions
>>
>


Re: Service Level Authorization

2014-02-20 Thread Alex Nastetsky
Juan,

What kind of information are you looking for? The service level ACLs are
for limiting which services can communicate under certain protocols, by
username or user group.
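For the service-level ACLs specifically, a minimal sketch (the user and group
names below are hypothetical): enable authorization in core-site.xml, then
restrict a protocol in hadoop-policy.xml, where the value is a comma-separated
list of users, a space, and a comma-separated list of groups.

core-site.xml:
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

hadoop-policy.xml:
<property>
  <name>security.client.protocol.acl</name>
  <value>alice,bob datascience</value>
</property>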

Perhaps you are looking for client level ACL, something like the MapReduce
ACLs?
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Job+Authorization

Alex.


2014-02-20 4:58 GMT-05:00 Juan Carlos :

> Where could I find some information about ACL? I only could find the
> available in
> http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html,
>  which isn't so detailed.
> Regards
>
> Juan Carlos Fernández Rodríguez
> Consultor Tecnológico
>
> Telf: +34918105294
> Móvil: +34639311788
>
> CEDIANT
> Centro para el Desarrollo, Investigación y Aplicación de Nuevas Tecnologías
> HPC Business Solutions
>


Re: Reg:Hive query with mapreduce

2014-02-20 Thread Nitin Pawar
try this

http://ysmart.cse.ohio-state.edu/online.html


On Thu, Feb 20, 2014 at 5:55 PM, Ranjini Rathinam wrote:

> Hi,
>
> How to implement the Hive query such as
>
> select * from table comp;
>
> select empId from comp where sal>12000;
>
> in mapreduce.
>
> Need to use this query in mapreduce code. How to implement the above query
> in the code using mapreduce , JAVA.
>
>
> Please provide the sample code.
>
> Thanks in advance for the support
>
> Regards
>
> Ranjini
>
>
>
>
>



-- 
Nitin Pawar


Reg:Hive query with mapreduce

2014-02-20 Thread Ranjini Rathinam
Hi,

How to implement the Hive query such as

select * from table comp;

select empId from comp where sal>12000;

in mapreduce.

Need to use this query in mapreduce code. How to implement the above query
in the code using mapreduce , JAVA.


Please provide the sample code.

Thanks in advance for the support

Regards

Ranjini


Service Level Authorization

2014-02-20 Thread Juan Carlos
Where can I find some information about ACLs? I could only find what is
available in
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html,
which isn't very detailed.
Regards

Juan Carlos Fernández Rodríguez
Consultor Tecnológico

Telf: +34918105294
Móvil: +34639311788

CEDIANT
Centro para el Desarrollo, Investigación y Aplicación de Nuevas Tecnologías
HPC Business Solutions
