Re: Can we control data distribution and load balancing in Hadoop Cluster?

2015-05-04 Thread Answer Agrawal
Thanks, Mr Chandrashekhar.

HDFS splits input data sets into blocks (128 MB by default) and replicates
each block according to the replication factor (3 by default). Hadoop also
balances load by reassigning the work of failed or busy nodes to free, active
nodes. Can we ourselves manage how much data and load is assigned to which
node?
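(For context: block size and replication are already per-file settings that
can be overridden at write time; a sketch, assuming the Hadoop 2.x shell:

    hdfs dfs -D dfs.blocksize=268435456 -put input.txt /data/   # 256 MB blocks for this file
    hdfs dfs -setrep -w 2 /data/input.txt                       # lower this file's replication to 2

Pinning blocks to particular nodes is a different matter: that is decided by
the NameNode's block placement policy, which is pluggable but not normally
overridden per job.)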

On Mon, May 4, 2015 at 12:03 AM, Chandrashekhar Kotekar <
shekhar.kote...@gmail.com> wrote:

> Your question is very vague. Can you give us more details about the
> problem you are trying to solve?
>
>
> Regards,
> Chandrash3khar Kotekar
> Mobile - +91 8600011455
>
> On Sun, May 3, 2015 at 11:59 PM, Answer Agrawal 
> wrote:
>
>> Hi
>>
>> As I have studied, data distribution, load balancing, and fault tolerance
>> are implicit in Hadoop. But I need to customize them; can we do that?
>>
>> Thanks


Connect c language with HDFS

2015-05-04 Thread unmesha sreeveni
Hi
  Can we connect C with HDFS using the Cloudera Hadoop distribution?

-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/


Re: Connect c language with HDFS

2015-05-04 Thread Alexander Alten-Lorenz
Google:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/LibHdfs.html

--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

> On May 4, 2015, at 10:57 AM, unmesha sreeveni  wrote:
> 
> Hi 
>   Can we connect C with HDFS using the Cloudera Hadoop distribution?



Re: Connect c language with HDFS

2015-05-04 Thread unmesha sreeveni
Thanks, Alex.
  I have gone through that page, but when I checked my Cloudera distribution
I was not able to find those folders; that's why I posted here. I don't know
if I made any mistake.

On Mon, May 4, 2015 at 2:40 PM, Alexander Alten-Lorenz 
wrote:

> Google:
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/LibHdfs.html


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/


Re: Connect c language with HDFS

2015-05-04 Thread Alexander Alten-Lorenz
That depends on the installation source (rpm, tgz, or parcels). Usually, when
you use parcels, libhdfs.so* should be under /opt/cloudera/parcels/CDH/lib64/
(or similar). Or just use the Linux "locate" command (locate libhdfs.so*) to
find the library.
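
Once you have the library, a minimal libhdfs sketch looks roughly like this
(a sketch, not a full program: the path and message are made up; compile
against the Hadoop include dir with -lhdfs, and run with the JVM library and
the Hadoop jars on the library path and CLASSPATH):

    #include "hdfs.h"      /* libhdfs C API */
    #include <fcntl.h>     /* O_WRONLY, O_CREAT */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* connect to the default file system named in core-site.xml */
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

        const char *path = "/tmp/libhdfs-test.txt";
        hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
        if (!out) { fprintf(stderr, "hdfsOpenFile failed\n"); return 1; }

        const char *msg = "Hello from libhdfs\n";
        hdfsWrite(fs, out, msg, (tSize)strlen(msg));   /* write the buffer */
        hdfsFlush(fs, out);                            /* force it to HDFS */
        hdfsCloseFile(fs, out);
        hdfsDisconnect(fs);
        return 0;
    }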




--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

> On May 4, 2015, at 11:39 AM, unmesha sreeveni  wrote:
> 
> Thanks, Alex.
>   I have gone through that page, but when I checked my Cloudera distribution
> I was not able to find those folders; that's why I posted here. I don't know
> if I made any mistake.



Re: Connect c language with HDFS

2015-05-04 Thread unmesha sreeveni
Thanks
Did it.
http://unmeshasreeveni.blogspot.in/2015/05/hadoop-word-count-using-c-hadoop.html

On Mon, May 4, 2015 at 3:43 PM, Alexander Alten-Lorenz 
wrote:

> That depends on the installation source (rpm, tgz, or parcels). Usually,
> when you use parcels, libhdfs.so* should be under /opt/cloudera/parcels/
> CDH/lib64/ (or similar). Or just use the Linux "locate" command (locate
> libhdfs.so*) to find the library.


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/


Re: parquet table

2015-05-04 Thread gabriel balan

Hi

If your quoted fields may contain commas, you must use RegexSerDe (or
similar) to parse each line into fields.

   create table foo(c0 string, c1 string, c2 string, c3 string, c4 string,
   c5 string, c6 string, c7 string)
   row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   with serdeproperties
   ("input.regex" =
   "^([^,]*),\"([^\"]*)\",([^,]*),([^,]*),\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\"$");


   --here I assumed some fields are always quoted, and some fields are always 
unquoted. You may need something fancier for the general case.

   load data local inpath 'log.txt.gz' into table foo;

   select * from foo;
   OK
   106  2003-02-03  20  2  A  2  2  037
   106  2003-02-03  20  3  A  2  2  037
   106  2003-02-03  8   2  A  2  2  037

If you're sure there are no commas in your quoted fields, then you could try
putting a view on top of the table and having the view use UDFs to strip the
quotes, e.g.:
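
A minimal sketch of that view (assuming foo is instead a plain
comma-delimited table whose fields keep their quotes, with c1 and c4-c7
quoted as in the sample data; regexp_replace is a built-in Hive UDF):

   create view foo_noquotes as
   select c0,
          regexp_replace(c1, '"', '') as c1,
          c2,
          c3,
          regexp_replace(c4, '"', '') as c4,
          regexp_replace(c5, '"', '') as c5,
          regexp_replace(c6, '"', '') as c6,
          regexp_replace(c7, '"', '') as c7
   from foo;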


hth
Gabriel Balan

On 5/2/2015 1:04 AM, Kumar Jayapal wrote:

Hi,

When I am loading this data, the quote characters ("") are inserted into the
table. How do I load it without them?





thanks
jay

On Fri, May 1, 2015 at 8:21 AM, Hadoop User <kjayapa...@gmail.com> wrote:

Here is the content of the file once it's unzip

106,"2003-02-03",20,2,"A","2","2","037"
106,"2003-02-03",20,3,"A","2","2","037"
106,"2003-02-03",8,2,"A","2","2","037"





On May 1, 2015, at 7:32 AM, Asit Parija <a...@sigmoidanalytics.com> wrote:


Hi Kumar,
  You can remove the STORED AS TEXTFILE part and then try that out; by
default it should be able to read the .gz files (if they are comma-delimited
CSV files).


Thanks
Asit

On Fri, May 1, 2015 at 10:55 AM, Kumar Jayapal <kjayapa...@gmail.com> wrote:

Hello Nitin,

I didn't understand what you mean. Are you telling me to set
COMPRESSION_CODEC=gzip?

thanks
Jay

On Thu, Apr 30, 2015 at 10:02 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

You loaded a gz file into a table stored as text file.
Either define the compression format or uncompress the file and load it, as
sketched below.
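
For instance, a sketch of the second option (the partition values here are
made up, since loading into a partitioned table needs a PARTITION clause):

   -- from a shell first: gunzip /tmp/weblogs/20090603-access.log.gz
   LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log'
   INTO TABLE raw PARTITION (FISCAL_YEAR = 2009, FISCAL_PERIOD = 6);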

On Fri, May 1, 2015 at 9:17 AM, Kumar Jayapal <kjayapa...@gmail.com> wrote:

Created the table:

CREATE TABLE raw (line STRING)
PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
STORED AS TEXTFILE;

and loaded it with data.

LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;

I have to load it into a parquet table.

When I say select * from raw it shows all NULL values:

NULL  NULL  NULL  NULL  NULL  NULL  NULL  NULL
NULL  NULL  NULL  NULL  NULL  NULL  NULL  NULL
NULL  NULL  NULL  NULL  NULL  NULL  NULL  NULL
NULL  NULL  NULL  NULL  NULL  NULL  NULL  NULL

Why is it not showing the actual data in the file? Will it show once I load
it into the parquet table?

Please let me know if I am doing anything wrong.

Thanks
jay




-- 
Nitin Pawar








--
The statements and opinions expressed here are my own and do not necessarily 
represent those of Oracle Corporation.



Number of vcores for YARN

2015-05-04 Thread Akmal Abbasov
Hi,
when I execute
cat /proc/cpuinfo | grep ^processor | wc -l
I get 2. Do I need to specify this value in
yarn.nodemanager.resource.cpu-vcores, or is there some kind of ratio between
pcores and vcores?
I found yarn.nodemanager.vcores-pcores-ratio, but it seems that it is
deprecated, since I cannot find it in Hadoop 2.5.1.
Thank you.

Regards,
Akmal Abbasov
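
For reference, that property is set in yarn-site.xml; a minimal sketch,
assuming you simply want to advertise both physical cores to YARN:

   <!-- yarn-site.xml: vcores this NodeManager offers to the scheduler -->
   <property>
     <name>yarn.nodemanager.resource.cpu-vcores</name>
     <value>2</value>
   </property>

No automatic pcore-to-vcore ratio is applied here; the value is simply what
the NodeManager advertises to the ResourceManager.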

Re: Connect c language with HDFS

2015-05-04 Thread Demai Ni
I would also suggest taking a look at
https://issues.apache.org/jira/browse/HDFS-6994. I have been using libhdfs3
for POCs over the past few months and highly recommend it. The only drawback
is that libhdfs3 has not been formally committed into hadoop/hdfs yet.

If you only want to play with HDFS, using the existing libhdfs lib is fine,
but if you are looking at some serious development, libhdfs3 has a lot of
advantages.


On Mon, May 4, 2015 at 3:59 AM, unmesha sreeveni 
wrote:

> Thanks
> Did it.
>
> http://unmeshasreeveni.blogspot.in/2015/05/hadoop-word-count-using-c-hadoop.html


RE: Error in YARN localization with Active Directory user -- inconsistent directory name escapement

2015-05-04 Thread John Lilley
Follow-up: this is indeed a YARN bug, and I've filed a JIRA, which has
garnered a lot of attention and a patch.
john

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Friday, April 17, 2015 1:01 PM
To: 'user@hadoop.apache.org'
Subject: Error in YARN localization with Active Directory user -- inconsistent 
directory name escapement

We have a Cloudera 5.3 cluster running on CentOS6 that is Kerberos-enabled and 
uses an external AD domain controller for the KDC.  We are able to 
authenticate, browse HDFS, etc.  However, YARN fails during localization 
because it seems to get confused by the presence of a \ character in the local 
user name.

Our AD authentication on the nodes goes through sssd and is configured to map
AD users onto the form domain\username.  For example, our test user has a
Kerberos principal of
rpdmuse...@office.datalever.com and that
maps onto a CentOS user "office\rpdmuserAD".  We have no problem validating
that user with PAM, logging in as that user, su-ing to that user, etc.

However, when we attempt to run a YARN application master, the localization 
step fails when setting up the local cache directory for the AM.  The error 
that comes out of the RM logs:
2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
diagnostics='Application application_1429295486450_0001 failed 1 times due to 
AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
-1000 due to: Application application_1429295486450_0001 initialization failed 
(exitCode=255) with output: main : command provided 0
main : user is OFFICE\rpdmuserad
main : requested yarn user is office\rpdmuserAD
org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: 
/data/yarn/nm/usercache/office%5CrpdmuserAD/appcache/application_1429295486450_0001/filecache/10
at 
org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
.Failing this attempt.. Failing the application.'

However, when we look on the node launching the AM, we see this:
[root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
[root@rpb-cdh-kerb-2 usercache]# ls -l
drwxr-s--- 4 OFFICE\rpdmuserad yarn 4096 Apr 17 12:10 office\rpdmuserAD

There appears to be different treatment of the \ character in different places. 
 Something creates the directory as "office\rpdmuserAD" but something else 
later attempts to use it as "office%5CrpdmuserAD".  I'm not sure where the
URL escaping converts the \ to %5C, or why this is not consistent.

Is this a known issue?  Any fixes available?  Are we simply not allowed to map 
local usernames this way?

I should also mention, for the sake of completeness, our auth_to_local rule is 
set up to map u...@office.datalever.com to 
OFFICE\user:
RULE:[1:$1@$0](^.*@OFFICE\.DATALEVER\.COM$)s/^(.*)@OFFICE\.DATALEVER\.COM$/office\\$1/g
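
(As an aside, a quick way to sanity-check such a rule, assuming a standard
Hadoop client installation, is the HadoopKerberosName test entry point, which
prints the short name a principal maps to:

   hadoop org.apache.hadoop.security.HadoopKerberosName rpdmuse...@office.datalever.com
)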

Thanks
John Lilley