Re: Copy Vs DistCP

2013-04-10 Thread Alexander Pivovarov
If the cluster is busy with other jobs, distcp will wait for free map slots. Regular cp is more reliable and predictable, especially if you only need to copy a few GB. On Apr 10, 2013 6:31 PM, "Azuryy Yu" wrote: > CP command is not parallel, It's just call FileSystem, even if DFSClient > has multi thr
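A sketch of the two options the thread contrasts; paths and the mapper cap are hypothetical:

```
# Plain cp: single-threaded, runs in the client JVM, needs no map slots
hadoop fs -cp /data/src/file.gz /data/dst/

# DistCp: parallel, but each copy runs in a map task; -m caps the mappers
# so the job does not compete for the whole cluster
hadoop distcp -m 4 hdfs://nn1/data/src hdfs://nn1/data/dst
```

For a few GB the single `fs -cp` avoids job-scheduling latency entirely, which is the point being made above.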

Re: Nested class

2012-10-01 Thread Alexander Pivovarov
it should work. Make sure top level class is public On Oct 1, 2012 1:32 PM, "Kartashov, Andy" wrote: > Hello all, > > > > Is this possible to have Reducer and Mapper as a static nested classes > inside a driver file? Keep getting an ERROR: $.class not > found during map-reduce job execution. I

Re: HDFS disk space requirement

2013-01-10 Thread Alexander Pivovarov
finish elementary school first. (plus, minus operations at least) On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper wrote: > Thank you for the response. > > Actually it is not a single file, I have JSON files that amount to 115 GB, > these JSON files need to be processed and loaded into a Hbase d

Re: doubt

2014-01-18 Thread Alexander Pivovarov
It's enough. Hadoop uses only 1GB RAM by default. On Sat, Jan 18, 2014 at 10:11 PM, sri harsha wrote: > Hi , > i want to install 4 node cluster in 64-bit LINUX. 4GB RAM 500HD is enough > for this or shall i need to expand ? > please suggest about my query. > > than x > > -- > amiable harsha >

Re: Setting Up First Hadoop / Yarn Cluster

2014-07-31 Thread Alexander Pivovarov
Probably a permission issue. On Thu, Jul 31, 2014 at 11:32 AM, Houston King wrote: > Hey Everyone, > > I'm a noob working to setup my first 13 node Hadoop 2.4.0 cluster, and > I've run into some problems that I'm having a heck of a time debugging. > > I've been following the guide posted at > htt

Re: How to serialize very large object in Hadoop Writable?

2014-08-22 Thread Alexander Pivovarov
The max Java array size is Integer.MAX_VALUE, so a byte array cannot be bigger than 2GB. On Aug 22, 2014 1:41 PM, "Yuriy" wrote: > Hadoop Writable interface relies on "public void write(DataOutput out)" > method. > It looks like behind DataOutput interface, Hadoop uses DataOutputStream, > which uses a simple ar

Re: How to serialize very large object in Hadoop Writable?

2014-08-22 Thread Alexander Pivovarov
uld be the workaround if the combined set of data is larger than 2 GB? > > > On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov > wrote: > >> Max array size is max integer. So, byte array can not be bigger than 2GB >> On Aug 22, 2014 1:41 PM, "Yuriy" wrote:

Re: Tez and MapReduce

2014-09-01 Thread Alexander Pivovarov
E.g. in Hive, to switch engines: set hive.execution.engine=mr; or set hive.execution.engine=tez; Tez is faster, especially on complex queries. On Aug 31, 2014 10:33 PM, "Adaryl "Bob" Wakefield, MBA" < adaryl.wakefi...@hotmail.com> wrote: > Can Tez and MapReduce live together and get along in the s
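The engine switch is per-session, so both can be tried against the same query; a minimal sketch (table name hypothetical):

```sql
-- run on the classic MapReduce engine
SET hive.execution.engine=mr;
SELECT COUNT(*) FROM my_table;

-- switch the same session to Tez and rerun
SET hive.execution.engine=tez;
SELECT COUNT(*) FROM my_table;
```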

Re: No space when running a hadoop job

2014-09-27 Thread Alexander Pivovarov
It can read/write in parallel to all drives. More HDDs means more I/O throughput. On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" wrote: > Correct me if I am wrong. > > Adding multiple directories will not balance the files distributions > across these locations. > > Hadoop will add exhaust the first direct
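Multiple data directories are listed comma-separated, one per physical disk, so the DataNode can spread block I/O across them; a hypothetical hdfs-site.xml fragment:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- one directory per physical disk (JBOD), not a RAID volume -->
  <value>/disk1/hdfs/data,/disk2/hdfs/data,/disk3/hdfs/data</value>
</property>
```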

Re: Spark vs Tez

2014-10-17 Thread Alexander Pivovarov
Spark creator Amplab did some benchmarks. https://amplab.cs.berkeley.edu/benchmark/ On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA < adaryl.wakefi...@hotmail.com> wrote: > Does anybody have any performance figures on how Spark stacks up > against Tez? If you don’t have figures, d

Re: Spark vs Tez

2014-10-17 Thread Alexander Pivovarov
There's going to be a Spark engine for Hive (in addition to MR and Tez). The Spark API is available for Java and Python as well. The Tez engine is available now and it's quite stable. As for speed: for complex queries it shows 10x-20x improvement in comparison to the MR engine, e.g. one of my queries runs 30 min

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Alexander Pivovarov
try to run on a dest cluster datanode $ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/ On Fri, Oct 17, 2014 at 11:26 AM, Shivram Mani wrote: > What is your approx input size ? > Do you have multiple files or is this one large file ? > What is your block size (source and destina
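Running the copy from a destination-cluster datanode keeps the writes local and only the reads remote; cluster aliases and paths below are hypothetical:

```
# single-threaded copy, runs wherever you launch it
hadoop fs -cp hdfs://from_cluster/data/part-00000 hdfs://to_cluster/data/

# parallel alternative for large datasets, runs as a MapReduce job
hadoop distcp hdfs://from_cluster/data hdfs://to_cluster/data
```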

Re: problems with Hadoop instalation

2014-10-29 Thread Alexander Pivovarov
Are RHEL7 based OSs supported? On Wed, Oct 29, 2014 at 3:59 PM, David Novogrodsky < david.novogrod...@gmail.com> wrote: > All, > > I am new to Hadoop so any help would be appreciated. > > I have a question for the mailing list regarding Hadoop. I have installed > the most recent stable version

Re: High Availability hadoop cluster.

2014-11-06 Thread Alexander Pivovarov
2 boxes for 2 NNs (better dedicated boxes), min 3 JNs, min 3 ZKs. JNs and ZKs can share boxes with other services. On Wed, Nov 5, 2014 at 11:31 PM, Oleg Ruchovets wrote: > Hello. > We are using hortonwork distribution and want to evaluate HA capabilities. > Can community please share the best pra

Re: High Availability hadoop cluster.

2014-11-06 Thread Alexander Pivovarov
ZKs installed also as part of > hortonworks distributions. > Sorry for dummy question - what is JN is? > Can you please please point me on some manual wiki for installation / > configuration. > > Thanks > Oleg. > > > On Thu, Nov 6, 2014 at 4:04 PM, Alexander Pivovarov

Re: High Availability hadoop cluster.

2014-11-06 Thread Alexander Pivovarov
you for the link. > Just to be sure - JN can be installed on data nodes like zookeeper? > If we have 2 Name Nodes and 15 Data Nodes - is it correct to install ZK > and JN on datanodes machines? > Thanks > Oleg. > > > On Thu, Nov 6, 2014 at 5:06 PM, Alexander Pivovarov > wr

Re: Hardware requirements for simple node hadoop cluster

2014-12-07 Thread Alexander Pivovarov
For a balanced conf you need (per core): 1-1.5 x 2TB 7200rpm SATA HDDs for HDFS (in JBOD mode, not RAID) and 3-4 GB ECC RAM. Reserve 4GB RAM for the OS; better to use a separate HDD or USB stick for the OS. E.g. for 16 cores you can use 16-24 2TB HDDs and 64 GB RAM (if planning to use Apache Spark put 128 GB). On Sun, D

Re: way to add custom udf jar in hadoop 2.x version

2014-12-31 Thread Alexander Pivovarov
I found that the easiest way is to put the udf jar into /usr/lib/hadoop-mapred on all computers in the cluster. Hive cli, hiveserver2, the oozie launcher, oozie hive actions, and MR will then see the jar. I'm using hdp-2.1.5. On Dec 30, 2014 10:58 PM, "reena upadhyay" wrote: > Hi, > > I am using hadoop 2.4.0 ve
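Distributing the jar can be scripted; a sketch assuming passwordless ssh and a hypothetical host list file:

```
# push the UDF jar to every node listed in cluster-hosts.txt
for host in $(cat cluster-hosts.txt); do
  scp my-udfs.jar "$host":/usr/lib/hadoop-mapred/
done
```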

Re: Multiple separate Hadoop clusters on same physical machines

2015-02-01 Thread Alexander Pivovarov
Start several VMs and install Hadoop on each VM. Keywords: KVM, QEMU. On Mon, Jan 26, 2015 at 1:18 AM, Harun Reşit Zafer < harun.za...@tubitak.gov.tr> wrote: > Hi everyone, > > We have set up and been playing with Hadoop 1.2.x and its friends (Hbase, > pig, hive etc.) on 7 physical servers. We want

Re: Building for Windows

2015-02-11 Thread Alexander Pivovarov
try mvn package -Pdist -Dtar -DskipTests On Wed, Feb 11, 2015 at 2:02 PM, Lucio Crusca wrote: > Hello everybody, > > I'm absolutely new to hadoop and a customer asked me to build version 2.6 > for > Windows Server 2012 R2. I'm myself a java programmer, among other things, > but > I've never use

Re: Building for Windows

2015-02-11 Thread Alexander Pivovarov
n hadoop-dist\target\. https://wiki.apache.org/hadoop/Hadoop2OnWindows https://svn.apache.org/viewvc/hadoop/common/branches/branch-2/BUILDING.txt?view=markup On Wed, Feb 11, 2015 at 3:09 PM, Alexander Pivovarov wrote: > try > > mvn package -Pdist -Dtar -DskipTests > > On Wed, Fe

Re: Building for Windows

2015-02-11 Thread Alexander Pivovarov
PM, Lucio Crusca wrote: > In data mercoledì 11 febbraio 2015 15:17:23, Alexander Pivovarov ha > scritto: > > in addition to skipTests you want to add "native-win" profile > > > > > mvn clean package -Pdist,native-win -DskipTests -Dtar > > Ok thanks but..

Re: Copying many files to HDFS

2015-02-16 Thread Alexander Pivovarov
Hi Kevin, What is the network throughput between 1. the NFS server and the client machine? 2. the client machine and the datanodes? Alex On Feb 13, 2015 5:29 AM, "Kevin" wrote: > Hi, > > I am setting up a Hadoop cluster (CDH5.1.3) and I need to copy a thousand > or so files into HDFS, which totals roughly 1 TB. The c

Re: Kerberos Security in Hadoop

2015-02-18 Thread Alexander Pivovarov
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Configuring_a_Kerberos_5_Server.html On Wed, Feb 18, 2015 at 4:49 PM, Krish Donald wrote: > Hi, > > Has anyb

Re: Kerberos Security in Hadoop

2015-02-18 Thread Alexander Pivovarov
anager to setup your > cluster. > > On Wed, Feb 18, 2015 at 4:51 PM, Alexander Pivovarov > wrote: > >> >> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html >> >> >> https://access.redhat.com/documentation/en-US/R

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
What is the easiest way to assign names to AWS EC2 computers? I guess a computer needs a static hostname and DNS name before it can be used in a Hadoop cluster. On Mar 5, 2015 4:36 PM, "Jonathan Aquilina" wrote: > When I started with EMR it was alot of testing and trial and error. HUE > is already supp
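One common approach is to set the hostname at boot (e.g. from EC2 user-data) and make every node resolvable on every other node; all names and IPs here are hypothetical:

```
# on each node (systemd-based distros)
sudo hostnamectl set-hostname dn1.hadoop.internal

# append cluster entries so nodes can resolve each other without external DNS
cat <<'EOF' | sudo tee -a /etc/hosts
10.0.0.10 nn1.hadoop.internal nn1
10.0.0.11 dn1.hadoop.internal dn1
EOF
```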

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
I dont know how you would do that to be honest. With EMR you have > destinctions master core and task nodes. If you need to change > configuration you just ssh into the EMR master node. > > > > --- > Regards, > Jonathan Aquilina > Founder Eagle Eye T > > On 2015-03

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
be honest. With EMR you have >> destinctions master core and task nodes. If you need to change >> configuration you just ssh into the EMR master node. >> >> >> >> --- >> Regards, >> Jonathan Aquilina >> Founder Eagle Eye T >> >> O

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
gt; On Thu, Mar 5, 2015 at 8:47 PM, Alexander Pivovarov > wrote: > >> what about DNS? >> if you have 2 computers (nn and dn) how nn knows dn ip? >> >> The script puts only this computer ip to /etc/hosts >> >> On Thu, Mar 5, 2015 at 6:39 PM, max scalf wrot

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
ll bootstrap > action (script) to have it distribute esri to the entire cluster. > > Why are you guys reinventing the wheel? > > > > --- > Regards, > Jonathan Aquilina > Founder Eagle Eye T > > On 2015-03-06 03:35, Alexander Pivovarov wrote: > >I

Re: sorting in hive -- general

2015-03-07 Thread Alexander Pivovarov
A sort by query produces multiple independent files; order by produces just one file. Usually sort by is used with distribute by. In older Hive versions (0.7) they might be used to implement a local sort within a partition, similar to RANK() OVER (PARTITION BY A ORDER BY B). On Sat, Mar 7, 2015 at 3:02 PM, ma
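The difference can be sketched against a hypothetical table t(a, b):

```sql
-- one reducer, one globally sorted output file
SELECT a, b FROM t ORDER BY b;

-- rows with the same a go to the same reducer and are sorted by b within it;
-- each reducer writes its own independently sorted file
SELECT a, b FROM t DISTRIBUTE BY a SORT BY b;
```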

Re: sorting in hive -- general

2015-03-08 Thread Alexander Pivovarov
stributed by and expect same result as > order by ? > > On Sat, Mar 7, 2015 at 7:05 PM, Alexander Pivovarov > wrote: > >> sort by query produces multiple independent files. >> >> order by - just one file >> >> usually sort by is used with distribute

Re: how to load data

2015-04-29 Thread Alexander Pivovarov
1. Create an external textfile hive table pointing to /extract/DBCLOC and specify CSVSerde. If using hive-0.14 and newer, use this https://cwiki.apache.org/confluence/display/Hive/CSV+Serde if hive-0.13 and older use https://github.com/ogrodnek/csv-serde You do not even need to gunzip the file. hive a
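A sketch of such a table for hive-0.14+ using the built-in OpenCSVSerde; column names are hypothetical placeholders:

```sql
CREATE EXTERNAL TABLE dbcloc_raw (col1 STRING, col2 STRING, col3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';
```

Hive reads gzipped text files in that location transparently, which is why no manual gunzip is needed.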

Re: how to load data

2015-04-30 Thread Alexander Pivovarov
Follow the links I sent you already. On Apr 30, 2015 11:52 AM, "Kumar Jayapal" wrote: > Hi Alex, > > How to create "external textfile hive table pointing to /extract/DBCLOC and > specify CSVSerde" ? > > Thanks > Jay > > On Wed, Apr 29, 2015 a

Re: How to move back to .gz file from hive to hdfs

2015-04-30 Thread Alexander Pivovarov
Try to find the file in the HDFS trash. On Apr 30, 2015 2:14 PM, "Kumar Jayapal" wrote: > Hi, > > I loaded one file to hive table it is in .gz extension. file is > moved/deleted from hdfs. > > when I execute select command I get an error. > > Error: Error while processing statement: FAILED: Execution
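With default trash settings, deleted files land under the owning user's trash directory; a sketch with hypothetical user and file names:

```
# list the current trash checkpoint for the user who deleted the file
hadoop fs -ls /user/jay/.Trash/Current/

# move the file back out of the trash to its original location
hadoop fs -mv /user/jay/.Trash/Current/data/file.gz /data/file.gz
```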

Re: How to move back to .gz file from hive to hdfs

2015-04-30 Thread Alexander Pivovarov
try desc formatted <table_name>; it shows you the table location on HDFS On Thu, Apr 30, 2015 at 2:43 PM, Kumar Jayapal wrote: > I did not find it in .Trash file is moved to hive table I want to move it > back to hdfs. > > On Thu, Apr 30, 2015 at 2:20 PM, Alexander Pivovarov > wrote: > >&

Re: how to load data

2015-05-01 Thread Alexander Pivovarov
n file. will it show once I load > it to parque table? > > Please let me know if I am doing anything wrong. > > I appreciate your help. > > > Thanks > jay > > > > Thank you very much for you help Alex, > > > On Wed, Apr 29, 2015 at 3:43 PM, Alexander P

query uses WITH blocks and throws exception if run as Oozie hive action (hive-0.13.1)

2015-05-15 Thread Alexander Pivovarov
Hi Everyone, I'm using hive-0.13.1 (HDP-2.1.5) and getting the following stacktrace if I run my query (which has a WITH block) via Oozie. (BTW, the query works fine in CLI.) I can't put the exact query but the structure is similar to: create table my_consumer as with sacusaloan as (select distinct e,f,

Re: query uses WITH blocks and throws exception if run as Oozie hive action (hive-0.13.1)

2015-05-15 Thread Alexander Pivovarov
Looks like it was fixed in hive-0.14 https://issues.apache.org/jira/browse/HIVE-7079 On Fri, May 15, 2015 at 2:26 PM, Alexander Pivovarov wrote: > Hi Everyone > > I'm using hive-0.13.1 (HDP-2.1.5) and getting the following stacktrace > if run my query (which has WITH block)

Re: query uses WITH blocks and throws exception if run as Oozie hive action (hive-0.13.1)

2015-05-15 Thread Alexander Pivovarov
about UDTF and Kryo issues fixed after 0.13.1? On Fri, May 15, 2015 at 3:20 PM, Alexander Pivovarov wrote: > Looks like it was fixed in hive-0.14 > https://issues.apache.org/jira/browse/HIVE-7079 > > On Fri, May 15, 2015 at 2:26 PM, Alexander Pivovarov > wrote: > >> Hi Ev

Re: query uses WITH blocks and throws exception if run as Oozie hive action (hive-0.13.1)

2015-05-15 Thread Alexander Pivovarov
, Alexander Pivovarov wrote: > I also noticed another error message in logs > > 10848 [main] ERROR org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor - > Status: Failed > 10849 [main] ERROR org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor - > Vertex failed, vertexNam

Re: How do I integrate Hadoop app development with Eclipse IDE?

2015-05-20 Thread Alexander Pivovarov
1. create pom.xml for your project 2. add the hadoop dependencies which you need 3. $ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true 4. import the existing java project to eclipse On Wed, May 20, 2015 at 5:31 PM, Caesar Samsi wrote: > Hello, > > > > I’m embarking on my first tutoria
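A minimal pom.xml dependency sketch for step 2; the version is an assumption and should match your cluster:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.0</version>
</dependency>
```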

What settings I need to access remove HA cluster.

2015-06-29 Thread Alexander Pivovarov
Hi Everyone I have 2 HA clusters "mydev" and "myqa" I want to have an ability to access hdfs://myqa/ paths from mydev cluster boxes. What settings should I add to mydev hdfs-site.xml so that hadoop can resolve "myqa" HA alias to active NN? Thank you Alex
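A client-side hdfs-site.xml sketch that lets mydev boxes resolve the remote "myqa" nameservice; the NameNode hostnames are hypothetical:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mydev,myqa</value>
</property>
<property>
  <name>dfs.ha.namenodes.myqa</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myqa.nn1</name>
  <value>qa-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myqa.nn2</name>
  <value>qa-nn2.example.com:8020</value>
</property>
<!-- tells the client how to find the active NN for the myqa alias -->
<property>
  <name>dfs.client.failover.proxy.provider.myqa</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```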

Re: fs.s3a.endpoint not working

2016-01-14 Thread Alexander Pivovarov
Add a jets3t.properties file with s3service.s3-endpoint= to the /etc/hadoop/conf folder. The folder with the file should be on the HADOOP_CLASSPATH. The JetS3t library, which is used by hadoop, looks for this file. On Dec 22, 2015 12:39 PM, "Phillips, Caleb" wrote: > Hi All, > > New to this list. Looking fo
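A jets3t.properties sketch; the endpoint value is left as a placeholder since it depends on your S3-compatible store:

```properties
# read by JetS3t from the classpath (e.g. /etc/hadoop/conf)
s3service.s3-endpoint=<your-s3-endpoint-host>
s3service.https-only=true
```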

Re: fs.s3a.endpoint not working

2016-01-14 Thread Alexander Pivovarov
http://www.jets3t.org/toolkit/configuration.html On Jan 14, 2016 10:56 AM, "Alexander Pivovarov" wrote: > Add jets3t.properties file with s3service.s3-endpoint= to > /etc/hadoop/conf folder > > The folder with the file should be in HADOOP_CLASSPATH > > JetS3t librar