Re: Need urgent suggestion on the below issue

2012-06-11 Thread Jonathan Seidman
Matt – changing the DNS resolved the Hive errors, but led to other issues
which I'm afraid I can't remember right now. I just remember the change
broke something else, so the best course seemed to be to fix the metadata.
This of course doesn't mean you'll hit the same issue, but on the other
hand, if you're using, say, MySQL as the metastore, you can fix the metadata
with a couple of simple queries.
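
If it helps, here's the kind of change I mean - a sketch assuming a MySQL
metastore with the default schema, and hypothetical old/new NameNode URIs
(back up the metastore database before running anything like this):

-- Hypothetical URIs; substitute your actual old and new NameNode addresses.
UPDATE SDS SET LOCATION =
  REPLACE(LOCATION, 'hdfs://old-nn:8020', 'hdfs://new-nn:8020');
UPDATE DBS SET DB_LOCATION_URI =
  REPLACE(DB_LOCATION_URI, 'hdfs://old-nn:8020', 'hdfs://new-nn:8020');

The first statement covers table and partition locations; the second covers
per-database locations (DBS may not exist in very old metastore schemas).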

On Mon, Jun 11, 2012 at 5:19 PM, Matthew Byrd  wrote:

> Hi Jon,
>
> I've just encountered the same issue.
> I was wondering if you would be so kind as to elaborate on why you'd be
> best off manipulating the metadata as opposed to trying to manipulate the
> DNS?
>
> I had a go at having the Namenode use a DNS alias, and the Hive
> metadata did indeed contain this alias rather than a hostname.
> So when I changed to a new Namenode and updated the alias as well,
> everything seemed to work fine.
>
> I'm just wondering if there isn't something bad lurking underneath this
> approach.
> Is using DNS aliases for the Namenode/Jobtracker something that people in
> the Hadoop world do, or frown upon?
> Can anyone see any potential problems with this approach?
> Maybe I should be posting this to hadoop-common?
>
> Thanks in advance,
> Matt
>
>
> On Wed, May 9, 2012 at 7:11 PM, Jonathan Seidman <
> jonathan.seid...@gmail.com> wrote:
>
>> Varun – So yes, Hive stores the full URI to the NameNode in the metadata
>> for every table and partition. From my experience you're best off modifying
>> the metadata to point to the new NN, as opposed to trying to manipulate
>> DNS. Fortunately, this is fairly straightforward since there's really only
>> one column you need to modify, and assuming you're using something like
>> MySQL, this will only require a global search-and-replace on the URI. I
>> don't remember the exact table that contains this info, but if you browse
>> the metastore tables you should find a LOCATION column which contains the
>> NN URI that you need to change.
>>
>>
>> On Wed, May 9, 2012 at 11:14 AM, varun kumar  wrote:
>>
>>> Hi All,
>>>
>>> I moved the Namenode from one server to another after a hardware crash.
>>>
>>> After configuring the new Namenode, the error below is shown when I
>>> execute a Hive query:
>>>
>>> bin/hive -e "insert overwrite table pokes select a.* from invites a
>>> where a.ds='2008-08-15';"
>>> Hive history
>>> file=/tmp/Bhavesh.Shah/hive_job_log_Bhavesh.Shah_201112021007_2120318983.txt
>>> Total MapReduce jobs = 2
>>> Launching Job 1 out of 2
>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>> Starting Job = job_201112011620_0004, Tracking URL =
>>> http://x.x.x.b:50030/jobdetails.jsp?jobid=job_201112011620_0004
>>> Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2\/bin/hadoop job
>>> -Dmapred.job.tracker=localhost:9101 -kill job_201112011620_0004
>>> 2011-12-02 10:07:30,777 Stage-1 map = 0%, reduce = 0%
>>> 2011-12-02 10:07:57,796 Stage-1 map = 100%, reduce = 100%
>>> Ended Job = job_201112011620_0004 with errors
>>> FAILED: Execution Error, return code 2 from
>>> org.apache.hadoop.hive.ql.exec.MapRedTask
>>>
>>> I have noticed that it is trying to communicate with the old host. I am
>>> unable to troubleshoot where I went wrong in rebuilding the Hadoop
>>> Namenode.
>>>
>>> Can you please suggest why Hive is not able to communicate with the new
>>> Namenode?
>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Varun Kumar.P
>>>
>>>
>>
>


Re: How to get a flat file out of a table in Hive

2012-03-06 Thread Jonathan Seidman
Farah – The easiest way to dump data to a file is with a query like the
following:

hive> INSERT OVERWRITE LOCAL DIRECTORY 'DIRECTORY_NAME' SELECT * from
TABLE_NAME;

The drawback of this is that Hive uses ^A as the separator by default. In
the past what I found easiest was to just run a simple sed script on the
output file to replace the ^A's with another character. For example:

sed -e 's/\x01/,/g' FILE > FILE.NEW

Here \x01 is the ^A (Ctrl-A) byte; the \x escape works with GNU sed, and on
other seds you can insert the literal character by typing Ctrl-V Ctrl-A.
That's from memory, so test the command on a small file first.
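
If you'd rather avoid post-processing the output entirely, another option (a
sketch, not something from this thread; the staging table name here is
hypothetical) is to route the data through a comma-delimited table and pull
that table's files out of the warehouse directory:

hive> CREATE TABLE table_name_csv
    >   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    >   STORED AS TEXTFILE
    > AS SELECT * FROM TABLE_NAME;

The files under table_name_csv's warehouse directory are then plain
comma-separated text that you can copy out with hadoop fs -get.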

Jonathan

On Tue, Mar 6, 2012 at 10:32 AM, Omer, Farah wrote:

>  What's the easiest way to get a flat file out from a table in Hive?
>
> I have a table in HIVE, that has millions of rows. I want to get a dump of
> this table out in flat file format, and it should be comma separated.
>
> Anyone knows the syntax to do it?
>
> Thanks for the help!
>
> *Farah Omer*
>
> *Senior DB Engineer, MicroStrategy, Inc.*
> T: 703 2702230
> E: *fo...@microstrategy.com* 
> *http://www.microstrategy.com* 
>
>
>
>


Re: How to load a table from external server....

2012-03-06 Thread Jonathan Seidman
Farah – can you configure the remote server as a client machine? You would
just need to install Hadoop with a configuration pointing to your cluster,
and then install Hive. You'd then be able to execute all Hive commands
against your cluster. Note that you won't run any daemons on this node, so
make sure that none of the Hadoop processes get started.
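
Once that client setup is in place, LOAD DATA LOCAL INPATH refers to the
client machine's own disk, so from the remote server you could run something
like the following (using the path from your message):

hive> LOAD DATA LOCAL INPATH '/home/nzdata/CLOUD/SCRIPT/LU_CUSTOMER.txt'
    > OVERWRITE INTO TABLE LU_CUSTOMER;

The file is streamed into the cluster's HDFS as part of the load itself, so
there's no separate step of copying it onto a Hadoop node first.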

Jonathan

On Thu, Mar 1, 2012 at 10:20 AM, Omer, Farah wrote:

>  Hello,
>
> Could anybody tell me how I can load data into a Hive table when the flat
> file exists on another server and not locally on the Hadoop node.
>
> For example, I am trying to load the table LU_CUSTOMER, and the flat file
> for this table exists on another RH Linux server: 10.11.12.13. The
> LU_CUSTOMER flat file is about 30 GB in size, so moving it locally to the
> Hadoop node would take a long time. I am trying to avoid this step of
> copying onto the Hadoop node.
> So I wonder if there is a way to load the table directly from the other
> server.
>
> The syntax that I know currently is: LOAD DATA LOCAL INPATH
> '/home/nzdata/CLOUD/SCRIPT/LU_CUSTOMER.txt' OVERWRITE INTO TABLE
> LU_CUSTOMER;
>
> But if I want to load from the other server directly, the path won't be
> local.
>
> Any suggestions? Is that possible….
>
> Thanks.
>
> *Farah Omer*
>
> *Senior DB Engineer, MicroStrategy, Inc.*
> T: 703 2702230
> E: *fo...@microstrategy.com* 
> *http://www.microstrategy.com* 
>
>
>
>


Re: HiveR

2012-02-13 Thread Jonathan Seidman
Are you actually referring to RHive: https://github.com/nexr/RHive/wiki? If
so, it looks like a very interesting project, but I haven't talked to anyone
actually using it yet. If it looks like a good fit for your particular
applications, then the best thing would be to install and start working with
it.

On Sun, Feb 12, 2012 at 5:18 AM, Dalia Sobhy wrote:

> Does anyone have any idea about hiveR?
>
> Sent from my iPhone
>


Re: "Path Is Not Legal" when loading HDFS->S3

2011-09-26 Thread Jonathan Seidman
Hey Bradford - from my experience that error occurs when there's a conflict
between the "fs.default.name" setting (printed as "default.fs.name" in the
error message) and the value in the metastore's SDS.LOCATION column in the
Hive metadata. For us this has occurred when either migrating to a new
cluster or changing the NN hostname. Not sure how all this works with
AWS/EMR, but that's the first thing I'd check.
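
If your metastore lives in MySQL (an assumption; adjust for whatever
database EMR is using), a quick way to see what locations Hive has actually
recorded is:

SELECT LOCATION FROM SDS LIMIT 10;

Rows still pointing at an hdfs:// URI for a table you created with an s3://
location would explain the conflict.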

Jonathan

On Mon, Sep 26, 2011 at 5:16 PM, Bradford Stephens <
bradfordsteph...@gmail.com> wrote:

> Hey amigos,
>
> I'm doing an EMR load for HDFS-to-S3 data. My example looks correct,
> but I'm getting an odd error. Since all the EMR data is in one
> directory, I'm copying the file to HDFS, then doing 'LOAD DATA INPATH'
> to put it back into S3.
>
> CREATE TABLE events(
> ..blahblah...
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> LOCATION 's3://outputdir/table_out/events'
> ;
>
> LOAD DATA INPATH '/user/hadoop/eos/events_20110107.csv.gz' overwrite
> INTO TABLE events;
>
> The error I get is:
> FAILED: Error in semantic analysis: line 3:17 Path is not legal
> '/user/hadoop/eos/events_20110430.csv.gz': Move from:
>
> hdfs://domU-12-31-39-14-19-F1.compute-1.internal:9000/user/hadoop/eos/events_20110430.csv.gz
> to: s3://outputdir/table_out/events is not valid. Please check that
> values for params "default.fs.name" and "hive.metastore.warehouse.dir"
> do not conflict.
>
> This is EMR, and I've checked the params and see they do not conflict.
>
>
> --
> Bradford Stephens,
> CEO and Founder, Drawn to Scale
> http://drawntoscale.com
> (530) 763-DATA
>
> http://www.drawntoscale.com -- Spire, the "Heroku for Big Data"
>